Documentation
All the documentation for this project lives in the Librdfa-rdf4j documentation.
Code
The code produced as part of GSoC 2018 lives on GitHub in the ASF repository.
Proposal: ANY23-295 Implement ability to use librdfa
Description
In 2012, the Any23 community decided to migrate from its own RDFa parser implementation to Semargl [1], as discussed in [2]. Semargl is a modular framework for crawling linked data from structured documents [9], and it provides an RDFa parser compatible with RDF4J through an integration module [3]. Since that issue [4] was closed, Semargl has been the official RDFa parser for Any23.
...
In this context, the present proposal aims to accomplish the aforementioned objective and provide a seamless integration between Any23 and the librdfa parser, making it possible to conduct a fair performance comparison between Semargl and librdfa within Any23.
Student
Mentor
JIRA Issue
https://issues.apache.org/jira/browse/ANY23-295
Proposal Title: Integrate and evaluate the librdfa RDFa parser in Any23 via JNI (Java Native Interface) [10].
...
JIRA Issues: https://issues.apache.org/jira/browse/ANY23-295 Implement ability to use librdfa
Project Deliverables
New standalone module with a new RDFa parser compatible with RDF4J using librdfa.
JNI bridge to librdfa including interfaces and middleware utilities.
Unit tests for the new librdfa module.
Benchmark tests comparing Semargl and librdfa.
Self-maintaining Any23 website documentation that visualizes integration test results as well as Any23 compliance with the RDFa test suite at http://rdfa.info/test-suite/
Detailed description
Scope for the project
This project covers the implementation of a new RDFa parser for Any23 that serves as a wrapper for the librdfa library. The project also includes an evaluation phase to measure the improvements or drawbacks of using such a parser as the main Any23 RDFa processor.
Design
The implementation will rely on the pre-existing parser infrastructure of Any23, which is provided by RDF4J, and will use JNI as the integration mechanism for librdfa. The development of the project will be divided into three phases.
Implementation Approach
Bridge: This phase will tackle the communication issues between Any23 and librdfa and will focus mainly on:
...
Finally, it is worth mentioning that every component coded during each phase will be accompanied by corresponding documentation in the Wiki [8].
Time Frame
Time Period | Expected Outcome |
---|---|
March 01 - April 23 | Understanding the task and preparing proposal |
April 24 - April 30 | Community bonding |
May 01 - June 10 | Phase 1: Bridge. |
June 11 - June 15 | GSoC Evaluation 1 |
June 16 - June 29 | Phase 2: Wrapper. |
June 30 - July 8 | Phase 3: Evaluation |
July 9 - July 13 | GSoC Evaluation 2. |
July 14 - July 25 | Camera-ready documentation and sharing results with the community. |
July 26 - August 5 | Receive feedback and fix minor issues. |
August 6 - August 14 | GSoC Final evaluation. |
About Myself
I am Julio Caguano, an undergraduate student of Computer Science at the University of Cuenca in Ecuador. I am currently in my final year of college.
...
My main motivation for applying to this project was my background in Linked Data technologies: I have used Any23 in the past and I liked how it works. I also found the code comprehensible and readable. On top of that, I have always liked integration challenges. I find them interesting because you have to push yourself out of your comfort zone, learn new technologies, and learn how to interact with them.
Commitment
I estimate I could dedicate 25 hours per week to this project during the coding period (including weekends and midweek free time). This could be increased depending on the progress of the project or the suggestions of my mentor. I would split my time between my studies and this project, which hopefully will not be a problem, since the project takes place at the beginning of my school semester, when the assignment load is small. In addition, I will post a weekly report in the GSoC section of the project's wiki to share my progress on the planned tasks.
References
[1] https://github.com/semarglproject/semargl
...
[10] https://es.wikipedia.org/wiki/Java_Native_Interface
Project Reports
29/04/2018
Project description
Get started with the Any23 source code and start working on Any23-231.
Review of Previous Actions
N/A
Objectives
Currently, the JSON output of the Any23 REST API has some indentation and syntax issues. Make the JSON reporting output pretty-printed.
Future Actions
Discuss with the community and create a patch for the issue.
Mentors Comments
06/05/2018
Project description
Work on Any23-231 and add a new output format (JSON-LD) for Any23.
Review of Previous Actions
N/A
Objectives
Submit a PR for Any23-231, fix formatting issues in the JSON writer, and add a JSON-LD writer.
Future Actions
Close Any23-231
Mentors Comments
13/05/2018
Project description
Close Any23-231 and research tools for JNI development.
Review of Previous Actions
N/A
Objectives
Update the Any23 documentation on the JSON-LD format, close Any23-231, and research JNI tools: SWIG, JavaCPP.
Future Actions
Start working on the Librdfa - Any23 bridge.
Mentors Comments
20/05/2018
Project description
Understand JNI and investigate Maven for build automation.
Review of Previous Actions
Any23-231 was merged with the development branch.
Objectives
Get a broad understanding of JNI and build a small example to see the Java/C interaction. In addition, JNI requires some commands to be run from the console; these have to be automated with Maven so that they fit into the build pipeline that Any23 already uses.
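A minimal sketch of the Java side of such an experiment could look like the following. The class name `NativeHello`, the library name `hello_sketch`, and the method `greet` are hypothetical and not taken from the project. The fallback path exists only so that the class still runs when no native library has been built.

```java
public class NativeHello {
    // True only if the (hypothetical) native library was found and loaded.
    private static final boolean LOADED = tryLoad();

    // Declared in Java, implemented in C. With JDK 8 or later,
    // `javac -h <dir> NativeHello.java` generates the matching C header.
    private static native String greet(String name);

    private static boolean tryLoad() {
        try {
            // Searches java.library.path for a platform-specific file,
            // for example libhello_sketch.so on Linux.
            System.loadLibrary("hello_sketch");
            return true;
        } catch (UnsatisfiedLinkError e) {
            // Library not built or not on the path: fall back to pure Java.
            return false;
        }
    }

    public static boolean isLoaded() {
        return LOADED;
    }

    // Calls into C when the library is available, otherwise answers in Java.
    public static String greetOrFallback(String name) {
        return LOADED ? greet(name) : "Hello, " + name + " (Java fallback)";
    }

    public static void main(String[] args) {
        System.out.println(greetOrFallback("Any23"));
    }
}
```

Generating the header and compiling the C implementation into a shared library are the console steps that a Maven build would need to automate.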
Future Actions
Build and install librdfa. Choose a tool for the bridge between librdfa and Any23; we can use JNI, JNA, JavaCPP, or SWIG.
Mentors Comments
27/05/2018
Project description
Build and install librdfa.
Review of Previous Actions
N/A
Objectives
Install librdfa and become familiar with the code base. Write a small C program that uses the librdfa pipeline to parse an XHTML file into triples.
Future Actions
Start working on the Librdfa - Any23 bridge.
Mentors Comments
03/06/2018
Project description
Use JNI for communication between librdfa and Any23.
Review of Previous Actions
N/A
Objectives
Work on the communication between librdfa and Any23.
Future Actions
Connect Any23 with librdfa.
Mentors Comments
10/06/2018
Project description
Develop callbacks to interact between C (librdfa) and Java
Review of Previous Actions
N/A
Objectives
Implement basic Java/C interfaces.
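The callback pattern can be sketched in plain Java as follows. The `TripleHandler` interface and the `emitAll` method are hypothetical names, and `emitAll` is a pure-Java stand-in for the native parser. In an actual JNI bridge, the C code would look up the handler method and invoke it through the JNI environment (for example with `GetMethodID` and `CallVoidMethod`) each time the parser reports a triple.

```java
import java.util.ArrayList;
import java.util.List;

public class CallbackSketch {
    /** Hypothetical callback invoked once per extracted triple. */
    public interface TripleHandler {
        void onTriple(String subject, String predicate, String object);
    }

    /**
     * Pure-Java stand-in for the native side (an assumption for this sketch):
     * it reports each {subject, predicate, object} row to the handler.
     */
    public static void emitAll(String[][] triples, TripleHandler handler) {
        for (String[] t : triples) {
            handler.onTriple(t[0], t[1], t[2]);
        }
    }

    /** Collects every reported triple as one N-Triples-like line. */
    public static List<String> collect(String[][] triples) {
        List<String> out = new ArrayList<>();
        emitAll(triples, (s, p, o) ->
                out.add("<" + s + "> <" + p + "> \"" + o + "\" ."));
        return out;
    }

    public static void main(String[] args) {
        String[][] sample = {
            {"http://example.org/doc", "http://purl.org/dc/terms/title", "Example"}
        };
        collect(sample).forEach(System.out::println);
    }
}
```

Keeping the callback signature limited to strings avoids passing complex Java objects across the language boundary, which keeps the C side simple.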
Future Actions
Integrate the librdfa bridge with Any23.
Mentors Comments
17/06/2018
Project description
Version of bridge between Java/C
Review of Previous Actions
N/A
Objectives
Find a workaround for complex types.
Future Actions
Integrate the code with Maven.
Mentors Comments
24/06/2018
Project description
Integrate the project with the Maven build.
Review of Previous Actions
N/A
Objectives
Create a build pipeline with Maven and ease the integration with Any23.
Future Actions
Integrate the code with Any23.
Mentors Comments
01/07/2018
Project description
Implement librdfa with Rio
Review of Previous Actions
N/A
Objectives
Use the default API for parsing RDF in RDF4J.
Future Actions
Implement module for parsing RDFa in RDF4J
Mentors Comments
08/07/2018
Project description
Implement librdfa with Rio, tests, and benchmarking
Review of Previous Actions
I found some memory issues that I am still working on.
Objectives
Use the default API for parsing RDF in RDF4J.
Future Actions
Integrate the librdfa-rdf4j module with Any23. More tests need to be added and a broader benchmarking analysis is needed. I am using semargl-rdf4j as the baseline.
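As an illustration of how a baseline comparison can be set up, here is a minimal timing sketch. The class and method names are hypothetical, and the two `Runnable` tasks in `main` are placeholders rather than real parser calls. For a serious analysis, a dedicated harness such as JMH is better suited.

```java
import java.util.Arrays;

public class ParserBenchmarkSketch {
    /**
     * Runs the task `warmup` times without measuring, then returns the
     * median of `runs` wall-clock timings in nanoseconds (the upper
     * median when `runs` is even).
     */
    public static long medianNanos(Runnable task, int warmup, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Placeholder workloads: in a real comparison, each task would parse
        // the same document with one of the two parsers.
        Runnable baseline = () -> "a b c".split(" ");
        Runnable candidate = () -> "a,b,c".split(",");

        System.out.println("baseline  ns: " + medianNanos(baseline, 100, 1001));
        System.out.println("candidate ns: " + medianNanos(candidate, 100, 1001));
    }
}
```

Using the median rather than the mean makes the figure less sensitive to occasional pauses such as garbage collection.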
Mentors Comments
15/07/2018
Project description
Extractor for librdfa
Review of Previous Actions
N/A
Objectives
Integrate Any23 with librdfa-rdf4j.
Future Actions
Write tests for the new Any23 extractor and test all functionality.
Mentors Comments
22/07/2018
Project description
Integration of librdfa-rdf4j with Any23.
Review of Previous Actions
testBasic() is failing and needs to be fixed.
Objectives
Write tests for the new Any23 extractor and test all functionality.
Future Actions
Complete the integration and make the librdfa extractor configurable.
Mentors Comments
29/07/2018
Project description
I fixed a memory problem that I found in librdfa-rdf4j while writing the tests. I also added tests for the RDFa 1.0/1.1 extractors, since librdfa supports both. Finally, the librdfa extractor is now configurable.
Review of Previous Actions
Objectives
Complete the integration and make the librdfa extractor configurable.
Future Actions
Write documentation
Mentors Comments
05/08/2018
Project description
Write documentation and provide a final PR. The PR includes the librdfa extractor and the bridge between librdfa and Java.
Review of Previous Actions
Objectives
Write documentation and provide final pull request.
Future Actions
Make changes according to mentor suggestions.
Mentors Comments
12/08/2018
Project description
Submit the final changes reviewed by my mentor. Correct the code and documentation that I will deliver as the result of GSoC, following my mentor's suggestions.
Review of Previous Actions
Objectives
Make changes according to mentor suggestions.
Future Actions
Mentors Comments