librdfa-rdf4j
Description
librdfa-rdf4j is an RDFParser built on top of librdfa, which claims to be the fastest RDFa processor. librdfa is written in C and processes RDFa in XML and HTML documents. To connect the Java code with the C processor, the RDF4J parser uses SWIG: the project folder contains an annotated C header file from which the wrapper code is generated, making the librdfa library available in Java. The C code for the bridge between librdfa and Java is in the c folder of the parser.
To use the parser, add the dependency to your Maven project as usual. You can then use the Rio API with RDFFormat.RDFA.
Performance
There is a benchmark in the test folder of the project that compares librdfa-rdf4j with Semargl for inputs with different numbers of triples. These are our observations:
- For a small number of triples (<1000), librdfa-rdf4j is faster. Results with 900 triples: Semargl 2.70 ms; librdfa-rdf4j 1.94 ms.
- For a medium number of triples (>1000, <5000), Semargl is faster, but the difference is small. Results with 3000 triples: Semargl 3.96 ms; librdfa-rdf4j 4.13 ms.
- For a large number of triples (>5000), Semargl is faster. Results with 15000 triples: Semargl 31.31 ms; librdfa-rdf4j 45.97 ms.
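To make the gap easier to read, the timings listed above can be expressed as ratios. The following is only an illustrative sketch: the class and method names are hypothetical, the numbers are copied from the list above, and the 1.94 ms figure is taken to be the librdfa-rdf4j result.

```java
import java.util.Locale;

/** Hypothetical helper, not part of librdfa-rdf4j: turns the timings above into ratios. */
final class BenchmarkRatios {

    /** Ratio of librdfa-rdf4j time to Semargl time; a value above 1 means Semargl was faster. */
    static double ratio(double librdfaMs, double semarglMs) {
        return librdfaMs / semarglMs;
    }

    public static void main(String[] args) {
        // Figures copied from the benchmark list above.
        int[] triples = {900, 3000, 15000};
        double[] semarglMs = {2.70, 3.96, 31.31};
        double[] librdfaMs = {1.94, 4.13, 45.97};

        for (int i = 0; i < triples.length; i++) {
            System.out.printf(Locale.ROOT, "%d triples: librdfa-rdf4j / Semargl = %.2f%n",
                    triples[i], ratio(librdfaMs[i], semarglMs[i]));
        }
    }
}
```

A ratio below 1 means librdfa-rdf4j was the faster parser for that input size, and a ratio above 1 means Semargl was.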
In general, librdfa itself is faster than Semargl, but the integration with Rio adds overhead. Rio loads the dataset into an InputStream before parsing it, whereas librdfa parses the triples as they arrive. As a result, librdfa-rdf4j first needs to load the dataset into an InputStream and then send the data to the C buffer through the Java-C bridge.
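The hand-off described above can be pictured roughly as follows. This is a minimal sketch and not the project's actual bridge code: `ChunkedFeed`, `feed`, and the `sink` callback are hypothetical names, and the sink merely stands in for the native call that copies bytes into the C buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical illustration of forwarding an InputStream to a native-side buffer in chunks. */
final class ChunkedFeed {

    /**
     * Reads the stream in chunks of at most chunkSize bytes and passes each chunk to the sink.
     * Returns the total number of bytes forwarded.
     */
    static long feed(InputStream in, int chunkSize, Consumer<byte[]> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Copy only the bytes actually read; the buffer is reused on the next iteration.
                sink.accept(Arrays.copyOf(buffer, read));
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] html = "<p property=\"dc:title\">Example</p>".getBytes(StandardCharsets.UTF_8);
        long sent = feed(new ByteArrayInputStream(html), 16,
                chunk -> System.out.println("forwarding " + chunk.length + " bytes"));
        System.out.println("total bytes forwarded: " + sent);
    }
}
```

Each chunk crosses the Java-C boundary separately in this picture, which is one plausible source of the extra cost on larger inputs.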
Requirements
librdfa-rdf4j uses the librdfa library, so librdfa needs to be installed beforehand. Please follow the installation steps in the librdfa repository.
In general, you need to clone the repository:
git clone https://github.com/rdfa/librdfa
Then install the library (make sure that all the libraries librdfa depends on are installed):
./autogen.sh
./configure
make
make install
Building from source
To compile librdfa-rdf4j, change into the source directory and run the Maven install goal:
mvn clean install
Install
You can install librdfa-rdf4j by adding the following Maven dependency (make sure the librdfa library is installed):
<dependency>
  <groupId>org.apache.any23</groupId>
  <artifactId>apache-any23-librdfa</artifactId>
  <version>${librdfa.rdf4j.version}</version>
</dependency>
Use
Once you have installed librdfa-rdf4j, you can use the parser with the Rio API. For example:
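// Ask Rio for a parser registered for the RDFa format.
RDFParser rdfParser = Rio.createParser(RDFFormat.RDFA);
// Collect the parsed statements into an in-memory model.
Model model = new LinkedHashModel();
rdfParser.setRDFHandler(new StatementCollector(model));
// "in" is the input (an InputStream or Reader) supplied by the caller; the second argument is the base URI.
rdfParser.parse(in, "http://www.example.org./");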