librdfa-rdf4j

Description

librdfa-rdf4j is an RDFParser built on top of librdfa, which claims to be the fastest RDFa processor. The librdfa processor is written in C and handles XML and HTML documents. To connect the Java code to the C processor, the RDF4J parser uses SWIG. The project folder contains an annotated C header file from which SWIG generates the wrapper code that makes the librdfa library available in Java. To see the C code developed for the bridge between librdfa and Java, check the c folder of the parser.

To use the parser, add it as a dependency, as you would in any Maven project. You can then use it through the Rio API by specifying RDFFormat.RDFA as the format.

Performance

There is a benchmark in the test folder of the project that compares librdfa-rdf4j with Semargl. Performance is compared as a function of the number of triples parsed. These are the conclusions we have observed:

  • For a small number of triples (<1000), librdfa-rdf4j is faster. Results with 900 triples: Semargl 2.70 ms; librdfa-rdf4j 1.94 ms.
  • For a medium number of triples (>1000, <5000), Semargl is faster, but the difference is small. Results with 3000 triples: Semargl 3.96 ms; librdfa-rdf4j 4.13 ms.
  • For a large number of triples (>5000), Semargl is faster. Results with 15000 triples: Semargl 31.31 ms; librdfa-rdf4j 45.97 ms.
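To put the figures above on a common scale, the following sketch computes how much slower or faster librdfa-rdf4j is relative to Semargl for each reported data point. The class and method names are illustrative only and are not part of the project; the 900-triple case assumes that the 1.94 ms figure belongs to librdfa-rdf4j.

```java
import java.util.Locale;

final class BenchmarkRatios {

    /**
     * Relative difference of librdfa-rdf4j with respect to Semargl, in percent.
     * Negative values mean librdfa-rdf4j was faster, positive values mean it was slower.
     */
    static double relativeDifferencePercent(double semarglMs, double librdfaMs) {
        return (librdfaMs - semarglMs) / semarglMs * 100.0;
    }

    public static void main(String[] args) {
        // Reported measurements: {triples, Semargl ms, librdfa-rdf4j ms}.
        double[][] results = {
            {900, 2.70, 1.94},   // assumption: 1.94 ms is the librdfa-rdf4j figure
            {3000, 3.96, 4.13},
            {15000, 31.31, 45.97},
        };
        for (double[] row : results) {
            double diff = relativeDifferencePercent(row[1], row[2]);
            System.out.println(String.format(Locale.ROOT,
                    "%d triples: librdfa-rdf4j %+.1f%% vs Semargl", (long) row[0], diff));
        }
    }
}
```

A single run per size is a thin basis for conclusions, so these percentages are best read as rough indications rather than precise measurements.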

In general, librdfa itself is faster than Semargl; the slowdown comes from the way it has to be integrated with Rio. Rio loads the dataset into an InputStream before parsing it, whereas librdfa parses the triples as the data arrives. As a result, librdfa-rdf4j first needs to load the dataset into an InputStream and then send the data to the C buffer through the Java-C bridge.
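The extra copy described above can be pictured with a minimal sketch: the Java side reads the InputStream in chunks and forwards each chunk to the native side. This is not the project's actual code. The `ChunkSink` interface is a hypothetical stand-in for whatever SWIG-generated call copies bytes into the librdfa buffer, and the chunk size is arbitrary.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ChunkedFeedSketch {

    /** Hypothetical stand-in for the native call that receives a chunk of bytes. */
    interface ChunkSink {
        void accept(byte[] buffer, int length);
    }

    /**
     * Reads the stream in chunks of at most chunkSize bytes and forwards each
     * chunk to the sink. Returns the total number of bytes forwarded.
     */
    static long feed(InputStream in, int chunkSize, ChunkSink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Each call here corresponds to one crossing of the Java-C bridge.
                sink.accept(buffer, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] html = "<p property=\"dc:title\">Example</p>".getBytes(StandardCharsets.UTF_8);
        StringBuilder received = new StringBuilder();
        long total = feed(new ByteArrayInputStream(html), 8,
                (buf, len) -> received.append(new String(buf, 0, len, StandardCharsets.UTF_8)));
        System.out.println(total + " bytes forwarded: " + received);
    }
}
```

Every chunk is copied once out of the stream and once across the bridge, which is one plausible reason the overhead grows with the size of the dataset.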

Requirements

librdfa-rdf4j depends on the librdfa library, so librdfa must be installed beforehand. Please follow the installation steps in the librdfa repository.

In general, you first need to clone the repository:

git clone https://github.com/rdfa/librdfa


Building from source

Install

Use

librdfa extractor