librdfa-rdf4j
Description
librdfa-rdf4j is an RDFParser built on top of librdfa, which claims to be the fastest RDFa processor. The librdfa processor is written in C and handles XML and HTML languages. To connect the Java code with the C processor, the RDF4J parser uses SWIG. The project folder contains an annotated C header file from which the wrapper code is generated, making the librdfa library available in Java. If you want to see the C code developed for the bridge between librdfa and Java, check the c folder of the parser.
To use the parser, add the dependency as in any Maven project. You can then use the Rio API by specifying RDFFormat.RDFA.
Performance
There is a benchmark in the test folder of the project that compares librdfa-rdf4j with Semargl for different numbers of triples. These are our observations:
- For a small number of triples (<1000), librdfa-rdf4j is faster. Results with 900 triples: Semargl 2.70 ms; librdfa-rdf4j 1.94 ms.
- For a medium number of triples (>1000, <5000), Semargl is faster, but the difference is small. Results with 3000 triples: Semargl 3.96 ms; librdfa-rdf4j 4.13 ms.
- For a large number of triples (>5000), Semargl is faster. Results with 15000 triples: Semargl 31.31 ms; librdfa-rdf4j 45.97 ms.
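To make the three results easier to compare, the timings can be turned into ratios. The sketch below only reuses the numbers listed above, taking 1.94 ms as the librdfa-rdf4j result for 900 triples. The class and method names are made up for illustration and are not part of the project's benchmark.

```java
import java.util.Locale;

final class BenchmarkRatios {

    /**
     * Ratio of librdfa-rdf4j time to Semargl time.
     * Values below 1 mean librdfa-rdf4j was faster, values above 1 mean it was slower.
     */
    static double ratio(double semarglMs, double librdfaMs) {
        return librdfaMs / semarglMs;
    }

    public static void main(String[] args) {
        // Figures copied from the benchmark results listed in this section.
        int[] triples = {900, 3000, 15000};
        double[] semarglMs = {2.70, 3.96, 31.31};
        double[] librdfaMs = {1.94, 4.13, 45.97};

        for (int i = 0; i < triples.length; i++) {
            System.out.printf(Locale.ROOT,
                    "%d triples: librdfa-rdf4j / Semargl = %.2f%n",
                    triples[i], ratio(semarglMs[i], librdfaMs[i]));
        }
    }
}
```

A ratio of about 0.72 for 900 triples against about 1.47 for 15000 triples shows that the gap reverses and widens as the input grows.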
In general, librdfa itself is faster than Semargl, but the integration with Rio adds some overhead. Rio loads the dataset into an InputStream before parsing it, whereas librdfa parses the triples as they arrive. As a result, librdfa-rdf4j first needs to load the dataset into an InputStream and only then send the data to the C buffer through the Java-C bridge.
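The two-step flow described above can be pictured with a small sketch: the whole input is first read into memory and only then handed over in fixed-size chunks, the way a buffer-filling callback would consume it. This is not the actual bridge code. `ChunkFeeder`, `readFully`, `feed` and the chunk size are hypothetical, and a plain Java `Consumer` stands in for the SWIG-generated wrapper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Consumer;

final class ChunkFeeder {

    /** Step 1: drain the whole stream into memory before any parsing starts. */
    static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Step 2: pass the buffered bytes on in chunks of at most chunkSize bytes.
     * The sink stands in for the native buffer (an assumption for this sketch).
     * Returns the number of chunks sent.
     */
    static int feed(byte[] data, int chunkSize, Consumer<byte[]> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            sink.accept(Arrays.copyOfRange(data, off, end));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] html = "<p property=\"http://purl.org/dc/terms/title\">Hi</p>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] buffered = readFully(new ByteArrayInputStream(html));
        int sent = feed(buffered, 16, chunk -> System.out.println(chunk.length + " bytes"));
        System.out.println(sent + " chunks");
    }
}
```

The point of the sketch is that nothing reaches the sink until `readFully` has finished, so the cost of the first pass over the data grows with the size of the input.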
Requirements
librdfa-rdf4j uses the librdfa library, so librdfa must be installed beforehand. Please follow the installation steps in the librdfa repository.
In general, you need to clone the repository:
Code Block | ||
---|---|---|
| ||
git clone https://github.com/rdfa/librdfa |
Then build and install the library (make sure that all the libraries librdfa depends on are installed):
Code Block | ||
---|---|---|
| ||
./autogen.sh
./configure
make
make install |
Building from source
To compile librdfa-rdf4j, change into the source directory and run install with Maven:
Code Block |
---|
mvn clean install |
Install
You can install librdfa-rdf4j by adding the following Maven dependency (make sure the librdfa library is installed):
Code Block | ||
---|---|---|
| ||
<dependency>
  <groupId>org.apache.any23</groupId>
  <artifactId>apache-any23-librdfa</artifactId>
  <version>${librdfa.rdf4j.version}</version>
</dependency> |
Use
Once you have installed librdfa-rdf4j, you can use the parser with the Rio API. For example:
Code Block | ||
---|---|---|
| ||
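// 'in' is an InputStream (or Reader) over the RDFa document to parse.
RDFParser rdfParser = Rio.createParser(RDFFormat.RDFA);
// Collect the parsed statements into an in-memory model.
Model model = new LinkedHashModel();
rdfParser.setRDFHandler(new StatementCollector(model));
// The second argument is the base URI used to resolve relative IRIs.
rdfParser.parse(in, "http://www.example.org./"); |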
librdfa extractor
By default, Any23 uses Semargl with the RDFa 1.1 standard. To use Semargl with the RDFa 1.0 standard instead, set the property any23.extraction.rdfa.programmatic to off. To use the librdfa extractor, set the property any23.extraction.rdfa.librdfa. If the librdfa property is set, it overrides the Semargl property regardless of the value of that property. By default, the librdfa property is off. After changing the extractor, you can use Any23 as usual.
Info |
---|
Remember to install the librdfa library before using the librdfa extractor. |
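The precedence between the two properties can be sketched as a small decision function. This is only an illustration of the rules described above, not Any23's actual code: the class, enum and method names are hypothetical, and it assumes that both properties take the values on and off and that the programmatic property defaults to on.

```java
import java.util.Properties;

final class RdfaExtractorChoice {

    enum Choice { LIBRDFA, SEMARGL_RDFA11, SEMARGL_RDFA10 }

    static final String PROGRAMMATIC_KEY = "any23.extraction.rdfa.programmatic";
    static final String LIBRDFA_KEY = "any23.extraction.rdfa.librdfa";

    /** Picks the extractor that the described rules would select for the given properties. */
    static Choice choose(Properties props) {
        // Assumed defaults: Semargl with RDFa 1.1 (programmatic on), librdfa off.
        boolean librdfa = "on".equalsIgnoreCase(props.getProperty(LIBRDFA_KEY, "off"));
        boolean programmatic = "on".equalsIgnoreCase(props.getProperty(PROGRAMMATIC_KEY, "on"));

        // The librdfa property wins regardless of the Semargl property.
        if (librdfa) {
            return Choice.LIBRDFA;
        }
        return programmatic ? Choice.SEMARGL_RDFA11 : Choice.SEMARGL_RDFA10;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PROGRAMMATIC_KEY, "off");
        props.setProperty(LIBRDFA_KEY, "on");
        System.out.println(choose(props));
    }
}
```

With both properties set as in `main`, the function returns `LIBRDFA`, since the librdfa property takes precedence over the Semargl one.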
To change the property, you can either set the ANY23_OPTS environment variable or set the property through the Configuration class. Check the official documentation for more details.
...