Grobid Quantities

Grobid Quantities is a module of Grobid specialised in the recognition of any expressions of measurements (e.g. pressure, temperature, etc.) in textual documents such as PDF publications. It can be used on technical and scientific articles (text, XML and PDF input) and patents (text and XML input).
Measurements are parsed, normalised and converted into SI units.
To use its capabilities with Tika, one must install the server endpoint created for Grobid Quantities to extract measurement units from text passed to it.

Installing Grobid-quantities

The best approach is to run Grobid-quantities via Docker. Alternatively, install it by following the steps from GitHub, make sure the quantity model is trained as per the instructions provided, and then start the REST server.

TL;DR: The following command starts the grobid-quantities image on port 8060 (the default port for grobid-quantities):

docker run -t --rm --init -p 8060:8060 lfoppiano/grobid-quantities:${latest_grobid_quantities_version}

By default the server starts on port 8060 and can be reached at http://127.0.0.1:8060.
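The server may need some time before it answers requests, so a script that launches it can poll the address first. The following is a minimal sketch, not part of grobid-quantities or Tika: `wait_for_server` is a hypothetical helper, it assumes `curl` is installed, and it treats any HTTP response on the given URL as "up".

```shell
# Hypothetical helper: poll URL $1 up to $2 times (default 30), one second apart.
# Succeeds as soon as the server returns any HTTP response, whatever its status.
wait_for_server() {
    url="$1"
    tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        if curl -s -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage, assuming the default address mentioned above:
# wait_for_server http://127.0.0.1:8060 && echo "grobid-quantities is up"
```

The helper returns a non-zero status if the server never answers, so it can be chained with `&&` before the Tika command.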

Preparing resources for Grobid-quantities in Tika-App

Two files need to be created: tika-config.xml and GrobidServer.properties.

Create tika-config.xml

In order to use any of the NamedEntityParser implementations in Tika, the parser responsible for handling the name recognition task needs to be enabled.
This can be done by creating the tika-config.xml file, as follows:

No Format
 <?xml version="1.0" encoding="UTF-8"?>
 <properties>
     <parsers>
         <parser class="org.apache.tika.parser.ner.NamedEntityParser">
             <mime>text/plain</mime>
             <mime>text/html</mime>
             <mime>application/xhtml+xml</mime>
         </parser>
     </parsers>
 </properties>
 

This configuration is supplied to Tika later, through the --config option.

Create GrobidServer.properties

Tika needs to know on which host and port the grobid-quantities server is running. By default, Tika assumes the server runs on port 8060.
To use a different host or port, you must supply a GrobidServer.properties file. A sample file looks like the following:

No Format
grobid.server.url=http://localhost:8060
grobid.endpoint.text=/processQuantityText
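
As a quick sanity check, the two properties can be read back and joined into the URL that will presumably be called. This is a sketch under assumptions: `get_prop` is a hypothetical helper, the file location follows the directory layout from the "In a nutshell" section, and the endpoint is assumed to be simply appended to the server URL.

```shell
# Hypothetical helper: print the value of key $1 from properties file $2.
get_prop() {
    grep "^$1=" "$2" | head -n 1 | cut -d= -f2-
}

# Assumed location, matching the directory layout used on this page.
PROPS="${PROPS:-$HOME/GrobidQuantitiesRest-resources/org/apache/tika/parser/ner/grobid/GrobidServer.properties}"

# Print server URL and endpoint joined together, if the file exists.
if [ -f "$PROPS" ]; then
    echo "$(get_prop grobid.server.url "$PROPS")$(get_prop grobid.endpoint.text "$PROPS")"
fi
```

If the printed URL does not point at the running server, fix the properties file before starting Tika.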
 


In a nutshell

No Format
 # Create a directory for keeping the config and properties files.
 export GROBID_QUANTITIES_RES="$HOME/GrobidQuantitiesRest-resources"
 mkdir -p "$GROBID_QUANTITIES_RES"
 cd "$GROBID_QUANTITIES_RES"
 # tika-config.xml must be stored in this directory
 pwd

 # The path mirrors the Java package of GrobidNERecogniser, so the file is found via the classpath
 export PATH_PREFIX="$GROBID_QUANTITIES_RES/org/apache/tika/parser/ner/grobid"
 mkdir -p "$PATH_PREFIX"
 # Create and edit the properties file
 vim "$PATH_PREFIX/GrobidServer.properties"
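
For a scripted setup, both files can also be written without opening an editor. The sketch below reuses the file contents shown earlier on this page. The `${VAR:-default}` fallback is an addition for convenience, and the port should be adjusted if the server runs elsewhere.

```shell
# Reuse the directory from above, or fall back to the same default.
GROBID_QUANTITIES_RES="${GROBID_QUANTITIES_RES:-$HOME/GrobidQuantitiesRest-resources}"
PATH_PREFIX="$GROBID_QUANTITIES_RES/org/apache/tika/parser/ner/grobid"
mkdir -p "$PATH_PREFIX"

# Write tika-config.xml (same content as shown earlier on this page).
cat > "$GROBID_QUANTITIES_RES/tika-config.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.ner.NamedEntityParser">
            <mime>text/plain</mime>
            <mime>text/html</mime>
            <mime>application/xhtml+xml</mime>
        </parser>
    </parsers>
</properties>
EOF

# Write GrobidServer.properties (same content as shown earlier on this page).
cat > "$PATH_PREFIX/GrobidServer.properties" <<'EOF'
grobid.server.url=http://localhost:8060
grobid.endpoint.text=/processQuantityText
EOF
```

The quoted `'EOF'` delimiter keeps the shell from expanding anything inside the file bodies.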
 



Running Grobid Quantities with Tika

No Format

export TIKA_APP={your/path/to/tika-app}/target/tika-app-1.13-SNAPSHOT.jar

#set the system property to use GrobidNERecogniser class
java -Dner.impl.class=org.apache.tika.parser.ner.grobid.GrobidNERecogniser -classpath $GROBID_QUANTITIES_RES:$TIKA_APP org.apache.tika.cli.TikaCLI --config=$GROBID_QUANTITIES_RES/tika-config.xml -m  https://en.wikipedia.org/wiki/Time

...