The GrobidJournalParser uses GROBID (or Grobid, short for GeneRation Of BIbliographic Data), a machine learning framework, to parse PDF documents and extract structured information such as title, abstract, authors, affiliations, and keywords from journal publications. The parser has been integrated into Tika. Follow this guide to get it working on your system.

Installing GROBID

The recommended approach is to run Grobid via Docker.
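Once a GROBID container is running, it can help to confirm that the service is reachable before configuring Tika. The following is a minimal sketch, not part of the original instructions. It assumes the GROBID REST API is exposed at http://localhost:8070 and answers on the /api/isalive path. Both values are assumptions and should be adjusted to your deployment. The function name wait_for_grobid is hypothetical.

```shell
# Poll a GROBID service until it responds or the retry budget runs out.
# Assumptions: the base URL and the /api/isalive path; change them to match your setup.
wait_for_grobid() {
    base_url="${1:-http://localhost:8070}"
    retries="${2:-10}"
    i=0
    while [ "$i" -lt "$retries" ]; do
        # -s: silent, -f: treat HTTP error codes as failures
        if curl -sf "$base_url/api/isalive" > /dev/null; then
            echo "GROBID is up at $base_url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "GROBID did not respond at $base_url" >&2
    return 1
}
```

Calling `wait_for_grobid` with no arguments checks the assumed default URL up to ten times, one second apart.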

...

cd $HOME && git clone https://github.com/chrismattmann/grobidparser-resources.git
Then modify the file grobidparser-resources/org/apache/tika/parser/journal/GrobidExtractor.properties to match your Grobid installation.
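The edit to the properties file can also be scripted. The sketch below is one possible way to do it, under two assumptions that this page does not state: that the parser reads the server address from a key named grobid.server.url, and that your Grobid service listens at http://localhost:8070. Check the file in your checkout before relying on either. The helper name set_grobid_url is hypothetical.

```shell
# Set the Grobid server URL in a Java properties file.
# Assumption: the key is named grobid.server.url; verify this in your copy of the file.
set_grobid_url() {
    props_file="$1"   # path to GrobidExtractor.properties
    url="$2"          # e.g. http://localhost:8070 (assumed port)
    if [ -f "$props_file" ]; then
        # Drop any existing entry for the key; grep exits 1 if nothing is left, hence || true.
        grep -v '^grobid\.server\.url=' "$props_file" > "$props_file.tmp" || true
        mv "$props_file.tmp" "$props_file"
    fi
    # Append the new entry.
    printf 'grobid.server.url=%s\n' "$url" >> "$props_file"
}
```

For example: `set_grobid_url "$HOME/grobidparser-resources/org/apache/tika/parser/journal/GrobidExtractor.properties" http://localhost:8070`. Other keys in the file are left untouched.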

Both tika-server and tika-parser-nlp are required for calling Grobid.

Running Grobid with Tika Server

...

java -cp grobidparser-resources/:tika-server-standard-2.8.0.jar:tika-parser-nlp-package-2.8.0.jar org.apache.tika.server.core.TikaServerCli --config grobidparser-resources/tika-config.xml
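With the server started, a PDF can be submitted to it over HTTP. The following is a minimal sketch, assuming Tika Server is listening on its default port 9998 and that the /rmeta endpoint is used to return metadata as JSON. The helper name tika_rmeta and the file name paper.pdf are hypothetical.

```shell
# Send a PDF to a running Tika Server and print the extracted metadata as JSON.
# Assumption: the server runs at http://localhost:9998 unless a second argument is given.
tika_rmeta() {
    pdf="$1"
    server="${2:-http://localhost:9998}"
    # -T uploads the file with an HTTP PUT; -s suppresses the progress meter
    curl -s -T "$pdf" -H "Content-Type: application/pdf" "$server/rmeta"
}
```

Running `tika_rmeta paper.pdf` prints the JSON response. If the Grobid parser is active and the Grobid service is reachable, the response should contain additional fields derived from the publication header.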

...

java -cp grobidparser-resources/:tika-app-2.8.0.jar:tika-parser-nlp-package-2.8.0.jar org.apache.tika.cli.TikaCLI --config=grobidparser-resources/tika-config.xml -J PATH_TO_YOUR_PDF_FILE

...
