...

  • Modularization – We've modularized tika-server:
    • tika-server-core includes all of the functionality of tika-server, but with no bundled parsers.  Users might want this if they are only parsing a few file formats or want to use only their custom parsers.
    • tika-server-standard is what most people will want to use.  As with the tika-parsers-standard module, this includes most of the common file format parsers. If needed, users may also add the tika-parser-scientific-package and tika-parser-sqlite3-package to the classpath.  In tika-server 1.x, the first was included by default, and the second was included only if users added xerial's sqlite3 jar to the classpath.
  • --spawnChild mode is now the default.  In Tika 1.x, users had to specify this on the command line to force tika-server to fork a process that did the actual parsing.  This mode is far more robust against timeouts, OOMs, crashes and other mishaps; the forking process monitors the forked process and will restart it on timeouts, etc. NOTE: Client code needs to be able to handle the times when tika-server is restarting and is not available; this typically only takes a few seconds.  To disable this mode, use --noFork on the command line.
  • Configuring tika-server in Tika 2.x – See below.  We've moved most configuration options into tika-config.xml and dramatically limited the command-line options.
  • The package name for TikaServerCli has changed slightly; the class is now org.apache.tika.server.core.TikaServerCli. If adding optional jars to the classpath in, say, a bin/ directory, start tika-server with: java -cp "bin/*" org.apache.tika.server.core.TikaServerCli -c tika-config.xml
  • enableFileUrl – We have removed this capability from tika-server in 2.x and replaced it with the FileSystemFetcher, which is available in tika-core.  See FetchersInClassicServerEndpoints.
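
Because tika-server may be briefly unavailable while the forked process restarts, client code usually wants some form of retry. The following is a minimal Python sketch of one way to do that, not an official client. The helper names (call_with_retry, parse_with_tika), the retry count, the delay, and the URL http://localhost:9998/tika are assumptions chosen for illustration; treating HTTP 5xx responses as retryable is also an assumption about how a restarting server may respond.

```python
import time
import urllib.error
import urllib.request


def call_with_retry(send, attempts=5, delay=2.0, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    Connection-level failures (refused, reset, timed out) are treated as
    "the server may be restarting" and retried; HTTP 4xx responses are not.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as e:
            # The server answered. Retrying 5xx is an assumption, not documented behavior.
            if e.code < 500 or attempt == attempts:
                raise
        except OSError:
            # URLError, ConnectionRefusedError and socket timeouts are OSError subclasses.
            if attempt == attempts:
                raise
        sleep(delay)


def parse_with_tika(data, url="http://localhost:9998/tika", timeout=60):
    """PUT raw bytes to a tika-server endpoint (URL is an assumption) and return the body."""
    def send():
        req = urllib.request.Request(url, data=data, method="PUT")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    return call_with_retry(send)
```

A fixed delay of a couple of seconds matches the "few seconds" restart window mentioned above; clients with heavier traffic might prefer a growing delay instead.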

...

As with other components, Tika 2.x moves configuration into tika-config.xml.  Only a few command-line options remain (to see them, run: java -jar tika-server-standard-2.x.x.jar --help).  Please note that any value given on the command line overrides its counterpart in the XML configuration file.

  • -h, --host – hostname
  • -p, --port – which port to bind to.  Can specify a range, e.g. 9990-9999, and Tika will launch one server in a forked process on each of those ports (10 servers in this example). Can also specify a comma-delimited list, e.g. 9996,9998,9999.
  • -?, --help – show the available options.
  • -c, --config – specify the tika-config.xml file to use for this tika-server and its forked processes.
  • -i, --id – specify the id for this server.  This is used in logging and in the /status endpoint.
  • --noFork – run tika-server in legacy mode without forking a process.
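
To make the --port arithmetic concrete, here is a small Python sketch of how such a specification expands into individual ports. It only illustrates the syntax described above and is not Tika's actual parser; the function name expand_ports is hypothetical, and whether tika-server accepts a range and a list mixed in one value is not stated here.

```python
def expand_ports(spec):
    """Expand a port spec such as "9990-9999" or "9996,9998,9999" into a list of ints."""
    ports = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid port range: {part}")
            ports.extend(range(start, end + 1))  # a range includes both ends
        else:
            ports.append(int(part))
    return ports


print(len(expand_ports("9990-9999")))   # 10
print(expand_ports("9996,9998,9999"))   # [9996, 9998, 9999]
```

The range 9990-9999 is inclusive at both ends, which is why it yields 10 ports rather than 9.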

...