...

Loading the Wikinews Corpus

The Wikinews dumps can be downloaded from Wikimedia. General information about the dumps is available here:
http://meta.wikimedia.org/wiki/Data_dumps

The dump itself can be downloaded here: http://dumps.wikimedia.org/

Choose a mirror near you and then go to "Database backup dumps". The file you need to download has
a name like this (the date will be different): enwikinews-20120727-pages-articles.xml.bz2
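
The naming pattern above can be sketched as a pair of small shell helpers. This is only an illustration: the function names dump_filename and dump_url are hypothetical, and the URL layout (wiki name, then date, then file name) is an assumption about the main download site that may not hold for every mirror or for dumps that are no longer hosted.

```shell
#!/bin/sh
# Hypothetical helpers; these names are not part of any Wikimedia tooling.

# Build the dump file name for a wiki (e.g. enwikinews) and a dump date (YYYYMMDD).
dump_filename() {
    printf '%s-%s-pages-articles.xml.bz2\n' "$1" "$2"
}

# Build a download URL, assuming the <wiki>/<date>/<file> layout on the main site.
dump_url() {
    printf 'https://dumps.wikimedia.org/%s/%s/%s\n' "$1" "$2" "$(dump_filename "$1" "$2")"
}

# Example usage (not run here; the date must match a dump that is still hosted):
# curl -L -O "$(dump_url enwikinews 20120727)"
# bzip2 -t "$(dump_filename enwikinews 20120727)"   # checks the archive for corruption
```

Running bzip2 -t on the downloaded file before parsing is a cheap way to catch a truncated download.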

The current version of the parser works well only for the English Wikinews dump. Contributions to fix this for other
languages are very welcome.

Check out the Wikinews parser:
svn co https://svn.apache.org/repos/asf/opennlp/sandbox/wikinews-importer/

And compile it with this command: mvn clean install

Opening an article in the Cas Editor

...