...
Choose a mirror near you and then go to "Database backup dumps". The file you need to download has a name
like this (the date will differ): enwikinews-20120727-pages-articles.xml.bz2
After the download, decompress the file, e.g. with bunzip2 on Linux.
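As a minimal sketch of the decompression step, the helper below checks that the dump exists before unpacking it. The function name decompress_dump is hypothetical and not part of the importer; the -k option asks bunzip2 to keep the downloaded archive next to the extracted xml file.

```shell
# Hypothetical helper (not part of the importer): unpack a dump, keep the archive.
decompress_dump() {
  dump="$1"
  if [ ! -f "$dump" ]; then
    echo "dump not found: $dump" >&2
    return 1
  fi
  # -k keeps the .bz2 file; the extracted file has the same name without .bz2
  bunzip2 -k "$dump"
}

# Usage (the date in the file name will differ):
# decompress_dump enwikinews-20120727-pages-articles.xml.bz2
```

Keeping the archive is optional; it only saves a second download if the extracted file is damaged or deleted.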
The current version of the parser only works well for the English wikinews dump. Contributions to fix this for other
languages are very welcome.
...
And compile it with this command: mvn clean install
The xml file can now be parsed:
bin/converter /home/blue/Downloads/enwikinews-20120727-pages-articles.xml articles
This command will take a while to run. When it is done, there is one xmi file for each
article in the articles folder.
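A quick way to check the result is to count the generated files. The sketch below assumes the converter writes files with an .xmi extension into the output folder; the function name count_xmi is hypothetical.

```shell
# Hypothetical helper: count the xmi files below a folder.
# Assumes the converter output uses the .xmi file extension.
count_xmi() {
  find "$1" -type f -name '*.xmi' | wc -l | tr -d ' '
}

# Usage:
# count_xmi articles
```

A count of zero suggests the converter did not finish or wrote to a different folder.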
To load the articles into the corpus server, a corpus must be created first.
This is done with the corpus-server-tools.
Check out the tools
svn co https://svn.apache.org/repos/asf/opennlp/sandbox/corpus-server-tools
and build them with mvn clean install.
Now the wikinews corpus can be created in the previously started Corpus Server:
bin/cs-tools CreateCorpus http://localhost:8080/rest/corpora enwikinews ../wikinews-importer/samples/TypeSystem.xml ../wikinews-importer/samples/wikinews.xml
And import the article files:
bin/cs-tools CASImporter http://localhost:8080/rest/corpora/enwikinews ../wikinews-importer/articles
Opening an article in the Cas Editor
...