Concise instructions to build, train, and run a simple Natural Language Parsing parts-of-speech (PoS) tagger program. The instructions are written for Unix but are easily adaptable to Windows. Unless otherwise specified, save downloads to $HOME/archives.
- Download and install Java.
- Download and install Maven.
- Download OpenNLP.
- Download a PoS Treebank training set into $HOME/archives/pos. Available sets and their cost:
  - ERG from DELPH-IN - $0
  - Maxent from OpenNLP 1.5 - $0
  - Perceptron from OpenNLP 1.5 - $0
  - NLTK Files from NLTK - $0
  - CDT Files from Copenhagen Treebank - $0
  - Penn Treebank 3 from LDC - $3000
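Before continuing, it can help to confirm that the tools installed above are actually on the PATH. The following is a minimal sketch, not part of the original instructions; the function name missing_tools is hypothetical, and the list of tools is an assumption based on the commands used later on this page.

```shell
# Print the name of each listed command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of OpenNLP or Maven.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# Tools used by the steps on this page; prints nothing when all are present.
missing_tools java javac mvn tar
```

Any name printed by the last line needs to be installed (or added to the PATH) before the build step.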
- Create development, library, and data directories:
mkdir -p $HOME/dev/java/nlp/lib/
mkdir -p $HOME/dev/java/nlp/data/
- Change to development directory:
cd $HOME/dev/java/nlp/
- Extract files:
tar zxf $HOME/archives/apache-opennlp-*-incubating-src.tar.gz
- Rename directory:
mv apache-opennlp-*-incubating-src opennlp
- Build Java Archive (JAR) files (5 to 10 minutes, depending on machine and network speed):
cd opennlp/opennlp
mvn install > build.log
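Because the build output is redirected to build.log, nothing on screen indicates whether the build worked. The sketch below is one way to check, assuming Maven's usual summary line ("BUILD SUCCESS" in Maven 3, "BUILD SUCCESSFUL" in Maven 2) appears in the log; the function name build_ok is hypothetical.

```shell
# Return success if the given Maven log reports a successful build.
# The pattern "BUILD SUCCESS" also matches the older "BUILD SUCCESSFUL".
# build_ok is a hypothetical helper name.
build_ok() {
    grep -q "BUILD SUCCESS" "$1"
}

# Report the result only if a log exists in the current directory.
if [ -f build.log ]; then
    if build_ok build.log; then
        echo "build succeeded"
    else
        echo "build failed; last lines of build.log:"
        tail -n 20 build.log
    fi
fi
```

If the build failed, the tail of the log usually names the module and the error that stopped it.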
- Change to OpenNLP development directory:
cd $HOME/dev/java/nlp/opennlp/
- Move library files to library directory:
mv opennlp-uima/target/dependency/* ../lib/.
- Move training data to data directory:
mv $HOME/archives/pos/en-pos-maxent.bin $HOME/dev/java/nlp/data/.
- Change to development directory:
cd $HOME/dev/java/nlp/
- Copy HelloWorld source code to $HOME/dev/java/nlp/HelloWorld.java.
- Compile HelloWorld.java:
javac -cp $(echo lib/*.jar | tr ' ' ':') HelloWorld.java
- Run HelloWorld:
java -cp .:$(echo lib/*.jar | tr ' ' ':') HelloWorld data/en-pos-maxent.bin "Earlier today, we compiled a program."
Output:
Earlier => JJR @ 0.2182545923597446
today, => NN @ 0.666361706870189
we => PRP @ 0.8324059729613176
compiled => VBN @ 0.028125261823754893
a => DT @ 0.9145975161653905
program. => NN @ 0.8841759649076423
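The compile and run commands above build the classpath with $(echo lib/*.jar | tr ' ' ':'): the shell expands the glob into a space-separated list of JAR files, and tr replaces each space with a colon. The sketch below isolates that idiom so it can be tried on its own. The function name jar_classpath and the placeholder file names are hypothetical, and the approach assumes no path contains a space.

```shell
# Join every JAR in a directory into a colon-separated classpath string.
# jar_classpath is a hypothetical helper; it assumes paths without spaces.
jar_classpath() {
    echo "$1"/*.jar | tr ' ' ':'
}

# Demonstration with empty placeholder files in a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/lib"
touch "$demo/lib/a.jar" "$demo/lib/b.jar"
(cd "$demo" && jar_classpath lib)   # prints lib/a.jar:lib/b.jar
rm -rf "$demo"
```

On Java 6 and later, a quoted classpath wildcard such as -cp ".:lib/*" should give the same result without the shell expansion.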