...
To pass the test, the event hash and the model output must be identical.
Component | Model | Training Time 1.5.2 | Training Time 1.5.3 | Tester | Passed | Comment |
---|---|---|---|---|---|---|
Sentence Detector | en-sent.bin | | | Jörn | yes | |
Tokenizer | en-token.bin | | | Jörn | yes | |
POS Tagger | en-pos-maxent.bin | | | Jörn | yes | |
POS Tagger | en-pos-perceptron.bin | | | Jörn | yes | |
Parser | en-parser-chunking.bin | | | Jörn | yes | Tested on 10k sentences |
Note: Time was measured with the time command; the value reported is the "real" time.
...
Component | Data | Tester | Tagging Perf 1.5.2 | Tagging Perf 1.5.3 | Comment |
---|---|---|---|---|---|
Sentence Detector | | | | | |
Tokenizer | | | | | |
Name Finder | CONLL 2002 Dutch Person ned.testa | jkosin | Precision: 0.7552941176470588 | Precision: 0.7552941176470588 | |
Name Finder | CONLL 2002 Dutch Person ned.testb | jkosin | Precision: 0.8505025125628141 | Precision: 0.8505025125628141 | |
Name Finder | CONLL 2002 Dutch Organization ned.testa | jkosin | Precision: 0.8561872909698997 | Precision: 0.8561872909698997 | |
Name Finder | CONLL 2002 Dutch Organization ned.testb | jkosin | Precision: 0.7830374753451677 | Precision: 0.7830374753451677 | |
Name Finder | CONLL 2002 Dutch Location ned.testa | jkosin | Precision: 0.8458333333333333 | Precision: 0.8458333333333333 | |
Name Finder | CONLL 2002 Dutch Location ned.testb | jkosin | Precision: 0.8816326530612245 | Precision: 0.8816326530612245 | |
Name Finder | CONLL 2002 Dutch Misc ned.testa | jkosin | Precision: 0.8354114713216958 | Precision: 0.8354114713216958 | |
Name Finder | CONLL 2002 Dutch Misc ned.testb | jkosin | Precision: 0.8264984227129337 | Precision: 0.8264984227129337 | |
Name Finder | CONLL 2002 Combined ned.testa | jkosin | Precision: 0.6509695290858726 | Precision: 0.664424218440839 | 1000 iterations |
Name Finder | CONLL 2002 Dutch Combined ned.testb | jkosin | Precision: 0.6869929337869668 | Precision: 0.7006019366657943 | 1000 iterations |
Name Finder | CONLL 2002 Spanish Person esp.testa | jkosin | Precision: 0.9010695187165776 | Precision: 0.9010695187165776 | |
Name Finder | CONLL 2002 Spanish Person esp.testb | jkosin | Precision: 0.9195205479452054 | Precision: 0.9195205479452054 | |
Name Finder | CONLL 2002 Spanish Organization esp.testa | jkosin | Precision: 0.8288942695722357 | Precision: 0.8288942695722357 | |
Name Finder | CONLL 2002 Spanish Organization esp.testb | jkosin | Precision: 0.8036277602523659 | Precision: 0.8036277602523659 | |
Name Finder | CONLL 2002 Spanish Location esp.testa | jkosin | Precision: 0.7743016759776536 | Precision: 0.7743016759776536 | |
Name Finder | CONLL 2002 Spanish Location esp.testb | jkosin | Precision: 0.8301886792452831 | Precision: 0.8301886792452831 | |
Name Finder | CONLL 2002 Spanish Misc esp.testa | jkosin | Precision: 0.6492890995260664 | Precision: 0.6492890995260664 | |
Name Finder | CONLL 2002 Spanish Misc esp.testb | jkosin | Precision: 0.686046511627907 | Precision: 0.686046511627907 | |
Name Finder | CONLL 2002 Spanish Combined esp.testa | jkosin | Precision: 0.7005423249233671 | Precision: 0.7047866069323273 | 1000 iterations |
Name Finder | CONLL 2002 Spanish Combined esp.testb | jkosin | Precision: 0.756635931824532 | Precision: 0.7588711930706902 | 1000 iterations |
Name Finder | CONLL 2003 English Person eng.testa | jkosin | Precision: 0.9523195876288659 | Precision: 0.9523195876288659 | |
Name Finder | CONLL 2003 English Person eng.testb | jkosin | Precision: 0.9391727493917275 | Precision: 0.9391727493917275 | |
Name Finder | CONLL 2003 English Organization eng.testa | jkosin | Precision: 0.8768046198267565 | Precision: 0.8768046198267565 | |
Name Finder | CONLL 2003 English Organization eng.testb | jkosin | Precision: 0.8435980551053485 | Precision: 0.8435980551053485 | |
Name Finder | CONLL 2003 English Location eng.testa | jkosin | Precision: 0.9361421988150099 | Precision: 0.9361421988150099 | |
Name Finder | CONLL 2003 English Location eng.testb | jkosin | Precision: 0.9206349206349206 | Precision: 0.9206349206349206 | |
Name Finder | CONLL 2003 English Misc eng.testa | jkosin | Precision: 0.9027982326951399 | Precision: 0.9027982326951399 | |
Name Finder | CONLL 2003 English Misc eng.testb | jkosin | Precision: 0.8592436974789915 | Precision: 0.8592436974789915 | |
Name Finder | CONLL 2003 English Combined eng.testa | jkosin | Precision: 0.861812521618817 | Precision: 0.8640608785887236 | 1000 iterations |
Name Finder | CONLL 2003 English Combined eng.testb | jkosin | Precision: 0.8041311831853597 | Precision: 0.8064866823699945 | 1000 iterations |
Name Finder | CONLL 2003 German Person deu.testa | jkosin | Precision: 0.9132653061224489 | Precision: 0.9132653061224489 | |
Name Finder | CONLL 2003 German Person deu.testb | jkosin | Precision: 0.8732106339468303 | Precision: 0.8732106339468303 | |
Name Finder | CONLL 2003 German Organization deu.testa | jkosin | Precision: 0.8407224958949097 | Precision: 0.8407224958949097 | |
Name Finder | CONLL 2003 German Organization deu.testb | jkosin | Precision: 0.8014705882352942 | Precision: 0.8014705882352942 | |
Name Finder | CONLL 2003 German Location deu.testa | jkosin | Precision: 0.7816326530612245 | Precision: 0.7816326530612245 | |
Name Finder | CONLL 2003 German Location deu.testb | jkosin | Precision: 0.8033826638477801 | Precision: 0.8033826638477801 | |
Name Finder | CONLL 2003 German Misc deu.testa | jkosin | Precision: 0.7055555555555556 | Precision: 0.7055555555555556 | |
Name Finder | CONLL 2003 German Misc deu.testb | jkosin | Precision: 0.6601307189542484 | Precision: 0.6601307189542484 | |
Name Finder | CONLL 2003 German Combined deu.testa | jkosin | Precision: 0.7718859429714857 | Precision: 0.7783891945972986 | OPENNLP-417 |
Name Finder | CONLL 2003 German Combined deu.testb | jkosin | Precision: 0.7467566165023353 | Precision: 0.749351323300467 | OPENNLP-417 |
POS Tagger | CONLL 2006 Danish | Jörn / ? | Accuracy: 0.9511278195488722 | Accuracy: 0.9512987012987013 | Jörn: Same result as other tester |
POS Tagger | CONLL 2006 Dutch | Jörn | Accuracy: 0.9324977618621307 | Accuracy: 0.9324977618621307 | |
POS Tagger | CONLL 2006 Portuguese | Jörn / ? | Accuracy: 0.9659110277825124 | Accuracy: 0.9659110277825124 | Jörn: Same result as other tester |
POS Tagger | CONLL 2006 Swedish | Jörn | Accuracy: 0.9275106082036775 | Accuracy: 0.9275106082036775 | |
Chunker | CONLL 2000 | William | Precision: 0.9257575757575758 | Precision: 0.9257575757575758 | |
Sentence Detector | Arvores Deitadas | William | | Precision: 0.9891491491491492 | PERCEPTRON Cutoff 0 |
Tokenizer | Arvores Deitadas | William | | Precision: 0.9995231988260895 | PERCEPTRON Cutoff 0 |
Chunker | Arvores Deitadas | William | Precision: 0.9404684925220583 | Precision: 0.9562405864042575 | OPENNLP-541, OPENNLP-423 |
...
Analysis Engine | Tester | Passed | Comment |
---|---|---|---|
Sentence Detector | | | |
Sentence Detector Trainer | | | |
Tokenizer ME | | | |
Tokenizer Trainer | | | |
Name Finder | | | |
Name Finder Trainer | | | |
Chunker | | | |
Chunker Trainer | | | |
POS Tagger | | | |
POS Tagger Trainer | | | |
Parser | | | |
createPear.sh | Jörn | yes | |
Sample PEAR | Jörn | yes | |
Distribution Review
Please ensure that the files listed below are included in the distributions
and are in good condition.
Package | File or Test | Tester | Passed | Comment | |
---|---|---|---|---|---|
Binary | LICENSE | Jörn | Yes | AL 2.0 and BSD for JWNL | |
Binary | NOTICE | Jörn | Yes | standard notice, dates are correct. JWNL is mentioned | |
Binary | README | Jörn | Yes | File was reviewed on the dev list. | |
Binary | RELEASE_NOTES.html | Jörn | Yes | issue list is generated correctly | |
Binary | Test signatures: .md5, .sha1, .asc | Jörn | Yes rc4 | tested for rc3 | |
Binary | JIRA issue list created | William | No | Yes | Minor issue: the project.version was not filled. The list is empty |
Binary | Contains maxent, tools, uima and jwnl jars | Jörn | Yes | | |
Source | LICENSE | Jörn | Yes | standard AL 2.0 file | |
Source | NOTICE | Jörn | Yes | standard notice, dates are correct | |
Source | Test signatures: .md5, .sha1, .asc | Jörn | rc1 | tested for rc3 | |
Source | Can build from source? | Jörn | Yes | Test should be done without jwnl and opennlp in local m2 repo. |
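The checksum part of the signature tests can be scripted. The sketch below is only an illustration under stated assumptions: it assumes GNU coreutils (`sha1sum`) is available and that the first whitespace-separated field of the `.sha1` file is the hex digest, which may not hold for every artifact. `verify_sha1` is a hypothetical helper name, not part of the release tooling.

```shell
# Hypothetical helper: check FILE against its FILE.sha1 companion.
# Assumption: the first field of FILE.sha1 is the lowercase hex digest.
verify_sha1() {
    file="$1"
    # Expected digest as published next to the artifact
    expected=$(awk '{print $1; exit}' "$file.sha1" 2>/dev/null)
    # Digest computed locally; stays empty if the file cannot be read
    actual=$(sha1sum "$file" 2>/dev/null | awk '{print $1}')
    # Succeed only if a digest was computed and it matches the published one
    [ -n "$actual" ] && [ "$expected" = "$actual" ]
}
```

Called as `verify_sha1 path/to/artifact`, the function returns 0 on a match and non-zero otherwise. The same pattern should work for `.md5` files with `md5sum`, and a detached `.asc` signature is normally checked with `gpg --verify artifact.asc artifact`.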
Notes about testing
Compatibility tests
The following commands can be used to reproduce the compatibility tests with the Leipzig corpus.
Code Block |
---|
# Corpus preparation: the following command will create documents from the corpus. Sed is used to remove the language prefix
sh bin/opennlp DoccatConverter leipzig -data ../eng_news_2010_300K-text/eng_news_2010_300K-sentences.txt -encoding UTF-8 -lang en | sed -E 's/^en[[:space:]]//g' > ../out-tokenized-documents.test
# Corpus preparation: this forces the detokenization of the documents
sh bin/opennlp SentenceDetectorConverter namefinder -data ../out-tokenized-documents.test -encoding UTF-8 -detokenizer trunk/opennlp-tools/lang/en/tokenizer/en-detokenizer.xml > ../out-documents.test
# Now the actual tests. Execute them for the previous release and for the current RC, then compare the outputs using diff:
time sh bin/opennlp SentenceDetector ../models/en-sent.bin < ../out-documents.test > ../out-sentences_1.5.2.test
time sh bin/opennlp TokenizerME ../models/en-token.bin < ../out-sentences_1.5.2.test > ../out-toks_1.5.2.test
time sh bin/opennlp TokenNameFinder ../models/en-ner-person.bin < ../out-toks_1.5.2.test > ../out-ner_1.5.2.test
time sh bin/opennlp POSTagger ../models/en-pos-maxent.bin < ../out-toks_1.5.2.test > ../out-pos_maxent_1.5.2.test
time sh bin/opennlp POSTagger ../models/en-pos-perceptron.bin < ../out-toks_1.5.2.test > ../out-pos_pers_1.5.2.test
time sh bin/opennlp ChunkerME ../models/en-chunker.bin < ../out-pos_pers_1.5.2.test > ../out-chk_1.5.2.test
time sh bin/opennlp Parser ../models/en-parser-chunking.bin < ../out-toks_1.5.2.test > ../out-parse_1.5.2.test
|
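The comparison step mentioned in the comments above could be scripted roughly as follows. This is a sketch, not part of the original test procedure: it assumes the RC run wrote its results with a `_1.5.3` suffix next to the `_1.5.2` files (the commands above only show the `_1.5.2` names), and `compare_outputs` is a hypothetical helper name.

```shell
# Hypothetical helper: compare each 1.5.2 output with its RC counterpart.
# Assumption: RC results use the same file names with a _1.5.3 suffix.
compare_outputs() {
    dir="$1"
    status=0
    for stem in sentences toks ner pos_maxent pos_pers chk parse; do
        old="$dir/out-${stem}_1.5.2.test"
        new="$dir/out-${stem}_1.5.3.test"
        # cmp -s is silent and exits non-zero if the files differ or one is missing
        if cmp -s "$old" "$new"; then
            echo "identical: $stem"
        else
            echo "DIFFERENT: $stem"
            status=1
        fi
    done
    return "$status"
}
```

With the directory layout used above, `compare_outputs ..` prints one line per component and returns non-zero if any pair differs. Running `diff` on a pair reported as different then shows the actual changes.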