...
To pass the test, the event hash and the model output must be identical in 1.5.2 and 1.5.3.
Component | Model | Training Time 1.5.2 | Training Time 1.5.3 | Tester | Passed | Comment |
---|---|---|---|---|---|---|
Sentence Detector | en-sent.bin | | | Jörn | yes | |
Tokenizer | en-token.bin | | | Jörn | yes | |
POS Tagger | en-pos-maxent.bin | | | Jörn | yes | |
POS Tagger | en-pos-perceptron.bin | | | Jörn | yes | |
Parser | en-parser-chunking.bin | | | Jörn | yes | Tested on 10k sentences |
Note: Time was measured with the time command, the value is the "real" time value.
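As an illustration of the note above, the following sketch captures only the "real" value reported by `time`. It assumes bash, whose `TIMEFORMAT` variable controls what the `time` keyword prints. The helper name `real_time` and the stand-in `sleep` command are hypothetical; they are not part of the test plan.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash: print only the "real" (wall-clock) time of a command.
# TIMEFORMAT controls the output of the bash `time` keyword; %R is the
# elapsed real time in seconds.
real_time() {
  local TIMEFORMAT='%R'
  # `time` reports on the stderr of the enclosing group, so redirect the group;
  # the command's own stdout and stderr are discarded here.
  { time "$@" >/dev/null 2>&1; } 2>&1
}

# Stand-in command (hypothetical); replace it with the actual training call.
elapsed=$(real_time sleep 1)
echo "real: ${elapsed}s"
```

Recording only the real value keeps the two Training Time columns comparable between releases.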
...
Component | Data | Tester | Tagging Perf 1.5.2 | Tagging Perf 1.5.3 | Comment |
---|---|---|---|---|---|
Sentence Detector | | | | | |
Tokenizer | | | | | |
Name Finder | CONLL 2002 Dutch Person ned.testa | jkosin | Precision: 0.7552941176470588 | Precision: 0.7552941176470588 | |
Name Finder | CONLL 2002 Dutch Person ned.testb | jkosin | Precision: 0.8505025125628141 | Precision: 0.8505025125628141 | |
Name Finder | CONLL 2002 Dutch Organization ned.testa | jkosin | Precision: 0.8561872909698997 | Precision: 0.8561872909698997 | |
Name Finder | CONLL 2002 Dutch Organization ned.testb | jkosin | Precision: 0.7830374753451677 | Precision: 0.7830374753451677 | |
Name Finder | CONLL 2002 Dutch Location ned.testa | jkosin | Precision: 0.8458333333333333 | Precision: 0.8458333333333333 | |
Name Finder | CONLL 2002 Dutch Location ned.testb | jkosin | Precision: 0.8816326530612245 | Precision: 0.8816326530612245 | |
Name Finder | CONLL 2002 Dutch Misc ned.testa | jkosin | Precision: 0.8354114713216958 | Precision: 0.8354114713216958 | |
Name Finder | CONLL 2002 Dutch Misc ned.testb | jkosin | Precision: 0.8264984227129337 | Precision: 0.8264984227129337 | |
Name Finder | CONLL 2002 Dutch Combined ned.testa | jkosin | Precision: 0.6509695290858726 | Precision: 0.664424218440839 | 1000 iterations |
Name Finder | CONLL 2002 Dutch Combined ned.testb | jkosin | Precision: 0.6869929337869668 | Precision: 0.7006019366657943 | 1000 iterations |
Name Finder | CONLL 2002 Spanish Person esp.testa | jkosin | Precision: 0.9010695187165776 | Precision: 0.9010695187165776 | |
Name Finder | CONLL 2002 Spanish Person esp.testb | jkosin | Precision: 0.9195205479452054 | Precision: 0.9195205479452054 | |
Name Finder | CONLL 2002 Spanish Organization esp.testa | jkosin | Precision: 0.8288942695722357 | Precision: 0.8288942695722357 | |
Name Finder | CONLL 2002 Spanish Organization esp.testb | jkosin | Precision: 0.8036277602523659 | Precision: 0.8036277602523659 | |
Name Finder | CONLL 2002 Spanish Location esp.testa | jkosin | Precision: 0.7743016759776536 | Precision: 0.7743016759776536 | |
Name Finder | CONLL 2002 Spanish Location esp.testb | jkosin | Precision: 0.8301886792452831 | Precision: 0.8301886792452831 | |
Name Finder | CONLL 2002 Spanish Misc esp.testa | jkosin | Precision: 0.6492890995260664 | Precision: 0.6492890995260664 | |
Name Finder | CONLL 2002 Spanish Misc esp.testb | jkosin | Precision: 0.686046511627907 | Precision: 0.686046511627907 | |
Name Finder | CONLL 2002 Spanish Combined esp.testa | jkosin | Precision: 0.7005423249233671 | Precision: 0.7047866069323273 | 1000 iterations |
Name Finder | CONLL 2002 Spanish Combined esp.testb | jkosin | Precision: 0.756635931824532 | Precision: 0.7588711930706902 | 1000 iterations |
Name Finder | CONLL 2003 English Person eng.testa | jkosin | Precision: 0.9523195876288659 | Precision: 0.9523195876288659 | |
Name Finder | CONLL 2003 English Person eng.testb | jkosin | Precision: 0.9391727493917275 | Precision: 0.9391727493917275 | |
Name Finder | CONLL 2003 English Organization eng.testa | jkosin | Precision: 0.8768046198267565 | Precision: 0.8768046198267565 | |
Name Finder | CONLL 2003 English Organization eng.testb | jkosin | Precision: 0.8435980551053485 | Precision: 0.8435980551053485 | |
Name Finder | CONLL 2003 English Location eng.testa | jkosin | Precision: 0.9361421988150099 | Precision: 0.9361421988150099 | |
Name Finder | CONLL 2003 English Location eng.testb | jkosin | Precision: 0.9206349206349206 | Precision: 0.9206349206349206 | |
Name Finder | CONLL 2003 English Misc eng.testa | jkosin | Precision: 0.9027982326951399 | Precision: 0.9027982326951399 | |
Name Finder | CONLL 2003 English Misc eng.testb | jkosin | Precision: 0.8592436974789915 | Precision: 0.8592436974789915 | |
Name Finder | CONLL 2003 English Combined eng.testa | jkosin | Precision: 0.861812521618817 | Precision: 0.8640608785887236 | 1000 iterations |
Name Finder | CONLL 2003 English Combined eng.testb | jkosin | Precision: 0.8041311831853597 | Precision: 0.8064866823699945 | 1000 iterations |
Name Finder | CONLL 2003 German Person deu.testa | jkosin | Precision: 0.9132653061224489 | Precision: 0.9132653061224489 | |
Name Finder | CONLL 2003 German Person deu.testb | jkosin | Precision: 0.8732106339468303 | Precision: 0.8732106339468303 | |
Name Finder | CONLL 2003 German Organization deu.testa | jkosin | Precision: 0.8407224958949097 | Precision: 0.8407224958949097 | |
Name Finder | CONLL 2003 German Organization deu.testb | jkosin | Precision: 0.8014705882352942 | Precision: 0.8014705882352942 | |
Name Finder | CONLL 2003 German Location deu.testa | jkosin | Precision: 0.7816326530612245 | Precision: 0.7816326530612245 | |
Name Finder | CONLL 2003 German Location deu.testb | jkosin | Precision: 0.8033826638477801 | Precision: 0.8033826638477801 | |
Name Finder | CONLL 2003 German Misc deu.testa | jkosin | Precision: 0.7055555555555556 | Precision: 0.7055555555555556 | |
Name Finder | CONLL 2003 German Misc deu.testb | jkosin | Precision: 0.6601307189542484 | Precision: 0.6601307189542484 | |
Name Finder | CONLL 2003 German Combined deu.testa | jkosin | Precision: 0.7718859429714857 | Precision: 0.7783891945972986 | OPENNLP-417 |
Name Finder | CONLL 2003 German Combined deu.testb | jkosin | Precision: 0.7467566165023353 | Precision: 0.749351323300467 | OPENNLP-417 |
POS Tagger | CONLL 2006 Danish | Jörn / ? | Accuracy: 0.9511278195488722 | Accuracy: 0.9512987012987013 | Jörn: Same result as other tester |
POS Tagger | CONLL 2006 Dutch | Jörn | Accuracy: 0.9324977618621307 | Accuracy: 0.9324977618621307 | |
POS Tagger | CONLL 2006 Portuguese | Jörn / ? | Accuracy: 0.9659110277825124 | Accuracy: 0.9659110277825124 | Jörn: Same result as other tester |
POS Tagger | CONLL 2006 Swedish | Jörn | Accuracy: 0.9275106082036775 | Accuracy: 0.9275106082036775 | |
Chunker | CONLL 2000 | William | Precision: 0.9257575757575758 | Precision: 0.9257575757575758 | |
Sentence Detector | Arvores Deitadas | William | | Precision: 0.9891491491491492 | PERCEPTRON Cutoff 0 |
Tokenizer | Arvores Deitadas | William | | Precision: 0.9995231988260895 | PERCEPTRON Cutoff 0 |
Chunker | Arvores Deitadas | William | Precision: 0.9404684925220583 | Precision: 0.9562405864042575 | OPENNLP-541, OPENNLP-423 |
...
Analysis Engine | Tester | Passed | Comment |
---|---|---|---|
Sentence Detector | | | |
Sentence Detector Trainer | | | |
Tokenizer ME | | | |
Tokenizer Trainer | | | |
Name Finder | | | |
Name Finder Trainer | | | |
Chunker | | | |
Chunker Trainer | | | |
POS Tagger | | | |
POS Tagger Trainer | | | |
Parser | | | |
createPear.sh | Jörn | yes | |
Sample PEAR | Jörn | yes | |
Distribution Review
Please ensure that the files listed below are included in the distributions
and are in a good state.
Package | File or Test | Tester | Passed | Comment | |
---|---|---|---|---|---|
Binary | LICENSE | Jörn | Yes | AL 2.0 and BSD for JWNL | |
Binary | NOTICE | Jörn | Yes | standard notice, dates are correct. JWNL is mentioned | |
Binary | README | Jörn | Yes | File was reviewed on the dev list. | |
Binary | RELEASE_NOTES.html | Jörn | Yes | issue list is generated correctly | |
Binary | Test signatures: .md5, .sha1, .asc | Jörn | Yes rc4 | tested for rc3 | |
Binary | JIRA issue list created | William | No | Yes | Minor issue: the project.version was not filled. The list is empty |
Binary | Contains maxent, tools, uima and jwnl jars | Jörn | Yes | | |
Source | LICENSE | Jörn | Yes | standard AL 2.0 file | |
Source | NOTICE | Jörn | Yes | standard notice, dates are correct | |
Source | Test signatures: .md5, .sha1, .asc | Jörn | rc1 | tested for rc3 | |
Source | Can build from source? | Jörn | Yes | Test should be done without jwnl and opennlp in local m2 repo. |
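The "Test signatures" rows above could be checked with a small script along these lines. This is a sketch under assumptions: it expects `md5sum`, `sha1sum` (GNU coreutils) and `gpg` to be installed, and it assumes each `.md5`/`.sha1` file holds the hex digest as its first whitespace-separated field. The function names and the artifact name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check a release artifact against its .md5/.sha1 files and its signature.
# Assumption: the digest file starts with the hex digest as its first field.

# verify_digest <artifact> <md5|sha1>: succeeds if the stored and computed digests match
verify_digest() {
  local artifact=$1 algo=$2 expected actual
  expected=$(awk '{print tolower($1); exit}' "$artifact.$algo")
  actual=$("${algo}sum" "$artifact" | awk '{print $1}')
  [ "$expected" = "$actual" ]
}

# verify_release <artifact>: runs both digest checks, then the signature check
verify_release() {
  local artifact=$1
  verify_digest "$artifact" md5  || { echo "md5 mismatch: $artifact"; return 1; }
  verify_digest "$artifact" sha1 || { echo "sha1 mismatch: $artifact"; return 1; }
  gpg --verify "$artifact.asc" "$artifact" || { echo "bad signature: $artifact"; return 1; }
  echo "ok: $artifact"
}

# Usage (hypothetical file name): verify_release some-release-bin.tar.gz
```

The signature check only succeeds if the signer's public key has been imported into the local keyring beforehand.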
Notes about testing
Compatibility tests
The following commands can be used to reproduce the compatibility tests with the Leipzig corpus.
Code Block |
---|
# Corpus preparation: the following command creates documents from the corpus; sed removes the language prefix
sh bin/opennlp DoccatConverter leipzig -data ../eng_news_2010_300K-text/eng_news_2010_300K-sentences.txt -encoding UTF-8 -lang en | sed -E 's/^en[[:space:]]//g' > ../out-tokenized-documents.test
# Corpus preparation: this forces the detokenization of the documents
sh bin/opennlp SentenceDetectorConverter namefinder -data ../out-tokenized-documents.test -encoding UTF-8 -detokenizer trunk/opennlp-tools/lang/en/tokenizer/en-detokenizer.xml > ../out-documents.test
# Now the actual tests. Run them for the previous release and for the current RC, then compare the output using diff:
time sh bin/opennlp SentenceDetector ../models/en-sent.bin < ../out-documents.test > ../out-sentences_1.5.2.test
time sh bin/opennlp TokenizerME ../models/en-token.bin < ../out-sentences_1.5.2.test > ../out-toks_1.5.2.test
time sh bin/opennlp TokenNameFinder ../models/en-ner-person.bin < ../out-toks_1.5.2.test > ../out-ner_1.5.2.test
time sh bin/opennlp POSTagger ../models/en-pos-maxent.bin < ../out-toks_1.5.2.test > ../out-pos_maxent_1.5.2.test
time sh bin/opennlp POSTagger ../models/en-pos-perceptron.bin < ../out-toks_1.5.2.test > ../out-pos_pers_1.5.2.test
time sh bin/opennlp ChunkerME ../models/en-chunker.bin < ../out-pos_pers_1.5.2.test > ../out-chk_1.5.2.test
time sh bin/opennlp Parser ../models/en-parser-chunking.bin < ../out-toks_1.5.2.test > ../out-parse_1.5.2.test
|
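The final comparison step could be scripted roughly as follows. This is a sketch, not part of the original procedure: it assumes the RC run wrote files with the same names but a `_1.5.3` suffix (the commands above only show the `_1.5.2` names), and the function name `compare_outputs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: diff each output of the previous release against the RC output.
# Assumption: both runs used the out-<stage>_<version>.test naming scheme.

# compare_outputs <dir> <old-version> <new-version>
# Prints one line per stage; returns non-zero if any pair differs or is missing.
compare_outputs() {
  local dir=$1 old=$2 new=$3 status=0 stage
  for stage in sentences toks ner pos_maxent pos_pers chk parse; do
    if diff "$dir/out-${stage}_${old}.test" "$dir/out-${stage}_${new}.test" >/dev/null 2>&1; then
      echo "identical: $stage"
    else
      echo "DIFFERENT or missing: $stage"
      status=1
    fi
  done
  return "$status"
}

# Usage: compare_outputs .. 1.5.2 1.5.3
```

A non-zero exit status makes the check easy to use from a larger release script; dropping the redirection shows the actual differences for a failing stage.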