...
This page contains the test plan for the 1.5.2 release.
The 1.5.2 release does not introduce any changes to the feature
generation except for the name finder, which might generate different
token class features for words with special letters.
...
The 1.5.0 SourceForge models must be fully compatible with the 1.5.2
release. In this test all the English models are tested for compatibility
on the English 300k sentences Leipzig Corpus. A model passes when the
output produced with that model by both versions has the same MD5 hash.
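The hash comparison can be scripted. The following is a minimal sketch in Python, assuming the output of each version has already been written to a file. The function names are hypothetical and not part of OpenNLP.

```python
import hashlib


def md5_hex(data):
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(old_output_path, new_output_path):
    """True when both output files have the same MD5 hash."""
    return md5_of_file(old_output_path) == md5_of_file(new_output_path)
```

A mismatch only says that the two outputs differ. Finding the sentences that changed still needs a line-by-line diff of the two files.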
Component | Model | Perf 1.5.1 | Perf 1.5.2 | Tester | Passed | Comment |
---|---|---|---|---|---|---|
Sentence Detector | en-sent.bin | 42565.4 sent/s | 42186.7 sent/s | joern | no | It did not pass because of OPENNLP-202. |
Tokenizer | en-token.bin | 3059.5 sent/s | 3091.8 sent/s | 2300.4 sent/s | joern | yes |
| |
Name Finder | en-ner-person.bin | 290 614.7 4 sent/s s | 487 650.1 6 sent/s | joern | no | yes | output identical, measurement was done on a idle system, | |
POS Tagger | en-pos-maxent.bin | 721 732.3 1 sent/s | 816.1 9 sent/s | joern | yes |
| ||
POS Tagger | en-pos-perceptron.bin | 1097.7 sent/s | 1110.6 sent/s | joern | no | Perceptron normalization was changed. OPENNLP-155 might improve accuracy a little. |
Chunker | en-chunker.bin | 169 167,5 3 sent/s | 166.4 sent/s | colen joern | yes | computerB, tested with CONLL2000 (2012 sentences) | ||
Parser | en-parser-chunking.bin | 4.3 sent/s | 11.6 sent/s | joern | no | A very few sentences are parsed differently due to OPENNLP-233. |
Note: Test was done on MacBook Pro 13" 7.1, 2.66 GHz Core 2 Duo, 8GB Ram, 256GB SSD running OS X 10.6.6
and Java 1.6.0_26 64-Bit Server. The performance varies because lightweight tasks were running in the background while testing.
Note: computerB is a DualCore T8100, 4GB Ram, 250GB HD running Ubuntu 10.10 64-Bit and Java 1.6.0_20.
Note: "Concurrent" in the comment means that both tests were started at the same time.
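Because background load shifts the measured sent/s values, one way to get steadier numbers is to repeat a run and report the median. The sketch below is a hypothetical Python helper for that idea, not how the numbers in the table were produced. The `process` callable stands in for whatever component is being measured, and `sentences` must be a list so it can be iterated several times.

```python
import statistics
import time


def median_throughput(process, sentences, runs=3, clock=time.perf_counter):
    """Run `process` over all sentences `runs` times; return the median sent/s."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    rates = []
    for _ in range(runs):
        start = clock()
        for sentence in sentences:
            process(sentence)
        elapsed = clock() - start
        if elapsed <= 0:
            raise ValueError("elapsed time too small to measure")
        # Throughput of this run in sentences per second.
        rates.append(len(sentences) / elapsed)
    return statistics.median(rates)
```

The `clock` parameter exists only so that the timing source can be swapped out, for example with a fake clock in a unit test.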
...
Component | Model | Training Time 1.5.1 | Training Time 1.5.2 | Tester | Passed | Comment |
---|---|---|---|---|---|---|
Sentence Detector | en-sent.bin | 0m12.847s | 0m11.255s | joern | no | The new version is more accurate due to OPENNLP-202. |
Tokenizer | en-token.bin | 2m16 2m30.694s 115s | 1m35.115s 414s | joern | yes | Re-test tagging was very slow, only 250 sent/s |
| |||
POS Tagger Name Finder | en-nerpos-datemaxent.bin |
|
| joern | yes | Test is still done, because tagdict is not tested with public data | ||||
POS Tagger | no | OPENNLP-138 | Name Finder | en-nerpos-locationperceptron.bin |
|
| joern | no | OPENNLP-138 | Perceptron code was changed |
Parser Name Finder | en-nerparser-moneychunking.bin | 138m9.045s |
| joern | no | There are small differences due to OPENNLP-138 | ||||
Name Finder | en-ner-organization.bin | | | joern | no | OPENNLP-138 |
Name Finder | en-ner-percentage.bin | | | joern | no | OPENNLP-138 |
Name Finder | en-ner-person.bin | | | joern | no | OPENNLP-138 |
POS Tagger | en-pos-maxent.bin | | | joern | | |
POS Tagger | en-pos-perceptron.bin | | | joern | | |
Chunker | en-chunker.bin | | | joern | | |
Parser | en-parser-chunking.bin | 110m8.712s | 138m9.045s | joern | yes | |
Note: Time was measured with the time command, the value is the "real" time value.
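To compare two training times numerically, the "real" values first have to be converted to seconds. The following is a minimal sketch in Python, assuming the values use the `<minutes>m<seconds>s` form shown in the table (for example `0m12.847s`). The function name is hypothetical.

```python
import re

# Matches values such as "0m12.847s" or "110m8.712s".
_REAL_TIME = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def real_time_to_seconds(value):
    """Convert a `time` "real" value like "110m8.712s" to seconds."""
    match = _REAL_TIME.match(value.strip())
    if match is None:
        raise ValueError("unexpected time format: %r" % value)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)
```

Dividing the two converted values of a row then gives the relative change in training time between the two versions.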
Performance test with public data
Test the tagging performance with all the publicly available training
and test data for various languages.
It is assumed that the training will be done with a cutoff of 5 and 100 iterations,
if different values are used please write them into the comment.
Component | Data | Tester | Tagging Perf 1.5.1 | Tagging Perf 1.5.2 | Comment |
---|---|---|---|---|---|
Sentence Detector | | | | | |
Tokenizer | | | | | |
Name Finder | CONLL 2002 Dutch Person ned.testa | jkosin | Precision: 0.7906976744186046 | Precision: 0.7552941176470588 | Performance change due to OPENNLP-294 and more... |
Name Finder | CONLL 2002 Dutch Person ned.testb | jkosin | Precision: 0.8527980535279805 | Precision: 0.8505025125628141 | |
Name Finder | CONLL 2002 Dutch Organization ned.testa | jkosin | Precision: 0.8386075949367089 | Precision: 0.8561872909698997 | |
Name Finder | CONLL 2002 Dutch Organization ned.testb | jkosin | Precision: 0.7784200385356455 | Precision: 0.7830374753451677 | |
Name Finder | CONLL 2002 Dutch Location ned.testa | jkosin | Precision: 0.8362831858407079 | Precision: 0.8458333333333333 | |
Name Finder | CONLL 2002 Dutch Location ned.testb | jkosin | Precision: 0.854251012145749 | Precision: 0.8816326530612245 | |
Name Finder | CONLL 2002 Dutch Misc ned.testa | jkosin | Precision: 0.8300492610837439 | Precision: 0.8354114713216958 | |
Name Finder | CONLL 2002 Dutch Misc ned.testb | jkosin | Precision: 0.8373205741626795 | Precision: 0.8264984227129337 | |
Name Finder | CONLL 2002 Dutch Combined ned.testa | jkosin | Precision: 0.7906976744186046 | Precision: 0.6509695290858726 | 1000 iterations |
Name Finder | CONLL 2002 Dutch Combined ned.testb | jkosin | | | |
Component | Data | Tester | Tagging Perf 1.5.0 | Tagging Perf 1.5.1 | Comment | ||||||||
Sentence Detector |
|
|
|
| Will not be done in this release. | ||||||||
Tokenizer |
|
|
|
| We need a de-tokenizer dictionary first, will be done in next release. | ||||||||
Name Finder | CONLL 2002 Dutch Person ned.testa | joern |
| Precision: 0.7906976744186046 |
| ||||||||
Name Finder | CONLL 2002 Dutch Person ned.testb | joern |
| Precision: 0.8527980535279805 |
| Name Finder | CONLL 2002 Dutch Organization ned.testa | joern | .7302083333333333 | Precision: 0.8386075949367089 6869929337869668 | 1000 iterations | ||
Name Finder | CONLL 2002 Dutch Organization ned.testb | joern | Spanish Person esp.testa | jkosin | Precision: 0.7784200385356455 8982630272952854 |
| Name Finder | CONLL 2002 Dutch Location ned.testa | joern |
| 7140039447731755 | Precision: 0.8362831858407079 9010695187165776 |
|
Name Finder | CONLL 2002 Dutch Location nedSpanish Person esp.testb | joern | jkosin | Precision: 0.854251012145749 9008 |
| Name Finder | CONLL 2002 Dutch Misc ned.testa | joern |
| 8279411764705882 | Precision: 0.8300492610837439 9195205479452054 |
| |
Name Finder | CONLL 2002 Dutch Misc ned.testb | joern | Spanish Organization esp.testa | jkosin | Precision: 0.8373205741626795 8216258879242304 |
| Name Finder | CONLL 2002 Combined ned.testa | joern | .7017189079878665 | Precision: 0.7906976744186046 8288942695722357 |
| |
Name Finder | CONLL 2002 Dutch Combined nedSpanish Organization esp.testb | joern | jkosin | Precision: 0.8527980535279805 8009331259720062 |
| Name Finder | CONLL 2002 Spanish Person esp.testa | joern |
| 7669396872673119 | Precision: 0.8982630272952854 8036277602523659 |
| |
Name Finder | CONLL 2002 Spanish Person esp.testb | joern | Location esp.testa | jkosin | Precision: 0.7481789802289281 | Precision: 0.9008 7743016759776536 |
| ||||||
Name Finder | CONLL 2002 Spanish Organization Location esp.testa testb | joern jkosin |
| Precision: 0.8216258879242304 8226221079691517 |
| Name Finder | CONLL 2002 Spanish Organization esp.testb | joern | 6874328678839956 | Precision: 0.8009331259720062 8301886792452831 |
| ||
Name Finder | CONLL 2002 Spanish Location Misc esp.testa | joern | jkosin | Precision: 0.7481789802289281 6446886446886447 |
| Name Finder | CONLL 2002 Spanish Location esp.testb | joern |
| 49025069637883006 | Precision: 0.8226221079691517 6492890995260664 |
| |
Name Finder | CONLL 2002 Spanish Misc esp.testa testb | joern jkosin |
| Precision: 0.6446886446886447 6595744680851063 |
| Name Finder | CONLL 2002 Spanish Misc esp.testb | joern | 4705882352941176 | Precision: 0.6595744680851063 686046511627907 |
| ||
Name Finder | CONLL 2002 Spanish Combined esp.testa | jkosin | Precision: 0.8982630272952854 | Precision: 0.7005423249233671 | 1000 iterations |
Name Finder | CONLL 2002 Spanish Combined esp.testb | jkosin | Precision: 0.9008 | Precision: 0.756635931824532 | 1000 iterations |
Name Finder | CONLL 2003 English Person eng.testa | jkosin | Precision: 0.9352201257861635 | Precision: 0.9523195876288659 | |
Name Finder | CONLL 2003 English Person eng.testb | jkosin | Precision: 0.8873546511627907 | Precision: 0.9391727493917275 | |
Name Finder | CONLL 2003 English Organization eng.testa | jkosin | Precision: 0.8528584817244611 | Precision: 0.8768046198267565 | |
Name Finder | CONLL 2003 English Organization eng.testb | jkosin | Precision: 0.8263422818791947 | Precision: 0.8435980551053485 | |
Name Finder | CONLL 2003 English Location eng.testa | jkosin | Precision: 0.9283837056504599 | Precision: 0.9361421988150099 | |
Name Finder | CONLL 2003 English Location eng.testb | jkosin | Precision: 0.9156180606957809 | Precision: 0.9206349206349206 | |
Name Finder | CONLL 2003 English Misc eng.testa | jkosin | Precision: 0.8539007092198582 | Precision: 0.9027982326951399 | |
Name Finder | CONLL 2003 English Misc eng.testb | jkosin | Precision: 0.8599137931034483 | Precision: 0.8592436974789915 | |
Name Finder | CONLL 2003 English Combined eng.testa | jkosin | Precision: 0.8601818493738206 | Precision: 0.861812521618817 | 1000 iterations |
Name Finder | CONLL 2003 English Combined eng.testb | jkosin | Precision: 0.8036415565869333 | Precision: 0.8041311831853597 | 1000 iterations |
Name Finder | CONLL 2003 German Person deu.testa | joern | Precision: 0.8602620087336245 | Precision: 0.9132653061224489 | |
Name Finder | CONLL 2003 German Person deu.testb | joern | Precision: 0.878 | Precision: 0.8732106339468303 | |
Name Finder | CONLL 2003 German Organization deu.testa | joern | Precision: 0.8365695792880259 | Precision: 0.8407224958949097 | |
Name Finder | CONLL 2003 German Organization deu.testb | joern | Precision: 0.7942583732057417 | Precision: 0.8014705882352942 | |
Name Finder | CONLL 2003 German Location deu.testa | joern | Precision: 0.7362637362637363 | Precision: 0.7816326530612245 | |
Name Finder | CONLL 2003 German Location deu.testb | joern | Precision: 0.75 | Precision: 0.8033826638477801 | |
Name Finder | CONLL 2003 German Misc deu.testa | joern | Precision: 0.7213930348258707 | Precision: 0.7055555555555556 | |
Name Finder | CONLL 2003 German Misc deu.testb | joern | Precision: 0.6198830409356725 | Precision: 0.6601307189542484 | |
Name Finder | CONLL 2003 German Combined deu.testa | joern | Precision: 0.7675205413243112 | Precision: 0.7718859429714857 | |
Name Finder | CONLL 2003 German Combined deu.testb | joern | Precision: 0.7553418803418803 | Precision: 0.7467566165023353 | |
POS Tagger | CONLL 2006 Danish | joern | Accuracy: 0.9511278195488722 | Accuracy: 0.9511278195488722 | |
POS Tagger | CONLL 2006 Dutch | joern | Accuracy: 0.9324977618621307 | Accuracy: 0.9324977618621307 | |
POS Tagger | CONLL 2006 Portuguese | joern | Accuracy: 0.9659110277825124 | Accuracy: 0.9659110277825124 | |
POS Tagger | CONLL 2006 Swedish | joern | Accuracy: 0.9275106082036775 | Accuracy: 0.9275106082036775 | |
Chunker | CONLL 2000 | colen | Precision: 0.9255923572240226 | Precision: 0.9257575757575758 | Perf change due to OPENNLP-242 |
Chunker | Arvores Deitadas | colen |
| Precision: 0.9406086044071353 9413606010016694 | Precision: 0.9385404669668097 AD format for Chunker was not available for 1.5.0 9403445830378374 | Perf change due to OPENNLP-242 and OPENNLP-186 |
The results of the tagging performance might differ compared to the
1.5.0 release since a precision bug in the calculation of the score has been fixed:
https://issues.apache.org/jira/browse/OPENNLP-59
A problem where the CoNLL 2002 data was converted to the wrong encoding has been corrected.
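For readers who want to check how a precision value relates to the tagged output, the following is a minimal sketch in Python of precision, recall and F-measure over exact-match spans. It is an illustration only, not the OpenNLP evaluator: the `(start, end, type)` span tuples are a hypothetical representation, and returning 0.0 for an empty prediction or gold set is an assumption that the real implementation may handle differently.

```python
def precision(predicted, gold):
    """Share of predicted spans that exactly match a gold span."""
    predicted, gold = set(predicted), set(gold)
    if not predicted:
        return 0.0  # assumption: no predictions counts as precision 0
    return len(predicted & gold) / len(predicted)


def recall(predicted, gold):
    """Share of gold spans that were exactly predicted."""
    predicted, gold = set(predicted), set(gold)
    if not gold:
        return 0.0  # assumption: no gold spans counts as recall 0
    return len(predicted & gold) / len(gold)


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall."""
    p, r = precision(predicted, gold), recall(predicted, gold)
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)
```

A span counts as correct here only when start, end and type all match, so a name found with a wrong boundary lowers both precision and recall.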
Test UIMA Integration
The test ensures that each Analysis Engine can run and does not
crash through simple runtime code errors. We need to add
more sophisticated testing in the next releases.
Analysis Engine | Tester | Passed | Comment | |
---|---|---|---|---|
Sentence Detector | joern | yes | Used to process millions of news articles | |
Sentence Detector Trainer | Tommaso joern | yes | Trained and tested with cmd line tool with a UIMA pipeline | |
Tokenizer ME | joern | yes | Used to process millions of news articles | |
Tokenizer Trainer | Tommaso joern | yes | Trained and tested with cmd line tool with a UIMA pipeline | |
Name Finder | joern | yes | Used to process millions of news articles | |
Name Finder Trainer | Tommaso joern | yes | Trained and tested with cmd line tool with a UIMA pipeline | |
Chunker | joern | yes | as part of sample pear | |
Chunker Trainer | | | | |
POS Tagger | joern | yes | as part of sample pear | |
POS Tagger Trainer | Tommaso | yes | Trained and tested with cmd line tool | |
Parser | | | | |
createPear.sh | joern | no, retest with RC5 | yes | Test that pear is build and works. Now fixed after OPENNLP-143. |
Sample PEAR | joern | yes | installed and run over sample text |
...
Package | File or Test | Tester | Passed | Comment |
---|---|---|---|---|
Binary | LICENSE | joern | yes | AL 2.0 and BSD for JWNL |
Binary | NOTICE | joern | yes | standard notice, dates are correct. JWNL is mentioned |
Binary | README | colen, jason, james, joern | yes | File was reviewed on the dev list. |
Binary | RELEASE_NOTES.html | joern, james | yes | issue list is generated correctly |
Binary | Test signatures: .md5, .sha1, .asc | joern | yes | rc4 |
Binary | JIRA issue list created | joern | yes | |
Binary | Contains maxent, tools, uima and jwnl jars | joern | yes | rc7 |
Source | LICENSE | joern | yes | standard AL 2.0 file |
Source | NOTICE | joern | yes | standard notice, dates are correct |
Source | Test signatures: .md5, .sha1, .asc | joern | yes | rc7 rc4 |
Source | Can build from source? | joern | yes | Test should be done without jwnl and opennlp in local m2 repo. |
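The .md5 and .sha1 checks from the table can be automated. The sketch below is a hypothetical Python helper, assuming the checksum file starts with the hex digest as its first whitespace-separated token. Checksum files written in another layout would need different parsing, and the .asc signature check is not covered because it requires a PGP tool.

```python
import hashlib


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file for an algorithm such as "md5" or "sha1"."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path, checksum_path, algorithm):
    """Compare an artifact against the digest stored in its checksum file."""
    with open(checksum_path, encoding="ascii") as handle:
        tokens = handle.read().split()
    if not tokens:
        return False  # an empty checksum file can never match
    # Assumption: the first token is the hex digest.
    return file_digest(artifact_path, algorithm) == tokens[0].lower()
```

The same function covers both checksum types by passing `"md5"` or `"sha1"` as the algorithm name.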