...
Component | Model | Perf 1.5.1 | Perf 1.5.2 | Tester | Passed | Comment |
---|---|---|---|---|---|---|
Sentence Detector | en-sent.bin | 42186.7 sent/s | | joern | no | It is assumed it did not pass because of OPENNLP-202. |
Tokenizer | en-token.bin | 3091.8 sent/s | 2300.4 sent/s | joern | yes | |
Name Finder | en-ner-person.bin | 614.4 sent/s | 650.6 sent/s | joern | yes | Output identical; measurement was done on an idle system. |
POS Tagger | en-pos-maxent.bin | 732.1 sent/s | 816.9 sent/s | joern | yes | |
POS Tagger | en-pos-perceptron.bin | 1110.6 sent/s | | joern | no | Perceptron normalization was changed. |
Chunker | en-chunker.bin | 167.3 sent/s | 166.4 sent/s | joern | yes | |
Parser | en-parser-chunking.bin | 11.6 sent/s | | joern | no (could be a regression, reason must be identified!) | A very few sentences are parsed differently due to OPENNLP-233. |
Note: The test was done on a MacBook Pro 13" 7.1, 2.66 GHz Core 2 Duo, 8 GB RAM, 256 GB SSD running OS X 10.6.6
and Java 1.6.0_26 64-Bit Server. The performance varies because lightweight tasks were running in the background while testing.
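To make the throughput differences easier to judge, the change between the two versions can be expressed as a percentage of the 1.5.1 value. The following is a minimal sketch, not part of the test plan itself; the dictionary and the `relative_change` helper are illustrative names, and only rows with both measurements are included.

```python
# Throughput in sentences per second, copied from the table above
# (the Chunker 1.5.1 value is read as 167.3). Rows without a 1.5.2
# measurement are left out.
throughput = {
    "Tokenizer": (3091.8, 2300.4),
    "Name Finder": (614.4, 650.6),
    "POS Tagger (maxent)": (732.1, 816.9),
    "Chunker": (167.3, 166.4),
}


def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


for component, (perf_151, perf_152) in throughput.items():
    print(f"{component}: {relative_change(perf_151, perf_152):+.1f}%")
```

Given the background load mentioned in the note, small differences such as the Chunker's are probably within measurement noise, while the Tokenizer's drop of roughly a quarter is large enough to warrant a closer look.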
...
Component | Model | Training Time 1.5.1 | Training Time 1.5.2 | Tester | Passed | Comment |
---|---|---|---|---|---|---|
Sentence Detector | en-sent.bin | 0m11.255s | | joern | no | The new version is more accurate due to OPENNLP-202. |
Tokenizer | en-token.bin | 2m30.115s | | joern | | |
Name Finder | en-ner-person.bin | 1m35.414s | | joern | yes | |
POS Tagger | en-pos-maxent.bin | | | joern | yes | Test is still done, because tagdict is not tested with public data |
POS Tagger | en-pos-perceptron.bin | | | joern | no | Perceptron code was changed |
Parser | en-parser-chunking.bin | 138m9.045s | | joern | no | There are small differences due to OPENNLP-233. |
Note: Time was measured with the time command, the value is the "real" time value.
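For comparing training times it can help to convert the "real" values into plain seconds. The sketch below assumes the `XmY.YYYs` format seen in the table (as printed, for example, by the bash `time` keyword); `real_time_to_seconds` is a hypothetical helper, not part of OpenNLP.

```python
import re

# Matches values such as "0m11.255s" or "138m9.045s"; the minutes part
# is treated as optional so that "9.045s" is accepted as well.
_REAL_TIME = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def real_time_to_seconds(value: str) -> float:
    """Convert a 'real' time value like '2m30.115s' into seconds."""
    match = _REAL_TIME.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes = int(match.group(1) or 0)
    seconds = float(match.group(2))
    return minutes * 60 + seconds


for sample in ("0m11.255s", "2m30.115s", "138m9.045s"):
    print(sample, "->", round(real_time_to_seconds(sample), 3), "s")
```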
...
Component | Data | Tester | Tagging Perf 1.5.1 | Tagging Perf 1.5.2 | Comment |
---|---|---|---|---|---|
Sentence Detector | | | | | |
Tokenizer | | | | | |
Name Finder | CONLL 2002 Dutch Person ned.testa | jkosin | Precision: 0.7906976744186046 | Precision: 0.7552941176470588 | Performance Change due to OPENNLP-294 and more... |
Name Finder | CONLL 2002 Dutch Person ned.testb | jkosin | Precision: 0.8527980535279805 | Precision: 0.8505025125628141 | |
Name Finder | CONLL 2002 Dutch Organization ned.testa | jkosin | Precision: 0.8386075949367089 | Precision: 0.8561872909698997 | |
Name Finder | CONLL 2002 Dutch Organization ned.testb | jkosin | Precision: 0.7784200385356455 | Precision: 0.7830374753451677 | |
Name Finder | CONLL 2002 Dutch Location ned.testa | jkosin | Precision: 0.8362831858407079 | Precision: 0.8458333333333333 | |
Name Finder | CONLL 2002 Dutch Location ned.testb | jkosin | Precision: 0.854251012145749 | Precision: 0.8816326530612245 | |
Name Finder | CONLL 2002 Dutch Misc ned.testa | jkosin | Precision: 0.8300492610837439 | Precision: 0.8354114713216958 | |
Name Finder | CONLL 2002 Dutch Misc ned.testb | jkosin | Precision: 0.8373205741626795 | Precision: 0.8264984227129337 | |
Name Finder | CONLL 2002 Dutch Combined ned.testa | jkosin | Precision: 0.7906976744186046 | Precision: 0.6509695290858726 | 1000 iterations |
Name Finder | CONLL 2002 Dutch Combined ned.testb | jkosin | Precision: 0.8527980535279805 | Precision: 0.6869929337869668 | 1000 iterations |
Name Finder | CONLL 2002 Spanish Person esp.testa | jkosin | Precision: 0.8982630272952854 | Precision: 0.9010695187165776 | |
Name Finder | CONLL 2002 Spanish Person esp.testb | jkosin | Precision: 0.9008 | Precision: 0.9195205479452054 | |
Name Finder | CONLL 2002 Spanish Organization esp.testa | jkosin | Precision: 0.8216258879242304 | Precision: 0.8288942695722357 | |
Name Finder | CONLL 2002 Spanish Organization esp.testb | jkosin | Precision: 0.8009331259720062 | Precision: 0.8036277602523659 | |
Name Finder | CONLL 2002 Spanish Location esp.testa | jkosin | Precision: 0.7481789802289281 | Precision: 0.7743016759776536 | |
Name Finder | CONLL 2002 Spanish Location esp.testb | jkosin | Precision: 0.8226221079691517 | Precision: 0.8301886792452831 | |
Name Finder | CONLL 2002 Spanish Misc esp.testa | jkosin | Precision: 0.6446886446886447 | Precision: 0.6492890995260664 | |
Name Finder | CONLL 2002 Spanish Misc esp.testb | jkosin | Precision: 0.6595744680851063 | Precision: 0.686046511627907 | |
Name Finder | CONLL 2002 Spanish Combined esp.testa | jkosin | Precision: 0.8982630272952854 | Precision: 0.7005423249233671 | 1000 iterations |
Name Finder | CONLL 2002 Spanish Combined esp.testb | jkosin | Precision: 0.9008 | Precision: 0.756635931824532 | 1000 iterations |
Name Finder | CONLL 2003 English Person eng.testa | jkosin | Precision: 0.9352201257861635 | Precision: 0.9523195876288659 | |
Name Finder | CONLL 2003 English Person eng.testb | jkosin | Precision: 0.8873546511627907 | Precision: 0.9391727493917275 | |
Name Finder | CONLL 2003 English Organization eng.testa | jkosin | Precision: 0.8528584817244611 | Precision: 0.8768046198267565 | |
Name Finder | CONLL 2003 English Organization eng.testb | jkosin | Precision: 0.8263422818791947 | Precision: 0.8435980551053485 | |
Name Finder | CONLL 2003 English Location eng.testa | jkosin | Precision: 0.9283837056504599 | Precision: 0.9361421988150099 | Performance Change due to OPENNLP-294 and more... |
Name Finder | CONLL 2003 English Location eng.testb | jkosin | Precision: 0.9156180606957809 | Precision: 0.9206349206349206 | |
Name Finder | CONLL 2003 English Misc eng.testa | jkosin | Precision: 0.8539007092198582 | Precision: 0.9027982326951399 | |
Name Finder | CONLL 2003 English Misc eng.testb | jkosin | Precision: 0.8599137931034483 | Precision: 0.8592436974789915 | |
Name Finder | CONLL 2003 English Combined eng.testa | jkosin | Precision: 0.8601818493738206 | Precision: 0.861812521618817 | 1000 iterations |
Name Finder | CONLL 2003 English Combined eng.testb | jkosin | Precision: 0.8036415565869333 | Precision: 0.8041311831853597 | 1000 iterations |
Name Finder | CONLL 2003 German Person deu.testa | joern | Precision: 0.8602620087336245 | Precision: 0.9132653061224489 | |
Name Finder | CONLL 2003 German Person deu.testb | joern | Precision: 0.878 | Precision: 0.8732106339468303 | |
Name Finder | CONLL 2003 German Organization deu.testa | joern | Precision: 0.8365695792880259 | Precision: 0.8407224958949097 | 1000 iterations |
Name Finder | CONLL 2003 German Organization deu.testb | joern | Precision: 0.7942583732057417 | Precision: 0.8014705882352942 | 1000 iterations |
Name Finder | CONLL 2003 German Location deu.testa | joern | Precision: 0.7362637362637363 | Precision: 0.7816326530612245 | |
Name Finder | CONLL 2003 German Location deu.testb | joern | Precision: 0.75 | Precision: 0.8033826638477801 | |
Name Finder | CONLL 2003 German Misc deu.testa | joern | Precision: 0.7213930348258707 | Precision: 0.7055555555555556 | |
Name Finder | CONLL 2003 German Misc deu.testb | joern | Precision: 0.6198830409356725 | Precision: 0.6601307189542484 | |
Name Finder | CONLL 2003 German Combined deu.testa | joern | Precision: 0.7675205413243112 | Precision: 0.7718859429714857 | |
Name Finder | CONLL 2003 German Combined deu.testb | joern | Precision: 0.7553418803418803 | Precision: 0.7467566165023353 | |
POS Tagger | CONLL 2006 Danish | joern | Accuracy: 0.9511278195488722 | Accuracy: 0.9511278195488722 | |
POS Tagger | CONLL 2006 Dutch | joern | Accuracy: 0.9324977618621307 | Accuracy: 0.9324977618621307 | |
POS Tagger | CONLL 2006 Portuguese | joern | Accuracy: 0.9659110277825124 | Accuracy: 0.9659110277825124 | |
POS Tagger | CONLL 2006 Swedish | joern | Accuracy: 0.9275106082036775 | Accuracy: 0.9275106082036775 | |
Chunker | CONLL 2000 | colen | Precision: 0.9255923572240226 | Precision: 0.9257575757575758 | Perf change due to OPENNLP-242 |
Chunker | Arvores Deitadas | colen | Precision: 0.9413606010016694 | Precision: 0.9403445830378374 | Perf change due to OPENNLP-242 and OPENNLP-186 |
The tagging performance results might differ from the 1.5.0 release because a precision bug in the calculation of the score has been fixed:
https://issues.apache.org/jira/browse/OPENNLP-59
The tagging performance results may also differ from the 1.5.1 release because a bug in the event filtering was corrected.
(TODO: put jira issue here) In addition, a problem that caused the CoNLL 02 data to be converted to the wrong encoding was corrected.
Test UIMA Integration
The test ensures that each Analysis Engine can run and does not
crash through simple runtime code errors. We need to add
more sophisticated testing with the next releases.
Analysis Engine | Tester | Passed | Comment |
---|---|---|---|
Sentence Detector | joern | yes | |
Sentence Detector Trainer | joern | yes | Trained and tested with cmd line tool with a UIMA pipeline |
Tokenizer ME | joern | yes | |
Tokenizer Trainer | joern | yes | Trained and tested with cmd line tool with a UIMA pipeline |
Name Finder | joern | yes | |
Name Finder Trainer | joern | yes | Trained and tested with cmd line tool with a UIMA pipeline |
Chunker | joern | yes | as part of sample pear |
Chunker Trainer | | | |
POS Tagger | joern | yes | as part of sample pear |
POS Tagger Trainer | | | Trained and tested with cmd line tool |
Parser | | | |
createPear.sh | joern | yes | |
Sample PEAR | joern | yes | installed and run over sample text |
...
Package | File or Test | Tester | Passed | Comment |
---|---|---|---|---|
Binary | LICENSE | joern | yes | AL 2.0 and BSD for JWNL |
Binary | NOTICE | joern | yes | standard notice, dates are correct. JWNL is mentioned |
Binary | README | colen, jason, james, joern | yes | File was reviewed on the dev list. |
Binary | RELEASE_NOTES.html | joern, james | yes | issue list is generated correctly |
Binary | Test signatures: .md5, .sha1, .asc | joern | yes | rc4 |
Binary | JIRA issue list created | joern | no | yes |
Binary | Contains maxent, tools, uima and jwnl jars | joern | yes | generation failed! |
Source | LICENSE | joern | yes | standard AL 2.0 file |
Source | NOTICE | joern | yes | standard notice, dates are correct |
Source | Test signatures: .md5, .sha1, .asc | joern | yes | rc4 |
Source | Can build from source? | joern | yes | Test should be done without jwnl and opennlp in local m2 repo. |
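The .md5 and .sha1 checks listed above can be scripted. The following is a minimal sketch under stated assumptions: the checksum file is assumed to start with the hex digest as its first whitespace-separated token, the function names are hypothetical, and the .asc signatures are not covered because they need a GPG tool rather than a hash comparison.

```python
import hashlib
from pathlib import Path


def file_digest(path: str, algorithm: str) -> str:
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)  # e.g. "md5" or "sha1"
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: str, checksum_file: str, algorithm: str) -> bool:
    """Compare an artifact against the digest stored in a checksum file.

    Assumption: the first whitespace-separated token of the checksum
    file is the expected hex digest.
    """
    tokens = Path(checksum_file).read_text().split()
    if not tokens:
        return False
    return file_digest(artifact, algorithm) == tokens[0].lower()
```

A call such as `checksum_matches("release.tar.gz", "release.tar.gz.sha1", "sha1")` (file names are placeholders) would return `True` only when the digests agree.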