...
Component | Model | Perf 1.5.1 | Perf 1.5.2 | Tester | Passed | Comment
---|---|---|---|---|---|---
Sentence Detector | en-sent.bin | 42186.7 sent/s | | joern | no | It did not pass because of OPENNLP-202.
Tokenizer | en-token.bin | 3091.8 sent/s | 2300.4 sent/s | joern | yes |
Name Finder | en-ner-person.bin | 614.4 sent/s | 650.6 sent/s | joern | yes | Output identical; measurement was done on an idle system.
POS Tagger | en-pos-maxent.bin | 732.1 sent/s | 816.9 sent/s | joern | yes |
POS Tagger | en-pos-perceptron.bin | 1110.6 sent/s | | joern | no | Perceptron normalization was changed.
Chunker | en-chunker.bin | 167.3 sent/s | 166.4 sent/s | joern | yes |
Parser | en-parser-chunking.bin | 11.6 sent/s | | joern | no | A very small number of sentences are parsed differently due to OPENNLP-233.
Note: Test was done on a MacBook Pro 13" 7.1, 2.66 GHz Core 2 Duo, 8 GB RAM, 256 GB SSD running OS X 10.6.6
and Java 1.6.0_26 64-Bit Server. The performance varies because lightweight tasks were running in the background while testing.
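Because background load adds noise to the numbers, it can help to look at the relative change between the two releases instead of the raw throughput. The following is a minimal sketch, not part of the test procedure itself; the helper name `relative_change` is made up for illustration, and only the rows that have both measurements are included.

```python
def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Throughput in sentences per second, copied from the rows with both values.
measurements = {
    "Tokenizer": (3091.8, 2300.4),
    "Name Finder": (614.4, 650.6),
    "POS Tagger (maxent)": (732.1, 816.9),
    "Chunker": (167.3, 166.4),
}

for component, (perf_151, perf_152) in measurements.items():
    # A negative value means 1.5.2 processed fewer sentences per second.
    print(f"{component}: {relative_change(perf_151, perf_152):+.1f}%")
```

Small differences of a few percent are probably within the noise described in the note; larger ones may deserve a second run on an idle system.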
...
Component | Model | Training Time 1.5.1 | Training Time 1.5.2 | Tester | Passed | Comment
---|---|---|---|---|---|---
Sentence Detector | en-sent.bin | 0m11.255s | | joern | no | The new version is more accurate due to OPENNLP-202.
Tokenizer | en-token.bin | 2m30.115s | 1m35.414s | joern | yes |
Name Finder | en-ner-date.bin | | | joern | |
Name Finder | en-ner-location.bin | | | joern | |
Name Finder | en-ner-money.bin | | | joern | |
Name Finder | en-ner-organization.bin | | | joern | |
Name Finder | en-ner-percentage.bin | | | joern | |
Name Finder | en-ner-person.bin | | | joern | |
POS Tagger | en-pos-maxent.bin | | | joern | yes | Test is still done, because tagdict is not tested with public data.
POS Tagger | en-pos-perceptron.bin | | | joern | no | Perceptron code was changed.
Parser | en-parser-chunking.bin | 138m9.045s | | joern | no | There are small differences due to OPENNLP-233.
Note: Time was measured with the time command; the value shown is the "real" time.
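To compare training times between releases, the "real" values can be converted to seconds. The sketch below assumes the `NmS.SSSs` format shown in the table (as printed by the shell's time command); the function name `real_time_to_seconds` is hypothetical and only serves this illustration.

```python
import re

# Matches values such as "0m11.255s" or "138m9.045s".
_REAL_TIME = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def real_time_to_seconds(value: str) -> float:
    """Convert a 'real' time value like '2m30.115s' to seconds."""
    match = _REAL_TIME.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Tokenizer training times for 1.5.1 and 1.5.2, taken from the table.
tokenizer_151 = real_time_to_seconds("2m30.115s")
tokenizer_152 = real_time_to_seconds("1m35.414s")
print(f"Tokenizer training: {tokenizer_151:.3f}s -> {tokenizer_152:.3f}s")
```

Since "real" is wall-clock time, it includes any time the process spent waiting, so the comparison is only meaningful when both runs were made on a similarly loaded machine.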
Performance test with public data
Test the tagging performance with all the publicly available training
and test data for various languages.
It is assumed that the training is done with a cutoff of 5 and 100 iterations;
if different values are used, please note them in the comment.
Component | Data | Tester | Tagging Perf 1.5.1 | Tagging Perf 1.5.2 | Comment
---|---|---|---|---|---
Sentence Detector | | | | |
Tokenizer | | | | |
Name Finder | CONLL 2002 Dutch Person ned.testa | jkosin | Precision: 0.7906976744186046 | Precision: 0.7552941176470588 | Performance change due to OPENNLP-294 and more...
Name Finder | CONLL 2002 Dutch Person ned.testb | jkosin | Precision: 0.8527980535279805 | Precision: 0.8505025125628141 |
Name Finder | CONLL 2002 Dutch Organization ned.testa | jkosin | Precision: 0.8386075949367089 | Precision: 0.8561872909698997 |
Name Finder | CONLL 2002 Dutch Organization ned.testb | jkosin | Precision: 0.7784200385356455 | Precision: 0.7830374753451677 |
Name Finder | CONLL 2002 Dutch Location ned.testa | jkosin | Precision: 0.8362831858407079 | Precision: 0.8458333333333333 |
Name Finder | CONLL 2002 Dutch Location ned.testb | jkosin | Precision: 0.854251012145749 | Precision: 0.8816326530612245 |
Name Finder | CONLL 2002 Dutch Misc ned.testa | jkosin | Precision: 0.8300492610837439 | Precision: 0.8354114713216958 |
Name Finder | CONLL 2002 Dutch Misc ned.testb | jkosin | Precision: 0.8373205741626795 | Precision: 0.8264984227129337 |
Name Finder | CONLL 2002 Dutch Combined ned.testa | jkosin | Precision: 0.7906976744186046 | Precision: 0.6509695290858726 | 1000 iterations
Name Finder | CONLL 2002 Dutch Combined ned.testb | jkosin | Precision: 0.8527980535279805 | Precision: 0.6869929337869668 | 1000 iterations
Name Finder | CONLL 2002 Spanish Person esp.testa | jkosin | Precision: 0.8982630272952854 | Precision: 0.9010695187165776 |
Name Finder | CONLL 2002 Spanish Person esp.testb | jkosin | Precision: 0.9008 | Precision: 0.9195205479452054 |
Name Finder | CONLL 2002 Spanish Organization esp.testa | jkosin | Precision: 0.8216258879242304 | Precision: 0.8288942695722357 |
Name Finder | CONLL 2002 Spanish Organization esp.testb | jkosin | Precision: 0.8009331259720062 | Precision: 0.8036277602523659 |
Name Finder | CONLL 2002 Spanish Location esp.testa | jkosin | Precision: 0.7481789802289281 | Precision: 0.7743016759776536 |
Name Finder | CONLL 2002 Spanish Location esp.testb | jkosin | Precision: 0.8226221079691517 | Precision: 0.8301886792452831 |
Name Finder | CONLL 2002 Spanish Misc esp.testa | jkosin | Precision: 0.6446886446886447 | Precision: 0.6492890995260664 |
Name Finder | CONLL 2002 Spanish Misc esp.testb | jkosin | Precision: 0.6595744680851063 | Precision: 0.686046511627907 |
Name Finder | CONLL 2002 Spanish Combined esp.testa | jkosin | Precision: 0.8982630272952854 | Precision: 0.7005423249233671 | 1000 iterations
Name Finder | CONLL 2002 Spanish Combined esp.testb | jkosin | Precision: 0.9008 | Precision: 0.756635931824532 | 1000 iterations
Name Finder | CONLL 2003 English Person eng.testa | jkosin | Precision: 0.9352201257861635 | Precision: 0.9523195876288659 | Must be re-done for rc2!
Name Finder | CONLL 2003 English Person eng.testb | jkosin | Precision: 0.8873546511627907 | Precision: 0.9391727493917275 | Must be re-done for rc2!
Name Finder | CONLL 2003 English Organization eng.testa | jkosin | Precision: 0.8528584817244611 | Precision: 0.8768046198267565 | Must be re-done for rc2!
Name Finder | CONLL 2003 English Organization eng.testb | jkosin | Precision: 0.8263422818791947 | Precision: 0.8435980551053485 | Must be re-done for rc2!
Name Finder | CONLL 2003 English Location eng.testa | jkosin | Precision: 0.9283837056504599 | Precision: 0.9361421988150099 | Must be re-done for rc2!
Name Finder | CONLL 2003 English Location eng.testb | jkosin | Precision: 0.9156180606957809 | Precision: 0.9206349206349206 | Must be re-done for rc2!
Name Finder | CONLL 2003 English Misc eng.testa | jkosin | Precision: 0.8539007092198582 | Precision: 0.9027982326951399 | Must be re-done for rc2!
Name Finder | CONLL 2003 English Misc eng.testb | jkosin | Precision: 0.8599137931034483 | Precision: 0.8592436974789915 | Must be re-done for rc2!
Name Finder | CONLL 2003 English Combined eng.testa | jkosin | Precision: 0.8601818493738206 | Precision: 0.861812521618817 | 1000 iterations. Must be re-done for rc2!
Name Finder | CONLL 2003 English Combined eng.testb | jkosin | Precision: 0.8036415565869333 | Precision: 0.8041311831853597 | 1000 iterations. Must be re-done for rc2!
Name Finder | CONLL 2003 German Person deu.testa | joern, jkosin | Precision: 0.8602620087336245 | Precision: 0.9132653061224489 |
Name Finder | CONLL 2003 German Person deu.testb | joern, jkosin | Precision: 0.878 | Precision: 0.8732106339468303 |
Name Finder | CONLL 2003 German Organization deu.testa | joern, jkosin | Precision: 0.8365695792880259 | Precision: 0.8407224958949097 |
Name Finder | CONLL 2003 German Organization deu.testb | joern | Precision: 0.7942583732057417 | Precision: 0.8014705882352942 |
Name Finder | CONLL 2003 German Location deu.testa | joern | Precision: 0.7362637362637363 | Precision: 0.7816326530612245 |
Name Finder | CONLL 2003 German Location deu.testb | joern | Precision: 0.75 | Precision: 0.8033826638477801 |
Name Finder | CONLL 2003 German Misc deu.testa | joern | Precision: 0.7213930348258707 | Precision: 0.7055555555555556 |
Name Finder | CONLL 2003 German Misc deu.testb | joern | Precision: 0.6198830409356725 | Precision: 0.6601307189542484 |
Name Finder | CONLL 2003 German Combined deu.testa | joern | Precision: 0.7675205413243112 | Precision: 0.7718859429714857 |
Name Finder | CONLL 2003 German Combined deu.testb | joern | Precision: 0.7553418803418803 | Precision: 0.7467566165023353 |
POS Tagger | CONLL 2006 Danish | joern | Accuracy: 0.9511278195488722 | Accuracy: 0.9511278195488722 |
POS Tagger | CONLL 2006 Dutch | joern | Accuracy: 0.9324977618621307 | Accuracy: 0.9324977618621307 |
POS Tagger | CONLL 2006 Portuguese | joern | Accuracy: 0.9659110277825124 | Accuracy: 0.9659110277825124 |
POS Tagger | CONLL 2006 Swedish | joern | Accuracy: 0.9275106082036775 | Accuracy: 0.9275106082036775 |
Chunker | CONLL 2000 | colen | Precision: 0.9255923572240226 | Precision: 0.9257575757575758 | Perf change due to OPENNLP-242
Chunker | Arvores Deitadas | colen | Precision: 0.9413606010016694 | Precision: 0.9403445830378374 | Perf change due to OPENNLP-242 and OPENNLP-186
The tagging performance results might differ from the 1.5.0 release
because a precision bug in the score calculation has been fixed:
https://issues.apache.org/jira/browse/OPENNLP-59
The tagging performance results may also differ from the 1.5.1 release because a bug in the event filtering was corrected.
(TODO: put jira issue here) In addition, a problem that caused the CoNLL 02 data to be converted to the wrong encoding was corrected.
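For readers unfamiliar with the metric, the precision values in the table can be understood as the share of predicted spans that exactly match a reference span. The sketch below uses the common exact-match definition and is an illustration only: it is not the OpenNLP evaluator code, and the `Span` layout, the function name and the sample spans are all assumptions.

```python
from typing import Set, Tuple

# Hypothetical span layout: (start token, end token, type).
Span = Tuple[int, int, str]


def precision_recall_f1(reference: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over two sets of spans."""
    true_positives = len(reference & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up spans: two of the four predictions match the reference exactly.
reference = {(0, 2, "person"), (5, 6, "location"), (9, 11, "organization")}
predicted = {
    (0, 2, "person"),
    (5, 6, "organization"),  # right boundaries, wrong type: counts as wrong
    (9, 11, "organization"),
    (13, 14, "person"),      # no such span in the reference
}

precision, recall, f1 = precision_recall_f1(reference, predicted)
print(f"Precision: {precision} Recall: {recall} F1: {f1}")
```

Under this definition a span with the correct boundaries but the wrong type counts against precision, which is one reason why small changes in the tagger can shift the reported values.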
Test UIMA Integration
The test ensures that the Analysis Engines can run and do not
crash because of simple runtime code errors. More sophisticated
testing needs to be added in upcoming releases.
Analysis Engine | Tester | Passed | Comment
---|---|---|---
Sentence Detector | joern | yes |
Sentence Detector Trainer | joern | yes | Trained and tested with cmd line tool with a UIMA pipeline
Tokenizer ME | joern | yes |
Tokenizer Trainer | joern | yes | Trained and tested with cmd line tool with a UIMA pipeline
Name Finder | joern | yes |
Name Finder Trainer | joern | yes | Trained and tested with cmd line tool with a UIMA pipeline
Chunker | joern | yes | as part of sample pear
Chunker Trainer | | |
POS Tagger | joern | yes | as part of sample pear
POS Tagger Trainer | | | Trained and tested with cmd line tool
Parser | | |
createPear.sh | joern | yes |
Sample PEAR | joern | yes | installed and run over sample text
...
Package | File or Test | Tester | Passed | Comment
---|---|---|---|---
Binary | LICENSE | joern | yes | AL 2.0 and BSD for JWNL
Binary | NOTICE | joern | yes | standard notice, dates are correct. JWNL is mentioned
Binary | README | colen, jason, james, joern | yes | File was reviewed on the dev list.
Binary | RELEASE_NOTES.html | joern, james | yes | issue list is generated correctly
Binary | Test signatures: .md5, .sha1, .asc | joern | yes | rc4
Binary | JIRA issue list created | joern | yes |
Binary | Contains maxent, tools, uima and jwnl jars | joern | yes | generation failed!
Source | LICENSE | joern | yes | standard AL 2.0 file
Source | NOTICE | joern | yes | standard notice, dates are correct
Source | Test signatures: .md5, .sha1, .asc | joern | yes | rc4
Source | Can build from source? | joern | yes | Test should be done without jwnl and opennlp in local m2 repo. |
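The .md5 and .sha1 checks in the table can be scripted. The following is a minimal sketch under the assumption that each checksum file starts with the hex digest, optionally followed by the file name; checksum files written in another layout would need a different parser. The function names are made up for this example, and the .asc signatures are not covered here since they have to be verified with a PGP tool such as `gpg --verify`.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str) -> str:
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path, algorithm: str) -> bool:
    """Compare an artifact against the digest stored in its checksum file.

    Assumes the first whitespace-separated token in the checksum file
    is the expected hex digest.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return file_digest(artifact, algorithm) == expected


# Example call with placeholder file names (not the real artifact names):
# checksum_matches(Path("artifact.tar.gz"), Path("artifact.tar.gz.md5"), "md5")
```

A matching checksum only shows that the file was not corrupted in transit; the .asc signature check is what ties the artifact to the release manager's key.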