...
Component | Model | Perf 1.5.1 | Perf 1.5.2 | Tester | Passed | Comment
---|---|---|---|---|---|---
Sentence Detector | en-sent.bin | 42186.7 sent/s | | joern | no | It did not pass because of OPENNLP-202.
Tokenizer | en-token.bin | 3091.8 sent/s | 2300.4 sent/s | joern | yes |
Name Finder | en-ner-person.bin | 614.4 sent/s | 650.6 487.1 sent/s | joern | yes | Output identical, measurement was done on an idle system
POS Tagger | en-pos-maxent.bin | 732.1 sent/s | 816.9 sent/s | joern | yes |
POS Tagger | en-pos-perceptron.bin | 1110.6 sent/s | | joern | no | Perceptron normalization was changed.
Chunker | en-chunker.bin | 167.3 sent/s | 166.4 sent/s | joern, colen | yes | computerB, tested with CONLL2000 (2012 sentences)
Parser | en-parser-chunking.bin | 11.6 sent/s | | joern | no | A very small number of sentences are parsed differently due to OPENNLP-233.
Note: Test was done on a MacBook Pro 13" 7.1, 2.66 GHz Core 2 Duo, 8GB RAM, 256GB SSD running OS X 10.6.6
and Java 1.6.0_22 26 64-Bit Server. The performance varies because lightweight tasks were performed in the background while testing.
Note: computerB is a DualCore T8100, 4GB RAM, 250GB HD running Ubuntu 10.10 64-Bit and Java 1.6.0_20.
Note: "Concurrent" in the comment means that both tests were started at the same time.
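To make the throughput figures easier to compare across releases, the relative change can be computed per component. The following is only a minimal sketch: the helper name `relative_change` is an assumption introduced for illustration, and the two sample rows are copied from the table above.

```python
def relative_change(old: float, new: float) -> float:
    """Return the relative change from old to new as a percentage."""
    if old == 0:
        raise ValueError("old throughput must be non-zero")
    return (new - old) / old * 100.0


# Throughput values (sentences per second) copied from the table above.
measurements = {
    "Tokenizer": (3091.8, 2300.4),
    "POS Tagger (maxent)": (732.1, 816.9),
}

for component, (perf_151, perf_152) in measurements.items():
    change = relative_change(perf_151, perf_152)
    print(f"{component}: {change:+.1f}% from 1.5.1 to 1.5.2")
```

Because background tasks were running during some measurements, small differences computed this way should be read with care.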
...
Component | Model | Training Time 1.5.1 | Training Time 1.5.2 | Tester | Passed | Comment
---|---|---|---|---|---|---
Sentence Detector | en-sent.bin | 0m11.255s | | joern | no | The new version is more accurate due to OPENNLP-202.
Tokenizer | en-token.bin | 2m30.115s | 1m35.414s | joern | yes |
Name Finder | en-ner-date.bin | | | joern | |
Name Finder | en-ner-location.bin | | | joern | |
Name Finder | en-ner-money.bin | | | joern | |
Name Finder | en-ner-organization.bin | | | joern | |
Name Finder | en-ner-percentage.bin | | | joern | |
Name Finder | en-ner-person.bin | | | joern | |
POS Tagger | en-pos-maxent.bin | | | joern | yes | Test is still done, because tagdict is not tested with public data
POS Tagger | en-pos-perceptron.bin | | | joern | no | Perceptron code was changed
Chunker | en-chunker.bin | | | | | Note: Remove here, it's CONLL 2000 anyway
Parser | en-parser-chunking.bin | 138m9.045s | | joern | no | There are small differences due to OPENNLP-233.
Note: Time was measured with the time command, the value is the "real" time value.
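The "real" values reported by the time command have the form `XmY.YYYs`. To compare two training runs numerically, they can be converted to seconds. This is a small sketch under that format assumption; `real_time_to_seconds` is a hypothetical helper, and the sample values are the Tokenizer timings from the table above.

```python
import re


def real_time_to_seconds(value: str) -> float:
    """Convert a time-style duration such as '2m30.115s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Tokenizer training times taken from the table above.
tokenizer_151 = real_time_to_seconds("2m30.115s")
tokenizer_152 = real_time_to_seconds("1m35.414s")

print(f"1.5.1: {tokenizer_151:.3f} s, 1.5.2: {tokenizer_152:.3f} s")
print(f"ratio 1.5.1 / 1.5.2: {tokenizer_151 / tokenizer_152:.2f}")
```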
Performance test with public data
Test the tagging performance with all the publicly available training
and test data for various languages.
It is assumed that the training will be done with a cutoff of 5 and 100 iterations,
if different values are used please write them into the comment.
Component | Data | Tester | Tagging Perf 1.5.1 | Tagging Perf 1.5.2 | Comment | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sentence Detector |
|
|
|
|
| |||||||||||
Tokenizer |
|
|
|
|
| |||||||||||
Name Finder | CONLL 2002 Dutch Person ned.testa | jkosin | Precision: 0.7906976744186046 | Precision: 0.7552941176470588 | Performance Change due to OPENNLP-294 and more... | |||||||||||
Name Finder | CONLL 2002 Dutch Person ned.testb | jkosin | Precision: 0.8527980535279805 | Precision: 0.8505025125628141 |
| |||||||||||
Name Finder | CONLL 2002 Dutch Organization ned.testa | jkosin | Precision: 0.8386075949367089 | Precision: 0.8561872909698997 |
| |||||||||||
Name Finder | CONLL 2002 Dutch Organization ned.testb | jkosin | Precision: 0.7784200385356455 | Precision: 0.7830374753451677 |
| |||||||||||
Name Finder | CONLL 2002 Dutch Location ned.testa | jkosin | Precision: 0.8362831858407079 | Precision: 0.8458333333333333 |
| |||||||||||
Name Finder | CONLL 2002 Dutch Location ned.testb | jkosin | Precision: 0.854251012145749 | Precision: 0.8816326530612245 |
| |||||||||||
Name Finder | CONLL 2002 Dutch Misc ned.testa | jkosin | Precision: 0.8300492610837439 | Precision: 0.8354114713216958 |
| |||||||||||
Name Finder | CONLL 2002 Dutch Misc ned.testb | jkosin | Precision: 0.8373205741626795 | Precision: 0.8264984227129337 |
| |||||||||||
Name Finder | CONLL 2002 Dutch Combined ned.testa | jkosin | Precision: 0.7906976744186046 | Precision: 0.6509695290858726 | 1000 iterations | |||||||||||
Name Finder | CONLL 2002 Dutch Combined ned.testb | jkosin | Precision: 0.8527980535279805 | Precision: 0.6869929337869668 | 1000 iterations | |||||||||||
Name Finder | CONLL 2002 Spanish Person esp.testa | jkosin | Precision: 0.8982630272952854 | Precision: 0.9010695187165776 |
| |||||||||||
Name Finder | CONLL 2002 Spanish Person esp.testb | jkosin | Precision: 0.9008 | Precision: 0.9195205479452054 |
| |||||||||||
Name Finder | CONLL 2002 Spanish Organization esp.testa | jkosin | Precision: 0.8216258879242304 | Precision: 0.8288942695722357 |
| |||||||||||
Name Finder | CONLL 2002 Spanish Organization esp.testb | jkosin | Precision: 0.8009331259720062 | Precision: 0.8036277602523659 |
| |||||||||||
Name Finder | CONLL 2002 Spanish Location esp.testa | jkosin | Precision: 0.7481789802289281 | Precision: 0.7743016759776536 |
| |||||||||||
Name Finder | CONLL 2002 Spanish Location esp.testb | jkosin | Precision: 0.8226221079691517 | Precision: 0.8301886792452831 |
| |||||||||||
Name Finder | CONLL 2002 Spanish Misc esp.testa | jkosin | Precision: 0.6446886446886447 | Precision: 0.6492890995260664 |
| |||||||||||
Name Finder | CONLL 2002 Spanish Misc esp.testb | jkosin | Precision: 0.6595744680851063 | Precision: 0.686046511627907 |
| |||||||||||
Name Finder | CONLL 2002 Spanish Combined esp.testa | jkosin | Precision: 0.8982630272952854 | Precision: 0.7005423249233671 | 1000 iterations | |||||||||||
Name Finder | CONLL 2002 Spanish Combined esp.testb | jkosin | Precision: 0.9008 | Precision: 0.756635931824532 | 1000 iterations | |||||||||||
Name Finder | CONLL 2003 English Person eng.testa | jkosin | Precision: 0.9352201257861635 | Precision: 0.9523195876288659 |
| |||||||||||
Name Finder | CONLL 2003 English Person eng.testb | jkosin | Precision: 0.8873546511627907 | Precision: 0.9391727493917275 |
| |||||||||||
Name Finder | CONLL 2003 English Organization eng.testa | jkosin | Precision: 0.8528584817244611 | Precision: 0.8768046198267565 |
| |||||||||||
Name Finder | CONLL 2003 English Organization eng.testb | jkosin | Precision: 0.8263422818791947 | Precision: 0.8435980551053485 |
| |||||||||||
Name Finder | CONLL 2003 English Location eng.testa | jkosin | Precision: 0.9283837056504599 | Precision: 0.9361421988150099 |
| |||||||||||
Name Finder | CONLL 2003 English Location eng.testb | jkosin | Precision: 0.9156180606957809 | Precision: 0.9206349206349206 |
| |||||||||||
Name Finder | CONLL 2003 English Misc eng.testa | jkosin | Precision: 0.8539007092198582 | Precision: 0.9027982326951399 |
| |||||||||||
Name Finder | CONLL 2003 English Misc eng.testb | jkosin | Precision: 0.8599137931034483 | Precision: 0.8592436974789915 |
| |||||||||||
Name Finder | CONLL 2003 English Combined eng.testa | jkosin | Precision: 0.8601818493738206 | Precision: 0.861812521618817 | 1000 iterations | |||||||||||
Name Finder | CONLL 2003 English Combined eng.testb | jkosin | Precision: 0.8036415565869333 | Precision: 0.8041311831853597 | 1000 iterations | |||||||||||
Name Finder | CONLL 2003 German Person deu.testa | joern | Precision: 0.8602620087336245 | Precision: 0.9132653061224489 |
| |||||||||||
Name Finder | CONLL 2003 German Person deu.testb | joern | Precision: 0.878 | Precision: 0.8732106339468303 |
| |||||||||||
Name Finder | CONLL 2003 German Organization deu.testa | joern | Precision: 0.8365695792880259 | Precision: 0.8407224958949097 |
| |||||||||||
Name Finder | CONLL 2003 German Organization deu.testb | joern | Precision: 0.7942583732057417 | Precision: 0.8014705882352942 |
| |||||||||||
Name Finder | CONLL 2003 German Location deu.testa | joern | Precision: 0.7362637362637363 | Precision: 0.7816326530612245 |
| |||||||||||
Name Finder | CONLL 2003 German Location deu.testb | joern | Precision: 0.75 | Precision: 0.8033826638477801 |
| |||||||||||
Name Finder | CONLL 2003 German Misc | |||||||||||||||
Component | Data | Tester | Tagging Perf 1.5.1 | Tagging Perf 1.5.2 | Comment
---|---|---|---|---|---
Sentence Detector | | joern | | | Will not be done in this release.
Tokenizer | | joern | | | We need a de-tokenizer dictionary first, will be done in next release.
Name Finder | CONLL 2002 Dutch Person ned.testa | joern | Precision: 0.7906976744186046 | |
Name Finder | CONLL 2002 Dutch Person ned.testb | joern | Precision: 0.8527980535279805 | |
Name Finder | CONLL 2002 Dutch Organization ned.testa | joern | Precision: 0.8386075949367089 | |
Name Finder | CONLL 2002 Dutch Organization ned.testb | joern | Precision: 0.7784200385356455 | |
Name Finder | CONLL 2002 Dutch Location ned.testa | joern | Precision: 0.8362831858407079 | |
Name Finder | CONLL 2002 Dutch Location ned.testb | joern | Precision: 0.854251012145749 | |
Name Finder | CONLL 2002 Dutch Misc ned.testa | joern | Precision: 0.8300492610837439 | |
Name Finder | CONLL 2002 Dutch Misc ned.testb | joern | Precision: 0.8373205741626795 | |
Name Finder | CONLL 2002 Combined ned.testa | joern | Precision: 0.7906976744186046 | |
Name Finder | CONLL 2002 Dutch Combined ned.testb | joern | Precision: 0.8527980535279805 | |
Name Finder | CONLL 2002 Spanish Person esp.testa | joern | Precision: 0.8982630272952854 | |
Name Finder | CONLL 2002 Spanish Person esp.testb | joern | Precision: 0.9008 | |
Name Finder | CONLL 2002 Spanish Organization esp.testa | joern | Precision: 0.8216258879242304 | |
Name Finder | CONLL 2002 Spanish Organization esp.testb | joern | Precision: 0.8009331259720062 | |
Name Finder | CONLL 2002 Spanish Location esp.testa | joern | Precision: 0.7481789802289281 | |
Name Finder | CONLL 2002 Spanish Location esp.testb | joern | Precision: 0.8226221079691517 | |
Name Finder | CONLL 2002 Spanish Misc esp.testa | joern | Precision: 0.6446886446886447 | |
Name Finder | CONLL 2002 Spanish Misc esp.testb | joern | Precision: 0.6595744680851063 | |
Name Finder | CONLL 2002 Spanish Combined esp.testa | joern | Precision: 0.8982630272952854 | |
Name Finder | CONLL 2002 Spanish Combined esp.testb | joern | Precision: 0.9008 | |
Name Finder | CONLL 2003 English Person eng.testa | jkosin | Precision: 0.9352201257861635 | |
Name Finder | CONLL 2003 English Person eng.testb | jkosin | Precision: 0.8873546511627907 | |
Name Finder | CONLL 2003 English Organization eng.testa | jkosin | Precision: 0.8528584817244611 | |
Name Finder | CONLL 2003 English Organization eng.testb | jkosin | Precision: 0.8263422818791947 | |
Name Finder | CONLL 2003 English Location eng.testa | jkosin | Precision: 0.9283837056504599 | |
Name Finder | CONLL 2003 English Location eng.testb | jkosin | Precision: 0.9156180606957809 | |
Name Finder | CONLL 2003 English Misc eng.testa | jkosin | Precision: 0.8539007092198582 | |
Name Finder | CONLL 2003 English Misc eng.testb | jkosin | Precision: 0.8599137931034483 | |
Name Finder | CONLL 2003 English Combined eng.testa | jkosin | Precision: 0.8601818493738206 | | 1000 iterations
Name Finder | CONLL 2003 English Combined eng.testb | jkosin | Precision: 0.8036415565869333 | | 1000 iterations
Name Finder | CONLL 2003 German Person deu.testa | joern | Precision: 0.8602620087336245 | |
Name Finder | CONLL 2003 German Person deu.testb | joern | Precision: 0.878 | |
Name Finder | CONLL 2003 German Organization deu.testa | joern | Precision: 0.83656957928802597213930348258707 |
|
| Name Finder | CONLL 2003 German Organization deu.testb | joern | 2394715111478117 | Precision: 0.79425837320574177055555555555556 |
| |||||
Name Finder | CONLL 2003 German Location Misc deu.testa testb | joern | Precision: 0.73626373626373636198830409356725 |
|
| Name Finder | CONLL 2003 German Location deu.testb | 2520808561236623 joern | Precision: 0.75 6601307189542484 |
| ||||||
| Name Finder | CONLL 2003 German Misc Combined deu.testa | joern | Precision: 0.72139303482587077675205413243112 | Precision: 0.7718859429714857 |
| ||||||||||
Name Finder | CONLL 2003 German Misc Combined deu.testb | joern | Precision: 0.61988304093567257553418803418803 |
|
| Name Finder | CONLL 2003 German Combined deu.testa | : 0.5100090171325519 joern | Precision: 0.76752054132431127467566165023353 |
| ||||||
POS Tagger | Name Finder | CONLL 2003 German Combined deu.testb CONLL 2006 Danish | joern | Precision Accuracy: 0.75534188034188039511278195488722 Recall | Accuracy: 0.9511278195488722 |
| ||||||||||
POS Tagger | CONLL 2006 Dutch | joern | Accuracy3849714130138851 | Accuracy: 0.9324977618621307 |
| |||||||||||
POS Tagger | CONLL 2006 Danish Portuguese | joern | Accuracy: 0.9511278195488722 9659110277825124 | Accuracy: 0.9659110277825124 |
| |||||||||||
POS Tagger | CONLL 2006 Dutch Swedish | joern | Accuracy: 0.9324977618621307 9275106082036775 | Accuracy: 0.9275106082036775 |
| |||||||||||
POS Tagger Chunker | CONLL 2006 Portuguese 2000 | joern colen | Accuracy Precision: 0 0.9659110277825124 |
|
| |||||||||||
POS Tagger | CONLL 2006 Swedish | joern | Accuracy: 0.9275106082036775 |
|
| |||||||||||
9255923572240226 | Precision: 0.9257575757575758 | Perf change due to OPENNLP-242 | ||||||||||||||
Chunker | Arvores Deitadas | Chunker | CONLL 2000 | colen | Precision: 0.92559235722402269413606010016694 |
|
| Chunker | Arvores Deitadas | colen | 9396742073907428 | Precision: 0.94060860440713539403445830378374 |
|
| 9388269348910339 | Perf change due to OPENNLP-242 and OPENNLP-186 |
The results of the tagging performance might differ compared to the
1.5.0 release since a precision bug in the calculation of the score has been fixed:
https://issues.apache.org/jira/browse/OPENNLP-59
In addition, a problem that caused the CoNLL 02 data to be converted to the wrong encoding was corrected.
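For readers unfamiliar with the Precision figures reported for the Name Finder and Chunker, the sketch below shows the usual span-level definition: the share of predicted spans that also occur in the reference annotation. It is an illustration only and is not taken from the OpenNLP evaluator. The exact-match criterion, the `(start, end, type)` tuples, the helper names, and the choice to return 0.0 for an empty set are all assumptions.

```python
def precision(predicted: set, reference: set) -> float:
    """Fraction of predicted spans that also occur in the reference."""
    if not predicted:
        return 0.0  # assumed convention for an empty prediction set
    return len(predicted & reference) / len(predicted)


def recall(predicted: set, reference: set) -> float:
    """Fraction of reference spans that were also predicted."""
    if not reference:
        return 0.0  # assumed convention for an empty reference set
    return len(predicted & reference) / len(reference)


# Hypothetical spans as (start token, end token, type); not real test data.
reference_spans = {
    (0, 2, "person"), (5, 6, "person"), (8, 9, "person"),
    (11, 13, "person"), (20, 21, "person"),
}
predicted_spans = {
    (0, 2, "person"), (5, 6, "person"), (8, 9, "person"), (15, 16, "person"),
}

print(f"Precision: {precision(predicted_spans, reference_spans)}")
print(f"Recall: {recall(predicted_spans, reference_spans)}")
```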
Test UIMA Integration
The test ensures that the Analysis Engines can run and do not
crash through simple runtime code errors. We need to add
more sophisticated testing with the next releases.
Analysis Engine | Tester | Passed | Comment
---|---|---|---
Sentence Detector | joern | yes |
Sentence Detector Trainer | Tommaso, joern | yes | Trained and tested with cmd line tool with a UIMA pipeline
Tokenizer ME | joern | yes |
Tokenizer Trainer | Tommaso, joern | yes | Trained and tested with cmd line tool with a UIMA pipeline
Name Finder | joern | yes |
Name Finder Trainer | Tommaso, joern | yes | Trained and tested with cmd line tool with a UIMA pipeline
Chunker | joern | yes | as part of sample pear
Chunker Trainer | | |
POS Tagger | joern | yes | as part of sample pear
POS Tagger Trainer | Tommaso | | Trained and tested with cmd line tool
Parser | | |
createPear.sh | joern | yes |
Sample PEAR | joern | yes | installed and run over sample text
...
Package | File or Test | Tester | Passed | Comment
---|---|---|---|---
Binary | LICENSE | joern | yes | AL 2.0 and BSD for JWNL
Binary | NOTICE | joern | yes | standard notice, dates are correct. JWNL is mentioned
Binary | README | colen, jason, james, joern | yes | File was reviewed on the dev list.
Binary | RELEASE_NOTES.html | joern, james | yes | issue list is generated correctly
Binary | Test signatures: .md5, .sha1, .asc | joern | yes | rc4
Binary | JIRA issue list created | joern | yes |
Binary | Contains maxent, tools, uima and jwnl jars | joern | yes |
Source | LICENSE | joern | yes | standard AL 2.0 file
Source | NOTICE | joern | yes | standard notice, dates are correct
Source | Test signatures: .md5, .sha1, .asc | joern | yes | rc4
Source | Can build from source? | joern | yes | Test should be done without jwnl and opennlp in local m2 repo.
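The .md5 and .sha1 checks listed in the table can be scripted. The sketch below is one possible way to do it, assuming each checksum file starts with the hex digest (optionally followed by a file name). The function names and the example file names in the comments are hypothetical. Verifying the .asc signature requires an OpenPGP tool and is not covered here.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str) -> str:
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path, algorithm: str) -> bool:
    """Compare an artifact's digest with the first token of its checksum file."""
    expected = checksum_file.read_text().split()[0].lower()
    return file_digest(artifact, algorithm) == expected


# Example call with hypothetical file names:
# checksum_matches(Path("release-bin.tar.gz"),
#                  Path("release-bin.tar.gz.sha1"), "sha1")
```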