...
This page contains the test plan for the 1.5.2 release.
The 1.5.2 release does not introduce any changes to feature
generation, except for the name finder, which might generate different
token class features for words with special letters.
...
The 1.5.0 SourceForge models must be fully compatible with the 1.5.2
release. In this test, all English models are tested for compatibility
on the English 300k-sentence Leipzig Corpus. A model passes if the output
it produces under both versions has the same md5 hash.
Component | Model | Perf 1.5.1 | Perf 1.5.2 | Tester | Passed | Comment
---|---|---|---|---|---|---
Sentence Detector | en-sent.bin | 42186.7 sent/s | | joern | yes |
Tokenizer | en-token.bin | 3091.8 sent/s | | joern | yes |
Name Finder | en-ner-person.bin | 487.1 sent/s | | joern | no | OPENNLP-138, feature-gen fix
POS Tagger | en-pos-maxent.bin | 732.1 sent/s | | joern | yes |
POS Tagger | en-pos-perceptron.bin | 1110.6 sent/s | | joern | | OPENNLP-155 might improve accuracy a little
Chunker | en-chunker.bin | 167.3 sent/s | | colen | yes | computerB, tested with CONLL2000 (2012 sentences)
Parser | en-parser-chunking.bin | 11.6 sent/s | | joern | yes | Macbook was sleeping a little while doing 1.5.0
Note: The test was done on a MacBook Pro 13" 7.1 (2.66 GHz Core 2 Duo, 8 GB RAM, 256 GB SSD) running OS X 10.6.6
and Java 1.6.0_22 64-Bit Server. The performance varies because lightweight tasks were running in the background during testing.
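The md5 comparison used for the compatibility check can be sketched as follows. This is only an illustration, assuming each version's output for a given model has already been written to a file; the function names are hypothetical and not part of OpenNLP.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(old_output, new_output):
    """True if both output files have the same md5 hash."""
    return md5_of_file(old_output) == md5_of_file(new_output)
```

For example, `outputs_match("sent-old.txt", "sent-new.txt")` (hypothetical file names) would return `True` only when the two versions produced byte-identical output for the same model.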
...
Component | Model | Training Time 1.5.1 | Training Time 1.5.2 | Tester | Passed | Comment
---|---|---|---|---|---|---
Sentence Detector | en-sent.bin | 0m11.255s | | joern | yes |
Tokenizer | en-token.bin | 2m30.115s | | joern | yes | Re-test tagging was very slow, only 250 sent/s
Name Finder | en-ner-date.bin | | | joern | no | OPENNLP-138
Name Finder | en-ner-location.bin | | | joern | no | OPENNLP-138
Name Finder | en-ner-money.bin | | | joern | no | OPENNLP-138
Name Finder | en-ner-organization.bin | | | joern | no | OPENNLP-138
Name Finder | en-ner-percentage.bin | | | joern | no | OPENNLP-138
Name Finder | en-ner-person.bin | | | joern | no | OPENNLP-138
POS Tagger | en-pos-maxent.bin | | | joern | |
POS Tagger | en-pos-perceptron.bin | | | joern | |
Chunker | en-chunker.bin | | | joern | | Note: Remove here, it's CONLL 2000 anyway
Parser | en-parser-chunking.bin | 138m9.045s | | joern | yes |
Note: Time was measured with the time command, the value is the "real" time value.
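To compare training times between releases, the "real" values have to be converted to seconds first. The sketch below assumes the `NmS.SSSs` format that appears in the table (for example `0m11.255s`); the helper names are hypothetical.

```python
import re

# Matches durations such as "0m11.255s" or "138m9.045s".
_REAL_TIME = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def real_time_to_seconds(value):
    """Convert a 'real' duration such as '138m9.045s' to seconds."""
    match = _REAL_TIME.match(value.strip())
    if match is None:
        raise ValueError("not a recognized duration: %r" % value)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def relative_change(baseline_seconds, candidate_seconds):
    """Fractional change versus the baseline; positive means slower."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds
```

With this, `real_time_to_seconds("0m11.255s")` gives the sentence detector training time in seconds, and `relative_change` can be applied to two converted values to express a slowdown or speedup as a fraction.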
...
Component | Data | Tester | Tagging Perf 1.5.1 | Tagging Perf 1.5.2 | Comment
---|---|---|---|---|---
Sentence Detector | | joern | | | Will not be done in this release.
Tokenizer | | joern | | | We need a de-tokenizer dictionary first, will be done in next release.
Name Finder | CONLL 2002 Dutch Person ned.testa | joern | Precision: 0.7906976744186046 | |
Name Finder | CONLL 2002 Dutch Person ned.testb | joern | Precision: 0.8527980535279805 | |
Name Finder | CONLL 2002 Dutch Organization ned.testa | joern | Precision: 0.8386075949367089 | |
Name Finder | CONLL 2002 Dutch Organization ned.testb | joern | | Precision: 0.7784200385356455 |
Name Finder | CONLL 2002 Dutch Location ned.testa | joern | Precision: 0.8362831858407079 | |
Name Finder | CONLL 2002 Dutch Location ned.testb | joern | Precision: 0.854251012145749 | |
Name Finder | CONLL 2002 Dutch Misc ned.testa | joern | | Precision: 0.8300492610837439 |
Name Finder | CONLL 2002 Dutch Misc ned.testb | joern | Precision: 0.8373205741626795 | |
Name Finder | CONLL 2002 Dutch Combined ned.testa | joern | Precision: 0.7906976744186046 | |
Name Finder | CONLL 2002 Dutch Combined ned.testb | joern | Precision: 0.8527980535279805 | |
Name Finder | CONLL 2002 Spanish Person esp.testa | joern | | Precision: 0.8982630272952854 |
Name Finder | CONLL 2002 Spanish Person esp.testb | joern | Precision: 0.9008 | |
Name Finder | CONLL 2002 Spanish Organization esp.testa | joern | Precision: 0.8216258879242304 | |
Name Finder | CONLL 2002 Spanish Organization esp.testb | joern | | Precision: 0.8009331259720062 |
Name Finder | CONLL 2002 Spanish Location esp.testa | joern | Precision: 0.7481789802289281 | |
Name Finder | CONLL 2002 Spanish Location esp.testb | joern | | Precision: 0.8226221079691517 |
Name Finder | CONLL 2002 Spanish Misc esp.testa | joern | | Precision: 0.6446886446886447 |
Name Finder | CONLL 2002 Spanish Misc esp.testb | joern | | Precision: 0.6595744680851063 |
Name Finder | CONLL 2002 Spanish Combined esp.testa | joern | | Precision: 0.8982630272952854 |
Name Finder | CONLL 2002 Spanish Combined esp.testb | joern | | Precision: 0.9008 |
Name Finder | CONLL 2003 English Person eng.testa | jkosin | Precision: 0.9352201257861635 | |
Name Finder | CONLL 2003 English Person eng.testb | jkosin | Precision: 0.8977988745723299 | |
Name Finder | CONLL 2003 English Organization eng.testa | jkosin | Precision: 0.8290322580645161 | |
Name Finder | CONLL 2003 English Organization eng.testb | jkosin | Precision: 0.8263422818791947 | |
Name Finder | CONLL 2003 English Location eng.testa | jkosin | Precision: 0.9584186939820742 | |
Name Finder | CONLL 2003 English Location eng.testb | jkosin | Precision: 0.9156180606957809 | Precision: 0.8194766478966545 |
Name Finder | CONLL 2003 English Misc eng.testa | jkosin | Precision: 0.8492613111726685 | |
Name Finder | CONLL 2003 English Misc eng.testb | jkosin | Precision: 0.8599137931034483 | Precision: 0.6843910806174958 |
Name Finder | CONLL 2003 English Combined eng.testa | jkosin | Precision: 0.8230655223984119 | | 1000 iterations
Name Finder | CONLL 2003 English Combined eng.testb | jkosin | Precision: 0.8036415565869333 | Precision: 0.8003555555555556 | 1000 iterations
Name Finder | CONLL 2003 German Person deu.testa | joern | Precision: 0.8602620087336245 | |
Name Finder | CONLL 2003 German Person deu.testb | joern | Precision: 0.878 | |
Name Finder | CONLL 2003 German Organization deu.testa | joern | Precision: 0.8365695792880259 | |
Name Finder | CONLL 2003 German Organization deu.testb | joern | Precision: 0.7942583732057417 | |
Name Finder | CONLL 2003 German Location deu.testa | joern | Precision: 0.7362637362637363 | |
Name Finder | CONLL 2003 German Location deu.testb | joern | Precision: 0.75 | |
Name Finder | CONLL 2003 German Misc deu.testa | joern | Precision: 0.7213930348258707 | |
Name Finder | CONLL 2003 German Misc deu.testb | joern | Precision: 0.6198830409356725 | |
Name Finder | CONLL 2003 German Combined deu.testa | joern | Precision: 0.7675205413243112 | |
Name Finder | CONLL 2003 German Combined deu.testb | joern | Precision: 0.7553418803418803 | |
POS Tagger | CONLL 2006 Danish | joern | Accuracy: 0.9511278195488722 | |
POS Tagger | CONLL 2006 Dutch | joern | Accuracy: 0.9324977618621307 | |
POS Tagger | CONLL 2006 Portuguese | joern | Accuracy: 0.9659110277825124 | |
POS Tagger | CONLL 2006 Swedish | joern | Accuracy: 0.9275106082036775 | |
Chunker | CONLL 2000 | colen | Precision: 0.9255923572240226 | | Evaluator was not available in 1.5.0. To evaluate if something changed I compared the output of 1.5.0 and 1.5.1. The output changed a little because of a bug fixed in 1.5.1 (missing trailing closing bracket).
Chunker | Arvores Deitadas | colen | | Precision: 0.9406086044071353 | AD format for Chunker was not available for 1.5.0
The results of the tagging performance might differ compared to the
1.5.0 release since a precision bug in the calculation of the score has been fixed:
https://issues.apache.org/jira/browse/OPENNLP-59
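For reference, the Precision figures in the table can be read against the usual span-level definitions. The sketch below assumes the standard formulas (precision = correct predictions / all predictions, recall = correct predictions / all reference items); it is not the OpenNLP evaluator and does not reproduce the change made in OPENNLP-59. The function name and the span tuples are hypothetical.

```python
def precision_recall_f1(reference, predicted):
    """Compute precision, recall and F1 over two collections of items.

    Items can be any hashable values, e.g. (start, end, type) name spans.
    An item counts as correct only if it appears in both collections.
    """
    reference = set(reference)
    predicted = set(predicted)
    true_positives = len(reference & predicted)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Three reference spans, three predictions, two of them exactly right.
reference_spans = [(0, 2, "person"), (5, 6, "person"), (9, 11, "location")]
predicted_spans = [(0, 2, "person"), (5, 6, "location"), (9, 11, "location")]
precision, recall, f1 = precision_recall_f1(reference_spans, predicted_spans)
```

In this toy case a span with the right boundaries but the wrong type counts as an error, which is why both precision and recall come out below 1.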
...
Analysis Engine | Tester | Passed | Comment
---|---|---|---
Sentence Detector | joern | yes | Used to process millions of news articles
Sentence Detector Trainer | Tommaso | yes | Trained and tested with cmd line tool
Tokenizer ME | joern | yes | Used to process millions of news articles
Tokenizer Trainer | Tommaso | | Trained and tested with cmd line tool
Name Finder | joern | yes | Used to process millions of news articles
Name Finder Trainer | Tommaso | yes | Trained and tested with cmd line tool
Chunker | joern | yes | as part of sample pear
Chunker Trainer | | |
POS Tagger | joern | yes | as part of sample pear
POS Tagger Trainer | Tommaso | yes | Trained and tested with cmd line tool
Parser | | |
createPear.sh | joern | no, retest with RC5 | Test that pear is built and works. Now fixed after OPENNLP-143.
Sample PEAR | joern | yes | installed and run over sample text
...
Package | File or Test | Tester | Passed | Comment |
---|---|---|---|---|
Binary | LICENSE | joern | yes | AL 2.0 and BSD for JWNL |
Binary | NOTICE | joern | yes | standard notice, dates are correct |
Binary | README | joern | yes |
Binary | RELEASE_NOTES.html | joern | yes | issue list is generated correctly |
Binary | Test signatures: .md5, .sha1, .asc | joern | yes | rc7 |
Source | LICENSE | joern | yes | standard AL 2.0 file |
Source | NOTICE | joern | yes | standard notice, dates are correct |
Source | Test signatures: .md5, .sha1, .asc | joern | yes | rc7 |
Source | Can build from source? | joern | yes | Test should be done without jwnl and opennlp in local m2 repo. |