...

This page contains the test plan for the 1.5.2 release.

The 1.5.2 release does not introduce any changes to the feature
generation, except for the name finder, which might generate different
token class features for words with special letters.

...

The 1.5.0 SourceForge models must be fully compatible with the 1.5.2
release. In this test, all English models are checked for compatibility
on the English 300k-sentence Leipzig Corpus. A model passes when the
output produced with it by both versions has the same md5 hash.
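The hash comparison described above can be sketched as follows. This is a minimal illustration in Python, not the procedure the testers actually used; the function names and the file names in the usage note are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large corpus outputs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(old_output, new_output):
    """True when both output files have the same MD5 digest."""
    return md5_of_file(old_output) == md5_of_file(new_output)
```

For example, `outputs_match("en-sent-old.out", "en-sent-new.out")` would return True only when both versions wrote identical bytes; the file names are placeholders for wherever each run's output was saved.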

Component

Model

Perf 1.5.1

Perf 1.5.2

Tester

Passed

Comment

Sentence Detector

en-sent.bin

42565.4 sent/s

42186.7 sent/s

 

joern

yes  

 

Tokenizer

en-token.bin

3059.5 sent/s

3091.8 sent/s

 

joern

yes  

 

Name Finder

en-ner-person.bin

290.7 sent/s

487.1 sent/s

 

joern

no

 

  OPENNLP-138, feature-gen fix

POS Tagger

en-pos-maxent.bin

721.3 sent/s

732.1 sent/s

 

joern

yes  

 

POS Tagger

en-pos-perceptron.bin

1097.7 sent/s

1110.6 sent/s

 

joern

 

OPENNLP-155 might improve accuracy a little  

Chunker

en-chunker.bin

169.5 sent/s

167.3 sent/s

 

colen

yes  

computerB, tested with CONLL2000 (2012 sentences)

Parser

en-parser-chunking.bin

4.3 sent/s

11.6 sent/s

 

joern

yes

 

MacBook was sleeping for a while during the 1.5.0 run

Note: The test was done on a MacBook Pro 13" 7.1, 2.66 GHz Core 2 Duo, 8GB RAM, 256GB SSD running OS X 10.6.6
and Java 1.6.0_22 64-Bit Server. The performance varies because lightweight tasks were running in the background during testing.

...

Component

Model

Training Time 1.5.1

Training Time 1.5.2

Tester

Passed

Comment

Sentence Detector

en-sent.bin

0m12.847s

0m11.255s

 

joern

yes  

 

Tokenizer

en-token.bin

2m16.694s

2m30.115s

 

joern

yes

 

  Re-test tagging was very slow, only 250 sent/s

Name Finder

en-ner-date.bin

 

 

joern

no  

OPENNLP-138  

Name Finder

en-ner-location.bin

 

 

joern

no  

OPENNLP-138  

Name Finder

en-ner-money.bin

 

 

joern

no  

OPENNLP-138  

Name Finder

en-ner-organization.bin

 

 

joern

no  

OPENNLP-138  

Name Finder

en-ner-percentage.bin

 

 

joern

no  

OPENNLP-138  

Name Finder

en-ner-person.bin

 

 

joern

no  

OPENNLP-138  

POS Tagger

en-pos-maxent.bin

 

 

joern

 

 

POS Tagger

en-pos-perceptron.bin

 

 

joern

 

 

Chunker

en-chunker.bin

 

 

joern  

  

Note: Remove this row, it's CONLL 2000 anyway

Parser

en-parser-chunking.bin

110m8.712s

138m9.045s

 

joern

yes  

 

Note: Time was measured with the time command; the reported value is the "real" time.
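To compare two training times from the table, the "real" values can be converted to seconds. The sketch below assumes the `NmS.SSSs` format shown above (minutes, then seconds); the function name is hypothetical.

```python
import re

# Matches values such as "0m12.847s" or "110m8.712s".
_REAL_TIME = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def real_time_to_seconds(value):
    """Convert a 'real' time string like '110m8.712s' to seconds."""
    match = _REAL_TIME.match(value.strip())
    if match is None:
        raise ValueError("unrecognized time value: %r" % value)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Parser training times taken from the table above.
old_seconds = real_time_to_seconds("110m8.712s")
new_seconds = real_time_to_seconds("138m9.045s")
print("difference in seconds:", round(new_seconds - old_seconds, 3))
```

Because the machine was also running background tasks, a difference computed this way is only a rough indication.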

...

Component

Data

Tester

Tagging Perf 1.5.1

Tagging Perf 1.5.2

Comment

Sentence Detector

  

joern

 

 

Will not be done in this release.

Tokenizer

  

joern

 

 

We need a de-tokenizer dictionary first, will be done in next release.

Name Finder

CONLL 2002 Dutch Person ned.testa

joern
 

Precision: 0.7906976744186046
Recall: 0.48364153627311524
F-Measure: 0.6001765225066196

 

 

Name Finder

CONLL 2002 Dutch Person ned.testb

joern  

Precision: 0.8527980535279805
Recall: 0.6384335154826958
F-Measure: 0.7302083333333333

 

 

Name Finder

CONLL 2002 Dutch Organization ned.testa

joern  

Precision: 0.8386075949367089
Recall: 0.38629737609329445
F-Measure: 0.5289421157684631

 

 

Name Finder

CONLL 2002 Dutch Organization ned.testb

joern

 

Precision: 0.7784200385356455
Recall: 0.4580498866213152
F-Measure: 0.5767309064953604

 

 

Name Finder

CONLL 2002 Dutch Location ned.testa

joern  

Precision: 0.8362831858407079
Recall: 0.3945720250521921
F-Measure: 0.5361702127659574

 

 

Name Finder

CONLL 2002 Dutch Location ned.testb

joern  

Precision: 0.854251012145749
Recall: 0.5452196382428941
F-Measure: 0.665615141955836

 

 

Name Finder

CONLL 2002 Dutch Misc ned.testa

joern

 

Precision: 0.8300492610837439
Recall: 0.4505347593582888
F-Measure: 0.5840554592720971

 

 

Name Finder

CONLL 2002 Dutch Misc ned.testb

joern  

Precision: 0.8373205741626795
Recall: 0.44229149115417016
F-Measure: 0.5788313120176405

 

 

Name Finder

CONLL 2002 Dutch Combined ned.testa

joern  

Precision: 0.7906976744186046
Recall: 0.48364153627311524
F-Measure: 0.6001765225066196


 

Name Finder

CONLL 2002 Dutch Combined ned.testb

joern  

Precision: 0.8527980535279805
Recall: 0.6384335154826958
F-Measure: 0.7302083333333333

 

 

Name Finder

CONLL 2002 Spanish Person esp.testa

joern

 

Precision: 0.8982630272952854
Recall: 0.5924713584288053
F-Measure: 0.7140039447731755


 

Name Finder

CONLL 2002 Spanish Person esp.testb

joern  

Precision: 0.9008
Recall: 0.7659863945578231
F-Measure: 0.8279411764705882


 

Name Finder

CONLL 2002 Spanish Organization esp.testa

joern
 

Precision: 0.8216258879242304
Recall: 0.6123529411764705
F-Measure: 0.7017189079878665


 

Name Finder

CONLL 2002 Spanish Organization esp.testb

joern

 

Precision: 0.8009331259720062
Recall: 0.7357142857142858
F-Measure: 0.7669396872673119

 

 

Name Finder

CONLL 2002 Spanish Location esp.testa

joern
 

Precision: 0.7481789802289281
Recall: 0.7306910569105691
F-Measure: 0.739331619537275


 

Name Finder

CONLL 2002 Spanish Location esp.testb

joern

 

Precision: 0.8226221079691517
Recall: 0.5904059040590406
F-Measure: 0.6874328678839956


 

Name Finder

CONLL 2002 Spanish Misc esp.testa

joern

 

Precision: 0.6446886446886447
Recall: 0.3955056179775281
F-Measure: 0.49025069637883006


 

Name Finder

CONLL 2002 Spanish Misc esp.testb

joern

 

Precision: 0.6595744680851063
Recall: 0.36578171091445427
F-Measure: 0.4705882352941176


 

Name Finder

CONLL 2002 Spanish Combined esp.testa

joern

 

Precision: 0.8982630272952854
Recall: 0.5924713584288053
F-Measure: 0.7140039447731755


 

Name Finder

CONLL 2002 Spanish Combined esp.testb

joern

 

Precision: 0.9008
Recall: 0.7659863945578231
F-Measure: 0.8279411764705882


 

Name Finder

CONLL 2003 English Person eng.testa

jkosin

Precision: 0.901992661721591
Recall: 0.7263843648208469
F-Measure: 0.8047194918352375

Precision: 0.9352201257861635
Recall: 0.8072747014115093
F-Measure: 0.8665501165501166

 

 

Name Finder

CONLL 2003 English Person eng.testb

jkosin

Precision: 0.8977988745723299
Recall: 0.6821273964131107
F-Measure: 0.7752427693131103

Precision: 0.8873546511627907
Recall: 0.7551020408163265
F-Measure: 0.8159037754761109

 

 

Name Finder

CONLL 2003 English Organization eng.testa

jkosin

Precision: 0.8290322580645161
Recall: 0.6226696495152871
F-Measure: 0.711183505195638

Precision: 0.8528584817244611
Recall: 0.6785980611483967
F-Measure: 0.7558139534883722

 

 

Name Finder

CONLL 2003 English Organization eng.testb

jkosin

Precision: 0.818058934847256
Recall: 0.5394340758579169
F-Measure: 0.6501526888707977

Precision: 0.8263422818791947
Recall: 0.5930162552679109
F-Measure: 0.6905012267788293


 

Name Finder

CONLL 2003 English Location eng.testa

jkosin

Precision: 0.9584186939820742
Recall: 0.7408818726183996
F-Measure: 0.8357262402029991

Precision: 0.9283837056504599
Recall: 0.769188894937398
F-Measure: 0.8413218219708247

 

 

Name Finder

CONLL 2003 English Location eng.testb

jkosin

Precision: 0.9485177151120753
Recall: 0.7182254196642686
F-Measure: 0.8174619349330977

Precision: 0.9156180606957809
Recall: 0.7416067146282974
F-Measure: 0.8194766478966545

 

 

Name Finder

CONLL 2003 English Misc eng.testa

jkosin

Precision: 0.8492613111726685
Recall: 0.6052060737527115
F-Measure: 0.706757826338278

Precision: 0.8539007092198582
Recall: 0.6529284164859002
F-Measure: 0.7400122925629993

 

 

Name Finder

CONLL 2003 English Misc eng.testb

jkosin

Precision: 0.8979300499643112
Recall: 0.5299145299145299
F-Measure: 0.6664957615531857

Precision: 0.8599137931034483
Recall: 0.5683760683760684
F-Measure: 0.6843910806174958

 

 

Name Finder

CONLL 2003 English Combined eng.testa

jkosin

Precision: 0.8230655223984119
Recall: 0.8039380679905755
F-Measure: 0.8133893616650641

Precision: 0.8601818493738206
Recall: 0.8438236284079434
F-Measure: 0.8519242205420101

 

1000 iterations

Name Finder

CONLL 2003 English Combined eng.testb

jkosin

Precision: 0.7849405582672956
Recall: 0.7563739376770539
F-Measure: 0.7703925220469681

Precision: 0.8036415565869333
Recall: 0.7970963172804533
F-Measure: 0.8003555555555556

1000 iterations

Name Finder

CONLL 2003 German Person deu.testa

joern

Precision: 0.8272041489863272
Recall: 0.22626695217701642
F-Measure: 0.35533762893472637

Precision: 0.8602620087336245
Recall: 0.28122769450392576
F-Measure: 0.4238838084991931

Name Finder

CONLL 2003 German Person deu.testb

joern

Precision: 0.7535042735042735
Recall: 0.2602510460251046
F-Measure: 0.38687890773270717

Precision: 0.878
Recall: 0.3673640167364017
F-Measure: 0.5179941002949853

Name Finder

CONLL 2003 German Organization deu.testa

joern

Precision: 0.6615148726058698
Recall: 0.29814665592264306
F-Measure: 0.4110375194740828

Precision: 0.8365695792880259
Recall: 0.41659951651893634
F-Measure: 0.5562130177514794

Name Finder

CONLL 2003 German Organization deu.testb

joern

Precision: 0.690884820747521
Recall: 0.3311772315653299
F-Measure: 0.4477327413690855

Precision: 0.7942583732057417
Recall: 0.4294954721862872
F-Measure: 0.5575146935348446

Name Finder

CONLL 2003 German Location deu.testa

joern

Precision: 0.8779137529137528
Recall: 0.32006773920406434
F-Measure: 0.46910886680647634

Precision: 0.7362637362637363
Recall: 0.34038950042337
F-Measure: 0.4655471916618414

Name Finder

CONLL 2003 German Location deu.testb

joern

Precision: 0.741636798088411
Recall: 0.3169082125603865
F-Measure: 0.44406386065180703

Precision: 0.75
Recall: 0.3652173913043478
F-Measure: 0.4912280701754385

Name Finder

CONLL 2003 German Misc deu.testa

joern

Precision: 0.8151658767772512
Recall: 0.12178217821782178
F-Measure: 0.21190646707366545

Precision: 0.7213930348258707
Recall: 0.14356435643564355
F-Measure: 0.2394715111478117

Name Finder

CONLL 2003 German Misc deu.testb

joern

Precision: 0.8125
Recall: 0.15074626865671642
F-Measure: 0.2543095099748208

Precision: 0.6198830409356725
Recall: 0.1582089552238806
F-Measure: 0.2520808561236623

Name Finder

CONLL 2003 German Combined deu.testa

joern

Precision: 0.6622805891862553
Recall: 0.28698530933167804
F-Measure: 0.400445860424834

Precision: 0.7675205413243112
Recall: 0.32857438444030623
F-Measure: 0.46015647638365687

Name Finder

CONLL 2003 German Combined deu.testb

joern

Precision: 0.6632526799570968
Recall: 0.33324258099646065
F-Measure: 0.44360278183404916

Precision: 0.7553418803418803
Recall: 0.3849714130138851
F-Measure: 0.5100090171325519

POS Tagger

CONLL 2006 Danish

joern

Accuracy: 0.9511278195488722

Accuracy: 0.9511278195488722

POS Tagger

CONLL 2006 Dutch

joern

Accuracy: 0.9324977618621307

Accuracy: 0.9324977618621307

POS Tagger

CONLL 2006 Portuguese

joern

Accuracy: 0.9659110277825124

Accuracy: 0.9659110277825124

POS Tagger

CONLL 2006 Swedish

joern

Accuracy: 0.9275106082036775

Accuracy: 0.9275106082036775


Chunker

CONLL 2000

colen  

Precision: 0.9255923572240226
Recall: 0.9220610430991112
F-Measure: 0.9238233255623465

 

Evaluator was not available in 1.5.0. To check whether anything changed, I compared the output of 1.5.0 and 1.5.1. The output changed slightly because of a bug fixed in 1.5.1 (missing trailing closing bracket).

Chunker

Arvores Deitadas

colen

 

Precision: 0.9406086044071353
Recall: 0.9364814040952779
F-Measure: 0.9385404669668097

 

  AD format for Chunker was not available for 1.5.0

The results of the tagging performance might differ compared to the
1.5.0 release since a precision bug in the calculation of the score has been fixed:
https://issues.apache.org/jira/browse/OPENNLP-59
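For reference when checking the scores in the table above: the reported F-Measure values are consistent with the balanced F-measure, the harmonic mean of precision and recall. The sketch below recomputes it from the first Dutch Person row. It is an illustration with a hypothetical function name, not the evaluator code of the release.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention assumed here: no correct predictions gives a score of 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values from the CONLL 2002 Dutch Person ned.testa row.
reported = 0.6001765225066196
recomputed = f_measure(0.7906976744186046, 0.48364153627311524)
print(abs(recomputed - reported) < 1e-9)
```

A check like this is a quick way to spot a row where precision, recall and F-measure were copied from different runs.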

...

Analysis Engine

Tester

Passed

Comment

Sentence Detector

joern

yes

 

  Used to process millions of news articles

Sentence Detector Trainer

Tommaso

yes  

Trained and tested with cmd line tool

Tokenizer ME

joern

yes

 

  Used to process millions of news articles

Tokenizer Trainer

Tommaso

 

Trained and tested with cmd line tool

Name Finder

joern

yes

 

  Used to process millions of news articles

Name Finder Trainer

Tommaso

yes  

Trained and tested with cmd line tool

Chunker

joern

yes  

as part of sample pear

Chunker Trainer

 

 

 

POS Tagger

joern

yes  

as part of sample pear

POS Tagger Trainer

Tommaso

yes  

Trained and tested with cmd line tool

Parser

 

 

 

createPear.sh

joern

no, retest with RC5

Test that the PEAR is built and works. Now fixed after OPENNLP-143.

 

createPear.sh

joern

 

 

Sample PEAR

joern

yes  

installed and run over sample text

...

Package

File or Test

Tester

Passed

Comment

Binary

LICENSE

joern  

yes  

AL 2.0 and BSD for JWNL

Binary

NOTICE

joern  

yes  

standard notice, dates are correct

Binary

README

joern  

yes  

 

Binary

RELEASE_NOTES.html

joern  

yes  

issue list is generated correctly

Binary

Test signatures: .md5, .sha1, .asc

joern  

yes  

rc7  

Source

LICENSE

joern  

yes  

standard AL 2.0 file

Source

NOTICE

joern  

yes  

standard notice, dates are correct

Source

Test signatures: .md5, .sha1, .asc

joern  

yes  

rc7  

Source

Can build from source?

joern  

yes  

Test should be done without jwnl and opennlp in local m2 repo.
Test was done on Ubuntu 10.10.