...

To pass the test, the event hash and the model output must be identical.

| Component | Model | Training Time 1.5.2 | Training Time 1.5.3 | Tester | Passed | Comment |
| --- | --- | --- | --- | --- | --- | --- |
| Sentence Detector | en-sent.bin | | | Jörn | yes | |
| Tokenizer | en-token.bin | | | Jörn | yes | |
| POS Tagger | en-pos-maxent.bin | | | Jörn | yes | |
| POS Tagger | en-pos-perceptron.bin | | | Jörn | yes | |
| Parser | en-parser-chunking.bin | | | Jörn | yes | Tested on 10k sentences |

Note: Time was measured with the time command; the value reported is the "real" time.

...

| Component | Data | Tester | Tagging Perf 1.5.2 | Tagging Perf 1.5.3 | Comment |
| --- | --- | --- | --- | --- | --- |
| Sentence Detector | | | | | |
| Tokenizer | | | | | |
| Name Finder | CONLL 2002 Dutch Person ned.testa | jkosin | Precision: 0.7552941176470588; Recall: 0.4566145092460882; F-Measure: 0.5691489361702128 | Precision: 0.7552941176470588; Recall: 0.4566145092460882; F-Measure: 0.5691489361702128 | |
| Name Finder | CONLL 2002 Dutch Person ned.testb | jkosin | Precision: 0.8505025125628141; Recall: 0.6165755919854281; F-Measure: 0.7148891235480465 | Precision: 0.8505025125628141; Recall: 0.6165755919854281; F-Measure: 0.7148891235480465 | |
| Name Finder | CONLL 2002 Dutch Organization ned.testa | jkosin | Precision: 0.8561872909698997; Recall: 0.37317784256559766; F-Measure: 0.5197969543147207 | Precision: 0.8561872909698997; Recall: 0.37317784256559766; F-Measure: 0.5197969543147207 | |
| Name Finder | CONLL 2002 Dutch Organization ned.testb | jkosin | Precision: 0.7830374753451677; Recall: 0.4501133786848073; F-Measure: 0.5716342692584593 | Precision: 0.7830374753451677; Recall: 0.4501133786848073; F-Measure: 0.5716342692584593 | |
| Name Finder | CONLL 2002 Dutch Location ned.testa | jkosin | Precision: 0.8458333333333333; Recall: 0.42379958246346555; F-Measure: 0.564673157162726 | Precision: 0.8458333333333333; Recall: 0.42379958246346555; F-Measure: 0.564673157162726 | |
| Name Finder | CONLL 2002 Dutch Location ned.testb | jkosin | Precision: 0.8816326530612245; Recall: 0.5581395348837209; F-Measure: 0.6835443037974683 | Precision: 0.8816326530612245; Recall: 0.5581395348837209; F-Measure: 0.6835443037974683 | |
| Name Finder | CONLL 2002 Dutch Misc ned.testa | jkosin | Precision: 0.8354114713216958; Recall: 0.44786096256684493; F-Measure: 0.5831157528285466 | Precision: 0.8354114713216958; Recall: 0.44786096256684493; F-Measure: 0.5831157528285466 | |
| Name Finder | CONLL 2002 Dutch Misc ned.testb | jkosin | Precision: 0.8264984227129337; Recall: 0.44144903117101936; F-Measure: 0.5755079626578803 | Precision: 0.8264984227129337; Recall: 0.44144903117101936; F-Measure: 0.5755079626578803 | |
| Name Finder | CONLL 2002 Dutch Combined ned.testa | jkosin | Precision: 0.6509695290858726; Recall: 0.628822629969419; F-Measure: 0.6397044526540929 | Precision: 0.664424218440839; Recall: 0.6418195718654435; F-Measure: 0.6529263076025666 | 1000 iterations, OPENNLP-417 |
| Name Finder | CONLL 2002 Dutch Combined ned.testb | jkosin | Precision: 0.6869929337869668; Recall: 0.6660746003552398; F-Measure: 0.6763720690543674 | Precision: 0.7006019366657943; Recall: 0.679269221009896; F-Measure: 0.6897706776603968 | 1000 iterations, OPENNLP-417 |
| Name Finder | CONLL 2002 Spanish Person esp.testa | jkosin | Precision: 0.9010695187165776; Recall: 0.5515548281505729; F-Measure: 0.684263959390863 | Precision: 0.9010695187165776; Recall: 0.5515548281505729; F-Measure: 0.684263959390863 | |
| Name Finder | CONLL 2002 Spanish Person esp.testb | jkosin | Precision: 0.9195205479452054; Recall: 0.7306122448979592; F-Measure: 0.8142532221379833 | Precision: 0.9195205479452054; Recall: 0.7306122448979592; F-Measure: 0.8142532221379833 | |
| Name Finder | CONLL 2002 Spanish Organization esp.testa | jkosin | Precision: 0.8288942695722357; Recall: 0.6041176470588235; F-Measure: 0.6988771691051379 | Precision: 0.8288942695722357; Recall: 0.6041176470588235; F-Measure: 0.6988771691051379 | |
| Name Finder | CONLL 2002 Spanish Organization esp.testb | jkosin | Precision: 0.8036277602523659; Recall: 0.7278571428571429; F-Measure: 0.7638680659670164 | Precision: 0.8036277602523659; Recall: 0.7278571428571429; F-Measure: 0.7638680659670164 | |
| Name Finder | CONLL 2002 Spanish Location esp.testa | jkosin | Precision: 0.7743016759776536; Recall: 0.7042682926829268; F-Measure: 0.7376263970196913 | Precision: 0.7743016759776536; Recall: 0.7042682926829268; F-Measure: 0.7376263970196913 | |
| Name Finder | CONLL 2002 Spanish Location esp.testb | jkosin | Precision: 0.8301886792452831; Recall: 0.5682656826568265; F-Measure: 0.6746987951807228 | Precision: 0.8301886792452831; Recall: 0.5682656826568265; F-Measure: 0.6746987951807228 | |
| Name Finder | CONLL 2002 Spanish Misc esp.testa | jkosin | Precision: 0.6492890995260664; Recall: 0.30786516853932583; F-Measure: 0.4176829268292683 | Precision: 0.6492890995260664; Recall: 0.30786516853932583; F-Measure: 0.4176829268292683 | |
| Name Finder | CONLL 2002 Spanish Misc esp.testb | jkosin | Precision: 0.686046511627907; Recall: 0.3480825958702065; F-Measure: 0.461839530332681 | Precision: 0.686046511627907; Recall: 0.3480825958702065; F-Measure: 0.461839530332681 | |
| Name Finder | CONLL 2002 Spanish Combined esp.testa | jkosin | Precision: 0.7005423249233671; Recall: 0.6828315329809239; F-Measure: 0.6915735567970205 | Precision: 0.7047866069323273; Recall: 0.6869685129855205; F-Measure: 0.6957635009310986 | 1000 iterations, OPENNLP-417 |
| Name Finder | CONLL 2002 Spanish Combined esp.testb | jkosin | Precision: 0.756635931824532; Recall: 0.7611017425519955; F-Measure: 0.7588622670589884 | Precision: 0.7588711930706902; Recall: 0.7633501967397415; F-Measure: 0.7611041053664006 | 1000 iterations, OPENNLP-417 |
| Name Finder | CONLL 2003 English Person eng.testa | jkosin | Precision: 0.9523195876288659; Recall: 0.8023887079261672; F-Measure: 0.8709487330583382 | Precision: 0.9523195876288659; Recall: 0.8023887079261672; F-Measure: 0.8709487330583382 | |
| Name Finder | CONLL 2003 English Person eng.testb | jkosin | Precision: 0.9391727493917275; Recall: 0.7161410018552876; F-Measure: 0.8126315789473685 | Precision: 0.9391727493917275; Recall: 0.7161410018552876; F-Measure: 0.8126315789473685 | |
| Name Finder | CONLL 2003 English Organization eng.testa | jkosin | Precision: 0.8768046198267565; Recall: 0.6793437733035048; F-Measure: 0.7655462184873949 | Precision: 0.8768046198267565; Recall: 0.6793437733035048; F-Measure: 0.7655462184873949 | |
| Name Finder | CONLL 2003 English Organization eng.testb | jkosin | Precision: 0.8435980551053485; Recall: 0.6267308850090307; F-Measure: 0.7191709844559586 | Precision: 0.8435980551053485; Recall: 0.6267308850090307; F-Measure: 0.7191709844559586 | |
| Name Finder | CONLL 2003 English Location eng.testa | jkosin | Precision: 0.9361421988150099; Recall: 0.7740881872618399; F-Measure: 0.8474374255065554 | Precision: 0.9361421988150099; Recall: 0.7740881872618399; F-Measure: 0.8474374255065554 | |
| Name Finder | CONLL 2003 English Location eng.testb | jkosin | Precision: 0.9206349206349206; Recall: 0.7302158273381295; F-Measure: 0.8144433299899699 | Precision: 0.9206349206349206; Recall: 0.7302158273381295; F-Measure: 0.8144433299899699 | |
| Name Finder | CONLL 2003 English Misc eng.testa | jkosin | Precision: 0.9027982326951399; Recall: 0.6648590021691974; F-Measure: 0.7657713928794503 | Precision: 0.9027982326951399; Recall: 0.6648590021691974; F-Measure: 0.7657713928794503 | |
| Name Finder | CONLL 2003 English Misc eng.testb | jkosin | Precision: 0.8592436974789915; Recall: 0.5826210826210826; F-Measure: 0.6943972835314092 | Precision: 0.8592436974789915; Recall: 0.5826210826210826; F-Measure: 0.6943972835314092 | |
| Name Finder | CONLL 2003 English Combined eng.testa | jkosin | Precision: 0.861812521618817; Recall: 0.8386065297879501; F-Measure: 0.8500511770726714 | Precision: 0.8640608785887236; Recall: 0.8407943453382699; F-Measure: 0.8522688502217672 | 1000 iterations, OPENNLP-417 |
| Name Finder | CONLL 2003 English Combined eng.testb | jkosin | Precision: 0.8041311831853597; Recall: 0.7857648725212465; F-Measure: 0.7948419450165667 | Precision: 0.8064866823699945; Recall: 0.7880665722379604; F-Measure: 0.7971702337243664 | 1000 iterations, OPENNLP-417 |
| Name Finder | CONLL 2003 German Person deu.testa | jkosin | Precision: 0.9132653061224489; Recall: 0.25553176302640973; F-Measure: 0.3993307306190742 | Precision: 0.9132653061224489; Recall: 0.25553176302640973; F-Measure: 0.3993307306190742 | |
| Name Finder | CONLL 2003 German Person deu.testb | jkosin | Precision: 0.8732106339468303; Recall: 0.3573221757322176; F-Measure: 0.507125890736342 | Precision: 0.8732106339468303; Recall: 0.3573221757322176; F-Measure: 0.507125890736342 | |
| Name Finder | CONLL 2003 German Organization deu.testa | jkosin | Precision: 0.8407224958949097; Recall: 0.4125705076551168; F-Measure: 0.5535135135135135 | Precision: 0.8407224958949097; Recall: 0.4125705076551168; F-Measure: 0.5535135135135135 | |
| Name Finder | CONLL 2003 German Organization deu.testb | jkosin | Precision: 0.8014705882352942; Recall: 0.4230271668822768; F-Measure: 0.5537679932260795 | Precision: 0.8014705882352942; Recall: 0.4230271668822768; F-Measure: 0.5537679932260795 | |
| Name Finder | CONLL 2003 German Location deu.testa | jkosin | Precision: 0.7816326530612245; Recall: 0.32430143945808637; F-Measure: 0.45840813883901854 | Precision: 0.7816326530612245; Recall: 0.32430143945808637; F-Measure: 0.45840813883901854 | |
| Name Finder | CONLL 2003 German Location deu.testb | jkosin | Precision: 0.8033826638477801; Recall: 0.3671497584541063; F-Measure: 0.5039787798408487 | Precision: 0.8033826638477801; Recall: 0.3671497584541063; F-Measure: 0.5039787798408487 | |
| Name Finder | CONLL 2003 German Misc deu.testa | jkosin | Precision: 0.7055555555555556; Recall: 0.12574257425742574; F-Measure: 0.21344537815126052 | Precision: 0.7055555555555556; Recall: 0.12574257425742574; F-Measure: 0.21344537815126052 | |
| Name Finder | CONLL 2003 German Misc deu.testb | jkosin | Precision: 0.6601307189542484; Recall: 0.15074626865671642; F-Measure: 0.2454434993924666 | Precision: 0.6601307189542484; Recall: 0.15074626865671642; F-Measure: 0.2454434993924666 | |
| Name Finder | CONLL 2003 German Combined deu.testa | jkosin | Precision: 0.7718859429714857; Recall: 0.319263397475688; F-Measure: 0.4516978922716628 | Precision: 0.7783891945972986; Recall: 0.32195323815435545; F-Measure: 0.45550351288056207 | OPENNLP-417 |
| Name Finder | CONLL 2003 German Combined deu.testb | jkosin | Precision: 0.7467566165023353; Recall: 0.3917778382793357; F-Measure: 0.5139285714285715 | Precision: 0.749351323300467; Recall: 0.3931391233324258; F-Measure: 0.5157142857142857 | OPENNLP-417 |
| POS Tagger | CONLL 2006 Danish | Jörn / ? | Accuracy: 0.9511278195488722 | Accuracy: 0.9512987012987013 | Jörn: Same result as other tester |
| POS Tagger | CONLL 2006 Dutch | Jörn | Accuracy: 0.9324977618621307 | Accuracy: 0.9324977618621307 | |
| POS Tagger | CONLL 2006 Portuguese | Jörn / ? | Accuracy: 0.9659110277825124 | Accuracy: 0.9659110277825124 | Jörn: Same result as other tester |
| POS Tagger | CONLL 2006 Swedish | Jörn | Accuracy: 0.9275106082036775 | Accuracy: 0.9275106082036775 | |
| Chunker | CONLL 2000 | William | Precision: 0.9257575757575758; Recall: 0.9221868187154117; F-Measure: 0.9239687473746113 | Precision: 0.9257575757575758; Recall: 0.9221868187154117; F-Measure: 0.9239687473746113 | |
| Sentence Detector | Arvores Deitadas (Floresta Virgem) (10-fold cross-validation) | William | | Precision: 0.9891491491491492; Recall: 0.9894066523820013; F-Measure: 0.9892778840089301 | PERCEPTRON Cutoff 0. 1.5.2 works poorly because we didn't have configurable EOS |
| Tokenizer | Arvores Deitadas (Floresta Virgem) (10-fold cross-validation) | William | | Precision: 0.9995231988260895; Recall: 0.9994542652270997; F-Measure: 0.9994887308380267 | PERCEPTRON Cutoff 0, alphaNumOpt |
| Chunker | Arvores Deitadas (10-fold cross-validation) | William | Precision: 0.9404684925220583; Recall: 0.9374181341871635; F-Measure: 0.9389408359191154 | Precision: 0.9562405864042575; Recall: 0.9582419351592844; F-Measure: 0.9572402147035765 | OPENNLP-541, OPENNLP-423 |

...

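| Analysis Engine | Tester | Passed | Comment |
| --- | --- | --- | --- |
| Sentence Detector | | | |
| Sentence Detector Trainer | | | |
| Tokenizer ME | | | |
| Tokenizer Trainer | | | |
| Name Finder | | | |
| Name Finder Trainer | | | |
| Chunker | | | |
| Chunker Trainer | | | |
| POS Tagger | | | |
| POS Tagger Trainer | | | |
| Parser | | | |
| createPear.sh | Jörn | yes | |
| Sample PEAR | Jörn | yes | |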

Distribution Review

Please ensure that the files listed below are included in the distributions
and are in good condition.

| Package | File or Test | Tester | Passed | Comment |
| --- | --- | --- | --- | --- |
| Binary | LICENSE | Jörn | Yes | AL 2.0 and BSD for JWNL |
| Binary | NOTICE | Jörn | Yes | standard notice, dates are correct. JWNL is mentioned |
| Binary | README | Jörn | Yes | File was reviewed on the dev list. |
| Binary | RELEASE_NOTES.html | Jörn | Yes | issue list is generated correctly |
| Binary | Test signatures: .md5, .sha1, .asc | Jörn | Yes rc4 | tested for rc3 |
| Binary | JIRA issue list created | William | No Yes | Minor issue: the project.version was not filled. The list is empty |
| Binary | Contains maxent, tools, uima and jwnl jars | Jörn | Yes | |
| Source | LICENSE | Jörn | Yes | standard AL 2.0 file |
| Source | NOTICE | Jörn | Yes | standard notice, dates are correct |
| Source | Test signatures: .md5, .sha1, .asc | Jörn | rc1 | tested for rc3 |
| Source | Can build from source? | Jörn | Yes | Test should be done without jwnl and opennlp in local m2 repo. Test was done on Ubuntu 10.10. |
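
The signature checks listed above (.md5, .sha1, .asc) could be carried out along the following lines. This is only a sketch under stated assumptions: the `verify_digest` helper and the `dist.tar.gz` file name are hypothetical, the digest tools are the GNU coreutils `md5sum` and `sha1sum`, and the sidecar file is assumed to carry the hex digest as its first whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical helper (not part of the release tooling): compare a file's digest
# with the value stored in a sidecar file such as FILE.md5 or FILE.sha1.
# Assumption: the first whitespace-separated field of the sidecar is the hex digest.
verify_digest() {
    tool="$1"      # digest command, e.g. md5sum or sha1sum (GNU coreutils names)
    file="$2"      # file to check
    sidecar="$3"   # file holding the expected digest
    # Normalize both digests to lowercase hex before comparing.
    expected=$(awk '{print $1; exit}' "$sidecar" | tr 'A-F' 'a-f')
    actual=$("$tool" "$file" | awk '{print $1}' | tr 'A-F' 'a-f')
    # Succeed only if a digest was found and it matches the computed one.
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example with a hypothetical archive name:
#   verify_digest md5sum  dist.tar.gz dist.tar.gz.md5
#   verify_digest sha1sum dist.tar.gz dist.tar.gz.sha1
#   gpg --verify dist.tar.gz.asc dist.tar.gz   # needs the signer's public key
```

The function returns a non-zero status on a mismatch, so it can be chained with `&&` or used in an `if` when checking several artifacts in a row.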

Notes about testing

Compatibility tests

The following commands can be used to reproduce the compatibility tests with the Leipzig corpus.

# Corpus preparation: the following command will create documents from the corpus. Sed is used to remove the language prefix

sh bin/opennlp DoccatConverter leipzig -data ../eng_news_2010_300K-text/eng_news_2010_300K-sentences.txt -encoding UTF-8 -lang en | sed -E 's/^en[[:space:]]//g' > ../out-tokenized-documents.test

# Corpus preparation: this forces the detokenization of the documents

sh bin/opennlp SentenceDetectorConverter namefinder -data ../out-tokenized-documents.test -encoding UTF-8 -detokenizer trunk/opennlp-tools/lang/en/tokenizer/en-detokenizer.xml > ../out-documents.test

# Now the actual tests. Execute them for the previous release and for the current RC, then compare the output using diff:

time sh bin/opennlp SentenceDetector ../models/en-sent.bin < ../out-documents.test > ../out-sentences_1.5.2.test

time sh bin/opennlp TokenizerME ../models/en-token.bin < ../out-sentences_1.5.2.test > ../out-toks_1.5.2.test

time sh bin/opennlp TokenNameFinder ../models/en-ner-person.bin < ../out-toks_1.5.2.test > ../out-ner_1.5.2.test

time sh bin/opennlp POSTagger ../models/en-pos-maxent.bin < ../out-toks_1.5.2.test > ../out-pos_maxent_1.5.2.test

time sh bin/opennlp POSTagger ../models/en-pos-perceptron.bin < ../out-toks_1.5.2.test > ../out-pos_pers_1.5.2.test

time sh bin/opennlp ChunkerME ../models/en-chunker.bin < ../out-pos_pers_1.5.2.test > ../out-chk_1.5.2.test

time sh bin/opennlp Parser ../models/en-parser-chunking.bin < ../out-toks_1.5.2.test > ../out-parse_1.5.2.test
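
The comparison step mentioned in the comments above could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used: the `compare_outputs` function is hypothetical, and it assumes the release candidate run wrote its files with a `_1.5.3.test` suffix next to the `_1.5.2.test` files, which the commands above do not show. It uses `cmp -s` for the identical-or-not check; `diff` can then be run by hand on any pair reported as different.

```shell
#!/bin/sh
# Hypothetical helper: compare each 1.5.2 output file with its 1.5.3 counterpart.
# Assumption: the RC run produced files named like the 1.5.2 ones, but with a
# _1.5.3.test suffix, in the same directory.
compare_outputs() {
    dir="$1"
    status=0
    for old in "$dir"/out-*_1.5.2.test; do
        [ -e "$old" ] || continue                 # glob matched nothing
        new="${old%_1.5.2.test}_1.5.3.test"       # derive the RC file name
        if [ ! -e "$new" ]; then
            echo "MISSING: $new"
            status=1
        elif cmp -s "$old" "$new"; then
            echo "IDENTICAL: $(basename "$old")"
        else
            echo "DIFFERS: $(basename "$old")"    # inspect with: diff "$old" "$new"
            status=1
        fi
    done
    return "$status"
}

# Example: the commands above write their output one directory up.
#   compare_outputs ..
```

The function returns a non-zero status if any pair differs or a counterpart file is missing, so it can gate a release checklist step.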