Versions Compared

Comment: Almost finished the example. A few more commands to go.

...

Code Block (language: bash)
# Extract the model to get the tag dictionary
 
$ unzip pos-pt_xmldic.model -d pos-pt_xmldic

# Take a look at the file size
$ ls -alh pos-pt_xmldic
total 3464
drwxr-xr-x 5 colen staff 170B 11 Jul 23:23 .
drwxr-xr-x 16 colen staff 544B 11 Jul 23:23 ..
-rw-r--r-- 1 colen staff 306B 11 Jul 21:03 manifest.properties
-rw-r--r-- 1 colen staff 1,1M 11 Jul 21:03 pos.model
-rw-r--r-- 1 colen staff 554K 11 Jul 21:03 tags.tagdict
 
# Convert the tags.tagdict to a table-like FSA Dictionary

...

ls -lah pos-pt_xmldic.model
-rw-r--r-- 1 colen staff 839K 8 Jul 01:24 pos-pt_nodic.model

## evaluate
bin/opennlp POSTaggerEvaluator.conllx -model pos-pt_xmldic.model -data portuguese_bosque_test.conll -encoding UTF-8

Accuracy: 0.9676154763933867

 

# convert TAGDICT

...

and .info file to be consumed by MorfologikDictionaryBuilder
$ bin/morfologik-addon XMLDictionaryToTable -inputFile pos-pt_xmldic/tags.tagdict -outputFile pt-morfologik.txt -separator

...

 + -encoder prefix -encoding UTF-8

...


Created dictionary: pt-morfologik.txt
Created metadata: pt-morfologik.txt
 
# Create the FSA Dictionary
$ bin/morfologik-addon MorfologikDictionaryBuilder -inputFile pt-morfologik.txt -encoding UTF-8

 

# TODO format this

ls -lah pt-morfologik.dict
-rw-r--r-- 1 colen staff 268K 8 Jul 10:37 pt-morfologik.dict
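
The listings above compare file sizes by eye (554K for tags.tagdict, 268K for pt-morfologik.dict). A minimal sketch of doing the same comparison in the shell follows. The function name size_ratio is hypothetical and is not part of OpenNLP or the morfologik-addon; it relies only on standard wc and tr.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNLP or Morfologik): print the size of
# the second file as a percentage of the size of the first file.
size_ratio() {
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "usage: size_ratio ORIGINAL_FILE CONVERTED_FILE" >&2
        return 1
    fi
    # Reading from stdin makes wc print only the byte count, without the
    # file name; tr strips the padding spaces that some wc versions add.
    a=$(wc -c < "$1" | tr -d ' ')
    b=$(wc -c < "$2" | tr -d ' ')
    if [ "$a" -eq 0 ]; then
        echo "cannot compare against an empty file: $1" >&2
        return 1
    fi
    # Integer arithmetic, so the percentage is rounded down.
    echo "$(( b * 100 / a ))%"
}

# Example call, assuming the paths from the listings above exist:
# size_ratio pos-pt_xmldic/tags.tagdict pt-morfologik.dict
```

The result is only a rough indicator, since it compares bytes on disk and says nothing about lookup speed or memory use.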

...