...
# Extract the model to get the tag dictionary
$ unzip pos-pt_xmldic.model -d pos-pt_xmldic
# Take a look at the file size
$ ls -alh pos-pt_xmldic
total 3464
drwxr-xr-x 5 colen staff 170B 11 Jul 23:23 .
drwxr-xr-x 16 colen staff 544B 11 Jul 23:23 ..
-rw-r--r-- 1 colen staff 306B 11 Jul 21:03 manifest.properties
-rw-r--r-- 1 colen staff 1,1M 11 Jul 21:03 pos.model
-rw-r--r-- 1 colen staff 554K 11 Jul 21:03 tags.tagdict
# Convert the tags.tagdict to a table like FSA Dictionary
...
ls -lah pos-pt_xmldic.model
-rw-r--r-- 1 colen staff 839K 8 Jul 01:24 pos-pt_xmldic.model
## evaluate
bin/opennlp POSTaggerEvaluator.conllx -model pos-pt_xmldic.model -data portuguese_bosque_test.conll -encoding UTF-8
Accuracy: 0.9676154763933867
# convert TAGDICT
...
# and .info file to be consumed by MorfologikDictionaryBuilder
$ bin/morfologik-addon XMLDictionaryToTable -inputFile pos-pt_xmldic/tags.tagdict -outputFile pt-morfologik.txt -separator
...
+ -encoder prefix -encoding UTF-8
...
Created dictionary: pt-morfologik.txt
Created metadata: pt-morfologik.txt
# Create the FSA Dictionary
$ bin/morfologik-addon MorfologikDictionaryBuilder -inputFile pt-morfologik.txt -encoding UTF-8
# TODO format this
ls -lah pt-morfologik.dict
-rw-r--r-- 1 colen staff 268K 8 Jul 10:37 pt-morfologik.dict
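To compare the two dictionary formats without reading the sizes off the listings by hand, a small helper along these lines may be useful. This is only a sketch: `size_ratio_percent` is a hypothetical name and is not part of OpenNLP or the morfologik-addon. It prints the first file's size as an integer percentage of the second's.

```shell
# Hypothetical helper (not part of OpenNLP or morfologik-addon):
# print the size of file $1 as an integer percentage of the size of file $2.
size_ratio_percent() {
  a=$(wc -c < "$1")   # size of the first file in bytes
  b=$(wc -c < "$2")   # size of the second file in bytes
  echo $(( a * 100 / b ))
}
```

It could be called as `size_ratio_percent pt-morfologik.dict pos-pt_xmldic/tags.tagdict`. With the sizes listed above (268K and 554K) the result would be roughly 48, meaning the FSA dictionary takes about half the space of the XML tag dictionary.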
...