You can use the API to create a POSDictionary for use with the POSTagger tool:
- Create a POSDictionary object, optionally setting the case-sensitivity flag, which is true by default;
- Populate the POSDictionary object using the addTags(String word, String... tags) method;
- Serialize the dictionary using the serialize(OutputStream out) method.
Notes:
- The addTags method is package-private, so you must call it from a class declared in the same package (opennlp.tools.postag).
- Close the OutputStream after using the serialize method.
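The second note can be handled with try-with-resources, which closes the stream even when serialization throws. The following is a minimal sketch of that pattern using only the JDK. The `DictionaryWriter` interface and the `writeTo` helper are hypothetical names introduced for illustration; they stand in for any object that exposes a `serialize(OutputStream)` method and are not part of the library.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseAfterSerialize {

    /** Hypothetical stand-in for an object with a serialize(OutputStream) method. */
    interface DictionaryWriter {
        void serialize(OutputStream out) throws IOException;
    }

    /** Opens the file, serializes into it, and closes the stream even on failure. */
    static void writeTo(Path file, DictionaryWriter source) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            source.serialize(out);
        } // out.close() is called automatically here
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("posDict", ".xml");
        // Placeholder payload; a real dictionary would write its own XML here.
        writeTo(file, out -> out.write("<dictionary/>".getBytes(StandardCharsets.UTF_8)));
        System.out.println("bytes written: " + Files.size(file));
    }
}
```

With the real class, the call might look like `writeTo(path, dictionary::serialize)`, assuming `serialize` declares no checked exception other than `IOException`.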
Code example:
```java
// the method POSDictionary.addTags is package-private, so we need to use the same package.
package opennlp.tools.postag;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Scanner;

public class MyPOSDictionaryWriter {

    public static void main(String[] args) throws IOException {
        // first we create a POSDictionary
        boolean isCaseSensitive = true;
        POSDictionary dictionary = new POSDictionary(isCaseSensitive);

        // suppose that you have a file with words and tags as follows:
        // word0 tag0 tag1
        // word1 tag2
        // word3 tag3 tag4 tag5

        // we open the file and read it line by line...
        Scanner scanner = new Scanner(new FileInputStream("fileName"), "UTF-8");
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            String[] data = line.split("\\s+");
            String word = data[0];
            String[] tags = Arrays.copyOfRange(data, 1, data.length);
            dictionary.addTags(word, tags);
        }
        scanner.close();

        // now we can serialize the dictionary
        OutputStream out = new FileOutputStream("posDict.xml");
        dictionary.serialize(out);
        out.close();
    }
}
```
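The line-splitting step in the example above assumes every line is well formed. In Java, `"  word tag".split("\\s+")` returns an empty first element when the line starts with whitespace, and a blank line yields a single empty string, so such input would produce an empty word. The sketch below shows one way to guard against that before calling addTags. The `TagLineParser` class and its `parse` method are hypothetical names, and skipping words that have no tags is an assumption of this sketch, not documented library behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagLineParser {

    /** Parses "word tag1 tag2 ..." lines into a word -> tags map, in input order. */
    static Map<String, String[]> parse(List<String> lines) {
        Map<String, String[]> entries = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim(); // drop leading and trailing whitespace
            if (line.isEmpty()) {
                continue; // blank line: nothing to add
            }
            String[] data = line.split("\\s+");
            if (data.length < 2) {
                continue; // word without tags: skipped in this sketch
            }
            entries.put(data[0], Arrays.copyOfRange(data, 1, data.length));
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String[]> entries =
            parse(Arrays.asList("word0 tag0 tag1", "", "  word1 tag2  ", "orphan"));
        for (Map.Entry<String, String[]> e : entries.entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
        // prints:
        // word0 -> [tag0, tag1]
        // word1 -> [tag2]
    }
}
```

Each resulting entry could then be passed to `dictionary.addTags(word, tags)` from a class in the opennlp.tools.postag package.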