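// Describe the Lucene document: content goes into the "contents" field, "modified" is typed as a Date
LuceneDocumentMetadata documentMetadata = new LuceneDocumentMetadata("contents").withField("modified", Date.class);
TikaLuceneContentExtractor extractor = new TikaLuceneContentExtractor(new PDFParser(), true);
// Extract the content and metadata of testPDF.pdf into a Lucene Document
Document document = extractor.extract(Files.newInputStream(new File("testPDF.pdf").toPath()), documentMetadata);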

At this point, the document is ready to be analyzed and indexed. The TikaLuceneContentExtractor uses LuceneDocumentMetadata to create properly typed document fields. It currently supports DoubleField, FloatField, LongField, IntField, TextField (for content) and StringField (also used to store dates).
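
To make the idea of "properly typed document fields" concrete, here is a minimal, library-free sketch of how a declared Java type might be matched to one of the Lucene field types listed above. It is an illustration only, not CXF's implementation: the class FieldTypeSketch and the method fieldFor are hypothetical names, the field types are represented by their names as plain strings, and the StringField fallback for undeclared or unknown types is an assumption.

```java
import java.util.Date;

// Hypothetical illustration only; this class is not part of CXF, Tika or Lucene.
final class FieldTypeSketch {

    // Returns the name of the Lucene field type that would hold a value
    // of the given Java type, following the list of supported fields above.
    static String fieldFor(Class<?> type) {
        if (type != null) {
            if (Double.class.isAssignableFrom(type)) {
                return "DoubleField";
            }
            if (Float.class.isAssignableFrom(type)) {
                return "FloatField";
            }
            if (Long.class.isAssignableFrom(type)) {
                return "LongField";
            }
            if (Integer.class.isAssignableFrom(type)) {
                return "IntField";
            }
            if (Date.class.isAssignableFrom(type)) {
                // Dates are stored as strings, as stated in the text above.
                return "StringField";
            }
        }
        // Assumption: anything without a declared or recognised type
        // is treated as a plain string.
        return "StringField";
    }

    public static void main(String[] args) {
        // Mirrors withField("modified", Date.class) from the example above.
        System.out.println("modified -> " + fieldFor(Date.class));
        System.out.println("pages    -> " + fieldFor(Integer.class));
    }
}
```

In this sketch, declaring "modified" as Date.class selects a string-backed field, which matches the note that StringField is also used to store dates.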

To demonstrate the full power of the CXF 3.0.2 content extraction and search capabilities, the demo project 'jax_rs_search' has been developed and is distributed in the sample bundle. The project can be found in the official Apache CXF GitHub repository.