...

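```java
import java.io.File;
import java.nio.file.Files;
import java.util.Date;
import org.apache.cxf.jaxrs.ext.search.tika.LuceneDocumentMetadata;
import org.apache.cxf.jaxrs.ext.search.tika.TikaLuceneContentExtractor;
import org.apache.lucene.document.Document;
import org.apache.tika.parser.pdf.PDFParser;

// Extracted text goes to the "contents" field; the "modified" metadata is typed as a Date
LuceneDocumentMetadata documentMetadata = new LuceneDocumentMetadata("contents").withField("modified", Date.class);
// The boolean argument enables media type validation of the input
TikaLuceneContentExtractor extractor = new TikaLuceneContentExtractor(new PDFParser(), true);
Document document = extractor.extract(Files.newInputStream(new File("testPDF.pdf").toPath()), documentMetadata);
```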

At this point, the document is ready to be analyzed and indexed. TikaLuceneContentExtractor uses LuceneDocumentMetadata to create properly typed document fields. It currently supports DoubleField, FloatField, LongField, IntField, TextField (for the content) and StringField (also used to store dates).
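To make the type mapping more concrete, here is a minimal, dependency-free Java sketch of the idea: each declared Java type selects a Lucene field type, and dates are converted to strings before being stored in a StringField. Everything in it is hypothetical and is not CXF's implementation. The class name FieldTypeMapping, the methods fieldTypeFor and dateToString, the String-to-StringField entry and the yyyyMMddHHmmssSSS (UTC) date format are all assumptions chosen for illustration. The extracted content itself goes to a TextField under the field name passed to the LuceneDocumentMetadata constructor ("contents" in the example), so it is left out of this per-type lookup.

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, dependency-free model of the type-to-field mapping; not CXF's actual code.
class FieldTypeMapping {
    private static final Map<Class<?>, String> FIELD_TYPES = new HashMap<>();

    static {
        FIELD_TYPES.put(Double.class, "DoubleField");
        FIELD_TYPES.put(Float.class, "FloatField");
        FIELD_TYPES.put(Long.class, "LongField");
        FIELD_TYPES.put(Integer.class, "IntField");
        // Assumption: plain strings are stored as StringField as well
        FIELD_TYPES.put(String.class, "StringField");
        // Dates are stored as strings, so they also map to StringField
        FIELD_TYPES.put(Date.class, "StringField");
    }

    // Assumed date format: fixed-width, zero-padded, UTC, millisecond precision
    private static final DateTimeFormatter DATE_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    // Returns the Lucene field type name for a declared Java type
    static String fieldTypeFor(Class<?> type) {
        String fieldType = FIELD_TYPES.get(type);
        if (fieldType == null) {
            throw new IllegalArgumentException("Unsupported field type: " + type.getName());
        }
        return fieldType;
    }

    // Converts a Date into the string value that would be stored in a StringField
    static String dateToString(Date date) {
        return DATE_FORMAT.format(date.toInstant());
    }

    public static void main(String[] args) {
        System.out.println(fieldTypeFor(Date.class));   // StringField
        System.out.println(dateToString(new Date(0L))); // 19700101000000000
    }
}
```

A fixed-width, zero-padded date format is used in this sketch so that the lexicographic order of the stored strings matches chronological order, which matters for term-range queries over a string field.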

To demonstrate the full power of the content extraction and search capabilities in CXF 3.0.2, the demo project 'jax_rs_search' has been developed and is distributed in the sample bundle. The project can also be found in the official Apache CXF GitHub repository.
