...

Code Block
public interface LuceneIndexFactory {
  /**
   * Configure the way objects are converted to Lucene documents for this Lucene index.
   * @param luceneSerializer A callback which converts a region value to a
   * Lucene document or documents to be stored in the index.
   * @return this factory, so that calls can be chained
   */
  LuceneIndexFactory setLuceneSerializer(LuceneSerializer luceneSerializer);
}

/**
 * An interface for writing the fields of an object into Lucene documents.
 * The region key will be added as a field to the returned documents.
 */
public interface LuceneSerializer {
  /**
   * @param index lucene index
   * @param value user object to be serialized into index
   * @return the documents to be stored in the index
   */
  Collection<Document> toDocuments(LuceneIndex index, Object value);
}

XML Configuration 

 

<cache
    xmlns="http://geode.apache.org/schema/cache"
    xmlns:lucene="http://geode.apache.org/schema/lucene"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://geode.apache.org/schema/cache
        http://geode.apache.org/schema/cache/cache-1.0.xsd
        http://geode.apache.org/schema/lucene
        http://geode.apache.org/schema/lucene/lucene-1.0.xsd"
    version="1.0">

    <region name="region" refid="PARTITION">
        <lucene:index name="index">
           <lucene:field name="a" analyzer="org.apache.lucene.analysis.core.KeywordAnalyzer"/>
           <lucene:field name="b" analyzer="org.apache.lucene.analysis.core.SimpleAnalyzer"/>
           <lucene:field name="c" analyzer="org.apache.lucene.analysis.standard.ClassicAnalyzer"/>
           <lucene:serializer>
             <class-name>org.apache.geode.cache.lucene.FlatFormatSerializer</class-name>
           </lucene:serializer>
        </lucene:index>
    </region>
</cache>

If no serializer is specified, the index uses the default HeterogeneousLuceneSerializer.

We will also provide a built-in implementation of LuceneSerializer called FlatFormatSerializer. With this serializer, users can specify nested fields using the syntax fieldnameAtLevel1.fieldnameAtLevel2 for both indexing and querying.

For example, in the following data model, the Customer object contains a collection of Person objects (contacts) and an array of Page objects (myHomePages). Each Person object also contains a Page object (homepage).

Code Block
public class Customer implements Serializable {
  private String name;
  private Collection<String> phoneNumbers;
  private Collection<Person> contacts; // search nested object
  private Page[] myHomePages;
  // ......
}
public class Person implements Serializable {
  private String name;
  private String email;
  private int revenue;
  private String address;
  private String[] phoneNumbers;
  private Page homepage;
  // ......
}
public class Page implements Serializable {
  private int id; // search integer in int format
  private String title;
  private String content;
  // ......
}

 

The following example demonstrates how to index the nested fields contacts.name, contacts.email, contacts.address, and contacts.homepage.title.

Note: each segment is a field name, not a field type, because the Customer class could have more than one field of type Person, e.g. contacts and deliveryman. The field name is used to identify the parent field.

...

Code Block
// Get LuceneService
LuceneService luceneService = LuceneServiceProvider.get(cache);

// Create the index on fields, some of which are fields in nested objects:
luceneService.createIndexFactory()
      .setLuceneSerializer(new FlatFormatSerializer()) /* an out-of-box LuceneSerializer implementation */
      .addField("name").addField("contacts.name").addField("contacts.email").addField("contacts.address").addField("contacts.homepage.title")
      .create("customerIndex", "Customer");

// Now create the region (shortcut is a RegionShortcut such as RegionShortcut.PARTITION)
Region customerRegion = ((Cache)cache).createRegionFactory(shortcut).create("Customer");


gfsh command line:


Code Block
gfsh create lucene index --name=customerIndex --region=/Customer --field=name,contacts.name,contacts.email,contacts.address,contacts.homepage.title --serializer=org.apache.geode.cache.lucene.FlatFormatSerializer

 

The syntax for querying a nested field is the same as for a top-level field, but with the parent field name added as a qualifier, such as "contacts.name:tzhou11*". The qualifier identifies which "name" field is meant when more than one "name" field exists at different levels of the object hierarchy.

Code Block
LuceneQuery query = luceneService.createLuceneQueryFactory().create("customerIndex", "Customer", "contacts.name:tzhou11*", "name");
 
PageableLuceneQueryResults<K,Object> results = query.findPages();

Out-Of-Box implementation

We will provide an out-of-box implementation of LuceneSerializer: FlatFormatSerializer.

...

For example, the FlatFormatSerializer will convert a Customer object into a document as:

(name:John11), (contacts.name:tzhou11), (contacts.email:tzhou11@gmail.com), (contacts.address:15220 Wall St), (contacts.homepage.id:11), (contacts.homepage.title:Mr. tzhou11), (contacts.homepage.content:xxx)
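
To make the flattening concrete, the sketch below mimics it in plain Java using reflection. It is not the FlatFormatSerializer implementation: it builds a map of flat field names to values instead of Lucene Document objects, and the names FlatFormatSketch, flatten, resolve, readField, and sampleCustomer are hypothetical, chosen only for illustration. The data model is a trimmed-down copy of the Customer, Person, and Page classes above. Each segment of a dotted path is looked up as a field name on the parent object, and collections or arrays are expanded so that every element contributes a value.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the FlatFormatSerializer source.
public class FlatFormatSketch {

  // Trimmed-down versions of the data model in this proposal.
  public static class Page {
    int id;
    String title;
    Page(int id, String title) { this.id = id; this.title = title; }
  }

  public static class Person {
    String name;
    String email;
    Page homepage;
    Person(String name, String email, Page homepage) {
      this.name = name; this.email = email; this.homepage = homepage;
    }
  }

  public static class Customer {
    String name;
    Collection<Person> contacts;
    Customer(String name, Collection<Person> contacts) {
      this.name = name; this.contacts = contacts;
    }
  }

  /** Maps each indexed path (e.g. "contacts.homepage.title") to the values found under it. */
  public static Map<String, List<Object>> flatten(Object value, String... indexedFields) {
    Map<String, List<Object>> flat = new LinkedHashMap<>();
    for (String path : indexedFields) {
      flat.put(path, resolve(value, path));
    }
    return flat;
  }

  /** Follows the dotted path one field name at a time, expanding collections and arrays. */
  public static List<Object> resolve(Object root, String path) {
    List<Object> current = new ArrayList<>();
    current.add(root);
    for (String segment : path.split("\\.")) {
      List<Object> next = new ArrayList<>();
      for (Object parent : current) {
        if (parent == null) {
          continue; // skip null elements inside collections
        }
        Object value = readField(parent, segment);
        if (value instanceof Collection) {
          next.addAll((Collection<?>) value);
        } else if (value instanceof Object[]) {
          next.addAll(Arrays.asList((Object[]) value));
        } else if (value != null) {
          next.add(value);
        }
      }
      current = next;
    }
    return current;
  }

  // Reads a field declared directly on the parent's class (superclasses are not searched).
  private static Object readField(Object parent, String fieldName) {
    try {
      Field field = parent.getClass().getDeclaredField(fieldName);
      field.setAccessible(true);
      return field.get(parent);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException(
          "No readable field '" + fieldName + "' on " + parent.getClass().getName(), e);
    }
  }

  public static Customer sampleCustomer() {
    Person contact = new Person("tzhou11", "tzhou11@gmail.com", new Page(11, "Mr. tzhou11"));
    return new Customer("John11", Arrays.asList(contact));
  }

  public static void main(String[] args) {
    System.out.println(flatten(sampleCustomer(),
        "name", "contacts.name", "contacts.email", "contacts.homepage.title"));
  }
}
```

Because the lookup is keyed on field names, a second Person-typed field such as deliveryman would produce its own deliveryman.name entry without colliding with contacts.name.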

Risks and Mitigations

...