Please refer to the Geode 1.2.0 documentation for the final implementation.

Requirements


  1. Allow users to create Lucene indexes on data stored in Geode.
  2. Update the indexes asynchronously to avoid impacting write latency.
  3. Allow users to perform text (Lucene) searches on Geode data using the Lucene index. Results from text searches may be stale due to asynchronous index updates.
  4. Provide high availability of indexes using Geode's HA capabilities.
  5. Scalability
  6. Performance comparable to RAMFSDirectory
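
Requirements 2 and 3 trade index freshness for write latency. The sketch below is a toy model of that trade-off and not Geode's implementation: the class `AsyncIndexSketch`, its in-memory maps, and the explicit `flush()` call (standing in for a background index updater) are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical toy model of deferred index maintenance; handles inserts only. */
class AsyncIndexSketch {
  private final Map<String, String> region = new HashMap<>();          // key -> text value
  private final Map<String, Set<String>> termToKeys = new HashMap<>(); // inverted index
  private final Queue<String> pendingKeys = new ArrayDeque<>();        // updates not yet indexed

  /** The write path only stores the value and enqueues the key; no indexing work happens here. */
  void put(String key, String text) {
    region.put(key, text);
    pendingKeys.add(key);
  }

  /** Applies all queued updates to the index and returns how many were applied. */
  int flush() {
    int applied = 0;
    String key;
    while ((key = pendingKeys.poll()) != null) {
      for (String term : region.get(key).toLowerCase().split("\\s+")) {
        if (!term.isEmpty()) {
          termToKeys.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(key);
        }
      }
      applied++;
    }
    return applied;
  }

  /** Searches the index only, so entries that are still queued are not visible. */
  Set<String> search(String term) {
    return termToKeys.getOrDefault(term.toLowerCase(), Collections.emptySet());
  }

  public static void main(String[] args) {
    AsyncIndexSketch sketch = new AsyncIndexSketch();
    sketch.put("k1", "Apache Geode");
    System.out.println(sketch.search("geode")); // prints [] because the index is stale
    sketch.flush();
    System.out.println(sketch.search("geode")); // prints [k1]
  }
}
```

The write returns before any indexing happens, which is why a search issued between `put` and `flush` misses the new entry.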

Out of Scope
  1. Building the next or a better Solr/Elasticsearch.
  2. Enhancing the current Geode OQL to use the Lucene index.

...

Users will interact with a new LuceneService interface, which provides methods for creating indexes and querying. Users can also create indexes through gfsh or cache.xml.

Java API 

LuceneService

Code Block
/**
   * Create a lucene index using default analyzer.
   */
  public void createIndex(String indexName, String regionName, String... fields);
  
  /**
   * Create a Lucene index using the specified analyzer for each field.
   */
  public void createIndex(String indexName, String regionName,  
      Map<String, Analyzer> analyzerPerField);

  public void destroyIndex(LuceneIndex index);
 
  public LuceneIndex getIndex(String indexName, String regionName);
  
  public Collection<LuceneIndex> getAllIndexes();

  /**
   * Get a factory for building queries
   */ 
  public LuceneQueryFactory createLuceneQueryFactory();

...

LuceneQueryFactory

Code Block
  /**
   * Set the page size for a query result. The default page size is 0, which means no pagination.
   * Throws IllegalArgumentException if a negative value is specified.
   * @param pageSize
   * @return itself
   */
  LuceneQueryFactory setPageSize(int pageSize);
  
  /**
   * Set the maximum number of results for a query.
   * Throws IllegalArgumentException if the specified limit is less than or equal to zero.
   * @param limit
   * @return itself
   */
  LuceneQueryFactory setResultLimit(int limit);
  
  /**
   * Set a list of fields for result projection.
   * 
   * @param fieldNames
   * @return itself
   */
  LuceneQueryFactory setProjectionFields(String... fieldNames);
  
  /**
   * Create a wrapper object for Lucene's QueryParser object using the default standard analyzer.
   * The queryString uses Lucene QueryParser's syntax, which is designed to be easy to use
   * and human readable.
   *  
   * @param indexName index name
   * @param regionName region name
   * @param queryString query string in Lucene QueryParser's syntax
   * @param defaultField default field used by the Lucene QueryParser
   * @param <K> the key type in the query results
   * @param <V> the value type in the query results
   * @return LuceneQuery object
   * @throws ParseException
   */
  public <K, V> LuceneQuery<K, V> create(String indexName, String regionName, String queryString, String defaultField) 
      throws ParseException;

  /**
   * Creates a wrapper object for Lucene's Query object. This {@link LuceneQuery} builder method can be used in
   * advanced cases, such as when the Lucene Query object must be constructed through Lucene's API rather than
   * from a query string. The {@link LuceneQueryProvider} will be used to re-construct the Lucene Query object
   * on remote hosts.
   * 
   * @param indexName index name
   * @param regionName region name
   * @param provider constructs and provides a Lucene Query object
   * @param <K> the key type in the query results
   * @param <V> the value type in the query results
   * @return LuceneQuery object
   */
  public <K, V> LuceneQuery<K, V> create(String indexName, String regionName, LuceneQueryProvider provider);

/**
 * Instances of this class are used to distribute Lucene Query objects and to re-construct the Query object
 * on remote members. If necessary, the implementation needs to take care of serializing and de-serializing
 * the Lucene Query object. Geode respects the DataSerializable contract to provide optimal object serialization.
 * For instance, {@link LuceneQueryProvider}'s toData method will be used to serialize it when it is sent to
 * another member of the distributed system. Implementations of DataSerializable can provide a zero-argument
 * constructor that will be invoked when they are read with DataSerializer.readObject.
 */
public interface LuceneQueryProvider extends Serializable {
  /**
   * @param index the local Lucene index the query is being constructed against
   * @return a Lucene Query object which can be used for executing a Lucene search on indexed data
   * @throws QueryException if the provider fails to construct the query object
   */
  public Query getQuery(LuceneIndex index) throws QueryException;
}
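
The provider contract above (ship a small serializable object, rebuild the query where it runs) can be sketched with the JDK alone. This is an illustration under stated assumptions, not the Geode API: `ProviderSketch` and `ContainsTermProvider` are hypothetical names, a `Predicate<String>` stands in for Lucene's Query, and plain `java.io` serialization stands in for DataSerializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Predicate;

class ProviderSketch {

  /** Hypothetical stand-in for a LuceneQueryProvider implementation. */
  static class ContainsTermProvider implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String term;                  // plain data: this is what gets serialized
    private transient Predicate<String> query;  // never serialized; rebuilt lazily on each member

    ContainsTermProvider(String term) {
      this.term = term;
    }

    /** Builds the query on first use, mirroring getQuery(LuceneIndex) in spirit only. */
    Predicate<String> getQuery() {
      if (query == null) {
        String needle = term.toLowerCase();
        query = text -> text.toLowerCase().contains(needle);
      }
      return query;
    }
  }

  /** Simulates sending the provider to another member by serializing and deserializing it. */
  static ContainsTermProvider roundTrip(ContainsTermProvider provider) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(provider);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (ContainsTermProvider) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }

  public static void main(String[] args) {
    ContainsTermProvider remoteCopy = roundTrip(new ContainsTermProvider("Geode"));
    System.out.println(remoteCopy.getQuery().test("Apache Geode")); // prints true
  }
}
```

Keeping the query object transient means only the inputs needed to rebuild it travel between members, which is one way to satisfy the requirement that the provider be serializable even when the query object itself is not.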

LuceneQuery

Code Block
/**
 * Provides a wrapper object for Lucene's Query object and executes the search. 
 * <p>Instances of this interface are created using
 * {@link LuceneQueryFactory#create}.
 * 
 */
public interface LuceneQuery<K, V> {
  /**
   * Execute the search and return keys. 
   */
  public Collection<K> findKeys();
  /**
   * Execute the search and return values. 
   */
  public Collection<V> findValues();

  /**
   * Execute the search and return list of LuceneResultStruct. 
   */
  public List<LuceneResultStruct<K,V>> findResults();

  /**
   * Execute the search and get results. 
   */
  public PageableLuceneQueryResults<K,V> findPages();
  
  /**
   * Get page size setting of current query. 
   */
  public int getPageSize();
  
  /**
   * Get limit size setting of current query. 
   */
  public int getLimit();
  
  /**
   * Get projected fields setting of current query. 
   */
  public String[] getProjectedFieldNames();
}
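
The interaction between the result limit and the page size can be sketched as follows. This is a hedged illustration and not Geode's code: `PagingSketch.paginate` is a hypothetical helper, and it assumes the limit is applied to the full hit list before the hits are split into pages. The argument checks follow the rules stated in the LuceneQueryFactory comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PagingSketch {

  /**
   * Caps hits at limit, then splits them into pages of pageSize.
   * A pageSize of 0 means no pagination, so all capped hits land in a single page.
   */
  static <T> List<List<T>> paginate(List<T> hits, int limit, int pageSize) {
    if (limit <= 0) {
      throw new IllegalArgumentException("limit must be greater than zero");
    }
    if (pageSize < 0) {
      throw new IllegalArgumentException("pageSize must not be negative");
    }
    List<T> capped = hits.subList(0, Math.min(limit, hits.size()));
    List<List<T>> pages = new ArrayList<>();
    if (capped.isEmpty()) {
      return pages; // no hits, no pages
    }
    int size = (pageSize == 0) ? capped.size() : pageSize;
    for (int from = 0; from < capped.size(); from += size) {
      int to = Math.min(from + size, capped.size());
      pages.add(new ArrayList<>(capped.subList(from, to)));
    }
    return pages;
  }

  public static void main(String[] args) {
    List<Integer> hits = Arrays.asList(1, 2, 3, 4, 5);
    System.out.println(paginate(hits, 4, 3));  // prints [[1, 2, 3], [4]]
    System.out.println(paginate(hits, 10, 0)); // prints [[1, 2, 3, 4, 5]]
  }
}
```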

...

LuceneResultStruct

...

Now that this feature has been implemented, please refer to the javadocs for details on the Java API.

 

...

Examples

Code Block
// Get LuceneService
LuceneService luceneService = LuceneServiceProvider.get(cache);

// Create an index on fields with the default analyzer:
luceneService.createIndex(indexName, regionName, "field1", "field2", "field3");

// Alternatively, create the index with a specified analyzer per field:
Map<String, Analyzer> analyzerPerField = new HashMap<String, Analyzer>();
analyzerPerField.put("field1", new StandardAnalyzer());
analyzerPerField.put("field2", new KeywordAnalyzer());
luceneService.createIndex(indexName, regionName, analyzerPerField);
 
Region region = cache.createRegionFactory(RegionShortcut.PARTITION).create(regionName);

// Create Query (K is the region's key type)
LuceneQuery<K, Object> query = luceneService.createLuceneQueryFactory().setResultLimit(200).setPageSize(20)
  .create(indexName, regionName, queryString, "field1" /* default field */);

// Search using Query
PageableLuceneQueryResults<K, Object> results = query.findPages();

// Pagination
while (results.hasNext()) {
  results.next().stream().forEach(struct -> {
    Object value = struct.getValue();
    System.out.println("Key is "+struct.getKey()+", value is "+value);
  });
}

 

...