Proposal - Clusterable Search Via Solr

Status	Proposal under development
Target Release	4.1?
JIRA Issue	TBD
Original Authors	Dave Johnson

Abstract

Roller uses Lucene for search, and Lucene stores its search index on disk. So if you run multiple Roller instances, you can either 1) have every instance write to the same index files, which will fail, 2) give each instance its own index and end up with inconsistent search indexes, which will be extremely irritating, or 3) turn off Roller's built-in search and use some external spider.

For those who are not happy with those choices, I offer this proposal to use a) Apache Solr and b) some improvements to Roller's plug-in infrastructure to enable a cluster-able search implementation in Roller.

Design

The basic idea is to embed a search engine in Roller, one that has a web services interface. In a cluster, one instance of Roller will run this embedded search engine and all instances will call it via web services to index, re-index, de-index and search. This will make Roller's search facility cluster-able.

Here's the plan in four steps.

ONE. Embed the Solr search engine in Roller and expose its web services interface

- allow it to run on one host of the system
- all hosts will call the search service via web services
- implementation:
  - include Solr jars
  - add Solr Servlet to web.xml, ensure it abides by Roller's search configuration properties
  - enable authentication

Introduce WeblogListener and supporting infrastructure in manager implementations that do CRUD on entries and comments. Here's a rough outline of the plug-in interface, which is still TBD.

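/** Implement this to get notification of changes to Roller weblogs. */
public interface WeblogListener {
    // Parameters are still TBD; each callback would presumably receive
    // the affected entry or comment.
    void entryAdded();
    void entryUpdated();
    void entryRemoved();
    void spamCommentDetected();
    void commentAdded();
    void commentUpdated();
    void commentRemoved();
}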
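As a rough illustration of the "supporting infrastructure", here is one way a manager could notify registered listeners. This is only a sketch: the ListenerRegistry, EntryManager and RecordingListener classes, the String entry id parameter, and the cut-down two-method listener are all assumptions made for the example, not part of Roller or of this proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Sketch only: all types below are hypothetical stand-ins, not Roller classes. */
final class ListenerDispatchSketch {

    /** Cut-down listener; the entry id parameter is an assumption (signatures are TBD). */
    interface WeblogListener {
        void entryAdded(String entryId);
        void entryRemoved(String entryId);
    }

    /** Holds registered listeners and fans each event out to all of them. */
    static final class ListenerRegistry {
        // CopyOnWriteArrayList allows registration while events are being dispatched.
        private final List<WeblogListener> listeners = new CopyOnWriteArrayList<>();

        void register(WeblogListener listener) {
            listeners.add(listener);
        }

        void fireEntryAdded(String entryId) {
            for (WeblogListener l : listeners) {
                l.entryAdded(entryId);
            }
        }

        void fireEntryRemoved(String entryId) {
            for (WeblogListener l : listeners) {
                l.entryRemoved(entryId);
            }
        }
    }

    /** Stand-in for a manager implementation that does CRUD on entries. */
    static final class EntryManager {
        private final ListenerRegistry registry;

        EntryManager(ListenerRegistry registry) {
            this.registry = registry;
        }

        void saveEntry(String entryId) {
            // ... persist the entry here, then notify listeners ...
            registry.fireEntryAdded(entryId);
        }

        void removeEntry(String entryId) {
            // ... delete the entry here, then notify listeners ...
            registry.fireEntryRemoved(entryId);
        }
    }

    /** Listener that simply records the events it receives, in order. */
    static final class RecordingListener implements WeblogListener {
        final List<String> events = new ArrayList<>();

        public void entryAdded(String entryId) {
            events.add("added:" + entryId);
        }

        public void entryRemoved(String entryId) {
            events.add("removed:" + entryId);
        }
    }
}
```

With this shape, the manager never needs to know that search exists. It only fires events, and whatever listeners are registered decide what to do with them.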

TWO. Create a new SolrSearchManager implementation that calls Solr Web Services to manage the index, perform searches, etc. This may require some changes to the SearchManager interface and work to keep the Lucene implementation in sync.
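To give a feel for what "calling Solr Web Services" might involve, the sketch below builds the XML update messages that Solr's update handler accepts (add, delete by id, commit). A SolrSearchManager could POST these strings to the Solr update URL. The SolrUpdateMessages class and the field names used in the usage note are assumptions for illustration; the real field names would depend on the Solr schema chosen for Roller.

```java
import java.util.Map;

/** Sketch only: builds Solr XML update messages; class and method names are hypothetical. */
final class SolrUpdateMessages {

    private SolrUpdateMessages() {
    }

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds an add message for one document; fields are written in map iteration order. */
    static String addDocument(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(f.getKey())).append("\">")
               .append(escapeXml(f.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    /** Builds a delete message that removes the document with the given unique id. */
    static String deleteById(String id) {
        return "<delete><id>" + escapeXml(id) + "</id></delete>";
    }

    /** Builds the commit message that makes pending changes visible to searches. */
    static String commit() {
        return "<commit/>";
    }
}
```

For example, indexing an entry would mean sending addDocument(...) with fields such as "id" and "title" (assumed names), followed by commit(). De-indexing would send deleteById(...) and then commit().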

THREE. Create a SearchWeblogListener implementation that indexes, re-indexes and de-indexes as needed by calling the SearchManager interface.
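A minimal sketch of that mapping follows. The IndexOperations interface is a hypothetical slice of SearchManager (the real operation names may differ), the String entry id parameters are assumed, and the choice to re-index the parent entry when a comment arrives assumes that comments are indexed as part of their entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: all types below are hypothetical stand-ins, not Roller classes. */
final class SearchListenerSketch {

    /** Hypothetical slice of the SearchManager interface. */
    interface IndexOperations {
        void addToIndex(String entryId);
        void reindex(String entryId);
        void removeFromIndex(String entryId);
    }

    /** Translates weblog change events into index operations. */
    static final class SearchWeblogListener {
        private final IndexOperations search;

        SearchWeblogListener(IndexOperations search) {
            this.search = search;
        }

        void entryAdded(String entryId) {
            search.addToIndex(entryId);
        }

        void entryUpdated(String entryId) {
            search.reindex(entryId);
        }

        void entryRemoved(String entryId) {
            search.removeFromIndex(entryId);
        }

        // Assumption: a comment is indexed with its entry, so the entry is re-indexed.
        void commentAdded(String parentEntryId) {
            search.reindex(parentEntryId);
        }
    }

    /** Fake index that records the operations requested of it, in order. */
    static final class RecordingIndex implements IndexOperations {
        final List<String> calls = new ArrayList<>();

        public void addToIndex(String entryId) {
            calls.add("add:" + entryId);
        }

        public void reindex(String entryId) {
            calls.add("reindex:" + entryId);
        }

        public void removeFromIndex(String entryId) {
            calls.add("remove:" + entryId);
        }
    }
}
```

Because the listener depends only on the search interface, the same listener would work unchanged whether the Lucene or the Solr implementation is configured.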

FOUR. Remove the calls that Struts actions make to the search manager for indexing, re-indexing, etc. They are no longer needed once SearchWeblogListener is in place.

And we're done. Folks who want clustering can use the new SolrSearchManager; folks who don't can continue to use LuceneSearchManager.
