Comment: Migrated to Confluence 5.3

...

The OODT File Manager is compatible only with Solr 4.X or above; no support is provided for interacting with pre-existing Solr installations of version 3.X or below. The technical motivation is that the Solr implementation of the OODT Catalog interface relies heavily on the "atomic update" functionality introduced in Solr 4, i.e. the ability to update individual parts of a Solr document without re-indexing the whole document. The logical rationale is that this is new functionality for the OODT framework, so there is no need to support legacy deployments: a project that wishes to leverage this architecture might as well start anew with the most up-to-date version of Solr instead of installing an older one.
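To make the "atomic update" idea concrete, the sketch below builds the JSON body of a Solr partial update that replaces the values of one field on an existing document. The class name, method names and the example field are hypothetical and not part of OODT; only the `{"field": {"set": ...}}` shape follows Solr's atomic update syntax.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of OODT: builds the JSON body of a Solr
// atomic ("partial") update that replaces the values of a single field.
final class AtomicUpdateSketch {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Produces [{"id":"<docId>","<field>":{"set":[...values...]}}].
    // Only the named field is sent; the other fields of the document
    // are left for Solr to carry over from what it has stored.
    static String buildSetUpdate(String docId, String field, List<String> values) {
        String jsonValues = values.stream()
                .map(AtomicUpdateSketch::quote)
                .collect(Collectors.joining(",", "[", "]"));
        return "[{\"id\":" + quote(docId) + ","
                + quote(field) + ":{\"set\":" + jsonValues + "}}]";
    }
}
```

A body of this form would typically be posted to the update handler of the Solr core; the exact URL depends on the deployment and is not shown here.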

...

When a physical product is sent to the File Manager for archiving, the associated metadata must be transformed into queryable information that is stored in the back-end Solr catalog. By default, the Solr File Manager transforms each product into one corresponding Solr document, thus generating a single searchable record in the Solr index. Each product attribute is transformed into a corresponding Solr field with the same name and value(s) (note that all fields must be defined in the project-specific schema deployed with the Solr installation).
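The default one-product-to-one-record mapping can be pictured with the following minimal sketch. It uses plain Java maps as stand-ins for the CAS metadata and the Solr document; the class and method names are hypothetical, and the real implementation may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not OODT code: one product becomes one record,
// and every metadata attribute becomes a field with the same name and values.
final class OneRecordPerProductSketch {

    static Map<String, List<String>> toSolrFields(Map<String, List<String>> productMetadata) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> attribute : productMetadata.entrySet()) {
            // Copy the value list so later changes to the product
            // do not alter the record that was built from it.
            fields.put(attribute.getKey(), new ArrayList<>(attribute.getValue()));
        }
        return fields;
    }
}
```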

Alternatively, a project may provide its own algorithm for generating Solr records from a CAS product by implementing the ProductSerializer interface. For example, a project that manages products composed of full directories may wish to create a "collection"-level Solr record for the enclosing directory, and separate "file"-level Solr records for each file in the directory. These different record types could be stored in the same Solr core, or sent to separate Solr cores.
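As an illustration of the directory example, the sketch below turns one directory product into a "collection" record plus one "file" record per contained file. It does not implement the actual ProductSerializer interface, whose methods are not shown in this section; the class name and the field names "recordType" and "collectionId" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not OODT code: splits a directory product
// into one collection-level record and one record per file.
final class DirectoryProductSketch {

    static List<Map<String, String>> toRecords(String directoryName, List<String> fileNames) {
        List<Map<String, String>> records = new ArrayList<>();

        // Collection-level record for the enclosing directory.
        Map<String, String> collection = new LinkedHashMap<>();
        collection.put("id", directoryName);
        collection.put("recordType", "collection");
        records.add(collection);

        // File-level records, each pointing back at its collection.
        for (String fileName : fileNames) {
            Map<String, String> file = new LinkedHashMap<>();
            file.put("id", directoryName + "/" + fileName);
            file.put("recordType", "file");
            file.put("collectionId", directoryName);
            records.add(file);
        }
        return records;
    }
}
```

With a "recordType" field of this kind, both record types could share one Solr core and be told apart at query time; sending them to separate cores would make the field unnecessary.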

...

Additionally, the following properties control how products are ingested and extracted into/from the Solr server,
i.e. the implementations used for the extension points described above. These properties have default values,
and need to be set only when the default is not the desired behavior.

  • org.apache.oodt.cas.filemgr.catalog.solr.productIdGenerator=org.apache.oodt.cas.filemgr.catalog.solr.UUIDProductIdGenerator
    • Optional: controls the algorithm for generating the product unique identifier when it is first stored in the catalog.
    • Default: UUIDProductIdGenerator: this class generates a new UUID every time a product is indexed.
    • Alternative out of the box implementation: NameProductIdGenerator: this class will assign the product an identifier equal to the product name.
    • Alternatively: provide any custom implementation of the ProductIdGenerator interface.
  • org.apache.oodt.cas.filemgr.catalog.solr.productSerializer=org.apache.oodt.cas.filemgr.catalog.solr.DefaultProductSerializer
    • Optional: controls the format of the documents ingested into Solr, i.e. how a CAS product object is transformed into one (or more) Solr records and, vice versa, how CAS products are read back from the Solr index.
    • Default: DefaultProductSerializer: creates one Solr record for each incoming CAS product:
      • the product core attributes (id, name, type) are converted to Solr fields starting with "CAS." ("CAS.ProductId", "CAS.ProductName", ....)
      • the product identifier is used again to assign the Solr record identifier (i.e. "id" and "CAS.ProductId" have the same value)
      • the product references are converted into Solr fields starting with "CAS.Reference..." or "CAS.RootReference..."
      • the product metadata attributes are converted into Solr fields with the same name and number of values
    • Alternative: any custom implementation of the ProductSerializer interface can be used.
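The following sketch shows how the two properties above could be resolved with their documented defaults, and what the two out-of-the-box identifier strategies amount to. The class and method names are hypothetical, the lookup through java.util.Properties is an assumption about how the values are read, and the strategy is selected by class-name suffix purely for illustration.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical illustration, not OODT code.
final class SolrCatalogConfigSketch {

    static final String ID_GENERATOR_KEY =
            "org.apache.oodt.cas.filemgr.catalog.solr.productIdGenerator";
    static final String DEFAULT_ID_GENERATOR =
            "org.apache.oodt.cas.filemgr.catalog.solr.UUIDProductIdGenerator";
    static final String SERIALIZER_KEY =
            "org.apache.oodt.cas.filemgr.catalog.solr.productSerializer";
    static final String DEFAULT_SERIALIZER =
            "org.apache.oodt.cas.filemgr.catalog.solr.DefaultProductSerializer";

    // Falls back to the documented default when the property is not set.
    static String idGeneratorClass(Properties props) {
        return props.getProperty(ID_GENERATOR_KEY, DEFAULT_ID_GENERATOR);
    }

    static String serializerClass(Properties props) {
        return props.getProperty(SERIALIZER_KEY, DEFAULT_SERIALIZER);
    }

    // Stand-in for the two described strategies: the name-based generator
    // reuses the product name, anything else gets a fresh random UUID.
    static String generateId(String generatorClass, String productName) {
        if (generatorClass.endsWith("NameProductIdGenerator")) {
            return productName;
        }
        return UUID.randomUUID().toString();
    }
}
```

Note the practical difference between the two strategies: with UUIDs, indexing the same product twice yields two distinct identifiers, whereas name-based identifiers stay the same for a given product name.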

...

The file schema.xml, part of each specific Solr deployment, defines which metadata fields are stored in the Solr index, and can consequently be queried and retrieved by clients. Note that no metadata field can be ingested in Solr unless it is defined (explicitly or implicitly) in schema.xml. Additionally, a specific requirement of the File Manager - Solr integration is that each metadata field included in schema.xml must be "stored" (i.e. defined with stored="true"), so that it can be retrieved and re-inserted during partial document updates.
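A project may want to check, before ingestion, that every metadata field it produces is covered by its schema. The sketch below assumes that "implicitly" refers to Solr dynamic field patterns, which carry a single wildcard at the start or end of the name; the class name, method names and example patterns are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Hypothetical pre-ingestion check, not OODT code: tests whether a field
// name is covered by an explicit field or a dynamic field pattern.
final class SchemaFieldCheckSketch {

    // Matches patterns with one leading or trailing '*', e.g. "*_s" or "CAS.*".
    static boolean matchesDynamic(String pattern, String field) {
        if (pattern.startsWith("*")) {
            return field.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return field.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(field);
    }

    static boolean isDefined(String field, Set<String> explicitFields, List<String> dynamicPatterns) {
        if (explicitFields.contains(field)) {
            return true; // defined explicitly
        }
        for (String pattern : dynamicPatterns) {
            if (matchesDynamic(pattern, field)) {
                return true; // defined implicitly through a pattern
            }
        }
        return false;
    }
}
```

This check only covers field names; whether each field is declared with stored="true" still has to be verified in schema.xml itself.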

Each project using a Solr File Manager is responsible for creating and deploying a schema.xml file that is consistent with its own algorithms for generating Solr documents from product metadata, and vice versa (as defined by the specific implementation of the ProductSerializer interface). An example schema.xml is provided as part of the File Manager distribution in the _resources/ sub-directory. This schema is compatible with the class DefaultProductSerializer (the default implementation of ProductSerializer), and contains the following definitions:

...