Status

Current state: "Finished"

Discussion thread: here

JIRA: SOLR-15086

Released: no

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast). Confluence supports inline comments that can also be used.

Motivation

Solr's current backup/restore functionality has several frustrating limitations.

Current index backups are based on full snapshots.  Snapshot-based backups are slow and expensive because they copy all the files of the index regardless of how little the index may have changed since the last backup.  Some, much, or all of the backup time may be spent transferring data already present in the backup repository - a needless inefficiency.

...

Solr can support restoring to existing collections by making use of the "read only" mode that was introduced in SOLR-13271.  The restore API can put the target collection in read-only mode, restore a backup for each shard, and then toggle off "read only" mode.
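The sequence above can be sketched as the equivalent manual Collections API requests. This is only an illustration: the helper name and its arguments are hypothetical, and it assumes the existing MODIFYCOLLECTION "readOnly" attribute and RESTORE parameters. How the restore command toggles read-only mode internally is not specified here. The sketch only builds the request URLs and sends nothing.

```python
from urllib.parse import urlencode


def restore_into_existing_collection_requests(base_url, collection, backup_name, location):
    """Return the three request URLs, in order, for a restore into an existing collection.

    Hypothetical helper: it mirrors the read-only / restore / read-write sequence
    described in the text, not the SIP's internal implementation.
    """
    def url(**params):
        return f"{base_url}/admin/collections?{urlencode(params)}"

    return [
        # 1. Block updates while the index files are being replaced.
        url(action="MODIFYCOLLECTION", collection=collection, readOnly="true"),
        # 2. Restore the backup into the (now read-only) target collection.
        url(action="RESTORE", name=backup_name, collection=collection, location=location),
        # 3. Re-enable updates once every shard has been restored.
        url(action="MODIFYCOLLECTION", collection=collection, readOnly="false"),
    ]


if __name__ == "__main__":
    for request_url in restore_into_existing_collection_requests(
        "http://localhost:8983/solr", "techproducts", "nightly", "/backups"
    ):
        print(request_url)
```

Keeping the read-only toggle outside the per-shard restore step means a failed restore can be retried without the collection accepting writes in between.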

...

Regardless of the BackupRepository in use, this SIP proposes that backups be taken incrementally, so that each backup stores only those index files not already stored by previous backups.  This will change the format of each backup.  The general thrust of these changes is that a given backup "location" can (and should) be used to store multiple backups, and that each backup includes a metadata file indicating which Lucene index files are part of the backup and the path to each of them within the umbrella backup "location".
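A minimal sketch of the incremental idea follows. It is not the proposed file format, which is described on the SIP sub-page. The function name, the "index/<checksum>" path layout, and the use of SHA-256 to identify files are all assumptions made for illustration.

```python
import hashlib


def plan_incremental_backup(index_files, already_stored):
    """Decide which index files must be copied for a new backup.

    index_files:    dict of Lucene file name -> file content (bytes)
    already_stored: set of paths already present in the backup location
    Returns (manifest, to_upload): the manifest maps every index file name to
    its path inside the backup location; to_upload holds only the new files.
    Hypothetical layout: files are stored under "index/<sha256 of content>".
    """
    manifest = {}
    to_upload = {}
    for name, content in sorted(index_files.items()):
        path = "index/" + hashlib.sha256(content).hexdigest()
        manifest[name] = path
        if path not in already_stored:
            # Only files absent from earlier backups are transferred.
            to_upload[path] = content
    return manifest, to_upload


if __name__ == "__main__":
    first = {"_0.cfs": b"segment-0", "segments_1": b"commit-1"}
    manifest1, upload1 = plan_incremental_backup(first, set())
    stored = set(upload1)

    second = {"_0.cfs": b"segment-0", "_1.cfs": b"segment-1", "segments_2": b"commit-2"}
    manifest2, upload2 = plan_incremental_backup(second, stored)
    print(len(upload1), "files copied by the first backup")
    print(len(upload2), "files copied by the second backup")
```

The per-backup manifest plays the role of the metadata file: the second backup still lists "_0.cfs", but points at the copy stored by the first backup instead of transferring it again.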

Proposed Changes:

...

Backup File Format

For details on the specific backup file format being proposed and how it enables incremental backups to be done accurately, see the SIP sub-page dedicated to this topic here.

Proposed Changes: HTTP API

As mentioned above, this SIP proposes small tweaks to the existing backup and restore APIs.  These are described in more detail below.

...

(If the community disagrees and wants Solr releases to support creation of both types of backups simultaneously, the existing "repository" API parameter can be used to disambiguate the type of backup to be created.)

Example (assumes SIP is completed by Solr 8.9.0): UserA runs a single-node 8.7.0 cluster and creates regular backups for their collections.  When 8.9.0 is released, they perform one final snapshot-based backup and upgrade their cluster.  Shortly after upgrading, their hard disk fails.  They are able to restore the old snapshot-based backup by using Solr's restore API.  As time goes on, they can take backups with the same API call they've used previously, though the files on disk for each of these are now in the incremental-backup format.

...

Test Plan

Much of this SIP can be tested like any other Solr functionality.  The API changes, the framework for backups and restoration, the new restore-to-existing-collection functionality, and our current "BackupRepository" implementations (HDFS and local file system) can all be tested in the usual way as JUnit tests.  These will likely be built off of a modified AbstractCloudBackupRestoreTestCase.  Tests for the API can use LocalFileSystemRepository without any burdensome setup.

...