Status

Current state: "Finished"

Discussion thread: here (<- link to https://mail-archives.apache.org/mod_mbox/lucene-dev/)

JIRA: SOLR-15086

Released: no

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast). Confluence supports inline comments that can also be used.

Motivation

Solr's current backup/restore functionality has several frustrating limitations.

...

  1. Incremental, i.e. only the changed data is saved instead of a full copy. This lets users save on storage costs and significantly speeds up the backup process when only small changes have happened since the last backup.
  2. Cloud friendly, i.e. supports one or more blob storage systems available in major public clouds, such as Amazon S3, Google Cloud Storage, Azure Blob Storage, etc.
  3. Safe against index corruption, i.e. backups should succeed only if the backed-up index is not corrupt.
  4. Restorable to existing collections, i.e. it should be possible to restore to the source collection (or any existing collection, assuming it is compatible with the source).

Observant readers might recognize (1) and (3) from code that Cao Manh Dat proposed in SOLR-13608.  In that sense this SIP is a superset of that ticket, created to cover a broader swath of functionality and generate more discussion/review of the design.  Both Dat's ticket and this SIP are informed by code written by Dat, Shalin Mangar, and others, which is available in rough form here.

Public Interfaces

The proposed changes involve changes to several different levels of public interfaces.

  • At the HTTP API level it proposes slight changes to the Backup and Restore APIs (at both the Collection and Core Admin layers).  It also proposes the introduction of two wholly new backup APIs, a "list backups" API, and a "delete backup" API.
  • At the Java API level it proposes changes to the interfaces used to define backup repositories.  Notably: `BackupRepository`.
  • At the file-format level this SIP proposes a new format for storing backups on disk, which is a public interface in a very limited sense.

See the "Proposed Changes" section below for details on how these interfaces are likely to change.

Proposed Changes: Overall

This SIP proposes a handful of related changes to Solr's backup/restore functionality.  These changes will be discussed as they relate to the four motivations mentioned in the "Motivation" section above.

Restore to existing collections

Solr can support restoring to existing collections by making use of the "read only" mode that was introduced in SOLR-13271.  The restore API can put the target collection in read-only mode, restore a backup for each shard, and then toggle off "read only" mode.
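
The sequence described above can be sketched as follows. This is a minimal illustration rather than Solr's implementation: `set_read_only` and `restore_shard` are hypothetical callables standing in for whatever the restore command would actually invoke, and clearing read-only mode in a `finally` block is an assumption about the desired behavior on failure.

```python
from typing import Callable, Iterable, List


def restore_to_existing_collection(
    collection: str,
    shards: Iterable[str],
    set_read_only: Callable[[str, bool], None],  # hypothetical hook
    restore_shard: Callable[[str, str], None],   # hypothetical hook
) -> List[str]:
    """Restore every shard while the collection is in read-only mode.

    Returns the names of the shards that were restored, in order.
    """
    restored: List[str] = []
    set_read_only(collection, True)  # block updates for the duration of the restore
    try:
        for shard in shards:
            restore_shard(collection, shard)
            restored.append(shard)
    finally:
        # Assumption: writes are re-enabled even if a shard restore fails.
        set_read_only(collection, False)
    return restored


# Usage with in-memory stand-ins that only record what happened.
events = []
result = restore_to_existing_collection(
    "myCollectionName",
    ["shard1", "shard2"],
    set_read_only=lambda c, flag: events.append(("readOnly", c, flag)),
    restore_shard=lambda c, s: events.append(("restore", c, s)),
)
```

Whether a failed restore should leave the collection read-only (to prevent writes against a half-restored index) is a design question the sketch does not settle.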

Safety against corruption

Corruption checks can be done by computing a checksum for each index file as it is read and prepared for uploading to the backup repository.  This checksum can then be compared to the checksum stored at the end of each Lucene index file.  If the checksums don't match, the backup can be aborted.

This ensures that files are uncorrupted when they are initially backed up.  Because the checksum is computed while the file is already being read for backup, it also avoids the expensive, full-file read that a separate verification pass would require.  This method, however, does not protect against the case where an existing file is corrupted _after_ backup.
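
A minimal sketch of verifying a checksum during the copy itself. It assumes a simplified footer layout in which the last 8 bytes of a file hold a big-endian CRC32 of all preceding bytes; the function and exception names are hypothetical, and real footer handling would presumably be delegated to Lucene rather than reimplemented.

```python
import io
import struct
import zlib

CHECKSUM_BYTES = 8  # assumed footer size: one big-endian 64-bit value


class CorruptIndexFileError(Exception):
    """Raised when the computed and stored checksums differ."""


def copy_and_verify(src, dst, chunk_size=4096):
    """Copy src to dst in a single pass, checksumming bytes as they are read."""
    crc = 0
    tail = b""  # holds the most recent CHECKSUM_BYTES bytes seen so far
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        data = tail + chunk
        # Everything except the trailing bytes is known not to be the footer.
        body, tail = data[:-CHECKSUM_BYTES], data[-CHECKSUM_BYTES:]
        crc = zlib.crc32(body, crc)
    if len(tail) < CHECKSUM_BYTES:
        raise CorruptIndexFileError("file too short to contain a checksum")
    (stored,) = struct.unpack(">Q", tail)
    if stored != crc:
        raise CorruptIndexFileError(f"expected {stored:#x}, computed {crc:#x}")
    return crc


def make_file(payload: bytes) -> bytes:
    """Build a test file: payload followed by its CRC32 footer."""
    return payload + struct.pack(">Q", zlib.crc32(payload))


good = make_file(b"segment data" * 1000)
uploaded = io.BytesIO()
copy_and_verify(io.BytesIO(good), uploaded)

corrupt = bytearray(good)
corrupt[10] ^= 0xFF  # flip the bits of one payload byte
try:
    copy_and_verify(io.BytesIO(bytes(corrupt)), io.BytesIO())
    detected = False
except CorruptIndexFileError:
    detected = True
```

Because bytes reach the destination before the final comparison, a caller using this approach would need to discard the partial upload when the error is raised.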

...

Listing files in a directory is a common operation in backup and restore. However, the list of files is usually well known at write time. Therefore, we write a manifest file per backup, per directory (if needed), once all files in the directory have been written. This manifest lists the files that are part of the backup (or directory). The list operation of the backup repository for blob stores can use the manifest file to return the list of files consistently. This is similar to how Lucene writes its segments file at the end of a commit.
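
The manifest idea can be illustrated with a toy in-memory blob store. Everything here is an assumption made for the sake of the example: the `manifest.json` name, the JSON encoding, the directory layout, and the `InMemoryBlobStore` class are hypothetical and are not part of the proposed format.

```python
import json
from typing import Dict, List

MANIFEST_NAME = "manifest.json"  # hypothetical file name


class InMemoryBlobStore:
    """Toy stand-in for a blob store keyed by full path."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self.blobs[path] = data

    def get(self, path: str) -> bytes:
        return self.blobs[path]


def write_directory(store: InMemoryBlobStore, directory: str, files: Dict[str, bytes]) -> None:
    """Upload all files, then write the manifest as the final step."""
    for name, data in files.items():
        store.put(f"{directory}/{name}", data)
    manifest = json.dumps(sorted(files)).encode("utf-8")
    store.put(f"{directory}/{MANIFEST_NAME}", manifest)


def list_directory(store: InMemoryBlobStore, directory: str) -> List[str]:
    """List files by reading the manifest instead of scanning the store."""
    return json.loads(store.get(f"{directory}/{MANIFEST_NAME}"))


store = InMemoryBlobStore()
write_directory(store, "backup_0/shard1", {"_0.cfs": b"a", "segments_1": b"b"})
# A stray object that is not named in the manifest is not reported.
store.put("backup_0/shard1/partial.tmp", b"junk")
listing = list_directory(store, "backup_0/shard1")
```

Writing the manifest last means a directory without a manifest can be treated as incomplete, which is the property the comparison to Lucene's commit process points at.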

...

Regardless of the BackupRepository in use, this SIP proposes that backups be taken in an incremental manner, so that only those index files not stored by previous backups will be stored for the given backup.  This will result in changes to the format of each backup.  The general thrust of these changes is that a given backup "location" can (and should) be used to store multiple backups, and that each backup includes a metadata file indicating which Lucene index files are part of the backup and the path to each of them within the umbrella backup "location".
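
As a rough illustration of the incremental idea, the sketch below decides which index files need uploading by comparing the current file set against the previous backup's metadata. The metadata shape (file name mapped to a checksum and a stored path), the `index/<backupId>_<name>` naming, and the function name are all hypothetical; the format actually proposed is described on the SIP sub-page.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata shape: index file name -> (checksum, stored path)
Metadata = Dict[str, Tuple[int, str]]


def plan_incremental_backup(
    current_files: Dict[str, int],  # index file name -> checksum
    previous: Metadata,
    backup_id: int,
) -> Tuple[Metadata, List[str]]:
    """Return the new backup's metadata and the files that must be uploaded."""
    metadata: Metadata = {}
    to_upload: List[str] = []
    for name, checksum in sorted(current_files.items()):
        prior = previous.get(name)
        if prior is not None and prior[0] == checksum:
            metadata[name] = prior  # reuse the copy already in the location
        else:
            stored_path = f"index/{backup_id}_{name}"  # hypothetical naming
            metadata[name] = (checksum, stored_path)
            to_upload.append(name)
    return metadata, to_upload


backup_0, uploads_0 = plan_incremental_backup(
    {"_0.cfs": 111, "segments_1": 222}, previous={}, backup_id=0
)
backup_1, uploads_1 = plan_incremental_backup(
    {"_0.cfs": 111, "_1.cfs": 333, "segments_2": 444},
    previous=backup_0,
    backup_id=1,
)
```

In this example the second backup uploads only `_1.cfs` and `segments_2`, while its metadata still points at the copy of `_0.cfs` stored by the first backup.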

Proposed Changes:

...

Backup File Format

For details on the specific backup file format being proposed and how it enables incremental backups to be done accurately, see the SIP sub-page dedicated to this topic here.

Proposed Changes: HTTP API

As mentioned above, this SIP proposes small tweaks to the existing backup and restore APIs.  These are described in more detail below.

...

V1 Backup request:

/admin/collections?action=BACKUP&
  name=myBackupName&
  collection=myCollectionName&
  location=/path/to/my/shared/drive&
  maxNumBackup=5
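
For readers scripting against the V1 request, a small sketch of assembling the URL with proper query-string encoding. The base URL is an assumption (a local Solr on its default port) and `backup_url` is a hypothetical helper; the sketch only builds the URL and does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SOLR_BASE_URL = "http://localhost:8983/solr"  # assumption: local default install


def backup_url(name, collection, location, max_num_backup=None):
    """Build the V1 BACKUP request URL; maxNumBackup is added only if given."""
    params = {
        "action": "BACKUP",
        "name": name,
        "collection": collection,
        "location": location,
    }
    if max_num_backup is not None:
        params["maxNumBackup"] = max_num_backup
    return f"{SOLR_BASE_URL}/admin/collections?{urlencode(params)}"


url = backup_url("myBackupName", "myCollectionName", "/path/to/my/shared/drive", 5)
query = parse_qs(urlsplit(url).query)  # decode again to inspect the parameters
```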

...


V2 Backup request:

POST /v2/collections

{
  "backup-collection": {
    "collection": "myCollectionName",
    "name": "myBackupName",
    "location": "/path/to/my/shared/drive",
    "maxNumBackup": 5
  }
}

Example backup response:

{
  "responseHeader":{
    "status":0,
    "QTime":61},
  "success":{
    "127.0.0.1:60324_solr":{"responseHeader":{
        "status":0,
        "QTime":31}},
    "127.0.0.1:60323_solr":{"responseHeader":{
        "status":0,
        "QTime":31}}},
  "collection":"myCollectionName",
  "numShards": 2,
  "backupId":0,
  "indexVersion":"8_4",
  "startTime":"2019-08-28T16:03:19.127Z",
  "indexSizeMB":0.004,
  "shards":{
    "shard2":{
      "startTime":"2019-08-28T16:03:19.127Z",
      "indexFileCount":17,
      "uploadedIndexFileCount":17,
      "indexSizeMB":0.003,
      "uploadedIndexFileMB":0.003,
      "endTime":"2019-08-28T16:03:19.155Z",
      "shardBackupId":"md_shard2_id_0",
      "node":"127.0.0.1:60324_solr"},
    "shard1":{
      "startTime":"2019-08-28T16:03:19.127Z",
      "indexFileCount":17,
      "uploadedIndexFileCount":17,
      "indexSizeMB":0.003,
      "uploadedIndexFileMB":0.003,
      "endTime":"2019-08-28T16:03:19.155Z",
      "shardBackupId":"md_shard1_id_0",
      "node":"127.0.0.1:60323_solr"}}}

...

V1 Restore request:

/admin/collections?action=RESTORE&
  name=myBackupName&
  location=/path/to/my/shared/drive&
  collection=myRestoredCollectionName&
  backupId=10

V2 Restore request:

POST /v2/collections
{
  "restore-collection": {
    "collection": "myRestoredCollectionName",
    "name": "myBackupName",
    "location": "/path/to/my/shared/drive",
    "backupId": 10 
  } 
} 

Delete Backup API

This is a completely new Collection API to delete a backup, rotate backups and/or purge unused files left behind by a failed backup operation. It supports the following parameters:

  1. name - A string backup name. This backup name is resolved against the given location. This is a required parameter.
  2. location - A string location of the backup. This is a required parameter.
  3. repository - A string name of the repository to be used. This is optional. If not present, the default repository configured in the solr.xml is used.
  4. Exactly 1 of the following parameters, which identify the backup data to be deleted.
    1. backupId - An integer backup ID whose files have to be deleted.
    2. maxNumBackup - An integer that limits the maximum number of backups to keep e.g. if maxNumBackup=5 then if number of backups in the provided location is more than 5, the oldest ones are deleted until only 5 backups exist.
    3. purge - A flag used to turn on 'purging' of any potentially "orphaned" files that are not part of any backup and therefore should be deleted.

At a given time, only one of backupId, maxNumBackup and purge parameters should be specified.
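
The "exactly one of" rule and the rotation behavior of maxNumBackup can be sketched as below. Both functions are hypothetical helpers, not Solr code, and the rotation logic assumes that a lower backup ID always denotes an older backup.

```python
from typing import List, Optional


def validate_delete_params(
    backup_id: Optional[int] = None,
    max_num_backup: Optional[int] = None,
    purge: bool = False,
) -> str:
    """Return the requested mode, or raise if not exactly one is set."""
    chosen = [
        mode
        for mode, is_set in (
            ("backupId", backup_id is not None),
            ("maxNumBackup", max_num_backup is not None),
            ("purge", purge),
        )
        if is_set
    ]
    if len(chosen) != 1:
        raise ValueError(
            "exactly one of backupId, maxNumBackup, purge is required; got "
            + (", ".join(chosen) or "none")
        )
    return chosen[0]


def backups_to_rotate_out(backup_ids: List[int], max_num_backup: int) -> List[int]:
    """Oldest backup IDs to delete so that at most max_num_backup remain.

    Assumption: a lower backup ID means an older backup.
    """
    excess = len(backup_ids) - max_num_backup
    return sorted(backup_ids)[:excess] if excess > 0 else []


mode = validate_delete_params(max_num_backup=5)
doomed = backups_to_rotate_out([0, 1, 2, 3, 4, 5, 6], max_num_backup=5)
```

With seven backups and maxNumBackup=5, the two oldest (IDs 0 and 1) are selected for deletion; with five or fewer, nothing is.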

V1 Delete Backup request:

/admin/collections?action=DELETE_BACKUP&
  name=myBackupName&
  location=/path/to/my/shared/drive&
  backupId=<number of backupId>


V2 Delete Backup request:

POST /v2/collections/backups 
{
  "delete-backup": {
    "name": "myBackupName",
    "location": "/path/to/my/shared/drive",
    "backupId": 5 
  } 
} 

Example response when deleting a particular backupId:

{
  "responseHeader" : {..},
  "collection" : "collection1",
  "deleted" : [
    {
      "backupId" : 2,
      "startTime" : "2019-08-27T09:11:17.230673Z",
      "size" : 9581,
      "numFiles" : 52
    }
  ]
}

Example response for purge:

{
  "responseHeader" : {..},
  "collection" : "collection1",
  "purged" : {
    "numIndexFiles" : 2
  }
}

List Backup API

This is also a new Collection API to list the existing backups for a collection. Unlike the other Collection APIs described above, this API does not need to go through the overseer and can be answered by any node in the cluster. The following parameters are supported:

  1. name - A string name of the backup (usually the collection name)
  2. location - A string location of the backup. This is resolved against the repository.
  3. repository - An optional string to identify the repository. If none is provided, then the default repository configured in solr.xml is used.

V1 List Backups API:

/admin/collections?action=LISTBACKUP&
  name=myBackupName&
  location=/path/to/my/shared/drive


V2 List Backups API:

POST /v2/collections/backups 
{
  "list-backups": {
    "name": "myBackupName",
    "location": "/path/to/my/shared/drive" 
  } 
} 

Example response:

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "collection":"backuprestore_testbackupinc",
  "backups":[
    {
      "indexFileCount":26,
      "indexSizeMB":0.004,
      "shardBackupIds":{
        "shard2":"md_shard2_id_2",
        "shard1":"md_shard1_id_2"},
      "collection.configName":"conf1",
      "backupId":2,
      "collectionAlias":"backuprestore_testbackupinc",
      "startTime":"2019-08-28T16:02:11.485Z",
      "indexVersion":"8.2.1"},
    {
      "indexFileCount":2,
      "indexSizeMB":0.0,
      "shardBackupIds":{
        "shard2":"md_shard2_id_3",
        "shard1":"md_shard1_id_3"},
      "collection.configName":"conf1",
      "backupId":3,
      "collectionAlias":"backuprestore_testbackupinc",
      "startTime":"2019-08-28T16:02:14.375Z",
      "indexVersion":"8.2.1"},
    {
      "indexFileCount":2,
      "indexSizeMB":0.0,
      "shardBackupIds":{
        "shard2":"md_shard2_id_4",
        "shard1":"md_shard1_id_4"},
      "collection.configName":"conf1",
      "backupId":4,
      "collectionAlias":"backuprestore_testbackupinc",
      "startTime":"2019-08-28T16:02:14.406Z",
      "indexVersion":"8.2.1"}]}

CoreAdmin APIs

Backup Core API

This is supposed to be an internal API to be called by the Backup Collection API. It supports two new parameters:

  1. shardBackupId - (Required) The shard backup ID assigned by the Backup Collection API for the current backup.
  2. prevShardBackupId - The previous shard backup ID against which the incremental backup is to be made. The previous shard backup is used as the base to find changed data.


V1 Backup Core API:

/admin/cores?action=BACKUPCORE&
  core=core-node1&
  location=/path/to/my/shared/drive/myBackupName&
  prevShardBackupId=md_shard1_id_0&
  shardBackupId=md_shard1_id_1


V2 Backup Core API:

POST /v2/cores/someCoreName

{
  "backup-core": {
    "location": "/path/to/shared/drive/with/backupName",
    "shardBackupId": "md_shard1_id_1",
    "prevShardBackupId": "md_shard1_id_0"
  }
}

Restore Core API

This is also an internal API to be called by the Restore Collection API. It supports two new parameters:

  1. incremental - An optional boolean that signals whether the data being restored is in the "incremental" format or not. Defaults to false.
  2. shardBackupId - The shard backup ID to be restored. This is a required parameter if incremental=true is specified.


V1 Restore Core API:

/admin/cores?action=RESTORECORE&
  core=core-node1&
  incremental=true&
  location=/path/to/my/shared/drive/myBackupName&
  shardBackupId=md_shard1_id_1


V2 Restore Core API:

POST /v2/cores/someRestoreCoreName

{
  "restore-core": {
    "incremental": true,
    "location": "/path/to/shared/drive/with/backupName",
    "shardBackupId": "md_shard1_id_1"
  }
}

Compatibility, Deprecation, and Migration Plan

...