
...

Backups under the proposed incremental format would create a file tree like the one shown below at a given "location" (i.e. location parameter value) in the backup repository.  Each "location" can store an arbitrary number of backups, organized at the highest level by the "backup name" (usually but not always the name of the collection being backed up).


Overall File Layout:

/backup_location
    /techproducts
        /backup_0.properties
        /backup_1.properties
        /shard_backup_metadata
            /md_shard1_id_0.json
            /md_shard2_id_0.json
            /md_shard1_id_1.json
            /md_shard2_id_1.json
        /index
            /0DD2971A-53D6-4224-A49B-8AC90D158F97
            /1AA2CF56-BFA0-40D5-8B9B-5CAD47B07396
            ...
        /zk_backup_0
            /conf
                <configset files>
            /state.json
            ...
        /zk_backup_1
            ...


The file listing above shows a single backup "location" which contains two incremental backups ("0" and "1").  Several different classes of files can be seen:
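Because each completed backup is advertised by a backup_N.properties file, the most recent backup at a location can be identified purely from a file listing. A minimal sketch of that check (latest_backup_id is a hypothetical helper, not part of Solr's API):

```python
import re

def latest_backup_id(listing):
    """Return the highest N among backup_N.properties names in a
    directory listing, or None if no completed backup exists."""
    ids = [int(m.group(1)) for name in listing
           if (m := re.fullmatch(r"backup_(\d+)\.properties", name))]
    return max(ids) if ids else None

# Example listing of a backup-name directory like the one above
names = ["backup_0.properties", "backup_1.properties",
         "shard_backup_metadata", "index", "zk_backup_0", "zk_backup_1"]
print(latest_backup_id(names))  # -> 1
```

An empty location yields None, which is how a backup process would decide that the current backup should be "0".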

...

  1. Solr receives a request to back up the single-shard "techproducts" collection.
  2. Solr looks at the chosen repository, location, and collection/backup name to find the most recent backup available. (Unfortunately this does require a repository "list" operation on the root location "/backup_location/techproducts" to identify the most recent backup_N.properties file. The cloud storage offered by many cloud providers is "eventually consistent", so these list operations are avoided wherever possible.) The returned file listing informs Solr that there are currently no backups for the specified collection at the specified location, so the current backup will be "0".
  3. Solr gathers the index files on the shard-leader. It gives each a UUID and uploads each file to /techproducts/index/<UUID>, computing a checksum and remembering the size as each file is uploaded.
  4. Solr uses the information computed during file-upload to create a shard-level metadata file, with pointers to each Lucene index file. This file is uploaded to the repository as /techproducts/shard_backup_metadata/md_shard1_id_0.json.
  5. With all index data uploaded, Solr creates the "zk_backup_0" directory under the root location, fetches all necessary data from ZK, and stores it there.
  6. With all other backup information persisted to the repository, Solr persists the collection-level metadata file "backup_0.properties" to advertise that a completed backup is now available.
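Steps 3 and 4 above can be sketched as follows. This is an illustrative model only: upload_index_files is a hypothetical helper, the repository is stood in for by a dict, and sha256 is used purely for illustration (the actual implementation would use the checksum Lucene already records for each index file):

```python
import hashlib
import json
import uuid

def upload_index_files(local_files, repo):
    """Upload each local index file under a fresh UUID, recording the
    name -> {uuid, size, checksum} entries that the shard-level
    metadata file (e.g. md_shard1_id_0.json) would later serialize."""
    entries = {}
    for name, data in local_files.items():
        file_id = str(uuid.uuid4())
        repo[file_id] = data  # lands at /techproducts/index/<UUID>
        entries[name] = {
            "uuid": file_id,
            "size": len(data),
            "checksum": hashlib.sha256(data).hexdigest(),
        }
    return entries

local = {"_0.cfs": b"lucene-data", "segments_1": b"segments"}
repo = {}
metadata = upload_index_files(local, repo)
print(json.dumps(metadata, indent=2))
```

The key property is that the logical file name only appears in the shard metadata; the repository stores each file under an opaque UUID, so two backups can safely reference different versions of a file with the same name.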

...

  1. Solr receives a request to back up the single-shard "techproducts" collection.
  2. Solr looks at the chosen repository, location, and collection/backup name to find the most recent backup available. (As before, this requires a "list" operation.) The file listing informs Solr that there is an existing backup "0", making the current backup "1" accordingly.
  3. Solr reads backup_0.properties. In this file, Solr reads the pointer to the shard-metadata file for the only shard of "techproducts": '/techproducts/shard_backup_metadata/md_shard1_id_0.json'. Solr fetches this file as well.
  4. Solr gathers the index files on the shard leader. For each, it checks whether the file has already been uploaded according to the records in md_shard1_id_0. If the shard-metadata file has an entry for a given local file, and the recorded checksum and file-size match those exhibited by the local file, the local file is skipped over. Otherwise the file is uploaded as in step (3) from "Initial Backup".
  5. Solr builds the md_shard1_id_1 file based on data computed from the just-uploaded files, and entries that matched from the previous backup. This file is uploaded as /techproducts/shard_backup_metadata/md_shard1_id_1.json.
  6. ZK data and the collection-level metadata file are created and stored as in the concluding steps of the "Initial Backup".
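The deduplication check in step 4 can be sketched as below. Again this is a model under assumptions: files_to_upload is a hypothetical helper, the previous shard metadata is a plain dict, and sha256 stands in for whatever checksum the real implementation records:

```python
import hashlib

def files_to_upload(local_files, previous_metadata):
    """Split local index files into entries reusable from the previous
    backup (matching size and checksum) and files needing upload.
    Reused entries keep the UUID recorded in the previous metadata."""
    reused, to_upload = {}, {}
    for name, data in local_files.items():
        prev = previous_metadata.get(name)
        if (prev is not None
                and prev["size"] == len(data)
                and prev["checksum"] == hashlib.sha256(data).hexdigest()):
            reused[name] = prev          # already stored under its UUID
        else:
            to_upload[name] = data       # new or changed since backup 0
    return reused, to_upload

# md_shard1_id_0 recorded one file; locally a second segment now exists
prev = {"_0.cfs": {"uuid": "0DD2971A-...", "size": 11,
                   "checksum": hashlib.sha256(b"lucene-data").hexdigest()}}
local = {"_0.cfs": b"lucene-data", "_1.cfs": b"new-segment"}
reused, to_upload = files_to_upload(local, prev)
print(sorted(reused), sorted(to_upload))  # -> ['_0.cfs'] ['_1.cfs']
```

The new md_shard1_id_1 file is then simply the union of the reused entries and the entries produced by uploading the remaining files, which is what makes the backup incremental: unchanged index files are stored exactly once across all backups at a location.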