(Author: Yuan Tian)

  • Recovery is performed at the granularity of the virtual storage group. Each virtual storage group recovers independently and becomes available as soon as its own recovery has finished.
  • Recovery starts from the recover() method of the StorageEngine class. It then proceeds down through logical storage groups, virtual storage groups, time partitions and TsFiles.

Recover the whole IoTDB

org.apache.iotdb.db.engine.StorageEngine.recover()

  1. Set isAllSgReady to false and create recoveryThreadPool, which executes the recovery task of each virtual storage group.

  2. Call the asyncRecover method of each logical storage group. This method submits one recovery task for each of the group's virtual storage groups to recoveryThreadPool. Each asynchronous recovery task works as follows:

    1. Set isVsgReady[i] to false.

    2. Call StorageEngine.buildNewStorageGroupProcessor -> constructor of StorageGroupProcessor -> StorageGroupProcessor.recover to recover the virtual storage group (StorageGroupProcessor.recover is explained in the next section).

    3. Set isVsgReady[i] to true and set virtualStorageGroupProcessor[i] to the recovered virtual storage group. From this point on, users can read and write the data held by this virtual storage group, even if other virtual storage groups are still recovering.

  3. Create an asynchronous thread, recoverEndTrigger, which waits until all virtual storage groups are ready. It then sets isAllSgReady to true and shuts down recoveryThreadPool.
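The steps above can be sketched as a small, self-contained Python model. This is not IoTDB code. The classes LogicalStorageGroup and StorageEngineSketch, their method names and the pool size are hypothetical. StorageGroupProcessor.recover is replaced by a placeholder string. The sketch only shows how a thread pool, the per-virtual-storage-group ready flags and a separate trigger thread fit together.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LogicalStorageGroup:
    """Hypothetical stand-in for one logical storage group and its virtual storage groups."""

    def __init__(self, name, vsg_count):
        self.name = name
        self.is_vsg_ready = [False] * vsg_count                 # isVsgReady[i]
        self.virtual_storage_group_processor = [None] * vsg_count

    def async_recover(self, pool):
        # Submit one recovery task per virtual storage group (asyncRecover).
        return [pool.submit(self._recover_one, i) for i in range(len(self.is_vsg_ready))]

    def _recover_one(self, i):
        self.is_vsg_ready[i] = False
        # Placeholder for buildNewStorageGroupProcessor -> StorageGroupProcessor.recover.
        processor = f"{self.name}/vsg-{i}"
        self.virtual_storage_group_processor[i] = processor
        self.is_vsg_ready[i] = True                             # this VSG now serves reads/writes


class StorageEngineSketch:
    """Hypothetical model of StorageEngine.recover()."""

    def __init__(self, groups):
        self.groups = groups
        self.is_all_sg_ready = threading.Event()                # isAllSgReady

    def recover(self):
        self.is_all_sg_ready.clear()                            # isAllSgReady = false
        pool = ThreadPoolExecutor(max_workers=4)                # recoveryThreadPool
        futures = []
        for group in self.groups:
            futures.extend(group.async_recover(pool))

        def recover_end_trigger():
            for f in futures:
                f.result()                                      # wait for every VSG task
            self.is_all_sg_ready.set()                          # isAllSgReady = true
            pool.shutdown()

        trigger = threading.Thread(target=recover_end_trigger, daemon=True)
        trigger.start()
        return trigger
```

In this sketch, the processor is stored before the ready flag is raised. A reader that sees is_vsg_ready[i] as true therefore never finds an empty slot. isAllSgReady is modeled as a threading.Event so that callers can wait on it.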

Recover one Virtual Storage Group

org.apache.iotdb.db.engine.storagegroup.StorageGroupProcessor.recover()

  1. First, collect all data files ending in .tsfile in the virtual storage group and build a TsFileResource object for each one. The files fall into the following categories:

    • Sequence Files
      • new version: 0.10 version tsfiles(sealed/unsealed)
      • old version: 0.9 version tsfiles(sealed)
    • Unsequence Files
      • new version: 0.10 version tsfiles(sealed/unsealed)
      • old version: 0.9 version tsfiles(sealed)
  2. If 0.9 version TsFiles exist in the storage group, add the old-version sequence files to upgradeSeqFileList and the old-version unsequence files to upgradeUnseqFileList, for upgrade and query.

  3. Group sequence and unsequence files according to time partition id and store as Map<Long, List<TsFileResource>>.

  4. Call the recoverTsFiles method to recover all sequence/unsequence files of each time partition (this method is explained in the next section).
  5. Check whether a modification file left over from an interrupted merge exists, then call the RecoverMergeTask.recoverMerge method to recover the merge.
  6. Call the recoverCompaction method to recover the compaction.
  7. Traverse all sequence/unsequence TsFiles (including old-version files) and call the updatePartitionFileVersion method on each one to update the version number of each time partition.

  8. Call the updateLastestFlushedTime() method to update latestTimeForEachDevice, partitionLatestFlushedTimeForEachDevice and globalLatestFlushedTimeForEachDevice using the old-version (0.9) sequence TsFiles.

    • latestTimeForEachDevice records, for each partition, the latest timestamp inserted for each device (covering both flushed and unflushed data).
    • partitionLatestFlushedTimeForEachDevice records, for each partition, the latest flushed timestamp of each device. It is used to determine whether a newly inserted point is out of order.
    • globalLatestFlushedTimeForEachDevice records the latest flushed timestamp of each device across all partitions (the maximum of the per-partition values).
  9. Finally, traverse all recovered sequence files and update latestTimeForEachDevice, partitionLatestFlushedTimeForEachDevice and globalLatestFlushedTimeForEachDevice again.
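The three maps can be modeled with plain dictionaries to show their roles and how recovered sequence files feed them. This is a hypothetical sketch. LatestTimeTracker, its methods and the per-file device_end_times argument are illustrative names, not IoTDB APIs. The out-of-order rule assumes that a point is out of order when its timestamp is not greater than the device's flushed time in its partition.

```python
from collections import defaultdict


class LatestTimeTracker:
    """Hypothetical model of the three per-device time maps of a virtual storage group."""

    def __init__(self):
        # partition id -> device -> latest inserted timestamp (flushed or not)
        self.latest_time_for_each_device = defaultdict(dict)
        # partition id -> device -> latest flushed timestamp
        self.partition_latest_flushed_time_for_each_device = defaultdict(dict)
        # device -> latest flushed timestamp over all partitions
        self.global_latest_flushed_time_for_each_device = {}

    def update_from_recovered_seq_file(self, partition_id, device_end_times):
        """Fold one recovered sequence file's per-device end times into all three maps."""
        for device, end_time in device_end_times.items():
            for table in (self.latest_time_for_each_device[partition_id],
                          self.partition_latest_flushed_time_for_each_device[partition_id],
                          self.global_latest_flushed_time_for_each_device):
                table[device] = max(table.get(device, end_time), end_time)

    def record_unflushed_insert(self, partition_id, device, timestamp):
        """An insert that is still in memory only moves latestTimeForEachDevice."""
        table = self.latest_time_for_each_device[partition_id]
        table[device] = max(table.get(device, timestamp), timestamp)

    def is_out_of_order(self, partition_id, device, timestamp):
        """A point that is not newer than the partition's flushed time is out of order."""
        flushed = self.partition_latest_flushed_time_for_each_device.get(partition_id, {})
        return timestamp <= flushed.get(device, float("-inf"))
```

Keeping the flushed times per partition means that a point is only compared against data in its own partition. The global map is a summary for lookups that span partitions.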

Recover all TsFiles (Seq/Unseq) in one partition

org.apache.iotdb.db.engine.storagegroup.StorageGroupProcessor.recoverTsFiles(List<TsFileResource> tsFiles, boolean isSeq)

Traverse all the TsFiles passed in and recover them one by one.

  1. Construct a TsFileRecoverPerformer object to recover the TsFile.

  2. Call the recover() method of TsFileRecoverPerformer. Whether the WAL needs to be redone depends on the level of the TsFile (this method is explained in the next section).
    • If the TsFile is not at level 0, recover() does not redo the WAL. The restored TsFileResource is simply added to TsFileManagement, and the remaining steps are skipped.
    • If the TsFile is at level 0, recover() redoes the WAL, and the remaining steps are executed.
  3. Whether to construct a TsFileProcessor depends on whether the TsFile is the last file of the partition.
    • If the TsFile is not the last file, or it cannot be written to, just set the closed attribute of its TsFileResource to true.

    • If the TsFile is the last file of this partition and can still be written to, it is unsealed. Keep it unsealed, construct a TsFileProcessor object for it, and place the TsFileProcessor object in workSequenceTsFileProcessors or workUnsequenceTsFileProcessors.

  4. Finally, add the TsFileResource object into TsFileManagement.
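The per-file decisions in recoverTsFiles can be summarized in a short sketch. This is a simplified model, not IoTDB code. Resource and PartitionState are hypothetical stand-ins for TsFileResource, TsFileManagement and the working-processor maps. The WAL redo performed by TsFileRecoverPerformer is omitted. can_write is assumed to be whatever the TsFile recovery reported for that file.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """Hypothetical stand-in for TsFileResource."""
    name: str
    level: int
    can_write: bool            # assumed result of recovering this TsFile
    closed: bool = False


@dataclass
class PartitionState:
    """Hypothetical stand-in for TsFileManagement plus the working TsFileProcessor."""
    management: list = field(default_factory=list)
    work_processor: Optional[str] = None


def recover_ts_files(resources, state):
    """Mirror the per-file decisions of recoverTsFiles for one partition."""
    for i, res in enumerate(resources):
        # Step 2: a TsFileRecoverPerformer would recover the file here.
        if res.level != 0:
            state.management.append(res)          # no WAL redo, skip the remaining steps
            continue
        # Step 3: only the last, still-writable file keeps a TsFileProcessor.
        is_last = i == len(resources) - 1
        if is_last and res.can_write:
            state.work_processor = f"TsFileProcessor({res.name})"
        else:
            res.closed = True
        state.management.append(res)              # Step 4
    return state
```

For example, for three files where only the last one is at level 0 and still writable, every file is added to the management list. Only that last file keeps a working processor.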

Recover one TsFile

org.apache.iotdb.db.writelog.recover.TsFileRecoverPerformer.recover(boolean needRedoWal, Supplier<ByteBuffer[]> supplier, Consumer<ByteBuffer[]> consumer)

  1. First, use the TsFile to construct a RestorableTsFileIOWriter object. The constructor of RestorableTsFileIOWriter checks the content of the TsFile and truncates it if necessary:

    1. If the file is empty, write MAGIC_STRING and VERSION_NUMBER to it and return directly. In this case, crashed is false and canWrite is true.
    2. If the file has content, construct a TsFileSequenceReader object to parse it and call its selfCheck method:
      1. Initialize truncatedSize to HeaderLength.
      2. If the file is complete (it has a complete header of MAGIC_STRING and VERSION_NUMBER, and a tail of MAGIC_STRING), return TsFileCheckStatus.COMPLETE_FILE.
      3. If the file length is less than HeaderLength (len(MAGIC_STRING) + len(VERSION_NUMBER)), or the file header is not MAGIC_STRING + VERSION_NUMBER, return TsFileCheckStatus.INCOMPATIBLE_FILE.
      4. If the file length is exactly HeaderLength and the file content is MAGIC_STRING + VERSION_NUMBER, return HeaderLength.
      5. If the file length is greater than HeaderLength and the header is valid, but there is no MAGIC_STRING at the end of the file, the file is incomplete and needs to be truncated. Starting right after VERSION_NUMBER, read the chunks that follow and rebuild the ChunkMetadata from the data in each chunk. Each time a CHUNK_GROUP_FOOTER is encountered, the entire ChunkGroup is complete, so update truncatedSize to the current position.
      6. Return truncatedSize.
    3. Truncate the file according to the returned truncatedSize:
      1. If truncatedSize equals TsFileCheckStatus.COMPLETE_FILE, set crashed and canWrite to false and close the output stream of the file.
      2. If truncatedSize equals TsFileCheckStatus.INCOMPATIBLE_FILE, close the output stream of the file and throw an exception.
      3. Otherwise, set crashed and canWrite to true and truncate the file to truncatedSize.
  2. Use the returned RestorableTsFileIOWriter to determine whether the file is complete.

    1. If the TsFile is complete:

      1. If the resource file corresponding to the TsFile exists, deserialize it (it includes the minimum and maximum timestamps of each device in the TsFile) and restore the file version number.
      2. If the resource file corresponding to the TsFile does not exist, regenerate it and persist it to disk.
      3. Return the RestorableTsFileIOWriter and skip the remaining steps.
    2. If the TsFile is incomplete:

      1. Call recoverResourceFromWriter to recover the resource information from the ChunkMetadata held by the RestorableTsFileIOWriter.
      2. Call the redoLogs method. It writes the data in the one or more WAL files corresponding to this file into a temporary memtable, then flushes that memtable into this incomplete TsFile.
        1. For sequence files, skip WAL entries whose timestamp is less than or equal to the device's end time in the recovered resource, because that data is already in the file.
        2. For unsequence files, redo all WAL entries. Some data may therefore be written again, so one device can end up with several ChunkGroups in the file.
      3. If the TsFile is not the last TsFile of the current partition, or a .closing file exists for the TsFile, call the endFile() method of RestorableTsFileIOWriter to seal the file. Then delete the .closing file and generate a resource file for the TsFile.
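The checks and truncation in the RestorableTsFileIOWriter constructor, together with the WAL filtering rule used by redoLogs, can be illustrated on a toy file format. The byte layout here is an assumption made only for this sketch: the version bytes, the one-byte chunk and CHUNK_GROUP_FOOTER markers, the length-prefixed chunk body and the negative sentinel values do not match the real TsFile format or the TsFileCheckStatus constants. ChunkMetadata is not rebuilt. Only the control flow follows the steps above.

```python
MAGIC_STRING = b"TsFile"
VERSION_NUMBER = b"000002"          # hypothetical version bytes for this sketch
HEADER_LENGTH = len(MAGIC_STRING) + len(VERSION_NUMBER)
COMPLETE_FILE = -1                   # hypothetical sentinels, not TsFileCheckStatus values
INCOMPATIBLE_FILE = -2
CHUNK_MARKER = 1                     # toy layout: marker, 1-byte length, payload
CHUNK_GROUP_FOOTER = 0               # toy layout: a single marker byte


def self_check(data: bytes) -> int:
    """Return COMPLETE_FILE, INCOMPATIBLE_FILE, or the size to truncate to."""
    # The header is validated first in this sketch.
    if len(data) < HEADER_LENGTH or data[:HEADER_LENGTH] != MAGIC_STRING + VERSION_NUMBER:
        return INCOMPATIBLE_FILE
    if len(data) == HEADER_LENGTH:
        return HEADER_LENGTH
    if data.endswith(MAGIC_STRING):
        return COMPLETE_FILE
    truncated_size = HEADER_LENGTH
    pos = HEADER_LENGTH
    while pos < len(data):
        marker = data[pos]
        if marker == CHUNK_MARKER:
            if pos + 2 > len(data):
                break                       # length byte missing: chunk is torn
            end = pos + 2 + data[pos + 1]
            if end > len(data):
                break                       # payload only partly written
            pos = end
        elif marker == CHUNK_GROUP_FOOTER:
            pos += 1
            truncated_size = pos            # the whole ChunkGroup is complete
        else:
            break                           # unknown byte: stop scanning
    return truncated_size


def open_restorable(data: bytes):
    """Return (crashed, can_write, content_after_check)."""
    if not data:
        return False, True, MAGIC_STRING + VERSION_NUMBER
    result = self_check(data)
    if result == COMPLETE_FILE:
        return False, False, data
    if result == INCOMPATIBLE_FILE:
        raise ValueError("incompatible TsFile")
    return True, True, data[:result]


def wal_entries_to_redo(entries, device_end_times, is_seq):
    """entries: (device, timestamp) pairs; device_end_times: from the recovered resource."""
    if not is_seq:
        return list(entries)                # unsequence files replay every entry
    return [(device, t) for device, t in entries
            if t > device_end_times.get(device, float("-inf"))]
```

A crash in the middle of a chunk leaves the file truncated back to the end of the last complete ChunkGroup. The data of the partial chunk is then expected to come back from the WAL.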