(Author: Yuan Tian)
- Recovery is performed at the granularity of the virtual storage group. Each virtual storage group recovers independently and becomes available as soon as its own recovery has finished.
- Recovery starts from the recover() method of the StorageGroupProcessor class and is then passed down to the logical storage group, the virtual storage group, the time partition and the TsFile.
Recover the whole IoTDB
org.apache.iotdb.db.engine.StorageEngine.recover()
Set isAllSgReady to false and create recoveryThreadPool, which executes the recovery task of each virtual storage group.
Call the asyncRecover method of each logical storage group, which submits the virtual storage group recovery tasks to recoveryThreadPool. The details of this asynchronous recovery process are as follows:
Set isVsgReady[i] to false.
Set isVsgReady[i] to true and set virtualStorageGroupProcessor[i] to the recovered virtual storage group. After this, users can read from and write to this logical storage group.
Create an asynchronous thread, recoverEndTrigger, that waits until all virtual storage groups are ready, then sets isAllSgReady to true and shuts down recoveryThreadPool.
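The flow above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the StorageEngine code: the class name AsyncRecoverySketch, the pool size, the awaitAllReady helper and the string standing in for a recovered processor are all hypothetical. Only the names isAllSgReady, isVsgReady, virtualStorageGroupProcessor, recoveryThreadPool and recoverEndTrigger come from the text.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch of per-virtual-storage-group asynchronous recovery.
class AsyncRecoverySketch {
  final AtomicBoolean isAllSgReady = new AtomicBoolean(true);
  final AtomicBoolean[] isVsgReady;
  // A String stands in for the recovered processor object (assumption).
  final AtomicReferenceArray<String> virtualStorageGroupProcessor;
  private final CountDownLatch allRecovered;

  AsyncRecoverySketch(int vsgNum) {
    isVsgReady = new AtomicBoolean[vsgNum];
    for (int i = 0; i < vsgNum; i++) {
      isVsgReady[i] = new AtomicBoolean(false);
    }
    virtualStorageGroupProcessor = new AtomicReferenceArray<>(vsgNum);
    allRecovered = new CountDownLatch(vsgNum);
  }

  void recover() {
    isAllSgReady.set(false);
    // Pool size 2 is arbitrary for this sketch.
    ExecutorService recoveryThreadPool = Executors.newFixedThreadPool(2);
    for (int i = 0; i < isVsgReady.length; i++) {
      final int id = i;
      isVsgReady[id].set(false);
      recoveryThreadPool.submit(() -> {
        String recovered = "vsg-" + id; // placeholder for the real recovery work
        virtualStorageGroupProcessor.set(id, recovered);
        isVsgReady[id].set(true); // this group is usable from now on
        allRecovered.countDown();
      });
    }
    // Waits for every group, then flips the global flag and stops the pool.
    Thread recoverEndTrigger = new Thread(() -> {
      try {
        allRecovered.await();
        isAllSgReady.set(true);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        recoveryThreadPool.shutdown();
      }
    }, "recoverEndTrigger");
    recoverEndTrigger.start();
  }

  // Helper for callers that want to block until recovery is done (assumption).
  boolean awaitAllReady(long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!isAllSgReady.get() && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(5);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return isAllSgReady.get();
  }
}
```

Because each task flips only its own isVsgReady[i] flag, a group that finishes early can serve requests while slower groups are still recovering.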
Recover one Virtual Storage Group
org.apache.iotdb.db.engine.storagegroup.StorageGroupProcessor.recover()
First get all the data files ending with .tsfile in the virtual storage group and return the corresponding TsFileResource objects. The file types are as follows:
- Sequence Files
  - new version: 0.10 version tsfiles (sealed/unsealed)
  - old version: 0.9 version tsfiles (sealed)
- Unsequence Files
  - new version: 0.10 version tsfiles (sealed/unsealed)
  - old version: 0.9 version tsfiles (sealed)
If 0.9 version TsFiles exist in the storage group, add the old version's sequence and unsequence files to upgradeSeqFileList and upgradeUnseqFileList respectively, for upgrade and query.
Group the sequence and unsequence files by time partition id and store them as Map<Long, List<TsFileResource>>.
- Call the recoverTsFiles method to recover all sequence/unsequence files of each time partition.
- Check whether there is a Modification file during the merge, and call the RecoverMergeTask.recoverMerge method to recover the merge.
- Call the recoverCompaction method to recover the compaction.
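The grouping into Map<Long, List<TsFileResource>> described above can be sketched as follows. This is an illustration only: the Resource class is a hypothetical stand-in for TsFileResource that carries just a file name and a time partition id, and groupByPartition is not a method of StorageGroupProcessor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of grouping files by time partition id.
class PartitionGroupingSketch {
  // Stand-in for TsFileResource (assumption: only these two fields matter here).
  static final class Resource {
    final String fileName;
    final long timePartitionId;

    Resource(String fileName, long timePartitionId) {
      this.fileName = fileName;
      this.timePartitionId = timePartitionId;
    }
  }

  // Groups resources by partition id, keeping their original order per partition.
  static Map<Long, List<Resource>> groupByPartition(List<Resource> resources) {
    Map<Long, List<Resource>> grouped = new TreeMap<>();
    for (Resource r : resources) {
      grouped.computeIfAbsent(r.timePartitionId, k -> new ArrayList<>()).add(r);
    }
    return grouped;
  }
}
```

Each list in the resulting map can then be handed to the per-partition recovery step on its own.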
Traverse all sequence/unsequence tsfiles (including old version files) and call the updatePartitionFileVersion method on each of them to update the version number of each time partition.
Call the updateLastestFlushedTime() method to update latestTimeForEachDevice, partitionLatestFlushedTimeForEachDevice and globalLatestFlushedTimeForEachDevice with the old version sequence tsfiles.
- latestTimeForEachDevice records, for each partition, the latest timestamp inserted for each device (both flushed and unflushed).
- partitionLatestFlushedTimeForEachDevice records, for each partition, the latest flushed timestamp of each device. It is used to determine whether a newly inserted point is out of order.
- globalLatestFlushedTimeForEachDevice records the latest flushed timestamp of each device (a summary of the latest timestamps across all partitions).
Finally, traverse all restored sequence files to update latestTimeForEachDevice, partitionLatestFlushedTimeForEachDevice and globalLatestFlushedTimeForEachDevice again.
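The relationship between the three maps can be sketched as below. This is a simplified model under assumptions: the class LatestTimeSketch and its methods are hypothetical, devices are plain strings, and the out-of-order rule (a point is out of order when its timestamp is not greater than the flushed time of its device in that partition) is this sketch's reading of the description above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the three latest-time maps.
class LatestTimeSketch {
  // partition id -> device -> latest inserted time (flushed or not)
  final Map<Long, Map<String, Long>> latestTimeForEachDevice = new HashMap<>();
  // partition id -> device -> latest flushed time
  final Map<Long, Map<String, Long>> partitionLatestFlushedTimeForEachDevice = new HashMap<>();
  // device -> latest flushed time over all partitions
  final Map<String, Long> globalLatestFlushedTimeForEachDevice = new HashMap<>();

  // Called for each device end time found in a recovered (flushed) sequence file.
  void onRecoveredFile(long partitionId, String device, long endTime) {
    latestTimeForEachDevice
        .computeIfAbsent(partitionId, k -> new HashMap<>())
        .merge(device, endTime, Long::max);
    partitionLatestFlushedTimeForEachDevice
        .computeIfAbsent(partitionId, k -> new HashMap<>())
        .merge(device, endTime, Long::max);
    globalLatestFlushedTimeForEachDevice.merge(device, endTime, Long::max);
  }

  // A new point is treated as out of order if it is not newer than flushed data.
  boolean isOutOfOrder(long partitionId, String device, long time) {
    long flushed = partitionLatestFlushedTimeForEachDevice
        .getOrDefault(partitionId, Collections.<String, Long>emptyMap())
        .getOrDefault(device, Long.MIN_VALUE);
    return time <= flushed;
  }
}
```

Using a maximum in every update keeps the maps correct regardless of the order in which files are traversed.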
Recover the TsFiles (Seq/Unseq) of each partition
org.apache.iotdb.db.engine.storagegroup.StorageGroupProcessor.recoverTsFiles(List<TsFileResource> tsFiles, boolean isSeq)
Traverse all TsFiles passed in and recover them one by one.
- Construct a TsFileRecoverPerformer object to recover the TsFile.
- Call the recover() method of TsFileRecoverPerformer. Whether to redo the WAL depends on the level of the tsfile.
  - If the tsfile isn't at level 0, the recover() method doesn't need to redo the WAL; it just adds the restored TsFileResource to TsFileManagement and skips the subsequent procedures.
  - If the tsfile is at level 0, the recover() method needs to redo the WAL and execute the subsequent procedures.
- Whether to construct a TsFileProcessor depends on whether the tsfile is the last file.
  - If the tsfile is not the last file or cannot be written into, just set the closed attribute of the TsFileResource to true.
  - If the tsfile is the last file and can be written into, it is the last TsFile of this partition and it is unsealed, so keep it unsealed and construct a TsFileProcessor object for it, then place the TsFileProcessor object in workSequenceTsFileProcessors or workUnsequenceTsFileProcessors.
- Finally, add the TsFileResource object into TsFileManagement.
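The branching described above reduces to a small decision function. The sketch below is hypothetical: the Outcome enum and the decide method do not exist in IoTDB, and the level, isLastFile and canWrite inputs are assumed to have been determined elsewhere.

```java
// Hypothetical sketch of the per-file decision made while recovering TsFiles.
class RecoverTsFilesSketch {
  enum Outcome {
    ADD_WITHOUT_REDO,        // not level 0: no WAL redo, only register the resource
    REDO_THEN_MARK_CLOSED,   // level 0, not the last writable file: redo WAL, set closed
    REDO_THEN_KEEP_UNSEALED  // level 0, last file and writable: redo WAL, build a processor
  }

  static Outcome decide(int level, boolean isLastFile, boolean canWrite) {
    if (level != 0) {
      return Outcome.ADD_WITHOUT_REDO;
    }
    if (isLastFile && canWrite) {
      return Outcome.REDO_THEN_KEEP_UNSEALED;
    }
    return Outcome.REDO_THEN_MARK_CLOSED;
  }
}
```

In every branch the TsFileResource ends up registered in TsFileManagement; the outcomes differ only in whether the WAL is replayed and whether the file stays open for writing.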
Details about recovering a TsFile
org.apache.iotdb.db.writelog.recover.TsFileRecoverPerformer.recover(boolean needRedoWal, Supplier<ByteBuffer[]> supplier, Consumer<ByteBuffer[]> consumer)
First use the tsfile to construct a RestorableTsFileIOWriter object. In the constructor of RestorableTsFileIOWriter, the content of the tsfile is checked and truncated if necessary.
- If there is nothing in this file, write MAGIC_STRING and VERSION_NUMBER to it and return directly. At this time, crashed is false and canWrite is true.
- If there is content in this file, construct a TsFileSequenceReader object to parse the content, call the selfCheck method, truncate the incomplete content and initialize truncatedSize to HeaderLength.
  - If the content of the file is complete (it has a complete header of MAGIC_STRING and VERSION_NUMBER, and a tail of MAGIC_STRING), return TsFileCheckStatus.COMPLETE_FILE.
  - If the file length is less than HeaderLength (len(MAGIC_STRING) + len(VERSION_NUMBER)), or the content of the file header is not MAGIC_STRING + VERSION_NUMBER, return INCOMPATIBLE_FILE.
  - If the file length is exactly equal to HeaderLength, and the file content is MAGIC_STRING + VERSION_NUMBER, then return HeaderLength.
  - If the file length is greater than HeaderLength and the file header is legal, but there is no MAGIC_STRING at the end of the file, the file is incomplete and needs to be truncated. Read from the VERSION_NUMBER position, read out the data in the following chunk, and recover the ChunkMetadata based on the data in the chunk. If a CHUNK_GROUP_FOOTER is encountered, the entire ChunkGroup is complete, so update truncatedSize to the current position.
  - Return truncatedSize.
- Truncate the file according to the returned truncatedSize.
  - If truncatedSize is equal to TsFileCheckStatus.COMPLETE_FILE, set crashed and canWrite to false, and close the output stream of the file.
  - If truncatedSize is equal to TsFileCheckStatus.INCOMPATIBLE_FILE, close the output stream of the file and throw an exception.
  - Otherwise, set crashed and canWrite to true and truncate the file to truncatedSize.
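The self-check outcomes listed above can be illustrated on an in-memory byte array. This sketch makes several assumptions and is not the TsFileSequenceReader implementation: the magic and version byte values, the sentinel values of COMPLETE_FILE and INCOMPATIBLE_FILE, and the endOfLastCompleteChunkGroup parameter (which stands in for the chunk-by-chunk scan) are all placeholders chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the header/tail check and the resulting writer state.
class SelfCheckSketch {
  static final byte[] MAGIC_STRING = ascii("TsFile");   // illustrative value
  static final byte[] VERSION_NUMBER = ascii("000002"); // illustrative value
  static final int HEADER_LENGTH = MAGIC_STRING.length + VERSION_NUMBER.length;
  static final long COMPLETE_FILE = -1;     // placeholder sentinel
  static final long INCOMPATIBLE_FILE = -2; // placeholder sentinel

  static byte[] ascii(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }

  private static boolean matchesAt(byte[] data, int offset, byte[] expected) {
    if (offset < 0 || offset + expected.length > data.length) {
      return false;
    }
    for (int i = 0; i < expected.length; i++) {
      if (data[offset + i] != expected[i]) {
        return false;
      }
    }
    return true;
  }

  // endOfLastCompleteChunkGroup replaces the real scan for CHUNK_GROUP_FOOTER.
  static long selfCheck(byte[] file, long endOfLastCompleteChunkGroup) {
    boolean headerOk = matchesAt(file, 0, MAGIC_STRING)
        && matchesAt(file, MAGIC_STRING.length, VERSION_NUMBER);
    if (file.length < HEADER_LENGTH || !headerOk) {
      return INCOMPATIBLE_FILE;
    }
    if (file.length == HEADER_LENGTH) {
      return HEADER_LENGTH;
    }
    boolean tailOk = file.length >= HEADER_LENGTH + MAGIC_STRING.length
        && matchesAt(file, file.length - MAGIC_STRING.length, MAGIC_STRING);
    if (tailOk) {
      return COMPLETE_FILE;
    }
    // truncatedSize starts at the header and only moves forward.
    return Math.max(HEADER_LENGTH, endOfLastCompleteChunkGroup);
  }

  // Returns {crashed, canWrite} for a check result, or throws for incompatible files.
  static boolean[] crashedAndCanWrite(long truncatedSize) {
    if (truncatedSize == COMPLETE_FILE) {
      return new boolean[] {false, false};
    }
    if (truncatedSize == INCOMPATIBLE_FILE) {
      throw new IllegalStateException("not a compatible TsFile");
    }
    return new boolean[] {true, true};
  }
}
```

Returning a position for the truncatable cases and a negative sentinel for the two terminal cases lets a single long carry both the status and the truncation point.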
Judge whether the file is complete from the returned RestorableTsFileIOWriter.
If the TsFile is complete
- If the resource file corresponding to the TsFile exists, the resource file is deserialized (including the minimum and maximum timestamps of each device in the tsfile), and the file version number is restored.
- If the resource file corresponding to the TsFile does not exist, regenerate the resource file and persist it to disk.
- Return the generated RestorableTsFileIOWriter.
If the TsFile is incomplete
- Call recoverResourceFromWriter to recover the resource information from the ChunkMetadata information in RestorableTsFileIOWriter.
- Call the redoLogs method to write the data in the one or more wal files corresponding to this file to a temporary Memtable and persist it to this incomplete TsFile.
  - For sequence files, skip WAL entries whose timestamp is less than or equal to the end time already recorded in the current resource.
  - For unsequence files, redo all WAL entries. Data may be written repeatedly to the ChunkGroups of multiple devices.
- If the TsFile is not the last TsFile of the current partition, or there is a .closing file for the TsFile, call the endFile() method of RestorableTsFileIOWriter to seal the file, delete the .closing file and generate a resource file for it.
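The difference between replaying the WAL for sequence and unsequence files can be sketched as a filter. This is a hypothetical illustration: WalEntry, entriesToRedo and the per-device end-time map are assumptions of this sketch (in particular, comparing against a per-device end time is one reading of "the current resource"), and the real redoLogs method works on WAL plans and a Memtable rather than on plain lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of which WAL entries are replayed during recovery.
class RedoLogsSketch {
  static final class WalEntry {
    final String device;
    final long time;

    WalEntry(String device, long time) {
      this.device = device;
      this.time = time;
    }
  }

  // resourceEndTime: device -> end time recovered from the file's ChunkMetadata.
  static List<WalEntry> entriesToRedo(
      List<WalEntry> wal, Map<String, Long> resourceEndTime, boolean isSequence) {
    List<WalEntry> toRedo = new ArrayList<>();
    for (WalEntry e : wal) {
      long endTime = resourceEndTime.getOrDefault(e.device, Long.MIN_VALUE);
      // Sequence files: data up to endTime is already in the file, so skip it.
      if (isSequence && e.time <= endTime) {
        continue;
      }
      // Unsequence files: every entry is replayed, even if it duplicates data.
      toRedo.add(e);
    }
    return toRedo;
  }
}
```

Skipping is safe only for sequence files because their timestamps grow monotonically per device; an unsequence file has no such ordering, so everything is replayed.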