...

Every LSM component, whether a memory or a disk component, now has a component Id. A component Id is represented as an interval of two timestamps. A memory component has a mutable Id, which is reset every time the memory component is recycled. Disk components have immutable Ids, which are persisted in the component metadata. When creating a component, the caller needs to populate the proper Id for the resulting component.
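As a rough illustration of the paragraph above, the following sketch models an Id as a pair of timestamps, with a mutable Id on the memory component and a fixed Id on the disk component. All class, field, and method names here are hypothetical and chosen for illustration only; they are not taken from the codebase.

```java
// Hypothetical sketch: names are illustrative, not the codebase's API.
import java.util.Objects;

/** An Id modelled as an interval of two timestamps. */
final class ComponentIdSketch {
    private final long minTs; // first timestamp of the interval
    private final long maxTs; // second timestamp of the interval

    ComponentIdSketch(long minTs, long maxTs) {
        this.minTs = minTs;
        this.maxTs = maxTs;
    }

    long getMinTs() { return minTs; }

    long getMaxTs() { return maxTs; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ComponentIdSketch)) {
            return false;
        }
        ComponentIdSketch other = (ComponentIdSketch) o;
        return minTs == other.minTs && maxTs == other.maxTs;
    }

    @Override
    public int hashCode() { return Objects.hash(minTs, maxTs); }

    @Override
    public String toString() { return "[" + minTs + ", " + maxTs + "]"; }
}

/** Memory component: the Id is mutable and replaced when the component is recycled. */
final class MemoryComponentSketch {
    private ComponentIdSketch id;

    MemoryComponentSketch(ComponentIdSketch id) { this.id = id; }

    ComponentIdSketch getId() { return id; }

    void resetId(ComponentIdSketch newId) { this.id = newId; }
}

/** Disk component: the Id is fixed at creation (persisted in metadata in the real system). */
final class DiskComponentSketch {
    private final ComponentIdSketch id;

    DiskComponentSketch(ComponentIdSketch id) { this.id = id; }

    ComponentIdSketch getId() { return id; }
}
```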

Memory Component

A memory component receives its Id when it is created (activated). Currently we use LSMIOOperationCallback to maintain component Ids, in the same way as component LSNs are maintained. For all indexes of a dataset partition, we have to guarantee that their memory components receive the same Id upon activation. To achieve this, we introduce the LSM component Id generator, which is shared across the dataset. The Id generator (CorrelatedLSMComponentIdGenerator in the codebase) supports two operations, GetId and RefreshId. GetId always returns the same Id as long as RefreshId has not been called. After RefreshId is called, GetId is guaranteed to return a new Id based on the current timestamp. Memory components call GetId to obtain their Id from the generator, so they all receive the same Id upon activation. Before old components are flushed, we call RefreshId on the generator so that the new memory components receive a new Id.
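The GetId/RefreshId contract described above can be sketched as follows. This is a minimal illustration, not the actual CorrelatedLSMComponentIdGenerator: the class name, the nested `Id` type, the injected clock, and the rule that bumps the timestamp when the clock has not advanced are all assumptions made so the example is self-contained and deterministic.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of an Id generator with GetId and RefreshId. */
final class IdGeneratorSketch {
    /** Minimal stand-in for a component Id: an interval of two timestamps. */
    static final class Id {
        final long min;
        final long max;

        Id(long min, long max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).min == min && ((Id) o).max == max;
        }

        @Override
        public int hashCode() { return Long.hashCode(min) * 31 + Long.hashCode(max); }
    }

    private final LongSupplier clock; // assumption: injected timestamp source
    private long lastTs = Long.MIN_VALUE;
    private Id current;

    IdGeneratorSketch(LongSupplier clock) {
        this.clock = clock;
        refreshId(); // start with an Id based on the current timestamp
    }

    /** Returns the same Id until refreshId() is called. */
    synchronized Id getId() {
        return current;
    }

    /** Makes subsequent getId() calls return a new Id based on the current timestamp. */
    synchronized void refreshId() {
        // Assumption: if the clock has not advanced, bump by one so the Id is still new.
        long ts = Math.max(clock.getAsLong(), lastTs + 1);
        lastTs = ts;
        current = new Id(ts, ts);
    }
}
```

Because every index of the dataset reads from the same generator instance, the memory components that call `getId()` between two refreshes all observe the same Id.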

Here is the basic workflow of Id management for memory components during a flush:

  • PrimaryIndexOpTracker: call RefreshId
  • PrimaryIndexOpTracker: call updateLSN (which stores the latest Id from the Id generator)
  • PrimaryIndexOpTracker: schedule the flush of memory components
  • When new memory components are activated, they fetch the previously stored Id.
  • ...
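The listed steps can be sketched roughly as below. This is a simplified, single-threaded model under stated assumptions: the class and its methods are hypothetical, a counter stands in for the timestamp-based Id, and the actual flush I/O as well as any steps elided from the list are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of Id handling around a flush; not the codebase's classes. */
final class FlushWorkflowSketch {
    private long generatorId = 0;          // stand-in for the shared Id generator's current Id
    private long storedId = generatorId;   // Id stored by the callback, read on activation
    final List<String> trace = new ArrayList<>();

    /** Step 1: the generator moves on to a new Id. */
    void refreshId() {
        generatorId++;
        trace.add("refreshId");
    }

    /** Step 2: the callback stores the latest Id from the generator. */
    void updateLSN() {
        storedId = generatorId;
        trace.add("updateLSN");
    }

    /** Step 3: the flush of the old memory components is scheduled (not modelled). */
    void scheduleFlush() {
        trace.add("scheduleFlush");
    }

    /** Step 4: a newly activated memory component fetches the previously stored Id. */
    long activateNewMemoryComponent() {
        trace.add("activate");
        return storedId;
    }

    /** Runs steps 1 to 3 in the order given in the list. */
    void flush() {
        refreshId();
        updateLSN();
        scheduleFlush();
    }
}
```

In this model, every memory component activated after one `flush()` call reads the same stored Id, which mirrors the requirement that all indexes of a dataset partition receive the same Id upon activation.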

Disk Component

A disk component has an immutable Id, which is persisted in the component metadata. When a disk component is created, the caller needs to set its Id properly. A disk component can be created in the following cases:

...

Currently we only allow an empty dataset to be loaded once. For simplicity, we simply assign the bulk-loaded components the Id [0, 0].

External Dataset

Component Id acceleration is mainly designed for internal datasets. However, for consistency in the codebase, components of external datasets always get the Id [0, 0].

...