...

When the content of a database is delivered by a CDC system, the dataset itself (the database) is limited in size, but its changes can continue forever, so it is unbounded from the receiver's perspective. The concept of versioning isn't necessary here. It is important, however, for the adjunct data store to be able to provide a consistent snapshot of the database. This can be achieved as long as the CDC system provides ordering and at-least-once delivery guarantees. When a dataset is delivered through one stream, we guarantee consistency at container level, i.e. once bootstrap is complete the adjunct data store can be treated as a snapshot of the database within a container. When a dataset is delivered through multiple streams, we guarantee consistency at task level, i.e. once bootstrap is complete the adjunct data store can be treated as a fraction of a snapshot of the database within a task. No guarantee is provided at job level.
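To illustrate why ordering plus at-least-once delivery is enough for a consistent snapshot, here is a minimal sketch in Python. It is not the actual implementation: `ChangeEvent`, `AdjunctStore` and `replay` are hypothetical names, and the sketch assumes each change event carries an offset that increases in delivery order. Replayed events are skipped by offset, so a stream containing duplicates converges to the same state as a clean one.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC change event (not a real connector type)."""
    offset: int            # position in the stream; increases with delivery order
    key: str               # primary key of the changed row
    value: Optional[str]   # new row value, or None for a delete


@dataclass
class AdjunctStore:
    """Hypothetical adjunct data store holding the latest value per key."""
    rows: Dict[str, str] = field(default_factory=dict)
    last_offset: int = -1

    def apply(self, event: ChangeEvent) -> None:
        # At-least-once delivery may replay events; skip anything already applied.
        if event.offset <= self.last_offset:
            return
        if event.value is None:
            self.rows.pop(event.key, None)      # delete
        else:
            self.rows[event.key] = event.value  # insert or update
        self.last_offset = event.offset


def replay(events: Iterable[ChangeEvent]) -> AdjunctStore:
    """Apply events in delivery order and return the resulting store."""
    store = AdjunctStore()
    for event in events:
        store.apply(event)
    return store
```

For example, replaying offsets 0, 1, 2 and then rewinding to offset 1 (as could happen after a restart) leaves the store in the same state as a single clean pass over offsets 0 through 3.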

 

Bounded dataset

When the sources are read-only files, for example a machine learning model, they are by nature size-bounded. However, we should expect new versions of the dataset to be produced over time. It is desirable to be able to incorporate new versions without interrupting current operation. Similar to an unbounded dataset, a copy of a set of files can be produced by the bootstrap process, after which processing of the main input follows. This requires the delivery system (system connector) to be able to inject markers in a stream to signal the end of a dataset. When an adjunct data store sees the marker, it knows the current dataset is complete, and it can "seal" the store and prepare for the next version. Any data coming thereafter is stored in the next version. While building a new version, an adjunct data store continues to serve the current version. After the new version is built, it switches to the new version and discards the old one. This can work seamlessly, and users would only ever see one version at a time.
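The seal-and-switch behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the actual design: `END_OF_DATASET` stands in for whatever marker the system connector injects, and `VersionedStore` is a hypothetical in-memory store. Readers only ever see the last sealed version, while incoming records accumulate in a separate, invisible next version.

```python
from typing import Dict, Optional

# Hypothetical end-of-dataset marker injected by the system connector.
END_OF_DATASET = object()


class VersionedStore:
    """Hypothetical adjunct store that serves one sealed version at a time."""

    def __init__(self) -> None:
        self.version = 0                      # number of versions sealed so far
        self._current: Dict[str, str] = {}    # sealed version visible to readers
        self._building: Dict[str, str] = {}   # next version, invisible until sealed

    def on_message(self, message) -> None:
        if message is END_OF_DATASET:
            # Seal the version being built, switch readers to it,
            # and drop the reference to the old version.
            self._current = self._building
            self._building = {}
            self.version += 1
        else:
            key, value = message
            self._building[key] = value

    def get(self, key: str) -> Optional[str]:
        # Reads never touch the version under construction.
        return self._current.get(key)
```

In this sketch the switch is a single reference assignment, so a reader sees either the old version or the new one, never a mix of the two.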

When a file is delivered through one stream (unpartitioned), we guarantee a consistent snapshot (copy) of the entire file at container level; when a file is delivered through multiple streams (partitioned), we guarantee a consistent fraction of the snapshot at task level. No guarantee is provided at job level.

...