...

We want an adjunct data (AD) store: a read-only cache that automatically stores data from a stream for later use. Adjunct data can be accessed in the same way as a key-value store in Samza. In addition, we guarantee a consistent view of the data from a Samza task’s perspective. Data can be either partitioned or unpartitioned. If the dataset is small enough to fit in a single RocksDB instance, the same copy is populated in every container via a broadcast stream. If it is too large to fit in one database instance, it is partitioned across the containers of a Samza job.
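The Java sketch below illustrates the behaviour described above. It makes several assumptions. A plain `HashMap` stands in for the local store (e.g. RocksDB). The class and all of its methods (`AdjunctDataStore`, `onAdjunctMessage`, `readOnlyView`, `shouldBroadcast`, `partitionFor`) are hypothetical and are not part of Samza's API. The framework populates the store from the adjunct stream, and the task only ever receives a read-only view. A size check picks between broadcasting and partitioning. For partitioned data, a simple hash-mod routing is used, which is not necessarily the partitioner Samza or Kafka applies. The sketch does not implement the consistency guarantee itself. It assumes updates are applied on the task's own thread, between `process()` calls.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical adjunct data store sketch; a HashMap stands in for RocksDB.
 * Not Samza's API.
 */
class AdjunctDataStore<K, V> {
    private final Map<K, V> backing = new HashMap<>();

    /**
     * Invoked by the framework (not by task code) for each message consumed
     * from the adjunct stream. Assumes updates run on the task thread between
     * process() calls, so the task never observes a half-applied update.
     */
    void onAdjunctMessage(K key, V value) {
        if (value == null) {
            backing.remove(key); // treat a null value as a delete (tombstone)
        } else {
            backing.put(key, value);
        }
    }

    /** Read-only view handed to the task; any write throws UnsupportedOperationException. */
    public Map<K, V> readOnlyView() {
        return Collections.unmodifiableMap(backing);
    }

    /** Broadcast the full copy to every container if it fits in one store instance. */
    static boolean shouldBroadcast(long datasetBytes, long instanceCapacityBytes) {
        return datasetBytes <= instanceCapacityBytes;
    }

    /** For partitioned data, route a key to one of numPartitions (hypothetical hash-mod scheme). */
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Returning an unmodifiable view, rather than the backing map, keeps the cache read-only from the task's side. Only the framework's update path can change its contents.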

  

In principle, an AD store could be either local (RocksDB or MemDB) or centralized (Couchbase). However, we believe that the use of a centralized data store is largely a side effect of the lack of a local adjunct data store. For now, we defer support for a centralized adjunct data store until we see clear evidence that it is needed.

...

  • Automatic maintenance of the local cache
  • Table-oriented operations for the fluent API

...

Proposed Changes

 

 


Public Interfaces

...