...

The input data source for the in-memory system can be broadly classified as bounded or unbounded. This SEP limits the input source to bounded, immutable data, which simplifies both the view of the data and the initialization step for consumers. The in-memory system for intermediate streams, however, supports both bounded and unbounded data. The sink (the output source) is modeled as mutable.

 

Data Type

Samza has a pluggable system design that allows users to implement their own system consumers. Typically, consumers consume raw messages and wrap them using IME. However, some systems may introduce a subclass of IME and pass it to tasks. For this reason, the in-memory collection needs to support different data types.

  1. Raw messages: The in-memory system behaves like a typical consumer and wraps the raw message using IME. It populates the offset and key fields of the message. Note that the offset is defined as the position of the data in the collection, and the key is the hash code of the raw message.
  2. Type of IME: The in-memory system acts as a pass-through system consumer, passing the actual message envelope to the task without any wrapping.
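
The two cases above can be illustrated with a minimal sketch. It is not the actual in-memory system implementation: `Envelope` is a hypothetical stand-in for IME, and `InMemoryEnvelopeSketch` and `toEnvelopes` are names invented for this example. It only assumes what the list states: raw messages get the collection position as offset and the hash code as key, while envelopes are passed through untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class InMemoryEnvelopeSketch {

  // Hypothetical stand-in for IME; the real envelope class carries more fields.
  static class Envelope {
    final String offset;
    final Object key;
    final Object message;

    Envelope(String offset, Object key, Object message) {
      this.offset = offset;
      this.key = key;
      this.message = message;
    }
  }

  // Converts an in-memory collection into envelopes for delivery to a task.
  static List<Envelope> toEnvelopes(List<?> collection) {
    List<Envelope> out = new ArrayList<>(collection.size());
    for (int i = 0; i < collection.size(); i++) {
      Object item = collection.get(i);
      if (item instanceof Envelope) {
        // Case 2: already an envelope (or a subclass), pass it through as-is.
        out.add((Envelope) item);
      } else {
        // Case 1: raw message, offset = position in the collection,
        // key = hash code of the raw message.
        out.add(new Envelope(String.valueOf(i), Objects.hashCode(item), item));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Object> input = Arrays.asList(
        "raw-message", new Envelope("7", "custom-key", "pre-wrapped"));
    for (Envelope e : toEnvelopes(input)) {
      System.out.println(e.offset + " -> key=" + e.key + ", message=" + e.message);
    }
  }
}
```

In this sketch the pre-wrapped envelope keeps whatever offset and key its producer assigned, so the position-based offset applies only to raw messages.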

...