...

 

How it works for end-of-stream:

  1. When an input stream is consumed to the end, Samza sends an EOS message to the control channel. The message includes the input topic and partition.

  2. Once EOS messages have been received from all partitions of an input, that input is known to be at end-of-stream. The ControlStreamConsumer then inspects the stream graph to find each intermediate stream whose input streams have all reached end-of-stream, and marks it as pending end-of-stream. After that, whenever a partition of a marked intermediate stream reaches its highest offset (the high watermark in Kafka), Samza can emit an end-of-stream message for that partition. At that point the partition is guaranteed to have reached end of stream.


Approach 2: In-band control messages

In this approach we don't use a separate stream to keep the control messages. The intermediate streams themselves carry both data and control messages.


 

How it works for end-of-stream:

  1. When an input stream is consumed to the end, Samza uses the topology of the operator graph to find the downstream intermediate streams whose inputs have all reached end-of-stream.

  2. The task sends an EOS message to every partition of the intermediate streams found in step 1.

  3. Each consumer of an intermediate stream counts the EOS messages received for each partition and declares end-of-stream for that partition once the EOS messages from all producing tasks have been received.
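Step 1 above can be sketched as follows. This is a minimal illustration, not Samza code: the class `EosPropagator`, its method `onInputEnded`, and the representation of the topology as a map from intermediate stream id to input stream ids are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Samza): tracks which inputs have ended and
// reports the intermediate streams that are ready to receive EOS messages.
final class EosPropagator {
  // Intermediate stream id -> ids of the input streams that feed it.
  private final Map<String, Set<String>> inputsByIntermediate;
  private final Set<String> endedInputs = new HashSet<>();
  private final Set<String> alreadyReported = new HashSet<>();

  EosPropagator(Map<String, Set<String>> inputsByIntermediate) {
    this.inputsByIntermediate = inputsByIntermediate;
  }

  // Records that `input` reached end-of-stream and returns the intermediate
  // streams whose inputs have now all ended. Each stream is returned once.
  List<String> onInputEnded(String input) {
    endedInputs.add(input);
    List<String> ready = new ArrayList<>();
    for (Map.Entry<String, Set<String>> e : inputsByIntermediate.entrySet()) {
      if (endedInputs.containsAll(e.getValue()) && alreadyReported.add(e.getKey())) {
        ready.add(e.getKey());
      }
    }
    Collections.sort(ready); // deterministic order for callers
    return ready;
  }
}
```

For each stream returned by `onInputEnded`, the task would then write one EOS message to every partition of that stream, as described in step 2.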


Comparisons of the two approaches:

 

 

Approach 1

Pros:

- Intermediate streams are clean and contain only user data. This is convenient if the user wants to consume them elsewhere.

- Simple recovery from failure: just read the control stream from the beginning.

- Fewer messages. One control message is needed per input stream partition, so n partitions require n messages in total.

Cons:

- Need to correlate the out-of-band control message with the source stream, which is complex to track and requires synchronization between the input streams and the control stream.

- Need to maintain a separate stream for control messages.

Approach 2

Pros:

- No coordination needed between control messages and input messages. When a control message is received, it is a marker that the messages sent before the control message have been consumed completely. This is critical to support general event-time watermarks.

Cons:

- Complicated failure scenario. The consumer of control messages needs to checkpoint the control messages received, so that it can resume after recovering from a failure.

- More control messages required. Each producer task (n tasks) must write a control message to every partition of the intermediate stream (m partitions), so the total is n*m messages.
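To make the difference in message counts concrete, the arithmetic can be written out as a small sketch. The class and method names here are hypothetical and exist only for illustration.

```java
// Hypothetical helper: control-message counts for the two approaches.
final class ControlMessageCost {
  // Approach 1: one control message per input stream partition (n).
  static long outOfBand(int inputPartitions) {
    return inputPartitions;
  }

  // Approach 2: each of the n producer tasks writes to each of the
  // m intermediate stream partitions (n * m).
  static long inBand(int producerTasks, int intermediatePartitions) {
    return (long) producerTasks * intermediatePartitions;
  }
}
```

For example, with 8 input partitions consumed by 8 producer tasks and an intermediate stream of 16 partitions, the out-of-band approach needs 8 control messages while the in-band approach needs 128.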

 

Based on the pros and cons above, we propose to use the in-band approach to support control messages.

Design details

Intermediate Stream Message Format:

The format of the intermediate stream message:

Code Block
IntermediateMessage =>  [MessageType MessageData]
  MessageType => byte
  MessageData => byte[]

  MessageType => [0(UserMessage), 1(Watermark), 2(EndOfStream)]
  MessageData => [UserMessage/ControlMessage]
  ControlMessage =>
     Version => int
     TaskName => String
     TaskCount => int
     Other Message Data (based on different types of control message)

For user messages, we will use the user-provided serde (the default is the system serde). For control messages, we will use the JSON serde, since it is built into Samza and easy to parse.
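The framing described above (one type byte followed by the serialized payload) might look like the following sketch. The class `IntermediateMessageCodec` and its methods are hypothetical; the payload is treated as opaque bytes, so the user serde or JSON serde would be applied before `encode` and after `dataOf`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical codec (not Samza's implementation) for the layout
// IntermediateMessage => [MessageType MessageData].
final class IntermediateMessageCodec {
  static final byte USER_MESSAGE = 0;
  static final byte WATERMARK = 1;
  static final byte END_OF_STREAM = 2;

  // Prepends the one-byte message type to the already-serialized payload.
  static byte[] encode(byte type, byte[] data) {
    return ByteBuffer.allocate(1 + data.length).put(type).put(data).array();
  }

  // Reads the message type from the first byte.
  static byte typeOf(byte[] message) {
    return message[0];
  }

  // Returns the payload bytes that follow the type byte.
  static byte[] dataOf(byte[] message) {
    return Arrays.copyOfRange(message, 1, message.length);
  }
}
```

A consumer would first call `typeOf` to choose the serde, then deserialize the bytes returned by `dataOf` with it.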

ControlMessage

We will support two types of ControlMessage: EndOfStreamMessage and WatermarkMessage

Code Block
// Base class: fields shared by every control message.
public abstract class ControlMessage {
  private final String taskName;  // name of the producing task
  private final int taskCount;    // total number of producing tasks
  private int version = 1;

  public ControlMessage(String taskName, int taskCount) {
    this.taskName = taskName;
    this.taskCount = taskCount;
  }

  public String getTaskName() {
    return taskName;
  }

  public int getTaskCount() {
    return taskCount;
  }

  public void setVersion(int version) {
    this.version = version;
  }

  public int getVersion() {
    return version;
  }
}

// Marks the end of the stream identified by streamId.
public class EndOfStreamMessage extends ControlMessage {
  private final String streamId;

  public EndOfStreamMessage(String streamId, String taskName, int taskCount) {
    super(taskName, taskCount);
    this.streamId = streamId;
  }

  public String getStreamId() {
    return streamId;
  }
}

// Carries the event-time watermark reported by a producing task.
public class WatermarkMessage extends ControlMessage {
  private final long timestamp;

  public WatermarkMessage(long timestamp, String taskName, int taskCount) {
    super(taskName, taskCount);
    this.timestamp = timestamp;
  }

  public long getTimestamp() {
    return timestamp;
  }
}

Reconciliation

For EOS/watermark messages, Samza triggers the event on a consumer task only after it has received the message from all the producers (the tasks of the previous stage). So when SystemConsumers receives EOS/watermark messages, Samza needs to count them against the total number of producing tasks. The counting works as follows:

  1. For each intermediate stream partition, Samza keeps track of the end-of-stream/watermark messages received from the producing tasks and counts the distinct tasks from which it has received a message.

  2. When the count matches the total task count, Samza emits an end-of-stream/watermark message to the task assigned to this stream partition.

  3. When Samza receives further watermark messages, it emits a watermark with the earliest event time across all the stream partitions. Nothing is emitted if the earliest event time has not changed.
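The counting steps above can be sketched for a single intermediate stream partition as follows. This is an illustrative sketch, not Samza's implementation: the class `ControlMessageAggregator` and its methods are hypothetical, and it assumes each producing task is identified by the `taskName` carried in the control message. Taking the minimum across several partitions (step 3) would follow the same pattern one level up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.OptionalLong;
import java.util.Set;

// Hypothetical aggregator for ONE intermediate stream partition.
final class ControlMessageAggregator {
  private final Set<String> eosTasks = new HashSet<>();
  private final Map<String, Long> watermarkByTask = new HashMap<>();
  private long lastEmitted = Long.MIN_VALUE;

  // Returns true exactly once: when EOS has arrived from all producing tasks.
  // Duplicate EOS messages from the same task are ignored.
  boolean onEndOfStream(String taskName, int taskCount) {
    boolean firstFromTask = eosTasks.add(taskName);
    return firstFromTask && eosTasks.size() == taskCount;
  }

  // Returns the watermark to emit, or empty when not every task has reported
  // yet or the earliest event time has not advanced.
  OptionalLong onWatermark(String taskName, int taskCount, long timestamp) {
    watermarkByTask.merge(taskName, timestamp, Long::max);
    if (watermarkByTask.size() < taskCount) {
      return OptionalLong.empty();
    }
    long earliest = Collections.min(watermarkByTask.values());
    if (earliest <= lastEmitted) {
      return OptionalLong.empty();
    }
    lastEmitted = earliest;
    return OptionalLong.of(earliest);
  }
}
```

Keeping the latest timestamp per task, rather than a plain counter, is what allows the aggregator to recompute the earliest event time when later watermarks arrive.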

Checkpoint control messages

In a failure scenario, the latest control messages received from each intermediate stream partition would be lost without checkpointing. Since these messages are needed for counting and triggering, we need to checkpoint them, preserving both the intermediate stream partition and the producing task information. A checkpoint will be:

Code Block
Key => IntermediateStreamPartition.ControlMessageType
Value => ControlMessageCheckpoint
 
public class ControlMessageCheckpoint {
 int taskCount;
 Map<String, Long> tasksToEventTime;
}
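As a sketch of how a restored checkpoint could be used to resume counting, consider the helper below. Everything beyond the two fields `taskCount` and `tasksToEventTime` is an assumption for illustration: the class name `CheckpointedControlState`, its methods, and the use of "." as the key separator are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around the checkpointed fields shown above.
final class CheckpointedControlState {
  private final int taskCount;
  private final Map<String, Long> tasksToEventTime;

  // Copies the restored map so later updates do not touch the caller's map.
  CheckpointedControlState(int taskCount, Map<String, Long> tasksToEventTime) {
    this.taskCount = taskCount;
    this.tasksToEventTime = new HashMap<>(tasksToEventTime);
  }

  // Assumed key layout: "<intermediate stream partition>.<control message type>".
  static String key(String streamPartition, String controlMessageType) {
    return streamPartition + "." + controlMessageType;
  }

  // Records a control message seen after recovery; keeps the latest time per task.
  void record(String taskName, long eventTime) {
    tasksToEventTime.merge(taskName, eventTime, Long::max);
  }

  // True once every producing task has reported at least one message.
  boolean allTasksReported() {
    return tasksToEventTime.size() == taskCount;
  }

  // Earliest event time across tasks; call only after a task has reported,
  // because Collections.min throws on an empty collection.
  long earliestEventTime() {
    return Collections.min(tasksToEventTime.values());
  }
}
```

After a restart, the consumer would rebuild this state from the checkpoint and continue calling `record` for newly received control messages, so tasks that reported before the failure do not need to resend.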

 

 


 

 

...