Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The concept of watermarks as used in Flink is inspired by Google's MillWheel: http://research.google.com/pubs/pub41378.html and by work building on top of it for Google Cloud Dataflow API: http://strataconf.com/big-data-conference-uk-2015/public/schedule/speaker/193653

Ordering Records

 

  • Ordering for OrderedKeyDataStream
  • Ordering for windows

 

 Elements in a stream can be ordered by timestamp. Normally, it would not be possible to order elements of an infinite stream. By using watermarks, however, Flink can order elements in a stream. The ordering operator has to buffer all elements it receives. Then, when it receives a watermark it can sort all elements that have a timestamp that is lower than the watermark and emit them in the sorted order. This is correct because the watermark signals that not more elements can arrive that would be intermixed with the sorted elements. The following code example and diagram illustrate this idea:

 

Code Block
languagescala
titleDataStream Example
linenumberstrue
val stream: DataStream[MyType] = env.createStream(new KafkaSource(...))
val orderedStream: OrderedKeyDataStream[MyType] = stream.keyBy(...).orderByTime()
val mapped: DataStream[MyType] = orderedStream.flatMap { ... }

 

 

 Using the same method, Flink can also sort the input of a windowing operator. This can be useful if you want to have window trigger policies that react to changes in the data and need the input in sorted order. We will come back to this when we look at windowing policies further below.

Timestamps on Results of Window Operations

...