You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 5 Next »

NOTE: This document is a design draft and does not reflect the state of a stable release, but the work in progress on the current master, and future considerations.

Before implementing operations that rely in some way on time it is necessary to define the different concepts of time in Flink and how time is handled in a streaming topology.

Different Types of Time

In Flink Streaming, every element has a timestamp attached to it. When a value enters a streaming topology through a source this source attaches a timestamp to the value. The timestamp can either be the current system time of the source (ingress time) or it can be a timestamp that is extracted from the value (event time).

Some operations, such a time-windows and ordering by time, require a notion of time. In these operations the user has the choice between two approaches: 1) using the current system time of the machine that an operation is running on, or 2) using the timestamp that was attached to a value when it entered the topology. Because there are two ways of assigning timestamps at sources there are now three different types of times. All of which have different pros and cons:

  • Operator Time: This usually gives the fastest results since using timestamps in processing might require waiting for slower operations and buffering of elements since the elements will arrive at operations out-of-order. That is, they are usually not ordered by their timestamps. This will, however, not give you any guarantees about the ordering of elements when joining several data streams and when operations require a repartitioning of the elements. 
  • Event Time: This makes it possible to define orderings or time windows that are completely independent of the system time of the machines that a topology is running on. This might, however, lead to very high latencies because operations have to wait until they can be sure that they have seen all elements up to a certain timestamp. 
  • Ingress Time: This gives you a good compromise between event time and operator time. Even after repartitioning data or joining several data streams the elements can still be processed according to the time that they entered the system. The latency should also be low because the timestamps that sources attach to elements are steadily increasing.

Tracking the Process of Time

Latency considerations

 

 

Ordering Records

  • Ordering for OrderedKeyDataStream
  • Ordering for windows

 

 

 

Timestamps on Results of Window Operations

 

 

Sematics of Sequences of Windows

 

Aligned Time Windows

 

General Windows

 

 

Accessing the Timestamp of a Record

 

 

Implementing Windows on Ingress- and Event Time

 

Time-windows

Pane based approach

Other windows

Staging buffer

  • No labels