Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Definition: "idempotent update" is one in which the new result and prior result,   when serialized, are identical byte arrays.

...

The other crucial aspect of our proposal is why we restrict ourselves to dropping idempotent updates for operations that does not have the old result already loaded. To check for idempotence, both the old and new result has to be available. However, when the old result is unavailable, then we will have to procure that result again somehow. Unfortunately, that would exact an extra performance cost (which can be quite expensive). 

Effects on Stream Time

In certain situations, idempotent updates are used to advance stream time quite aggressively. If these updates are dropped, it is possible that stream time advancement will start to stall, having some effect on operations which rely on stream time for functionality. After some consideration, we have decided that the current behavior is satisfactory. The components which can be affected is listed below:

  1. Stream aggregation (both windowed or not) will not be effected. As emphasized in the behavior changes, the current arrangement means that any operations which will advance stream time will still be emitted (as we recall earlier, operations are dropped only if the timestamp has not changed.)
  2. Windowed operations are a primary user of stream time, so they have the largest potential to be impacted in someway by this KIP. For background, In order to advance stream time far enough to close a window or push it out of retention, the new records must have timestamps that are in later windows, which means that they are updates to new keys, which means they would not be suppressed as idempotent. In retrospect, we can conclude that at worst the maximum amount of stream time which can be delayed by updates would be one window. Interestingly enough, this isn't too bad, since it is not in the contract to emit results at the earliest time possible (just some time after the window closes).
  3. The last one is Suppression. One of the primary usages of suppression is to limit the rate of record emission, and at the same time, there should be a consistent rate of records being forwarded by the suppression operator. If we drop idempotent updates, it is possible that the buffer would grind to a halt since stream time is no longer advanced, but this isn't as big an issue as it seems. If this is the case, the user can simply remove the suppression operator from the topology (and allow the dropping of idempotent updates to fulfill the same role as the suppression buffer).

Compatibility, Deprecation, and Migration Plan

...