...

public interface KStream<K, V> {
    /**
     * Marking the {@code KStream} as partitioned signals that the stream is already partitioned as intended
     * and does not require further repartitioning in downstream key-changing operations.
     *
     * <p><em>
     *     Note that {@link KStream#markAsPartitioned()} SHOULD NOT be used with interactive queries (IQ) or {@link KStream#join}.
     *     When a repartition happens, records are physically shuffled by the composite key defined in the stateful operation.
     *     If the repartition is cancelled, however, records stay in their original partition, as determined by their original key,
     *     while IQ and joins assume and use the composite key instead of the original key.
     * </em></p>
     *
     * @return a {@code KStream} that will not be repartitioned in subsequent operations.
     */
    KStream<K, V> markAsPartitioned();

}
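
The warning in the Javadoc can be illustrated with a small, self-contained sketch. It is not Kafka code: `PartitionMismatchSketch` and `partitionFor` are hypothetical names, `partitionFor` is only a simplified stand-in for a real partitioner (which hashes serialized key bytes rather than calling `hashCode()`), and the keys and partition count are made up for illustration.

```java
// Illustrative sketch only; none of these names exist in Kafka Streams.
final class PartitionMismatchSketch {

    // Hypothetical stand-in for a partitioner: maps a key to one of numPartitions.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        String originalKey = "a"; // key the record was written with
        String newKey = "b";      // key after a key-changing operation

        // Without a repartition, the record stays where its original key placed it.
        int actualPartition = partitionFor(originalKey, numPartitions);
        // A lookup that assumes repartitioning happened goes where the new key hashes to.
        int lookupPartition = partitionFor(newKey, numPartitions);

        System.out.println("record lives in partition " + actualPartition);
        System.out.println("lookup by new key targets partition " + lookupPartition);
    }
}
```

With these made-up keys the two partitions differ, so a lookup by the new key would miss the record. This is the kind of mismatch the note about IQ and joins warns against.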

Proposed Change

In KStreamImpl, set the boolean flag repartitionRequired to false when the method is invoked.
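
As a rough illustration of the flag handling described here, the following toy model may help. It is not the real KStreamImpl: `StreamSketch`, `selectKey()` without arguments, and `wouldRepartition()` are hypothetical names, and returning a new instance instead of mutating the current one is an assumption of this sketch.

```java
// Toy model of the proposed flag handling; NOT the real KStreamImpl.
final class StreamSketch {
    private final boolean repartitionRequired;

    StreamSketch(boolean repartitionRequired) {
        this.repartitionRequired = repartitionRequired;
    }

    // A key-changing operation marks the resulting stream as needing a repartition.
    StreamSketch selectKey() {
        return new StreamSketch(true);
    }

    // The proposed method: the resulting stream has the flag set to false.
    StreamSketch markAsPartitioned() {
        return new StreamSketch(false);
    }

    // A downstream stateful operation would consult the flag before repartitioning.
    boolean wouldRepartition() {
        return repartitionRequired;
    }
}
```

In this model, `new StreamSketch(false).selectKey().wouldRepartition()` is true, while inserting `markAsPartitioned()` after `selectKey()` makes it false.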

Usage

Example: canceling repartition in a streams aggregation would look like:

...

CompositeKey the design will be safe from the point of view of both data locality and IQ, but it adds complexity to the usage.

Option 2: Optional configuration in named operations (Joined, Grouped, etc.)

  • It would allow us to hit only the relevant parts of the DSL and be less prone to undesired behaviors when it comes to IQ or joins. 
  • More generic, can be applied to KTable as well. In comparison, the markAsPartitioned() approach targets only the KStream interface, where it focuses on a specific set of overhead/pain points introduced by repartitionRequired.
  • It touches on a larger surface area of the API.