THIS IS A TEST INSTANCE. ALL YOUR CHANGES WILL BE LOST!!!!
...
Code Block | ||
---|---|---|
| ||
public interface KStream<K, V> { /** * Marking the {@code KStream} as partitioned signals the stream is partitioned as intended, * and does not require further repartitioning in downstream key changing operations. * * <p><em> * Note that {@link KStream#markAsPartitioned()} SHOULD NOT be used with interactive query(IQ) or {@link KStream#join}. * For reasons that when repartitions happen, records are physically shuffled by a composite key defined in the stateful operation. * However, if the repartitions were cancelled, records stayed in their original partition by its original key. IQ or joins * assumes and uses the composite key instead of the original key. * </p></em> * * @return a {@code KStream} that will not repartition in subsequent operations. */ KStream<K, V> markAsPartitioned(); } |
Proposed Change
In KStreamImpl , set the boolean flag repartitionRequired
to to be false when the method is invoked.
Usage
Example: canceling repartition in a streams aggregation would look like:
...
CompositeKey
the design will be safe both from the pov of data locality and IQ but adds complexity to the usage.
Option 2: Optional configuration in Named Operations( Joined , Grouped , etc)
- It would allow us to hit only the relevant parts of the DSL and be less prone to undesired behaviors when it comes to IQ or joins.
- More generic, can be applied to KTable as well. In comparison, the
markAsPartitioned()
approach approach is targeting the KStreams interface only where it focuses on a specific set of overhead/pain points introduced byrepartitionRequired
. - It touches on a larger surface area of the API.