Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
languagejava
titleDataStream
/**
 * Reinterprets the given {@link DataStream} as a {@link KeyedStream}, which extracts keys with
 * the given {@link KeySelector}. The precondition is that the base stream is already grouped by key and partitioned.
 *
 * <p>With the reinterpretation, the upstream operators can now be chained with the downstream operator and therefore
 * avoiding shuffling data and network consumption between two subtasks. If shuffling or network consumption is acceptable, 
 * {@link DataStream#keyBy(KeySelector)} is recommended.
 *
 * <p>IMPORTANT<p>"Pre-KeyedStream": If theThe base stream is notstreaming infrom the external SourceTasksystem, thiswhere methodit workshas wellbeen ifgrouped theby followingkey conditions
 * are satisfiedand partitioned.
 *
 * <ul>
<p>IMPORTANT: *For a "Pre-KeyedStream", the maximum <li>Thenumber baseof streamcurrently hasexisting eversplits beenmust partitioned w.r.t. the selected key by the {@link DataStream#keyBy(KeySelector)}.not be larger than the maximum
 * number of key-groups. Since every <li>Thesplit would valuebe ofmapped selectedto keys of the base stream have not changed since {@link DataStream#keyBy(KeySelector)}.one key-group. To config the number of key-group, please config the
 * maximum    <li>No data shuffling since {@link DataStream#keyBy(KeySelector)}supported parallelism.
 * </ul>
 *
 * <p>Overall, the data stream has to be partitioned exactly in the same way as if it was
 * created through a {@link DataStream#keyBy(KeySelector)}.
 *
 * <p>IMPORTANT: If the base stream is in the SourceTask, which means that the data stream from the external
 * system has been partitioned and so-called "pre-KeyedStream". This method works well if the following conditions
 * are satisfied.
 *
 * <ul>
 *     <li>The base stream has been partitioned w.r.t. the selected key in the external system. But the stream does not need
 *     to be partitioned exactly in the same way Flink's keyBy would partition the data.
 *     <li>Selected keys of the base stream could only exist in one {@link SourceSplit}. If it exists in multiple
 *     {@link SourceSplit}s, the stream should not be called "pre-KeyedStream". And therefore use this method could
 *     cause state inconsistency.
 *     <li>The value of selected keys of the base stream remain the same as the input source stream from the external system.
 * </ul>
 *
 * <p>IMPORTANT: For a "pre-KeyedStream", the maximum number of currently existing splits must not be larger than the maximum
 * number of key-groups. Since every split would be mapped to one key-group. To config the number of key-group, please config the
 * maximum supported parallelism.
 *
 * @param keySelector Function that defines how keys are extracted from the data stream.
 * @param <K> Type of the extracted keys.
 * @return The reinterpretation of the {@link DataStream} as a {@link KeyedStream}.
 */
public <K> KeyedStream<T, K> reinterpretAsKeyedStream(KeySelector<T, K> keySelector) {}

/**
 * Reinterprets the given {@link DataStream} as a {@link KeyedStream}, which extracts keys with
 * the given {@link KeySelector}.
 *
 * <p>With the reinterpretation, the upstream operators can now be chained with the downstream operator and therefore
 * avoiding shuffling data and network consumption between two subtasks. If shuffling or network consumption is acceptable, 
 * {@link DataStream#keyBy(KeySelector)} is recommended.
 *
 * <p>IMPORTANT: If the base stream is not in the SourceTask, this method works well if the following conditions
 * are satisfied.
 *
 * <ul>
 *     <li>The base stream has ever been partitioned w.r.t. the selected key by the {@link DataStream#keyBy(KeySelector)}.
 *     <li>The value of selected keys of the base stream have not changed since {@link DataStream#keyBy(KeySelector)}.
 *     <li>No data shuffling since {@link DataStream#keyBy(KeySelector)}.
 * </ul>
 *
 * <p>Overall, the data stream has to be partitioned exactly in the same way as if it was
 * created through a {@link DataStream#keyBy(KeySelector)}.
 *
 * <p>IMPORTANT: If the base stream is in the SourceTask, which means that the data stream from the external
 * system has been partitioned and so-called "pre-KeyedStream". This method works well if the following conditions
 * are satisfied.
 *
 * <ul>
 *     <li>The base stream has been partitioned w.r.t. the selected key in the external system. But the stream does not need
 *     to be partitioned exactly in the same way Flink's keyBy would partition the data.
 *     <li>Selected keys of the base stream could only exist in one {@link SourceSplit}. If it exists in multiple
 *     {@link SourceSplit}s, the stream should not be called "pre-KeyedStream". And therefore use this method could
 *     cause state inconsistency.
 *     <li>The value of selected keys of the base stream remain the same as the input source stream from the external system.
 * </ul>
 *
 * <p>IMPORTANT: For a "pre-KeyedStream", the maximum number of currently existing splits must not be larger than the maximum
 * number of key-groups. Since every split would be mapped to one key-group. To config the number of key-group, please config the
 * maximum supported parallelism. <p>IMPORTANT: For a "Pre-KeyedStream", if chaining is disabled for debugging, this method might not work.
 *
 * @param keySelector Function that defines how keys are extracted from the data stream.
 * @param <K> Type of the extracted keys.
 * @return The reinterpretation of the {@link DataStream} as a {@link KeyedStream}.
 */
public <K> KeyedStream<T, K> reinterpretAsKeyedStream(KeySelector<T, K> keySelector) {}

/**
 * @see #reinterpretAsKeyedStream(KeySelector)
 *
 * @param keySelector Function that defines how keys are extracted from the data stream.
 * @param <K> Type of the extracted keys.
 * @return The reinterpretation of the {@link DataStream} as a {@link KeyedStream}.
 */
public <K> KeyedStream<T, K> reinterpretAsKeyedStream(KeySelector<T, K> keySelector, TypeInformation<K> typeInfo) {}

...