Page History

...
Code Block
language	java
title	DataStream
/**
 * Reinterprets the given {@link DataStream} as a {@link KeyedStream}, which extracts keys with
 * the given {@link KeySelector}.
 *
 * <p>With the reinterpretation, the upstream operators can now be chained with the downstream operator and therefore
 * avoiding shuffling data and network consumption between two subtasks. If shuffling or network consumption is acceptable, 
 * {@link DataStream#keyBy(KeySelector)} is recommended.
 *
 * <p>IMPORTANT: If the base stream is not in the SourceTask, this method works well if the following conditions
 * are satisfied.
 *
 * <ul>
 *     <li>The base stream has ever been partitioned withw.r.t. the selected key by the {@link DataStream#keyBy(KeySelector)}.
 *     <li>The value of selected keys of the base stream have not changed since {@link DataStream#keyBy(KeySelector)}.
 *     <li>No data shuffling since {@link DataStream#keyBy(KeySelector)}.
 * </ul>
 *
 * <p>Overall, the data stream has to be partitioned exactly in the same way as if it was
 * created through a {@link DataStream#keyBy(KeySelector)}.
 *
 * <p>IMPORTANT: If the base stream is in the SourceTask, which means that the data stream from the external
 * system has been partitioned and so-called "pre-KeyedStream". This method works well if the following conditions
 * are satisfied.
 *
 * <ul>
 *     <li>The base stream has been partitioned w.r.t. the selected key in the external system. But the stream does not need
 to* be
 *   to be partitioned exactly in the same way Flink's keyBy would partition the data.
 *     <li>Selected keys of the base stream could only exist in one {@link SourceSplit}. If it exists in multiple
 *     {@link SourceSplit}s, the stream should not be called "pre-KeyedStream". And therefore use this method could
 *     cause state inconsistency.
 *     <li>The value of selected keys of the base stream remain the same as the input source stream from the external system.
 * </ul>
 *
 * <p>IMPORTANT: For a "pre-KeyedStream", the maximum number of currently existing splits must not be larger than the maximum
 * number of key-groups. Since every split would be mapped to one key-group. To config the number of key-group, please config the
 * maximum supported parallelism.
 *
 * @param keySelector Function that defines how keys are extracted from the data stream.
 * @param <K> Type of the extracted keys.
 * @return The reinterpretation of the {@link DataStream} as a {@link KeyedStream}.
 */
public <K> KeyedStream<T, K> reinterpretAsKeyedStream(KeySelector<T, K> keySelector) {}

/**
 * Reinterprets the given {@link DataStream} as a {@link KeyedStream}, which extracts keys with
 * the given {@link KeySelector}.
 *
 * <p>With the reinterpretation, the upstream operators can now be chained with the downstream operator and therefore
 * avoiding shuffling data and network consumption between two subtasks. If shuffling or network consumption is acceptable, 
 * {@link DataStream#keyBy(KeySelector)} is recommended.
 *
 * <p>IMPORTANT: If the base stream is not in the SourceTask, this method works well if the following conditions
 * are satisfied.
 *
 * <ul>
 *     <li>The base stream has ever been partitioned with w.r.t. the selected key by the {@link DataStream#keyBy(KeySelector)}.
 *     <li>The value of selected keys of the base stream have not changed since {@link DataStream#keyBy(KeySelector)}.
 *     <li>No data shuffling since {@link DataStream#keyBy(KeySelector)}.
 * </ul>
 *
 * <p>Overall, the data stream has to be partitioned exactly in the same way as if it was
 * created through a {@link DataStream#keyBy(KeySelector)}.
 *
 * <p>IMPORTANT: If the base stream is in the SourceTask, which means that the data stream from the external
 * system has been partitioned and so-called "pre-KeyedStream". This method works well if the following conditions
 * are satisfied.
 *
 * <ul>
 *     <li>The base stream has been partitioned w.r.t. the selected key in the external system. But the stream does not need
 to* be
 *   to be partitioned exactly in the same way Flink's keyBy would partition the data.
 *     <li>Selected keys of the base stream could only exist in one {@link SourceSplit}. If it exists in multiple
 *     {@link SourceSplit}s, the stream should not be called "pre-KeyedStream". And therefore use this method could
 *     cause state inconsistency.
 *     <li>The value of selected keys of the base stream remain the same as the input source stream from the external system.
 * </ul>
 *
 * <p>IMPORTANT: For a "pre-KeyedStream", the maximum number of currently existing splits must not be larger than the maximum
 * number of key-groups. Since every split would be mapped to one key-group. To config the number of key-group, please config the
 * maximum supported parallelism.
 *
 * @param keySelector Function that defines how keys are extracted from the data stream.
 * @param <K> Type of the extracted keys.
 * @return The reinterpretation of the {@link DataStream} as a {@link KeyedStream}.
 */
public <K> KeyedStream<T, K> reinterpretAsKeyedStream(KeySelector<T, K> keySelector, TypeInformation<K> typeInfo) {}
...
Page tree

Versions Compared

Old Version 16

New Version 17

Key