THIS IS A TEST INSTANCE. ALL YOUR CHANGES WILL BE LOST!!!!
...
Code Block | ||||
---|---|---|---|---|
| ||||
/** * Reinterprets the given {@link DataStream} as a {@link KeyedStream}, which extracts keys with * the given {@link KeySelector}. * * <p>With the reinterpretation, the upstream operators can now be chained with the downstream operator and therefore * avoiding shuffling data and network consumption between two subtasks. If shuffling or network consumption is acceptable, * {@link DataStream#keyBy(KeySelector)} is recommended. * * <p>IMPORTANT: If the base stream is not in the SourceTask, this method works well if the following conditions * are satisfied. * * <ul> * <li>The base stream has ever been partitioned withw.r.t. the selected key by the {@link DataStream#keyBy(KeySelector)}. * <li>The value of selected keys of the base stream have not changed since {@link DataStream#keyBy(KeySelector)}. * <li>No data shuffling since {@link DataStream#keyBy(KeySelector)}. * </ul> * * <p>Overall, the data stream has to be partitioned exactly in the same way as if it was * created through a {@link DataStream#keyBy(KeySelector)}. * * <p>IMPORTANT: If the base stream is in the SourceTask, which means that the data stream from the external * system has been partitioned and so-called "pre-KeyedStream". This method works well if the following conditions * are satisfied. * * <ul> * <li>The base stream has been partitioned w.r.t. the selected key in the external system. But the stream does not need to* be * to be partitioned exactly in the same way Flink's keyBy would partition the data. * <li>Selected keys of the base stream could only exist in one {@link SourceSplit}. If it exists in multiple * {@link SourceSplit}s, the stream should not be called "pre-KeyedStream". And therefore use this method could * cause state inconsistency. * <li>The value of selected keys of the base stream remain the same as the input source stream from the external system. * </ul> * * <p>IMPORTANT: For a "pre-KeyedStream", the maximum number of currently existing splits must not be larger than the maximum * number of key-groups. Since every split would be mapped to one key-group. To config the number of key-group, please config the * maximum supported parallelism. * * @param keySelector Function that defines how keys are extracted from the data stream. * @param <K> Type of the extracted keys. * @return The reinterpretation of the {@link DataStream} as a {@link KeyedStream}. */ public <K> KeyedStream<T, K> reinterpretAsKeyedStream(KeySelector<T, K> keySelector) {} /** * Reinterprets the given {@link DataStream} as a {@link KeyedStream}, which extracts keys with * the given {@link KeySelector}. * * <p>With the reinterpretation, the upstream operators can now be chained with the downstream operator and therefore * avoiding shuffling data and network consumption between two subtasks. If shuffling or network consumption is acceptable, * {@link DataStream#keyBy(KeySelector)} is recommended. * * <p>IMPORTANT: If the base stream is not in the SourceTask, this method works well if the following conditions * are satisfied. * * <ul> * <li>The base stream has ever been partitioned with w.r.t. the selected key by the {@link DataStream#keyBy(KeySelector)}. * <li>The value of selected keys of the base stream have not changed since {@link DataStream#keyBy(KeySelector)}. * <li>No data shuffling since {@link DataStream#keyBy(KeySelector)}. * </ul> * * <p>Overall, the data stream has to be partitioned exactly in the same way as if it was * created through a {@link DataStream#keyBy(KeySelector)}. * * <p>IMPORTANT: If the base stream is in the SourceTask, which means that the data stream from the external * system has been partitioned and so-called "pre-KeyedStream". This method works well if the following conditions * are satisfied. * * <ul> * <li>The base stream has been partitioned w.r.t. the selected key in the external system. But the stream does not need to* be * to be partitioned exactly in the same way Flink's keyBy would partition the data. * <li>Selected keys of the base stream could only exist in one {@link SourceSplit}. If it exists in multiple * {@link SourceSplit}s, the stream should not be called "pre-KeyedStream". And therefore use this method could * cause state inconsistency. * <li>The value of selected keys of the base stream remain the same as the input source stream from the external system. * </ul> * * <p>IMPORTANT: For a "pre-KeyedStream", the maximum number of currently existing splits must not be larger than the maximum * number of key-groups. Since every split would be mapped to one key-group. To config the number of key-group, please config the * maximum supported parallelism. * * @param keySelector Function that defines how keys are extracted from the data stream. * @param <K> Type of the extracted keys. * @return The reinterpretation of the {@link DataStream} as a {@link KeyedStream}. */ public <K> KeyedStream<T, K> reinterpretAsKeyedStream(KeySelector<T, K> keySelector, TypeInformation<K> typeInfo) {} |
...