...
Page properties |
---|
...
|
...
...
|
...
...
|
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
We introduce the cache method to the SingleOutputStreamOperator
class. In order to allow caching the side output of the SingleOutputStreamOperator
, we let the getSideOutput
method returns a SingleOutputStreamOperator
.
Code Block | ||
---|---|---|
| ||
public class SingleOutputStreamOperator<T> extends DataStream<T> { ... /** * Cache the intermediate result of the transformation. Only job running in batch mode with * blocking shuffle mode can create cache. Otherwise, an exception will be thrown. The cache * is generated lazily at the first time the intermediate result is computed. The cache will * be clear when {@link CachedDataStream#invalidateCache()} called or the * {@link StreamExecutionEnvironment} close. * * @return CachedDataStream that can use in later job to reuse the cached intermediate * result. */ public CachedDataStream cache() { ... } public <X> SingleOutputStreamOperator<X> getSideOutput(OutputTag<X> sideOutputTag) { ... } } |
We introduce CacheDataStream
that extends the DataStream
and implements the AutoCloseable
interface.
...
When translating the CacheTransformation
, if the intermediate result of the input has not been cached, a stream node with the sample same parallelism as its input is added to the StreamGraph. Its chaining strategy is set to HEAD so that it will not be chained with the upstream node and the intermediate result will be created.
...
During JobGraph translation, multiple StreamNodes may be chaining chained together to form a JobVertex. While translating the StreamGraph to JobGraph, if the translator sees a node whose intermediate result should be cached, it sets the ResultPartitionType of the JobEdge to BLOCKING_PERSISTENT.
...