Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. The reason why the connection between Global Stream and Non-Global Stream is not supported is that the number of partitions of GlobalStream is forced to be 1, but it is generally not 1 for Non-Global Stream, which will cause conflicts when determining the number of partitions of the output stream. If necessary, they should be transformed into mutually compatible streams and then connected.
  2. Connecting two broadcast streams doesn't really make sense, because each parallelism would have exactly same input data from both streams and any process would be duplicated. 
  3. The reason why the output of two keyed partition streams can be keyed or non-keyed is the same as we mentioned above in the case of single input.
  4. When we connect two KeyedPartitionStream, they must have the same key type, otherwise we can't decide how to merge the partitions of the two streams. At the same time, things like access state and register timer are also restricted to the partition itself, cross-partition interaction is not meaningful.

  5. The reasons why the connection between KeyedPartitionStream and NonKeyedPartitionStream is not supported are as follows:
    1. The data on KeyedStream is deterministic, but on NonKeyed is not. It is difficult to think of a scenario where the two need to be connected.
    2. This will complicate the state declaration and access rules. A more detailed discussion can be seen in the subsequent state-related sub-FLIP.
    3. If we see that most people have clear demands for this, we can support it in the future.

...

Currently, we only support adding FLIP-27 based source. The stream returned from `fromSource` method is Non-KeyedPartitionStream by default. If there is a clear key selecting strategy, the keyBy partitioning can be followed later. The connector part will be explained in more detail in future FLIP.

ProcessFunction

...

Process function is used to describe the processing logic of data. It is the key part for users to implement their job. Overall, we have a base interface for all user defined process functions that contains some life cycle methods, such as open and close. In addition, it also contains some common methods related to state and watermark, but we omit these methods here for simplicity, and we will introduce it in the corresponding sub-FLIPs.

Code Block
languagejava
titleProcessFunction.java
/** This is the base class for all user defined process functions. */
public interface ProcessFunction extends Function {
    /**
     * Initialization method for the function. It is called before the actual working methods (like
     * processRecord) and thus suitable for one time setup work.
     *
     * <p>By default, this method does nothing.
     *
     * @throws Exception Implementations may forward exceptions, which are caught by the runtime.
     *     When the runtime catches an exception, it aborts the task and lets the fail-over logic
     *     decide whether to retry the task execution.
     */
    default void open() throws Exception {}

    /**
     * Tear-down method for the user code. It is called after the last call to the main working
     * methods (e.g. processRecord).
     *
     * <p>This method can be used for clean up work.
     *
     * @throws Exception Implementations may forward exceptions, which are caught by the runtime.
     *     When the runtime catches an exception, it aborts the task and lets the fail-over logic
     *     decide whether to retry the task execution.
     */
    default void close() throws Exception {}

    // Omit some methods related to state and watermark here.
}

...