Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Discussion threadhttps://lists.apache.org/thread/qq39rmg3f23ysx5m094s4c4cq0m4tdj5
Vote thread-https://lists.apache.org/thread/oy9mdmh6gk8pc0wjdk5kg8dz3jllz9ow
JIRA

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-32594

Release-

...

As shown in the figure below, we might have a job that pre-processes records from a bounded source (i.e. inputA) using an operator (i.e. operatorA) which only emits results after its input has ended, which is also known as . We will call such operator full-dam operator. Then operatorB needs to join records emitted by operatorA with records from an unbounded source, and emit results with low processing latency in real-time.

...

Code Block
languagejava
package org.apache.flink.streaming.api.operators;
 
/** The builder class for {@link OperatorAttributes}. */
@Experimental
public class OperatorAttributesBuilder {
 
    public OperatorAttributesBuilder() {...}        

    /**
     * Set to true if and only if the operator only emits records after all its inputs have ended.
     * If it is not set, the default value false is used.
     */   
     public OperatorAttributesBuilder setOutputOnlyAfterEndOfStream(
            boolean outputOnlyAfterEndOfStream) {...}

     
        

	/**
     * If any operator attribute is not nullset explicitly, we will log it at DEBUG level and set it to the default
     * value.
     */
    	public OperatorAttributes build() {...}

}


Code Block
languagejava
package org.apache.flink.streaming.api.operators;
 
/**
 * OperatorAttributes element provides Job Manager with information that can be
 * used to optimize the job performance.
 */
@Experimental
public class OperatorAttributes {
  
    /**
     * Returns true if and only if the operator only emits records after all its inputs have ended.
     *
     * <p>Here are the implications when it is true:
     *
     * <ul>
     *   <li> The results of this operator as well as its chained operators have blocking partition type.
     *   <li> This operator as well as its chained operators will be executed in batch mode.
     * </ul>
     */     
	public boolean isOutputOnlyAfterEndOfStream() {...}
}

...

  • It would be useful to add an ExecutionState (e.g. Finishing) to specify whether the task has reached the end for all its inputs. This allows JM to deploy its downstream tasks and possibly apply hybrid shuffle to increase job throughput.
  • Setting the upstream operators of the full-dam operator in batch mode will also increase the job throughput, and Hybrid shuffle mode can also be used in batch mode part to further improve the performance when there are sufficient slot resources.
  • Optimizing the implementation of frequently used full-dam operators,  such as aggregate/CoGroup/Join/..., can achieve higher throughput by using the optimizations currently done in batch mode. For example, we can instantiate an internal sorter in the operator, and it will not have to invoke WindowAssigner#assignWindows or triggerContext#onElement for each input record.

...