Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

A common UDF type is the ScalarFunction.  This works well for CPU-intensive operations, but less well for IO bound or otherwise long-running computations.  
One  One example of this is remote calls to external systems where networking, serialization, and database lookups might dominate the execution time.
StreamTask has a single thread serially executing operators and their contained calls, which happen synchronously in the remote call case.
Since each call can take, say, 1 second or more, that limits throughput and the overall performance, potentially accumulating backpressure to the upstream operator.
The solution is to either: increase the parallelism of the query (resulting in a higher resource cost, overhead, etc.) or asynchronously fire off many requests concurrently and receive results as they complete.
This FLIP aims to address the latter solution by introducing AsyncScalarFunction, a new UDF type which allows for issuing concurrent function calls.

...

It is most prudent to only allow async behavior where it is known to not violate SQL semantics.  
To  To do this, rules will be introduced which contain query structures which we don’t want to allow and if found, all of the async calls will be executed in synchronous mode.

This can be done by introducing a new trait AsyncOperatorModeTrait, which comes in sync mode and async mode (default), and which will be attached to a FlinkLogicalCalc
if it contains async calls which we would prefer to execute in sync mode.  Execution Execution in synchronous mode just utilizes the same stack of as async, but waits on the result immediately after issuing the request.

...