Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


This page is meant as a template for writing a FLIP. To create a FLIP choose Tools->Copy on this page and modify with your content and replace the heading with the next FLIP number and a description of your issue. Replace anything in italics with your own description.

Document the state by adding a label to the FLIP page with one of "discussion", "accepted", "released", "rejected".
Page properties


Discussion threadhere (<- link to https://lists.apache.org/list.html?dev@flink.apache.org)/thread/q3st6t1w05grd7bthzfjtr4r54fv4tm2
Vote threadhere (<- link to https://lists.apache.org/list.html?dev@flink.apache.org)thread/dt4tnwk8hcfj0sp3l3qwloqlljog8xm7
JIRA

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-33978

JIRAhere (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

Release<Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).



Motivation

A common UDF type is the ScalarFunction.  This works well for CPU-intensive operations, but less well for IO bound or otherwise long-running computations.  One example of this is remote calls to external systems where networking, serialization, and database lookups might dominate the execution time. StreamTask has a single thread serially executing operators and their contained calls, which happen synchronously in the remote call case. Since each call can take, say, 1 second or more, that limits throughput and the overall performance, potentially accumulating backpressure to the upstream operator. The solution is to either: increase the parallelism of the query (resulting in a higher resource cost, overhead, etc.) or asynchronously fire off many requests concurrently and receive results as they complete. This FLIP aims to address the latter solution by introducing AsyncScalarFunction, a new UDF type which allows for issuing concurrent function calls.

...

Since the call to the AsyncScalarFunction is wrapped in a AsyncFunction taking input rows, we have the benefit of using the existing class AsyncWaitOperator, which handles ordering, checkpointing, timeouts and other implementation details.  Since only ordered results are handled in this scope, ORDERED will be the default behavior.

The PhysicalAsyncCalcs mentioned in the planning phase will translate to an exec node, which creates the transformation containing this operator.

...