You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

Status

Current stateUnder Discussion

Discussion threadhttp://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-for-Asynchronous-I-O-in-FLINK-tt13497.html

JIRA Unable to render Jira issues macro, execution error.

Released: Release this feature as soon as possible

Google Dochttps://docs.google.com/document/d/1Lr9UYXEz6s6R_3PWg3bZQLF3upGaNEkc0rQCFSzaYDI/edit#

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

I/O access, for the most case, is a time-consuming process, making the TPS for single operator much lower than in-memory computing, particularly for streaming job, when low latency is a big concern for users. Starting multiple threads may be an option to handle this problem, but the drawbacks are obvious: The programming model for end users may become more complicated as they have to implement thread model in the operator. Furthermore, they have to pay attention to coordinate with checkpointing.

Terms

AsyncFunction: Async I/O will be triggered in AsyncFunction.

AsyncWaitOperator: An StreamOperator which will invoke AsyncFunction.

AsyncCollector: For each input streaming record, an AsyncCollector will be created and passed into user's callback to get the async i/o result.

AsyncCollectorBuffer: A buffer to keep all AsyncCollectors.

Emitter Thread: A working thread in AsyncCollectorBuffer, being signalled while some of AsyncCollectors have finished async i/o and emitting results to the following opeartors.

 

Public Interfaces

A helper class, named AsyncDataStream, is added to provide two methods to add AsyncFunction, which will do async i/o operation, into FLINK streaming job.

AsyncDataStream.java
public class AsyncDataStream {
 /**
  * Add an AsyncWaitOperator. The order of output stream records may be reordered.
  *
  * @param func AsyncWaitFunction
  * @return A new DataStream.
  */
 public static DataStream<OUT> unorderedWait(DataStream<IN>, AsyncWaitFunction<IN, OUT> func);

 /**
  * Add an AsyncWaitOperator. The order of output stream records is guaranteed to be the same as input ones.
  *
  * @param func AsyncWaitFunction
  * @return A new DataStream.
  */
 public static DataStream<OUT> orderedWait(DataStream<IN>, AsyncWaitFunction<IN, OUT> func);
}


Proposed Changes

Overview

The following diagram illustrates how the streaming records are processed while

  • arriving at AsyncWaitOperator
  • recovering from task failover
  • snapshotting state
  • being emitted by Emitter Thread

Sequence Diagram

AsyncFunction

AsyncFunction works as a user function in AsyncWaitOperator, which looks like StreamFlatMap operator, having open()/processElement(StreamRecord<IN> record)/processWatermark(Watermark mark).

For user’s concrete AsyncFunction, the asyncInvoke(IN input, AsyncCollector<OUT> collector) has to be overriden to supply codes to start an async operation.

AsyncFunction.java
public interface AsyncFunction<IN, OUT> extends Function, Serializable {
  /**
   * Trigger async operation for each stream input.
   * The AsyncCollector should be registered into async client.
   *
   * @param input Stream Input
   * @param collector AsyncCollector
   */
  void asyncInvoke(IN input, AsyncCollector<OUT> collector) throws Exception;
}

For each input stream record of AsyncWaitOperator, they will be processed by AsyncFunction.asyncInvoke(IN input, AsyncCollector<OUT> cb). Then AsyncCollector will be append into AsyncCollectorBuffer. We will cover AsyncCollector and AsyncCollectorBuffer later.

 

AsyncCollector

 

AsyncCollector is created by AsyncWaitOperator, and passed into AsyncFunction, where it should be added into user’s callback. It acts as a role to get results or errors from user codes and notify the AsyncCollectorBuffer to emit results.

The functions specific for the user is the collect, and they should be called when async operation is done or errors are thrown out.

 

 

AsyncCollector.java
public class AsyncCollector<OUT> {
  private List<OUT> result;
  private Throwable error;
  private AsyncCollectorBuffer<OUT> buffer;

  /**
   * Set result
   * @param result A list of results.
   */
  public void collect(List<OUT> result) { 
    this.result = result;
    buffer.mark(this);
  }

  /**
   * Set error
   * @param error A Throwable object.
   */
  public void collect(Throwable error) {
    this.error = error;
    buffer.mark(this);
  }

  /**
   * Get result. Throw RuntimeException while encountering an error.
   * @return A List of result.
   * @throws RuntimeException RuntimeException wrapping errors from user codes.
   */
  public List<OUT> getResult() throws RuntimeException { ... }
}

 

 

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users? 
  • If we are changing behavior how will we phase out the older behavior? 
  • If we need special migration tools, describe them here.
  • When will we remove the existing behavior?

Test Plan

Describe in few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.

  • No labels