Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Current stateUnder Discussion

Discussion threadhere (<- link to httpshttp://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-153-Support-state-access-in-Python-DataStream-API-tt47127.htmlmail-archives.apache.org/mod_mbox/flink-dev/)

JIRAhere (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

...

  1. State API between the Runner and the SDK harness which could be used for state access in the Python user-defined function. (Note: The state communication uses a separate channel from the data channel.
  2. It has defined five kinds of states in the proto message in the State API (Refer to StateKey for more details) and three types of operations for state access in the State API: Get / Append / Clear:

    State Type

    Usecase

    Runner

    Remote references (and other Runner specific extensions)

    IterableSideInput

    Side input of iterable

    MultimapSideInput

    Side input of of values of map

    MultimapKeyedSideInput

    Side input of of keys of map

    BagUserState

    User state with primary key (value state, bag state, combining value state)

    Among them, only BagUserStage is dedicated for state access in Beam. The others are used for data access in Beam. 

  3. Building on the proto message of BagUserState, it has supported four kinds of user-facing API in Beam’s Python SDK harness: BagRuntimeState, SetRuntimeState, ReadModifyWriteRuntimeState and CombiningValueRuntimeState.

...