...

The Iteration Body and Round Semantics

At the iteration API level, we need concepts corresponding to Epoch and Batch. We call processing one epoch a round: users specify a subgraph as the body of the iteration to define how the update is calculated, and a round ends after the iteration body has processed the whole dataset once (namely, one epoch). The round is therefore meaningful only for the bounded case.

Figure 1. The structure of an iteration body.

To process the inputs for multiple rounds, we need feedback edges that emit the outputs of the last round back to the inputs of the iteration body, and we union the initial inputs with the feedbacks. For a bounded dataset, the data comes from the initial input edges when processing the first epoch and from the feedback edges when processing the remaining epochs.

There are also inputs to the iteration body that do not have feedbacks. For example, an ML algorithm might have two inputs: the initial model and the training data. The input corresponding to the initial model receives a feedback after each epoch carrying the update to the model, but the training data does not need to be updated.

The iteration body also has one or more output streams. The iteration body might output records in each round, and the records emitted in all the rounds compose the final outputs.

Therefore, an iteration body is composed of:

  1. Variable inputs with feedbacks.
  2. Constant inputs without feedbacks.
  3. Outputs.
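
The three parts listed above can be illustrated with a small, framework-free simulation. This is only a sketch under stated assumptions, not the proposed API: the names `iteration_body` and `run_bounded_iteration`, the learning rate, and the fixed number of rounds are all hypothetical. The model is the variable input with a feedback, the training data is the constant input, and one loss record is emitted per round as the output.

```python
from typing import List, Tuple


def iteration_body(model: float, data: List[float], lr: float = 0.5) -> Tuple[float, float]:
    """One round (one epoch): a full pass over the constant input.

    Returns the feedback (the updated model) and one output record (the loss).
    All names here are hypothetical and only serve the illustration.
    """
    grad = sum(model - x for x in data) / len(data)
    loss = sum((model - x) ** 2 for x in data) / len(data)
    return model - lr * grad, loss


def run_bounded_iteration(initial_model: float, data: List[float], max_rounds: int):
    variable_input = initial_model      # the first round reads the initial input edge
    outputs = []
    for _ in range(max_rounds):
        feedback, record = iteration_body(variable_input, data)
        outputs.append(record)          # records of all rounds compose the final output
        variable_input = feedback       # later rounds read the feedback edge
    return variable_input, outputs


final_model, losses = run_bounded_iteration(0.0, [1.0, 2.0, 3.0], max_rounds=3)
```

Note that `data` is passed unchanged into every round, while `variable_input` is replaced by the feedback after each round.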

For the unbounded case, logically we should have 

Per-Round vs. All-Rounds Semantics

...

  1. The inputs from outside of the iteration. 
  2. An iteration body that specifies the structure inside the iteration:
    1. The subgraph inside the iteration.
    2. Some inputs have corresponding feedbacks to update the underlying data stream. The feedbacks are unioned with the corresponding inputs: the original inputs are emitted into the iteration body only once, and the feedbacks are emitted to the same set of operators.
    3. The outputs going out of the iteration. The outputs can be emitted from any data stream.
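
The union of initial inputs and feedbacks described above can be sketched at the record level. The helper `run_with_feedback` and its parameters `step` and `is_done` are hypothetical names chosen for this illustration, and a single in-memory queue stands in for the unioned stream; the sketch assumes a finite input so that the loop terminates.

```python
from collections import deque


def run_with_feedback(initial_inputs, step, is_done):
    # The queue plays the role of the union of the initial input and the feedback.
    channel = deque(initial_inputs)     # each initial record enters exactly once
    outputs = []
    while channel:
        record = step(channel.popleft())
        if is_done(record):
            outputs.append(record)      # the record leaves the iteration as an output
        else:
            channel.append(record)      # feedback edge: back to the same operator
    return outputs


# Halve each number until it drops below 1.
result = run_with_feedback([8, 3], step=lambda x: x / 2, is_done=lambda x: x < 1)
```

The operator applying `step` cannot tell whether a record came from the initial input or from the feedback, which is the point of unioning the two.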

...

...



Unbounded Iteration

Similar to FLIP-15, we prefer to provide a structured iteration API to make it easier to understand. With this approach, users are required to specify an IterationBody that generates the part of the JobGraph inside the iteration. The iteration body should specify the DAG inside the iteration, as well as the list of feedback streams and the list of output streams. The feedback streams are unioned with the corresponding inputs, and the output streams are provided to the caller routine.
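
To make the shape of such a structured API concrete, here is a minimal sketch in plain Python. It is not the Flink API: `Stream`, `IterationBodyResult`, `iterate`, and the `max_passes` safety bound are assumptions made for this illustration, and lists stand in for data streams. The body receives the input streams and returns the feedback streams, keyed by the input they feed back to, together with the output streams, which the driver hands to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Stream = List[int]  # stand-in for a data stream; not a Flink type


@dataclass
class IterationBodyResult:
    feedbacks: Dict[str, Stream]  # keyed by the name of the input each one feeds back to
    outputs: Dict[str, Stream]    # streams handed back to the caller


IterationBody = Callable[[Dict[str, Stream]], IterationBodyResult]


def iterate(inputs: Dict[str, Stream], body: IterationBody, max_passes: int) -> Dict[str, Stream]:
    current = {name: list(stream) for name, stream in inputs.items()}
    collected: Dict[str, Stream] = {}
    for _ in range(max_passes):
        result = body(current)
        for name, stream in result.outputs.items():
            collected.setdefault(name, []).extend(stream)
        if not any(result.feedbacks.values()):
            break  # nothing was fed back, so the iteration has drained
        # The original inputs were emitted once; now only feedback records flow in.
        current = {name: result.feedbacks.get(name, []) for name in inputs}
    return collected


def doubling_body(streams: Dict[str, Stream]) -> IterationBodyResult:
    doubled = [v * 2 for v in streams["values"]]
    return IterationBodyResult(
        feedbacks={"values": [v for v in doubled if v < 10]},
        outputs={"large": [v for v in doubled if v >= 10]},
    )


result = iterate({"values": [1, 4, 6]}, doubling_body, max_passes=10)
```

Returning feedbacks and outputs from one body function keeps the whole subgraph in a single place, so the caller only sees the inputs it passes in and the outputs it gets back.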

...