In comparison, the iterateBoundedStreamsUntilTermination(...) method proposed in this FLIP allows users to run an iteration body without incurring this disk performance overhead. Developers have the freedom to optimize the performance based on its algorithm and data size, e.g. cache data in memory in a more compact format.

Example Usages

...

In this section we provide examples code snippets to demonstrate how we can use the APIs proposed in this FLIP to address the target use-cases described above.

Iterative algorithm on bounded data streams in sync mode

We would like to first show the usage of the bounded iteration with the linear regression case: the model is Y = XA, and we would like to acquire the best estimation of A with the SGD algorithm. To simplify we assume the parameters could be held in the memory of one task.

...

Code Block

language	java

public static class ParametersCacheFunction extends ProcessFunction<Tuple2<Integer, double[]>, Tuple2<Integer, double[]>>
    implements BoundedIterationProgressListener<double[]> {  
    
    private final double[] parameters = new double[N_DIM];

    public void processElement(Tuple2<Integer, double[]> update, Context ctx, Collector<Tuple2<Integer, double[]>> output) {
        // Suppose we have a util to add the second array to the first.
        ArrayUtils.addWith(parameters, update);
        output.collect(new Tuple2<>(update.f0, parameters))
    }

    public void onIterationEnd(int[] round, Context context) {
        context.output(FINAL_MODEL_OUTPUT_TAG, parameters);
    }
}

...

Iterative algorithm on unbounded data streams in sync mode

Suppose now we would change the algorithm to unbounded iteration, compared to the offline, the differences is that

...

Page tree

Versions Compared

Old Version 14

New Version 15

Key

Example Usages

Iterative algorithm on bounded data streams in sync mode

Iterative algorithm on unbounded data streams in sync mode

Page tree

Page History

Versions Compared

Old Version 14

New Version 15

Key

Example Usages

Iterative algorithm on bounded data streams in sync mode

Iterative algorithm on unbounded data streams in sync mode