...

Here we list the additions and the changes to the Flink ML API.

1) Updated Transformer/Estimator to take a list of tables as inputs and return a list of tables as output.

This change addresses the use-cases described in the motivation section, e.g. a graph embedding Estimator needs to take 2 tables as inputs.

2) Added Graph, GraphModel and GraphBuilder.

This change addresses the use-cases described in the motivation section, where we need to compose an Estimator from a DAG of Estimator/Transformer. Note that Graph/GraphBuilder supports an Estimator class whose input schemas are different from those of its fitted Transformer.

3) Added transformSchemas() to the Stage interface.

This is needed to validate the compatibility of input schemas with a given Estimator/Transformer instance.
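As an illustration only, the schema check could look like the sketch below. It is a minimal sketch under assumptions, not the proposed API: the schema is modelled as a column-name-to-type map rather than a Flink schema class, the transformSchemas signature is a guess, and PredictStage, schema() and isCompatible() are hypothetical names invented for this example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaCheckSketch {
    /** Assumed shape of the method: input schemas in, output schemas out, or an exception. */
    interface Stage {
        List<Map<String, String>> transformSchemas(List<Map<String, String>> inputSchemas);
    }

    /** Hypothetical stage: needs one table with a DOUBLE "features" column, adds "prediction". */
    static final class PredictStage implements Stage {
        @Override
        public List<Map<String, String>> transformSchemas(List<Map<String, String>> inputSchemas) {
            if (inputSchemas.size() != 1) {
                throw new IllegalArgumentException(
                        "expected exactly 1 input table, got " + inputSchemas.size());
            }
            Map<String, String> in = inputSchemas.get(0);
            if (!"DOUBLE".equals(in.get("features"))) {
                throw new IllegalArgumentException("missing DOUBLE column 'features'");
            }
            // The output schema is the input schema plus the column this stage appends.
            Map<String, String> out = new LinkedHashMap<>(in);
            out.put("prediction", "DOUBLE");
            return Collections.singletonList(out);
        }
    }

    /** Builds a placeholder schema from alternating column names and type names. */
    static Map<String, String> schema(String... nameTypePairs) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameTypePairs.length; i += 2) {
            result.put(nameTypePairs[i], nameTypePairs[i + 1]);
        }
        return result;
    }

    /** Returns true if the stage accepts the given input schemas. */
    static boolean isCompatible(Stage stage, List<Map<String, String>> schemas) {
        try {
            stage.transformSchemas(schemas);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Stage stage = new PredictStage();
        System.out.println(isCompatible(stage,
                Collections.singletonList(schema("features", "DOUBLE"))));
        System.out.println(isCompatible(stage,
                Collections.singletonList(schema("features", "STRING"))));
    }
}
```

The point of the sketch is that incompatibility is detected from schemas alone, before any table is processed.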

4) Added setStateStreams and getStateStreams to the Transformer interface.

This change addresses the use-cases described in the motivation section, where a running Transformer needs to ingest the model state streams emitted by an Estimator, which could be running on a different machine.

5) Removed TableEnvironment from the parameter list of fit/transform APIs.

This change simplifies the usage of fit/transform APIs.

6) Added PipelineModel and let Pipeline implement only the Estimator. Pipeline is no longer a Transformer.

This change makes the experience of using Pipeline consistent with the experience of using Estimator/Transformer, where a class is either an Estimator or a Transformer.

7) Removed Pipeline::appendStage from the Pipeline class.

This change makes the concept of Pipeline consistent with that of Graph/GraphBuilder. Neither Graph nor Pipeline provides an API to construct itself.

8) Removed the Model interface.

This change simplifies the class hierarchy by removing a redundant class. It follows the philosophy of only adding complexity when we have an explicit use-case for it.

9) Renamed PipelineStage to Stage and added the PublicEvolving tag to the Stage interface.

This change is reasonable because we will now compose Graph (not just Pipeline) using this class.
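To make the combined effect of the changes above concrete, here is a minimal sketch of how the resulting types could fit together. It rests on several assumptions: Table is a placeholder class rather than Flink's Table, the method signatures are guesses based on the descriptions above, the fit-then-transform chaining inside Pipeline.fit is the conventional pipeline behavior rather than something stated in this list, and Suffix and SuffixEstimator are hypothetical stages invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PipelineSketch {
    /** Placeholder for a table; the real API would use Flink's Table type. */
    static final class Table {
        final String name;
        Table(String name) { this.name = name; }
    }

    /** Common supertype of everything that can be composed. */
    interface Stage {}

    /** Takes a list of tables and returns a list of tables; no TableEnvironment argument. */
    interface Transformer extends Stage {
        List<Table> transform(List<Table> inputs);
    }

    /** Fits on a list of tables and returns a Transformer (no separate Model type). */
    interface Estimator extends Stage {
        Transformer fit(List<Table> inputs);
    }

    /** Transformer-only result of fitting a Pipeline. */
    static final class PipelineModel implements Transformer {
        private final List<Transformer> transformers;
        PipelineModel(List<Transformer> transformers) { this.transformers = transformers; }

        @Override
        public List<Table> transform(List<Table> inputs) {
            List<Table> current = inputs;
            for (Transformer t : transformers) {
                current = t.transform(current);
            }
            return current;
        }
    }

    /** Estimator-only Pipeline; stages are fixed at construction, there is no appendStage. */
    static final class Pipeline implements Estimator {
        private final List<Stage> stages;
        Pipeline(List<Stage> stages) { this.stages = new ArrayList<>(stages); }

        @Override
        public PipelineModel fit(List<Table> inputs) {
            List<Transformer> fitted = new ArrayList<>();
            List<Table> current = inputs;
            for (Stage stage : stages) {
                Transformer t;
                if (stage instanceof Estimator) {
                    t = ((Estimator) stage).fit(current);   // train on the data seen so far
                } else if (stage instanceof Transformer) {
                    t = (Transformer) stage;                // already usable as-is
                } else {
                    throw new IllegalArgumentException("unsupported stage: " + stage);
                }
                fitted.add(t);
                current = t.transform(current);             // feed the next stage
            }
            return new PipelineModel(fitted);
        }
    }

    /** Hypothetical Transformer: appends a suffix to every table name. */
    static final class Suffix implements Transformer {
        private final String suffix;
        Suffix(String suffix) { this.suffix = suffix; }

        @Override
        public List<Table> transform(List<Table> inputs) {
            List<Table> out = new ArrayList<>();
            for (Table t : inputs) {
                out.add(new Table(t.name + suffix));
            }
            return out;
        }
    }

    /** Hypothetical Estimator: the fitted Transformer records how many tables it was fit on. */
    static final class SuffixEstimator implements Estimator {
        @Override
        public Transformer fit(List<Table> inputs) {
            return new Suffix("+model" + inputs.size());
        }
    }

    static String demo() {
        Pipeline pipeline = new Pipeline(
                Arrays.<Stage>asList(new Suffix("+scaled"), new SuffixEstimator()));
        Transformer model = pipeline.fit(Arrays.asList(new Table("a"), new Table("b")));
        return model.transform(Collections.singletonList(new Table("x"))).get(0).name;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "x+scaled+model2"
    }
}
```

In this sketch every class is either an Estimator or a Transformer, never both: Pipeline can only be fitted, and the PipelineModel it returns can only transform.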

Interfaces and classes after the proposed API changes

...