...

1) Removed TableEnvironment from the parameter list of fit/transform APIs.

This change simplifies the usage of fit/transform APIs.

2) Added PipelineModel and made Pipeline implement only the Estimator interface. Pipeline is no longer a Transformer.

This change makes the experience of using Pipeline consistent with the experience of using Estimator/Transformer, where a class is either an Estimator or a Transformer.
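The separation can be illustrated with a minimal sketch. It does not use the proposed interfaces themselves: the Transformer and Estimator types below are hypothetical stand-ins that operate on a single list of integers instead of tables, and AddOne and SubtractMin are made-up stages. The point is only that fitting a Pipeline (an Estimator) yields a PipelineModel (a Transformer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PipelineSketch {
    /** Hypothetical stand-in: a "table" is just a list of integers here. */
    interface Transformer {
        List<Integer> transform(List<Integer> input);
    }

    /** Hypothetical stand-in for the Estimator interface. */
    interface Estimator<M extends Transformer> {
        M fit(List<Integer> input);
    }

    /** Made-up stateless Transformer that adds one to every row. */
    static class AddOne implements Transformer {
        @Override
        public List<Integer> transform(List<Integer> input) {
            List<Integer> out = new ArrayList<>();
            for (int v : input) out.add(v + 1);
            return out;
        }
    }

    /** Made-up Estimator: learns the minimum and yields a Transformer that subtracts it. */
    static class SubtractMin implements Estimator<Transformer> {
        @Override
        public Transformer fit(List<Integer> input) {
            final int min = Collections.min(input);
            return rows -> {
                List<Integer> out = new ArrayList<>();
                for (int v : rows) out.add(v - min);
                return out;
            };
        }
    }

    /** Pipeline is only an Estimator: fitting it produces a PipelineModel. */
    static class Pipeline implements Estimator<PipelineModel> {
        private final List<Object> stages;

        Pipeline(List<Object> stages) {
            this.stages = stages;
        }

        @Override
        public PipelineModel fit(List<Integer> input) {
            List<Transformer> fitted = new ArrayList<>();
            List<Integer> current = input;
            for (Object stage : stages) {
                // Estimator stages are fitted on the current data; Transformer stages are reused.
                Transformer t = (stage instanceof Estimator)
                        ? ((Estimator<?>) stage).fit(current)
                        : (Transformer) stage;
                fitted.add(t);
                current = t.transform(current);
            }
            return new PipelineModel(fitted);
        }
    }

    /** PipelineModel is only a Transformer: it applies the fitted stages in order. */
    static class PipelineModel implements Transformer {
        private final List<Transformer> stages;

        PipelineModel(List<Transformer> stages) {
            this.stages = stages;
        }

        @Override
        public List<Integer> transform(List<Integer> input) {
            List<Integer> current = input;
            for (Transformer t : stages) current = t.transform(current);
            return current;
        }
    }

    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline(Arrays.asList(new AddOne(), new SubtractMin()));
        PipelineModel model = pipeline.fit(Arrays.asList(1, 2, 3));
        System.out.println(model.transform(Arrays.asList(5, 7))); // prints [4, 6]
    }
}
```

In this sketch the Pipeline class has no transform method at all, so a caller cannot accidentally use an unfitted pipeline for inference.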

3) Removed Pipeline::appendStage from the Pipeline class.

This change makes the concept of Pipeline consistent with that of Graph/GraphBuilder. Neither Graph nor Pipeline provides an API to construct itself.

4) Removed the Model interface.

This change simplifies the class hierarchy by removing a redundant class. It follows the philosophy of only adding complexity when we have an explicit use-case for it.

5) Renamed PipelineStage to Stage and added the PublicEvolving tag to the Stage interface.

This change is reasonable because we will now compose Graph (not just Pipeline) using this class.

6) Added transformSchemas() to the Stage interface.

This is needed to validate the compatibility of input schemas with a given Estimator/Transformer instance.
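As a rough illustration of this kind of validation, here is a minimal sketch. The Stage interface, the method signature, and the Scaler stage below are assumptions made for the example, not the proposed API: schemas are modelled as ordered maps from column name to type name, and an incompatible input is reported by throwing an exception.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaCheckSketch {
    /** Hypothetical stand-in for the Stage interface; the signature is an assumption. */
    interface Stage {
        // Maps input schemas to output schemas and throws if the inputs are incompatible.
        List<Map<String, String>> transformSchemas(List<Map<String, String>> inputSchemas);
    }

    /** Made-up stage that requires exactly one table with a DOUBLE column named "features". */
    static class Scaler implements Stage {
        @Override
        public List<Map<String, String>> transformSchemas(List<Map<String, String>> inputSchemas) {
            if (inputSchemas.size() != 1) {
                throw new IllegalArgumentException("expected exactly 1 input table");
            }
            Map<String, String> in = inputSchemas.get(0);
            if (!"DOUBLE".equals(in.get("features"))) {
                throw new IllegalArgumentException("missing DOUBLE column 'features'");
            }
            // The output schema is the input schema plus one appended column.
            Map<String, String> out = new LinkedHashMap<>(in);
            out.put("scaled", "DOUBLE");
            return Collections.singletonList(out);
        }
    }

    /** Returns true if the stage accepts the given input schemas. */
    static boolean isCompatible(Stage stage, List<Map<String, String>> schemas) {
        try {
            stage.transformSchemas(schemas);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> good = new LinkedHashMap<>();
        good.put("features", "DOUBLE");
        Map<String, String> bad = new LinkedHashMap<>();
        bad.put("features", "STRING");

        Stage stage = new Scaler();
        System.out.println(stage.transformSchemas(Collections.singletonList(good)));
        // prints [{features=DOUBLE, scaled=DOUBLE}]
        System.out.println(isCompatible(stage, Collections.singletonList(bad))); // prints false
    }
}
```

Because the check works on schemas only, it can run before any data is processed, for example when stages are chained together.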

7) Updated Transformer/Estimator to take a list of tables as input and return a list of tables as output.

This change addresses the use-cases described in the motivation section, e.g. a graph embedding Estimator needs to take 2 tables as inputs.
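To make the two-input case concrete, here is a minimal sketch under stated assumptions. The Table, Transformer and Estimator types are hypothetical stand-ins (a table is a list of integer rows), and DegreeEstimator is a toy substitute for a real graph embedding algorithm: it "embeds" each node as its degree. What matters is the shape of the calls, where fit takes a node table and an edge table and transform returns an array of tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiInputSketch {
    /** Hypothetical stand-in for a table: a list of integer rows. */
    static class Table {
        final List<int[]> rows;

        Table(List<int[]> rows) {
            this.rows = rows;
        }
    }

    /** Hypothetical stand-ins for the multi-input, multi-output interfaces. */
    interface Transformer {
        Table[] transform(Table... inputs);
    }

    interface Estimator<M extends Transformer> {
        M fit(Table... inputs);
    }

    /** Toy "graph embedding" Estimator: the embedding of a node is its degree. */
    static class DegreeEstimator implements Estimator<Transformer> {
        @Override
        public Transformer fit(Table... inputs) {
            if (inputs.length != 2) {
                throw new IllegalArgumentException("expected a node table and an edge table");
            }
            Map<Integer, Integer> degree = new HashMap<>();
            // 1st input: node table with rows [id].
            for (int[] node : inputs[0].rows) degree.put(node[0], 0);
            // 2nd input: edge table with rows [src, dst].
            for (int[] edge : inputs[1].rows) {
                degree.merge(edge[0], 1, Integer::sum);
                degree.merge(edge[1], 1, Integer::sum);
            }
            // The fitted model maps a node table [id] to a table [id, degree].
            return tables -> {
                List<int[]> out = new ArrayList<>();
                for (int[] node : tables[0].rows) {
                    out.add(new int[] {node[0], degree.getOrDefault(node[0], 0)});
                }
                return new Table[] {new Table(out)};
            };
        }
    }

    public static void main(String[] args) {
        Table nodes = new Table(Arrays.asList(new int[] {1}, new int[] {2}, new int[] {3}));
        Table edges = new Table(Arrays.asList(new int[] {1, 2}, new int[] {1, 3}));
        Transformer model = new DegreeEstimator().fit(nodes, edges);
        for (int[] row : model.transform(nodes)[0].rows) {
            System.out.println(Arrays.toString(row)); // prints [1, 2] then [2, 1] then [3, 1]
        }
    }
}
```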

8) Added setStateStreams and getStateStreams to the Transformer interface.

This change addresses the use-cases described in the motivation section, where a running Transformer needs to ingest the model state streams emitted by an Estimator, which could be running on a different machine.
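The hand-off of model state can be sketched as follows. Everything here is an assumption made for illustration: a "state stream" is modelled as an in-memory list of values rather than a real stream, the method signatures are guesses based only on the method names, and ScaleModel and fit are made-up examples. The sketch shows a Transformer created on the "serving" side that receives its state from a model produced on the "training" side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StateStreamSketch {
    /** Hypothetical stand-in for the Transformer interface; signatures are assumptions. */
    interface Transformer {
        void setStateStreams(List<List<Double>> streams);

        List<List<Double>> getStateStreams();

        List<Double> transform(List<Double> input);
    }

    /** Made-up Transformer that multiplies every value by the latest scale factor it has seen. */
    static class ScaleModel implements Transformer {
        private List<List<Double>> stateStreams = new ArrayList<>();

        @Override
        public void setStateStreams(List<List<Double>> streams) {
            this.stateStreams = streams;
        }

        @Override
        public List<List<Double>> getStateStreams() {
            return stateStreams;
        }

        @Override
        public List<Double> transform(List<Double> input) {
            // Uses the most recent element of the first state stream as the model state.
            List<Double> stream = stateStreams.get(0);
            double scale = stream.get(stream.size() - 1);
            List<Double> out = new ArrayList<>();
            for (double v : input) out.add(v * scale);
            return out;
        }
    }

    /** Made-up Estimator logic: emits 1 / max(|x|) as the model state. */
    static ScaleModel fit(List<Double> training) {
        double max = 0;
        for (double v : training) max = Math.max(max, Math.abs(v));
        ScaleModel model = new ScaleModel();
        model.setStateStreams(Arrays.asList(Arrays.asList(1.0 / max)));
        return model;
    }

    public static void main(String[] args) {
        ScaleModel trained = fit(Arrays.asList(2.0, -4.0)); // "training" side
        ScaleModel serving = new ScaleModel();               // "serving" side
        serving.setStateStreams(trained.getStateStreams());
        System.out.println(serving.transform(Arrays.asList(1.0, 2.0))); // prints [0.25, 0.5]
    }
}
```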

9) Added Graph, GraphModel and GraphBuilder.

This change addresses the use-cases described in the motivation section, where we need to compose an Estimator from a DAG of Estimator/Transformer.

Example Usage

In this section, we provide an example of using the proposed GraphBuilder API to construct a DAG of Estimators and Transformers.

Suppose we have the following Transformer and Estimator classes:

...

And we want to compose an Estimator (e.g. Graph) from the following DAG of Transformer/Estimator.

The resulting Graph::fit is expected to have the following behavior:

  • The method takes 2 input tables. The 1st input table is given to one TransformerA instance, and the 2nd input table is given to another TransformerA instance.
  • An EstimatorB instance fits the output tables of these two TransformerA instances and generates a new TransformerB instance.
  • The method returns a GraphModel instance which contains 2 TransformerA instances and 1 TransformerB instance, connected using the same DAG as shown above.

Here is the code snippet which achieves the expected goal by using the proposed APIs:

GraphBuilder builder = new GraphBuilder();

// Creates nodes
Stage<?> stage1 = new TransformerA();
Stage<?> stage2 = new TransformerA();
Stage<?> stage3 = new EstimatorB();
// Creates the ids of the input tables (this example uses no input states).
TableId input1 = builder.createTableId();
TableId input2 = builder.createTableId();
// Feeds inputs to nodes and gets outputs.
TableId output1 = builder.getOutputs(stage1, input1)[0];
TableId output2 = builder.getOutputs(stage2, input2)[0];
TableId output3 = builder.getOutputs(stage3, output1, output2)[0];

// Specifies the ordered lists of inputs, outputs, input states and output states that will
// be used as the inputs/outputs of the corresponding Graph and GraphModel APIs.
TableId[] inputs = new TableId[] {input1, input2};
TableId[] outputs = new TableId[] {output3};
// Generates the Graph instance.
Graph graph = builder.build(inputs, outputs, new TableId[]{}, new TableId[]{});

// Use the Graph instance as an Estimator.
GraphModel model = graph.fit(...);
Table[] results = model.transform(...);

...