...

Compose an Estimator whose input schemas differ from those of its fitted Transformer

...

Suppose we have the following Estimator and Transformer classes, where the input schemas of an Estimator may differ from the input schema of its fitted Transformer:

  • TransformerA whose transform(...) takes 1 input table and has 1 output table.
  • EstimatorA whose fit(...) takes 2 input tables and returns an instance of TransformerA.
  • TransformerB whose transform(...) takes 1 input table and has 1 output table.
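To make the difference in input counts concrete, the three classes could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Table`, `Transformer` and `Estimator` types below are simplified stand-ins defined locally, not the proposed API, and the row-tagging and upper-casing logic is invented purely so the sketch has something to compute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the proposed API.
final class StageShapes {
    /** Stand-in table: an ordered list of string rows. */
    static final class Table {
        final List<String> rows;
        Table(String... rows) { this.rows = Arrays.asList(rows); }
    }

    interface Transformer {
        Table[] transform(Table... inputs);
    }

    interface Estimator<M extends Transformer> {
        M fit(Table... inputs);
    }

    /** Rejects calls that pass the wrong number of tables. */
    static void requireCount(Table[] inputs, int expected) {
        if (inputs.length != expected) {
            throw new IllegalArgumentException(
                    "expected " + expected + " table(s), got " + inputs.length);
        }
    }

    /** transform(...) takes 1 input table and has 1 output table. */
    static final class TransformerA implements Transformer {
        final String label;
        TransformerA(String label) { this.label = label; }

        @Override
        public Table[] transform(Table... inputs) {
            requireCount(inputs, 1);
            List<String> out = new ArrayList<>();
            for (String row : inputs[0].rows) {
                out.add(label + ":" + row); // made-up logic: tag each row
            }
            return new Table[] {new Table(out.toArray(new String[0]))};
        }
    }

    /** fit(...) takes 2 input tables and returns a TransformerA. */
    static final class EstimatorA implements Estimator<TransformerA> {
        @Override
        public TransformerA fit(Table... inputs) {
            requireCount(inputs, 2);
            int seen = inputs[0].rows.size() + inputs[1].rows.size();
            return new TransformerA("fitted-on-" + seen);
        }
    }

    /** transform(...) takes 1 input table and has 1 output table. */
    static final class TransformerB implements Transformer {
        @Override
        public Table[] transform(Table... inputs) {
            requireCount(inputs, 1);
            List<String> out = new ArrayList<>();
            for (String row : inputs[0].rows) {
                out.add(row.toUpperCase()); // made-up logic: upper-case each row
            }
            return new Table[] {new Table(out.toArray(new String[0]))};
        }
    }

    public static void main(String[] args) {
        TransformerA fitted = new EstimatorA().fit(new Table("a", "b"), new Table("c"));
        System.out.println(fitted.transform(new Table("x"))[0].rows);
    }
}
```

In this sketch, passing two tables to the fitted `TransformerA` fails the count check, which is the mismatch the composed Graph has to account for.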

We want to compose an Estimator (e.g. a Graph) from the following DAG of Transformers and Estimators.

[Image: DAG in which the output of EstimatorA feeds TransformerB]

The resulting Graph::fit is expected to have the following behavior:

  • The method takes 2 input tables. Both tables are given to EstimatorA::fit.
  • EstimatorA fits on the 2 input tables and generates a TransformerA instance. The TransformerA instance takes 1 input table, unlike the 2 tables given to EstimatorA.
  • The method returns a GraphModel instance that contains a TransformerA instance and a TransformerB instance connected as a chain.

Note that the fitted GraphModel takes only 1 table as input.
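The behavior described above can be sketched without any of the proposed classes, by modeling a table as a plain list and a one-table Transformer as a `java.util.function.Function`. Everything here is an assumption made for illustration: `graphFit`, `fitEstimatorA` and `transformB` are hypothetical names, and the arithmetic stands in for real fitting and transformation logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: a table is a List<Integer>, a one-table
// Transformer is a Function from one table to one table.
final class GraphFitSketch {
    /** Stand-in for EstimatorA::fit: 2 tables in, a one-table transformer out. */
    static Function<List<Integer>, List<Integer>> fitEstimatorA(
            List<Integer> table1, List<Integer> table2) {
        int offset = table1.size() + table2.size(); // made-up "fitted" state
        return table -> table.stream().map(x -> x + offset).collect(Collectors.toList());
    }

    /** Stand-in for TransformerB::transform: 1 table in, 1 table out. */
    static List<Integer> transformB(List<Integer> table) {
        return table.stream().map(x -> x * 2).collect(Collectors.toList());
    }

    /**
     * Stand-in for Graph::fit: both input tables go to EstimatorA, and the
     * result chains the fitted TransformerA with TransformerB. The returned
     * model takes only 1 table.
     */
    static Function<List<Integer>, List<Integer>> graphFit(
            List<Integer> table1, List<Integer> table2) {
        Function<List<Integer>, List<Integer>> transformerA = fitEstimatorA(table1, table2);
        return transformerA.andThen(GraphFitSketch::transformB);
    }

    public static void main(String[] args) {
        Function<List<Integer>, List<Integer>> model =
                graphFit(Arrays.asList(1, 2), Arrays.asList(3));
        System.out.println(model.apply(Arrays.asList(10, 20))); // prints [26, 46]
    }
}
```

The point of the sketch is the type of `graphFit`: it takes two tables, while the model it returns takes one, so the fit-time and transform-time inputs have to be declared separately when the Graph is built.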

Here is the code snippet that addresses this use-case by using the proposed APIs:

```java
GraphBuilder builder = new GraphBuilder();

// Creates nodes.
Stage<?> stage1 = new EstimatorA();
Stage<?> stage2 = new TransformerB();

// Creates inputs.
TableId estimatorInput1 = builder.createTableId();
TableId estimatorInput2 = builder.createTableId();
TableId transformerInput1 = builder.createTableId();

// Feeds inputs to nodes and gets outputs. For stage1, the first array lists the
// inputs of EstimatorA::fit and the second lists the inputs of the fitted TransformerA.
TableId output1 =
        builder.getOutputs(
                stage1,
                new TableId[] {estimatorInput1, estimatorInput2},
                new TableId[] {transformerInput1})[0];
TableId output2 = builder.getOutputs(stage2, output1)[0];

// Specifies the ordered lists of estimator inputs, transformer inputs, outputs, input states and output states
// that will be used as the inputs/outputs of the corresponding Graph and GraphModel APIs.
TableId[] estimatorInputs = new TableId[] {estimatorInput1, estimatorInput2};
TableId[] transformerInputs = new TableId[] {transformerInput1};
TableId[] outputs = new TableId[] {output2};
TableId[] inputStates = new TableId[] {};
TableId[] outputStates = new TableId[] {};

// Uses the Graph instance as an Estimator. fit(...) takes the 2 estimator input
// tables; the fitted GraphModel's transform(...) takes the 1 transformer input table.
Graph graph = builder.build(estimatorInputs, transformerInputs, outputs, inputStates, outputStates);
GraphModel model = graph.fit(...);
Table[] results = model.transform(...);
```

Compatibility, Deprecation, and Migration Plan

...