...

This FLIP proposes to add the Graph, GraphModel, GraphBuilder, GraphNode and TableId classes. The following code blocks show the public APIs of these classes.

1) Add the TableId class to represent the input/output of a stage.

This class is needed to construct the DAG before the concrete Tables are available. It overrides equals/hashCode so that it can be used as the key of a hash map.

Code Block
languagejava
public class TableId {
    private final int tableId;

    @Override
    public boolean equals(Object obj) {...}

    @Override
    public int hashCode() {...}
}
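
To illustrate why value-based equals/hashCode matters here, the following is a minimal sketch rather than the proposed implementation. The constructor, the `bind` helper and the `TableIdSketch` wrapper class are hypothetical, and Strings stand in for Flink Tables so that the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; names other than TableId are assumptions. */
final class TableIdSketch {

    /** Value-based identifier: two instances wrapping the same int act as the same map key. */
    static final class TableId {
        private final int tableId;

        TableId(int tableId) { // hypothetical constructor, not part of the proposed API
            this.tableId = tableId;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (!(obj instanceof TableId)) {
                return false;
            }
            return tableId == ((TableId) obj).tableId;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(tableId);
        }
    }

    /** Binds placeholder ids to concrete values once they exist (Strings stand in for Tables). */
    static Map<TableId, String> bind(TableId[] ids, String[] tables) {
        Map<TableId, String> bound = new HashMap<>();
        for (int i = 0; i < ids.length; i++) {
            bound.put(ids[i], tables[i]);
        }
        return bound;
    }

    public static void main(String[] args) {
        Map<TableId, String> bound =
                bind(new TableId[] {new TableId(0), new TableId(1)}, new String[] {"train", "test"});
        // A distinct but equal TableId instance resolves to the same entry.
        System.out.println(bound.get(new TableId(1)));
    }
}
```

Without the overrides, the lookup in `main` would fall back to identity comparison and miss the entry, which is why the DAG can only be wired up through ids when they compare by value.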


2) Add the GraphNode class.

This class contains the stage as well as the input/output of this stage in the form of TableId lists. A DAG can thus be represented as a list of GraphNodes.

Code Block
languagejava
public class GraphNode {
    public final Stage<?> stage;
    public final TableId[] estimatorInputs;
    public final TableId[] modelInputs;
    public final TableId[] outputs;
}


3) Add the Graph class to wrap a DAG of Estimator/Model/Transformer/AlgoOperator into an Estimator.

Code Block
languagejava
/**
 * A Graph acts as an Estimator. A Graph consists of a DAG of stages, each of which could be an
 * Estimator, Model, Transformer or AlgoOperator. When `Graph::fit` is called, the stages are
 * executed in a topologically-sorted order. If a stage is an Estimator, its `Estimator::fit` method
 * will be called on the input tables (from the input edges) to fit a Model. Then the Model will be
 * used to transform the input tables and produce output tables to the output edges. If a stage is
 * an AlgoOperator, its `AlgoOperator::transform` method will be called on the input tables and
 * produce output tables to the output edges. The GraphModel fitted from a Graph consists of the
 * fitted Models and AlgoOperators, corresponding to the Graph's stages.
 */
@PublicEvolving
public final class Graph implements Estimator<Graph, GraphModel> {
    public Graph(List<GraphNode> nodes, TableId[] estimatorInputIds, TableId[] modelInputs, TableId[] outputs, TableId[] inputModelData, TableId[] outputModelData) {...}

    @Override
    public GraphModel fit(Table... inputs) {...}

    @Override
    public void save(String path) throws IOException {...}

    // Static methods cannot carry @Override.
    public static Graph load(String path) throws IOException {...}
}
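
The topologically-sorted execution described in the Javadoc can be sketched as follows. This is only one possible scheduling strategy under simplifying assumptions, not the proposed implementation: `Node` is a hypothetical stand-in for GraphNode, table ids are plain ints, and stages are represented by their names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; all names here are assumptions. */
final class TopoOrderSketch {

    /** Simplified stand-in for GraphNode: a stage name plus integer table ids. */
    static final class Node {
        final String stage;
        final int[] inputs;
        final int[] outputs;

        Node(String stage, int[] inputs, int[] outputs) {
            this.stage = stage;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    /** True if every id in the array has already been produced or supplied. */
    private static boolean allAvailable(int[] ids, Set<Integer> available) {
        for (int id : ids) {
            if (!available.contains(id)) {
                return false;
            }
        }
        return true;
    }

    /** Returns stage names in an order where each stage runs after the producers of its inputs. */
    static List<String> executionOrder(List<Node> nodes, int[] graphInputs) {
        Set<Integer> available = new HashSet<>();
        for (int id : graphInputs) {
            available.add(id);
        }
        List<Node> pending = new ArrayList<>(nodes);
        List<String> order = new ArrayList<>();
        while (!pending.isEmpty()) {
            Node ready = null;
            for (Node node : pending) {
                if (allAvailable(node.inputs, available)) {
                    ready = node;
                    break;
                }
            }
            if (ready == null) {
                throw new IllegalStateException("cycle or missing input table");
            }
            // "Running" the stage makes its output tables available to downstream stages.
            for (int id : ready.outputs) {
                available.add(id);
            }
            order.add(ready.stage);
            pending.remove(ready);
        }
        return order;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("model", new int[] {1}, new int[] {2}));
        nodes.add(new Node("scaler", new int[] {0}, new int[] {1}));
        System.out.println(executionOrder(nodes, new int[] {0}));
    }
}
```

The nodes are deliberately listed out of order: the scheduler still runs `scaler` first because `model` depends on table 1, which only `scaler` produces.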


4) Add the GraphModel class to wrap a DAG of Estimator/Model/Transformer/AlgoOperator into a Model.

Code Block
languagejava
/**
 * A GraphModel acts as a Model. A GraphModel consists of a DAG of stages, each of which could be an
 * Estimator, Model, Transformer or AlgoOperator. When `GraphModel::transform` is called, the
 * stages are executed in a topologically-sorted order. When a stage is executed, its
 * `AlgoOperator::transform` method will be called on the input tables (from the input edges) and
 * produce output tables to the output edges.
 */
public final class GraphModel implements Model<GraphModel> {

    public GraphModel(List<GraphNode> nodes, TableId[] inputIds, TableId[] outputIds, TableId[] inputModelData, TableId[] outputModelData) {...}

    @Override
    public Table[] transform(Table... inputTables) {...}

    @Override
    public void setModelData(Table... inputs) {...}

    @Override
    public Table[] getModelData() {...}

    @Override
    public void save(String path) throws IOException {...}

    public static GraphModel load(String path) throws IOException {...}
}


5) Add the GraphBuilder class to build GraphModel or Graph from a DAG of stages.

Code Block
languagejava
/**
 * A GraphBuilder provides APIs to build Estimator/Model/AlgoOperator from a DAG of stages, each of
 * which could be an Estimator, Model, Transformer or AlgoOperator.
 */
@PublicEvolving
public final class GraphBuilder {
    private int maxOutputLength = 20;

    public GraphBuilder() {}

    /**
     * Specifies the upper bound (could be loose) of the number of output tables that can be
     * returned by the Transformer::getModelData and AlgoOperator::transform methods, for any stage
     * involved in this Graph.
     *
     * <p>The default upper bound is 20.
     */
    public GraphBuilder setMaxOutputLength(int maxOutputLength) {...}

    /**
     * Creates a TableId associated with this GraphBuilder. It can be used to specify the passing of
     * tables between stages, as well as the input/output tables of the Graph/GraphModel generated
     * by this builder.
     */
    public TableId createTableId() {...}

    /**
     * If the stage is an Estimator, both its fit method and the transform method of its fitted
     * Model would be invoked with the given inputs when the graph runs.
     *
     * <p>If this stage is a Model, Transformer or AlgoOperator, its transform method would be
     * invoked with the given inputs when the graph runs.
     *
     * <p>Returns a list of TableIds, which represents outputs of AlgoOperator::transform of the
     * given stage.
     */
    public TableId[] getOutputs(Stage<?> stage, TableId... inputs) {...}

    /**
     * If this stage is an Estimator, its fit method would be invoked with estimatorInputs, and the
     * transform method of its fitted Model would be invoked with modelInputs.
     *
     * <p>This method throws Exception if the stage is not an Estimator.
     *
     * <p>This method is useful when the stage is an Estimator AND the Estimator::fit needs to take
     * a different list of Tables from the Model::transform of the fitted Model.
     *
     * <p>Returns a list of TableIds, which represents outputs of Model::transform of the fitted
     * Model.
     */
    public TableId[] getOutputs(Stage<?> stage, TableId[] estimatorInputs, TableId[] modelInputs) {...}

    /**
     * The setModelData() of the fitted GraphModel should invoke the setModelData() of the given
     * stage with the given inputs.
     */
    public void setModelData(Stage<?> stage, TableId... inputs) {...}

    /**
     * The getModelData() of the fitted GraphModel should invoke the getModelData() of the given
     * stage.
     *
     * <p>Returns a list of TableIds, which represents the outputs of getModelData() of the given
     * stage.
     */
    public TableId[] getModelData(Stage<?> stage) {...}

    /**
     * Returns an Estimator instance with the following behavior:
     *
     * <p>1) Estimator::fit should take the given inputs and return a Model with the following
     * behavior.
     *
     * <p>2) Model::transform should take the given inputs and return the given outputs.
     *
     * <p>The fit method of the returned Estimator and the transform method of the fitted Model
     * should invoke the corresponding methods of the internal stages as specified by the
     * GraphBuilder.
     */
    public Estimator<?, ?> buildEstimator(TableId[] inputs, TableId[] outputs) {...}

    /**
     * Returns an Estimator instance with the following behavior:
     *
     * <p>1) Estimator::fit should take the given inputs and return a Model with the following
     * behavior.
     *
     * <p>2) Model::transform should take the given inputs and return the given outputs.
     *
     * <p>3) Model::setModelData should take the given inputModelData.
     *
     * <p>4) Model::getModelData should return the given outputModelData.
     *
     * <p>The fit method of the returned Estimator and the transform/setModelData/getModelData
     * methods of the fitted Model should invoke the corresponding methods of the internal stages as
     * specified by the GraphBuilder.
     */
    public Estimator<?, ?> buildEstimator(TableId[] inputs, TableId[] outputs, TableId[] inputModelData, TableId[] outputModelData) {...}

    /**
     * Returns an Estimator instance with the following behavior:
     *
     * <p>1) Estimator::fit should take the given estimatorInputs and return a Model with the
     * following behavior.
     *
     * <p>2) Model::transform should take the given modelInputs and return the given outputs.
     *
     * <p>3) Model::setModelData should take the given inputModelData.
     *
     * <p>4) Model::getModelData should return the given outputModelData.
     *
     * <p>The fit method of the returned Estimator and the transform/setModelData/getModelData
     * methods of the fitted Model should invoke the corresponding methods of the internal stages as
     * specified by the GraphBuilder.
     */
    public Estimator<?, ?> buildEstimator(TableId[] estimatorInputs, TableId[] modelInputs, TableId[] outputs, TableId[] inputModelData, TableId[] outputModelData) {...}

    /**
     * Returns an AlgoOperator instance with the following behavior:
     *
     * <p>1) AlgoOperator::transform should take the given inputs and return the given outputs.
     *
     * <p>The transform method of the returned AlgoOperator should invoke the corresponding methods
     * of the internal stages as specified by the GraphBuilder.
     */
    public AlgoOperator<?> buildAlgoOperator(TableId[] inputs, TableId[] outputs) {...}

    /**
     * Returns a Model instance with the following behavior:
     *
     * <p>1) Model::transform should take the given inputs and return the given outputs.
     *
     * <p>The transform method of the returned Model should invoke the corresponding methods of the
     * internal stages as specified by the GraphBuilder.
     */
    public Model<?> buildModel(TableId[] inputs, TableId[] outputs) {...}

    /**
     * Returns a Model instance with the following behavior:
     *
     * <p>1) Model::transform should take the given inputs and return the given outputs.
     *
     * <p>2) Model::setModelData should take the given inputModelData.
     *
     * <p>3) Model::getModelData should return the given outputModelData.
     *
     * <p>The transform/setModelData/getModelData methods of the returned Model should invoke the
     * corresponding methods of the internal stages as specified by the GraphBuilder.
     */
    public Model<?> buildModel(TableId[] inputs, TableId[] outputs, TableId[] inputModelData, TableId[] outputModelData) {...}
}
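
One plausible reading of why `setMaxOutputLength` exists: `getOutputs` has to hand back TableIds while the DAG is still being declared, before any stage has run, so the builder cannot yet know how many tables a stage will actually produce. The sketch below assumes the builder simply reserves `maxOutputLength` ids per stage. That reservation strategy, the `BuilderSketch` class, the int ids and the String stage names are all assumptions made for illustration, not behavior stated by this FLIP.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the proposed GraphBuilder implementation. */
final class BuilderSketch {
    private int maxOutputLength = 20; // default upper bound from the proposal
    private int nextId = 0;
    final List<String> recordedStages = new ArrayList<>();

    BuilderSketch setMaxOutputLength(int maxOutputLength) {
        this.maxOutputLength = maxOutputLength;
        return this;
    }

    /** Hands out a fresh placeholder id (an int stands in for TableId). */
    int createTableId() {
        return nextId++;
    }

    /**
     * Records the stage and reserves maxOutputLength output ids for it (assumed strategy).
     * The inputs are accepted only to mirror the proposed signature; this sketch does not store them.
     */
    int[] getOutputs(String stage, int... inputs) {
        recordedStages.add(stage);
        int[] outputs = new int[maxOutputLength];
        for (int i = 0; i < outputs.length; i++) {
            outputs[i] = createTableId();
        }
        return outputs;
    }

    public static void main(String[] args) {
        BuilderSketch builder = new BuilderSketch().setMaxOutputLength(2);
        int input = builder.createTableId();
        int[] scaled = builder.getOutputs("scaler", input);
        // Only the first reserved output is wired to the next stage; unused ids stay unbound.
        int[] predicted = builder.getOutputs("model", scaled[0]);
        System.out.println(builder.recordedStages + " -> last reserved id " + predicted[1]);
    }
}
```

Under this reading a loose bound only costs some unused ids, while a bound that is too small would leave a stage's extra outputs unaddressable, which would explain why the Javadoc allows the bound to be loose.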


Example Usage

In this section we provide example code snippets to demonstrate how the APIs proposed in this FLIP address the use-cases in the motivation section.

...