...

Code Block
languagejava
@PublicEvolving
public interface Stage<T extends Stage<T>> extends WithParams<T>, Serializable {

    /**
     * This method checks the compatibility between input schemas, stage's parameters and stage's
     * logic. It should raise an exception if there is any mismatch, e.g. the number of input
     * schemas is wrong, or if a required field is missing from a schema.
     *
     * <p>If there is no mismatch, the method derives and returns the output schemas from the input
     * schemas.
     *
     * <p>Note that the output schemas of a given Estimator instance should equal the output schemas
     * of the Transformer instance fitted by this Estimator instance, provided that the same list of
     * input schemas is used as input to the fit/transform methods respectively.
     *
     * @param schemas the list of schemas of the input tables.
     * @return the list of schemas of the output tables.
     */
    TableSchema[] transformSchemas(TableSchema... schemas);

    /** Skipped */
    default String toJson() {...}

    /** Skipped */
    default void loadJson(String json) {...}
}


@PublicEvolving
public interface Transformer<T extends Transformer<T>> extends Stage<T> {

    /**
     * Applies the Transformer on the given input tables, and returns the result tables.
     *
     * @param inputs a list of tables
     * @return a list of tables
     */
    Table[] transform(Table... inputs);

    /**
     * Uses the given list of tables to update internal states. This can be useful for e.g. online
     * learning where an Estimator fits an infinite stream of training samples and streams the model
     * diff data to this Transformer.
     *
     * <p>This method may be called at most once.
     *
     * @param inputs a list of tables
     */
    default void setStateStreams(Table... inputs) {
        throw new UnsupportedOperationException("this method is not implemented");
    }

    /**
     * Gets a list of tables representing changes of internal states of this Transformer. These
     * tables might come from the Estimator that instantiated this Transformer.
     *
     * @return a list of tables
     */
    default Table[] getStateStreams() {
        throw new UnsupportedOperationException("this method is not implemented");
    }
}


@PublicEvolving
public interface Estimator<E extends Estimator<E, M>, M extends Transformer<M>> extends Stage<E> {

    /**
     * Trains on the given inputs and produces a Transformer. If this Estimator may be used to
     * compose a Pipeline, the transform method of the returned Transformer should be able to accept
     * a list of tables of the same length and schemas as the fit method of this Estimator.
     *
     * @param inputs a list of tables
     * @return a Transformer
     */
    M fit(Table... inputs);
}

@PublicEvolving
public final class Pipeline implements Estimator<Pipeline, PipelineModel> {

    public Pipeline(List<Stage<?>> stages) {...}

    @Override
    public PipelineModel fit(Table... inputs) {...}

    /** Skipped a few methods, including the implementations of the Estimator APIs. */
}


@PublicEvolving
public final class PipelineModel implements Transformer<PipelineModel> {

    public PipelineModel(List<Transformer<?>> transformers) {...}

    /** Skipped a few methods, including the implementations of the Transformer APIs. */
}
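
The Estimator contract above (the fitted Transformer must accept the same inputs as fit) is what lets stages be chained. The following is a minimal sketch of that chaining, not the actual Pipeline implementation. It assumes stand-in types: a "table" is a plain List<Double>, each stage takes one input and produces one output, and PipelineSketch, Scaler, MeanCenterer, fitPipeline and transformAll are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {
    // Stand-ins (assumptions): a "table" is a List<Double>, and each stage has one input
    // and one output instead of the Table... varargs used by the real interfaces.
    interface Transformer {
        List<Double> transform(List<Double> input);
    }

    interface Estimator {
        Transformer fit(List<Double> input);
    }

    /** Hypothetical Transformer that multiplies every value by a constant factor. */
    static final class Scaler implements Transformer {
        private final double factor;

        Scaler(double factor) {
            this.factor = factor;
        }

        @Override
        public List<Double> transform(List<Double> input) {
            List<Double> out = new ArrayList<>();
            for (double v : input) {
                out.add(v * factor);
            }
            return out;
        }
    }

    /** Hypothetical Estimator that learns the mean and returns a Transformer subtracting it. */
    static final class MeanCenterer implements Estimator {
        @Override
        public Transformer fit(List<Double> input) {
            double sum = 0.0;
            for (double v : input) {
                sum += v;
            }
            final double mean = input.isEmpty() ? 0.0 : sum / input.size();
            return table -> {
                List<Double> out = new ArrayList<>();
                for (double v : table) {
                    out.add(v - mean);
                }
                return out;
            };
        }
    }

    /** Fits each Estimator on the data produced by the preceding stages, in order. */
    static List<Transformer> fitPipeline(List<Object> stages, List<Double> input) {
        List<Transformer> fitted = new ArrayList<>();
        List<Double> current = input;
        for (Object stage : stages) {
            Transformer t =
                    stage instanceof Estimator
                            ? ((Estimator) stage).fit(current)
                            : (Transformer) stage;
            fitted.add(t);
            // The fitted Transformer accepts the same input that fit received.
            current = t.transform(current);
        }
        return fitted;
    }

    /** Applies the fitted Transformers one after another. */
    static List<Double> transformAll(List<Transformer> model, List<Double> input) {
        List<Double> current = input;
        for (Transformer t : model) {
            current = t.transform(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Object> stages = Arrays.asList(new Scaler(2.0), new MeanCenterer());
        List<Transformer> model = fitPipeline(stages, Arrays.asList(1.0, 2.0, 3.0));
        System.out.println(transformAll(model, Arrays.asList(4.0))); // prints [4.0]
    }
}
```

In this sketch the training data is scaled to [2.0, 4.0, 6.0], so the learned mean is 4.0, and the input 4.0 becomes 8.0 and then 4.0 after centering.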



The following code block shows the interfaces of Graph, GraphModel, and GraphBuilder proposed by this FLIP.

...

Code Block
languagejava
/**
 * A Graph acts as an Estimator. It consists of a DAG of stages, each of which is either an
 * Estimator or Transformer.
 */
@PublicEvolving
public final class Graph implements Estimator<Graph, GraphModel> {
    public Graph(...) {...}

    @Override
    public GraphModel fit(Table... inputs) {...}

    @Override
    public TableSchema[] transformSchemas(TableSchema... schemas) {
        return schemas;
    }

    /** Skipped a few methods, including the implementations of some Estimator APIs. */
}


/** A GraphModel acts as a Transformer. It consists of a DAG of Transformers. */
@PublicEvolving
public final class GraphModel implements Transformer<GraphModel> {

    /** Skipped a few methods, including the implementations of the Transformer APIs. */
}


/** A GraphBuilder helps connect Stage instances into a Graph or GraphModel. */
@PublicEvolving
public final class GraphBuilder {

    /**
     * Specifies the upper bound (could be loose) of the number of output tables that can be
     * returned by the Transformer::getStateStreams and Transformer::transform methods, for any
     * stage involved in this Graph.
     *
     * <p>The default upper bound is 20.
     */
    public GraphBuilder setMaxOutputLength(int maxOutputLength) {...}

    /**
     * Creates a TableId associated with this GraphBuilder. It can be used to specify the passing of
     * tables between stages, as well as the input/output tables of the Graph/GraphModel generated
     * by this builder.
     */
    public TableId createTableId() {...}

    /**
     * The Graph::fit and GraphModel::transform should invoke the fit/transform of the corresponding
     * stage with the corresponding inputs.
     *
     * <p>Returns a list of TableIds, which represents outputs of the Transformer::transform
     * invocation.
     */
    public TableId[] getOutputs(Stage<?> stage, TableId... inputs) {...}

    /**
     * The GraphModel::setStateStreams should invoke the setStateStreams of the corresponding stage
     * with the corresponding inputs.
     */
    void setStateStreams(Stage<?> stage, TableId... inputs) {...}


    /**
     * The GraphModel::getStateStreams should invoke the getStateStreams of the corresponding stage.
     *
     * <p>Returns a list of TableIds, which represents outputs of the getStateStreams invocation.
     */
    TableId[] getStateStreams(Stage<?> stage) {...}

    /**
     * Returns a Graph instance with the following API specification:
     * - Graph::fit should take inputs and return a GraphModel with the following specification.
     * - GraphModel::transform should take inputs and return outputs.
     * - GraphModel::setStateStreams should take inputStates.
     * - GraphModel::getStateStreams should return outputStates.
     *
     * <p>The fit/transform/setStateStreams/getStateStreams should invoke the APIs of the internal
     * stages in the order specified by the DAG of stages.
     */
    Graph build(
            TableId[] inputs, TableId[] outputs, TableId[] inputStates, TableId[] outputStates) {...}

    /**
     * Returns a GraphModel instance with the following API specification:
     * - GraphModel::transform should take inputs and return outputs.
     * - GraphModel::setStateStreams should take inputStates.
     * - GraphModel::getStateStreams should return outputStates.
     *
     * <p>The transform/setStateStreams/getStateStreams should invoke the APIs of the internal
     * stages in the order specified by the DAG of stages.
     *
     * <p>This method throws an exception if any stage of this graph is an Estimator.
     */
    GraphModel buildModel(
            TableId[] inputs, TableId[] outputs, TableId[] inputStates, TableId[] outputStates) {...}

    // The TableId is necessary to pass the inputs/outputs of various API calls across the
    // Graph/GraphModel stages.

    static class TableId {}

}
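
To illustrate how TableIds wire stages together, here is a heavily simplified sketch and not the proposed implementation. It assumes stand-ins throughout: GraphBuilderSketch is a hypothetical class, a "table" is a String, a stage is a UnaryOperator<String> with one input and one output, and state streams, maxOutputLength and Estimators are left out. It also assumes every stage input is either the graph input or the output of a stage registered earlier, so registration order is a valid execution order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class GraphBuilderSketch {
    /** Opaque handle naming a table that flows between stages. */
    static final class TableId {}

    /** One recorded step of the DAG: the stage reads `input` and writes `output`. */
    private static final class Node {
        final UnaryOperator<String> stage;
        final TableId input;
        final TableId output;

        Node(UnaryOperator<String> stage, TableId input, TableId output) {
            this.stage = stage;
            this.input = input;
            this.output = output;
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    TableId createTableId() {
        return new TableId();
    }

    /** Records that `stage` consumes `input` and returns the id of the stage's output. */
    TableId getOutputs(UnaryOperator<String> stage, TableId input) {
        TableId output = createTableId();
        nodes.add(new Node(stage, input, output));
        return output;
    }

    /** Returns a function that binds `input` to a concrete table and runs the stages in order. */
    UnaryOperator<String> buildModel(TableId input, TableId output) {
        List<Node> plan = new ArrayList<>(nodes);
        return table -> {
            // TableId does not override equals/hashCode, so lookups are by identity.
            Map<TableId, String> tables = new HashMap<>();
            tables.put(input, table);
            for (Node node : plan) {
                tables.put(node.output, node.stage.apply(tables.get(node.input)));
            }
            return tables.get(output);
        };
    }

    public static void main(String[] args) {
        GraphBuilderSketch builder = new GraphBuilderSketch();
        TableId in = builder.createTableId();
        TableId trimmed = builder.getOutputs(String::trim, in);
        TableId upper = builder.getOutputs(String::toUpperCase, trimmed);
        UnaryOperator<String> model = builder.buildModel(in, upper);
        System.out.println(model.apply("  flink ml ")); // prints FLINK ML
    }
}
```

The point of the sketch is that no table is touched while the graph is being described. The builder only records which TableId feeds which stage, and concrete tables are bound to those ids when the built model is invoked.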








Proposed Changes

Describe the new thing you want to do in appropriate detail. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change.

...