...
Code Block | ||
---|---|---|
| ||
@PublicEvolving
public interface Stage<T extends Stage<T>> extends WithParams<T>, Serializable {
/**
* This method checks the compatibility between input schemas, stage's parameters and stage's
* logic. It should raise an exception if there is any mismatch, e.g. the number of input
* schemas is wrong, or if a required field is missing from a schema.
*
* <p>If there is no mismatch, the method derives and returns the output schemas from the input
* schemas.
*
* <p>Note that the output schemas of a given Estimator instance should equal the output schemas
* of the Transformer instance fitted by this Estimator instance, provided that the same list of
* input schemas is passed to the fit and transform methods, respectively.
*
* @param schemas the list of schemas of the input tables.
* @return the list of schemas of the output tables.
*/
TableSchema[] transformSchemas(TableSchema... schemas);
/** Skipped */
default String toJson() {...}
/** Skipped */
default void loadJson(String json) {...}
}
@PublicEvolving
public interface Transformer<T extends Transformer<T>> extends Stage<T> {
/**
* Applies the Transformer on the given input tables, and returns the result tables.
*
* @param inputs a list of tables
* @return a list of tables
*/
Table[] transform(Table... inputs);
/**
* Uses the given list of tables to update internal states. This can be useful for e.g. online
* learning where an Estimator fits an infinite stream of training samples and streams the model
* diff data to this Transformer.
*
* <p>This method may be called at most once.
*
* @param inputs a list of tables
*/
default void setStateStreams(Table... inputs) {
throw new UnsupportedOperationException("this method is not implemented");
}
/**
* Gets a list of tables representing changes of internal states of this Transformer. These
* tables might come from the Estimator that instantiated this Transformer.
*
* @return a list of tables
*/
default Table[] getStateStreams() {
throw new UnsupportedOperationException("this method is not implemented");
}
}
@PublicEvolving
public interface Estimator<E extends Estimator<E, M>, M extends Transformer<M>> extends Stage<E> {
/**
* Trains on the given inputs and produces a Transformer. If this Estimator may be used to
* compose a Pipeline, the transform method of the returned Transformer should be able to accept
* a list of tables of the same length and schemas as the fit method of this Estimator.
*
* @param inputs a list of tables
* @return a Transformer
*/
M fit(Table... inputs);
}
@PublicEvolving
public final class Pipeline implements Estimator<Pipeline, PipelineModel> {
public Pipeline(List<Stage<?>> stages) {...}
@Override
public PipelineModel fit(Table... inputs) {...}
/** Skipped a few methods, including the implementations of the Estimator APIs. */
}
@PublicEvolving
public final class PipelineModel implements Transformer<PipelineModel> {
public PipelineModel(List<Transformer<?>> transformers) {...}
/** Skipped a few methods, including the implementations of the Transformer APIs. */
}
|
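As a rough illustration of how these interfaces compose, the sketch below chains stages the way a Pipeline conventionally does: each Estimator is fitted on the current tables, and the resulting Transformer then transforms those same tables before they reach the next stage. This ordering is an assumption based on the Estimator::fit Javadoc above, not the actual Pipeline implementation. `ToyTable`, `ToyStage`, `ToyTransformer`, `ToyEstimator`, `AddConstant` and `MeanCenterer` are hypothetical stand-ins so that the example runs without Flink.

```java
import java.util.ArrayList;
import java.util.List;

final class PipelineSketch {
    /** Hypothetical stand-in for Table: a single column of numbers. */
    static final class ToyTable {
        final List<Double> values;
        ToyTable(List<Double> values) { this.values = values; }
    }

    /** Hypothetical, simplified counterparts of Stage, Transformer and Estimator. */
    interface ToyStage {}
    interface ToyTransformer extends ToyStage { ToyTable[] transform(ToyTable... inputs); }
    interface ToyEstimator extends ToyStage { ToyTransformer fit(ToyTable... inputs); }

    /** A Transformer that adds a constant to every value of the first input table. */
    static final class AddConstant implements ToyTransformer {
        final double delta;
        AddConstant(double delta) { this.delta = delta; }
        @Override
        public ToyTable[] transform(ToyTable... inputs) {
            List<Double> out = new ArrayList<>();
            for (double v : inputs[0].values) out.add(v + delta);
            return new ToyTable[] {new ToyTable(out)};
        }
    }

    /** An Estimator that learns the mean and returns a Transformer subtracting it. */
    static final class MeanCenterer implements ToyEstimator {
        @Override
        public ToyTransformer fit(ToyTable... inputs) {
            List<Double> values = inputs[0].values;
            double sum = 0;
            for (double v : values) sum += v;
            double mean = values.isEmpty() ? 0 : sum / values.size();
            return new AddConstant(-mean);
        }
    }

    /** Assumed Pipeline::fit semantics: fit each Estimator, then feed its output forward. */
    static List<ToyTransformer> fit(List<ToyStage> stages, ToyTable... inputs) {
        List<ToyTransformer> fitted = new ArrayList<>();
        ToyTable[] current = inputs;
        for (ToyStage stage : stages) {
            ToyTransformer t = (stage instanceof ToyEstimator)
                    ? ((ToyEstimator) stage).fit(current)
                    : (ToyTransformer) stage;
            fitted.add(t);
            current = t.transform(current);
        }
        return fitted;
    }

    /** Assumed PipelineModel::transform semantics: apply the fitted stages in order. */
    static ToyTable[] transform(List<ToyTransformer> model, ToyTable... inputs) {
        ToyTable[] current = inputs;
        for (ToyTransformer t : model) current = t.transform(current);
        return current;
    }

    /** Fits on [1, 2, 3] and then transforms a table holding the single value 2. */
    static List<Double> demo() {
        ToyTable train = new ToyTable(List.of(1.0, 2.0, 3.0));
        List<ToyStage> stages = List.of(new AddConstant(10.0), new MeanCenterer());
        List<ToyTransformer> model = fit(stages, train);
        return transform(model, new ToyTable(List.of(2.0)))[0].values;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0.0]
    }
}
```

The training data becomes [11, 12, 13] after the first stage, so the fitted second stage subtracts 12. The linear chain is what limits a Pipeline to a single path of tables from one stage to the next.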
The following code block shows the interface of Graph, GraphModel and GraphBuilder proposed by this FLIP.
...
Code Block | ||
---|---|---|
| ||
/**
* A Graph acts as an Estimator. It consists of a DAG of stages, each of which is either an
* Estimator or Transformer.
*/
@PublicEvolving
public final class Graph implements Estimator<Graph, GraphModel> {
public Graph(...) {...}
@Override
public GraphModel fit(Table... inputs) {...}
@Override
public TableSchema[] transformSchemas(TableSchema... schemas) {
return schemas;
}
/** Skipped a few methods, including the implementations of some Estimator APIs. */
}
/** A GraphModel acts as a Transformer. It consists of a DAG of Transformers. */
@PublicEvolving
public final class GraphModel implements Transformer<GraphModel> {
/** Skipped a few methods, including the implementations of the Transformer APIs. */
}
/** A GraphBuilder helps connect Stage instances into a Graph or GraphModel. */
@PublicEvolving
public final class GraphBuilder {
/**
* Specifies the upper bound (could be loose) of the number of output tables that can be
* returned by the Transformer::getStateStreams and Transformer::transform methods, for any
* stage involved in this Graph.
*
* <p>The default upper bound is 20.
*/
public GraphBuilder setMaxOutputLength(int maxOutputLength) {...}
/**
* Creates a TableId associated with this GraphBuilder. It can be used to specify the passing of
* tables between stages, as well as the input/output tables of the Graph/GraphModel generated
* by this builder.
*/
public TableId createTableId() {...}
/**
* The Graph::fit and GraphModel::transform should invoke the fit/transform of the corresponding
* stage with the corresponding inputs.
*
* <p>Returns a list of TableIds, which represents outputs of the Transformer::transform
* invocation.
*/
public TableId[] getOutputs(Stage<?> stage, TableId... inputs) {...}
/**
* The GraphModel::setStateStreams should invoke the setStateStreams of the corresponding stage
* with the corresponding inputs.
*/
void setStateStreams(Stage<?> stage, TableId... inputs) {...}
/**
* The GraphModel::getStateStreams should invoke the getStateStreams of the corresponding stage.
*
* <p>Returns a list of TableIds, which represents outputs of the getStateStreams invocation.
*/
TableId[] getStateStreams(Stage<?> stage) {...}
/**
* Returns a Graph instance with the following API specification:
* - Graph::fit should take inputs and return a GraphModel with the following specification.
* - GraphModel::transform should take inputs and return outputs.
* - GraphModel::setStateStreams should take inputStates.
* - GraphModel::getStateStreams should return outputStates.
*
* <p>The fit/transform/setStateStreams/getStateStreams should invoke the APIs of the internal
* stages in the order specified by the DAG of stages.
*/
Graph build(
TableId[] inputs, TableId[] outputs, TableId[] inputStates, TableId[] outputStates) {...}
/**
* Returns a GraphModel instance with the following API specification:
* - GraphModel::transform should take inputs and return outputs.
* - GraphModel::setStateStreams should take inputStates.
* - GraphModel::getStateStreams should return outputStates.
*
* <p>The transform/setStateStreams/getStateStreams should invoke the APIs of the internal
* stages in the order specified by the DAG of stages.
*
* <p>This method throws an exception if any stage of this graph is an Estimator.
*/
GraphModel buildModel(
TableId[] inputs, TableId[] outputs, TableId[] inputStates, TableId[] outputStates) {...}
// The TableId is necessary to pass the inputs/outputs of various API calls across the
// Graph/GraphModel stages.
static class TableId {}
}
|
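To make the role of TableId more concrete, here is a minimal sketch of how a builder of this shape could record stages and later replay them. It is a sketch under stated assumptions, not the implementation proposed by this FLIP: tables are modelled as `double[]`, `ToyTransformer`, `ADD` and `TIMES_TWO` are hypothetical, state streams are omitted, and `buildModel` returns a plain function instead of a GraphModel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class GraphBuilderSketch {
    /** Opaque handle naming a table that flows between stages. */
    static final class TableId {}

    /** Hypothetical stand-in for a Transformer: maps input tables to output tables. */
    interface ToyTransformer extends Function<double[][], double[][]> {}

    /** A recorded stage together with the handles of its inputs and outputs. */
    private static final class Node {
        final ToyTransformer stage;
        final TableId[] inputs;
        final TableId[] outputs;
        Node(ToyTransformer stage, TableId[] inputs, TableId[] outputs) {
            this.stage = stage; this.inputs = inputs; this.outputs = outputs;
        }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final int maxOutputLength = 20; // default upper bound from the Javadoc

    TableId createTableId() { return new TableId(); }

    /** Records the stage and hands back one handle per possible output table. */
    TableId[] getOutputs(ToyTransformer stage, TableId... inputs) {
        TableId[] outputs = new TableId[maxOutputLength];
        for (int i = 0; i < outputs.length; i++) outputs[i] = new TableId();
        nodes.add(new Node(stage, inputs, outputs));
        return outputs;
    }

    /**
     * Returns a function that replays the recorded stages. Insertion order is a valid
     * topological order here, because a stage can only consume handles that already exist.
     */
    Function<double[][], double[][]> buildModel(TableId[] inputs, TableId[] outputs) {
        List<Node> plan = new ArrayList<>(nodes);
        return tables -> {
            Map<TableId, double[]> env = new HashMap<>();
            for (int i = 0; i < inputs.length; i++) env.put(inputs[i], tables[i]);
            for (Node n : plan) {
                double[][] in = new double[n.inputs.length][];
                for (int i = 0; i < in.length; i++) in[i] = env.get(n.inputs[i]);
                double[][] out = n.stage.apply(in);
                if (out.length > n.outputs.length) {
                    throw new IllegalStateException("stage returned more tables than the bound");
                }
                for (int i = 0; i < out.length; i++) env.put(n.outputs[i], out[i]);
            }
            double[][] result = new double[outputs.length][];
            for (int i = 0; i < result.length; i++) result[i] = env.get(outputs[i]);
            return result;
        };
    }

    /** Element-wise sum of two input tables. */
    static final ToyTransformer ADD = in -> {
        double[] r = new double[in[0].length];
        for (int i = 0; i < r.length; i++) r[i] = in[0][i] + in[1][i];
        return new double[][] {r};
    };

    /** Doubles every element of the single input table. */
    static final ToyTransformer TIMES_TWO = in -> {
        double[] r = new double[in[0].length];
        for (int i = 0; i < r.length; i++) r[i] = 2 * in[0][i];
        return new double[][] {r};
    };

    /** Wires (a + b) * 2 as a two-stage DAG and evaluates it on two small tables. */
    static double[] demo() {
        GraphBuilderSketch builder = new GraphBuilderSketch();
        TableId a = builder.createTableId();
        TableId b = builder.createTableId();
        TableId sum = builder.getOutputs(ADD, a, b)[0];
        TableId doubled = builder.getOutputs(TIMES_TWO, sum)[0];
        Function<double[][], double[][]> model =
                builder.buildModel(new TableId[] {a, b}, new TableId[] {doubled});
        return model.apply(new double[][] {{1, 2}, {3, 4}})[0];
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [8.0, 12.0]
    }
}
```

Because the builder cannot know in advance how many tables a stage returns, the sketch pre-allocates `maxOutputLength` handles per stage and binds only those the stage actually produces. This is one plausible reading of why the upper bound is allowed to be loose.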
Proposed Changes
...