ID column(s): These should be present only if a ‘point_id’ parameter is provided.
Prediction for each tuple in ‘test_data_table’
‘Confidence’, if required. The sign of the confidence should be such that higher values are better (e.g., a cost or distance would be reported as a negative value, while a probability would be positive).
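The following Python sketch shows one way a predict step could assemble these output rows. Everything in it is an assumption made for illustration: the helper `build_predict_rows`, its parameters, and the output column names `prediction` and `confidence` are hypothetical, not an existing MADlib API. It emits the ID column only when a `point_id` column name is supplied, and it negates cost-like scores so that "higher is better" holds for every kind of confidence.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_predict_rows(
    test_rows: Sequence[Dict[str, Any]],
    predictions: Sequence[Any],
    scores: Optional[Sequence[float]] = None,
    point_id: Optional[str] = None,
    score_is_cost: bool = False,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: one output row per tuple of the test table."""
    if len(test_rows) != len(predictions):
        raise ValueError("need exactly one prediction per test tuple")
    if scores is not None and len(scores) != len(predictions):
        raise ValueError("need exactly one score per prediction")
    out: List[Dict[str, Any]] = []
    for i, (row, pred) in enumerate(zip(test_rows, predictions)):
        rec: Dict[str, Any] = {}
        # The ID column is emitted only when a point_id column name was given.
        if point_id is not None:
            rec[point_id] = row[point_id]
        rec["prediction"] = pred
        if scores is not None:
            # Costs/distances are "smaller is better", so negate them;
            # probabilities are already "higher is better".
            rec["confidence"] = -scores[i] if score_is_cost else scores[i]
        out.append(rec)
    return out


rows = build_predict_rows(
    [{"id": 7, "x": 1.0}, {"id": 8, "x": 2.5}],
    ["a", "b"],
    scores=[0.3, 1.2],
    point_id="id",
    score_is_cost=True,
)
```

With `score_is_cost=True`, a nearest-neighbour distance of 0.3 becomes a confidence of -0.3, which ranks above -1.2, as the sign rule requires.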

Misc

...

Another question: should we remove the requirement that the data table passed to predict have the same format as the training table?

Independent variables should be allowed to be SQL expressions (including *). Further, an ‘exclude’ parameter could be provided to remove features from a ‘*’ list. Columns that are used in ‘id’ or ‘grouping’ should be removed automatically. See ‘tree_train’ for examples.
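A minimal Python sketch of how such a feature specification might be expanded, assuming the caller already knows the table's column names. The function `expand_features` and its parameter names are hypothetical. In this sketch the ‘exclude’ list and the automatic removal of ‘id’ and ‘grouping’ columns apply only to columns pulled in by ‘*’; any other SQL expression is passed through unchanged.

```python
from typing import List, Optional, Sequence


def expand_features(
    table_columns: Sequence[str],
    features: Sequence[str],
    exclude: Optional[Sequence[str]] = None,
    id_cols: Sequence[str] = (),
    grouping_cols: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: expand '*' and drop excluded/reserved columns."""
    excluded = set(exclude or ())
    # id and grouping columns are never treated as features when '*' is used.
    reserved = set(id_cols) | set(grouping_cols)
    result: List[str] = []
    for expr in features:
        if expr.strip() == "*":
            for col in table_columns:
                if col not in excluded and col not in reserved and col not in result:
                    result.append(col)
        elif expr not in result:
            # Arbitrary SQL expressions such as 'log(x1)' are kept verbatim.
            result.append(expr)
    return result


feats = expand_features(
    ["id", "g", "x1", "x2", "y"],
    ["*", "log(x1)"],
    exclude=["y"],
    id_cols=["id"],
    grouping_cols=["g"],
)
```

Here `feats` is `["x1", "x2", "log(x1)"]`: the dependent column `y` was excluded explicitly, while `id` and `g` were dropped automatically.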
Internal UDAs for simpler learning algorithms should be simple enough for external users to call directly in situations where a table output is not desired.
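To illustrate what a "simple enough" aggregate could look like, here is a Python sketch of the usual UDA structure (a transition function, a merge function for combining partial states from different segments, and a final function) applied to one-variable least-squares regression. It is an illustration of the pattern only. The names `OLSState`, `transition`, `merge`, `final`, and `simple_ols` are hypothetical and do not describe MADlib's actual C++/SQL aggregate definitions.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Tuple


@dataclass(frozen=True)
class OLSState:
    """Running sums needed for y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0


def transition(state: OLSState, x: float, y: float) -> OLSState:
    # Fold one row into the state.
    return OLSState(state.n + 1, state.sx + x, state.sy + y,
                    state.sxx + x * x, state.sxy + x * y)


def merge(a: OLSState, b: OLSState) -> OLSState:
    # Combine partial states computed on different segments.
    return OLSState(a.n + b.n, a.sx + b.sx, a.sy + b.sy,
                    a.sxx + b.sxx, a.sxy + b.sxy)


def final(state: OLSState) -> Tuple[float, float]:
    # Closed-form least-squares solution from the accumulated sums.
    denom = state.n * state.sxx - state.sx ** 2
    if state.n < 2 or denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (state.n * state.sxy - state.sx * state.sy) / denom
    intercept = (state.sy - slope * state.sx) / state.n
    return slope, intercept


def simple_ols(points: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    return final(reduce(lambda s, p: transition(s, *p), points, OLSState()))


pts = [(x, 2.0 * x + 1.0) for x in range(5)]
```

A caller who wants only the coefficients can use `simple_ols(pts)` directly, with no output table involved. The same `final(merge(...))` path also covers segmented execution.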
Training functions, even though they store the model in an output table, should also return a string with relevant information. Examples of informative string elements include the output table name, the time taken for training, …
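A small Python sketch of a training wrapper that returns such a string. The wrapper `train_with_report` and the `fit` callback it invokes are hypothetical. In this sketch `fit` is assumed to write the model table itself and to return any extra key/value pairs worth reporting. The wrapper measures wall-clock time around the fit and formats one `key: value` line per item.

```python
import time
from typing import Any, Callable, Dict


def train_with_report(
    output_table: str,
    fit: Callable[[str], Dict[str, Any]],
) -> str:
    """Hypothetical wrapper: run fit(), then return an informative summary."""
    start = time.perf_counter()
    extra = fit(output_table)  # assumed to write the model into output_table
    elapsed = time.perf_counter() - start
    parts = [f"output_table: {output_table}", f"time_taken: {elapsed:.3f} s"]
    parts += [f"{key}: {value}" for key, value in extra.items()]
    return "\n".join(parts)


msg = train_with_report("my_model", lambda table: {"num_rows_processed": 100})
```

Returning plain text keeps the model itself in the table while still giving an interactive user immediate feedback.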
Each training function name should end in ‘_train’. Further, each training function should create an output table and a summary table, and the formats of these tables should be standard across all learning algorithms.
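One way to enforce these conventions in a Python layer is sketched below. The decorator `train_function`, the `TRAIN_REGISTRY` dict, and the helper `standard_table_names` are all hypothetical. Naming the summary table `<output_table>_summary` is an assumed convention chosen for illustration. The page itself does not fix that name or the table formats.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry of all training entry points.
TRAIN_REGISTRY: Dict[str, Callable[..., str]] = {}


def train_function(fn: Callable[..., str]) -> Callable[..., str]:
    """Reject training entry points whose names do not end in '_train'."""
    name = fn.__name__
    if not name.endswith("_train"):
        raise ValueError(f"training function {name!r} must end in '_train'")
    TRAIN_REGISTRY[name] = fn
    return fn


def standard_table_names(output_table: str) -> Tuple[str, str]:
    # Assumed convention: summary table = output table name + '_summary'.
    return output_table, f"{output_table}_summary"


@train_function
def demo_train(output_table: str) -> str:
    model_tbl, summary_tbl = standard_table_names(output_table)
    return f"output_table: {model_tbl}, summary_table: {summary_tbl}"
```

Because the check runs at definition time, a misnamed training function fails as soon as the module is imported rather than when a user first calls it.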

...
