...

  • ID column(s): These should be present only if a ‘point_id’ parameter is provided.

  • Prediction for each tuple in ‘test_data_table’

  • ‘Confidence’, if required. The sign of the confidence should be such that higher values are better (e.g., a cost or distance would be reported as a negative value, while a probability would be positive).
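The output conventions above can be sketched in Python. This is a minimal, hypothetical illustration: `format_predictions` and the output column names `prediction` and `confidence` are assumptions, not an existing API, and the model is assumed to produce a distance (lower is better) for each tuple, which is negated so that a higher confidence is better.

```python
from typing import Optional, Sequence


def format_predictions(
    rows: Sequence[dict],
    predictions: Sequence[object],
    distances: Sequence[float],
    point_id: Optional[str] = None,
    with_confidence: bool = False,
) -> list:
    """Build one output row per input tuple (hypothetical helper).

    `distances` is assumed to be a cost-like score where lower is better.
    """
    if not (len(rows) == len(predictions) == len(distances)):
        raise ValueError("rows, predictions and distances must align")
    out = []
    for row, pred, dist in zip(rows, predictions, distances):
        rec = {}
        if point_id is not None:
            # ID column(s) appear only when a point_id parameter is given.
            rec[point_id] = row[point_id]
        rec["prediction"] = pred
        if with_confidence:
            # Negate the distance so that a higher confidence is better.
            rec["confidence"] = -dist
        out.append(rec)
    return out
```

For example, `format_predictions([{"id": 1}, {"id": 2}], ["a", "b"], [0.5, 2.0], point_id="id", with_confidence=True)` returns `[{"id": 1, "prediction": "a", "confidence": -0.5}, {"id": 2, "prediction": "b", "confidence": -2.0}]`, so the closer point gets the larger confidence. Without `point_id`, no ID column is emitted.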

Misc

...

Another question: should we drop the requirement that the data table passed to predict be in the same format as the training table?

  • Independent variables should be allowed to be SQL expressions (including ‘*’). Further, an ‘exclude’ parameter could be provided to remove features from the list that ‘*’ expands to. Columns used as ‘id’ or ‘grouping’ columns should be removed automatically. See ‘tree_train’ for examples.

  • Internal UDAs (user-defined aggregates) for simpler learning algorithms should be simple enough for external users to call directly in situations where table output is not desired.

  • Although training functions store the model in an output table, they should also return a string with relevant information. Examples of informative elements include the output table name, the time taken for training, …

  • Each training function name should end in ‘_train’. Further, each training function should create an output table and a summary table, and the formats of these tables should be standard across all learning algorithms.
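The conventions in the list above can be illustrated with a small Python sketch. Everything here is hypothetical: `resolve_features`, `example_train`, the dict-based `catalog` standing in for the database, the `<output_table>_summary` naming, and the summary columns are assumptions for illustration, not an existing API. The sketch shows ‘*’ expansion with `exclude`, automatic removal of id and grouping columns, creation of both an output table and a summary table, and an informative return string. The "model" is a placeholder (the mean of the dependent variable).

```python
import time
from typing import Optional, Sequence


def resolve_features(
    independent_varname: str,
    table_columns: Sequence[str],
    exclude: Sequence[str] = (),
    id_col: Optional[str] = None,
    grouping_cols: Sequence[str] = (),
) -> list[str]:
    """Expand '*' to the table's columns, dropping excluded, id and grouping columns.

    Any entry other than '*' is kept verbatim as a SQL expression. The naive
    comma split assumes expressions contain no commas (e.g. no f(a, b)).
    """
    dropped = set(exclude) | set(grouping_cols)
    if id_col is not None:
        dropped.add(id_col)
    features: list[str] = []
    for item in (s.strip() for s in independent_varname.split(",")):
        if item == "*":
            features.extend(c for c in table_columns if c not in dropped)
        elif item:
            features.append(item)
    return features


def example_train(
    catalog: dict,
    source_table: str,
    output_table: str,
    dependent_varname: str,
    independent_varname: str,
    exclude: Sequence[str] = (),
    id_col: Optional[str] = None,
    grouping_cols: Sequence[str] = (),
) -> str:
    """Hypothetical '_train' function: writes an output table and a summary
    table into `catalog` and returns an informative string."""
    start = time.perf_counter()
    src = catalog[source_table]
    # Grouping is not modeled here; grouping columns are only removed from the features.
    features = resolve_features(
        independent_varname, src["columns"], exclude, id_col, grouping_cols
    )
    ys = [row[dependent_varname] for row in src["rows"]]
    # Placeholder "model": the mean of the dependent variable.
    catalog[output_table] = {"columns": ["coef"], "rows": [{"coef": sum(ys) / len(ys)}]}
    summary_table = output_table + "_summary"
    catalog[summary_table] = {
        "columns": ["method", "source_table", "dependent_varname",
                    "independent_varname", "num_rows_processed"],
        "rows": [{
            "method": "example",
            "source_table": source_table,
            "dependent_varname": dependent_varname,
            "independent_varname": ",".join(features),
            "num_rows_processed": len(ys),
        }],
    }
    elapsed = time.perf_counter() - start
    return (f"output_table={output_table}, summary_table={summary_table}, "
            f"rows_processed={len(ys)}, training_time={elapsed:.3f}s")
```

With a source table whose columns are `id`, `g`, `x1`, `x2`, `y`, calling `example_train(catalog, "pts", "model", "y", "*", exclude=["y"], id_col="id", grouping_cols=["g"])` records `x1,x2` as the features in `model_summary` and returns a string naming both tables. Returning a string in addition to writing tables lets a SQL caller see the table names and the training time directly in the query result.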

...