...

Another question: should we remove the requirement that the data table for predict be in the same format as the training table?

Model Management

A proposed approach to model management and model versioning is given in this JIRA. It includes automatically saving models that have been run in the past, and adding more metadata to the summary table (e.g., time taken to train the model).
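As a rough illustration of the idea (not the design from the JIRA), the sketch below keeps every trained model together with summary metadata such as training time. The `ModelRegistry` class, its `train_and_record` method, and all field names are hypothetical, and an in-memory list stands in for what would be a database table.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for a model-versioning table."""
    rows: list = field(default_factory=list)

    def train_and_record(self, method, train_fn, source_table, output_table):
        # Assumption: the version is 1 plus the number of earlier runs
        # recorded under the same output table name.
        version = 1 + sum(1 for r in self.rows if r["output_table"] == output_table)

        # Time the training call so the duration can go into the summary row.
        start = time.perf_counter()
        model = train_fn(source_table)
        elapsed = time.perf_counter() - start

        row = {
            "method": method,
            "source_table": source_table,
            "output_table": output_table,
            "version": version,
            "training_seconds": elapsed,
            "model": model,
        }
        # Earlier rows are never overwritten, so past models stay available.
        self.rows.append(row)
        return row
```

Calling `train_and_record` twice with the same output table name yields two rows with versions 1 and 2, each carrying its own `training_seconds` value.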

Misc

  • Independent variables should be allowed to be SQL expressions (including ‘*’). Further, an ‘exclude’ parameter could be provided to remove features from a ‘*’ list. Columns that are used in ‘id’ or ‘grouping’ should automatically be removed. See the decision tree function ‘tree_train’ for an example.

  • Internal UDAs for simpler learning algorithms should be simple enough for external users to call directly in situations where a table output is not desired. Training functions, even though they store the model in an output table, should also return a string with relevant information. Examples of informative string elements include the output table name, time taken for training, …

  • Each train function name should end in ‘_train’. Further, each train function should create an output table and a summary table, and the formats of these tables should be standard across all learning algorithms.

  • Should prediction metrics be part of the training output?
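The ‘*’ and ‘exclude’ behavior proposed in the first item could be sketched as follows. This is a minimal illustration under stated assumptions, not the existing ‘tree_train’ logic: the function name `expand_features` and its parameter names are hypothetical, and the column list is passed in directly instead of being read from the database catalog.

```python
def expand_features(table_columns, independent_varname, exclude=(),
                    id_col=None, grouping_cols=()):
    """Resolve an independent-variable spec into an explicit feature list."""
    spec = independent_varname.strip()
    if spec != "*":
        # Not a wildcard: treat the spec as a comma-separated expression list.
        # Assumption: this naive split does not handle expressions that
        # themselves contain commas, such as coalesce(a, 0).
        return [e.strip() for e in spec.split(",") if e.strip()]

    # Wildcard: start from all columns, then drop the excluded columns and
    # the columns used as 'id' or 'grouping'.
    dropped = set(exclude) | set(grouping_cols)
    if id_col is not None:
        dropped.add(id_col)
    return [c for c in table_columns if c not in dropped]
```

For a table with columns `id`, `region`, `price`, `size`, `rooms`, calling `expand_features(cols, "*", exclude=["rooms"], id_col="id", grouping_cols=["region"])` leaves `price` and `size` as features, in the original column order.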

Output formats

In the first form, the model is a composite type and is opaque to the user. We can provide introspection functions to help users understand the model.

...