Input interface
Train
Algorithm | Type of function | Input table (parameter name) | Output table (parameter name) | Dependent variable (parameter name) | Independent variable (parameter name) | Optimizer params | Contains verbose | Notes |
linregr_train | Stored procedure | source_table | out_table | dependent_varname | independent_varname | | | |
logregr_train | Stored procedure | source_table | out_table | dependent_varname | independent_varname | The optimizer is a separate parameter that takes the values 'newton', 'cg' or 'igd' | | |
glm | Stored procedure | source_table | model_table | dependent_varname | independent_varname | max_iter=100, optimizer=irls, tolerance=1e-6 | | The optimizer parameter string is called 'optim_params' |
multinom | Stored procedure | source_table | model_table | dependent_varname | independent_varname | max_iter=100, optimizer=irls, tolerance=1e-6 | | The optimizer parameter string is called 'optim_params' |
ordinal | Stored procedure | source_table | model_table | dependent_varname | independent_varname | max_iter=100, optimizer=irls, tolerance=1e-6 | | |
elastic_net_train | Stored procedure | tbl_source | tbl_result | col_dep_var | col_ind_var | Two non-overlapping sets of parameters | | Has many other parameters, which makes the signature long; allows col_ind_var = '*', but the column-exclusion parameter comes at the end rather than immediately after col_ind_var; the 'optimizer' parameter is not part of the 'optimizer_params' list |
coxph_train | Stored procedure | source_table | output_table | dependent_variable | independent_variable | max_iter=100, optimizer=newton, tolerance=1e-8, array_agg_size=10000000, sample_size=1000000 | | Has an additional Cox-specific function, cox_zph; a couple of deprecated functions should be removed in the next major version |
svm_classification | Stored procedure | source_table | model_table | dependent_varname | independent_varname | Multiple parameters, including max_iter=100 and tolerance=1e-10 | Yes | Optimizer and regularization parameters are combined into 'params'; 'kernel_func' and 'kernel_params' could potentially be combined |
svm_regression | Stored procedure | source_table | model_table | dependent_varname | independent_varname | Multiple parameters, including max_iter=100 and tolerance=1e-10 | Yes | Optimizer and regularization parameters are combined into 'params'; 'kernel_func' and 'kernel_params' could potentially be combined |
svm_one_class | Stored procedure | source_table | model_table | | independent_varname | Multiple parameters, including max_iter=100 and tolerance=1e-10 | Yes | Optimizer and regularization parameters are combined into 'params'; 'kernel_func' and 'kernel_params' could potentially be combined |
tree_train | Stored procedure | training_table_name | output_table_name | dependent_variable | list_of_features | | Yes | Has an 'id_col_name' parameter before 'dependent_variable'; 'list_of_features_to_exclude' comes right after 'list_of_features'; many tree-tuning parameters are separate arguments (max_depth, min_split, min_bucket, num_splits, etc.); the verbose input is called 'verbosity'; additional functions include tree_display and tree_surr_display |
forest_train | Stored procedure | training_table_name | output_table_name | dependent_variable | list_of_features | | Yes | Has an 'id_col_name' parameter before 'dependent_variable'; 'list_of_features_to_exclude' comes right after 'list_of_features'; has several forest-tuning parameters (num_trees, num_random_features, importance, num_permutations); many tree-tuning parameters are separate arguments (max_depth, min_split, min_bucket, num_splits, etc.); a 'sample_ratio' parameter follows 'verbose'; additional functions include get_tree and get_tree_surr |
arima_train | Stored procedure | input_table | output_table | timestamp_column | timeseries_column | | | |
assoc_rules | Stored procedure | input_table | output_schema | | | | Yes | input_table and output_schema are not the first arguments; verbose is not the last argument |
kmeans_* | Stored procedure | rel_source | <composite type output> | | expr_point | | | Uses max_num_iterations instead of max_iter; there are multiple forms of the function, each of which returns the output as a composite type instead of storing results in a table; a related function, closest_column(m, x), has uninformative argument names |
simple_silhouette | Stored procedure | rel_source | <double output> | | expr_point | | | |
lda_train | Stored procedure | data_table | model_table + output_data_table | | | | | Related function: lda_get_perplexity(model_table, output_data_table) |
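Several training functions take their optimizer settings as a single comma-separated string of key=value pairs. For glm, multinom and ordinal, the defaults listed above are max_iter=100, optimizer=irls and tolerance=1e-6. The Python sketch below shows one way to layer such a string over those defaults. The names `parse_optim_params` and `GLM_DEFAULTS` are hypothetical, and rejecting unknown keys is an assumed design choice, not documented MADlib behavior.

```python
from typing import Dict, Union

Value = Union[int, float, str]

# Defaults listed in the Train table for glm, multinom and ordinal
# (hypothetical constant name; assumed to apply when a key is omitted).
GLM_DEFAULTS: Dict[str, Value] = {"max_iter": 100, "optimizer": "irls", "tolerance": 1e-6}


def _coerce(raw: str) -> Value:
    """Return raw as an int or float when it parses as one, else as text."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_optim_params(text: str, defaults: Dict[str, Value]) -> Dict[str, Value]:
    """Parse a 'key=value, key=value' string and layer it over the defaults.

    Hypothetical helper for illustration; MADlib's own parsing may differ.
    """
    params = dict(defaults)  # copy so the defaults are never mutated
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        key = key.strip()
        if key not in defaults:
            raise ValueError(f"unknown optimizer parameter {key!r}")
        params[key] = _coerce(value.strip())
    return params


if __name__ == "__main__":
    # Override two of the three defaults; 'optimizer' keeps its default.
    print(parse_optim_params("max_iter=50,tolerance=1e-8", GLM_DEFAULTS))
    # {'max_iter': 50, 'optimizer': 'irls', 'tolerance': 1e-08}
```

Rejecting unknown keys catches a typo such as 'maxiter=50' immediately. A lenient parser would ignore it silently, and the function would train with the default iteration limit.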
Predict
Algorithm | Type of function | Input table (parameter name) | Output table (parameter name) | Dependent variable (parameter name) | Independent variable (parameter name) | Optimizer params | Contains verbose | Notes |
linregr_predict | UDF | coef | | | col_ind | | | |
logregr_predict | UDF | coefficients | | | ind_var | | | |
glm_predict | UDF | coef | | | col_ind_var | | | Additional 'link' parameter, which should match the link function used in training |
multinom_predict | Stored procedure | model_table + predict_table_input | output_table | | | | Yes | Response or probability output is selected by 'predict_type'; 'id_column' is the final, optional parameter |
ordinal_predict | Stored procedure | model_table + predict_table_input | output_table | | | | Yes | Response or probability output is selected by 'predict_type'; there is no 'id_column' parameter |
coxph_predict | Stored procedure | model_table + source_table | output_table | | | | | 'id_col_name' is mandatory and is placed before 'output_table'; response or probability output is selected by 'pred_type' |
svm_predict | Stored procedure | model_table + new_data_table | output_table | | | | | 'id_col_name' is mandatory and is placed before 'output_table'; there is no prediction-type input, and the output contains both 'prediction' and 'distance'/'probability' |
tree_predict | Stored procedure | tree_model + new_data_table | output_table | | | | | Response or probability output is selected by 'type' |
forest_predict | Stored procedure | random_forest_model + new_data_table | output_table | | | | | Response or probability output is selected by 'type' |
arima_forecast | Stored procedure | model_table | output_table | | | | | Additional 'steps_ahead' argument; named 'forecast' instead of 'predict' because the two terms have different meanings for ARIMA |
lda_predict | Stored procedure | data_table + model_table | output_table | | | | | |
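The Predict notes show that the argument choosing between response and probability output has three different names: 'predict_type', 'pred_type' and 'type'. svm_predict has no such argument. The Python sketch below records that mapping and translates one unified keyword into each function's own parameter name. `PREDICT_TYPE_PARAM` and `to_native_kwargs` are hypothetical names for a compatibility shim, not part of MADlib. Values such as 'prob' and 'response' are assumed examples.

```python
from typing import Dict, Optional

# Name of the argument that selects response vs. probability output, taken
# from the Predict table; None means the function has no such argument.
PREDICT_TYPE_PARAM: Dict[str, Optional[str]] = {
    "multinom_predict": "predict_type",
    "ordinal_predict": "predict_type",
    "coxph_predict": "pred_type",
    "tree_predict": "type",
    "forest_predict": "type",
    "svm_predict": None,
}


def to_native_kwargs(func: str, predict_type: str) -> Dict[str, str]:
    """Map a unified 'predict_type' value onto the function's own argument name.

    Hypothetical compatibility shim for illustration; not part of MADlib.
    """
    if func not in PREDICT_TYPE_PARAM:
        raise KeyError(f"no prediction-type mapping recorded for {func!r}")
    native = PREDICT_TYPE_PARAM[func]
    if native is None:
        # svm_predict has no selector: its output carries both 'prediction'
        # and 'distance'/'probability', so no argument is passed.
        return {}
    return {native: predict_type}


if __name__ == "__main__":
    print(to_native_kwargs("tree_predict", "prob"))  # {'type': 'prob'}
    print(to_native_kwargs("multinom_predict", "response"))  # {'predict_type': 'response'}
```

Keeping the mapping in one dictionary makes the inconsistency explicit. A future rename would then be a one-line change in the shim.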
Output table
Algorithm | Output table columns | Summary table columns |
linregr_train | <...>, coef, r2, std_err, t_stats, p_values, condition_no, bp_stats, bp_p_value, num_rows_processed, num_missing_rows_skipped | source_table, out_table, dependent_varname, independent_varname, num_rows_processed, num_missing_rows_skipped |
logregr_train | <...>, coef, log_likelihood, std_err, z_stats, p_values, odds_ratios, condition_no, num_iterations, num_rows_processed, num_missing_rows_skipped | source_table, out_table, dependent_varname, independent_varname, optimizer_params, num_all_groups, num_failed_groups, num_rows_processed, num_missing_rows_skipped |
glm | <...>, coef, log_likelihood, std_err, z_stats or t_stats, p_values, dispersion, num_rows_processed, num_rows_skipped, num_iterations | method, source_table, model_table, dependent_varname, independent_varname, family_params, grouping_col, optimizer_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped |
multinom | <...>, coef, log_likelihood, std_err, z_stats or t_stats, p_values, dispersion, num_rows_processed, num_rows_skipped, num_iterations | method, source_table, model_table, dependent_varname, independent_varname, family_params, grouping_col, optimizer_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped |
ordinal | <...>, coef_threshold, std_err_threshold, z_stats_threshold, p_values_threshold, log_likelihood, coef_feature, std_err_feature, z_stats_feature, p_values_feature, num_rows_processed, num_rows_skipped, num_iterations | method, source_table, model_table, dependent_varname, independent_varname, family_params, grouping_col, optimizer_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped |
elastic_net_train | regress_family, features, features_selected, coef_nonzero, coef_all, intercept, log_likelihood, standardize, iteration_run | method, source_table, out_table, dependent_varname, independent_varname, family, alpha, lambda_value, grouping_col, num_all_groups, num_failed_groups |
coxph_train | coef, loglikelihood, std_err, stats, p_values, hessian, num_iterations | source_table, dependent_variable, independent_variable, right_censoring_status, strata, num_processed, num_missing_rows_skipped |
svm_classification | coef, grouping_key, num_rows_processed, num_rows_skipped, num_iterations, loss, norm_of_gradient, __dep_var_mapping | method, version_number, source_table, model_table, dependent_varname, independent_varname, kernel_func, kernel_parameters, grouping_col, optim_params, reg_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped |
svm_regression | (same as svm_classification) | (same as svm_classification) |
svm_one_class | (same as svm_classification) | (same as svm_classification) |
tree_train | <...>, tree, cat_levels_in_text, cat_n_levels, tree_depth, pruning_cp | method, is_classification, source_table, model_table, id_col_name, dependent_varname, independent_varname, cat_features, con_features, grouping_col, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped, dependent_var_levels, dependent_var_type, input_cp, independent_var_types |
forest_train | gid, sample_id, tree | method, is_classification, source_table, model_table, id_col_name, dependent_varname, independent_varname, cat_features, con_features, grouping_col, num_trees, num_random_features, max_tree_depth, min_split, min_bucket, num_splits, verbose, importance, num_permutations, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped, dependent_var_levels, dependent_var_type |
arima_train | mean, mean_std_error, ar_params, ar_std_errors, ma_params, ma_std_errors | input_table, timestamp_col, timeseries_col, non_seasonal_orders, include_mean, residual_variance, log_likelihood, iter_num, exec_time |
assoc_rules | ruleid, pre, post, count, support, confidence, lift, conviction | |
kmeans_* | (no output tables) | |
simple_silhouette | (no output tables) | |
lda_train | model_table: voc_size, topic_num, alpha, beta, model | output_data_table: docid, wordcount, words, counts, topic_count, topic_assignment |
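The summary-table column lists can be compared mechanically. The Python sketch below copies the lists for five of the algorithms above into a dictionary. It then computes the columns every one of them shares and the row-count columns each one reports. The helper names `column_set`, `shared_columns` and `row_count_columns` are hypothetical, and treating any column ending in `_processed` or `_skipped` as a row count is an assumed heuristic.

```python
from typing import Dict, Set

# Summary-table columns copied from the Output table (subset of algorithms).
SUMMARY_COLUMNS: Dict[str, str] = {
    "linregr_train": "source_table, out_table, dependent_varname, independent_varname, "
                     "num_rows_processed, num_missing_rows_skipped",
    "logregr_train": "source_table, out_table, dependent_varname, independent_varname, "
                     "optimizer_params, num_all_groups, num_failed_groups, "
                     "num_rows_processed, num_missing_rows_skipped",
    "glm": "method, source_table, model_table, dependent_varname, independent_varname, "
           "family_params, grouping_col, optimizer_params, num_all_groups, "
           "num_failed_groups, total_rows_processed, total_rows_skipped",
    "coxph_train": "source_table, dependent_variable, independent_variable, "
                   "right_censoring_status, strata, num_processed, num_missing_rows_skipped",
    "svm_classification": "method, version_number, source_table, model_table, "
                          "dependent_varname, independent_varname, kernel_func, "
                          "kernel_parameters, grouping_col, optim_params, reg_params, "
                          "num_all_groups, num_failed_groups, total_rows_processed, "
                          "total_rows_skipped",
}


def column_set(spec: str) -> Set[str]:
    """Split a comma-separated column list into a set of names."""
    return {name.strip() for name in spec.split(",") if name.strip()}


def shared_columns(specs: Dict[str, str]) -> Set[str]:
    """Columns that appear in every listed summary table."""
    return set.intersection(*(column_set(s) for s in specs.values()))


def row_count_columns(specs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Per algorithm, the columns that report processed or skipped row counts."""
    return {
        alg: {c for c in column_set(s) if c.endswith(("_processed", "_skipped"))}
        for alg, s in specs.items()
    }


if __name__ == "__main__":
    print(sorted(shared_columns(SUMMARY_COLUMNS)))  # ['source_table']
    for alg, cols in row_count_columns(SUMMARY_COLUMNS).items():
        print(alg, sorted(cols))
```

With these five algorithms only source_table is common, because coxph_train uses dependent_variable and independent_variable instead of the *_varname forms. The row counts also appear under three naming schemes: num_rows_*, total_rows_* and num_processed.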