Input interface

Train

 

| Algorithm | Type of function | Input (table name) | Output (table name) | Dependent variable | Independent variable | Optimizer params | Contains verbose | Notes |
|---|---|---|---|---|---|---|---|---|
| linregr_train | Stored procedure | source_table | out_table | dependent_varname | independent_varname | | | |
| logregr_train | Stored procedure | source_table | out_table | dependent_varname | independent_varname | | | Optimizer is a separate param that takes the values 'newton', 'cg', 'igd' |
| glm | Stored procedure | source_table | model_table | dependent_varname | independent_varname | max_iter=100, optimizer=irls, tolerance=1e-6 | | Called 'optim_params' |
| multinom | Stored procedure | source_table | model_table | dependent_varname | independent_varname | max_iter=100, optimizer=irls, tolerance=1e-6 | | Called 'optim_params' |
| ordinal | Stored procedure | source_table | model_table | dependent_varname | independent_varname | max_iter=100, optimizer=irls, tolerance=1e-6 | | |
| elastic_net_train | Stored procedure | tbl_source | tbl_result | col_dep_var | col_ind_var | Two sets of parameters that have no overlap | | Contains multiple other parameters, making it a long function; allows 'col_ind_var' = *, but the excluded-columns parameter is at the end (not immediately after); 'optimizer' param is not part of the 'optimizer_params' list |
| coxph_train | Stored procedure | source_table | output_table | dependent_variable | independent_variable | max_iter=100, optimizer=newton, tolerance=1e-8, array_agg_size=10000000, sample_size=1000000 | | Also has another Cox-specific function: 'cox_zph'; there are a couple of deprecated functions that should be removed in the next major version |
| svm_classification | Stored procedure | source_table | model_table | dependent_varname | independent_varname | Multiple parameters including max_iter=100, tolerance=1e-10 | Yes | Optimizer params and regularization are combined into 'params'; 'kernel_func' and 'kernel_params' can potentially be combined |
| svm_regression | Stored procedure | source_table | model_table | dependent_varname | independent_varname | Multiple parameters including max_iter=100, tolerance=1e-10 | Yes | Optimizer params and regularization are combined into 'params'; 'kernel_func' and 'kernel_params' can potentially be combined |
| svm_one_class | Stored procedure | source_table | model_table | | independent_varname | Multiple parameters including max_iter=100, tolerance=1e-10 | Yes | Optimizer params and regularization are combined into 'params'; 'kernel_func' and 'kernel_params' can potentially be combined |
| tree_train | Stored procedure | training_table_name | output_table_name | dependent_variable | list_of_features | | Yes | Contains an 'id_col_name' before the 'dependent_variable'; 'list_of_features_to_exclude' comes right after 'list_of_features'; contains many tree tuning parameters separated out: max_depth, min_split, min_bucket, num_splits, etc.; verbose input is called 'verbosity'; additional functions include tree_display and tree_surr_display |
| forest_train | Stored procedure | training_table_name | output_table_name | dependent_variable | list_of_features | | Yes | Contains an 'id_col_name' before the 'dependent_variable'; 'list_of_features_to_exclude' comes right after 'list_of_features'; contains multiple forest tuning parameters: num_trees, num_random_features, importance, num_permutations; contains many tree tuning parameters separated out: max_depth, min_split, min_bucket, num_splits, etc.; there is a 'sample_ratio' parameter after 'verbose'; additional functions include get_tree and get_tree_surr |
| arima_train | Stored procedure | input_table | output_table | timestamp_column | timeseries_column | | | |
| assoc_rules | Stored procedure | input_table | output_schema | | | | | The input_table and output_schema are not the first arguments; verbose is not the last argument |
| kmeans_* | Stored procedure | rel_source | <composite type output> | | expr_point | | | Uses max_num_iterations instead of max_iter; there are multiple forms of the function, each returning its output as a composite type instead of storing results in a table; other related function: closest_column(m, x), with meaningless argument names |
| simple_silhouette | Stored procedure | rel_source | <double output> | | expr_point | | | |
| lda_train | Stored procedure | data_table | model_table + output_data_table | | | | | lda_get_perplexity(model_table, output_data_table) |
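
The naming differences in the Train table can be made concrete with a small lookup. The following is only a sketch of how a uniform wrapper might translate one set of canonical role names into each function's own parameter names. The role names (`input`, `output`, `dependent`, `independent`), the helper `to_native_names`, and the example values `houses` and `price` are hypothetical and introduced here for illustration; the parameter names themselves are copied from the table for a handful of its rows.

```python
# Canonical role -> parameter name, copied from a few rows of the Train table.
# The role names on the left are hypothetical, chosen for this sketch only.
TRAIN_PARAM_NAMES = {
    "linregr_train": {
        "input": "source_table",
        "output": "out_table",
        "dependent": "dependent_varname",
        "independent": "independent_varname",
    },
    "glm": {
        "input": "source_table",
        "output": "model_table",
        "dependent": "dependent_varname",
        "independent": "independent_varname",
    },
    "elastic_net_train": {
        "input": "tbl_source",
        "output": "tbl_result",
        "dependent": "col_dep_var",
        "independent": "col_ind_var",
    },
    "coxph_train": {
        "input": "source_table",
        "output": "output_table",
        "dependent": "dependent_variable",
        "independent": "independent_variable",
    },
    "tree_train": {
        "input": "training_table_name",
        "output": "output_table_name",
        "dependent": "dependent_variable",
        "independent": "list_of_features",
    },
}


def to_native_names(algorithm, **canonical_args):
    """Rename canonical role arguments to the names `algorithm` uses."""
    names = TRAIN_PARAM_NAMES[algorithm]
    unknown = set(canonical_args) - set(names)
    if unknown:
        raise KeyError(f"{algorithm} has no parameter for: {sorted(unknown)}")
    return {names[role]: value for role, value in canonical_args.items()}


# 'houses' and 'price' are made-up example values.
print(to_native_names("elastic_net_train", input="houses", dependent="price"))
# → {'tbl_source': 'houses', 'col_dep_var': 'price'}
```

The lookup only covers names; it does not capture argument order, which the Notes column shows also differs between functions (for example, 'id_col_name' preceding 'dependent_variable' in tree_train).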


Predict

| Algorithm | Type of function | Input (table name) | Output (table name) | Dependent variable | Independent variable | Optimizer params | Contains verbose | Notes |
|---|---|---|---|---|---|---|---|---|
| linregr_predict | UDF | coef | | | col_ind | | | |
| logregr_predict | UDF | coefficients | | | ind_var | | | |
| glm_predict | UDF | coef | | | col_ind_var | | | Additional param 'link', which is supposed to match the one used in training |
| multinom_predict | Stored procedure | model_table + predict_table_input | output_table | | | | Yes | Response or probability determined by 'predict_type'; contains 'id_column' as final optional param |
| ordinal_predict | Stored procedure | model_table + predict_table_input | output_table | | | | Yes | Response or probability determined by 'predict_type'; no 'id_column' in this one |
| coxph_predict | Stored procedure | model_table + source_table | output_table | | | | | 'id_col_name' is mandatory and is placed before 'output_table'; response or probability determined by 'pred_type' |
| svm_predict | Stored procedure | model_table + new_data_table | output_table | | | | | 'id_col_name' is mandatory and is placed before 'output_table'; no predict type input, both 'prediction' and 'distance'/'probability' are provided in the output |
| tree_predict | Stored procedure | tree_model + new_data_table | output_table | | | | | Response or prob is determined by 'type' |
| forest_predict | Stored procedure | random_forest_model + new_data_table | output_table | | | | | Response or prob is determined by 'type' |
| arima_forecast | Stored procedure | model_table | output_table | | | | | Additional argument 'steps_ahead'; called 'forecast' instead of 'predict' since they have different meanings in ARIMA |
| lda_predict | Stored procedure | data_table + model_table | output_table | | | | | |
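
The Notes column of the Predict table shows that the argument choosing between response and probability output goes by three different names, and that svm_predict has no such argument at all. As a rough illustration only, the lookup below records those names and builds the matching keyword argument. The helper `predict_type_argument` and the example value `"response"` are hypothetical; the table does not list the accepted values for any of these parameters.

```python
# Name of the argument that selects response vs. probability output,
# as given in the Notes column of the Predict table. None means the
# table says the function takes no predict-type input.
PREDICT_TYPE_PARAM = {
    "multinom_predict": "predict_type",
    "ordinal_predict": "predict_type",
    "coxph_predict": "pred_type",
    "tree_predict": "type",
    "forest_predict": "type",
    "svm_predict": None,  # output carries both prediction and distance/probability
}


def predict_type_argument(algorithm, value):
    """Return {param_name: value} for `algorithm`, or {} if it has no such param."""
    param = PREDICT_TYPE_PARAM[algorithm]
    return {} if param is None else {param: value}


# "response" is a placeholder value used for illustration.
print(predict_type_argument("coxph_predict", "response"))  # → {'pred_type': 'response'}
print(predict_type_argument("svm_predict", "response"))    # → {}
```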

 

Output table

| Algorithm | Output table | Summary table |
|---|---|---|
| linregr_train | <...>, coef, r2, std_err, t_stats, p_values, condition_no, bp_stats, bp_p_value, num_rows_processed, num_missing_rows_skipped | source_table, out_table, dependent_varname, independent_varname, num_rows_processed, num_missing_rows_skipped |
| logregr_train | <...>, coef, log_likelihood, std_err, z_stats, p_values, odds_ratios, condition_no, num_iterations, num_rows_processed, num_missing_rows_skipped | source_table, out_table, dependent_varname, independent_varname, optimizer_params, num_all_groups, num_failed_groups, num_rows_processed, num_missing_rows_skipped |
| glm | <...>, coef, log_likelihood, std_err, z_stats or t_stats, p_values, dispersion, num_rows_processed, num_rows_skipped, num_iterations | method, source_table, model_table, dependent_varname, independent_varname, family_params, grouping_col, optimizer_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped |
| multinom | <...>, coef, log_likelihood, std_err, z_stats or t_stats, p_values, dispersion, num_rows_processed, num_rows_skipped, num_iterations | method, source_table, model_table, dependent_varname, independent_varname, family_params, grouping_col, optimizer_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped |
| ordinal | <...>, coef_threshold, std_err_threshold, z_stats_threshold, p_values_threshold, log_likelihood, coef_feature, std_err_feature, z_stats_feature, p_values_feature, num_rows_processed, num_rows_skipped, num_iterations | method, source_table, model_table, dependent_varname, independent_varname, family_params, grouping_col, optimizer_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped |
| elastic_net_train | regress_family, features, features_selected, coef_nonzero, coef_all, intercept, log_likelihood, standardize, iteration_run | method, source_table, out_table, dependent_varname, independent_varname, family, alpha, lambda_value, grouping_col, num_all_groups, num_failed_groups |
| coxph_train | coef, loglikelihood, std_err, stats, p_values, hessian, num_iterations | source_table, dependent_variable, independent_variable, right_censoring_status, strata, num_processed, num_missing_rows_skipped |
| svm_classification | coef, grouping_key, num_rows_processed, num_rows_skipped, num_iterations, loss, norm_of_gradient, __dep_var_mapping | method, version_number, source_table, model_table, dependent_varname, independent_varname, kernel_func, kernel_parameters, grouping_col, optim_params, reg_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped |
| svm_regression | (same as above) | |
| svm_one_class | (same as above) | |
| tree_train | <...>, tree, cat_levels_in_text, cat_n_levels, tree_depth, pruning_cp | method, is_classification, source_table, model_table, id_col_name, dependent_varname, independent_varname, cat_features, con_features, grouping_col, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped, dependent_var_levels, dependent_var_type, input_cp, independent_var_types |
| forest_train | gid, sample_id, tree | method, is_classification, source_table, model_table, id_col_name, dependent_varname, independent_varname, cat_features, con_features, grouping_col, num_trees, num_random_features, max_tree_depth, min_split, min_bucket, num_splits, verbose, importance, num_permutations, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped, dependent_var_levels, dependent_var_type |
| arima_train | mean, mean_std_error, ar_params, ar_std_errors, ma_params, ma_std_errors | input_table, timestamp_col, timeseries_col, non_seasonal_orders, include_mean, residual_variance, log_likelihood, iter_num, exec_time |
| assoc_rules | ruleid, pre, post, count, support, confidence, lift, conviction | |
| kmeans_* | (no output tables) | |
| simple_silhouette | (no output tables) | |
| lda_train | voc_size, topic_num, alpha, beta, model | docid, wordcount, words, counts, topic_count, topic_assignment |
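
One way to read the Output table is to ask which output columns a caller could rely on across algorithms. The sketch below does this for three of the regression rows, using the column lists exactly as they appear in the table. It leaves out the leading `<...>` entry and the conditional "z_stats or t_stats" entry for glm, since neither names a single fixed column. The variable names are hypothetical.

```python
# Output-table columns copied from the Output table for three regression rows.
# '<...>' and glm's 'z_stats or t_stats' are omitted (not single fixed names).
OUTPUT_COLUMNS = {
    "linregr_train": {
        "coef", "r2", "std_err", "t_stats", "p_values", "condition_no",
        "bp_stats", "bp_p_value", "num_rows_processed",
        "num_missing_rows_skipped",
    },
    "logregr_train": {
        "coef", "log_likelihood", "std_err", "z_stats", "p_values",
        "odds_ratios", "condition_no", "num_iterations",
        "num_rows_processed", "num_missing_rows_skipped",
    },
    "glm": {
        "coef", "log_likelihood", "std_err", "p_values", "dispersion",
        "num_rows_processed", "num_rows_skipped", "num_iterations",
    },
}

# Columns present in every one of the three output tables.
shared = set.intersection(*OUTPUT_COLUMNS.values())
print(sorted(shared))
# → ['coef', 'num_rows_processed', 'p_values', 'std_err']

# The skipped-row counter is named differently depending on the algorithm.
skipped_names = {
    algorithm: sorted(c for c in columns if c.endswith("skipped"))
    for algorithm, columns in OUTPUT_COLUMNS.items()
}
print(skipped_names["linregr_train"], skipped_names["glm"])
# → ['num_missing_rows_skipped'] ['num_rows_skipped']
```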

