Input interface

Train

Algorithm	Type of function	Input (table name)	Output (table name)	Dependent variable	Independent variable	Optimizer params	Contains verbose	Notes
linregr_train	Stored procedure	source_table	out_table	dependent_varname	independent_varname
logregr_train	Stored procedure	source_table	out_table	dependent_varname	independent_varname			Optimizer is a separate param that takes the values 'newton', 'cg', 'igd'
glm	Stored procedure	source_table	model_table	dependent_varname	independent_varname	max_iter=100, optimizer=irls, tolerance=1e-6		Parameter is called 'optim_params'
multinom	Stored procedure	source_table	model_table	dependent_varname	independent_varname	max_iter=100, optimizer=irls, tolerance=1e-6		Parameter is called 'optim_params'
ordinal	Stored procedure	source_table	model_table	dependent_varname	independent_varname	max_iter=100, optimizer=irls, tolerance=1e-6
elastic_net_train	Stored procedure	tbl_source	tbl_result	col_dep_var	col_ind_var	Two sets of parameters that have no overlap		- Contains multiple other parameters, making it a long function - Allows 'col_ind_var' = '*', but the excluded-columns parameter is at the end (not immediately after) - 'optimizer' param is not a part of the 'optimizer_params' list
coxph_train	Stored procedure	source_table	output_table	dependent_variable	independent_variable	max_iter=100, optimizer=newton, tolerance=1e-8, array_agg_size=10000000, sample_size=1000000		- Also has another Cox-specific function: 'cox_zph' - There are a couple of deprecated functions that should be removed in the next major version
svm_classification	Stored procedure	source_table	model_table	dependent_varname	independent_varname	Multiple parameters including max_iter=100, tolerance=1e-10	Yes	- Optimizer params and regularization are combined into 'params' - 'kernel_func' and 'kernel_params' can potentially be combined
svm_regression	Stored procedure	source_table	model_table	dependent_varname	independent_varname	Multiple parameters including max_iter=100, tolerance=1e-10	Yes	- Optimizer params and regularization are combined into 'params' - 'kernel_func' and 'kernel_params' can potentially be combined
svm_one_class	Stored procedure	source_table	model_table		independent_varname	Multiple parameters including max_iter=100, tolerance=1e-10	Yes	- Optimizer params and regularization are combined into 'params' - 'kernel_func' and 'kernel_params' can potentially be combined
tree_train	Stored procedure	training_table_name	output_table_name	dependent_variable	list_of_features		Yes	- Contains an 'id_col_name' before the 'dependent_variable' - 'list_of_features_to_exclude' right after 'list_of_features' - Contains many tree tuning parameters separated out: max_depth, min_split, min_bucket, num_splits etc - Verbose input is called 'verbosity' - Additional functions include tree_display and tree_surr_display
forest_train	Stored procedure	training_table_name	output_table_name	dependent_variable	list_of_features		Yes	- Contains an 'id_col_name' before the 'dependent_variable' - 'list_of_features_to_exclude' right after 'list_of_features' - Contains multiple forest tuning parameters: num_trees, num_random_features, importance, num_permutations - Contains many tree tuning parameters separated out: max_depth, min_split, min_bucket, num_splits etc - There is a 'sample_ratio' parameter after 'verbose' - Additional functions include get_tree and get_tree_surr
arima_train	Stored procedure	input_table	output_table	timestamp_column	timeseries_column
assoc_rules	Stored procedure	input_table	output_schema					- The input_table and output_schema are not the first arguments - verbose is not the last argument
kmeans_*	Stored procedure	rel_source	<composite type output>		expr_point			- Uses max_num_iterations instead of max_iter - There are multiple forms of the function, each one returning the output as a composite type instead of storing results in a table - Other related function: closest_column(m, x), which has meaningless argument names
simple_silhouette	Stored procedure	rel_source	<double output>		expr_point
lda_train	Stored procedure	data_table	model_table + output_data_table					- Related function: lda_get_perplexity(model_table, output_data_table)
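
The naming differences in the Train table can be made concrete with a small lookup. The following is a minimal sketch only: the canonical names in `CANONICAL` and the helper `to_native` are hypothetical and not part of MADlib, while the per-algorithm argument names are copied from a few rows of the table above.

```python
# Hypothetical canonical argument names (an assumption, not an agreed standard).
CANONICAL = ("source_table", "model_table", "dependent_varname", "independent_varname")

# Argument names as listed in the Train table, in the same order as CANONICAL.
TRAIN_ARG_NAMES = {
    "linregr_train": ("source_table", "out_table", "dependent_varname", "independent_varname"),
    "glm": ("source_table", "model_table", "dependent_varname", "independent_varname"),
    "elastic_net_train": ("tbl_source", "tbl_result", "col_dep_var", "col_ind_var"),
    "coxph_train": ("source_table", "output_table", "dependent_variable", "independent_variable"),
    "tree_train": ("training_table_name", "output_table_name", "dependent_variable", "list_of_features"),
}


def to_native(algorithm, **canonical_args):
    """Translate canonical argument names into the names one algorithm uses today."""
    native = dict(zip(CANONICAL, TRAIN_ARG_NAMES[algorithm]))
    unknown = set(canonical_args) - set(native)
    if unknown:
        raise KeyError("not a canonical argument: " + ", ".join(sorted(unknown)))
    return {native[name]: value for name, value in canonical_args.items()}


# Example with made-up table names.
print(to_native("elastic_net_train", source_table="houses", model_table="houses_en"))
```

Five functions already need four different names for the output table alone, which is the kind of divergence the standardization is meant to remove.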

Predict

Algorithm	Type of function	Input (table name)	Output (table name)	Dependent variable	Independent variable	Optimizer params	Contains verbose	Notes
linregr_predict	UDF	coef			col_ind
logregr_predict	UDF	coefficients			ind_var
glm_predict	UDF	coef			col_ind_var			Additional 'link' param, which should match the one used in training
multinom_predict	Stored procedure	model_table + predict_table_input	output_table				Yes	- Response or probability determined by 'predict_type' - Contains 'id_column' as final optional param
ordinal_predict	Stored procedure	model_table + predict_table_input	output_table				Yes	- Response or probability determined by 'predict_type' - No 'id_column' in this one
coxph_predict	Stored procedure	model_table + source_table	output_table					- 'id_col_name' is mandatory and is placed before 'output_table' - Response or probability determined by 'pred_type'
svm_predict	Stored procedure	model_table + new_data_table	output_table					- 'id_col_name' is mandatory and is placed before 'output_table' - No predict type input; both 'prediction' and 'distance'/'probability' are provided in the output
tree_predict	Stored procedure	tree_model + new_data_table	output_table					- Response or prob is determined by 'type'
forest_predict	Stored procedure	random_forest_model + new_data_table	output_table					- Response or prob is determined by 'type'
arima_forecast	Stored procedure	model_table	output_table					- Additional argument 'steps_ahead' - Called 'forecast' instead of 'predict' since they have different meanings in ARIMA
lda_predict	Stored procedure	data_table + model_table	output_table
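
The Notes column above shows that the parameter selecting response versus probability output has three different names. A minimal sketch of that divergence follows; the helper `predict_type_kwarg` is hypothetical, and the value 'response' in the example is only a placeholder, since the table does not list the accepted values.

```python
# Name of the response/probability selector per predict function,
# taken from the Notes column of the Predict table.
PREDICT_TYPE_PARAM = {
    "multinom_predict": "predict_type",
    "ordinal_predict": "predict_type",
    "coxph_predict": "pred_type",
    "tree_predict": "type",
    "forest_predict": "type",
}


def predict_type_kwarg(function_name, value):
    """Return the selector under the name the given function expects.

    Functions without such a parameter (for example svm_predict, which
    always returns both kinds of output) get an empty dict.
    """
    param = PREDICT_TYPE_PARAM.get(function_name)
    if param is None:
        return {}
    return {param: value}


# Placeholder value; the accepted strings are not given in the table.
print(predict_type_kwarg("coxph_predict", "response"))
print(predict_type_kwarg("svm_predict", "response"))
```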

Output table

Algorithm	Output table	Summary table
linregr_train	<...>, coef, r2, std_err, t_stats, p_values, condition_no, bp_stats, bp_p_value, num_rows_processed, num_missing_rows_skipped	source_table, out_table, dependent_varname, independent_varname, num_rows_processed, num_missing_rows_skipped
logregr_train	<...>, coef, log_likelihood, std_err, z_stats, p_values, odds_ratios, condition_no, num_iterations, num_rows_processed, num_missing_rows_skipped	source_table, out_table, dependent_varname, independent_varname, optimizer_params, num_all_groups, num_failed_groups, num_rows_processed, num_missing_rows_skipped
glm	<...>, coef, log_likelihood, std_err, z_stats or t_stats, p_values, dispersion, num_rows_processed, num_rows_skipped, num_iterations	method, source_table, model_table, dependent_varname, independent_varname, family_params, grouping_col, optimizer_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped
multinom	<...>, coef, log_likelihood, std_err, z_stats or t_stats, p_values, dispersion, num_rows_processed, num_rows_skipped, num_iterations	method, source_table, model_table, dependent_varname, independent_varname, family_params, grouping_col, optimizer_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped
ordinal	<...>, coef_threshold, std_err_threshold, z_stats_threshold, p_values_threshold, log_likelihood, coef_feature, std_err_feature, z_stats_feature, p_values_feature, num_rows_processed, num_rows_skipped, num_iterations	method, source_table, model_table, dependent_varname, independent_varname, family_params, grouping_col, optimizer_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped
elastic_net_train	regress_family, features, features_selected, coef_nonzero, coef_all, intercept, log_likelihood, standardize, iteration_run	method, source_table, out_table, dependent_varname, independent_varname, family, alpha, lambda_value, grouping_col, num_all_groups, num_failed_groups
coxph_train	coef, loglikelihood, std_err, stats, p_values, hessian, num_iterations	source_table, dependent_variable, independent_variable, right_censoring_status, strata, num_processed, num_missing_rows_skipped
svm_classification	coef, grouping_key, num_rows_processed, num_rows_skipped, num_iterations, loss, norm_of_gradient, __dep_var_mapping	method, version_number, source_table, model_table, dependent_varname, independent_varname, kernel_func, kernel_parameters, grouping_col, optim_params, reg_params, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped
svm_regression	(same as above)
svm_one_class	(same as above)
tree_train	<...>, tree, cat_levels_in_text, cat_n_levels, tree_depth, pruning_cp	method, is_classification, source_table, model_table, id_col_name, dependent_varname, independent_varname, cat_features, con_features, grouping_col, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped, dependent_var_levels, dependent_var_type, input_cp, independent_var_types
forest_train	gid, sample_id, tree	method, is_classification, source_table, model_table, id_col_name, dependent_varname, independent_varname, cat_features, con_features, grouping_col, num_trees, num_random_features, max_tree_depth, min_split, min_bucket, num_splits, verbose, importance, num_permutations, num_all_groups, num_failed_groups, total_rows_processed, total_rows_skipped, dependent_var_levels, dependent_var_type
arima_train	mean, mean_std_error, ar_params, ar_std_errors, ma_params, ma_std_errors	input_table, timestamp_col, timeseries_col, non_seasonal_orders, include_mean, residual_variance, log_likelihood, iter_num, exec_time
assoc_rules	ruleid, pre, post, count, support, confidence, lift, conviction
kmeans_*	(no output tables)
simple_silhouette	(no output tables)
lda_train	voc_size, topic_num, alpha, beta, model	docid, wordcount, words, counts, topic_count, topic_assignment
