...
A common discrepancy not highlighted in the document above is `col` vs `column`. Some argument names are of the form `*_column_name` while others are `*_col_name`. One of these two forms must be chosen and applied across the whole product (including internal source code).
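A check like the following could help find the mixed usage before settling on one form. It is only a sketch: the function name `find_col_naming_mix` and the sample argument names are hypothetical, not taken from the MADlib source.

```python
import re

def find_col_naming_mix(arg_names):
    """Split argument names into those using 'col' and those using 'column'.

    Hypothetical helper for illustration; it matches 'col' or 'column'
    only as a whole underscore-delimited token.
    """
    short_form = sorted(n for n in arg_names if re.search(r"(^|_)col(_|$)", n))
    long_form = sorted(n for n in arg_names if re.search(r"(^|_)column(_|$)", n))
    return short_form, long_form

# Made-up argument names, used only to exercise the helper.
sample_args = ["id_col_name", "dependent_column_name", "source_table"]
short_form, long_form = find_col_naming_mix(sample_args)
if short_form and long_form:
    print("Mixed naming:", short_form, "vs", long_form)
```

If both lists are non-empty, the interface mixes the two forms and one set of names would need renaming.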
Named parameters
Change the parameter lists to named parameters, as in scikit-learn, rather than the ordered (positional) parameters MADlib currently uses, which cannot be supplied out of order.
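As a rough illustration of the scikit-learn style, the Python sketch below uses keyword-only parameters so that callers may pass arguments in any order and omit those with defaults. The function `train_model` and its parameter names are hypothetical and are not an existing MADlib interface.

```python
def train_model(*, source_table, output_table, dependent_varname,
                independent_varname, grouping_cols=None, max_iter=20):
    """Hypothetical training entry point.

    The bare '*' makes every parameter keyword-only, so callers cannot
    rely on argument position.
    """
    return {
        "source_table": source_table,
        "output_table": output_table,
        "dependent_varname": dependent_varname,
        "independent_varname": independent_varname,
        "grouping_cols": grouping_cols,
        "max_iter": max_iter,
    }

# Arguments may appear in any order; optional ones can be left out.
config = train_model(
    output_table="model_out",
    dependent_varname="y",
    source_table="training_data",
    independent_varname="x",
)
```

Calling `train_model("training_data", "model_out", "y", "x")` raises a `TypeError`, which is the point of the design: argument order can no longer be a source of silent mistakes.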
Interfaces for Cross validation
...
group_col1 | group_col2 | coef | std_err | ... |
u1 | v1 | <coef for u1, v1> | <std. err for u1, v1> | |
u1 | v2 | <coef for u1, v2> | | |
... | | | | |
u2 | v1 | <coef for u2, v1> | | |
Summary table could include:
...
- total_rows_processed
- total_rows_skipped
- time_stamp_start
- time_stamp_end
- elapsed_time
- user_string_1 (user label or name)
- user_string_2 (user description)
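
One way to picture the fields listed above is as a single record. The sketch below is an assumption for illustration only: the class name `RunSummary`, the field types, and the choice to derive `elapsed_time` from the two timestamps are not specified in this document, and any elided fields are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RunSummary:
    """Hypothetical record mirroring the summary fields listed above."""
    total_rows_processed: int
    total_rows_skipped: int
    time_stamp_start: datetime
    time_stamp_end: datetime
    user_string_1: str = ""   # user label or name
    user_string_2: str = ""   # user description

    @property
    def elapsed_time(self) -> timedelta:
        # Assumption: elapsed time is derived rather than stored.
        return self.time_stamp_end - self.time_stamp_start

summary = RunSummary(
    total_rows_processed=1000,
    total_rows_skipped=12,
    time_stamp_start=datetime(2015, 1, 1, 12, 0, 0),
    time_stamp_end=datetime(2015, 1, 1, 12, 1, 30),
    user_string_1="example run",
)
print(summary.elapsed_time)
```

Deriving `elapsed_time` keeps it consistent with the two timestamps; storing it as a separate column would work equally well if the table is meant to be queried directly.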