...
...
Code Block | ||
---|---|---|
| ||
AnyType
avg_var_transition::run(AnyType& args) {
    // get current state value
    AvgVarTransitionState<MutableArrayHandle<double> > state = args[0];
    // get current row value
    double x = args[1].getAs<double>();
    double d = (x - state.avg);
    // online update mean
    state.avg += d / static_cast<double>(state.numRows + 1);
    double new_d = (x - state.avg);
    double a = static_cast<double>(state.numRows) / static_cast<double>(state.numRows + 1);
    // online update variance
    state.var = state.var * a + d * new_d / static_cast<double>(state.numRows + 1);
    state.numRows++;
    return state;
} |
- There are two arguments for avg_var_transition, as specified in avg_var.sql_in. The first one is an array of SQL double type, corresponding to the current mean, variance, and number of rows traversed, and the second one is a double representing the current tuple value.
- We will describe class AvgVarTransitionState later. Basically it takes args[0], a SQL double array, passes the data to the appropriate C++ types and stores them in the state instance.
- Both the mean and the variance are updated in an online manner to avoid accumulating a large intermediate sum.
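The online update can be tried outside the database. The following is only a minimal sketch: RunningStats, update, and summarize are hypothetical names, and plain struct fields stand in for the handle onto the SQL double array. The arithmetic follows the same order as the transition function above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the transition state: plain fields instead of
// a handle onto a SQL double array.
struct RunningStats {
    double avg = 0.0;      // running mean
    double var = 0.0;      // running population variance
    uint64_t numRows = 0;  // rows seen so far
};

// One transition step, in the same order as the transition function.
inline void update(RunningStats& s, double x) {
    const double n1 = static_cast<double>(s.numRows + 1);
    const double d = x - s.avg;          // deviation from the old mean
    s.avg += d / n1;                     // online mean update
    const double new_d = x - s.avg;      // deviation from the new mean
    const double a = static_cast<double>(s.numRows) / n1;
    s.var = s.var * a + d * new_d / n1;  // online variance update
    ++s.numRows;
}

// Feed an array through update() and return the final state.
inline RunningStats summarize(const double* xs, uint64_t count) {
    assert(xs != nullptr || count == 0);
    RunningStats s;
    for (uint64_t i = 0; i < count; ++i) {
        update(s, xs[i]);
    }
    return s;
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 this yields a mean of 5 and a population variance of 4, up to floating-point rounding, without ever holding the sum of squares.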
Merge function
Code Block | ||
---|---|---|
| ||
AnyType
avg_var_merge_states::run(AnyType& args) {
AvgVarTransitionState<MutableArrayHandle<double> > stateLeft = args[0];
AvgVarTransitionState<ArrayHandle<double> > stateRight = args[1];
// Merge states together and return
stateLeft += stateRight;
return stateLeft;
} |
- Again, the arguments contained in AnyType& args are defined in avg_var.sql_in.
- The details are hidden in the method of class AvgVarTransitionState which overloads the operator +=.
Final function
Code Block | ||
---|---|---|
| ||
AnyType
avg_var_final::run(AnyType& args) {
AvgVarTransitionState<MutableArrayHandle<double> > state = args[0];
// If we haven't seen any data, just return Null. This is the standard
// behavior of aggregate function on empty data sets (compare, e.g.,
// how PostgreSQL handles sum or avg on empty inputs)
if (state.numRows == 0)
return Null();
return state;
} |
- Class AvgVarTransitionState overloads the AnyType() operator such that we can directly return state, an instance of AvgVarTransitionState, while the function is expected to return an AnyType.
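The mechanism at work is a C++ user-defined conversion operator. The sketch below illustrates it with hypothetical types only: Value stands in for AnyType and State for AvgVarTransitionState. It is not the actual class definition.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for AnyType: a generic value wrapping a double array.
struct Value {
    std::vector<double> data;
};

// Hypothetical stand-in for the state class.
struct State {
    double avg;
    double var;
    double numRows;

    // User-defined conversion: a State can be used wherever a Value is expected.
    operator Value() const {
        return Value{{avg, var, numRows}};
    }
};

// Declared to return Value, yet the body returns a State.
// The compiler applies the conversion operator implicitly.
inline Value finalize(const State& state) {
    assert(state.numRows >= 0.0);
    return state;
}
```

Keeping the conversion inside the state class means each of the three aggregate functions can end with a plain return of the state.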
...
Code Block | ||
---|---|---|
| ||
/**
 * @brief Merge with another State object
 *
 * We update mean and variance in an online fashion
 * to avoid a large intermediate sum.
 */
template <class OtherHandle>
AvgVarTransitionState &operator+=(
    const AvgVarTransitionState<OtherHandle> &inOtherState) {
    if (mStorage.size() != inOtherState.mStorage.size())
        throw std::logic_error("Internal error: Incompatible transition "
                               "states");
    double avg_ = inOtherState.avg;
    double var_ = inOtherState.var;
    // use a 64-bit count so that large row counts are not truncated
    uint64_t numRows_ = static_cast<uint64_t>(inOtherState.numRows);
    double totalNumRows = static_cast<double>(numRows + numRows_);
    // weights of the two states in the combined data set
    double p = static_cast<double>(numRows) / totalNumRows;
    double p_ = static_cast<double>(numRows_) / totalNumRows;
    double totalAvg = avg * p + avg_ * p_;
    // offsets of each state's mean from the combined mean
    double a = avg - totalAvg;
    double a_ = avg_ - totalAvg;
    numRows += numRows_;
    var = p * var + p_ * var_ + p * a * a + p_ * a_ * a_;
    avg = totalAvg;
    return *this;
} |
- Given the mean, variance, and size of two data sets, Welford's method computes the mean and variance of the two data sets combined.
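The merge step can be checked in isolation with a small sketch. Summary and merge are hypothetical names, the fields are plain values rather than handles, and the guard for two empty inputs is an added assumption that the operator above does not have.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical summary of one data set: mean, population variance, size.
struct Summary {
    double avg;
    double var;
    uint64_t numRows;
};

// Combine two summaries using the same weighted formula as operator+=.
inline Summary merge(const Summary& l, const Summary& r) {
    assert(l.var >= 0.0 && r.var >= 0.0);
    const uint64_t total = l.numRows + r.numRows;
    if (total == 0) {
        return l;  // assumption: avoid 0/0 when both inputs are empty
    }
    const double n = static_cast<double>(total);
    const double p = static_cast<double>(l.numRows) / n;   // weight of left
    const double p_ = static_cast<double>(r.numRows) / n;  // weight of right
    const double totalAvg = l.avg * p + r.avg * p_;
    const double a = l.avg - totalAvg;   // left mean offset
    const double a_ = r.avg - totalAvg;  // right mean offset
    // Weighted within-set variances plus weighted squared mean offsets.
    const double var = p * l.var + p_ * r.var + p * a * a + p_ * a_ * a_;
    return Summary{totalAvg, var, total};
}
```

As a check, merging the summary of 2, 4, 4, 4 (mean 3.5, variance 0.75) with that of 5, 5, 7, 9 (mean 6.5, variance 2.75) gives mean 5 and variance 4, which matches a single pass over all eight values.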
...