...

| Type | Data Granularity | Synchronization Pattern | Bounded / Unbounded | Examples |
|---|---|---|---|---|
| Non-SGD-based | Epoch | Mostly Synchronous | Bounded | K-Means, Apriori, Decision Tree, Random Walk |
| SGD-Based Synchronous Offline algorithm | Batch → Epoch* | Synchronous | Bounded | Linear Regression, Logistic Regression, Deep Learning algorithms |
| SGD-Based Asynchronous Offline algorithm | Batch → Epoch* | Asynchronous | Bounded | Same as above |
| SGD-Based Synchronous Online algorithm | Batch | Synchronous | Unbounded | Online version of the above algorithm |
| SGD-Based Asynchronous Online algorithm | Batch | Asynchronous | Unbounded | Online version of the above algorithm |

According to how the model is updated, the algorithms can be classified into the following types:

...

*Although SGD-based algorithms are batch-based, they can also be implemented with an epoch-based method if intermediate state is allowed: in each epoch, every subtask samples a batch from all of its records, starting from the position where the last batch ended.
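
The footnote above can be illustrated with a minimal, framework-free sketch. It is not the iteration API proposed here; the names `SubtaskState`, `next_batch`, and `sgd_epoch` are hypothetical, and the model is assumed to be a one-dimensional least-squares fit so the example stays short. Each subtask keeps its records and the position of the last batch as intermediate state, and one epoch consumes exactly one batch.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, float]  # (feature x, label y); assumed record layout


class SubtaskState:
    """Hypothetical intermediate state a subtask keeps between epochs."""

    def __init__(self, records: Sequence[Record], batch_size: int) -> None:
        if not records:
            raise ValueError("a subtask needs at least one record")
        self.records: List[Record] = list(records)
        self.batch_size = batch_size
        self.position = 0  # index right after the end of the last batch

    def next_batch(self) -> List[Record]:
        """Take the next batch, starting where the last batch ended."""
        n = len(self.records)
        batch = [self.records[(self.position + i) % n]
                 for i in range(self.batch_size)]
        # Remember the new position; wrap around at the end of the data.
        self.position = (self.position + self.batch_size) % n
        return batch


def sgd_epoch(weight: float, state: SubtaskState, learning_rate: float) -> float:
    """One epoch: a single mini-batch gradient step for the model y ~ w * x."""
    batch = state.next_batch()
    grad = sum(2.0 * (weight * x - y) * x for x, y in batch) / len(batch)
    return weight - learning_rate * grad


# Bounded dataset generated by y = 2 * x.
records = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
state = SubtaskState(records, batch_size=2)
w = 0.0
for _ in range(200):  # 200 epochs, each consuming one batch
    w = sgd_epoch(w, state, learning_rate=0.05)
```

Because the position survives across epochs, the batch-level loop of SGD collapses into an epoch-level loop over a bounded dataset, which is the reason the table marks the offline SGD rows as "Batch → Epoch".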


Based on the above classification and the epoch-based replacement implementation for SGD-based algorithms on bounded datasets, we mainly need to support

...



Besides, the previous DataStream and DataSet iteration APIs also have some caveats when used to implement algorithms:

...