...

Teams using Cassandra commonly find themselves looking for a way to work with large amounts of data for analytics workloads. However, Cassandra’s standard APIs aren’t suited to large-scale data egress or ingest, because the native read and write paths weren’t designed for bulk analytics.

With this CEP, we're proposing the contribution of a library component called Cassandra Spark Bulk Analytics (CSBA), which was developed for this exact purpose. It enables the implementation of custom Spark applications that can read or write large amounts of Cassandra data, at up to 1.7 Gbps per instance for reads and up to 7 Gbps per instance for writes (depending on hardware), by accessing the persistent storage of nodes in the cluster via the Cassandra Sidecar.

...