...

Current state: Draft
Proof of concept demo available here: https://github.com/yukim/cassandra-opentelemetry-demo

Discussion thread: -

JIRA: -

...

The OpenTelemetry specification and libraries are still evolving. As of the time of writing (May 2023), the OpenTelemetry Java library provides stable support for tracing and metrics, but logging support is experimental.

See https://opentelemetry.io/status/ for the up-to-date status.

The implementation will be split into three parts:

...

Mailing list / Slack channels

Mailing list: 

Slack channel: 

Discussion threads: 

Related JIRA tickets

JIRA(s): 

  • -

...

Motivation

Troubleshooting Apache Cassandra can be time-consuming and challenging when faced with failures or performance issues. Without a proper observability system in place, it becomes difficult to identify the root cause of these problems.

Apache Cassandra already implements its own methods or relies on external libraries to provide operators and administrators with deep insights into the complex distributed database using the three pillars of observability: Tracing, Metrics, and Logging.

  • For tracing, Apache Cassandra has its own Tracing API.
  • For metrics collection and reporting, Apache Cassandra uses Codahale’s Metrics library to collect metrics and expose them through JMX.
  • For logging, Apache Cassandra uses the SLF4J/Logback logging libraries.

However, these features become significantly more valuable inside an observability system that can correlate the different telemetry signals with one another. Otherwise, operators and admins must pull out each signal manually and assemble the information by hand to form hypotheses about the root cause of a problem.

To implement observability in Apache Cassandra, operators must devise their own methods to extract this telemetry and establish a monitoring stack. This often involves using open-source software like Prometheus/Grafana or commercial services like Datadog. The setup process varies depending on the software used, which makes it complex and often neglected by operators.

OpenTelemetry is a CNCF-hosted project that provides "A single, vendor-agnostic instrumentation library per language with support for both automatic and manual instrumentation" (https://opentelemetry.io/docs/concepts/what-is-opentelemetry/). It specifies APIs for collecting traces, metrics, and logs, and protocols for exporting them to external observability software.

...

  <appender name="OpenTelemetry"
            class="io.opentelemetry.instrumentation.logback.appender.v1_0.OpenTelemetryAppender">
  </appender>

Operators can provide the necessary jars and configuration to use other exporters as well. For example, to export traces to Jaeger, add the opentelemetry-exporter-jaeger.jar file to the classpath and configure the exporter through jvm-server.options:

...