
GSoC: Varnish Cache support in Apache Traffic Control

Background
Apache Traffic Control is a Content Delivery Network (CDN) control plane for large scale content distribution.

Traffic Control currently requires Apache Traffic Server as the underlying cache. Help us expand the scope by integrating with the very popular Varnish Cache.

There are multiple aspects to this project:

  • Configuration Generation: Write software to build Varnish configuration files (VCL). This code will be implemented in our Traffic Ops and cache client side utilities, both written in Go.
  • Health Monitoring: Implement monitoring of the Varnish cache health and performance. This code will run both in the Traffic Monitor component and within Varnish. Traffic Monitor is written in Go and Varnish is written in C.
  • Testing: Add automated tests for new code.

Skills:

  • Proficiency in Go is required
  • A basic knowledge of HTTP and caching is preferred, but not required for this project.
Difficulty: Major
Potential mentors:
Eric Friedrich, mail: friede (at) apache.org
Project Devs, mail: dev (at) trafficcontrol.apache.org

Cassandra

Improve visibility into repair state

Most of the repair flow is fire-and-forget: we send a message or start a job with a remote component, then wait for a response of some type, or for the failure detector to tell us a node has died. This leaves several cases where the repair can hang, and operators have to guess about its state and the best course of action. It would help if the state of a given repair could be polled, and possibly cancelled. This will involve touching the validation, anti-compaction, and streaming code.

Difficulty: Challenging
Potential mentors:
paulo, mail: paulo (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Provide easy copypasta config formatting for nodetool get commands

Allow all nodetool commands that print the state of the node or cluster to do so in a way that makes the output easy to reuse on other nodes or paste into config files.

For example, the command getcompactionthroughput formats its output like this:

[jshook@cass4 bin]$ ./nodetool getcompactionthroughput
Current compaction throughput: 64 MB/s

But with an --as-yaml option, it could do this instead:

[jshook@cass4 bin]$ ./nodetool getcompactionthroughput --as-yaml
compaction_throughput_mb_per_sec: 64

and with an --as-cli option, it could do this:

[jshook@cass4 bin]$ ./nodetool getcompactionthroughput --as-cli
./nodetool setcompactionthroughput 64

Any other standard nodetool options should simply be carried along to the --as-cli form, with the exception of -pw.

Any -pw options should be elided with a warning in comments, while -pwf options should be passed through. This lets users of -pw append a password at their discretion, while -pwf keeps working as usual.

In the absence of either option (--as-yaml or --as-cli), the formatting should not change, to avoid breaking existing tool integrations.
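A minimal sketch of the three output modes described above. The class and method names (ThroughputFormatter, OutputMode, render) are illustrative only, not the actual nodetool internals:

```java
// Hypothetical sketch: rendering one getter's value in the three proposed
// output modes. Names here are illustrative, not Cassandra's real classes.
public class ThroughputFormatter {
    public enum OutputMode { DEFAULT, AS_YAML, AS_CLI }

    public static String render(int throughputMbPerSec, OutputMode mode) {
        switch (mode) {
            case AS_YAML:
                // uses the cassandra.yaml key so the line can be pasted into config files
                return "compaction_throughput_mb_per_sec: " + throughputMbPerSec;
            case AS_CLI:
                // emits the matching setter command for replay on another node
                return "./nodetool setcompactionthroughput " + throughputMbPerSec;
            default:
                // legacy format stays byte-identical so existing integrations keep working
                return "Current compaction throughput: " + throughputMbPerSec + " MB/s";
        }
    }
}
```

The key point the sketch captures is that the default branch is untouched, which is what preserves backward compatibility for existing tool integrations.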


Difficulty: Normal
Potential mentors:
paulo, mail: paulo (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Prevent and fail-fast any attempts to incremental repair cdc/mv tables

Running incremental repairs on CDC or MV tables breaks them.

Attempting to run incremental repair on such tables should fail fast and be prevented, with a clear error message.
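A minimal sketch of such a guard, assuming it runs before any repair session is created; TableInfo and its fields are hypothetical stand-ins, not the actual Cassandra schema classes:

```java
// Hypothetical guard: reject incremental repair of CDC/MV tables up front
// with a clear message. TableInfo is an illustrative stand-in.
public class RepairGuard {
    public static final class TableInfo {
        public final String name;
        public final boolean cdcEnabled;
        public final boolean isMaterializedView;

        public TableInfo(String name, boolean cdcEnabled, boolean isMaterializedView) {
            this.name = name;
            this.cdcEnabled = cdcEnabled;
            this.isMaterializedView = isMaterializedView;
        }
    }

    // Called before any repair session is created, so the failure is fast and explicit.
    public static void validateIncrementalRepair(TableInfo table) {
        if (table.cdcEnabled || table.isMaterializedView) {
            throw new IllegalArgumentException(
                "Incremental repair is not supported on CDC-enabled or materialized-view tables; "
                + "run a full repair on " + table.name + " instead");
        }
    }
}
```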

Difficulty: Normal
Potential mentors:
paulo, mail: paulo (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Script to autogenerate cassandra.yaml

It would be useful to have a script that can ask the user a few questions and generate a recommended cassandra.yaml based on their answers. This will help solve issues like selecting num_tokens. It can also be integrated into OS-specific packaging tools such as debconf[1]. Rather than just documenting recommendations on the website, it is best to provide a simple script that auto-generates configuration for common use-cases.

[1] https://wiki.debian.org/debconf
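A sketch of the answers-to-settings mapping such a script could apply. The recommended values below are placeholders for illustration, not tuning advice:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of mapping a few answers to cassandra.yaml lines;
// the chosen values are placeholders, not actual recommendations.
public class YamlGenerator {
    public static String generate(String clusterName, int expectedNodes, boolean ssd) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("cluster_name", "'" + clusterName + "'");
        // Placeholder heuristic: larger clusters get fewer tokens per node.
        settings.put("num_tokens", expectedNodes > 50 ? "4" : "16");
        settings.put("disk_optimization_strategy", ssd ? "ssd" : "spinning");

        StringBuilder yaml = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            yaml.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return yaml.toString();
    }
}
```

A debconf-style frontend would simply collect the three answers interactively and write the returned string to cassandra.yaml.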

Difficulty: Normal
Potential mentors:
paulo, mail: paulo (at) apache

Per-node overrides for table settings

There are a few cases where it's convenient to set some table parameters on only one or a few nodes. For instance, it's useful for experimenting with settings like caching options, compaction, compression, read repair chance, gc_grace ... Another case is when you want to completely migrate to a new setting but want to do it node per node (mainly useful when switching compaction strategy, see CASSANDRA-10898).

I'll note that we can already do some of this through JMX for some of the settings as we have methods like ColumnFamilyStoreMBean.setCompactionParameters(), but:

  1. Parameter settings are initially set in CQL. Having to go to JMX for this seems less consistent to me. The fact that we have both a ColumnFamilyStoreMBean.setCompactionParameters() and a ColumnFamilyStoreMBean.setCompactionParametersJson() (as I assume the former is inconvenient to use) is also proof to me that JMX isn't terribly appropriate.
  2. I think this could be useful for almost all table settings, but we don't expose JMX methods for all of them, and it would be annoying to have to. The method suggested below wouldn't have to be updated every time we add a new setting (if done right).
  3. Changing options through JMX is not persistent across restarts. This may arguably be fine in some cases, but if you're trying to migrate your compaction strategy node per node, or want to experiment with a setting over a medium-length period, it's mostly a pain.

So what I suggest is to add node overrides to the normal table settings (which would be part of the schema like any other setting). In other words, if you want to set LCS for only one specific node, you'd do:

ALTER TABLE foo WITH node_overrides = {
  '192.168.0.1' : { 'compaction' : { 'class' : 'LeveledCompactionStrategy' } }
}

I'll note that I already suggested this idea on CASSANDRA-10898, but as it's more generic than what that ticket is about, it gets its own ticket.
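A sketch of how an override map shaped like the CQL above could be resolved into an effective per-node setting; the class and method names are hypothetical:

```java
import java.util.Map;

// Illustrative resolution of a per-node override layered on top of the
// schema-wide table default; names are hypothetical.
public class NodeOverrides {
    public static String effectiveSetting(String nodeAddress, String defaultValue,
                                          Map<String, Map<String, String>> overrides,
                                          String key) {
        Map<String, String> forNode = overrides.get(nodeAddress);
        if (forNode != null && forNode.containsKey(key)) {
            return forNode.get(key); // this node has an explicit override
        }
        return defaultValue; // fall back to the normal table setting
    }
}
```

Because the override map lives in the schema, it survives restarts and propagates with normal schema agreement, addressing point 3 above.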

Difficulty: Challenging
Potential mentors:
paulo, mail: paulo (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Expose application_name and application_version in virtual table system_views.clients

Recent versions of the java-driver's com.datastax.oss.driver.api.core.session.SessionBuilder respect the ApplicationName and ApplicationVersion properties.

It would be helpful to expose this information via the virtual table system_views.clients and via nodetool clientstats.

Difficulty: Low Hanging Fruit
Potential mentors:
paulo, mail: paulo (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Migrate use of maven-ant-tasks to resolver-ant-tasks

Cassandra resolves dependencies and generates maven pom files through the use of maven-ant-tasks. This is no longer a supported project.

The recommended upgrade is to resolver-ant-tasks. It follows similar APIs so shouldn't be too impactful a change.

The existing maven-ant-tasks has caused some headaches already with internal super poms referencing insecure http:// central maven repository URLs that are no longer supported.

We should also take the opportunity to

  • define the "test" scope (classpath) for those dependencies only used for tests (currently we are packaging test dependencies into the release binary artefact),
  • remove the jar files stored in the git repo under the "lib/" folder.

These two points have to happen in tandem, as the jar files under lib/ are those that get bundled into build/dist/lib/ and hence into the binary artefact. That is, all jar files under lib/ are in the project's "compile" scope, and all other dependencies defined in build.xml are either "provided" or "test" scope. These different dependency scopes are currently configured in different maven-ant-tasks poms. See https://github.com/apache/cassandra/commit/d43b9ce5092f8879a1a66afebab74d86e9e127fb#r45659668

Difficulty: Normal
Potential mentors:
mck, mail: mck (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Add Plugin Support for CQLSH

Currently the Cassandra drivers offer a plugin authenticator architecture for the support of different authentication methods. This has been leveraged to provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, cqlsh, the included CLI tool, does not offer such support. Switching to a new enhanced authentication scheme thus means being cut off from using cqlsh in normal operation.

We should have a means of using the same plugins and authentication providers as the Python Cassandra driver.

Here's a link to an initial draft of the CEP.

Difficulty: Normal
Potential mentors:
paulo, mail: paulo (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Global configuration parameter to reject repairs with anti-compaction

We have moved from Cassandra 2.1 to 3.0, and from an operational perspective the repair area has changed significantly / become more complex. Besides incremental repairs not working reliably, full repairs (the -full command-line option) also run into anti-compaction code paths, splitting repaired / non-repaired data into separate SSTables even for full repairs.

Cassandra 4.x (with its repair enhancements) is still quite a way off for us (for production usage), thus we want to avoid anti-compaction with Cassandra 3.x at any cost. Especially for our on-premise installations at customer sites, with less control over how e.g. nodetool is used, we simply want a configuration parameter in e.g. cassandra.yaml that we could use to reject any repair invocation that would result in anti-compaction.

I know such a flag can still be flipped (by the customer), but as a first safety stage it is possibly sufficient to reject anti-compaction repairs, e.g. if someone accidentally executes nodetool repair ... the wrong way.

Difficulty: Normal
Potential mentors:
paulo, mail: paulo (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Add ability to ttl snapshots

It should be possible to add a TTL to snapshots, after which the snapshot automatically cleans itself up.

This will be useful together with the auto_snapshot option, where you want to keep an emergency snapshot in case of accidental drop or truncation but automatically remove it after a specified period, when it's no longer useful. So in addition to allowing a user to specify a snapshot TTL on nodetool snapshot, we should have an auto_snapshot_ttl option that lets a user set a TTL for automatic snapshots on drop/truncate.
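The expiry check a periodic snapshot reaper could run is simple; this sketch uses hypothetical names and keeps the no-TTL case identical to current behaviour:

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the expiry check behind a snapshot TTL; names are illustrative.
public class SnapshotTtl {
    public static boolean isExpired(Instant createdAt, Duration ttl, Instant now) {
        // A snapshot with no TTL is kept forever, preserving current behaviour.
        if (ttl == null) return false;
        return !now.isBefore(createdAt.plus(ttl));
    }
}
```

A background task would periodically call this for each snapshot (both manual ones created with a TTL and automatic drop/truncate snapshots governed by auto_snapshot_ttl) and delete the expired ones.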

Difficulty: Normal
Potential mentors:
paulo, mail: paulo (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Cleanup key ranges during compaction

Currently cleanup is considered an optional, manual operation that users are told to run to free disk space after a node was affected by topology changes. However, unmanaged key ranges can also end up on a node in other ways, e.g. SSTable files manually added by an admin.

I'm also not sure unmanaged data is really that harmless, or that cleanup should really be optional if you don't need to reclaim the disk space. When it comes to repairs, users are expected to purge a node after downtime if it was not fully covered by a repair within gc_grace, in order to avoid re-introducing deleted data. But the same could happen with unmanaged data, e.g. after topology changes activate unmanaged ranges again, or after restoring backups.

I'd therefore suggest that compactions avoid rewriting key ranges that no longer belong to the node and are older than gc_grace.

Maybe we could also introduce another CLEANUP_COMPACTION operation to find candidates based on SSTable.first/last in case we don't have pending regular or tombstone compactions.
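The proposed compaction-time filter can be sketched as follows; the flat token-range representation and field names are simplified stand-ins for the real ownership and purge machinery:

```java
// Sketch of the proposed filter: drop a partition during compaction if its
// token is no longer locally owned AND the data is older than gc_grace,
// mirroring tombstone purging rules. Types here are simplified stand-ins.
public class CleanupFilter {
    // localRanges: array of [startInclusive, endExclusive] token ranges.
    public static boolean shouldDrop(long token, long[][] localRanges,
                                     long writeTimeSeconds, long gcGraceSeconds,
                                     long nowSeconds) {
        for (long[] range : localRanges) {
            if (token >= range[0] && token < range[1]) {
                return false; // still locally owned: always keep
            }
        }
        // Unowned data is only dropped once it is older than gc_grace, so a
        // range that becomes owned again shortly after a topology change is safe.
        return nowSeconds - writeTimeSeconds > gcGraceSeconds;
    }
}
```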

Difficulty:
Potential mentors:
, mail: (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Allow table property defaults (e.g. compaction, compression) to be specified for a cluster/keyspace

During an IRC discussion in cassandra-dev it was proposed that we could have table property defaults stored on a Keyspace or globally within the cluster. For example, this would allow users to specify "all new tables on this cluster should default to LCS with an SSTable size of 320 MiB", or "all new tables in Keyspace XYZ should have Zstd compression with an 8 KiB block size", or "default_time_to_live should default to 3 days", etc ... This way operators can choose the default that makes sense for their organization once (e.g. LCS if they are running on fast SSDs), rather than requiring the developers creating the Keyspaces/Tables to make the decision on every creation (often without context of which choices are right).

A few implementation options were discussed including:

  • A YAML option
  • Schema provided at the Keyspace level that would be inherited by any tables automatically
  • Schema provided at the Cluster level that would be inherited by any Keyspaces or Tables automatically

In IRC it appears that rough consensus was found on having global -> keyspace -> table defaults stored in schema (no YAML configuration, since this isn't really a node-level setting but a cluster-level one).

Difficulty: Normal
Potential mentors:
paulo, mail: paulo (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

Add ability to disable schema changes, repairs, bootstraps, etc (during upgrades)

There are a lot of operations that aren't supposed to be run in a mixed-version cluster: schema changes, repairs, topology changes, etc. However, it's easily possible for these operations to be run accidentally by a script, by another user unaware of the upgrade, or by an operator who isn't aware of these rules.

We should make it easy to follow the rules by making it possible to prevent/disable all of these operations through nodetool commands. At the start of an upgrade, an operator can disable all of these until the upgrade has been completed.
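A sketch of the kind of operation gate nodetool could toggle; the class, enum, and method names are illustrative, not existing Cassandra APIs:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical gate an operator could toggle via nodetool during an upgrade.
public class OperationGate {
    public enum Op { SCHEMA_CHANGE, REPAIR, BOOTSTRAP, DECOMMISSION }

    private final Set<Op> disabled = EnumSet.noneOf(Op.class);

    public void disable(Op op) { disabled.add(op); }
    public void enable(Op op) { disabled.remove(op); }

    // Called at the entry point of each gated operation; fails fast when disabled.
    public void checkAllowed(Op op) {
        if (disabled.contains(op)) {
            throw new IllegalStateException(op + " is currently disabled by the operator (upgrade in progress?)");
        }
    }
}
```

At the start of an upgrade the operator would disable each operation type, and re-enable them once every node runs the new version.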

Difficulty: Normal
Potential mentors:
paulo, mail: paulo (at) apache.org
Project Devs, mail: dev (at) cassandra.apache.org

...