...

  • The lack of official drivers does not favor cohesion around the Cassandra project. Many participants in the Cassandra ecosystem have created forks of the DataStax drivers but have not historically contributed changes back to the original drivers.
  • The dependency of the Apache Cassandra server itself on two of the DataStax drivers (Java and Python) has implications for the project's independence. This tension was exacerbated by the 2019 release of version 4.0 of the Java driver, which brought major API changes, and by the subsequent move to maintenance mode of the 3.x series, which is still used by the server and widely used in the community.

Goals

Contributors employed by DataStax are offering to donate to the Apache Software Foundation all seven DataStax-funded Apache Cassandra drivers, currently hosted in the following GitHub repositories:

...

We think it is important to maintain the drivers together to retain cohesive API semantics and make sure they have similar functionality and feature support. It is therefore requested that all drivers be accepted eventually; see "Approach" and "Timeline" below for practicalities on the transfer operation itself, which does not need to be an all-at-once transfer.

Approach

Transferring large pieces of software like the drivers will require a strong level of coordination and involvement from designated members of the parties involved in this operation (DataStax contributors, legal, and the ASF).

The various aspects of this operation, such as governance, source code hosting, and intellectual property, are covered in detail in "Proposed Changes" below, where we discuss ways to deal with the challenges they pose and to avoid the potential pitfalls identified.

In order to minimize the risks of creating a suboptimal situation both for the drivers and the Cassandra project itself in the future, the donation process will be iterative, and will start with only the Java driver in a first phase; then, in a second phase, it will be extended to the remaining drivers. 

Once the Java driver is transferred, and before the others are transferred, we will review the methodology described in this CEP and, if necessary, adjust its parameters accordingly. A second CEP may be required if the changes to the methodology are found to be substantial.

Timeline

There are two phases to be considered:

  1. Java driver donation and transfer to the ASF: We believe that this should be executed after the Cassandra 4.0 GA release, in order to not disturb the current efforts towards this major milestone.
    1. The timeline should allow for the whole intellectual property clearance process to take place; see "Intellectual property" below.
  2. Remaining 6 drivers donation and transfer to the ASF: TBD based on discoveries from 1.

Mailing list / Slack channels

Mailing list: 

Slack channel: 

  • #cassandra-cep-drivers-donation

Discussion threads: 

Related JIRA tickets

JIRA(s): 

No existing Apache Cassandra Jira tickets relate to this CEP. However, the following ticket is worth mentioning for its historical relevance:

  • CASSANDRA-2761: Initial discussion around the removal of CQL drivers from the 0.8 branch.

Also, see below "Proposed Changes" for proposed new Jira projects.


...

Motivation

By donating all of these drivers to the ASF, we hope to:

  • Provide the Apache Cassandra project with "official" drivers and resolve the project community's concern over the lack of driver governance.
  • Demonstrate community goodwill and address the request from some Cassandra PMC members that drivers should not be controlled by any organization external to the ASF.
  • Increase the cohesion of the Cassandra ecosystem by once again hosting the Cassandra server and its most popular CQL drivers together.
  • Provide the Cassandra project with a client-side reference implementation of its own native protocol. The DataStax Java driver has so far served as the de facto reference implementation of that protocol.

We should, however, avoid the situation we had back in 2011. In particular, this CEP attempts to strike a balance between the need for independent stewardship of both drivers and server, especially for day-to-day work, and the need to keep a reasonable amount of shared governance for high-level decisions (roadmap, common features, etc.).

Audience

The donation outlined in this CEP would be beneficial to the entire Cassandra community and ecosystem.

Depending on the persona, two main audience groups can be outlined:

  • Apache Cassandra committers and PMC members, as well as DataStax driver committers, will likely be affected directly to some degree, but hopefully such impacts will remain limited mostly to adapting to new governing bodies and rules, and to communication channels (Jira, mailing lists, Slack, etc.), which will be monitored closely.
  • Apache Cassandra users in the broad sense should benefit from the donation by having a stronger community built around the main project. However, users should not be affected by the practicalities of the change proposed in this document. In particular, we would like to minimize the disruption caused by the donation and avoid massive user-facing changes to any of the drivers or server APIs, and to the drivers' release funnels. See "New or Changed Public Interfaces" below for a detailed discussion.

Proposed Changes

A. Governance

We propose the creation of a Drivers subproject that will be responsible for the different drivers being donated. The subproject will be governed according to the subproject governance procedures.

We think it is best to avoid creating a separate top-level Apache project, and suggest that the drivers be included in Apache Cassandra as a single subproject under the governance of the Apache Cassandra Committee.

There is precedent for incubating drivers as separate top-level projects: for instance, Apache Curator, an Apache ZooKeeper client donated by Netflix, is a top-level Apache project. It seems, however, that this dual-project approach has caused significant disruption to the projects when coordinating releases and addressing legal concerns.

On the other hand, many Apache projects have subprojects: Apache Felix and Apache Cocoon, for instance. 

In summary, the subproject approach seems to bring a good trade-off between project independence and coordination, thus appearing as the best option to start with:

  • On one side, for most users, drivers are indissociable from the server, and it simply makes more sense to see both hosted together. On a practical level, new feature development will likely require coordination between server and drivers, and future roadmap topics can overlap between the two; by having the drivers as a subproject, CEPs can easily be created for these situations. Similarly, major architectural or API changes in the drivers could impact the server, and thus also require coordination, especially given that some of the donated drivers are embedded and used extensively in the server (internode communication, cqlsh, tests, etc.). By having the drivers in the same project as the server, we can more easily detect and prepare for the impacts of such API changes.
  • On the other side, by accommodating the drivers in a separate subproject, we still can guarantee a minimal level of independence, especially for daily maintenance and release procedures. See below for in-depth discussion of these matters. 

Note that a recurrent concern has already been voiced: current Apache Cassandra committers would have to become knowledgeable about the incoming drivers and maintain the new body of code going forward; this is exacerbated by the different programming languages being incorporated. This legitimate concern will hopefully be mitigated by accepting new driver committers; see "Committership" below.

Also note that it has been considered to further distribute the PMC members in different groups to better differentiate each subproject, e.g.:

               PMC
             /     \
      Drivers       Cassandra
     /   |   \           \
 Drv1  Drv2  Drv3...    Sidecar?

However, in order to avoid any risk of management overhead, we think that members should fully trust each other to intervene only in domains where they are knowledgeable, and therefore that such groups are not necessary, at least for the initial transfer phase. This might of course be reviewed in the future.

B. Source Repositories

Each driver source repository will be transferred to a separate git repository, to be created. Our intention is to donate the entire Git repository of each driver, including all existing commits, branches and tags. 

Subprojects are usually hosted in separate source repositories: Apache Spark, Apache Beam and Apache Hadoop for instance have various repositories under the general project umbrella.

The single repo approach was also considered, but we have reasons to believe it will be inappropriate for the present case:

  • That was the situation back in 2011 with clearly articulated downsides.
  • Release cycles: drivers should keep fairly independent release cycles, which doesn't play nicely with the single repo approach.
  • Drivers need to maintain compatibility with a variety of server versions; having drivers and server in the same repo would inevitably lead to constant confusion and overhead about whether a given driver version only works for a given server version (especially if release cycles were coupled, and even more so if versions were aligned, which we do not want – again, see below for in-depth discussion).

Practicalities:

  • If possible, we should keep the drivers hosted on GitHub; this way we could grant ownership of the current GitHub projects to the new GitHub organization, and redirects would be created automatically. This would reduce the disruption for those checking out the drivers' codebases or building them from source.

C. Committership

As stated above, we can reasonably assume that the current Apache Cassandra project contributors will not have all the expertise (or bandwidth) to develop drivers in seven different languages. 

Several members of the PMC and committers on Cassandra have stated that they think that driver contributors should be made committers on the Cassandra project upon this donation in order to continue developing and maintaining these projects.

Following is an initial list of individuals who have made meaningful contributions to drivers now or in the recent past:

...

Contributor           ...   Relevant Driver Expertise
Olivier Michallat     ...   Java
Alexandre Dutra       ...   Java
Andrew Tolbert        ...   Java, Node.js
Erik Merkle           ...   Java
Greg Bestland         ...   Java, Python
Tomasz Lelek          ...   Java
Bret McGuire          ...   Java
Adam Holmberg         ...   Python, C++
Alan Boudreault       ...   Python
Jim Witschey          ...   Python
Jorge Bay Gondra      ...   Node.js, C#
Joao Reis             ...   C#
Michael Penick        ...   C++, PHP
Michael Fero          ...   C++, PHP
Sandeep Tamhankar     ...   Ruby, C++, Java
Bulat Shakirzyanov    ...   Ruby, PHP

The same people will need to hold credentials for, or be assigned owner status of, the artifacts in package indices such as Maven Central, PyPI, npm and NuGet.

It is worth noting the variety of employers of the above individuals; there is no guarantee that they are still involved in the project or have a patron to fund their work on it, and accepting the committer role is a personal decision made on a case-by-case basis.

It is also worth noting that two drivers are currently considered in maintenance mode: PHP and Ruby. This is due mostly to their most active developers not being able to work on these drivers anymore; this situation is unfortunately not expected to change in the near future.

...

. Mailing Lists

We suggest that the donated drivers should use the existing Apache Cassandra "user" and "developer" mailing lists, but distinct, per-driver lists for Jira notifications and commits. 

...

I. Intellectual Property

As the drivers will not be donated as a separate project, the whole incubation procedure is not required. But this does not mean that the donated code is ready to be integrated. Instead, as a subproject, they will have to abide by the "lightweight" incubation procedure described here, which is "designed to allow code to be imported with alacrity while still providing for oversight".

...

We suggest that these features be donated as well. Indeed, removing such features only a few months after their inclusion in the unified drivers would be a source of confusion for users. We, as driver contributors, are committed to not including any more proprietary features in the drivers once they are transferred to the ASF. The community will then decide how it wishes to proceed with those features after the donation.

Note that some of the DataStax-specific features require external dependencies, notably DSE Graph, which requires Apache TinkerPop artifacts. In the case of the Java driver, this is a series of Maven artifacts that the driver declares as mandatory dependencies; however, the driver can operate without them if they are excluded. In the case of the other drivers, this is a dependency on the corresponding GLV (Gremlin Language Variant), which can be excluded as well.

...