...

Status

Note: This CEP is based on a Draft CEP that was started here

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).


...

Scope


In 2011, drivers were removed from the Apache Cassandra project. An inspection of the project history shows that in-tree drivers weren't working well "because the people who wanted to contribute to the drivers were for the most part not Committers, and the committers for the most part weren't interested in reviewing drivers patches". 

...

  • The lack of official drivers does not favor cohesion around the Cassandra project. Many participants in the Cassandra ecosystem have created forks of the DataStax drivers but have not historically contributed changes back to the original drivers.
  • The dependency of the Apache Cassandra server itself on two of the DataStax drivers (Java and Python) has implications for the project's independence. This tension was exacerbated by the release, in 2019, of version 4.0 of the Java Driver with major API changes. This was followed by the transition to maintenance mode of the 3.x series, which is still used by the server and widely used in the community.

Goals

Contributors employed by DataStax are offering to donate to the Apache Software Foundation all seven DataStax-funded drivers, currently hosted in the following GitHub repositories:

...

We think it is important to maintain the drivers together to retain cohesive API semantics and make sure they have similar functionality and feature support. It is therefore requested that all drivers be accepted eventually; see "Approach" and "Timeline" below for practicalities on the transfer operation itself, which does not need to be an all-at-once transfer.

Approach

Transferring large pieces of software like the drivers will require a strong level of coordination and involvement from designated members of both organizations involved in this operation (DataStax contributors, legal, and the ASF). 

...

Once the Java driver is transferred, and before the others are transferred, we will review the methodology described in this CEP and, if necessary, adjust its parameters. A second CEP may be required if the changes to the methodology are found to be substantial.

Timeline

There are two phases to be considered:

  1. Java driver donation and transfer to the ASF: We believe that this should be executed after the Cassandra 4.0 GA release, in order not to disturb the current efforts towards this major milestone.
    1. The timeline should allow for the whole intellectual property clearance process to take place; see "Intellectual Property" below.
  2. Donation and transfer of the remaining six drivers to the ASF: TBD based on findings from phase 1.

Mailing list / Slack channels

Mailing list: 

...

Related JIRA tickets

JIRA(s): 

  • No existing Apache Cassandra Jira tickets relate to this CEP. However, the following ticket can be mentioned for its historical relevance:

    • CASSANDRA-2761: Initial discussion around the removal of CQL drivers from the 0.8 branch.

    Also, see below "Proposed Changes" for proposed new Jira projects.



...

Motivation

By donating all these drivers to the ASF, we hope to:

...

We should, however, avoid the situation we had back in 2011. In particular, this CEP attempts to strike a balance between the need for independent stewardship of both drivers and server, especially for day-to-day work, and the need to keep a reasonable amount of shared governance for high-level decisions (roadmap, common features, etc.).

Audience

The donation outlined in this CEP would be beneficial to the entire Cassandra community and ecosystem. 

...

  • Apache Cassandra committers and PMC members, as well as DataStax driver committers, will likely be affected directly to some degree. Hopefully such impacts will remain limited mostly to adapting to new governing bodies and rules, and to new communication channels (Jira, mailing lists, Slack, etc.), which will be monitored closely.
  • Apache Cassandra users in the broad sense should benefit from the donation by having a stronger community built around the main project. However, users should not be affected by the practicalities of the change proposed in this document. In particular, we would like to minimize the disruption caused by the donation and avoid massive user-facing changes to any of the drivers or server APIs, and to the drivers release funnels. See below "New or Changed Public Interfaces" for a detailed discussion.

 

Proposed Changes

A. Governance

We think it is best to avoid creating a separate top-level Apache project, and suggest that the drivers should be included in Apache Cassandra as a single subproject under the governance of the Apache Cassandra Committee.

...

However, in order to avoid any risk of management overhead, we think that members should fully trust each other to intervene only in domains where they are knowledgeable, and therefore think that such groups are not necessary, at least for the initial transfer phase. This might of course be reviewed in the future.

B. Source Repositories

Each driver source repository will be transferred to a separate git repository, to be created. Our intention is to donate the entire Git repository of each driver, including all existing commits, branches and tags. 

...

  • If possible, we should keep the drivers hosted on GitHub; this way we could transfer ownership of the current GitHub projects to the new GitHub organization, and redirects would be created automatically. This would reduce the disruption for those checking out the driver codebases or building them from source.


C. Committership

As stated above, we can reasonably assume that the current Apache Cassandra project contributors will not have all the expertise (or bandwidth) to develop drivers in seven different languages. 

...

Following is an initial list of individuals who have made meaningful contributions to drivers now or in the recent past:



Contributor        | Relevant Driver Expertise
Olivier Michallat  | Java
Alexandre Dutra    | Java
Andrew Tolbert     | Java, Node.js
Erik Merkle        | Java
Greg Bestland      | Java, Python
Tomasz Lelek       | Java
Bret McGuire       | Java
Adam Holmberg      | Python, C++
Alan Boudreault    | Python
Jim Witschey       | Python
Jorge Bay Gondra   | Node.js, C#
Joao Reis          | C#
Michael Penick     | C++, PHP
Michael Fero       | C++, PHP
Sandeep Tamhankar  | Ruby, C++, Java
Bulat Shakirzyanov | Ruby, PHP


The same people will need to hold credentials or be assigned owner status of the artifacts in package indices, such as Maven Central, PyPI, npm and NuGet.

...

It is also worth noting that two drivers are currently considered in maintenance mode: PHP and Ruby. This is due mostly to their most active developers not being able to work on these drivers anymore; this situation is unfortunately not expected to change in the near future.

D. Mailing Lists

We suggest that the donated drivers use the existing Apache Cassandra "user" and "developer" mailing lists, with distinct per-driver lists for Jira notifications and commits.

...

Migrating the whole email database seems impractical; we suggest that these groups be closed and users redirected to the "user" mailing list.

E. Issue Tracking

We suggest distinct Jira projects, one per driver, all to be created.

...

Migrating the whole Jira database seems intractable; we suggest that the existing projects be closed and users redirected to the new Jira projects.

F. Documentation

Documentation should move from docs.datastax.com to a new subsection in cassandra.apache.org/doc.

...

Finally, some of the committers will likely need access to the documentation site in order to update the driver docs whenever necessary.

G. Versioning and Release cycle

Drivers will keep an independent release cycle and versioning scheme.

...

Future releases should be proposed, discussed and decided by mail threads on the developer mailing list.

H. Continuous Integration

For the indefinite future, DataStax will continue to test the drivers against Cassandra, DataStax Astra and DataStax Enterprise using existing, private CI infrastructure. Note that DataStax is assessing the viability of making this CI infrastructure public, but this is out of the scope of this CEP.

...

  • Driver builds can take up to a few hours when the full integration suite is run against an extensive variety of Cassandra and DSE backends. This is currently done with Jenkins Pipeline multi-job builds.
  • Drivers use CCM (Cassandra local cluster manager, written in Python) and Simulacron (Cassandra protocol emulator, written in Java) extensively for their integration tests. The CI containers must have both libraries installed and available on the PATH. It is however not in the scope of this CEP to also donate CCM and/or Simulacron to the ASF.
  • Tests related to the DataStax cloud platform Astra also require a predefined Docker image containing a single-node Astra cluster and its proxy.
  • Some drivers require building against different platforms, including *nix, Windows, and macOS.
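
The PATH requirement in the list above could be verified by a small pre-flight check run at the start of a CI job. The following is a minimal sketch under stated assumptions, not part of the existing CI setup: the function name missing_tools is hypothetical, "ccm" is the command installed by CCM, and "simulacron" is an assumed launcher name used for illustration only (Simulacron may be installed differently in a given container).

```python
import shutil
from typing import Callable, Iterable, List, Optional

# "ccm" is the command installed by CCM. "simulacron" is an assumed
# launcher name for illustration; adjust it to the actual container setup.
REQUIRED_TOOLS = ("ccm", "simulacron")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names in `tools` that cannot be resolved on the PATH.

    `which` is injectable so the check can be exercised without touching
    the real environment; it defaults to shutil.which.
    """
    return [tool for tool in tools if which(tool) is None]
```

A CI step could call missing_tools() and fail the build early when the returned list is not empty, rather than failing hours later in the integration suite.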


I. Intellectual Property

As we are not advocating for the drivers to be donated as a separate project, the whole incubation procedure is not required. 

...

As the clearance document states, "the receiving PMC is responsible for doing the work".

New or Changed Public Interfaces

The existing Cassandra codebase should see no immediate changes if the guidelines below for compatibility and migration are fully executed.

Compatibility, Deprecation, and Migration Plan

A. Driver APIs

In order to minimize the disruption caused to users by the donation, we suggest that all the following items remain unchanged:

...

We suggest that the word datastax be left as is both in the Java driver API and in artifact names in order to avoid disruption until a future major revision release. We have confirmed that this is viable in an inquiry to the Apache Legal group.

B. DataStax proprietary software

For a long time, DataStax maintained, for each programming language, two different driver flavors: one for Apache Cassandra, and another for DataStax Enterprise (DSE).

...

Note that some of the DataStax-specific features require external dependencies, notably DSE Graph, which requires Apache TinkerPop artifacts. In the case of the Java driver, this is a series of Maven artifacts that the driver declares as mandatory dependencies; however, the driver can operate without them if they are excluded. In the case of the other drivers, this is a dependency on the corresponding GLV (Gremlin Language Variant), which can be excluded as well.

C. Apache Cassandra internal usage of the drivers

Apache Cassandra itself uses two drivers internally: Java 3.x and Python 3.x.

...

The Python driver team is currently working on the driver's new major version, 4.0. We are similarly offering to migrate the server codebase when the driver release is ready.

Test Plan

Drivers will be ported over to ASF Jenkins or CircleCI as described above, on a per-project basis.

Rejected Alternatives

  • Continue to use DataStax OSS native drivers as they are today, developed out of datastax.oss.* repositories.
  • Create new drivers from scratch.