This page documents the steps for working with git to contribute or review Kafka code. These steps probably contain mistakes, and better recipes may exist, so please help make this page better. If you want to push your commits without entering a password, please see the Apache git wiki.
Overview
The Kafka project development ecosystem involves git (for version control), JIRA (for issue tracking) and Review Board (for reviewing code changes made by contributors). To make it easier for both contributors and reviewers to manage contributions, the Kafka project also ships a Python script that automates the steps involved in submitting a patch:
- Creating a patch/diff between the local git repo and the project's remote repo
- Creating a review task in Review Board and publishing the patch/diff that was generated for the changes
- Updating the JIRA related to these changes with a comment that a patch is available and ready for review at Review Board
As you'll notice, this requires (automated) integration between JIRA and Review Board. The Python script, named kafka-patch-review.py and present in the checked-out Kafka code, acts as a wrapper around the scripts/tools that JIRA and Review Board ship for such integrations. Because kafka-patch-review.py is merely a wrapper around those tools, you'll have to install them locally before you can use it. This document helps you set up those tools and understand how to use kafka-patch-review.py itself.
Patch Review Tool
We are currently in the process of moving our contribution workflow to Github pull requests, so the patch review tool is slowly getting phased out. If you are still interested in using it, or learning more about it, you can find the instructions here.
Contributor and Reviewer Workflow
The process for contributing or reviewing a patch is documented in the Contributing Code Changes page.
Committer Workflow
If you are merging a patch attached to a JIRA (and not a github PR), here is a suggested workflow.
Github Workflow
Apache doesn't seem to provide a place to stash your work-in-progress branches or provide some of the nice social features github has. This can be a problem for larger features. Here are instructions for using github as a place to stash your work in progress changes.
Setting Up
1. As in the other workflows, begin by checking out kafka (if you haven't already):
git clone https://git-wip-us.apache.org/repos/asf/kafka.git
This sets up the remote alias "origin" automatically which refers back to the Apache repo.
2. Create a new github repository on your github account to use for stashing changes. There are various ways to do this; I just forked the apache/kafka repo (https://github.com/apache/kafka), which creates a repo such as https://github.com/jkreps/kafka (with your user name in place of jkreps).
3. Add an alias for your github repository to your local repository to avoid typing the full URL:
git remote add github https://github.com/<your_user>/kafka.git
Now you can push either to origin or to github.
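To double-check the setup, git remote -v lists every alias together with its URL. The sketch below is a hypothetical helper, not part of Kafka's tooling, that parses that output and reports which of the expected aliases are missing. The function names and the sample text are illustrative assumptions.

```python
def parse_remotes(remote_v_output):
    """Map each remote alias to its fetch URL, given `git remote -v` output."""
    remotes = {}
    for line in remote_v_output.splitlines():
        parts = line.split()
        # Expected shape: "<alias> <url> (fetch)" or "<alias> <url> (push)"
        if len(parts) == 3 and parts[2] == "(fetch)":
            remotes[parts[0]] = parts[1]
    return remotes


def missing_remotes(remote_v_output, expected=("origin", "github")):
    """Return the expected aliases that are not configured, in order."""
    configured = parse_remotes(remote_v_output)
    return [name for name in expected if name not in configured]


# Sample output; <your_user> is a placeholder, as in the steps above.
sample = (
    "github\thttps://github.com/<your_user>/kafka.git (fetch)\n"
    "github\thttps://github.com/<your_user>/kafka.git (push)\n"
    "origin\thttps://git-wip-us.apache.org/repos/asf/kafka.git (fetch)\n"
    "origin\thttps://git-wip-us.apache.org/repos/asf/kafka.git (push)\n"
)
print(missing_remotes(sample))
```

An empty list means both aliases are in place. Passing a different expected tuple checks for other alias names.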
Doing Work
1. You can create a branch named xyz in your local repository and check it out
git checkout -b xyz remotes/origin/trunk
2. To set up a second machine to work on you can clone the github url.
3. To save your branch to your github repo do
git push github xyz
4. To pull these changes onto the other machine where you have a copy of the repository you can do:
git fetch github
git checkout xyz
git merge remotes/github/xyz
Reviewing and pushing changes back to Apache work just as before.
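If you switch machines often, the fetch, checkout and merge sequence from step 4 can be wrapped in a small script. The following is a minimal sketch under the assumption that the remote alias is github and the branch already exists locally. The helper names are hypothetical, and by default it only returns the commands without running them.

```python
import subprocess


def sync_commands(branch, remote="github"):
    """Return the git commands that bring `branch` up to date from `remote`."""
    return [
        ["git", "fetch", remote],
        ["git", "checkout", branch],
        ["git", "merge", "remotes/{0}/{1}".format(remote, branch)],
    ]


def sync_branch(branch, remote="github", dry_run=True):
    """Run the commands in order, stopping at the first failure.

    With dry_run=True nothing is executed; the commands are only returned.
    """
    commands = sync_commands(branch, remote)
    if not dry_run:
        for command in commands:
            # check_call raises CalledProcessError on a non-zero exit status.
            subprocess.check_call(command)
    return commands


for command in sync_branch("xyz"):
    print(" ".join(command))
```

Calling sync_branch("xyz", dry_run=False) from inside the repository would actually execute the three commands.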
Merging GitHub Pull Requests
This section documents the process for reviewing and merging code changes contributed via Github Pull Requests. It assumes you have a clone of Kafka's Git repository.
kafka-merge-pr.py is a script that automates the process of accepting a code change into the project. It creates a temporary branch from apache/trunk, squashes the commits in the pull request, rewrites the commit message in the squashed commit to follow a standard format including information about each original commit, merges the squashed commit into the temporary branch, pushes the code to apache/trunk and closes the JIRA ticket. The push will then be mirrored to apache-github/trunk, which will cause the PR to be closed due to the pattern in the commit message. Note that the script will ask the user before executing remote updates (i.e. git push and closing the JIRA ticket), so it can still be used even if the user wants to skip those steps.
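To make the idea of a standard squashed commit message more concrete, here is a minimal sketch of how such a message could be assembled. It is not the code in kafka-merge-pr.py: the function name, the field layout and the "Closes #<number>" line are assumptions chosen for illustration, and the issue key, PR number, names and hashes in the example are made up.

```python
def build_squash_message(title, pr_number, author, commits):
    """Compose one commit message summarising every commit in a pull request.

    `commits` is a list of (short_hash, author, subject) tuples.
    """
    lines = [title, "", "Author: {0}".format(author), ""]
    # Assumed closing pattern; the real script may word this differently.
    lines.append(
        "Closes #{0} and squashes the following commits:".format(pr_number)
    )
    lines.append("")
    for short_hash, commit_author, subject in commits:
        lines.append("{0} [{1}] {2}".format(short_hash, commit_author, subject))
    return "\n".join(lines)


# All values below are invented for the example.
message = build_squash_message(
    title="KAFKA-0000; Example change",
    pr_number=1,
    author="Jane Doe <jane@example.com>",
    commits=[
        ("abc1234", "Jane Doe", "First pass"),
        ("def5678", "Jane Doe", "Address review comments"),
    ],
)
print(message)
```

Listing each original commit below the summary keeps the history of the pull request readable even though only one commit lands on trunk.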
Setting Up
1. Add aliases for the remotes expected by the merge script (if you haven't already):
git remote add apache https://git-wip-us.apache.org/repos/asf/kafka.git
git remote add apache-github https://github.com/apache/kafka.git
2. Install jira-python as described above.
Merging
Once the pull request is ready to be merged (it has been reviewed, feedback has been addressed, CI build has been successful and the branch merges cleanly into trunk):
1. Set the JIRA_USERNAME and JIRA_PASSWORD environment variables with the appropriate credentials if you intend to ask the script to close the issue associated with the pull request.
2. Run the merge script:
python kafka-merge-pr.py
3. Answer the questions prompted by the script.
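Before running the merge script, it can be handy to confirm that both JIRA variables from step 1 are actually set in the current shell. The sketch below is a standalone check, not part of kafka-merge-pr.py. How the script itself reads these variables is an assumption, and the helper name is hypothetical.

```python
import os


def jira_credentials(environ=None):
    """Return (username, password), or None if either variable is unset or empty."""
    env = os.environ if environ is None else environ
    username = env.get("JIRA_USERNAME", "")
    password = env.get("JIRA_PASSWORD", "")
    if username and password:
        return (username, password)
    return None


if jira_credentials() is None:
    print("JIRA credentials not set; closing the issue will have to be done by hand.")
else:
    print("JIRA credentials found.")
```

Passing a plain dictionary instead of the real environment makes the check easy to try out without touching your shell configuration.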
How to get your patches reviewed
Please ping the dev mailing list if you have a patch that needs a review and it will be added to the queue. The following (JIRA link) are issues that currently have patches available and have an assigned reviewer: