Note: This content was moved over from https://wiki.apache.org/hadoop/

...

GitAndHadoop


This page tells you how to work with Git. See HowToContribute for instructions on building and testing Hadoop.

...

  1.  Git doesn't store changes; it snapshots the entire source tree. This is good for fast switching and rollback, but bad for binaries. (As an optimization, a file that hasn't changed is not stored again.)
  2.  Git stores all "events" as SHA-1 checksummed objects; you have deltas, tags and commits, where a commit describes the status of items in the tree.
  3.  Git is very branch-centric; you work in your own branch off local or central repositories.
  4.  You had better enjoy merging.
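
The snapshot and checksum points above can be seen in a throwaway repository. The following is a minimal sketch, not part of the Hadoop workflow: the file names, the `trunk` branch name, and the `example_git` helper (which only supplies a placeholder identity) are assumptions made for illustration.

```shell
# Hypothetical helper: supplies a placeholder identity so the sketch
# does not depend on the global Git configuration.
example_git() {
  git -c user.name=Example -c user.email=example@example.invalid \
      -c commit.gpgsign=false "$@"
}

# Throwaway repository used only for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk

echo "hello" > README.txt
git add README.txt
example_git commit -q -m "first snapshot"
first_blob=$(git rev-parse HEAD:README.txt)

echo "more" > NOTES.txt
git add NOTES.txt
example_git commit -q -m "second snapshot"
second_blob=$(git rev-parse HEAD:README.txt)

# Each object has a type and is named by a checksum of its content.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t "HEAD^{tree}")
blob_type=$(git cat-file -t "HEAD:README.txt")
echo "HEAD is a $commit_type, its snapshot is a $tree_type, README.txt is a $blob_type"

# README.txt did not change between the two commits, so both snapshots
# refer to the same stored object.
[ "$first_blob" = "$second_blob" ] && echo "unchanged file stored once"
```

Running `git cat-file -p HEAD` in the same repository prints the commit itself, including the checksum of the tree it points at.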


GitHub provides some good lessons on Git at http://learn.github.com

Apache serves up read-only Git versions of its source at http://git.apache.org/. Committers can commit changes to a writable Git repository. See https://wiki.apache.org/hadoop/HowToCommit

Checking out the source

You need a copy of git on your system. Some IDEs ship with Git support; this page assumes you are using the command line.

Clone a local Git repository from the Apache repository. The Hadoop subprojects (common, HDFS, and MapReduce) live inside a combined repository called `hadoop.git`.

...

  1. Create a GitHub login at http://github.com/ and add your public SSH keys
  2. Go to https://github.com/apache/hadoop/
  3. Click Fork in the GitHub UI. This gives you your own repository URL.
  4. In the existing clone, add the new repository:
Code Block
git remote add -f github git@github.com:MYUSERNAMEHERE/hadoop.git



This gives you a local repository with two remote repositories: origin and github. origin has the Apache branches, which you can update whenever you want to get the latest ASF version:

Code Block
 git checkout -b trunk origin/trunk
 git pull origin
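
The two-remote layout can be tried without network access. This is a sketch under stated assumptions: two local bare repositories (`apache.git` and `fork.git`, both hypothetical) stand in for the Apache repository and the GitHub fork, and the seed commit exists only so that `origin/trunk` has something to track.

```shell
work=$(mktemp -d)

# Stand-in for the Apache repository, seeded with one commit on trunk.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/trunk
echo "seed" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.invalid \
    -c commit.gpgsign=false commit -q -m "seed commit"
git init -q --bare "$work/apache.git"
git push -q "$work/apache.git" trunk

# Stand-in for the personal fork (left empty here).
git init -q --bare "$work/fork.git"

# Working repository with the two remotes: origin and github.
git init -q "$work/hadoop"
cd "$work/hadoop"
git remote add -f origin "$work/apache.git"
git remote add -f github "$work/fork.git"

# Track the upstream trunk locally and refresh it.
git checkout -q -b trunk origin/trunk
git pull -q origin

remotes=$(git remote | sort | tr '\n' ' ')
echo "remotes: $remotes"
```

In a real checkout `origin` is created by `git clone`, so only the `git remote add -f github …` step is needed.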

...



Then generate a patch file listing the differences between your trunk and your branch:

Code Block
git diff --no-prefix trunk > ../hadoop-patches/HDFS-775-1.patch
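
To see what `--no-prefix` does to the file paths, the command can be run in a scratch repository. This is an illustrative sketch only: `Example.java`, the `HDFS-775` branch name, and the temporary patch directory are assumptions, not part of the Hadoop tree.

```shell
# Hypothetical helper: supplies a placeholder identity for the commits.
example_git() {
  git -c user.name=Example -c user.email=example@example.invalid \
      -c commit.gpgsign=false "$@"
}

repo=$(mktemp -d)
patches=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk

# Baseline commit on trunk.
echo "original line" > Example.java
git add Example.java
example_git commit -q -m "baseline on trunk"

# The work happens on a separate branch.
git checkout -q -b HDFS-775
echo "patched line" > Example.java
example_git commit -q -a -m "change on the issue branch"

# Paths in the patch carry no a/ or b/ prefix.
git diff --no-prefix trunk > "$patches/HDFS-775-1.patch"
grep '^--- \|^+++ ' "$patches/HDFS-775-1.patch"
```

Because the paths have no prefix, a patch produced this way would be applied with `patch -p0` from the root of the source tree.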



The patch file is an extended version of the unified patch format used by other tools; type `git help diff` to get more details on it. Here is what the patch file in this example looks like:

...

$ cat ../outgoing/HDFS-775-1.patch

Code Block

diff --git src/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java src/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
index 42ba15e..6383239 100644
--- src/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
+++ src/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
@@ -355,12 +355,14 @@ public class FSDataset implements FSConstants, FSDatasetInterface {
       return dfsUsage.getUsed();
     }

+    /**
+     * Calculate the capacity of the filesystem, after removing any
+     * reserved capacity.
+     * @return the unreserved number of bytes left in this filesystem. May be zero.
+     */
     long getCapacity() throws IOException {
-      if (reserved > usage.getCapacity()) {
-        return 0;
-      }
-
-      return usage.getCapacity()-reserved;
+      long remaining = usage.getCapacity() - reserved;
+      return remaining > 0 ? remaining : 0;
     }

     long getAvailable() throws IOException {

...

If your patch is not immediately accepted, do not be offended: it happens to us all. It does introduce a problem, though: your branches become out of date. You need to check out the latest Apache version, merge your branches with it, and then push the changes back to GitHub.

Code Block
 git checkout trunk
 git pull apache
 git checkout mybranch
 git merge trunk
 git push github mybranch

...


Pull down the latest release and verify that the patch branch is synchronized:

Code Block
 git checkout trunk
 git pull apache
 git checkout mybranch
 git merge trunk
 git diff trunk



The output of the last command should be nothing: the two branches should be identical. You can then prove to Git that this is true by switching back to the trunk branch and merging in the branch, an operation which will not change the source tree but will update Git's branch graph.

Code Block
 git checkout trunk
 git merge mybranch


Now you can delete the branch without being warned by Git:

Code Block
 git branch -d mybranch



Finally, propagate that deletion to your private GitHub repository:

Code Block
 git push github --delete mybranch
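
The whole cleanup sequence can be rehearsed locally before running it against real repositories. This is a sketch under stated assumptions: a local bare repository stands in for the GitHub fork, the upstream commit of the patch is simulated by making the same change directly on `trunk`, and `example_git` is a hypothetical helper that only supplies a placeholder identity.

```shell
example_git() {
  git -c user.name=Example -c user.email=example@example.invalid \
      -c commit.gpgsign=false "$@"
}

work=$(mktemp -d)
git init -q --bare "$work/fork.git"        # stand-in for the GitHub fork

git init -q "$work/hadoop"
cd "$work/hadoop"
git symbolic-ref HEAD refs/heads/trunk
git remote add github "$work/fork.git"
echo "base" > file.txt
git add file.txt
example_git commit -q -m "base"

# The patch branch, pushed to the fork.
git checkout -q -b mybranch
echo "fix" >> file.txt
example_git commit -q -a -m "fix on mybranch"
git push -q github mybranch

# Simulate the patch being committed upstream: same change, separate commit.
git checkout -q trunk
echo "fix" >> file.txt
example_git commit -q -a -m "fix as committed upstream"

# Bring the branch up to date and confirm nothing differs from trunk.
git checkout -q mybranch
example_git merge -q --no-edit trunk
git diff --quiet trunk && echo "branch matches trunk"

# Record the merge on trunk, then delete the branch locally and on the fork.
git checkout -q trunk
example_git merge -q mybranch
git branch -d mybranch
git push -q github --delete mybranch
```

If `git branch -d` refuses to delete the branch, the branch still contains commits that trunk does not, which usually means the merge step was skipped.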

...