...

  1. Be online for that first build, on a good network 
  2. To set the Maven proxy settings, see http://maven.apache.org/guides/mini/guide-proxies.html 

  3. Because Maven doesn't pass proxy settings down to the Ant tasks it runs (see HDFS-2381), some parts of the Hadoop build may fail. The fix is to pass the Ant proxy settings down in the build. Unix: mvn $ANT_OPTS; Windows: mvn %ANT_OPTS%. 

  4. Tomcat is always downloaded, even when building offline. Setting -Dtomcat.download.url to a local copy and -Dtomcat.version to the version pointed to by the URL will avoid that download.

...

If you are failing to fetch an artifact from the external Maven repository, you may need to delete the related files from your local cache (i.e. the ~/.m2 directory).

  • Ref: HADOOP-16577

Native libraries

On Linux, you need the tools to create the native libraries: LZO headers, zlib headers, gcc, OpenSSL headers, cmake, protobuf dev tools, libtool, and the GNU autotools (automake, autoconf, etc.).  

...

Before you start, send a message to the Hadoop developer mailing list, or file a bug report in Jira. Describe your proposed changes and check that they fit in with what others are doing and have planned for the project. Be patient; it may take folks a while to understand your requirements. If you want to start with pre-existing issues, look for Jiras labeled newbie. You can find them using this filter.  

...

  • All public classes and methods should have informative Javadoc comments. 

    • Do not use @author tags. 
  • Code must be formatted according to Sun's conventions, with one exception: 

    • Indent two spaces per level, not four. 
  • The code formatter XML is available at https://github.com/apache/hadoop/tree/trunk/dev-support/code-formatter. IntelliJ users can directly import hadoop_idea_formatter.xml.
  • Contributions must pass existing unit tests. 
    • New unit tests should be provided to demonstrate bugs and fixes. JUnit is our test framework: 

    • You must implement a class that uses @Test annotations for all test methods. Please note that Hadoop uses JUnit v4; see the sketch after this list. 

    • Define methods within your class whose names begin with test, and call JUnit's many assert methods to verify conditions; these methods will be executed when you run mvn test. Please add meaningful messages to the assert statement to facilitate diagnostics. 

    • By default, do not let tests write any temporary files to /tmp. Instead, the tests should write to the location specified by the test.build.data system property. 

    • If an HDFS cluster or a MapReduce/YARN cluster is needed by your test, please use org.apache.hadoop.hdfs.MiniDFSCluster and org.apache.hadoop.mapred.MiniMRCluster (or org.apache.hadoop.yarn.server.MiniYARNCluster), respectively. TestMiniMRLocalFS is an example of a test that uses MiniMRCluster. 

    • Place your class in the src/test tree. 

    • TestFileSystem.java and TestMapRed.java are examples of standalone MapReduce-based tests. 

    • TestPath.java is an example of a non MapReduce-based test. 

    • You can run all the project unit tests with mvn test, or a specific unit test with mvn -Dtest=<class name without package prefix> test. Run these commands from the hadoop-trunk directory. 

  • If you modify the Unix shell scripts, see the UnixShellScriptProgrammingGuide.  
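
As a concrete illustration, here is a minimal sketch of a test class following these conventions. It assumes JUnit 4 and the Hadoop test artifacts are on the classpath; TestExample and its test names are hypothetical, while Path and MiniDFSCluster are real Hadoop classes:

 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertTrue;

 import java.io.File;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hdfs.MiniDFSCluster;
 import org.junit.Test;

 // Hypothetical example; not part of the Hadoop source tree.
 public class TestExample {

   // Write temporary files under test.build.data, never /tmp.
   private final File testDir =
       new File(System.getProperty("test.build.data", "target/test/data"));

   @Test
   public void testChildPathResolvesUnderParent() {
     // A meaningful message makes Jenkins failure reports easier to diagnose.
     assertEquals("child path should resolve under its parent",
         "/a/b", new Path("/a", "b").toString());
   }

   @Test
   public void testMkdirsOnMiniCluster() throws Exception {
     // Spin up a single-node, in-process HDFS cluster for this test only.
     Configuration conf = new Configuration();
     conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, testDir.getAbsolutePath());
     MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
     try {
       FileSystem fs = cluster.getFileSystem();
       assertTrue("mkdirs should succeed on a fresh cluster",
           fs.mkdirs(new Path("/test")));
     } finally {
       cluster.shutdown();
     }
   }
 }

You could then run just this class with mvn -Dtest=TestExample test.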

...

Please also check the javadoc.  

 mvn process-sources javadoc:javadoc-no-fork
 firefox target/site/api/index.html

...

Jenkins includes a javadoc run on Java 8 and Java 11; it will fail if there are unbalanced HTML tags or <p/> clauses (use <p> instead).  

If Jenkins rejects a patch due to Java 8 or Java 11 javadoc failures, it is considered an automatic veto for the patch.
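
For instance, a javadoc comment that passes these checks might look like the following sketch (the method and field are hypothetical):

 /**
  * Returns the buffer size used for copies.
  * <p>
  * Note the plain open tag above: a self-closing {@code <p/>} or an
  * unbalanced HTML tag will fail the javadoc runs described here.
  *
  * @return the buffer size in bytes
  */
 public int getBufferSize() {
   return bufferSize;
 }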

Provide a patch

There are two patterns to provide a patch.

  • Create and attach a diff in ASF JIRA
  • Create a pull request in GitHub

Creating a patch

Check to see what files you have modified with:

 git status

Add any new files with:

 git add src/.../MyNewClass.java
 git add src/.../TestMyNewClass.java

In order to create a patch, type (from the base directory of hadoop):  

 git diff trunk...HEAD > HADOOP-1234.patch

This will report all modifications done on Hadoop sources on your local disk and save them into the HADOOP-1234.patch file. Read the patch file. Make sure it includes ONLY the modifications required to fix a single issue.  

Please do not:

  • reformat code unrelated to the bug being fixed: formatting changes should be separate patches/commits.
  • comment out code that is now obsolete: just remove it.
  • insert comments around each change, marking the change: folks can use git to figure out what's changed and by whom.
  • make things public which are not required by end users.

Please do:  

  • try to adhere to the coding style of files you edit; 
  • comment code whose function or rationale is not obvious; 
  • update documentation (e.g., package.html files, this wiki, etc.)  

If you need to rename files in your patch:  

  1. Write a shell script that uses 'git mv' to rename the original files. 
  2. Edit files as needed (e.g., to change package names). 
  3. Create a patch file with 'git diff trunk'. 
  4. Submit both the shell script and the patch file.  

This way other developers can preview your change by running the script and then applying the patch.  

Naming your patch

Patches for trunk should be named according to the Jira, with a version number: <jiraName>.<versionNum>.patch, e.g. HADOOP-1234.001.patch, HDFS-4321.002.patch.  

Patches for a non-trunk branch should be named <jiraName>-<branchName>.<versionNum>.patch, e.g. HDFS-1234-branch-2.003.patch. The branch name suffix should be the exact name of a git branch, such as "branch-2". Jenkins will check the name of the patch and detect the appropriate branch for testing.  

Please note that the Jenkins precommit build will not run against branches that use ant.  

It's also OK to upload a new patch to Jira with the same name as an existing patch. If you select the "Activity>All" tab then the different versions are linked in the comment stream, providing context. However, many reviewers find it helpful to include a version number in the patch name (a three-digit version number is recommended), so that is the preferred style.  

NOTE: Our Jenkins configuration uses Apache Yetus. More advanced patch file names are documented on their patch names page.  

Creating a GitHub pull request

Create a pull request in https://github.com/apache/hadoop.  

You need to set a title that starts with the corresponding JIRA issue number (e.g. HADOOP-XXXXX. Fix a typo in YYY.). The Jenkins precommit job will find the corresponding GitHub pull request and apply the diff automatically. If there is a corresponding pull request, you don't need to attach a patch to the issue, because the precommit job always runs on the pull request instead of the attached patch.

If there is no corresponding issue, please create an issue in ASF JIRA before creating a pull request, or add a comment with the patch URL (https://github.com/apache/hadoop/pull/XXX.patch) to the JIRA issue. This is because the Jenkins precommit job searches the JIRA issue for a URL that starts with "https://github.com" and ends with ".patch".

Testing your patch

Before submitting your patch, you are encouraged to run the same tools that the automated Jenkins patch test system will run on your patch. This enables you to fix problems with your patch before you submit it. The dev-support/bin/test-patch script in the trunk directory will run your patch through the same checks that Jenkins currently does except for executing the unit tests. (See TestPatchTips for some tricks.)  

Run this command from a clean workspace (i.e. git status shows no modifications or additions) as follows:  

 dev-support/bin/test-patch [options] patch-file | defect-number

At the end, you should get a message on your console that is similar to the comment added to Jira by Jenkins's automated patch test system, listing +1 and -1 results. Generally you should expect a +1 overall in order to have your patch committed; exceptions will be made for false positives that are unrelated to your patch. The scratch directory (which defaults to the value of ${user.home}/tmp) will contain some output files that are useful in determining the cause of any issues found in the patch.  

Some things to note:  

  • the optional cmd parameters will default to the ones in your PATH environment variable 

  • the grep command must support the -o flag (Both GNU grep and BSD grep support it) 

  • the patch command must support the -E flag  

Run the same command with no arguments to see the usage options.  

Applying a patch

To apply a patch that you either generated or found on JIRA, you can issue:  

 git apply -p0 --verbose cool_patch.patch

If you are an Eclipse user, you can apply a patch by:  

  1. Right click project name in Package Explorer 
  2. Team -> Apply Patch  

Changes that span projects

You may find that you need to modify both the common project and MapReduce or HDFS. Or perhaps you have changed something in common, and need to verify that these changes do not break the existing unit tests for HDFS and MapReduce. Hadoop's build system integrates with a local maven repository to support cross-project development. Use this general workflow for your development:  

  • Make your changes in common 
  • Run any unit tests there (e.g. 'mvn test') 
  • Publish your new common jar to your local mvn repository:

     hadoop-common$ mvn clean install -DskipTests
  • A word of caution: mvn install pushes the artifacts into your local Maven repository which is shared by all your projects. 

  • Switch to the dependent project and make any changes there (e.g., that rely on a new API you introduced in hadoop-common). 
  • Finally, create separate patches for your common and hdfs/mapred changes, and file them as separate JIRA issues associated with the appropriate projects.  

Contributing your work

You need to create a pull request in https://github.com/apache/hadoop; attaching a patch in ASF JIRA no longer works. Set a title that starts with the corresponding JIRA issue number (e.g. HADOOP-XXXXX. Fix a typo in YYY.) to integrate with the issue. If there is no corresponding issue, please create an issue in ASF JIRA before creating a pull request.

See also: GitHub Integration

  1. Please note that the commits in the GitHub PR should be granted license to ASF for inclusion in ASF works (as per the Apache License §5). 

  2. Folks should run mvn clean install javadoc:javadoc checkstyle:checkstyle before opening a PR: 

    1. Tests must all pass. 
    2. Javadoc should report no warnings or errors. 
    3. The javadoc build on Java 8 must not fail. 
    4. Checkstyle's error count should not exceed that listed at lastSuccessfulBuild/artifact/trunk/build/test/checkstyle-errors.html 
  3. Jenkins's tests are meant to double-check things, not to serve as a primary patch tester, which would create too much noise on the mailing list and in the PR. Submitting patches that fail Jenkins testing is frowned upon (unless the failure is not actually due to the patch).
  4. If your patch involves performance optimizations, they should be validated by benchmarks that demonstrate an improvement. 
  5. If your patch creates an incompatibility with the latest major release, then you must set the Incompatible change flag on the issue's Jira and fill in the Release Note field with an explanation of the impact of the incompatibility and the necessary steps users must take. 

  6. If your patch implements a major feature or improvement, then you must fill in the Release Note field on the issue's Jira with an explanation of the feature that will be comprehensible by the end user.  

...

Please be patient. Committers are busy people too. If no one responds to your patch after a few days, please make friendly reminders. Please incorporate others' suggestions into your patch if you think they're reasonable. Finally, remember that even a patch that is not committed is useful to the community.  

...


Submitting patches against object stores such as Amazon S3, OpenStack Swift and Microsoft Azure

...

 hadoop-common-project/hadoop-common/src/test/resources/contract-test-options.xml
 hadoop-tools/hadoop-openstack/src/test/resources/contract-test-options.xml
 hadoop-tools/hadoop-aws/src/test/resources/auth-keys.xml
 hadoop-tools/hadoop-aws/src/test/resources/contract-test-options.xml
 hadoop-tools/hadoop-azure/src/test/resources/azure-auth-keys.xml

Please state which infrastructures you have tested against, including which regions you tested against. If you have not tested the patch yourself, do not expect anyone to look at the patch.  


We welcome anyone who can test these patches: please do so and again, declare what you have tested against. That includes in-house/proprietary implementations of the APIs as well as public infrastructures.  

Requesting a Jira account

If you wish to contribute to Hadoop and require a Jira account, such requests can be made via: https://selfserve.apache.org/jira-account.html

...

Note:

  • A Jira account is required only if you need to report a bug or plan to contribute; in other cases there is no need to request an account.
  • Serving such requests can take up to one week, or even more during holidays.
  • Please refrain from sending the form multiple times or sending follow-ups on the mailing lists.

Jira Guidelines

Please comment on issues in Jira, making your concerns known. Please also vote for issues that are a high priority for you.  

...