NOTE: Please do not make any edits to this cwiki page as the vote is concluded. (Except typo/grammar/clarifications)
Abstract
Apache Hadoop Ozone provides storage for Hadoop-based and Cloud-native environments.
It provides Object Storage semantics (like S3) and scales up to billions of objects.
It provides S3, Hadoop Compatible File System, and CSI interfaces.
The Hadoop community believes that further development of Ozone is better done as a separate project, as discussed on the Hadoop mailing lists.
History
- Apache Hadoop Ozone development started on a feature branch of the Hadoop repository (HDFS-7240)
- In October 2017 a discussion was started about merging it into the Hadoop main branch
- After a long discussion, it was merged into Hadoop trunk in March 2018
- During the discussion of the merge, it was suggested multiple times to create a separate project for Ozone. But at that time:
- Ozone was tightly integrated with Hadoop/HDFS
- There was an action plan to use the block layer of Ozone (HDDS, or HDSL at that time) as the block layer of HDFS
- The community of Ozone was a subset of the HDFS community
The first beta release of Ozone has just been published, and it was agreed that the next release will be announced as the GA release. Before the first GA seems to be a good time to make a decision about the future of the project.
(Note: if you need more information about the history of the project, you can check the detailed version in the source repository)
Move to a separate Apache TLP
Over the last few years, Ozone has become more and more independent on both the community side and the code side. The separation has been suggested repeatedly (for example by Owen [2] and Vinod [3]).
From the COMMUNITY point of view:
- Fortunately, more and more new contributors are helping Ozone. Originally the Ozone community was a subset of the HDFS community, but now a growing part of the community is related to Ozone only.
- It seems to be easier to build the community as a separate project
- A new, younger project might have different practices (communication, committer criteria, development style) compared to an old, mature project
- It's easier to communicate (and improve) these standards in a separate project with clean boundaries
- A separate project/brand can help to increase the adoption rate and attract more individual contributors (AFAIK this was seen in Submarine after a similar move)
- The contribution process can be communicated more easily, and we can make the first-time contribution easier
- The Apache Board is monitoring community activities. As the volume of Ozone contributions is slightly more than 50% of all the Hadoop contributions, it seems reasonable to follow community development and activity at the Ozone level (today it's hard to differentiate between the Ozone part and the core Hadoop part)
From the CODE point of view, Ozone has become more and more independent:
- Ozone has a different release cycle
- Code is already separated from the Hadoop codebase (apache/hadoop-ozone.git)
- It has a separate CI (GitHub Actions)
- Ozone uses a different (stricter) coding style (zero tolerance for unit test / checkstyle errors)
- The code itself has become more and more independent of Hadoop at the Maven level. Originally it was compiled together with the latest in-tree Hadoop snapshot. Now it depends on released Hadoop artifacts (RPC, Configuration...)
- It has started to use multiple versions of Hadoop (on the client side)
- The volume of resolved issues is already very high on the Ozone side (Ozone had slightly more resolved issues than HDFS/YARN/MAPREDUCE/COMMON combined in the last 2-3 months)
Current Status of the project
Ozone is already part of a top-level Apache project and has already produced multiple separate releases that were approved by the Hadoop PMC, operating according to the Apache guidelines without any reported issues. As part of such a project, it has already passed the Apache project maturity model. Therefore, in this section we skip some obvious statements that are usually part of project adoption proposals (the Ozone source code is already part of Apache, and it is already governed by an Apache PMC) and focus on what the project has done so far to build a stronger community.
- Ozone is a new (sub) project and it's a top priority to make it more inclusive and contributor-friendly.
- (moved) Ozone Community Calls are weekly calls between Ozone developers and users. This is the same call where the active developers sync with each other, and it is open to anybody.
- The meeting minutes are either posted to the mailing list or added to the wiki and referenced from the mailing list, to keep an open and asynchronous conversation about all the related topics.
- Recently a new survey was started to make this call more inclusive. (The current time of this call is not China-friendly, and we tried to identify the best way to include contributors from all timezones.)
- Based on the feedback, we started to record the meetings, and an additional APAC-friendly sync was initiated.
- The pull request queue is monitored frequently and it's the number one priority to keep the number of open pull requests low. All the pull requests are looked into within a reasonable time.
- A pull request is closed/abandoned only when the author is not responsive. Pull requests are not closed due to inactivity if a committer action is required.
- The documentation and the developer process try to make development easy and developer-friendly. A "newbie" label is added to issues in Jira that can be taken up by first-time contributors.
- Community contributions are monitored, and new contributors are proposed based purely on merit.
- Ozone is a distributed storage project and, as such, the initial contribution can be hard. To make the main architecture decisions easier to understand, a new video series has been started for developers and/or users.
To make the earlier decisions easier to understand, the design docs are added to the documentation page (from the next release https://github.com/apache/hadoop-ozone/tree/master/hadoop-hdds/docs/content/design). Ozone doesn't yet require a very formal proposal process (like Flink or Kafka), but it's a continuous effort to make all the design discussions open and transparent.
Building a community is a continuous effort. We are at the beginning of a journey, and moving Ozone to a separate TLP is a very important step. Some of the current challenges:
- The ratio of paid developers: today most of the full-time developers are paid by two companies: a vendor (Cloudera) and a user (Tencent). Based on analysis of the GitHub contributions, we see an increasing number of new contributors from other companies and increasing interest from other companies.
- One of the main goals for the coming months is to make the community more diverse
PMC/Committers
Ozone as a new project requires an initial PMC and committer list. But first, we need to define how the lists are created/selected:
- PMC: As Ozone is a Hadoop subproject today, all the existing Hadoop PMC members with noticeable Ozone contributions are added to the initial list. (Definition of noticeable contribution: all the related GitHub / Jira content was downloaded, and we selected all the Hadoop PMC members with at least 30 comments AND/OR commits since the beginning of 2019.)
- A discussion was started with these people about:
- What are the important factors of being a committer / PMC member?
- Who should be added to the initial list?
- Committer: similar to Submarine, Hadoop committers can opt in to committer membership in the Ozone project (unless vetoed by the PMC)
Some points that were named as important factors of being a PMC member:
- Involvement in releases (being RM, validating and voting on releases, roadmap for future releases)
- Being involved constructively in design discussions, keeping the big picture, and project direction in mind.
- Investing in build/CI quality. Ensuring that contributors and committers have a solid infrastructure to develop the project.
- Responsiveness on security, trademark, copyright issues.
- Positive involvement in the community (mailing lists, raising committer candidates).
- Keeping an eye on what needs to go better in the project (documentation, test quality, wiki pages). A meta-view beyond regular contributions and releases.
It was also found especially important to include the user community in the project governance. End-users and adopters – who are actively helping the project with feedback during the design discussions – should be invited to the PMC (even without code contribution). (During the discussion these were called "user-seats" in the PMC.)
The initial selection rules and PMC list were shared on the ozone-dev mailing list (people who were nominated in 2b were added with an explanation), where an additional method was suggested (add everybody to the PMC who is a Hadoop PMC member and contributed at least 10 patches this year) and accepted.
Proposed Chair:
- Sammi Chen [sammichen] (Hadoop PMC)
Proposed PMC (Hadoop PMC)
- Arpit Agarwal [arp] (member, Hadoop PMC)
- Shashikant Banerjee [shashikant] (Hadoop PMC)
- Li Cheng [licheng] (Hadoop committer)
- Dinesh Chitlangia [dineshc] (Hadoop committer)
- Clay Baenziger
- Attila Doroszlai [adoroszlai] (Hadoop committer)
- Junping Du [junping_du] (member, Hadoop PMC)
- Márton Elek [elek] (Hadoop PMC)
- Anu Engineer [aengineer] (Hadoop PMC)
- Uma Maheswara Rao G [umamahesh] (member, Hadoop PMC)
- Lokesh Jain [ljain] (Hadoop PMC)
- Hanisha Koneru [hanishakoneru] (Hadoop PMC)
- Yiqun Lin [yqlin] (Hadoop PMC)
- Siyao Meng [siyao] (Hadoop committer)
- Jitendra Nath Pandey [jitendra] (member, Hadoop PMC)
- Rakesh Radhakrishnan [rakeshr] (Hadoop PMC)
- Matt Sharp
- Mukul Kumar Singh [msingh] (Hadoop PMC)
- Tsz-Wo Nicholas Sze [szetszwo] (member, Hadoop PMC)
- Xiaoyu Yao [xyao] (Hadoop PMC)
- Nandakumar Vadivelu [nanda] (Hadoop PMC)
- Bharat Viswanadham [bharat] (Hadoop PMC)
- Siddharth Wagle [swagle] (Hadoop committer)
- Stephen O'Donnell [sodonnell] (Hadoop committer)
- Vivek Ratnavel Subramanian [vivekratnavel] (Hadoop committer)
- Aravindan Vijayan [avijayan] (Hadoop committer)
Proposed committer list
- Wei-Chiu Chuang [weichiu] (Hadoop PMC)
- István Fajth
- Nilotpal Nandi [nilotpalnandi] (Hadoop committer)
- Yisheng Lien [yisheng] (Hadoop committer)
- Baoloong Mao (github.com/maobaolong)
- Neo Yang (github.com/cku328)
- WeiWei Yang [wwei] (Hadoop committer)
- Jie Wang [runzhiwang]
- Xiang Zhang (github.com/iamabug)
- Micah Zhao (github.com/captainzmc)
- Masatake Iwasaki [iwasakims] (Hadoop committer)
- Prabhu Joseph [prabhujoseph] (Hadoop committer)
- Ayush Saxena [ayushsaxena] (Hadoop PMC)
- He Xiaoqiao [hexiaoqiao] (Hadoop committer)
- Surendra Singh Lilhore [surendralilhore] (Hadoop PMC)
- Vinayakumar B [vinayakumarb] (Hadoop PMC)
- Bibin A Chundatt [bibinchundatt] (Hadoop PMC)
- Hemanth Boyina [hemanthboyina] (Hadoop committer)
- Hui Fei
- Lisheng Sun
Affiliations
Individuals of the initial PMC are employed by Cloudera, Tencent, Bloomberg, eBay, Target, and SirionLabs. Cloudera and Tencent are known for employing a significant number of full-time Ozone developers.
The committer list also contains individuals employed by Microsoft, Huawei and others (including individual contributors).
Required Resources
- Mailing lists: the ozone-dev mailing list already exists; it can be moved to a separate TLP domain
- Source code: Source code is already separated from the main Hadoop repository (apache/hadoop-ozone and apache/hadoop-docker-ozone). It can be moved easily to a separate project
- Issue tracker: Ozone already uses a separate Jira subproject (HDDS)
- GitHub repositories
- Rename apache/hadoop-ozone to apache/ozone
- Rename/move apache/hadoop-docker-ozone to apache/ozone-docker-release
1 Comment
Arpit Agarwal
I have moved a few folks to the initial PMC who have 50+ contributions to Ozone with significant design/feature inputs. They deserve to be on the PMC by any objective measure. Please see the wiki history to see who I have moved. If there are any disagreements, let's discuss.