THIS IS STILL WIP

Abstract

Submarine is a project which allows infra engineers/data scientists to build deep learning applications (TensorFlow, PyTorch, etc.) end to end on cluster management platforms (like YARN).

...

The Hadoop community believes that further development on Submarine can be done better as a separate project, as discussed on the Hadoop email lists (see https://lists.apache.org/thread.html/70cd31946bdffaef6f83bfd32da031d2eb2710a263da27907914491a@%3Cyarn-dev.hadoop.apache.org%3E).

Proposal

Although Submarine was originally developed in Apache Hadoop, there are several forces that are encouraging it to move to a separate project:

...

The traditional path at Apache would have been to create an incubator project, but the code is already being released by Apache and most of the developers are familiar with Apache rules and guidelines. In particular, the proposed PMC includes 2 Apache TLP PMC members, and the proposed initial committers include 4 Apache TLP PMC members (from 3 different companies). They will provide oversight and guidance for the developers who are less experienced in the Apache Way. Therefore, the Submarine project would like to propose becoming a Top Level Project at Apache.

...

Submarine’s development team seeks to foster the development and user communities. We feel that becoming a separate project will improve both communities by being smaller and more focused than Hadoop, and will bring tighter integration with various Apache projects and other open source projects that either don’t want to or can’t accept the large list of dependencies from Hadoop.

Core Developers

Hadoop Submarine is being developed primarily by Cloudera, NetEase, LinkedIn, JD, Dahua, Ke.com, Facebook, and Alibaba.

Alignment

The ASF is a natural host for Submarine given that it is already the home of Hadoop, Spark, Hive, Arrow and other emerging distributed computing software projects. Submarine was designed to offer an improved user experience for deep-learning/machine-learning model training, serving, and management, which can be part of a big data pipeline, and it leverages the power of Apache Spark, Apache Arrow, Apache Zeppelin, etc.

...

The potential PMC of the new project has extensive experience with Apache projects and includes [TODO number] Apache members and Incubator PMC members. The Submarine PMC includes 2 PMC members from Apache top level projects, and the potential initial committers of Submarine include 7 committers from 3 different Apache top level projects. These experienced committers will be responsible for training the committers who are less familiar with the Apache Way.

...

The developers include employees from Cloudera, NetEase, LinkedIn, JD, Dahua, Ke.com, Facebook, and Alibaba. Apache projects encourage an open and diverse meritocratic community, and the Submarine team is very motivated to increase the size and diversity of the development team.

...

Git is the preferred source control system. A separate Git repository is needed after the spin-off to a new Apache TLP. The Hadoop community has already voted to move the Submarine source code to a separate Git repo, which is tracked by INFRA-18964 (ASF JIRA). Existing Submarine source code will be moved to the newly created repo.

Issue Tracking

Submarine already has a separate JIRA instance (SUBMARINE-) to track issues.

...

PMC/Committers

Initial PMC

  • Wangda Tan (wangda at apache dot org) (Hadoop PMC)
  • Xun Liu (liuxun at apache dot org) (Zeppelin Committer)
  • Sunil Govind (sunilg at apache dot org) (Hadoop PMC)
  • Zhankun Tang (ztang at apache dot org) (Hadoop Committer)
  • Zac Zhou (zhouquan at apache dot org) (Hadoop Committer)

...

  • Owen O'Malley (omalley at apache dot org) (Ambari PMC, Apsite Committer, Bigtop Committer, Chukwa Committer, Giraph PMC, Hadoop PMC, Hawq Committer, Helix PMC, Hive PMC, Iceberg Committer, Incubator PMC, Kafka Committer, Knox PMC, Kylin PMC, Member, Metron Committer, Orc PMC, PMC Chairs, Ranger PMC, Reef, Tez PMC)

We’d like to propose Wangda Tan as the initial VP for the Submarine project.

Initial Committers

...

  • Szilard Nemeth (snemeth at apache dot org) (Hadoop Committer)
  • Jeff Zhang (zjffdu at apache dot org) (Member, Incubator, Livy Committer, Pig Committer, Tez PMC, Zeppelin PMC)
  • Yanbo Liang (yliang at apache dot org) (Spark PMC)
  • Naganarasimha Garla (naganarasimha_gr at apache dot org) (Hadoop PMC)

  • Devaraj K (devaraj at apache dot org) (Hadoop PMC)

  • Rakesh Radhakrishnan (rakeshr at apache dot org) (BookKeeper PMC, Hadoop PMC, Incubator, Mnemonic PMC, ZooKeeper PMC)

  • Vinayakumar B (vinayakumarb at apache dot org) (Hadoop PMC, Incubator PMC)

  • Ayush Saxena (ayushsaxena at apache dot org) (Hadoop Committer)

  • Bibin Chundatt (bibinchundatt at apache dot org) (Hadoop PMC)

  • Bharat Viswanadham (bharat at apache dot org) (Hadoop PMC)

  • Brahma Reddy Battula (brahma at apache dot org) (Hadoop PMC)

  • Abhishek Modi (abmodi at apache dot org) (Hadoop Committer)

  • Wei-Chiu Chuang (weichiu at apache dot org) (Hadoop PMC)

  • Junping Du (junping_du at apache dot org) (Hadoop PMC, Member)

  • Rohith Sharma K S (rohithsharmaks at apache dot org) (Hadoop PMC)

  • Zhe Zhang (zhz at apache dot org) (Hadoop PMC)

  • Sammi Chen (sammichen at apache dot org) (Hadoop PMC)

  • Jian He (jianhe at apache dot org) (Hadoop PMC)

  • Varun Saxena (varunsaxena at apache dot org) (Hadoop PMC)

  • Chen Liang (cliang at apache dot org) (Hadoop Committer)

  • Plus all initial PMC members

Affiliations

The initial PMC members are employed by Cloudera, NetEase, and LinkedIn.

The initial committers are employed by Cloudera, NetEase, LinkedIn, Alibaba, Facebook, Tencent, Intel, and Huawei.

Anybody who wants to be included in this list should let us know publicly during the proposal voting period.