
Abstract

Gravitino is a high-performance, geo-distributed, and federated metadata lake designed to manage metadata seamlessly across diverse data sources, vendors, and regions. Its primary goal is to provide users with unified metadata access for both data and AI assets.

Background

Gravitino addresses the growing need for multi-cloud, vendor-neutral solutions in the metadata management space. Recognizing the importance of data interoperability, Gravitino ensures compatibility across various cloud providers, preventing vendor lock-in and fostering an open and collaborative ecosystem.

Rationale

The demand for vendor-agnostic solutions is evident in today's tech landscape. Gravitino fulfills this need by offering a flexible and interoperable platform. As enterprises increasingly adopt diverse cloud environments, Gravitino, which provides a unified metadata layer under an open source license, offers an attractive solution.

Current Status

In 2023, Datastrato, a startup founded by three Apache members, began developing Gravitino, initially to solve the multi-cloud data silo problem. The goal of Gravitino is to build a unified metadata lake that consolidates different catalog systems for table and non-table data used in analytics and AI workloads.

Since being open-sourced a few months ago, Gravitino has made multiple releases and attracted contributors from industry-leading companies such as Amazon, Apple, eBay, Pinterest, Tencent, and Xiaomi. Fortune 500 companies are actively testing Gravitino and using it in production, underscoring its reliability, scalability, and usefulness.

Meritocracy

Gravitino currently operates with a governance structure similar to that of established ASF projects. While some adjustments to the release process and the committer selection process may be required, these changes are minor. The project already has several experienced ASF members among its contributors.

Community

Over the past several months, Gravitino has cultivated a diverse and vibrant community, with contributions from developers of different backgrounds and from multiple vendors. The project's adoption by Fortune 500 companies shows its real-world applicability.

Core Developers

The majority of the core development team comes from a single company, Datastrato, which is committed to open source and is a sponsor of the ASF and the Linux Foundation (LF). The founders all have experience with open source and ASF projects such as Apache Hadoop, Spark, and YuniKorn.

Alignment

Gravitino leverages various ASF projects, including Apache Iceberg, Apache Hive, Apache Spark, and Apache Hadoop. This should foster collaboration within these Apache communities.

Known Risks

The project is being developed at a time when generative AI tools are widely used in open source development. Inevitably, some contributors may have used AI-generated code during development. A fragment checker has been used to ensure that no incompatible third-party code was accidentally included.

The Gravitino Web UI project currently includes a dependency with an incompatible license (CC-BY-4.0). Plans are in place to address this licensing issue.

Project Name

An informal search has been conducted, revealing no significant trademark conflicts or clashes with existing open-source software names. The chosen name, Gravitino, comes from the idea that data has weight and can grow over time.

Orphaned Products

The commitment of the core developers to the project's longevity and community health minimizes the risk of abandonment and provides assurance of ongoing development and support. During incubation we hope to further widen our contributor base to decrease the risk of this happening.

Inexperience with Open Source

The project team includes individuals with extensive prior experience in open source software and ASF projects. This collective experience will ensure a smooth transition into the ASF ecosystem.

Length of Incubation

Gravitino anticipates an incubation period of up to one year, during which it aims to grow its community base, enhance diversity among contributors, and align further with ASF practices.

Homogeneous Developers

Gravitino is committed to expanding its contributor base beyond its current single-company core developer team. Efforts during incubation will focus on fostering a more diverse community.

Reliance on Salaried Developers

Although most contributors are currently affiliated with Datastrato, the company was founded by open source enthusiasts and is genuinely committed to the principles of open source.

Relationships with Other Apache Products

Gravitino's integration with Apache Iceberg, Apache Hive, Apache Spark, and Apache Hadoop enhances its collaborative potential within the Apache ecosystem. The project looks forward to engaging with these communities to broaden its contributor base.

Excessive Fascination with the Apache Brand

Gravitino's interest in joining the ASF is rooted in its extensive use of ASF technologies and its alignment with the principles and practices of the Apache communities. The focus is on collaboration and integration rather than on the promotional value of being part of the ASF.

Documentation

Comprehensive information about Gravitino is available on its website (https://datastrato.ai/) and GitHub pages (https://github.com/datastrato/gravitino).

Initial Source

The source code for Gravitino can be found in the following repositories:

Source and Intellectual Property Submission Plan

Datastrato intends to submit a software grant for the mentioned GitHub repositories. The code is licensed under the Apache license or compatible licenses, and all significant contributors have agreed to provide their contributions under the Apache license.

The project uses AI-generated code in some places. A fragment checker has been used to ensure that no incompatible third-party code was accidentally included.

External Dependencies

Gravitino's external dependencies, as documented in the LICENSE and NOTICE files, are compatible with the Apache License, except for a dependency of the Gravitino Web UI that uses an incompatible license (CC-BY-4.0). Plans are in place to rectify this promptly after entering the Incubator.

Required Resources

Mailing Lists

Git Repositories

Issue Tracking

The project will use GitHub for issue tracking.

Other Resources

Gravitino makes extensive use of GitHub Actions. Recognizing the need to comply with the ASF's policies on GitHub Actions usage, the project is prepared to make the necessary adjustments during incubation.

Initial Committers

Initial committers paid to work on the project by Datastrato include:

  • TODO

Sponsors

Champion

Justin Mclean

Nominated Mentors

Jean-Baptiste Onofré (jbonofre@apache.org)

Junping Du (junping_du@apache.org)

Sponsoring Entity

We expect the Apache Incubator to sponsor this project.


