...

Proposers

  • @rmahindra
  • ...

Approvers

  • @<approver1 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
  • @<approver2 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
  • ...

...

Released: <Hudi Version>

Abstract

The goal is to build a Kafka Connect Sink that can ingest/stream records from Apache Kafka into Hudi tables. Since Hudi is a transaction-based data lake platform, we have to overcome a few challenges to coordinate transactions across the tasks and workers in the Kafka Connect framework. In addition, the Hudi platform runs multiple data and file management services and optimizations that have to be coordinated with the write transactions.

To achieve this goal today, we can use the [deltastreamer](<https://hudi.apache.org/docs/writing_data/#deltastreamer>) tool provided with Hudi, which runs within the Spark engine to pull records from Kafka and ingest them into Hudi tables. Giving users the ability to ingest data via the Kafka Connect framework has a few advantages. Current Connect users can readily ingest their Kafka data into Hudi tables, leveraging the power of Hudi's platform without the overhead of deploying a Spark environment.

Background

<Introduce any much background context which is relevant or necessary to understand the feature and design choices.>

...