Proposers

  • @rmahindra
  • ...

Approvers

  • @
  • @<approver2 JIRA username> : [APPROVED/REQUESTED_INFO/REJECTED]
  • ...

Status

Current state

  • UNDER DISCUSSION (tick)
  • IN PROGRESS
  • ABANDONED
  • COMPLETED
  • INACTIVE

Discussion thread: here

JIRA: here

Released: <Hudi Version>

Abstract

The goal is to build a Kafka Connect sink that can ingest (stream) records from Apache Kafka into Hudi tables. Since Hudi is a transaction-based data lake platform, we have to overcome a few challenges to coordinate transactions across the tasks and workers in the Kafka Connect framework. In addition, the Hudi platform runs multiple data and file management services and optimizations that have to be coordinated with the write transactions.

To achieve this goal today, we can use the [deltastreamer](<https://hudi.apache.org/docs/writing_data/#deltastreamer>) tool provided with Hudi, which runs within the Spark engine to pull records from Kafka and ingest them into Hudi tables. Giving users the ability to ingest data via the Kafka Connect framework has a few advantages. Current Connect users can readily ingest their Kafka data into Hudi tables, leveraging the power of Hudi's platform without the overhead of deploying a Spark environment.

Background

<Introduce any background context which is relevant or necessary to understand the feature and design choices.>

Implementation

<Describe the new thing you want to do in appropriate detail and how it fits into the project architecture. Provide a detailed description of how you intend to implement this feature. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change.>

Rollout/Adoption Plan

  • <What impact (if any) will there be on existing users?>
  • <If we are changing behavior how will we phase out the older behavior?>
  • <If we need special migration tools, describe them here.>
  • <When will we remove the existing behavior?>

Test Plan

<Describe in a few sentences how the RFC will be tested. How will we know that the implementation works as expected? How will we know nothing broke?>






