
...


Table of Contents

Proposers

Approvers

Status

Current state:

  • Under Discussion (tick)
  • In Progress
  • Abandoned
  • Completed
  • Inactive


...

To achieve this goal today, we can use the [deltastreamer](<https://hudi.apache.org/docs/writing_data/#deltastreamer>) tool provided with Hudi, which runs within the Spark engine to pull records from Kafka and ingest them into Hudi tables. Giving users the ability to ingest data via the Kafka Connect framework has a few advantages: current Connect users can readily ingest their Kafka data into Hudi tables, leveraging the power of Hudi's platform without the overhead of deploying a Spark environment.
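
For illustration, the Connect-based flow could look like the sketch below: a user registers a sink connector with an existing Connect cluster instead of submitting a Spark job. The connector class name and the Hudi-specific property keys are hypothetical placeholders used only to convey the intended user experience, not the interface defined by this RFC.

```properties
# Hypothetical sink connector configuration (standalone-mode .properties form).
# connector.class and the Hudi-specific keys below are illustrative
# placeholders, not the final interface proposed in this RFC.
name=hudi-sink
connector.class=org.apache.hudi.connect.HoodieSinkConnector
tasks.max=4
topics=impressions
# Hudi-specific settings (key names are assumptions for this sketch):
target.base.path=file:///tmp/hoodie/impressions
target.table.name=impressions
hoodie.table.type=MERGE_ON_READ
```

In standalone mode such a file can be passed to the Connect worker directly; in distributed mode the same key/value pairs would be submitted as JSON to the Connect REST API.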

Background

<Introduce any background context that is relevant or necessary to understand the feature and design choices.>

Implementation

...

To appreciate the design proposed in this RFC, it is important to understand Kafka Connect, a framework for streaming data into and out of Apache Kafka. The core components of the Connect framework that are relevant to this RFC are connectors, tasks, and workers. A connector instance is a logical job that manages the copying of data from Kafka to another system; it manages a set of tasks that actually copy the data, and using multiple tasks allows for parallelism and scalable data copying. Connectors and tasks are logical execution units that are scheduled onto workers. In distributed mode, workers run across a cluster to provide scalability and fault tolerance. All workers can be configured with the same group.id, and the Connect framework automatically manages the execution of tasks across all available workers. As shown in the figure below, tasks are distributed across workers, and each task manages one or more distinct partitions of the Kafka topic.

[Figure: connectors and tasks distributed across Connect workers, each task handling one or more distinct partitions of the Kafka topic]
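
The split of responsibilities between a connector and its tasks is easiest to see in the public Connect sink API. The skeleton below is a minimal sketch against org.apache.kafka.connect.sink.SinkConnector and SinkTask; the Hoodie-prefixed class names are placeholders for illustration and do not describe the implementation proposed by this RFC.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class HoodieSinkConnector extends SinkConnector {

  private Map<String, String> configProps;

  @Override
  public void start(Map<String, String> props) {
    // One connector instance per logical job; it only holds the user-supplied configuration.
    this.configProps = props;
  }

  @Override
  public Class<? extends Task> taskClass() {
    return HoodieSinkTask.class;
  }

  @Override
  public List<Map<String, String>> taskConfigs(int maxTasks) {
    // One configuration per task; the Connect framework then assigns Kafka
    // partitions to these tasks and rebalances them across workers.
    List<Map<String, String>> taskConfigs = new ArrayList<>();
    for (int i = 0; i < maxTasks; i++) {
      taskConfigs.add(new HashMap<>(configProps));
    }
    return taskConfigs;
  }

  @Override
  public void stop() {
  }

  @Override
  public ConfigDef config() {
    return new ConfigDef();
  }

  @Override
  public String version() {
    return "0.1.0-SNAPSHOT";
  }
}

// In practice this would live in its own file; shown non-public here so the sketch compiles as one unit.
class HoodieSinkTask extends SinkTask {

  @Override
  public void start(Map<String, String> props) {
  }

  @Override
  public void put(Collection<SinkRecord> records) {
    // Called with records from the Kafka partitions assigned to this task;
    // this is where a sink would buffer records and write them to the target system.
  }

  @Override
  public void stop() {
  }

  @Override
  public String version() {
    return "0.1.0-SNAPSHOT";
  }
}
```

Note that a sink connector does not assign partitions itself: taskConfigs only tells the framework how many tasks to run, and the framework's consumer group machinery distributes the topic partitions among them.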


On system initialization, the workers rebalance the set of tasks so that each worker has a similar amount of work. At runtime, the system may rebalance again when the number of partitions or tasks changes. In addition, if a worker fails, its tasks are re-assigned to the remaining workers to preserve fault tolerance, as shown in the figure below.


[Figure: task re-assignment across the remaining workers after a worker failure]
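
Since distributed mode is driven entirely by worker configuration, a minimal sketch of such a configuration is shown below; the values are illustrative, but the property keys are the standard Kafka Connect distributed-worker settings.

```properties
# Minimal distributed-mode worker configuration (illustrative values).
# Every worker started with the same group.id joins the same Connect cluster,
# and the framework balances connector tasks across those workers.
bootstrap.servers=localhost:9092
group.id=hudi-connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Internal topics used by the workers to share connector configs, offsets, and status.
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
plugin.path=/usr/local/share/kafka/plugins
```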



Rollout/Adoption Plan

  • <What impact (if any) will there be on existing users?>
  • <If we are changing behavior, how will we phase out the older behavior?>
  • <If we need special migration tools, describe them here.>
  • <When will we remove the existing behavior?>

...