Kafka Connect calls PRECOMMIT periodically (the interval is user-configurable) to commit the latest Kafka offset. The offset is returned by the TransactionParticipant, which updates it as the written records are committed. Because Kafka offsets are committed only as Hudi files are committed, we suggest setting the PRECOMMIT interval close to the transaction interval.
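The relationship between committed writes and the offset reported at PRECOMMIT can be sketched as a small simulation. This is a minimal illustration, not the Hudi or Kafka Connect API: `ParticipantSketch`, `on_commit`, and `pre_commit` are hypothetical names, and the sketch assumes Kafka's convention that a committed offset is the position of the next record to read (last committed record + 1).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParticipantSketch:
    """Hypothetical stand-in for a TransactionParticipant on one Kafka partition."""

    committed_offset: int = -1  # highest offset covered by a committed write; -1 means none yet
    buffered: List[int] = field(default_factory=list)  # offsets written but not yet committed

    def write(self, offset: int) -> None:
        # Records are written as they arrive, but they do not advance the committed offset.
        self.buffered.append(offset)

    def on_commit(self) -> None:
        # Assumed to be invoked when the enclosing transaction commits:
        # only now does the committed offset move forward.
        if self.buffered:
            self.committed_offset = max(self.buffered)
            self.buffered.clear()


def pre_commit(participants: Dict[int, ParticipantSketch]) -> Dict[int, int]:
    """Return the offset to commit per partition, skipping partitions with nothing committed."""
    return {
        partition: p.committed_offset + 1  # Kafka convention: next offset to read
        for partition, p in participants.items()
        if p.committed_offset >= 0
    }
```

In this sketch, calling `pre_commit` between transaction commits keeps returning the same offset, which is why a PRECOMMIT interval much shorter than the transaction interval gains nothing.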

Implementation


Rollout/Adoption Plan

  • <What impact (if any) will there be on existing users?>
  • <If we are changing behavior how will we phase out the older behavior?>
  • <If we need special migration tools, describe them here.>
  • <When will we remove the existing behavior?>

Test Plan

We have validated the protocol by building a PoC. The current PoC is not yet integrated with the Hudi Write Client, but it implements the transaction protocol within the Connect platform. We implemented a Simple File Writer that mimics the Hudi writer and verified that no records were duplicated or missing. We also tested worker failures that either caused the Coordinator instance to fail and restart, or caused one or more Participant instances to be reassigned to another worker.
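A duplicate/missing-record check of this kind could look like the following sketch. It is not the PoC's actual validation code: `check_exactly_once` is a hypothetical helper, and it assumes each written record can be traced back to the Kafka offset it came from within a single partition.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_exactly_once(
    expected_offsets: Iterable[int], written_offsets: Iterable[int]
) -> Tuple[List[int], List[int]]:
    """Compare produced offsets with offsets found in the written files.

    Returns (duplicates, missing): offsets written more than once,
    and expected offsets that never appear in the output.
    """
    counts = Counter(written_offsets)
    duplicates = sorted(o for o, n in counts.items() if n > 1)
    missing = sorted(set(expected_offsets) - set(counts))
    return duplicates, missing
```

Running such a check after each injected failure (Coordinator restart or Participant reassignment) should yield two empty lists if the protocol holds.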