...

A common data integration technique is to capture RDBMS changes as they are made to data entities using a Change Data Capture (CDC) platform, and to send these changes as messages to Kafka topics.
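For concreteness, a CDC platform such as Debezium publishes one change event per row modification, keyed by the table's primary key. A simplified sketch of such a message is below (the field names follow Debezium's change-event envelope; the table, column names, and values are made up for illustration):

```json
{
  "key":   { "order_id": 42 },
  "value": {
    "before": null,
    "after":  { "order_id": 42, "status": "NEW", "total": 99.95 },
    "op":     "c",
    "ts_ms":  1700000000000
  }
}
```

Because the message key is the table's primary key, all changes to one row land in the same topic partition in order, which is what makes the topic suitable for materializing as a KTable downstream.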

...

[Gliffy diagram: DB Entities]

Kafka Streams falls short here, and the workaround (group by - join - lateral view) is not well supported either, nor is it in line with the idea of record-based processing.

In order to integrate an RDBMS data entity with consumers using an RDBMS -> CDC -> Kafka -> Kafka Streams -> Consumer pipeline, the following sequence of steps could be used:

  • Replicate each entity table to its own topic with CDC. Typically this is only possible using the table PK as the Kafka message key. We will call these topics "Data Topics" here.
  • Create Kafka Streams KTables from all "Data Topics".
  • Publish the business events that trigger data entity extraction from Kafka topics to their own topic. We will call this topic the "Trigger Topic" here.
  • Use the existing stream/table join feature in Kafka Streams to join the "Trigger Topic" based stream with the parent table of the data entity (Orders in our example).
  • Use the new Kafka Streams feature proposed in this KIP-955 to join the stream resulting from the previous step with the rest of the KTables using a left foreign-key join.
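The last two steps above can be sketched in plain Java that simulates the intended join semantics without a Kafka dependency. All entity names, values, and the one-to-many output shape here are illustrative assumptions for this sketch, not the KIP's actual API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class FkJoinSketch {

    // Simulated KTables built from the "Data Topics" (entity and field names
    // are illustrative): Orders keyed by its PK, OrderLines keyed by its own
    // PK and carrying the Orders foreign key in the value.
    static final Map<String, String> ORDERS = Map.of(
            "o1", "order-o1",
            "o2", "order-o2");
    static final Map<String, String[]> ORDER_LINES = Map.of(
            "l1", new String[]{"o1", "line-1"},   // {FK to Orders, payload}
            "l2", new String[]{"o1", "line-2"});

    // One trigger event arrives carrying an order id.
    static List<String> process(String triggerOrderId) {
        // Step 4: existing PK-based stream/table join with the parent table.
        String order = ORDERS.get(triggerOrderId);
        if (order == null) {
            return List.of();                     // inner join: no parent, no output
        }
        // Step 5: proposed left foreign-key join with a child KTable — emit one
        // record per child row whose FK matches, or one with null if none match.
        List<String> out = new ArrayList<>();
        for (String[] line : ORDER_LINES.values()) {
            if (line[0].equals(triggerOrderId)) {
                out.add(order + "+" + line[1]);
            }
        }
        if (out.isEmpty()) {
            out.add(order + "+null");             // left-join semantics
        }
        Collections.sort(out);                    // deterministic order for the demo
        return out;
    }

    public static void main(String[] args) {
        System.out.println(process("o1"));        // [order-o1+line-1, order-o1+line-2]
        System.out.println(process("o2"));        // [order-o2+null]
    }
}
```

In the real Streams API, one would expect the KIP to surface this as a KStream-to-KTable left join taking a foreign-key extractor function, mirroring the existing KTable-KTable foreign-key join introduced by KIP-213; the exact operator signature is defined in the Public Interfaces section, not by this sketch.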

This KIP makes data aggregation semantics consistent between SQL and Kafka Streams, and it makes relational data liberated by connection mechanisms far easier for teams to use, smoothing the transition to natively built event-driven services.

Public Interfaces

Briefly list any new interfaces that will be introduced as part of this proposal or any existing interfaces that will be removed or changed. The purpose of this section is to concisely call out the public contract that will come along with this feature.

...