Discussion thread
Vote thread
ISSUE
Release	0.6

Motivation

Background

Data pipelines built with Flink and CDC technologies often need to enrich data in real time by merging multiple tables into a single wide table. Consider three related tables A, B, and C that form a join chain:

...

Our goal is to update the aggregate view of tables A, B, and C in real time.

...

Flink Dual-Stream Join: Utilizing Flink for Join Operations Between Two Data Streams

Core Challenge:

Growth of State Storage: To join two streams in real time, Flink must keep in state the records from each stream that are still waiting to be joined. If one stream's arrival rate far exceeds the other's, this state can grow very large, causing storage and performance problems.
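The state growth described above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Flink's implementation or API: the class `ToyDualStreamJoin` and its methods are hypothetical names, and the model assumes an inner join with no state TTL, so every buffered row is kept.

```python
from collections import defaultdict


class ToyDualStreamJoin:
    """Toy model of a two-input streaming inner join (hypothetical, not Flink code)."""

    def __init__(self):
        # Each side buffers every row it has seen, keyed by the join key,
        # because a matching row may still arrive on the other side later.
        self.left_state = defaultdict(list)
        self.right_state = defaultdict(list)

    def on_left(self, key, row):
        # Store the row, then emit one result per buffered match on the right.
        self.left_state[key].append(row)
        return [(row, r) for r in self.right_state[key]]

    def on_right(self, key, row):
        # Store the row, then emit one result per buffered match on the left.
        self.right_state[key].append(row)
        return [(l, row) for l in self.left_state[key]]

    def state_size(self):
        # Total number of rows buffered across both sides.
        return (sum(len(rows) for rows in self.left_state.values())
                + sum(len(rows) for rows in self.right_state.values()))


dual_join = ToyDualStreamJoin()

# Fast stream: 1000 rows arrive on the left before anything arrives on the right.
for i in range(1000):
    dual_join.on_left(i, {"a_id": i})

# Slow stream: a single row arrives on the right and matches one buffered row.
matches = dual_join.on_right(7, {"b_id": 7})
```

In this model every row from the fast side stays buffered even though only one of them ever finds a match, which is the imbalance the challenge refers to.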

Flink Lookup Join: Flink's Method for Searching and Joining Data

Core Challenge:

The traditional lookup join responds only to changes in the main data stream: a lookup is triggered only when a new event arrives on the main stream. If a related dimension table (such as B or C) changes afterwards, results that have already been joined are not updated.
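The limitation can be seen in a small simulation. The sketch below is plain Python rather than Flink code: `ToyLookupJoin`, `on_main_event`, and the dictionary standing in for dimension table B are illustrative assumptions, and the lookup is modeled as a simple point read at processing time.

```python
class ToyLookupJoin:
    """Toy model of a lookup join driven only by the main stream (hypothetical)."""

    def __init__(self, dim_table):
        # The dimension table is modeled as a dict that may change externally.
        self.dim_table = dim_table
        # Results already sent downstream; they are never revisited.
        self.emitted = []

    def on_main_event(self, key, row):
        # The lookup happens once, at the moment the main-stream event arrives.
        joined = {**row, "dim": self.dim_table.get(key)}
        self.emitted.append(joined)
        return joined


dim_b = {1: "old"}
lookup_join = ToyLookupJoin(dim_b)

# A main-stream event arrives and is joined with the current dimension value.
lookup_join.on_main_event(1, {"a_id": 1})

# The dimension table changes, but no main-stream event arrives,
# so nothing is re-emitted and the earlier result keeps the old value.
dim_b[1] = "new"
stale_value = lookup_join.emitted[0]["dim"]

# Only a new main-stream event observes the updated dimension value.
lookup_join.on_main_event(1, {"a_id": 1})
fresh_value = lookup_join.emitted[1]["dim"]
```

In this model the first emitted row still carries the old dimension value after the update, and only the second main-stream event picks up the new one.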

New Strategy: Dynamic Dimension Table-Driven Lookup Join

...