...
Note that both streams are used as examples for KStream (i.e., record stream) and KTable (i.e., changelog stream), which have different semantics. For KTable, so-called tombstone records with format key:null are of special interest because they delete a key (those records are shown as null in all examples to highlight tombstone semantics). Last but not least, in Kafka Streams each join is "customized" by the user with a ValueJoiner function that computes the actual result. Hence, we show output records as "X - Y", with X and Y being the left and right value, respectively, given to the ValueJoiner. If the output is shown as null (i.e., a tombstone message), the ValueJoiner is not called because the result record is deleted.
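As a rough illustration of the notation, the following Python sketch models a joiner that renders its inputs as "X - Y" and a wrapper that skips the joiner for tombstone results. The names `value_joiner` and `emit_result` are hypothetical and are not part of the Kafka Streams API; the sketch only mirrors the convention described above.

```python
from typing import Callable, Optional


def value_joiner(left: Optional[str], right: Optional[str]) -> str:
    """Hypothetical stand-in for a user-supplied ValueJoiner."""
    def show(value: Optional[str]) -> str:
        return "null" if value is None else value
    return f"{show(left)} - {show(right)}"


def emit_result(
    left: Optional[str],
    right: Optional[str],
    is_tombstone: bool,
    joiner: Callable[[Optional[str], Optional[str]], str] = value_joiner,
) -> Optional[str]:
    """Return the output value; the joiner is not called for tombstones."""
    if is_tombstone:
        return None  # key:null record: the result for this key is deleted
    return joiner(left, right)
```

A result such as "A - null" is therefore a regular joined value with one missing partner, whereas a plain null means that the joiner was never invoked.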
New Join Semantics (current trunk)
...
This is a symmetric non-window join. The basic semantics are a KTable lookup in the "other" stream for each KTable update. The result is a (continuously updating) KTable (i.e., a changelog stream that can contain tombstone messages with format <key:null>; those tombstones are shown as null in the result, in contrast to results "X - null", which indicate a valid join result with only one join partner). Note that the KTable lookup is done on the current KTable state, and thus out-of-order records can yield non-deterministic results. Furthermore, in practice Kafka Streams does not guarantee that all records will be processed in timestamp order (even if processing records in timestamp order is the goal, it is only best effort).
ts | left | right | innerJoin | leftJoin | outerJoin |
1  | null |       |           |          |           |
2  |      | null  |           |          |           |
3  | A    |       |           | A - null | A - null  |
4  |      | a     | A - a     | A - a    | A - a     |
5  | B    |       | B - a     | B - a    | B - a     |
6  |      | b     | B - b     | B - b    | B - b     |
7  | null |       | null      | null     | null - b  |
8  |      | null  |           |          | null      |
9  | C    |       |           | C - null | C - null  |
10 |      | c     | C - c     | C - c    | C - c     |
11 |      | null  | null      | C - null | C - null  |
12 | null |       |           | null     | null      |
13 |      | null  |           |          |           |
14 |      | d     |           |          | null - d  |
15 | D    |       | D - d     | D - d    | D - d     |
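The results in the table above can be reproduced with a small single-key model, sketched here in Python. It is a simplified model rather than the Kafka Streams implementation: it keeps the latest value of each side, recomputes the join result on every update, and emits a tombstone only if a previous result existed. The function names, the `EVENTS` list, and the side on which each null input arrives are assumptions inferred from the outputs shown.

```python
# Input records for a single key as (ts, side, value); None is a tombstone.
EVENTS = [
    (1, "left", None), (2, "right", None), (3, "left", "A"), (4, "right", "a"),
    (5, "left", "B"), (6, "right", "b"), (7, "left", None), (8, "right", None),
    (9, "left", "C"), (10, "right", "c"), (11, "right", None),
    (12, "left", None), (13, "right", None), (14, "right", "d"),
    (15, "left", "D"),
]


def show(value):
    return "null" if value is None else value


def join_value(kind, left, right):
    """Return "X - Y" if this join type yields a result, else None."""
    if kind == "inner":
        matched = left is not None and right is not None
    elif kind == "left":
        matched = left is not None
    else:  # "outer"
        matched = left is not None or right is not None
    return f"{show(left)} - {show(right)}" if matched else None


def simulate_new_semantics(events):
    """Return rows (ts, inner, left, outer); "" means no record is emitted."""
    left = right = None
    previous = {"inner": None, "left": None, "outer": None}
    rows = []
    for ts, side, value in events:
        if side == "left":
            left = value
        else:
            right = value
        row = [ts]
        for kind in ("inner", "left", "outer"):
            current = join_value(kind, left, right)
            if current is not None:
                row.append(current)   # regular join result
            elif previous[kind] is not None:
                row.append("null")    # tombstone deletes the old result
            else:
                row.append("")        # nothing to delete, nothing emitted
            previous[kind] = current
        rows.append(tuple(row))
    return rows


for row in simulate_new_semantics(EVENTS):
    print(" | ".join(str(cell) for cell in row))
```

In this model a delete produces a tombstone only while a join result for the key exists, which is why the rows for ts 1, 2, and 13 stay empty.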
Old Join Semantics (v0.10.1 and older)
...
ts | STREAM_1 (left) | STREAM_2 (right) | leftJoin    |
1  | null            |                  | null - null |
2  |                 | null             |             |
3  | A               |                  | A - null    |
4  |                 | a                |             |
5  | B               |                  | B - a       |
6  |                 | b                |             |
7  | null            |                  | null - b    |
8  |                 | null             |             |
9  | C               |                  | C - null    |
10 |                 | c                |             |
11 |                 | null             |             |
12 | null            |                  | null - null |
13 |                 | null             |             |
14 |                 | d                |             |
15 | D               |                  | D - d       |
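The output pattern in the table above is consistent with a lookup-style left join in which only left-side records trigger a result and right-side records merely update the lookup state. The Python sketch below models that reading for a single key. It is an interpretation of the table, not the Kafka Streams implementation, and the function name, the `EVENTS` list, and the assignment of null inputs to sides are assumptions.

```python
# Input records for a single key as (ts, side, value); None is shown as null.
EVENTS = [
    (1, "left", None), (2, "right", None), (3, "left", "A"), (4, "right", "a"),
    (5, "left", "B"), (6, "right", "b"), (7, "left", None), (8, "right", None),
    (9, "left", "C"), (10, "right", "c"), (11, "right", None),
    (12, "left", None), (13, "right", None), (14, "right", "d"),
    (15, "left", "D"),
]


def show(value):
    return "null" if value is None else value


def simulate_old_left_join(events):
    """Return {ts: output} for the records that produce a join result."""
    right_value = None  # latest right-side value for the key
    output = {}
    for ts, side, value in events:
        if side == "right":
            right_value = value  # state update only, no output
        else:
            # Every left record is joined, even when its value is null.
            output[ts] = f"{show(value)} - {show(right_value)}"
    return output


for ts, result in simulate_old_left_join(EVENTS).items():
    print(ts, "|", result)
```

Note that, under this reading, a left-side null still reaches the joiner and yields "null - null" or "null - b" instead of being treated as a delete.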
KTable-KTable Join
This is a symmetric non-window join. The basic semantics are a KTable lookup in the "other" stream for each KTable update. The result is a (continuously updating) KTable (i.e., a changelog stream that can contain tombstone messages with format <key:null>; those tombstones are shown as null in the result, in contrast to results "X - null", which indicate a valid join result with only one join partner). Note that the KTable lookup is done on the current KTable state, and thus out-of-order records can yield non-deterministic results. Furthermore, in practice Kafka Streams does not guarantee that all records will be processed in timestamp order (even if processing records in timestamp order is the goal, it is only best effort).
ts | STREAM_1 (left) | STREAM_2 (right) | innerJoin | leftJoin | outerJoin |
1  | null            |                  | null      | null     | null      |
2  |                 | null             | null      | null     | null      |
3  | A               |                  | null      | A - null | A - null  |
4  |                 | a                | A - a     | A - a    | A - a     |
5  | B               |                  | B - a     | B - a    | B - a     |
6  |                 | b                | B - b     | B - b    | B - b     |
7  | null            |                  | null      | null     | null - b  |
8  |                 | null             | null      | null     | null      |
9  | C               |                  | null      | C - null | C - null  |
10 |                 | c                | C - c     | C - c    | C - c     |
11 |                 | null             | null      | C - null | C - null  |
12 | null            |                  | null      | null     | null      |
13 |                 | null             | null      | null     | null      |
14 |                 | d                | null      | null     | null - d  |
15 | D               |                  | D - d     | D - d    | D - d     |
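In the table above every input record produces an output record for each join type, and a tombstone (null) appears whenever the join has no result. A minimal single-key Python sketch of that behavior follows. It is a simplified model rather than the actual implementation, and the function names, the `EVENTS` list, and the side of each null input are assumptions inferred from the table.

```python
# Input records for a single key as (ts, side, value); None is a tombstone.
EVENTS = [
    (1, "left", None), (2, "right", None), (3, "left", "A"), (4, "right", "a"),
    (5, "left", "B"), (6, "right", "b"), (7, "left", None), (8, "right", None),
    (9, "left", "C"), (10, "right", "c"), (11, "right", None),
    (12, "left", None), (13, "right", None), (14, "right", "d"),
    (15, "left", "D"),
]


def show(value):
    return "null" if value is None else value


def join_value(kind, left, right):
    """Return "X - Y" if this join type yields a result, else None."""
    if kind == "inner":
        matched = left is not None and right is not None
    elif kind == "left":
        matched = left is not None
    else:  # "outer"
        matched = left is not None or right is not None
    return f"{show(left)} - {show(right)}" if matched else None


def simulate_old_table_join(events):
    """Return rows (ts, inner, left, outer); every update emits a record."""
    left = right = None
    rows = []
    for ts, side, value in events:
        if side == "left":
            left = value
        else:
            right = value
        # A missing result is always emitted as a tombstone ("null").
        results = tuple(
            show(join_value(kind, left, right))
            for kind in ("inner", "left", "outer")
        )
        rows.append((ts,) + results)
    return rows


for row in simulate_old_table_join(EVENTS):
    print(" | ".join(str(cell) for cell in row))
```

The model emits a tombstone even when there is no earlier result to delete, as in the rows for ts 1, 2, and 13.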