...

  • STREAM_1:   1:null, 3:A, 5:B, 7:null, 9:C, 12:null, 15:D
  • STREAM_2:   2:null, 4:a, 6:b, 8:null, 10:c, 11:null, 13:null, 14:d

Note that both streams are used as examples for KStream (ie, record stream) and KTable (ie, changelog stream), with different semantics. For KTable, so-called tombstone records of the format key:null are of special interest, as they delete a key.
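To make the changelog interpretation concrete, here is a minimal Python sketch (a plain simulation, not Kafka Streams code) that replays STREAM_2 as a KTable changelog. It assumes that all example records share a single key, called "k" here purely for illustration; the function name `materialize` is likewise hypothetical.

```python
# STREAM_2 from the example above, as (timestamp, value) pairs.
# Assumption: all records share one key; None models a tombstone (key:null).
STREAM_2 = [(2, None), (4, "a"), (6, "b"), (8, None),
            (10, "c"), (11, None), (13, None), (14, "d")]

KEY = "k"  # hypothetical key shared by all example records


def materialize(changelog):
    """Replay a changelog and return a snapshot of the table after each record."""
    table = {}
    snapshots = []
    for ts, value in changelog:
        if value is None:
            table.pop(KEY, None)   # tombstone: delete the key if it exists
        else:
            table[KEY] = value     # regular record: insert or overwrite
        snapshots.append((ts, dict(table)))
    return snapshots


for ts, snapshot in materialize(STREAM_2):
    print(ts, snapshot)
```

Read as a KStream, the same input would simply be eight independent records, three of which carry the value "a", "b", or "c" and are never overwritten or deleted.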

New Join Semantics (current trunk)

...

This is a sliding window join, ie, all tuples that are "close" to each other with regard to time (ie, time difference up to window size) are joined. The result is a KStream. The table below shows the output (for each processed input record) for all three join variants. Note that some input records do not produce output records.

...

This is an asymmetric non-window join. The basic semantics is a KTable lookup for each KStream record. The result is a KStream. Note that the KTable lookup is done on the current KTable state, and thus out-of-order records can yield a non-deterministic result. Furthermore, in practice Kafka Streams does not guarantee that all records will be processed in timestamp order (even if processing records in timestamp order is the goal, it is only best effort). The table below shows the output (for each processed input record) for both offered join variants. Note that some input records do not produce output records.

(suggested) (suggested to add)D

ts

STREAM_1 (left)

right

leftJoin (currentSTREAM_2 (right)

leftJoininnerJoin

1

null

 

null - null

  

2

 

null

 

  

3

A

 A - null

A - null

 

4

 

a

   

5

B

 B - aB - aB - a

6

 

b

   

7

null

 

null - b

  

8

 

null

 

  

9

C

 C - null

C - null

 

10

 

c

 

  

11

 

null

   

12

null

 

null - null

  

13

 

null

   

14

 

d

   

15

D

 D - dD - d


...

KTable-KTable Join

This is a symmetric non-window join. The basic semantics is a KTable lookup in the "other" stream for each KTable update. The result is a KTable (ie, a changelog stream that can contain tombstone messages with format <key:null>; those tombstones are shown as null below). Note that the KTable lookup is done on the current KTable state, and thus out-of-order records can yield a non-deterministic result. Furthermore, in practice Kafka Streams does not guarantee that all records will be processed in timestamp order (even if processing records in timestamp order is the goal, it is only best effort).
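The lookup-on-update idea can be sketched as a small Python simulation (not Kafka Streams code). It models one plausible reading of an inner join only: each update on either side looks up the current value of the other side, and a tombstone is emitted only when a previously emitted result has to be deleted. That tombstone rule, the single shared key, and the function name `table_table_inner_join` are assumptions made for illustration; the outputs are those of this simplified model, not values taken from this page.

```python
STREAM_1 = [(1, None), (3, "A"), (5, "B"), (7, None),
            (9, "C"), (12, None), (15, "D")]
STREAM_2 = [(2, None), (4, "a"), (6, "b"), (8, None),
            (10, "c"), (11, None), (13, None), (14, "d")]


def table_table_inner_join(left_log, right_log):
    """Return {ts: result}; a result of None stands for a tombstone."""
    # Process both changelogs in timestamp order (timestamps are unique here).
    events = sorted([(ts, "L", v) for ts, v in left_log] +
                    [(ts, "R", v) for ts, v in right_log],
                    key=lambda e: e[0])
    state = {"L": None, "R": None}   # current value per side; None = absent
    previous = None                  # last join result; None = no result exists
    output = {}
    for ts, side, value in events:
        state[side] = value          # apply the update (None deletes)
        left, right = state["L"], state["R"]
        current = (left, right) if left is not None and right is not None else None
        # Emit a new result, or a tombstone if an existing result disappears.
        if current is not None or previous is not None:
            output[ts] = current
        previous = current
    return output


for ts, result in table_table_inner_join(STREAM_1, STREAM_2).items():
    print(ts, result)
```

Because every lookup reads the current state of the other side, feeding the same records in a different order would generally change the output, which is the non-determinism described above.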

...

 

Old Join Semantics (v0.10.1 and older)

...

This is a sliding window join, ie, all tuples that are "close" to each other with regard to time (ie, time difference up to window size) are joined. The result is a KStream. The table below shows the output (for each processed input record) for all three join variants. Note that some input records do not produce output records.

| ts | STREAM_1 (left) | STREAM_2 (right) | innerJoin | leftJoin | outerJoin |
|----|-----------------|------------------|-----------|----------|-----------|
| 1 | null | | | null - null | null - null |
| 2 | | null | | | null - null |
| 3 | A | | | A - null | A - null |
| 4 | | a | A - a | | A - a |
| 5 | B | | B - a | B - a | B - a |
| 6 | | b | A - b, B - b | | A - b, B - b |
| 7 | null | | null - a, null - b | null - a, null - b | null - a, null - b |
| 8 | | null | A - null, B - null | | A - null, B - null |
| 9 | C | | C - a, C - b | C - a, C - b | C - a, C - b |
| 10 | | c | A - c, B - c, C - c | | A - c, B - c, C - c |
| 11 | | null | A - null, B - null, C - null | | A - null, B - null, C - null |
| 12 | null | | null - a, null - b, null - c | null - a, null - b, null - c | null - a, null - b, null - c |
| 13 | | null | A - null, B - null, C - null | | A - null, B - null, C - null |
| 14 | | d | A - d, B - d, C - d | | A - d, B - d, C - d |
| 15 | D | | D - a, D - b, D - c, D - d | D - a, D - b, D - c, D - d | D - a, D - b, D - c, D - d |
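The behavior shown in the table above can be reproduced with a short Python simulation (not Kafka Streams code). It is a sketch under assumptions read off the table rather than a description of the actual implementation: records with a null value probe the other side but are not retained for later matches, right-side records never trigger leftJoin output, and the window is large enough to cover all example records (the value 15 is an assumption). The function name `windowed_join` is hypothetical.

```python
STREAM_1 = [(1, None), (3, "A"), (5, "B"), (7, None),
            (9, "C"), (12, None), (15, "D")]
STREAM_2 = [(2, None), (4, "a"), (6, "b"), (8, None),
            (10, "c"), (11, None), (13, None), (14, "d")]

WINDOW = 15  # assumed window size, large enough to cover every example record


def windowed_join(left, right, kind):
    """Simulate an 'inner', 'left', or 'outer' join; return {ts: [(l, r), ...]}."""
    # Merge both inputs into one timeline (timestamps are unique here).
    events = sorted([(ts, "L", v) for ts, v in left] +
                    [(ts, "R", v) for ts, v in right],
                    key=lambda e: e[0])
    buffers = {"L": [], "R": []}   # retained (ts, value) records per side
    output = {}
    for ts, side, value in events:
        other = "R" if side == "L" else "L"
        matches = [v for t, v in buffers[other] if abs(ts - t) <= WINDOW]
        if kind == "left" and side == "R":
            results = []           # right records never trigger leftJoin output
        elif matches:
            results = [(value, m) if side == "L" else (m, value) for m in matches]
        elif kind == "outer" or (kind == "left" and side == "L"):
            # No match: pad the missing side with null.
            results = [(value, None) if side == "L" else (None, value)]
        else:
            results = []
        if results:
            output[ts] = results
        if value is not None:
            buffers[side].append((ts, value))  # null values are not retained
    return output


for kind in ("inner", "left", "outer"):
    print(kind, windowed_join(STREAM_1, STREAM_2, kind))
```

With a smaller `WINDOW`, older records would drop out of `matches`, so later rows would contain fewer pairs than the table shows.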

KStream-KTable Join

This is an asymmetric non-window join. The basic semantics is a KTable lookup for each KStream record. The result is a KStream. Note that the KTable lookup is done on the current KTable state, and thus out-of-order records can yield a non-deterministic result. Furthermore, in practice Kafka Streams does not guarantee that all records will be processed in timestamp order (even if processing records in timestamp order is the goal, it is only best effort). The table below shows the output (for each processed input record) for both offered join variants. Note that some input records do not produce output records.

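| ts | STREAM_1 (left) | STREAM_2 (right) | leftJoin |
|----|-----------------|------------------|----------|
| 1 | null | | null - null |
| 2 | | null | |
| 3 | A | | A - null |
| 4 | | a | |
| 5 | B | | B - a |
| 6 | | b | |
| 7 | null | | null - b |
| 8 | | null | |
| 9 | C | | C - null |
| 10 | | c | |
| 11 | | null | |
| 12 | null | | null - null |
| 13 | | null | |
| 14 | | d | |
| 15 | D | | D - d |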
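The leftJoin column above follows directly from the "lookup on current table state" rule, as this small Python simulation (not Kafka Streams code) illustrates. It assumes that all records share a single key and that records are processed strictly in timestamp order; the function name `stream_table_left_join` is hypothetical.

```python
STREAM_1 = [(1, None), (3, "A"), (5, "B"), (7, None),
            (9, "C"), (12, None), (15, "D")]
STREAM_2 = [(2, None), (4, "a"), (6, "b"), (8, None),
            (10, "c"), (11, None), (13, None), (14, "d")]


def stream_table_left_join(stream, changelog):
    """Return {ts: (stream_value, table_value)} for each stream record."""
    # Process both inputs in timestamp order (timestamps are unique here).
    events = sorted([(ts, "stream", v) for ts, v in stream] +
                    [(ts, "table", v) for ts, v in changelog],
                    key=lambda e: e[0])
    table_value = None   # current table value for the shared key; None = absent
    output = {}
    for ts, source, value in events:
        if source == "table":
            table_value = value                  # update; a null value deletes
        else:
            output[ts] = (value, table_value)    # look up the current table state
    return output


for ts, pair in stream_table_left_join(STREAM_1, STREAM_2).items():
    print(ts, pair)
```

Only stream records produce output, and each result depends on which table updates happened to be processed before it. Swapping the processing order of, say, the records at timestamps 4 and 5 would turn "B - a" into "B - null", which is the non-determinism the paragraph above warns about.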

KTable-KTable Join