Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

A pre-KIP-320 consumer fetches offset 2 from broker B will get msg 2 (offset 2, LE-0).

Follower to leader transition

A follower can be considered as a leader by controller based on its replica configuration. When a follower becomes a leader it needs to findout the offset from which the segments to be copied to remote storage. This is found by traversing from the the latest leader epoch from leader epoch history and find the highest offset of a segment with that epoch copied into remote storage. If it can not find an entry then it checks for the previous leader epoch till it finds an entry, If there are no entries till the earliest leader epoch in leader epoch cache then it starts copying the segments from the earliest epoch entry’s offset.


Broker A (Leader)

Broker B (Follower)

Remote Storage

RL metadata storage

0: msg 0 LE-0

1: msg 1 LE-0

2: msg 2 LE-0

3: msg 3 LE-1

4: msg 4 LE-1

5: msg 5 LE-1

6: msg 6 LE-2 (HW)

7: msg 7 LE-2 

8: msg 8 LE-2 



leader_epochs

LE-0, 0

LE-1, 3

LE-2, 6

0: msg 0 LE-0

1: msg 1 LE-0

2: msg 2 LE-0

3: msg 3 LE-1

4: msg 4 LE-1

5: msg 5 LE-1

6: msg 6 LE-2 (HW)





leader_epochs

LE-0, 0

LE-1, 3

LE-2, 6



seg-0-2, uuid-1

  log:

  0: msg 0 LE-0

  1: msg 1 LE-0

  2: msg 2 LE-0

  epochs:

  LE-0, 0


seg 3-4, uuid-2

  log:

  3: msg 3 LE-1

  4: msg 4 LE-1

  epochs:

  LE-0, 0

  LE-1, 3



seg-0-2, uuid-1

Segment epochs

LE-0, 0





seg-3-4, uuid-2

Segment epochs

 LE-1, 3


Step 2: 

Broker A is crashed/stopped and Broker B became a leader. It checks from leader epoch-2 whether there are any segments and it traverses back till it finds a segment for the leader epoch. In this case it finds offset 4 for leader epoch-1 from RLMM. It needs to copy segments containing offset 5. So, it starts copying from the “seg-4-6” segment. 

Broker A (Stopped)

Broker B (Leader)

Remote Storage

RL metadata storage


0: msg 0 LE-0

1: msg 1 LE-0

2: msg 2 LE-0

3: msg 3 LE-1

4: msg 4 LE-1

5: msg 5 LE-1

6: msg 6 LE-2 (HW)

7: msg 7 LE-2 

8: msg 8 LE-2 



leader_epochs

LE-0, 0

LE-1, 3

LE-2, 6


0: msg 0 LE-0

1: msg 1 LE-0

2: msg 2 LE-0

3: msg 3 LE-1

4: msg 4 LE-1

5: msg 5 LE-1

6: msg 6 LE-2 (HW)

7: msg 8 LE-3




leader_epochs

LE-0, 0

LE-1, 3

LE-2, 6

LE-3, 7



seg-0-2, uuid-1

  log:

  0: msg 0 LE-0

  1: msg 1 LE-0

  2: msg 2 LE-0

  epochs:

  LE-0, 0


seg-3-4, uuid-2

  log:

  3: msg 3 LE-1

  4: msg 4 LE-1

  epochs:

  LE-0, 0

  LE-1, 3


Seg-4-6, uuid-3

4: msg 4 LE-1

5: msg 5 LE-1

6: msg 6 LE-2

epochs:

  LE-0, 0

  LE-1, 3

  LE-2, 6

seg-0-2, uuid-1

Segment epochs

LE-0, 0








seg-3-4, uuid-2

Segment epochs

 LE-1, 3









seg-4-6, uuid-3

Segment epochs

 LE-1, 3

 LE-2, 6


Transactional support

RemoteLogManager copies transaction index and producer-id-snapshot along with the respective log segment earlier to last-stable-offset. This is used by the followers to return aborted transactions in fetch requests with isolation level as READ_COMMITTED. 

...