...

  • The first partition discovery is so slow that a checkpoint is taken and the job is restarted before it completes. The restored unassignedInitialPartitions is then an empty set, which means non-discovery. The next discovery should be treated as the first discovery.
  • The first partition discovery returns an empty result, and new partitions are found only after several more discovery rounds. If a restart occurs during this period, the restored unassignedInitialPartitions is also an empty set, but here it means empty-discovery. The next discovery should be treated as a new-partition discovery.
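The ambiguity between the two cases can be illustrated with a minimal sketch. It is not the connector's code: the class names and the `initialDiscoveryFinished` flag are hypothetical, introduced only to show that an empty set alone cannot tell the cases apart while an explicit marker can.

```java
import java.util.Collections;
import java.util.Set;

public class DiscoveryStateSketch {

    /** How the next discovery round should be treated after a restore. */
    enum NextDiscovery { FIRST_DISCOVERY, NEW_PARTITION_DISCOVERY }

    /** Hypothetical restored state: the partition set plus an explicit marker. */
    static final class RestoredState {
        final Set<String> unassignedInitialPartitions;
        // Hypothetical flag, assumed here to record that the first discovery completed.
        final boolean initialDiscoveryFinished;

        RestoredState(Set<String> unassignedInitialPartitions, boolean initialDiscoveryFinished) {
            this.unassignedInitialPartitions = unassignedInitialPartitions;
            this.initialDiscoveryFinished = initialDiscoveryFinished;
        }
    }

    /** Decides from the flag, not from the set, how to treat the next discovery. */
    static NextDiscovery classify(RestoredState state) {
        return state.initialDiscoveryFinished
                ? NextDiscovery.NEW_PARTITION_DISCOVERY
                : NextDiscovery.FIRST_DISCOVERY;
    }

    public static void main(String[] args) {
        // Case 1: restart before the first discovery finished (non-discovery).
        RestoredState nonDiscovery = new RestoredState(Collections.emptySet(), false);
        // Case 2: first discovery finished but found nothing (empty-discovery).
        RestoredState emptyDiscovery = new RestoredState(Collections.emptySet(), true);

        // The two sets are equal, so the set by itself is ambiguous.
        System.out.println(nonDiscovery.unassignedInitialPartitions
                .equals(emptyDiscovery.unassignedInitialPartitions));
        System.out.println(classify(nonDiscovery));
        System.out.println(classify(emptyDiscovery));
    }
}
```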

When does the second case occur? Currently, there are three ways to specify partitions in Kafka: by topic, by partition, and by matching topics with a regular expression. With the first two methods, an initial partition count of 0 causes an error. With a regular expression, however, it is allowed to match 0 topics.
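A small sketch of the regular-expression case, using only java.util.regex rather than the Kafka client. The topic names and the pattern are made up for illustration. It shows how a pattern can legitimately match nothing at first and match a topic only once one is created later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TopicPatternSketch {

    /** Returns the topics whose full name matches the pattern; the result may be empty. */
    static List<String> matchTopics(List<String> clusterTopics, Pattern topicPattern) {
        return clusterTopics.stream()
                .filter(topic -> topicPattern.matcher(topic).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("orders-.*"); // hypothetical topic pattern

        // First discovery: no topic matches yet, so the discovered set is empty.
        List<String> topicsAtStart = Arrays.asList("payments", "audit-log");
        System.out.println(matchTopics(topicsAtStart, pattern));

        // A later discovery: a matching topic has been created in the meantime.
        List<String> topicsLater = Arrays.asList("payments", "audit-log", "orders-eu");
        System.out.println(matchTopics(topicsLater, pattern));
    }
}
```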


The following is a modification of KafkaSourceEnumState:

...

Changing KafkaSourceEnumState also requires changing the corresponding serialization and deserialization methods. Deserialization must stay compatible with state written by earlier versions, so pay particular attention to KafkaSourceEnumStateSerializer.
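One common way to keep deserialization compatible is to version the serialized format and branch on the version when reading. The sketch below illustrates that idea under stated assumptions and is not the actual KafkaSourceEnumStateSerializer: the version numbers, the `State` class, the `initialDiscoveryFinished` field, and the default chosen for old state are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

public class VersionedStateSerializerSketch {

    static final int VERSION_OLD = 1; // hypothetical: partitions only
    static final int VERSION_NEW = 2; // hypothetical: partitions plus a flag

    /** Hypothetical, simplified enumerator state. */
    static final class State {
        final Set<String> partitions;
        final boolean initialDiscoveryFinished;

        State(Set<String> partitions, boolean initialDiscoveryFinished) {
            this.partitions = partitions;
            this.initialDiscoveryFinished = initialDiscoveryFinished;
        }
    }

    /** Writes the old layout: a count followed by the partition names. */
    static byte[] serializeOld(Set<String> partitions) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writePartitions(out, partitions);
        }
        return bytes.toByteArray();
    }

    /** Writes the new layout: the old layout followed by the flag. */
    static byte[] serializeNew(State state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writePartitions(out, state.partitions);
            out.writeBoolean(state.initialDiscoveryFinished);
        }
        return bytes.toByteArray();
    }

    /** Reads either layout, choosing the branch by the recorded version. */
    static State deserialize(int version, byte[] serialized) throws IOException {
        if (version != VERSION_OLD && version != VERSION_NEW) {
            throw new IOException("Unsupported state version: " + version);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            Set<String> partitions = readPartitions(in);
            if (version == VERSION_OLD) {
                // Assumption: old state carries no flag, so a default must be chosen.
                // This sketch treats old state as if initial discovery had finished.
                return new State(partitions, true);
            }
            return new State(partitions, in.readBoolean());
        }
    }

    private static void writePartitions(DataOutputStream out, Set<String> partitions)
            throws IOException {
        out.writeInt(partitions.size());
        for (String partition : partitions) {
            out.writeUTF(partition);
        }
    }

    private static Set<String> readPartitions(DataInputStream in) throws IOException {
        int count = in.readInt();
        Set<String> partitions = new LinkedHashSet<>();
        for (int i = 0; i < count; i++) {
            partitions.add(in.readUTF());
        }
        return partitions;
    }
}
```

Branching on the version keeps old checkpoints readable after the state layout grows. The default applied to old state is a design decision that needs to be made deliberately, since it determines how the first discovery after an upgrade is treated.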



Test Plan

This feature will be guarded by unit tests and integration tests in the Kafka connector's source code.

...