...

  1. Users can only provide a function that maps a record to a split. Since key groups should be transparent to the user, the mapping from split to key group has to be established by Flink.
    1. However, there is a conflict: for a given subtask, the key group range assignment is static, while the split assignment is dynamic and differs from Source to Source. For example, split assignment in the Kafka Source is push-based: new splits are assigned to the source readers in a round-robin fashion. Split assignment in the File Source, by contrast, is pull-based, so the resulting assignment is entirely irregular.
    2. Therefore, the mapping between a split and a key group cannot be established until the split has been discovered and assigned.
    3. Another possible solution is to assign each split to a key group before the job starts, so that the split is bound to a specific source operator. However, this might not be compatible with all Sources, e.g., the File Source.
  2. The mapping from a key to a split could still be hard for the user to provide, since that mapping may be determined automatically by an external system, e.g., Kafka.
  3. The number of splits that exist at any one time must not exceed the maximum number of key groups, because every split is mapped to exactly one key group.
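
The constraints above can be illustrated with a minimal sketch. It is not part of Flink and not the proposed implementation: `SplitKeyGroupRegistry`, `keyGroupFor`, and `ownerSubtask` are hypothetical names, splits are identified by plain strings, and the owner computation assumes that key groups are divided among subtasks in contiguous ranges. The sketch binds a split to a key group only when the split is first seen, and it rejects a new split once every key group is taken.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Flink class): binds each split to its own key group on discovery. */
class SplitKeyGroupRegistry {
    private final int maxKeyGroups; // assumed to equal the job's maximum number of key groups
    private final Map<String, Integer> splitToKeyGroup = new HashMap<>();
    private int nextFreeKeyGroup = 0;

    SplitKeyGroupRegistry(int maxKeyGroups) {
        if (maxKeyGroups <= 0) {
            throw new IllegalArgumentException("maxKeyGroups must be positive");
        }
        this.maxKeyGroups = maxKeyGroups;
    }

    /** Returns the key group of a split, binding it lazily the first time the split is seen. */
    int keyGroupFor(String splitId) {
        Integer existing = splitToKeyGroup.get(splitId);
        if (existing != null) {
            return existing;
        }
        // One key group per split: more splits than key groups cannot be mapped.
        if (nextFreeKeyGroup >= maxKeyGroups) {
            throw new IllegalStateException(
                    "No free key group for split " + splitId + " (limit " + maxKeyGroups + ")");
        }
        int assigned = nextFreeKeyGroup++;
        splitToKeyGroup.put(splitId, assigned);
        return assigned;
    }

    /**
     * Subtask that statically owns a key group, assuming contiguous key group ranges.
     * This owner does not depend on which reader the Source actually handed the split to.
     */
    static int ownerSubtask(int keyGroup, int maxKeyGroups, int parallelism) {
        return keyGroup * parallelism / maxKeyGroups;
    }
}
```

With 4 key groups and a parallelism of 2, a split bound to key group 3 belongs to subtask 1 under this scheme, regardless of which reader a round-robin or pull-based enumerator gave the split to. That mismatch is the conflict described in item 1.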

...