...

Code Block
public static final String TASK_ASSIGNMENT_RACK_AWARENESS_CONFIG = "task.assignment.rack.awareness";
public static final String TASK_ASSIGNMENT_RACK_AWARENESS_DOC = "List of client tag keys used to distribute standby replicas across Kafka Streams instances." +
                                                                " Tag keys must be set in an order of precedence." +
                                                                " When configured, Kafka Streams will make a best effort to distribute" +
                                                                " the standby tasks over each client tag dimension.";



When client.tag.* dimensions are configured, Kafka Streams will read this information from the configuration and encode it into SubscriptionInfoData as key-value pairs.
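As a rough illustration of the step described above, the following sketch collects every client.tag.* entry from a configuration and turns it into key-value byte pairs. The class and method names are hypothetical, and the UTF-8 encoding of keys and values is an assumption; the KIP only states that tags are encoded as key-value pairs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of Kafka Streams.
final class ClientTagsSketch {
    static final String CLIENT_TAG_PREFIX = "client.tag.";

    // Collects every "client.tag.<key>" property into a sorted <key> -> value map.
    static Map<String, String> extractClientTags(Properties props) {
        Map<String, String> tags = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(CLIENT_TAG_PREFIX)) {
                tags.put(name.substring(CLIENT_TAG_PREFIX.length()), props.getProperty(name));
            }
        }
        return tags;
    }

    // Encodes each tag as a (key bytes, value bytes) pair; UTF-8 is an assumption.
    static List<Map.Entry<byte[], byte[]>> encode(Map<String, String> tags) {
        List<Map.Entry<byte[], byte[]>> encoded = new ArrayList<>();
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            encoded.add(Map.entry(
                tag.getKey().getBytes(StandardCharsets.UTF_8),
                tag.getValue().getBytes(StandardCharsets.UTF_8)));
        }
        return encoded;
    }
}
```

Properties that do not start with the client.tag. prefix (for example num.standby.replicas) are ignored by extractClientTags, so only the tag dimensions end up in the encoded list.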


Code Block
{
  "name": "SubscriptionInfoData",
  "validVersions": "1-10", // version bump
  "fields": [
    ...
    {
      "name": "clientTags",
      "versions": "10+",
      "type": "[]ClientTag"
    }
  ],
  "commonStructs": [
    {
      "name": "ClientTag",
      "versions": "1+",
      "fields": [
        {
          "name": "key",
          "versions": "1+",
          "type": "bytes"
        },
        {
          "name": "value",
          "versions": "1+",
          "type": "bytes"
        }
      ]
    },
    ...
  ]
}

The Kafka Streams task assignor decides how to distribute standby tasks over the clients based on the clientTags received with the subscription info and the configured task.assignment.rack.awareness value.

Info
The standby task distribution algorithm is not specified in this KIP and is left as an implementation detail. However, every distribution algorithm must gracefully handle the case where the ideal standby task distribution is not possible: Kafka Streams must not fail the assignment but should try to find the next most optimal distribution. The ideal distribution means that no client dimension is repeated among the clients assigned to the active task and all standby tasks.

Changes in HighAvailabilityTaskAssignor

Implementation of this KIP must not affect HighAvailabilityTaskAssignor in a breaking way, meaning that all existing behavior should stay unchanged (e.g., when the new configurations are not specified). Once the required configurations are set, the main change should happen within the code that deals with standby task allocation, specifically:

HighAvailabilityTaskAssignor#assignStandbyReplicaTasks and HighAvailabilityTaskAssignor#assignStandbyTaskMovements

Example of standby task allocation

Absolute Preferred Standby Task Distribution

Suppose we have the following infrastructure setup: three Kubernetes clusters, let us call them K8s_Cluster1, K8s_Cluster2, and K8s_Cluster3. Each Kubernetes cluster spans three availability zones: eu-central-1a, eu-central-1b, and eu-central-1c.

Our use case is to distribute the standby tasks across different Kubernetes clusters and AZs so that we can tolerate both Kubernetes cluster and AZ failures.

With the new configuration options presented in this KIP, we will have the following:

Info

Node-1:
client.tag.cluster: K8s_Cluster1
client.tag.zone: eu-central-1a
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-2:
client.tag.cluster: K8s_Cluster1
client.tag.zone: eu-central-1b
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-3:
client.tag.cluster: K8s_Cluster1
client.tag.zone: eu-central-1c
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-4:
client.tag.cluster: K8s_Cluster2
client.tag.zone: eu-central-1a
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-5:
client.tag.cluster: K8s_Cluster2
client.tag.zone: eu-central-1b
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-6:
client.tag.cluster: K8s_Cluster2
client.tag.zone: eu-central-1c
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-7:
client.tag.cluster: K8s_Cluster3
client.tag.zone: eu-central-1a
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-8:
client.tag.cluster: K8s_Cluster3
client.tag.zone: eu-central-1b
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-9:
client.tag.cluster: K8s_Cluster3
client.tag.zone: eu-central-1c
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

With the infrastructure topology and configuration presented above, we can achieve Absolute Preferred standby task distribution. It is achievable because we have to allocate three tasks for any given stateful task (1 active task + 2 standby tasks), and each tag has three unique values. So the formula for determining whether Absolute Preferred standby task allocation is achievable can be something like this:

...

F(cluster:[K8s_Cluster1, K8s_Cluster2, K8s_Cluster3], zone:[eu-central-1a, eu-central-1b, eu-central-1c]) will return [cluster: 3, zone: 3]

F(cluster:[K8s_Cluster1, K8s_Cluster2], zone:[eu-central-1a, eu-central-1b, eu-central-1c]) will return [cluster: 2, zone: 3]

F(cluster:[K8s_Cluster1, K8s_Cluster2], zone:[eu-central-1a]) will return [cluster: 2, zone: 1]

1. Formula for determining if Absolute Preferred distribution is possible

Assuming active stateful task 0_0 is in Node-1, Absolute Preferred standby task distribution might look like this:

  1. Node-5 (different cluster, different zone), Node-9 (different cluster, different zone)
  2. Node-6 (different cluster, different zone), Node-8 (different cluster, different zone)

Partially Preferred Standby Task Distribution

Suppose we have the following infrastructure setup: two Kubernetes clusters, let us call them K8s_Cluster1 and K8s_Cluster2, and each Kubernetes cluster spans three availability zones: eu-central-1a, eu-central-1b, and eu-central-1c.

Our use case is similar to the previous section: to distribute the standby tasks across different Kubernetes clusters and AZs so that we can tolerate both Kubernetes cluster and AZ failures.

With the new configuration options presented in this KIP, we will have the following:

Info

Node-1:
client.tag.cluster: K8s_Cluster1
client.tag.zone: eu-central-1a
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-2:
client.tag.cluster: K8s_Cluster1
client.tag.zone: eu-central-1b
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-3:
client.tag.cluster: K8s_Cluster1
client.tag.zone: eu-central-1c
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-4:
client.tag.cluster: K8s_Cluster2
client.tag.zone: eu-central-1a
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-5:
client.tag.cluster: K8s_Cluster2
client.tag.zone: eu-central-1b
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-6:
client.tag.cluster: K8s_Cluster2
client.tag.zone: eu-central-1c
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

With the infrastructure topology presented above, we can't achieve Absolute Preferred standby task distribution because we only have two unique cluster tags in the topology. The Absolute Preferred distribution could have been achieved with a third Kubernetes cluster (K8s_Cluster3) spanning three AZs (as seen in the previous section).

Even though we can't achieve Absolute Preferred standby task distribution with the configuration presented above, we can still achieve Partially Preferred distribution.

Partially Preferred distribution can be achieved by distributing standby tasks over different zone tags. Zone has higher precedence than cluster in the standby.replicas.awareness configuration. Therefore, Kafka Streams would prefer to distribute standby tasks over different zones rather than different clusters when the Absolute Preferred distribution check formula [1] returns false.

Kafka Streams will be eligible to perform Partially Preferred standby task distribution when at least one of the instance tags has a number of unique values >= num.standby.replicas. So the formula for determining whether Partially Preferred standby task allocation is doable will look like this:

...

F(cluster:[K8s_Cluster1, K8s_Cluster2], zone:[eu-central-1a, eu-central-1b, eu-central-1c]) will return [cluster: 2, zone: 3]

F(cluster:[K8s_Cluster1, K8s_Cluster2], zone:[eu-central-1a]) will return [cluster: 2, zone: 1]

2. Formula for determining if Partially Preferred distribution is possible
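
The counting function F and the two eligibility checks can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, and because the formulas themselves are elided here, the thresholds are taken from the surrounding prose. The Absolute Preferred check assumes every tag needs at least num.standby.replicas + 1 unique values (one per task copy), and the Partially Preferred check uses the stated condition that at least one tag has >= num.standby.replicas unique values.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Kafka Streams.
final class DistributionChecksSketch {

    // F: for each tag key, the number of distinct values seen across all clients.
    static Map<String, Integer> uniqueTagValueCounts(List<Map<String, String>> clientTags) {
        Map<String, Set<String>> values = new TreeMap<>();
        for (Map<String, String> tags : clientTags) {
            for (Map.Entry<String, String> tag : tags.entrySet()) {
                values.computeIfAbsent(tag.getKey(), k -> new HashSet<>()).add(tag.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        values.forEach((key, distinct) -> counts.put(key, distinct.size()));
        return counts;
    }

    // Assumed reading of formula [1]: every tag offers one distinct value
    // per task copy (1 active + numStandbyReplicas standby tasks).
    static boolean absolutePreferredPossible(Map<String, Integer> counts, int numStandbyReplicas) {
        return !counts.isEmpty()
            && counts.values().stream().allMatch(c -> c >= numStandbyReplicas + 1);
    }

    // Formula [2] as worded in the text: at least one tag has
    // >= numStandbyReplicas unique values.
    static boolean partiallyPreferredPossible(Map<String, Integer> counts, int numStandbyReplicas) {
        return counts.values().stream().anyMatch(c -> c >= numStandbyReplicas);
    }
}
```

For the two-cluster, three-zone setup with num.standby.replicas set to 2, uniqueTagValueCounts yields cluster: 2 and zone: 3, so the Absolute Preferred check is false while the Partially Preferred check is true.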

Assuming active stateful task 0_0 is in Node-1, Partially Preferred standby task distribution will look like this:

  1. Node-5 (different cluster, different zone), LL([Node-3, Node-6]) (different zone, but in the same cluster as the previous standby or the active task).
  2. Node-6 (different cluster, different zone), LL([Node-2, Node-5]) (different zone, but in the same cluster as the previous standby or the active task).

Where LL is a function determining the least-loaded client based on active + standby task assignment.

As previously mentioned, in both cases Kafka Streams will prefer to distribute standby tasks over different zones, since zone has higher precedence than cluster in the standby.replicas.awareness configuration. In scenario 1, both Node-3 and Node-6 are in a different zone (eu-central-1c) than Node-1 (eu-central-1a) and Node-5 (eu-central-1b). As a result, the overall task distribution spans three availability zones: the active task in Node-1 (eu-central-1a) and the standby tasks in Node-5 (eu-central-1b) and either Node-3 or Node-6 (both in eu-central-1c).
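
A minimal sketch of the LL function, assuming that a client's load is simply the number of active tasks plus the number of standby tasks currently assigned to it. The class name, method names, and the tie-breaking rule (the first candidate in the list wins) are assumptions made for illustration; the KIP does not prescribe them.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Kafka Streams.
final class LeastLoadedSketch {

    // Load of a client = number of active tasks + number of standby tasks.
    static int load(String client,
                    Map<String, Set<String>> activeTasks,
                    Map<String, Set<String>> standbyTasks) {
        return activeTasks.getOrDefault(client, Collections.emptySet()).size()
             + standbyTasks.getOrDefault(client, Collections.emptySet()).size();
    }

    // LL: returns the candidate with the smallest load; on a tie the
    // candidate that appears first in the list is kept.
    static String leastLoaded(List<String> candidates,
                              Map<String, Set<String>> activeTasks,
                              Map<String, Set<String>> standbyTasks) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        String best = candidates.get(0);
        for (String candidate : candidates) {
            if (load(candidate, activeTasks, standbyTasks) < load(best, activeTasks, standbyTasks)) {
                best = candidate;
            }
        }
        return best;
    }
}
```

In scenario 1, calling leastLoaded with the candidates Node-3 and Node-6 would return whichever of the two currently holds fewer tasks.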

The Least Preferred Standby Task Distribution

The Least Preferred standby task distribution applies when neither the Absolute Preferred nor the Partially Preferred standby task distribution can be satisfied.

Suppose we have the following infrastructure setup: two Kubernetes clusters, let us call them K8s_Cluster1 and K8s_Cluster2, and each Kubernetes cluster spans two availability zones: eu-central-1a and eu-central-1b.

With the new configuration options presented in this KIP, we will have the following:

Info

Node-1:
client.tag.cluster: K8s_Cluster1
client.tag.zone: eu-central-1a
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-2:
client.tag.cluster: K8s_Cluster1
client.tag.zone: eu-central-1b
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-3:
client.tag.cluster: K8s_Cluster2
client.tag.zone: eu-central-1a
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

Node-4:
client.tag.cluster: K8s_Cluster2
client.tag.zone: eu-central-1b
standby.replicas.awareness: zone,cluster
num.standby.replicas: 2

With the setup presented above, we can't place the second standby task in a different zone, as requested by the standby.replicas.awareness configuration, because only two distinct zones are available. In this case, Kafka Streams will fall back to the least-loaded client to allocate the remaining standby task.

Assuming active stateful task 0_0 is in Node-1, the Least Preferred standby task distribution will look like this:

  1. Node-4 (different cluster, different zone), LL([Node-2, Node-3])

Where LL is a function determining the least-loaded client based on active + standby task assignment.
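
One possible way to arrive at the candidate lists shown in these examples is to relax the tag constraints step by step. The sketch below is an assumption, not the algorithm of this KIP (which leaves the algorithm as an implementation detail), and all names in it are hypothetical. It first looks for clients that differ from every already used client on all tags, then drops the lowest-precedence tag and retries, and finally accepts any remaining client, at which point LL picks among them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Kafka Streams.
final class StandbyFallbackSketch {

    // Returns the candidates (sorted by name) for the next standby task.
    // tagPrecedence lists tag keys from highest to lowest precedence;
    // used holds the clients that already own a copy of the task and must
    // all be present in tagsByClient.
    static List<String> candidates(Map<String, Map<String, String>> tagsByClient,
                                   List<String> tagPrecedence,
                                   Set<String> used) {
        // k = number of highest-precedence tags that must still be distinct.
        for (int k = tagPrecedence.size(); k >= 0; k--) {
            List<String> requiredTags = tagPrecedence.subList(0, k);
            List<String> result = new ArrayList<>();
            for (String client : new TreeSet<>(tagsByClient.keySet())) {
                if (used.contains(client)) {
                    continue;
                }
                if (isDistinct(client, requiredTags, tagsByClient, used)) {
                    result.add(client);
                }
            }
            if (!result.isEmpty()) {
                return result;
            }
        }
        return Collections.emptyList(); // every client is already used
    }

    // True if the client shares no value with any used client on the given tags.
    private static boolean isDistinct(String client,
                                      List<String> tags,
                                      Map<String, Map<String, String>> tagsByClient,
                                      Set<String> used) {
        for (String tag : tags) {
            String value = tagsByClient.get(client).get(tag);
            for (String usedClient : used) {
                if (Objects.equals(value, tagsByClient.get(usedClient).get(tag))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With the four nodes above and the precedence zone,cluster, the candidates after placing the active task on Node-1 are [Node-4]; once Node-4 holds the first standby, no client offers a new zone, so the candidates fall back to [Node-2, Node-3].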

Compatibility, Deprecation, and Migration Plan

...