...

Absolute Preferred Standby Task Distribution

Suppose we have the following infrastructure setup: three Kubernetes clusters, let us call them K8s_Cluster1, K8s_Cluster2 and K8s_Cluster3. Each Kubernetes cluster is spanned across three availability zones: eu-central-1a, eu-central-1b, eu-central-1c.

Our use-case is to have a distribution of the standby tasks across different Kubernetes clusters, as well as AZs, so we can be Kubernetes cluster and AZ failure tolerant.

With the new configuration options presented in this KIP, we will have the following:


Node-1:
instance.tag.cluster: K8s_Cluster1
instance.tag.zone: eu-central-1a
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-2:
instance.tag.cluster: K8s_Cluster1
instance.tag.zone: eu-central-1b
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-3:
instance.tag.cluster: K8s_Cluster1
instance.tag.zone: eu-central-1c
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-4:
instance.tag.cluster: K8s_Cluster2
instance.tag.zone: eu-central-1a
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-5:
instance.tag.cluster: K8s_Cluster2
instance.tag.zone: eu-central-1b
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-6:
instance.tag.cluster: K8s_Cluster2
instance.tag.zone: eu-central-1c
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-7:
instance.tag.cluster: K8s_Cluster3
instance.tag.zone: eu-central-1a
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-8:
instance.tag.cluster: K8s_Cluster3
instance.tag.zone: eu-central-1b
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-9:
instance.tag.cluster: K8s_Cluster3
instance.tag.zone: eu-central-1c
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2


With the infrastructure topology and configuration presented above, we can easily achieve Absolute Preferred standby task distribution. Absolute Preferred standby task distribution is achievable because the total number of tasks we have to allocate for any given stateful task is three (1 active task + 2 standby tasks), and the number of unique values we have for each tag is also three. So the formula for determining if Absolute Preferred standby task allocation is achievable can look like this:


num.standby.replicas <= (allInstanceTags.values().stream().map(Set::size).reduce(Integer.MAX_VALUE, Math::min) - 1) // -1 is for active task

1. Formula for determining if Absolute Preferred distribution is possible

Where allInstanceTags is a map of all client instance tags and has a signature of Map<String, Set<String>>
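The check above can be sketched as a small self-contained Java method. This is an illustrative sketch, not the actual KIP implementation; the class and method names are hypothetical. Note that the identity element for `Math::min` must be `Integer.MAX_VALUE`, since an identity of 0 would always collapse the minimum to 0.

```java
import java.util.Map;
import java.util.Set;

public class StandbyDistributionCheck {
    // Absolute Preferred distribution is achievable when every tag dimension
    // has at least num.standby.replicas + 1 unique values
    // (the extra value is reserved for the active task).
    public static boolean isAbsolutePreferredAchievable(
            Map<String, Set<String>> allInstanceTags, int numStandbyReplicas) {
        int minUniqueValues = allInstanceTags.values().stream()
                .map(Set::size)
                .reduce(Integer.MAX_VALUE, Math::min);
        return numStandbyReplicas <= minUniqueValues - 1;
    }

    public static void main(String[] args) {
        // Topology from this section: three clusters, three zones.
        Map<String, Set<String>> tags = Map.of(
                "cluster", Set.of("K8s_Cluster1", "K8s_Cluster2", "K8s_Cluster3"),
                "zone", Set.of("eu-central-1a", "eu-central-1b", "eu-central-1c"));
        // 2 standby replicas <= 3 unique values - 1, so the check passes.
        System.out.println(isAbsolutePreferredAchievable(tags, 2)); // true
    }
}
```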

Partially Preferred Standby Task Distribution

Suppose we have the following infrastructure setup: two Kubernetes clusters, let us call them K8s_Cluster1 and K8s_Cluster2, and each Kubernetes cluster is spanned across three availability zones: eu-central-1a, eu-central-1b, eu-central-1c.

Our use-case is similar to the previous section's: to have a distribution of the standby tasks across different Kubernetes clusters, as well as AZs, so we can be Kubernetes cluster and AZ failure tolerant.

With the new configuration options presented in this KIP, we will have the following:


Node-1:
instance.tag.cluster: K8s_Cluster1
instance.tag.zone: eu-central-1a
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-2:
instance.tag.cluster: K8s_Cluster1
instance.tag.zone: eu-central-1b
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-3:
instance.tag.cluster: K8s_Cluster1
instance.tag.zone: eu-central-1c
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-4:
instance.tag.cluster: K8s_Cluster2
instance.tag.zone: eu-central-1a
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-5:
instance.tag.cluster: K8s_Cluster2
instance.tag.zone: eu-central-1b
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2

Node-6:
instance.tag.cluster: K8s_Cluster2
instance.tag.zone: eu-central-1c
standby.task.assignment.awareness: zone,cluster
num.standby.replicas: 2


With the infrastructure topology presented above, we can't achieve Absolute Preferred standby task distribution, because we only have two unique cluster tags in the topology. The Absolute Preferred distribution could have been achieved with a third Kubernetes cluster (K8s_Cluster3) spanned across three AZs (as seen in the previous section).

Even though we can't achieve Absolute Preferred standby task distribution with the configuration presented above, we can still achieve Partially Preferred distribution.

Partially Preferred distribution can be achieved by distributing standby tasks over different zone tags. Zone has higher precedence than cluster in the standby.task.assignment.awareness configuration; therefore, Kafka Streams would prefer to distribute standby tasks over different zones rather than different clusters when the Absolute Preferred distribution check formula [1] returns false. Kafka Streams will be eligible to perform Partially Preferred standby task distribution when at least one instance tag has more unique values than num.standby.replicas (one value is reserved for the active task). So the formula for determining if Partially Preferred standby task allocation is doable will look like this:


num.standby.replicas <= (allInstanceTags.values().stream().map(Set::size).reduce(0, Math::max) - 1) // -1 is for active task

2. Formula for determining if Partially Preferred distribution is possible

Where allInstanceTags is a map of all client instance tags and has a signature of Map<String, Set<String>>
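As with the Absolute Preferred check, this can be sketched as a small Java method; here `Math::max` with identity 0 is safe because set sizes are non-negative. The class and method names below are hypothetical, not part of the KIP:

```java
import java.util.Map;
import java.util.Set;

public class PartialDistributionCheck {
    // Partially Preferred distribution is possible when at least one tag
    // dimension has enough unique values for the active task plus all
    // standby replicas, i.e. maxUniqueValues - 1 >= num.standby.replicas.
    public static boolean isPartiallyPreferredAchievable(
            Map<String, Set<String>> allInstanceTags, int numStandbyReplicas) {
        int maxUniqueValues = allInstanceTags.values().stream()
                .map(Set::size)
                .reduce(0, Math::max);
        return numStandbyReplicas <= maxUniqueValues - 1;
    }

    public static void main(String[] args) {
        // Topology from this section: two clusters, three zones.
        // The cluster tag only has two unique values, but the zone tag has
        // three, so Partially Preferred distribution is still achievable.
        Map<String, Set<String>> tags = Map.of(
                "cluster", Set.of("K8s_Cluster1", "K8s_Cluster2"),
                "zone", Set.of("eu-central-1a", "eu-central-1b", "eu-central-1c"));
        System.out.println(isPartiallyPreferredAchievable(tags, 2)); // true
    }
}
```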

...

The Least Preferred Standby Task Distribution

The Least Preferred standby task distribution is eligible when neither the Absolute Preferred nor the Partially Preferred standby task distribution can be satisfied.

Suppose we have the following infrastructure setup: two Kubernetes clusters, let us call them K8s_Cluster1 and K8s_Cluster2, and each Kubernetes cluster is spanned across two availability zones: eu-central-1a, eu-central-1b.

With the new configuration options presented in this KIP, we will have the following:

...

With the setup presented above, we can't distribute the second standby task in a different zone, as requested by the standby.task.assignment.awareness configuration, because there are only two distinct zones available (and one will be reserved for the active task). In this case Kafka Streams will default to using the Least Loaded client to allocate the remaining standby task.

...

  1. Node-4 (different cluster, different zone), LL([Node-2, Node-3])

Where LL is a function determining the least-loaded client based on active + standby task assignment.
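A minimal sketch of such an LL function, assuming load is simply the count of active plus standby tasks already assigned to each client (the names below are hypothetical, not the KIP's actual assignor code):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class LeastLoadedClient {
    // Picks the least-loaded client among the candidates, where "load" is
    // the number of active + standby tasks already assigned to that client.
    public static String leastLoaded(List<String> candidates,
                                     Map<String, Integer> taskCounts) {
        return candidates.stream()
                .min(Comparator.comparingInt(c -> taskCounts.getOrDefault(c, 0)))
                .orElseThrow();
    }

    public static void main(String[] args) {
        // Node-2 already holds 3 tasks, Node-3 holds 1, so LL picks Node-3.
        Map<String, Integer> counts = Map.of("Node-2", 3, "Node-3", 1);
        System.out.println(leastLoaded(List.of("Node-2", "Node-3"), counts)); // Node-3
    }
}
```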

...