...

I introduced a few new components.

1) Scheduler

The scheduler is a new component that includes many critical features.

It is in charge of two major features: 1. queueing and routing activations, and 2. deciding whether to add more containers based on the load.

It has a sub-component `Queue`. The role of the queue is similar to that of a Kafka topic. A dedicated queue is created for each action.

Each queue receives activation messages from Kafka for a given action and sends them to the ContainerProxy in response to requests from it.
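A minimal sketch of the per-action queue idea described above, assuming an in-memory stand-in rather than the real scheduler. The class name `ActionQueues` and its methods are hypothetical and are not taken from the proposal; the point is only that messages are kept per action and handed out when a container asks for them.

```python
from collections import defaultdict, deque


class ActionQueues:
    """Hypothetical in-memory stand-in for the scheduler's per-action queues."""

    def __init__(self):
        # One FIFO queue per action name, created lazily on first enqueue.
        self._queues = defaultdict(deque)

    def enqueue(self, action, activation):
        """Store an activation message that arrived for the given action."""
        self._queues[action].append(activation)

    def fetch(self, action):
        """Hand out the oldest pending activation for the action, or None."""
        pending = self._queues.get(action)
        return pending.popleft() if pending else None


queues = ActionQueues()
queues.enqueue("ns/hello", "activation-1")
queues.enqueue("ns/other", "activation-2")
# A container serving ns/hello only ever sees messages for ns/hello.
print(queues.fetch("ns/hello"))
```

Messages leave a queue only when a container requests them, which is what makes the pull-based model possible.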

2)

...

ETCD

ETCD is a distributed, reliable key-value store. It is mostly used for transactional support and information sharing among components.

...

In this version, I could not fully rule it out due to heavy dependencies. My final objective is to exclude it at least from the critical path.

It is also used to send queue creation requests to the scheduler.


4) Akka-

...

remote

Akka-cluster is used for schedulers to communicate with each other.

Akka-grpc requires defining a gRPC message, so akka-cluster has some advantages when it is used for simple inter-cluster communication.

It is also used to send queue creation requests to the scheduler.




The entire system comprises three flows: 1. queue creation flow, 2. container creation flow, 3. activation flow.

...

① When a new request arrives and there is no queue for the action, a PoolBalancer starts the queue creation flow by randomly choosing a scheduler and sending a CreateQueue request via akka-remote.

② If a scheduler receives a QueueCreation request, it checks whether there are already queues for the action. If there are already three queues (1 leader + 2 followers), it responds with Success. If there are fewer than three queues, it first tries an ETCD transaction and, if the transaction succeeds, writes information to ETCD indicating that queue creation is in progress.

③ In the meantime, other schedulers can also receive queue creation requests; their transactions will fail, and they will respond with Success because queue creation is already in progress.

④ The scheduler that wins the transaction selects the three schedulers with the fewest queues and sends a queue creation request to the QueueManager in each of them via Akka-cluster (remoting). This implies that all schedulers should be clustered using Akka-cluster.

⑤ Once a QueueManager receives the request, it creates a queue, sends a Start message to it, and waits for the Ack.

⑥ Each created queue tries to become the leader, and only one of them succeeds. The leader writes the endpoint of the scheduler to ETCD for activation clients (ContainerProxy) and responds with Ack[Leader|Follower] to the QueueManager. All followers watch the leader key and wait for a leadership change so that one of them can become the leader if the leader fails.

⑦ Once a QueueManager receives the Ack from the leader queue, it creates a MessageConsumer with the ActorRef of the queue. The MessageConsumer then fetches activation messages from Kafka and sends them to the leader queue.

③ The QueueManager creates a queue.
④ Once a queue is created, it tries to store its endpoint in ETCD so that containers can connect to the queue. It sends the queue endpoint data to the DataManagementService.

⑤ When the DataManagementService receives the data, it stores it in ETCD. PoolBalancers watch ETCD events for queue endpoints, so once a queue is created, a PoolBalancer no longer tries to create a queue for the given action and sends activation messages to the scheduler that has the queue. In the future, PoolBalancer would send activation requests to the scheduler directly.
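The transaction step in the flow above can be sketched as a put-if-absent check: only the scheduler that manages to write the "in progress" marker goes on to pick target schedulers, and every other scheduler simply reports Success. This is an illustration under stated assumptions, not the proposal's implementation. `FakeEtcd`, `handle_create_queue`, the key layout, and the `replicas` parameter are all hypothetical, and the in-memory dictionary only stands in for a real ETCD transaction.

```python
class FakeEtcd:
    """In-memory stand-in for ETCD; models only a put-if-absent transaction."""

    def __init__(self):
        self.data = {}

    def put_if_absent(self, key, value):
        """Write the key only if it does not exist yet; report whether we won."""
        if key in self.data:
            return False
        self.data[key] = value
        return True


def handle_create_queue(etcd, action, queue_counts, replicas=3):
    """Return the schedulers that should host the queue, or [] if we lost.

    queue_counts maps a scheduler name to the number of queues it hosts.
    """
    key = f"queue/{action}/creating"  # hypothetical key layout
    if not etcd.put_if_absent(key, "in-progress"):
        # Another scheduler already started queue creation for this action.
        return []
    # Prefer the schedulers with the fewest queues; ties break by name.
    ranked = sorted(queue_counts, key=lambda s: (queue_counts[s], s))
    return ranked[:replicas]


etcd = FakeEtcd()
counts = {"scheduler0": 4, "scheduler1": 1, "scheduler2": 2, "scheduler3": 0}
print(handle_create_queue(etcd, "ns/hello", counts))  # winner picks targets
print(handle_create_queue(etcd, "ns/hello", counts))  # later caller gets []
```

A real deployment would also need to clear the marker once creation completes or fails, which this sketch leaves out.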


2. Container Creation Flow

[Image: container creation flow diagram]

① When a new queue is created, it immediately sends a container creation request to the ContainerManager because there is no container yet.

② ContainerManager looks for available invokers. The word "available" means healthy invokers with enough resources.

ContainerManager selects one of the healthy invokers and sends a ContainerCreation request via the `invokerN` Kafka topic. While scheduling, it takes Blackbox and resource tags into account. Invokers can have heterogeneous resources, and tags describe such resources.

③ At the same time, ContainerManager registers a job for the container creation request with CreationJobManager.

CreationJobManager is in charge of managing container creation. It stores "in-progress" data for container creation in ETCD. If container creation takes too long, the job times out and CreationJobManager deletes the in-progress data so that further container creation can proceed.

④ Invoker receives the ContainerCreation message via Kafka. The step above happens asynchronously, so both steps happen concurrently.

⑤ Invoker creates a ContainerProxy for the given action.

⑥ When container creation finishes or fails, the invoker sends the ack (Success | Rescheduling) to the ContainerScheduler via the `creationAck` topic. The newly created ContainerProxy accesses ETCD, figures out the endpoint of the leader queue, and initializes the Akka-grpc client to fetch activations from the leader queue. The ContainerCreationMessage includes the target scheduler endpoint with the queue, enabling containers to connect to the right scheduler. In case any error happens while accessing the scheduler, the container tries to fetch the queue endpoint data from ETCD.

Once the CreationJobManager receives the ack message, it finishes the ContainerCreation in the normal case; if it receives a Rescheduling ack, it chooses another invoker and sends a ContainerCreation to it.

At the same time, once a new container is created, it stores its data in ETCD. This data is used while scheduling containers.
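The job bookkeeping described in this flow can be sketched as follows, assuming an in-memory dictionary in place of the in-progress data kept in ETCD. The method names, the string values for the ack, and the explicit `now` argument (used instead of a real clock so the behavior is deterministic) are hypothetical choices for this sketch, not details from the proposal.

```python
class CreationJobManager:
    """Hypothetical sketch: tracks in-progress container creation jobs."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.in_progress = {}  # creation_id -> (invoker, registered_at)

    def register(self, creation_id, invoker, now):
        """Record that a ContainerCreation request was sent to an invoker."""
        self.in_progress[creation_id] = (invoker, now)

    def expire(self, now):
        """Drop jobs that took too long so new creations are not blocked."""
        expired = [cid for cid, (_, started) in self.in_progress.items()
                   if now - started >= self.timeout_s]
        for cid in expired:
            del self.in_progress[cid]
        return expired

    def on_ack(self, creation_id, ack, invokers):
        """Finish the job; on a Rescheduling ack return another invoker to try."""
        entry = self.in_progress.pop(creation_id, None)
        if entry is None or ack != "Rescheduling":
            return None  # finished normally, or unknown/expired job
        failed_invoker, _ = entry
        candidates = [i for i in invokers if i != failed_invoker]
        return candidates[0] if candidates else None


manager = CreationJobManager(timeout_s=10)
manager.register("creation-1", "invoker0", now=0)
manager.register("creation-2", "invoker1", now=5)
print(manager.expire(now=12))
print(manager.on_ack("creation-2", "Rescheduling", ["invoker0", "invoker1"]))
```

In this sketch the caller is expected to register a new job for the invoker returned by `on_ack` before resending the request.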


3. Activation Flow

[Image: activation flow diagram]

① While the queue and the containers are being created, activations flow through the system as shown above. PoolBalancer just sends the request to the scheduler topic in Kafka. The MessageConsumer that subscribes to the topic receives the request and deserializes it. The message is sent to the leader queue. If there is no queue for the given activation message, it forwards the message to the proper scheduler by fetching the queue endpoint information from ETCD.

② A ContainerProxy requests activation messages via the FetchActivation API (Akka-grpc).

⑤ This request is delegated to the QueueManager. The QueueManager finds the queue for the action, forwards the ActivationRequest and receives the ActivationResponse. Finally the ActivationResponse is passed to the ContainerProxy.

③ The ActivationService sends a GetActivation request to the proper queue using long polling to avoid busy-waiting. The singleton QueueManager object holds the queue reference so that ActivationService can directly find the queue and send a request.

④ Once an activation is over, the ContainerProxy sends the result to PoolBalancer via the `completedM` Kafka topic. After that, ContainerProxy continuously calls the fetchActivation API again whenever an invocation finishes, so that we can maximize the reuse of containers and take advantage of back-pressure.

⑤ PoolBalancer receives the activation result via `completedM` topic and responds to clients.
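The pull-based loop in this flow can be sketched with Python's standard `queue` module standing in for the scheduler queue and the Akka-grpc call. The function names, the `max_idle_polls` stop condition, and the timeout value are assumptions made for this sketch. It shows two properties from the text: a fetch blocks for a while instead of busy-waiting, and a container asks for the next activation only after it has finished the current one, which is where the back-pressure comes from.

```python
import queue


def get_activation(q, timeout_s):
    """Long-poll style fetch: block until a message arrives or time runs out."""
    try:
        return q.get(timeout=timeout_s)
    except queue.Empty:
        return None


def container_proxy_loop(q, run, max_idle_polls=1, timeout_s=0.05):
    """Fetch, run, and fetch again until the queue stays empty long enough."""
    results = []
    idle_polls = 0
    while idle_polls < max_idle_polls:
        message = get_activation(q, timeout_s)
        if message is None:
            idle_polls += 1
            continue
        idle_polls = 0
        # The next fetch happens only after this invocation has finished.
        results.append(run(message))
    return results


pending = queue.Queue()
pending.put("activation-1")
pending.put("activation-2")
print(container_proxy_loop(pending, run=str.upper))
```

In the flow described above, each result would be sent to the `completedM` topic rather than collected in a list.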