...
Before any Flink job can be executed, one JobManager and one or more TaskManagers have to be started. A TaskManager registers with the JobManager by sending a `RegisterTaskManager` message. The JobManager acknowledges a successful registration with an `AcknowledgeRegistration` message. If the TaskManager is already registered with the JobManager, because multiple `RegisterTaskManager` messages were sent, the JobManager returns an `AlreadyRegistered` message. If the registration is refused, the JobManager responds with a `RefuseRegistration` message.
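The registration handshake can be sketched as a simple dispatch on the JobManager side. The class and method names below are illustrative stand-ins, not Flink's actual implementation:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the JobManager's registration handshake.
// Class, method, and state names are hypothetical, not Flink's real API.
public class RegistrationSketch {
    private final Set<String> registeredTaskManagers = new HashSet<>();
    private boolean acceptingRegistrations = true;

    // Handles a RegisterTaskManager message and returns the name of the reply.
    public String onRegisterTaskManager(String taskManagerId) {
        if (!acceptingRegistrations) {
            return "RefuseRegistration";
        }
        if (registeredTaskManagers.contains(taskManagerId)) {
            // A duplicate RegisterTaskManager message was sent.
            return "AlreadyRegistered";
        }
        registeredTaskManagers.add(taskManagerId);
        return "AcknowledgeRegistration";
    }

    public void refuseFurtherRegistrations() {
        acceptingRegistrations = false;
    }
}
```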
A job is submitted to the JobManager by sending it a `SubmitJob` message with the corresponding `JobGraph`. Upon receiving the `JobGraph`, the JobManager creates an `ExecutionGraph` out of it, which serves as the logical representation of the distributed execution. The `ExecutionGraph` contains the information about the tasks which have to be deployed to the TaskManagers in order to be executed.
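The expansion from job graph to deployable tasks can be illustrated with a minimal sketch, where each job vertex with parallelism p yields p subtasks. The array-based representation is a hypothetical stand-in for the real graph structures:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expanding a JobGraph into deployable subtasks,
// the way an ExecutionGraph tracks one execution per parallel instance.
public class ExecutionGraphSketch {
    // jobVertices[v] is a vertex name; parallelism[v] is its parallelism.
    public static List<String> toExecutionTasks(String[] jobVertices, int[] parallelism) {
        List<String> tasks = new ArrayList<>();
        for (int v = 0; v < jobVertices.length; v++) {
            for (int i = 0; i < parallelism[v]; i++) {
                // One entry per subtask that must be deployed to a TaskManager.
                tasks.add(jobVertices[v] + "[" + i + "/" + parallelism[v] + "]");
            }
        }
        return tasks;
    }
}
```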
The JobManager's scheduler is responsible for allocating execution slots on the available TaskManagers. After allocating an execution slot on a TaskManager, a `SubmitTask` message with all the information necessary to execute the task is sent to the respective TaskManager. A successful task deployment is acknowledged by a `TaskOperationResult` message. Once the sources of the submitted job are deployed and running, the job submission is considered successful. The JobManager informs the JobClient about this state by sending a `Success` message with the corresponding job ID.
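A minimal sketch of the slot-allocation step, assuming a hypothetical scheduler that pairs each task with the next free slot and records the `SubmitTask` message that would be sent:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative scheduler sketch, not Flink's actual slot manager:
// each task is matched to a free execution slot; tasks without a free
// slot stay unscheduled until a slot becomes available.
public class SchedulerSketch {
    public static List<String> schedule(String[] tasks, String[] freeSlots) {
        List<String> submitTaskMessages = new ArrayList<>();
        int slot = 0;
        for (String task : tasks) {
            if (slot >= freeSlots.length) break; // no free slot left
            // Record the SubmitTask message sent to the slot's TaskManager.
            submitTaskMessages.add("SubmitTask(" + task + " -> " + freeSlots[slot++] + ")");
        }
        return submitTaskMessages;
    }
}
```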
State updates of the individual tasks running on the TaskManagers are sent back to the JobManager via `UpdateTaskExecutionState` messages. With these update messages, the `ExecutionGraph` can be updated to reflect the current state of the execution.
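The bookkeeping this implies can be sketched as a per-task state map on the JobManager side; the state names and methods here are illustrative, not Flink's real execution states:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the JobManager's view of each task's execution state, updated
// by UpdateTaskExecutionState messages. Hypothetical names and states.
public class ExecutionStateSketch {
    private final Map<String, String> taskStates = new HashMap<>();

    // Applies an UpdateTaskExecutionState message to the tracked graph state.
    public void onUpdateTaskExecutionState(String taskId, String newState) {
        taskStates.put(taskId, newState);
    }

    // Tasks with no reported update yet are considered freshly created.
    public String stateOf(String taskId) {
        return taskStates.getOrDefault(taskId, "CREATED");
    }
}
```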
The JobManager also acts as the input split assigner for data sources. It is responsible for distributing the work across all TaskManagers such that data locality is preserved where possible. In order to dynamically balance the load, the tasks request a new input split after they have finished processing the old one. This request is realized by sending a `RequestNextInputSplit` message to the JobManager, which responds with a `NextInputSplit` message. If there are no more input splits, the input split contained in the message is `null`.
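The pull-based, locality-aware assignment can be sketched as follows. The class is a hypothetical stand-in; it prefers a split stored on the requesting host, falls back to any remaining split, and returns `null` when all splits are exhausted, mirroring the `null` payload of the final `NextInputSplit` message:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative input split assigner, not Flink's actual implementation.
public class InputSplitAssignerSketch {
    // Each entry is {splitId, hostName where the split's data resides}.
    private final List<String[]> splits = new ArrayList<>();

    public void addSplit(String splitId, String hostName) {
        splits.add(new String[]{splitId, hostName});
    }

    // Answers a RequestNextInputSplit from a task on the given host.
    public String requestNextInputSplit(String requestingHost) {
        // Prefer a split that is local to the requesting host.
        for (String[] split : splits) {
            if (split[1].equals(requestingHost)) {
                splits.remove(split);
                return split[0];
            }
        }
        // Otherwise hand out any remaining split; null means "no more splits".
        return splits.isEmpty() ? null : splits.remove(0)[0];
    }
}
```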
Tasks are deployed lazily to the TaskManagers. This means that tasks which consume data are only deployed after one of their producers has finished producing some data. Once the producer has done so, it sends a `ScheduleOrUpdateConsumers` message to the JobManager. This message says that the consumer can now read the newly produced data. If the consuming task is not yet running, it will be deployed to a TaskManager.
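The lazy-deployment decision can be sketched as a single branch on the JobManager: deploy the consumer if it is not yet running, otherwise just notify it. All names below are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of lazy consumer deployment triggered by ScheduleOrUpdateConsumers.
// Illustrative only; Flink's real deployment logic is more involved.
public class LazyDeploymentSketch {
    private final Set<String> runningTasks = new HashSet<>();

    public LazyDeploymentSketch(String... initiallyRunning) {
        for (String task : initiallyRunning) runningTasks.add(task);
    }

    // Handles a producer's ScheduleOrUpdateConsumers message for one consumer:
    // deploy the consuming task if needed, otherwise tell it new data is readable.
    public String onScheduleOrUpdateConsumers(String consumer) {
        if (runningTasks.contains(consumer)) {
            return "UpdateConsumer(" + consumer + ")";
        }
        runningTasks.add(consumer); // deploy the consuming task now
        return "DeployTask(" + consumer + ")";
    }
}
```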