ID	IEP-103
Author	Andrey Gura
Sponsor	Andrey Gura
Created	23 May 2023
Status	IN PROGRESS

Motivation

In order to perform any computations on a cluster the product should provide a flexible mechanism for code deployment at runtime. Because any software application evolves over time it should be possible to perform code deployment seamlessly and without cluster downtime or node restart. Also it is a common approach to use a versioning and an isolated execution context for code execution on a cluster.

Basic requirements

It should be possible to deploy/undeploy code for execution using CLI and REST API. This proposal doesn’t require a public Java API for the code deployment because usually code deployment is related to development operations rather than application development process.
A deployment/undeployment process must be reliable in a context of distributed systems. If a deployment unit is deployed on a cluster then there are no nodes which don’t have access to this deployment unit or can’t request this deployment unit from another node. If a deployment unit is undeployed then there are no nodes which have access to this deployment unit.
Due to network unreliability the checksum checking must be used during transferring deployment units over the network.
Code deployment must not require any downtime and have to be performed at runtime.
Deployment unit must have an identification mechanism which is suitable for people and machines.
Deployment unit management should provide statuses which reflect a deployment unit’s life cycle and prevent concurrency and code execution isolation issues.
It must be possible to get a list of deployment units that are deployed on a cluster or a particular node and status of each deployment unit.
Compute API must provide an ability to define an execution context - a list of dependencies (deployment units with versions) for a particular job execution. It allows isolated code execution.
It must be possible to avoid usage of an explicit deployment unit version in defining execution context. Instead it should be possible to define usage of the latest version of a particular deployment unit (LATEST notation).
Compute API still supports only types of parameters and return values defined in Compute API in Apache Ignite 3: Phase 1 - Simple remote job execution: Interoperability.
It must be impossible to undeploy a deployment unit if there are any job executions in progress which require these deployment units.
It must be impossible to start a new job which requires a particular deployment unit in case of undeploy of a deployment unit is requested.
It must be impossible to start a job execution in case of absence of a required deployment unit. An error should be returned to the client code.
If a deployment unit is undeployed and there are no any job executions in progress which require this deployment unit then all class loaders and classes loaded by these class loaders must be eventually collected by GC.

Definitions

Deployment unit - a set of artifacts required for code execution on cluster nodes (e.g. Java class files, JAR archives, configuration files and other resources). A deployment unit (one or several ones) defines a context for some code execution (e.g. job execution).

Deployment unit ID - an arbitrary text identifier (e.g. name) which will be used by a user and Ignite 3 cluster for deployment unit manipulations (e.g. deploy\undeploy). The ID must satisfy Maven groupId naming conventions (https://maven.apache.org/guides/mini/guide-naming-conventions.html).

Deployment unit version - a comparable version in major.minor.patch format (semantic versioning). Deployment unit ID and deployment unit version uniquely identify a particular deployment unit. In fact, the version is some kind of redundant information but it could be useful for an end user (e.g. “use latest” scenario or in a development process).

Deployment unit status - defines a state of some deployment unit in a cluster or a node. Deployment unit status allows:

prevent concurrent deployment of the same deployment units.
prevent execution of code which requires a deployment unit scheduled for removal from the cluster.
track deployment progress.

Deployment units base directory - a file system directory where all deployment units must be placed.

Deployment unit directory - a directory under the deployment units base directory where a particular deployment unit is placed.

Design

Deployment unit operations overview

There are two perspectives for deployment unit operations: a user perspective and a cluster perspective.

From a user perspective 3 main operations are allowed for deployment unit:

Deploy a unit on the cluster
Undeploy (remove) a deployment unit from the cluster
Get status about deployment units in the cluster: list of all deployment units in the cluster with their statuses (will be defined below).

Also, it would be useful to have an ability to get a list of all deployment units on a specific node. The main scenario for this operation is troubleshooting.

From a cluster perspective a deployment unit could be deployed on the cluster, but it could be not deployed on a particular node. So additional operation is required: deploy unit on the target node (on demand).

Command line interface

Unit deployment

Deploys a deployment unit with identifier <deployment_unit_id> to the cluster.

$ ignite3 unit deploy <deployment_unit_id> --version <deployment_unit_version> --path <file>|<dir>

Options:

--version - defines the deployment unit’s version (<deployment_unit_version>).
--path - path to JAR-file (<file>) or to the directory (<dir>) which contains all required class files and resources.

Unit removal

Removes a deployment unit with identifier <deployment_unit_id> from the cluster (and all available nodes).

$ ignite3 unit undeploy <deployment_unit_id> --version <deployment_unit_version>

Options:

--version - defines the deployment unit’s version (<deployment_unit_version>).

List of units on the cluster or on the node

Prints out a list of all deployment units or given deployment unit with identifier <deployment_unit_id> on the cluster or node and their status.

$ ignite3 unit list [<deployment_unit_id>] [--version <deployment_unit_version>]] [--node <node_id>]  [--status [UPLOADING[, DEPLOYED [, …]]]]

Output example:

| Unit             | Version | Status    |
| foo.example.job  | 1.0.0   | DEPLOYED  |
| foo.example.job  | *1.0.1  | DEPLOYED  |
| foo.example.job  | 1.0.2   | UPLOADING |
| foo.example.task | *1.0.0  | DEPLOYED  |

The asterisk symbol (*) in the Version column highlights the latest deployed version of a deployment unit.

Options:

--version - filters out deployment unit by version (exact match assumed).
--node - defines target node.
--status - allows to filter out deployment units by given statuses.

Directories structure

All deployment units must be placed in the deployment unit base directory which is a subdirectory under Ignite work directory:

<ignite_work_dir>/deployments

For each deployment unit a directory should be created under the base directory. The name of this directory must be the same as deployment unit ID and a nested directory for a particular deployment unit version must be created.

Example:

- deployments
	- foo.example.job 
		- 1.0.0
		- 1.0.1
	- foo.example.task
		- 1.0.0
		- 2.0.0

Deployment unit operations

A deployment unit is deployed on the cluster if it was uploaded at least on the CMG Raft group majority nodes. It will allow the system to request a needed deployment unit from these nodes on demand. A corresponding record is present in meta storage.

A deployment unit is deployed on a node if it was uploaded to this particular node (during regular deployment or on-demand). A corresponding record is present in meta storage.

The deployment unit operations should take into account that some processes could be started concurrently, so design should prevent concurrent process execution in order to avoid race conditions and problems like ABA.

In order to satisfy all requirements the following terms are introduced:

depOpId - ID of deployment operation. It needs to avoid concurrency issues. Should be unique. Meta storage revision number

nodeDURecord - status of a deployment unit on a node. Abstract structure nodeDURecord{depOpId, status, (node1, node2)} means that the deployment unit’s status is status on nodes node1 and node2 and deployment operation ID is depOpId.

clusterDURecord - status of a deployment unit on a cluster. Abstract structure clusterDURecord{depOpId, status} means that the deployment unit’s status in the cluster is status and deployment operation ID is depOpId.

Deployment unit status

The status defines a state of deployment unit in the cluster. The following statuses are allowed:

UPLOADING - unit deployment to a cluster/node is in progress.
DEPLOYED - unit is deployed on a cluster/node.
OBSOLETE - remove command was initiated for the unit and it will be removed soon from a cluster/node.
REMOVING - unit removal from a cluster/node is in progress.

Unit deployment process

The deployment unit must be uploaded to the CMG Raft group majority nodes (Fig. 1 shows an example of such a majority with an included leader but it is not mandatory). It will ensure that the system is able to detect the change of leader and deploy units correctly; the leader is always a part of the majority during the uploading process.

While majority nodes are available there is at least one node which has an uploaded deployment unit and this deployment unit could be deployed on another node on demand. Otherwise the cluster itself can’t operate correctly.

Fig. 1

Fig. 1

The following steps must be performed during the uploading process:

Choose a majority which contains available nodes.
Supplement chosen set of target nodes by user provided nodes if needed.
Assign depOpId the meta storage revision number.
Create meta storage record for given deployment unit clusterDURecord(depOpId, UPLOADING). This operation could fail because another process has already created a record for the same deployment unit in any status.
Initiate uploading of files to the target nodes and create meta storage record for given deployment unit and node nodeDURecord(depOpId, UPLOADING, node).
As soon as the deployment unit is uploaded to the node, nodeDURecord.status must be changed to DEPLOYED value - nodeDURecord(depOpId, DEPLOYED, node).
As soon as the deployment unit is uploaded to the target nodes, clusterDURecord.status must be changed to DEPLOYED value - clusterDURecord(depOpId, DEPLOYED).

A deployment unit is considered deployed to the cluster only if clusterDURecord.status == DEPLOYED. It means that there is at least one available node which can be a source for downloading the deployment unit on demand.

Unit deployment on-demand process

If code execution is initiated on some node (due to a Compute API invocation) and this target node doesn’t have a required deployment unit it could be requested from some node which already has the required deployment unit. It means that the required deployment unit has clusterDURecord.status == DEPLOYED and there is at least one node where nodeDURecord.status == DEPLOYED. If this conditions are met the following steps must be performed:

Add meta storage record for target node and deployment unit where nodeDURecord.status == UPLOADING and nodeDURecord.depOpId == clusterDUStatus.depOpId. The node should prevent concurrent races between different threads. It’s achievable without any meta storage operations because there is no other node which can initiate the on-demand deployment process.

Note that if clusterDURecord.status == DEPLOYED && nodeDURecord.status == UPLOADING => deployment on-demand is in progress for a given deployment unit and given node.
As soon as the deployment unit is uploaded to the target node, nodeDURecord.status must be changed to DEPLOYED value.

Node restart during unit deployment process

If a target node was restarted during uploading process then the node must find all deployment units with nodeDURecord.status == UPLOADING for restarted node and if clusterDURecord.status == DEPLOYED all such deployment unit should be removed from the node. Corresponding statuses also must be removed from meta storage for this node. Deployment units could be requested again if needed and deployment on-demand will be initiated for this case.

Note, if the target node is in majority then the process which coordinates deployment process should ensure that either another node from majority will be chosen as target or restarted node will repeat the process.

Deployment unit removal process

The deployment unit must be removed from all cluster nodes. In order to achieve this the following must be implemented:

Change clusterDURecord.status to OBSOLETE value. This operation could fail because another process has already changed status to OBSOLETE or REMOVING value. It is also impossible to start an undeployment process in case the deployment process is still in progress.

After this step the deployment unit is not available for new code execution. Code execution in progress still can use this deployment unit.
Meta storage event must be fired to all target nodes due to a change of clusterDURecord.status.
After receiving this event by the target node the system must change nodeDURecord.status to OBSOLETE value.
The node waits for finishing of all code executions in progress that depend on this deployment unit. As soon as all code executions are finished nodeDURecord.status must be changed to REMOVING value.

From this point it is impossible to use the deployment unit for code execution neither for new tasks nor for old tasks (the second is impossible due to the invariant that all old tasks are finished).
For each change of nodeDURecord.status to REMOVING value the system is able to receive an event from meta storage and check that all nodes have nodeDURecord.status == REMOVING. If the condition is met then clusterDURecord.status must be changed to REMOVING too.
Now the deployment unit can be removed from each target node and, after it, remove corresponding status records.
For each removal of nodeDURecord record from meta storage the system is able to receive an event from meta storage and check that there are no any nodeDURecord records for the given deployment unit. Now the system must remove the clusterDURecord record for the deployment unit.

Note that If the deployment unit was removed then there are no any class loaders associated with this deployment unit. Eventually the class loader should be collected by GC and all classes must be unloaded from JVM. It is the critical requirement in order to avoid memory leaks related to multiple class loading/unloading.

Node restart during unit removal process

If a target node was restarted during deployment unit removal process then the node must find all deployment units with clusterDURecord.status == OBSOLETE or clusterDURecord.status == REMOVING for restarted node and finish deployment unit removal process as described in the previous section.

Validation after node restart

Every deployment unit with nodeDURecord.status == DEPLOYED must be validated in order to prevent a situation when the deployment unit was undeployed and then deployed again with the same deployment unit ID and version but with different content (some kind of ABA problem). The nodeDURecord.depOpId and corresponding clusterDURecord.depOpId must be the same. Otherwise the deployment unit should be removed from the node.

For every nodeDURecord(depOpId, UPLOADING) record corresponding clusterDURecord(depOpId, DEPLOYED) or clusterDURecord(depOpId, UPLOADING) must exist. Otherwise the invalid deployment unit must be removed from the node.

For every nodeDURecord(depOpId, DEPLOYED) record corresponding clusterDURecord(depOpId, DEPLOYED) or clusterDURecord(depOpId, UPLOADING) must exist. Otherwise the invalid deployment unit must be removed from the node.

For every nodeDURecord(depOpId, OBSOLETE) or nodeDURecord(depOpId, REMOVING) record corresponding clusterDURecord(depOpId, OBSOLETE) or clusterDURecord(depOpId, REMOVING) must exist. Otherwise the invalid deployment unit must be removed from the node.

Code execution

Execution context

A user should have an ability to have different execution contexts where different versions of classes with the same names could be used. This context is like a web app context in servlet containers but the context in Apache Ignite Compute API is defined by a list (with pre-defined order) of deployment units the job execution depends on which.

Use case 1: from parent to specific

For example Apache Ignite cluster is shared between two teams which have, nevertheless, the common set of shared libraries and parent logic related libraries. In this uses cases there are 3 deployment units:

parent:1.0.0
team1:1.0.0
team2:1.0.0

In this case the parent:1.0.0 deployment unit contains most of the application logic while deployment units team1:1.0.0 and team2:1.0.0 contain some specific entities and reuse classes from the parent deployment unit a lot. The common approach here is finding class at parent libs and then in the specific libs. So each team will define an execution context in the following way:

Team1 - [parent:1.0.0, team1:1.0.0]
Team2 - [parent:1.0.0, team2:1.0.0]

The business logic of each team is fully isolated while shared libs are available to both teams.

Use case 2: override parent by specific

In this use case a user has an application and wants to override some piece of business logic from it. So the user has the following deployment units:

app:1.0.0
app:1.0.1

In this case app:1.0.1 overrides some logic from app:1.0.0 and the user will specify execution context as [app:1.0.1, app:1.0.0].

Chained Ignite class loader

Chain Ignite class loader is the class loader which resolves classes in order of definition of dependencies (from the first to the last).

Class loaders hierarchy

Fig. 2

For user defined classes loading and execution are responsible class loaders. The class loaders are built in the hierarchy pictured above.

The Bootstrap, the extension and the system class loaders are provided by the runtime environment and each class loader first delegates class loading logic to the parent class loader.

Such an approach is not suitable for all user use cases. For example it is impossible to override behavior of some classes. Apache Ignite 3 provides more flexible behavior of class loaders which could be managed by a user - Ignite chained class loader. In general, the Ignite class loader behave like web application classloader in popular web application containers (e.g. Apache Tomcat):

Ignite class loader disallows to override classes from the following packages:
- org.java.*
- org.javax.*
- org.apache.ignite.*
If the system tries to find a class from packages listed above then the Ignite class loader should delegate this logic to the parent class loader first (in our hierarchy it is the system class loader).

Ignite class loader allows to override any non-system and non-Ignite class. If the system tries to find class from any package that is not listed in the previous item then Ignite class loader should try to find class over corresponding deployment unit and delegate this logic to the parent class loader in case of failure.

Compute API changes

`Version`

Version entity should be introduced to the public API with the following properties:

major - major version, integer number.
minor - minor version, integer number.
patch - patch version, integer number.

Usage of “pre-release” field (e.g. beta, RC) is also acceptable (see Semantic versioning) but not required.

Version.LATEST special value also should be available. This value denotes any version number that has the highest value.

The entity must be comparable.

`DeploymentUnit`

DeploymentUnit entity should be introduced to the public API with the following properties:

name - name of the deployment unit, string value.
version - version of the deployment unit, instance of Version entity.

`IgniteCompute` interface

All methods execute, executeColocated and broadcast which receive type Class as parameter must be removed because it is impossible to get Class instance for the task from class loader which is not bootstrap, system or extension. Instead methods which get task class as String must be used.
All methods execute, executeColocated and broadcast must get DeploymentUnit’s array as parameter in order to define execution context (dependencies and order of class loading).

Dependency resolving and class loading

Regular behavior

As soon as code execution is initiated due to invocation one of the methods execute, executeColocated or broadcast the following steps must be performed:

Get a class loader which is identified by an array of DeploymentUnit instances passed to the called method.
If the class loader doesn’t exist yet, then create it:
1. If the deployment unit is deployed and can be used for code execution then use its location for class loading
2. If the deployment unit is deployed to the cluster but is not deployed to the node then initiate deployment on-demand.
3. If the deployment unit doesn’t exist, throw the public DeploymentUnitUnavailableException with “<class_fqdn>. Deployment unit <deployment_unit_id_and ver> doesn’t exist” message, where <class_fqdn> is a compute job/class class name, <deployment_unit_id_and_ver> is a concatenation of ID and version (e.g. com.example:1.0.0).
1. If the deployment unit exists but can’t be used for code execution, throw the public DeploymentUnitUnavailableException with “<class_fqdn>. Deployment unit <deployment_unit_id> can’t be used: [clusterStatus = <clusterDURecord.status>, nodeStatus = <nodeDURecord.status>]” message, where <class_fqdn> is a compute job/class class name.

For each deployment unit check that it is deployed:
Check that deployment unit can be used for code execution:

Load a class of task/job passed to the Compute API method as parameter. At this point ClassNotFoundException could be thrown from the class loader. Consider this exception as a job execution time exception (see the next step).
Execute required code. At this point any standard Java exception or job specific exception could be thrown. Consider any of these exceptions as a job execution time exception and throw public ComputeExecutionException which contains an original exception as a cause.

In order to have the possibility to detect that some code is in progress and uses some class loader a reference counter could be used. The counter must be incremented when job execution is started and decremented when code execution is finished.

LATEST semantics

The only difference between regular behavior and LATEST semantics is the need to resolve the latest version of an available deployment unit before every code execution. It means that two consecutive calls of the same task/job could lead to creation of different class loaders because a new version of the deployment unit can be deployed between these calls.

Known issues and future improvements

It is not clear what to do in case of CMG reconfiguration because the new CMG can consist of nodes which are not related to the previous CMG.
UX. Probably it is a good idea to add purge command to the operations on deployment units. This command is like undeploy but ignores all conditions and just removes all deployment units and statuses from meta storage. It could lead to errors during code execution but also could be the only method to remove the deployment unit in case of unexpected problems.
UX. Remove all versions of deployment unit by version. Maybe this will be useful during the development phase.
UX. Redeploy. Maybe it will be useful to have atomic undeployment and deployment of deployment units. But due to the asynchronous nature of the deployment process it could be hard to implement properly in order to provide good UX.
UX, maintenance, support. It will be useful to add the possibility to validate deployment units. We can extend the list command with --validate flag for example.There are at least two cases for this functionality:
1. Validation of invariants (there is no DUs in odd statuses)
2. Validation of deployed DUs over checksums in order to detect:
  1. File corruption
  2. ABA-problem consequences
It could be really handy to choose between lazy and eager deployment of units (per deployment unit basis) including autodeploy of units on a node join. Also it is a good idea to have some kind of force deployment on some subset of nodes (defined by predicate). Will be designed a bit later.
It seems that it is possible to deploy Java lambdas as some kind of temporary deployment unit with limited life time. Will be designed later.

References

[RFC] Compute API in Apache Ignite 3: Phase 1 - Simple remote job execution

Open Tickets

key	summary	type	created	updated	due	assignee	reporter	customfield_12311032	customfield_12311037	customfield_12311022	customfield_12311027	priority	status	resolution
JQL and issue key arguments for this macro require at least one Jira application link to be configured

Closed Tickets

key	summary	type	created	updated	due	assignee	reporter	customfield_12311032	customfield_12311037	customfield_12311022	customfield_12311027	priority	status	resolution
JQL and issue key arguments for this macro require at least one Jira application link to be configured

Page tree

IEP-103: Code Deployment

Motivation

Basic requirements

Definitions

Design

Deployment unit operations overview

Command line interface

Unit deployment

Unit removal

List of units on the cluster or on the node

Directories structure

Deployment unit operations

Deployment unit status

Unit deployment process

Unit deployment on-demand process

Node restart during unit deployment process

Deployment unit removal process

Node restart during unit removal process

Validation after node restart

Code execution

Execution context

Use case 1: from parent to specific

Use case 2: override parent by specific

Chained Ignite class loader

Class loaders hierarchy

Compute API changes

Version

DeploymentUnit

IgniteCompute interface

Dependency resolving and class loading

Regular behavior

LATEST semantics

Known issues and future improvements

References

Open Tickets

Closed Tickets

`Version`

`DeploymentUnit`

`IgniteCompute` interface