ID

IEP-17

Author

Sponsor

Created

03 Apr 2018

Status


DRAFT

Motivation

The current service deployment procedure depends on an internal replicated cache. Each service deployment is a distributed transaction on this cache. This procedure has proved to be deadlock-prone on an unstable topology.

...

Currently, when a service implementation or configuration changes, existing instances cannot be redeployed without manual undeployment. GridServiceProcessor has access only to the serialized representation of services, so it cannot tell whether anything has changed since the previous deployment.

Description

This section contains a description of the proposed service deployment protocol.

Discovery-based deployment

To make the service deployment process more reliable on an unstable topology and to avoid the stuck deployments that are possible in the current architecture, service deployment should be based on the distribution of custom discovery messages.

Successful scenario

Deployment starts with sending a custom discovery message that notifies all nodes in the cluster about the ongoing deployment. This message contains the serialized service instance and its configuration. It is delivered first to the coordinator node, which calculates the service deployment assignments and adds them to the message. During the following round trip of this message, nodes save the service deployment assignments to local storage, and the nodes that were chosen to deploy the service do so asynchronously in a dedicated thread pool.
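The flow above can be illustrated with a minimal sketch in plain Java. It does not use the Ignite API: the class DeploymentRingSketch, the DeploymentMessage type, the round-robin assignment policy and the in-memory map standing in for local storage are all assumptions made for illustration, not part of this proposal.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/** Hypothetical sketch of a deployment request travelling around the discovery ring. */
final class DeploymentRingSketch {
    /** Hypothetical message: service name, instance count, and assignments added by the coordinator. */
    static final class DeploymentMessage {
        final String serviceName;
        final int totalInstances;

        /** Node id -> number of instances to start; empty until the coordinator fills it in. */
        final Map<String, Integer> assignments = new LinkedHashMap<>();

        DeploymentMessage(String serviceName, int totalInstances) {
            this.serviceName = serviceName;
            this.totalInstances = totalInstances;
        }
    }

    /** Coordinator step: spread instances over nodes round-robin (an assumed policy). */
    static void assign(DeploymentMessage msg, List<String> nodeIds) {
        for (int i = 0; i < msg.totalInstances; i++)
            msg.assignments.merge(nodeIds.get(i % nodeIds.size()), 1, Integer::sum);
    }

    /**
     * Node step: save the assignments locally, then "deploy" in the dedicated pool.
     * The returned future yields the number of instances this node was asked to start;
     * real service initialization is not modelled here.
     */
    static CompletableFuture<Integer> onMessage(String localNodeId, DeploymentMessage msg,
        Map<String, Map<String, Integer>> localStore, ExecutorService deployPool) {
        localStore.put(msg.serviceName, new LinkedHashMap<>(msg.assignments));

        int cnt = msg.assignments.getOrDefault(localNodeId, 0);

        return CompletableFuture.supplyAsync(() -> cnt, deployPool);
    }
}
```

In this sketch every node stores the full assignment map, including nodes that deploy nothing, which matches the requirement that all nodes know the assignments when the message completes its round trip.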

...

Once all deployment results are collected, the coordinator sends another discovery message notifying all nodes about the successful deployment. At this moment the deployment futures are completed and control is returned from the IgniteServices#deploy* methods. The Service#execute() method also starts its work when the successful deployment message arrives.
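The completion step can be sketched as follows. This is a simplified illustration under stated assumptions: the class DeploymentCompletionSketch and its methods are hypothetical, and it keeps the coordinator-side bookkeeping and the initiator-side future in one object only for brevity.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: collecting deployment results and completing the deployment future. */
final class DeploymentCompletionSketch {
    /** Assigned nodes that have not reported a result yet (coordinator-side state). */
    private final Set<String> pending;

    /** Future that a deploy call would wait on (initiator-side state). */
    final CompletableFuture<Void> deployFuture = new CompletableFuture<>();

    DeploymentCompletionSketch(Set<String> assignedNodes) {
        pending = new HashSet<>(assignedNodes);
    }

    /**
     * Coordinator side: records a result from a node.
     * Returns true only when the last outstanding result arrives,
     * which is when the success message should be sent.
     */
    boolean onResult(String nodeId) {
        return pending.remove(nodeId) && pending.isEmpty();
    }

    /** Any node: arrival of the success message completes the deployment future. */
    void onSuccessMessage() {
        deployFuture.complete(null);
    }
}
```

Duplicate or unexpected results return false in this sketch, so the success message would be triggered at most once per deployment.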

Failure during deployment

There are three types of errors that should be handled correctly:

1. Error during service initialization on a node included in the assignment. In this case the problematic node sends the failure details to the coordinator over the communication protocol. Once the coordinator receives the failure details, it sends a discovery message containing this information to all nodes, so that the deploying methods can throw a corresponding exception.
2. Failure of a node included in the assignment. This situation triggers recalculation of the service deployment assignments. The coordinator node sends another discovery message with a set of new assignments in it. If a node has already initialized a service and is not present in the new assignment set, the service should be cancelled on that node.
3. Coordinator failure. This situation is processed in a similar way to the previous one. The only difference is that the nodes should resend their deployment results to the new coordinator.
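The reassignment case can be sketched in plain Java as shown here. The class ReassignmentSketch, the round-robin policy and the per-node instance counts are assumptions made for illustration; the proposal does not prescribe how assignments are recalculated.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: recalculating assignments after a node failure. */
final class ReassignmentSketch {
    /** Coordinator step: recompute assignments over the surviving nodes (assumed round-robin). */
    static Map<String, Integer> reassign(int totalInstances, List<String> aliveNodes) {
        Map<String, Integer> res = new HashMap<>();

        for (int i = 0; i < totalInstances; i++)
            res.merge(aliveNodes.get(i % aliveNodes.size()), 1, Integer::sum);

        return res;
    }

    /**
     * Node step: how many already initialized local instances must be cancelled
     * because the new assignments give this node fewer instances, or none at all.
     */
    static int instancesToCancel(String localNodeId, Map<String, Integer> oldAssignments,
        Map<String, Integer> newAssignments) {
        int before = oldAssignments.getOrDefault(localNodeId, 0);
        int after = newAssignments.getOrDefault(localNodeId, 0);

        return Math.max(0, before - after);
    }
}
```

A node that is absent from the new assignment map is treated as having zero instances, so everything it had initialized is cancelled.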

Service versioning

TBD

Risks and Assumptions

These changes will completely break compatibility with previous versions of Apache Ignite.

Discussion Links

Service grid redesign: http://apache-ignite-developers.2346864.n4.nabble.com/Service-grid-redesign-td28521.html

Service versioning: http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html

Tickets

ASF JIRA: IGNITE-3392

...
