Versions Compared

Comment: Move persistence to future work

...

Once all deployment results are collected, the coordinator sends another discovery message, notifying all nodes about the successful deployment. At this moment the deployment futures are completed and control is returned from the IgniteServices#deploy* methods. The Service#execute() method also starts its work when the successful deployment message arrives.
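The completion rule above can be sketched in plain Java. This is only an illustration under stated assumptions: `DeploymentTracker`, `onNodeResult` and the string node IDs are hypothetical names, not part of the Ignite API, and the discovery message itself is not modelled. The future stands in for the deployment future that is completed once every assigned node has reported its result.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical coordinator-side tracker; not an Ignite class. */
final class DeploymentTracker {
    /** Assigned nodes that have not reported a deployment result yet. */
    private final Set<String> pending;

    /** Stand-in for the deployment future returned to the caller. */
    private final CompletableFuture<Void> deployed = new CompletableFuture<>();

    DeploymentTracker(Set<String> assignedNodeIds) {
        pending = new HashSet<>(assignedNodeIds);

        // Nothing to wait for if no node was assigned.
        if (pending.isEmpty())
            deployed.complete(null);
    }

    /** Called when an assigned node reports successful initialisation. */
    synchronized void onNodeResult(String nodeId) {
        // Complete only when the last outstanding result arrives.
        if (pending.remove(nodeId) && pending.isEmpty())
            deployed.complete(null);
    }

    CompletableFuture<Void> future() {
        return deployed;
    }
}
```

In the design described here the completion would be triggered on every node by the second discovery message; the sketch collapses that into a single local future for brevity.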

After that, all nodes persist service configurations to the meta-space, so deployed services survive cluster restarts.

Failure during deployment

...

  • Error during service initialization on a node included in the assignment. In this case the problematic node sends failure details to the coordinator over the communication protocol. Once the coordinator receives the failure details, it sends a discovery message containing this information to all nodes, so the deploying methods can throw a corresponding exception.
  • Failure of a node included in the assignment. This situation triggers recalculation of service deployment assignments. The coordinator node sends another discovery message with a set of new assignments in it. If a node has already initialized a service and is not present in the new assignment set for it, then the service should be cancelled on that node.
  • Coordinator failure. This situation is processed in the same way as the previous one. The only difference is that the nodes should resend deployment results to the new coordinator.
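The reassignment case can be illustrated with a small, self-contained sketch. It assumes that assignments are represented as a map from service name to the set of node IDs that should run the service; this representation and the names `ReassignmentSketch` and `servicesToCancel` are hypothetical and are not taken from the Ignite code base.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; not an Ignite class. */
final class ReassignmentSketch {
    /**
     * Returns the locally initialised services that the new assignments
     * no longer place on the local node and that should be cancelled.
     */
    static Set<String> servicesToCancel(
        String localNodeId,
        Set<String> locallyInitialised,
        Map<String, Set<String>> newAssignments
    ) {
        Set<String> toCancel = new TreeSet<>();

        for (String svc : locallyInitialised) {
            Set<String> nodes = newAssignments.get(svc);

            // Cancel if the service is absent from the new assignments
            // or is assigned to other nodes only.
            if (nodes == null || !nodes.contains(localNodeId))
                toCancel.add(svc);
        }

        return toCancel;
    }
}
```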

Deployment on new nodes [In progress]

When a new node connects to the existing cluster, all needed services are deployed and initialised on it by the time it is accepted to the topology.

This is what happens when a new node joins the cluster:

  1. the connecting node sends a TcpDiscoveryJoinRequestMessage with persisted service configurations attached to it;
  2. the coordinator recalculates service assignments and attaches them to the subsequent TcpDiscoveryNodeAddedMessage;
  3. the connecting node receives the assignments, initialises all needed services and, on completion, sends a confirmation to the coordinator over communication;
  4. the coordinator sends TcpDiscoveryNodeAddFinishedMessage only when it receives the confirmation about deployed services from the joining node.
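The ordering constraint in the steps above, namely that the final message is held back until the joining node confirms its services, can be sketched as a tiny state machine. `JoinHandshakeSketch` and its method names are hypothetical and only model the order of events; no real discovery messages are sent.

```java
/** Hypothetical model of the coordinator's view of one joining node. */
final class JoinHandshakeSketch {
    enum Step { JOIN_REQUEST_RECEIVED, ASSIGNMENTS_SENT, SERVICES_CONFIRMED, ADD_FINISHED }

    private Step step = Step.JOIN_REQUEST_RECEIVED;

    /** Step 2: assignments go out with the node-added message. */
    void sendAssignments() {
        advance(Step.JOIN_REQUEST_RECEIVED, Step.ASSIGNMENTS_SENT);
    }

    /** Step 3: the joining node confirms that its services are initialised. */
    void onServicesConfirmed() {
        advance(Step.ASSIGNMENTS_SENT, Step.SERVICES_CONFIRMED);
    }

    /** Step 4: allowed only after the confirmation has arrived. */
    void sendAddFinished() {
        advance(Step.SERVICES_CONFIRMED, Step.ADD_FINISHED);
    }

    Step step() {
        return step;
    }

    private void advance(Step expected, Step next) {
        if (step != expected)
            throw new IllegalStateException("Expected " + expected + " but was " + step);

        step = next;
    }
}
```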

Nodes also store a deployment counter for each service in the meta-space. It shows how many times a service has been undeployed or redeployed. It helps in situations when a joining node has some persisted services that are missing on other nodes. This way undeployed services won't be brought back by nodes that were killed before the undeployment happened.
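A minimal sketch of how such a counter could be compared on join is shown below. The comparison rule is an assumption made for illustration: a persisted service is treated as stale when the cluster already knows a higher counter for the same service name. The names `DeploymentCounterSketch` and `shouldRestore` are hypothetical.

```java
import java.util.Map;

/** Hypothetical helper; the comparison rule is an assumption, not the IEP's specification. */
final class DeploymentCounterSketch {
    /**
     * Decides whether a service persisted on a joining node should be restored.
     *
     * @param serviceName Name of the persisted service.
     * @param joiningCounter Counter stored by the joining node for this service.
     * @param clusterCounters Counters known to the rest of the cluster, by service name.
     */
    static boolean shouldRestore(String serviceName, long joiningCounter, Map<String, Long> clusterCounters) {
        Long clusterCounter = clusterCounters.get(serviceName);

        // Unknown to the cluster: nothing contradicts the persisted configuration.
        // Otherwise restore only if the joining node has not missed an undeployment.
        return clusterCounter == null || joiningCounter >= clusterCounter;
    }
}
```

For example, if the cluster undeployed a service and advanced its counter to 3 while the joining node still holds counter 2, the sketch reports the persisted copy as stale.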

Hot redeployment

It should be possible to update a service implementation without downtime. Using the Deployment SPI should solve this problem.

The service processor should subscribe to class deployments and restart the corresponding services when their classes change.

The basic usage scenario involves enabling UriDeploymentSpi and updating the JAR files containing the implementation classes. This will lead to cancellation and redeployment of the existing services. It implies that services should be ready for sudden cancellation. The documentation should explain this fact with examples.
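As an illustration of a service that tolerates sudden cancellation, here is a minimal standalone sketch. It deliberately does not implement Ignite's Service interface; `CancellableServiceSketch` is a hypothetical stand-in whose `execute()` and `cancel()` methods only loosely mirror the lifecycle methods mentioned in this document. The idea is that the work loop checks a cancellation flag between small units of work, so a redeployment can stop it promptly.

```java
/** Hypothetical stand-in for a service implementation; not an Ignite class. */
final class CancellableServiceSketch {
    /** Set by cancel(); volatile so that the working thread sees the change. */
    private volatile boolean cancelled;

    /** Number of completed units of work. */
    private int iterations;

    /** One small, quickly finishing unit of work. */
    private final Runnable unitOfWork;

    CancellableServiceSketch(Runnable unitOfWork) {
        this.unitOfWork = unitOfWork;
    }

    /** May be called at any time, for example when the service is redeployed. */
    void cancel() {
        cancelled = true;
    }

    /** Runs units of work until the service is cancelled or the thread is interrupted. */
    void execute() {
        while (!cancelled && !Thread.currentThread().isInterrupted()) {
            unitOfWork.run();

            iterations++;
        }
    }

    int iterations() {
        return iterations;
    }
}
```

Keeping each unit of work short bounds the delay between a cancellation request and the moment the service actually stops.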

...

These changes will break compatibility with previous versions of Apache Ignite completely.

Also, there will be no way to preserve services between cluster restarts, even though this is currently possible.

Further work

There are still some flaws in the current service grid design that are not covered in this IEP.

Moving existing services to new nodes

Suppose we have one server node, and we deploy several cluster-singleton services on it.

...

The changes described in this IEP don't interfere with service rebalancing in any way, so it can be done as a separate task after the proposed changes are implemented.

Service persistence

In the previous design, services could be persisted to disk along with other data in system caches.

Since the utility cache will be abandoned after the proposed changes, there will be no way to preserve deployed services between cluster restarts.

If we decide to keep this feature, then we should implement storing of the service configurations. A way to configure this behaviour should also be developed.

Discussion Links

Service grid redesign: http://apache-ignite-developers.2346864.n4.nabble.com/Service-grid-redesign-td28521.html

...

Jira (ASF JIRA): IGNITE-6069