IDIEP-42
Author
Sponsor
Created

 

Status

COMPLETED


Motivation

Currently, compute grid functionality is not supported by thin client protocol. 

Description

We can start implementing compute grid functionality with an ability to run already deployed to grid compute tasks by task name (as a first step). In this case, to run some custom task users should deploy class with this task to server manually. The same functionality already implemented for binary-rest clients (GridClientCompute). This IEP describes steps to achieve such a goal and doesn't cover any class deployment functionality or other compute grid functionality (ability to run Callable, Runnable or other jobs).  

Currently, there are some limitations in the thin client protocol exists that prevents to implement the execution of long-running async tasks. The protocol allows only one-way requests (from client to server), but to inform the client about task completion effectively we need some kind of server-initiated messages to the client.

Protocol changes

Notifications

The new ability should be added to notify clients from the server-side. Notification - it's a one-way message from server to client. Notifications format should be compatible with existing response messages format (from server to client), to make it possible to distinguish notifications and responses on the client-side. 

Proposed notification format:

Notification
longResource ID (task ID, continuous query ID, etc)
short

Message flags. Bitwise OR operation of following options:

0x0001 Error flag

0x0004 Notification flag (should always be set for notifications)

shortNotification operation code
intError code (if error flag is set)
stringError message (if error flag is set)
...Notification payload (if error flag isn't set)

Operation codes

New request type should be added to start a new task:

NameCode
OP_COMPUTE_TASK_EXECUTE6000

To notify about task completion following notification code should be used:

NameCode
OP_COMPUTE_TASK_FINISHED6001

To cancel the task existing request type should be used:

NameCode
OP_RESOURCE_CLOSE0

OP_COMPUTE_TASK_EXECUTE message format

Request
int

Count of nodes N selected to compute task. If this value is 0, no server nodes should be explicitly listed in the message, but all server cluster nodes should be selected for task execution. 

UUID

Node ID #1

...
UUIDNode ID #N
byte

Task flags. Combination of:

0x01 No-failover flag

0x02 No result caching flag

0x04 Keep binary flag

longTask timeout
stringTask name
objectTask argument
Response
longUnique started task ID (resource ID).

OP_COMPUTE_TASK_FINISHED message format

Notification for successfully executed task
long

Task ID (resource ID).

short

Notification flag (0x0004)

shortOP_COMPUTE_TASK_FINISHED (6001)
objectTask result


Notification for the failed task
long

Task ID (resource ID).

short

Notification flag | Error flag (0x0005)

shortOP_COMPUTE_TASK_FINISHED (6001)
intError code
stringError message

Overall workflow

Proposed task execution workflow:

  1. Client sends OP_COMPUTE_TASK_EXECUTE request and gets task ID as a response if task was successfully registered.
  2. Client has the ability to cancel the task using OP_RESOURCE_CLOSE request and passing task ID as a resource ID to the server.
  3. Server notifies the client with OP_COMPUTE_TASK_FINISHED message when the task is completed (successfully, unsuccessfully or if it was canceled by client). Notification should be sent eventually for each task which was successfully registered by OP_COMPUTE_TASK_EXECUTE request.

Feature masks

Currently, protocol versions are used to support backward-compatibility. Sometimes it's not convenient since to support some feature client should increase protocol version and support all other features introduced in all the previous versions of the protocol. This problem can be solved by using feature masks in the new protocol version. Clients and servers should inform each other on handshake about features that they supported.

Proposed new protocol version: 2.0.0

Proposed changes to handshake request and response: 

Request
byteHandshake code, always 1
shortVersion major
shortVersion minor
shortVersion patch
byte

Client code, always 2


intClient features mask array lengthSince version 2.0.0 (new field)
byte[]Client features mask arraySince version 2.0.0 (new field)
MapUser attributesSince version 1.7.0
StringUsername (optional)Since version 1.1.0
StringPassword (optional)Since version 1.1.0


Response (success)
byteSuccess flag, 1
intServer features mask array lengthSince version 2.0.0 (new field)
byte[]Server features mask arraySince version 2.0.0 (new field)
UUIDNode idSince version 1.4.0

Features mask - it's an array, where some bit is set if the feature with the corresponding id is supported. 

For compute tasks execution following future should be introduced:

Featureid
EXECUTE_TASK_BY_NAME0

So, bit 0 should be set on features mask if the client supports this feature.

Client-side API (java thin client)

All task-related operations should be started using a new ClientCompute interface. To get an instance of this interface new methods should be added to IgniteClient interface:

IgniteClient
    public ClientCompute compute(ClientClusterGroup grp);
    public ClientCluster cluster();

Proposed ClientCompute interface:

ClientCompute
public interface ClientCompute {
    public ClientClusterGroup clusterGroup();
    // Sync and async task execution methods.
    public <T, R> R execute(String taskName, @Nullable T arg) throws ClientException, InterruptedException;
    public <T, R> Future<R> executeAsync(String taskName, @Nullable T arg) throws ClientException;
    // ClientCompute modificators.
    public ClientCompute withTimeout(long timeout);
    public ClientCompute withNoFailover();
    public ClientCompute withNoResultCache();
}

Async execution methods should return Future. Using this Future users can get task results or cancel the task.

Risks and Assumptions

If blocking IO is used (as in java thin client, for example) it's impossible to implement async operations without dedicated for each channel thread to process incoming messages. So, the count of threads on the client-side will be increased and can be raised dramatically if partition awareness functionality with a lot of server connections is used.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/Thin-client-compute-support-td44405.html

Reference Links

// Links to various reference documents, if applicable.

Tickets

key summary type created updated due assignee reporter priority status resolution

JQL and issue key arguments for this macro require at least one Jira application link to be configured

  • No labels