ID: IEP-68
Author:
Sponsor:
Created:
Status: DRAFT


Motivation

Thin clients need an efficient way to stream large amounts of data into the cluster.

Description

Add DataStreamer operations to the Thin Client protocol: OP_DATA_STREAMER_START and OP_DATA_STREAMER_ADD_DATA.

There are multiple options for the client-side implementation with this approach, from simple to more efficient:

  • Stateless -- all data goes through a single server node; each batch is sent in one OP_DATA_STREAMER_START request that writes the batch and closes the streamer, and a new streamer is started when the next batch is ready
  • Stateful -- all data goes through a single server node, and the streamer is kept open
  • Partition-aware stateless -- data is grouped by node and batches are sent to the primary node; a new streamer is used for every batch
  • Partition-aware stateful -- data is grouped by node and batches are sent to the primary node; one streamer per node is kept open
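
The partition-aware options depend on grouping entries by primary node before sending. A minimal Python sketch of that grouping step follows. The `primary_node_for` callable is hypothetical: this proposal does not define how a client obtains the key-to-node mapping.

```python
from collections import defaultdict


def group_by_primary(entries, primary_node_for):
    """Group (key, value) pairs into one batch per primary node.

    primary_node_for is a hypothetical callable: key -> node id.
    """
    batches = defaultdict(list)
    for key, value in entries:
        batches[primary_node_for(key)].append((key, value))
    return dict(batches)


# Toy mapping for illustration only: two nodes, chosen by key parity.
batches = group_by_primary(
    [(1, "a"), (2, "b"), (3, None)],
    lambda key: f"node-{key % 2}",
)
```

Each resulting batch would then go to its node through either a fresh streamer (stateless) or a streamer kept open for that node (stateful).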


The streamer can be closed either with OP_RESOURCE_CLOSE or with the close flag, depending on the use case:

  • Cancel and close - use OP_RESOURCE_CLOSE
  • Flush and close - use OP_DATA_STREAMER_ADD_DATA with Close flag (to avoid an extra OP_RESOURCE_CLOSE call)
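
The choice between the two close paths can be sketched as a small helper. This is an illustration only: the function name and the dictionary layout are hypothetical, and the helper returns plain data rather than sending anything.

```python
def close_request(resource_id, cancel, last_batch=()):
    """Pick the request that closes a streamer.

    Returns (operation name, fields) as plain data; sending is out of scope.
    """
    if cancel:
        # Cancel and close: use the generic resource close operation.
        return "OP_RESOURCE_CLOSE", {"resourceId": resource_id}
    # Flush and close: set the close flag on the final data request,
    # which avoids a separate OP_RESOURCE_CLOSE round trip.
    return "OP_DATA_STREAMER_ADD_DATA", {
        "resourceId": resource_id,
        "close": True,
        "entries": list(last_batch),
    }


cancel_op = close_request(17, cancel=True)
finish_op = close_request(17, cancel=False, last_batch=[("k", "v")])
```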

OP_DATA_STREAMER_START = 8000

Initial operation combines streamer options and the first batch of entries.

Request
int                   cacheId
byte                  flags (allowOverwrite, skipStore, keepBinary, flush, close)
int                   perNodeBufferSize, -1 for server default
int                   perThreadBufferSize, -1 for server default
BinaryObject          Stream receiver
byte                  receiverPlatform, when receiver is not null (1 = Java, 2 = .NET, 3 = C++)
int                   entryCount
n * (Object, Object)  entries (add when value is not null, remove otherwise)


Response
long                  resourceId (0 when close flag is set)

Details

  • Close flag can be true when there is only a single batch, so an additional close request is not necessary
  • Flush flag should be true when client-side user code calls Flush method, and false otherwise
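
As a rough illustration of the request layout, the following Python sketch builds the body of an OP_DATA_STREAMER_START request. Several details are assumptions, not part of this proposal: the flag bit positions (taken from the order in which the flags are listed), little-endian byte order, the single-byte null encoding, and keys and values that the caller has already serialized. The message header (length, operation code, request id) is omitted.

```python
import struct

# Assumed bit positions, following the order in which the flags are listed;
# the proposal itself does not assign bit values.
ALLOW_OVERWRITE, SKIP_STORE, KEEP_BINARY, FLUSH, CLOSE = 0x01, 0x02, 0x04, 0x08, 0x10

# Assumed encoding of a null object: a single type-code byte.
NULL_OBJECT = b"\x65"


def encode_start_payload(cache_id, flags, entries, per_node_buffer_size=-1,
                         per_thread_buffer_size=-1, receiver=None,
                         receiver_platform=1):
    """Build the OP_DATA_STREAMER_START request body (header omitted).

    entries is a list of (key_bytes, value_bytes_or_None) pairs that the
    caller has already serialized; a None value means "remove this key".
    """
    out = bytearray()
    # int cacheId, byte flags, int perNodeBufferSize, int perThreadBufferSize
    out += struct.pack("<iBii", cache_id, flags,
                       per_node_buffer_size, per_thread_buffer_size)
    if receiver is None:
        out += NULL_OBJECT
    else:
        out += receiver
        # receiverPlatform is written only when a receiver is present.
        out += struct.pack("<b", receiver_platform)
    out += struct.pack("<i", len(entries))
    for key, value in entries:
        out += key
        out += NULL_OBJECT if value is None else value
    return bytes(out)


payload = encode_start_payload(
    cache_id=42,
    flags=ALLOW_OVERWRITE | FLUSH | CLOSE,  # single batch: flush and close at once
    entries=[(b"K1", b"V1"), (b"K2", None)],
)
```

With the close flag set as in this call, the response is expected to carry resourceId 0, so no follow-up close request is needed.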

OP_DATA_STREAMER_ADD_DATA = 8001

Add data to the existing streamer by a resource id, optionally flush and/or close the streamer.


Request
long                  resourceId
byte                  flags (flush, close)
int                   entryCount
n * (Object, Object)  entries (add when value is not null, remove otherwise)


Response
long                  resourceId (0 when close flag is set)

Details

  • Close flag can be true for the last batch, so an additional close request is not necessary
  • Flush flag should be true when client-side user code calls Flush method, and false otherwise
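
A comparable sketch for OP_DATA_STREAMER_ADD_DATA, covering both the request body and the response. As before, the flag bit values, little-endian byte order, and the null encoding are assumptions; the flag values here are assumed to match the positions of flush and close in the five-flag byte of OP_DATA_STREAMER_START. Keys and values are expected to be serialized by the caller.

```python
import struct

FLUSH, CLOSE = 0x08, 0x10  # assumed bit values; not assigned by the proposal
NULL_OBJECT = b"\x65"      # assumed single-byte encoding of a null object


def encode_add_data_payload(resource_id, flags, entries):
    """Build the OP_DATA_STREAMER_ADD_DATA request body (header omitted)."""
    # long resourceId, byte flags, int entryCount
    out = bytearray(struct.pack("<qBi", resource_id, flags, len(entries)))
    for key, value in entries:
        out += key
        out += NULL_OBJECT if value is None else value  # None means "remove"
    return bytes(out)


def decode_response(body):
    """Read the long resourceId from a response body; 0 means the streamer is closed."""
    (resource_id,) = struct.unpack("<q", body[:8])
    return resource_id


# Last batch of a stream: the close flag makes a separate close request unnecessary.
last_batch = encode_add_data_payload(7, CLOSE, [(b"K", b"V")])
closed_id = decode_response(struct.pack("<q", 0))
```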

Risks and Assumptions

  • Unlike the existing thick streamer API, the thin client streamer will not allow changing options (allowOverwrite, etc.) after the start, because changing options on a running streamer is confusing. Every client-side implementation can decide on its own API, but it makes sense to remove setters from the DataStreamer interface, move all the options to a separate type such as DataStreamerOptions, and pass it once to igniteClient.dataStreamer(cacheName, options).
  • Buffer sizes can match or differ between the client and server sides.
    • Example 1: the per-node buffer size is the same on a partition-aware client and on the server. When the client flushes its buffer, the server-side buffer is flushed right away.
    • Example 2: the client-side buffer is small due to resource constraints, and the server-side buffer is bigger for better batching and performance.
  • The client API can expose both server-side and client-side buffer sizes as configuration parameters, or hide them for simplicity.
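
One possible shape for an immutable options type with separate client-side and server-side buffer sizes is sketched below in Python. All names are hypothetical (including `client_per_node_buffer_size` and its default of 512), and the buffer class only illustrates client-side batching; it does not talk to a server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: options cannot change after the streamer starts
class DataStreamerOptions:
    allow_overwrite: bool = False
    skip_store: bool = False
    keep_binary: bool = False
    per_node_buffer_size: int = -1         # server side, -1 = server default
    per_thread_buffer_size: int = -1       # server side, -1 = server default
    client_per_node_buffer_size: int = 512  # hypothetical client-side setting


class ClientBuffer:
    """Collects entries and returns a batch once the client-side limit is reached."""

    def __init__(self, options):
        self._limit = options.client_per_node_buffer_size
        self._entries = []

    def add(self, key, value):
        self._entries.append((key, value))
        if len(self._entries) >= self._limit:
            return self.drain()
        return None

    def drain(self):
        batch, self._entries = self._entries, []
        return batch


# Example 2 from the list: small client-side buffer, larger server-side buffer.
options = DataStreamerOptions(client_per_node_buffer_size=2, per_node_buffer_size=1024)
buffer = ClientBuffer(options)
first = buffer.add(1, "a")   # below the limit, nothing to send yet
second = buffer.add(2, "b")  # limit reached, a batch is returned
```

Hiding `client_per_node_buffer_size` behind a fixed default would correspond to the simpler API variant mentioned in the last bullet.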

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/IEP-68-Thin-Client-Data-Streamer-td51622.html

Reference Links

PoC: https://github.com/apache/ignite/pull/8847

Tickets

