ID: IEP-83
Author: Pavel Tupitsyn
Sponsor: Pavel Tupitsyn
Created:
Status: DRAFT


Motivation

TCP connections can enter a half-open state: the connection appears to be alive, but any attempt to send data fails. Long-lived, mostly idle connections are especially susceptible to this.

The retry mechanism in thin client implementations (IEP-82 Thin Client Retry Policy) partially mitigates the issue. However, not all operations are safe to retry, and reconnecting hurts performance.

To improve connection stability and detect failures early, we can add a keepalive mechanism.

Description

Why not TCP keepalive

TCP has a built-in keepalive mechanism, but it has some disadvantages:

  • Optional (may not be present in some TCP stacks)
  • May not be handled well by some routers (RFC 1122, section 4.2.3.6)
  • The default timeout is too long (2 hours) and is difficult to adjust on the SDK versions used by Ignite (Java 8, .NET Standard 2.0), or hard to get right in some languages (Python, JS).

Because of that, some protocols implement keepalive logic at a higher level (SMB, TLS). More details: https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
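As an illustration of the tuning problem listed above, here is a minimal Python sketch of adjusting TCP keepalive timings on a client socket. The function name and default values are hypothetical. The point is that the option names differ per platform, so the code has to branch and probe for what is available.

```python
import socket
import sys


def enable_tcp_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Turn on TCP keepalive and tune it where the platform allows (illustrative helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    if sys.platform == "win32":
        # Windows: a single ioctl with values in milliseconds.
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
        return

    # Linux exposes TCP_KEEPIDLE; macOS names the same setting TCP_KEEPALIVE.
    idle_opt = getattr(socket, "TCP_KEEPIDLE", None) or getattr(socket, "TCP_KEEPALIVE", None)
    if idle_opt is not None:
        sock.setsockopt(socket.IPPROTO_TCP, idle_opt, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

Even with all options set, this only configures the local TCP stack. It does not address the other two points in the list above.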

Proposal

Add OP_HEARTBEAT to the protocol with an empty payload. Clients can send heartbeats at a configurable interval and receive responses to ensure that the connection is active.
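A minimal client-side sketch of the idea, in Python, might look as follows. Several details are assumptions and not part of this proposal: the numeric value of OP_HEARTBEAT, the request framing (little-endian int32 length, int16 op code, int64 request id), the HeartbeatSender class itself, and the choice to skip a heartbeat when other traffic was seen within the interval.

```python
import struct
import time

# Hypothetical value: the proposal names OP_HEARTBEAT but assigns no code here.
OP_HEARTBEAT = 0x7FFF


def encode_heartbeat(request_id):
    """Frame a request with an empty payload.

    Assumed framing: little-endian int32 length, int16 op code,
    int64 request id, then the (empty) payload.
    """
    body = struct.pack("<hq", OP_HEARTBEAT, request_id)
    return struct.pack("<i", len(body)) + body


class HeartbeatSender:
    """Sends a heartbeat when the connection has been quiet for interval_s seconds."""

    def __init__(self, send, interval_s, clock=time.monotonic):
        self._send = send              # callable that writes bytes to the connection
        self._interval_s = interval_s  # the configurable heartbeat interval
        self._clock = clock            # injectable for testing
        self._last_activity = clock()
        self._next_id = 0              # a real client would reuse its request id counter

    def on_activity(self):
        """Call whenever any request or response passes over the connection."""
        self._last_activity = self._clock()

    def tick(self):
        """Call periodically; returns True if a heartbeat was sent."""
        if self._clock() - self._last_activity < self._interval_s:
            return False
        self._next_id += 1
        self._send(encode_heartbeat(self._next_id))
        self._last_activity = self._clock()
        return True
```

A real client would also wait for the heartbeat response and treat a send error or a response timeout as a dead connection. That part is omitted here.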

This applies to Ignite 2.x and 3.x.

Risks and Assumptions

  • A new ProtocolBitmaskFeature will be added to maintain protocol compatibility.
  • TODO: Should we set heartbeat interval automatically according to ClientConnectorConfiguration#idleTimeout?
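The two points above could combine on the client roughly as in the following Python sketch. Everything here is an assumption for illustration: the bit position of the new feature, the function names, the one-third ratio, and the treatment of a non-positive idle timeout as "disabled". How the client would learn the server's idleTimeout is left open.

```python
# Hypothetical bit index: the proposal does not assign a position for the new feature.
HEARTBEAT_FEATURE_BIT = 11


def supports_heartbeat(server_features):
    """server_features: the server's feature bitmask as an int (assumed representation)."""
    return bool((server_features >> HEARTBEAT_FEATURE_BIT) & 1)


def effective_heartbeat_interval_ms(server_features, idle_timeout_ms, configured_ms=None):
    """Pick a heartbeat interval in milliseconds, or None to send no heartbeats."""
    if not supports_heartbeat(server_features):
        return None  # older server: never send an operation it does not know
    if configured_ms is not None:
        return configured_ms  # an explicit user setting wins
    if idle_timeout_ms <= 0:
        return None  # assumed: idle timeout disabled, so nothing to derive from
    # One possible automatic choice: a few heartbeats fit inside each idle timeout window.
    return max(idle_timeout_ms // 3, 1)
```

With these assumptions, a server idle timeout of 30000 ms and no explicit client setting gives a 10000 ms heartbeat interval.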

Discussion Links

Reference Links

Tickets
