Motivation

A TCP connection can enter a half-open state: it appears to be alive, but any attempt to send data fails. Long-lived, mostly idle connections are especially susceptible to this.

The retry mechanism in thin client implementations (IEP-82 Thin Client Retry Policy) partially mitigates the issue. However, not all operations are safe to retry, and reconnecting hurts performance.

To improve connection stability and detect failures early, we can add a keep-alive mechanism.

Description

Why not TCP keepalive

TCP has a built-in keepalive mechanism, but it has some disadvantages:

  • Optional (may not be present in some TCP stacks)
  • May not be handled well by some routers (RFC 1122, section 4.2.3.6)
  • The default timeout is too long (2 hours). It is difficult to adjust on the SDK versions used by Ignite (Java 8, .NET Standard 2.0) and hard to get right in some languages (Python, JS).

Because of that, some protocols (SMB, TLS) implement keepalive logic at a higher level. More details: https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
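To illustrate why tuning TCP keepalive is awkward to do portably, here is a minimal Python sketch. The helper name `enable_tcp_keepalive` and its default values are assumptions made for this example and are not part of the proposal. Each platform exposes different socket options, so the code has to probe for them.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Turn on TCP keepalive and tune it where the platform exposes the knobs.

    Hypothetical helper for illustration; the defaults are arbitrary.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time, probe interval), both in milliseconds.
        sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                   (1, idle_s * 1000, interval_s * 1000))
        return

    if hasattr(socket, "TCP_KEEPIDLE"):
        # Linux: seconds of idle time before the first probe.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        # macOS uses a differently named option for the same setting.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)

    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

Even with all branches in place, the settings are only hints to the local TCP stack, which is one reason an application-level heartbeat is easier to reason about.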

Proposal

Add an OP_HEARTBEAT operation with an empty payload to the protocol. Clients can send heartbeats at a configurable interval and wait for the responses to confirm that the connection is still active.
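The request/response exchange could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the actual wire format. The op code value, the framing (a 16-bit op code followed by a 64-bit request id), and the helper names `send_heartbeat`, `serve_heartbeats`, and `recv_exact` are all hypothetical. The heartbeat itself carries no payload, as proposed.

```python
import socket
import struct
import threading

OP_HEARTBEAT = 1  # hypothetical op code, not taken from the proposal

# Hypothetical framing for this sketch only:
# request  = op code (int16) + request id (int64), no payload
# response = echoed request id (int64)
REQUEST = struct.Struct("<hq")
RESPONSE = struct.Struct("<q")


def recv_exact(sock, size):
    """Read exactly `size` bytes; raise ConnectionError if the peer closes."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("connection closed by peer")
        buf += chunk
    return buf


def send_heartbeat(sock, request_id, timeout=1.0):
    """Send one heartbeat; return True if the matching response arrives in time."""
    sock.settimeout(timeout)
    try:
        sock.sendall(REQUEST.pack(OP_HEARTBEAT, request_id))
        (echoed,) = RESPONSE.unpack(recv_exact(sock, RESPONSE.size))
        return echoed == request_id
    except OSError:
        # Covers timeouts, resets, and closed connections.
        return False


def serve_heartbeats(sock):
    """Toy server: answer every heartbeat until the client disconnects."""
    try:
        while True:
            op, request_id = REQUEST.unpack(recv_exact(sock, REQUEST.size))
            if op == OP_HEARTBEAT:
                sock.sendall(RESPONSE.pack(request_id))
    except OSError:
        pass
    finally:
        sock.close()
```

A client would call something like `send_heartbeat` from a timer at the configured interval and treat a `False` result as a signal to close the connection and reconnect. The toy server can be exercised locally with `socket.socketpair()` and a background thread.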

Risks and Assumptions

// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

...