To be Reviewed By: 14 March 2022

Authors: Mario Ivanac

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

When several gateway receivers share the same value for hostname-for-senders (for example, when Geode runs under Kubernetes and a load balancer distributes connections among the remote servers), the number of connections in the gateway sender pool used for sending ping messages has been observed to be much greater than the number of dispatcher threads, even though a single connection would suffice in this case (since all destinations have the same address and port). There are two reasons for this behavior:

  1. Since all Ping tasks are triggered in parallel, the tasks generally request a connection at the same time, and a new connection is created for each task.
  2. Suppose only one dispatcher thread is configured on the local server and the remote site has five servers. The pool will try to ping, for example, server0. It opens a new connection to reach server0 but, because all servers share the same VIP:PORT, it will probably be connected to some other server, say server1. The pool uses this connection to ping server0, and the distributed ping functionality on the receiving side forwards the ping to the right server. However, the gateway sender pool notices that it now has a new endpoint, server1, and wants to ping it as well. Now both server0 and server1 must be pinged. Eventually the pool can end up pinging all five servers, although only server0 actually needs to be pinged.
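The second effect can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Geode code: the class `EndpointGrowthSketch`, the method `simulate`, and the strictly round-robin load balancer are all hypothetical and chosen to make the behavior deterministic. Each ping toward a known endpoint opens a connection through the shared VIP:PORT, lands on whichever server the balancer picks, and the pool then registers that server as another endpoint to ping.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the problem; none of these names exist in Geode.
class EndpointGrowthSketch {

  /**
   * Simulates ping rounds behind a round-robin load balancer (an assumption)
   * and returns the set of endpoints the pool ends up pinging.
   */
  static Set<String> simulate(List<String> servers, int rounds) {
    Set<String> pinged = new LinkedHashSet<>();
    pinged.add(servers.get(0)); // initially only server0 needs to be pinged
    int cursor = 1;             // next server the balancer hands out
    for (int round = 0; round < rounds; round++) {
      // Iterate over a snapshot, because the set may grow during the round.
      for (String target : new ArrayList<>(pinged)) {
        String connected = servers.get(cursor % servers.size());
        cursor++;
        if (!connected.equals(target)) {
          // The pool registers the unexpected endpoint and will ping it too.
          pinged.add(connected);
        }
      }
    }
    return pinged;
  }

  public static void main(String[] args) {
    List<String> servers =
        List.of("server0", "server1", "server2", "server3", "server4");
    System.out.println(simulate(servers, 3));
  }
}
```

Under these assumptions, the set of pinged endpoints grows from one server to all five within three rounds, even though only server0 was of interest.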

Anti-Goals


Solution

The proposed solution to the problems described above is:

  1. Introduce a configurable option to gradually activate pinging toward the destinations. This can be accomplished by increasing the initial delay of each Ping task.
  2. For a Ping task (which requires a defined destination endpoint), if the connected endpoint differs from the destination endpoint when the ping message is sent, do not register the new endpoint.
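The two points above could be sketched roughly as follows. This is an illustration only and not the implementation in the linked PR: the class `PingSolutionSketch`, the helpers `initialDelayMillis` and `shouldRegisterAfterPing`, and the linear delay step are assumptions made for the example. Only `ScheduledExecutorService` and its `scheduleWithFixedDelay` method are real JDK API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; these names are not part of Geode's API.
class PingSolutionSketch {

  /** Point 1: stagger ping tasks by giving each a larger initial delay. */
  static long initialDelayMillis(int taskIndex, long stepMillis) {
    return taskIndex * stepMillis;
  }

  /**
   * Point 2: after a ping, register the connected endpoint only when it is
   * the endpoint the Ping task was created for.
   */
  static boolean shouldRegisterAfterPing(String destination, String connected) {
    return destination.equals(connected);
  }

  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
    long pingIntervalMillis = 1000; // assumed ping interval
    long stepMillis = 200;          // assumed value of the configurable option

    for (int i = 0; i < 3; i++) {
      final int taskIndex = i;
      // Each task starts later than the previous one, so an earlier task's
      // connection can be reused instead of opening one per task.
      scheduler.scheduleWithFixedDelay(
          () -> System.out.println("ping task " + taskIndex),
          initialDelayMillis(taskIndex, stepMillis),
          pingIntervalMillis,
          TimeUnit.MILLISECONDS);
    }

    Thread.sleep(700); // let the first execution of each task run
    scheduler.shutdownNow();

    // A ping meant for server0 that lands on server1 registers nothing new.
    System.out.println(shouldRegisterAfterPing("server0", "server1"));
  }
}
```

With a step of zero the tasks start together, as they do today, so a default of zero for such an option would leave the current behavior unchanged.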

PR with proposed solution: https://github.com/apache/geode/pull/7347

Changes and Additions to Public Interfaces


Performance Impact

No impacts.

Backwards Compatibility and Upgrade Path

No impacts.

Prior Art

What would be the alternatives to the proposed solution? What would happen if we don’t solve the problem? Why should this proposal be preferred?

FAQ

Answers to questions you’ve commonly been asked after requesting comments for this proposal.

Errata

What are minor adjustments that had to be made to the proposal since it was approved?
