...
Anti-Goals
What is outside the scope of what the proposal is trying to solve?
N/A
Solution
The current status of the solution is available in PR 4713. We have implemented a solution for this issue in the following commit: https://github.com/apache/geode/pull/4713/commits/f896f04df291246d420cab88b660fc9736fca49b
Only one test is failing (testExecuteOp in ConnectionPoolImplJUnitTest), and it causes the integration test and stress test tasks to fail. The test passes locally and fails only in Concourse. We are working on a fix.
Gw sender failover
The solution consists of refactoring some maps in the LocatorLoadSnapshot class. These maps use ServerLocation objects as keys, which has to change because a ServerLocation will no longer be unique for each server (all receivers share the same host and port). We changed the maps to use InternalDistributedMember objects as keys. The ServerLocation information is not lost, because it is contained in the entry value of every map.
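A minimal sketch of why the key type matters. ServerLocation, MemberId (standing in for InternalDistributedMember) and LoadHolder below are hypothetical simplified records, not Geode's real classes, and the two maps are illustrative rather than LocatorLoadSnapshot's actual fields. Two receivers registered behind the same host and port collapse into one entry when keyed by location, but stay distinct when keyed by member.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; not Geode's ServerLocation/InternalDistributedMember classes.
record ServerLocation(String host, int port) {}
record MemberId(String name) {}
record LoadHolder(ServerLocation location, float load) {}

class LoadSnapshotSketch {
  // Keyed by member: one entry per member, even when members share host:port.
  final Map<MemberId, LoadHolder> byMember = new HashMap<>();
  // Keyed by location, kept only for comparison: a shared host:port overwrites earlier entries.
  final Map<ServerLocation, LoadHolder> byLocation = new HashMap<>();

  void addServer(MemberId member, ServerLocation location, float load) {
    LoadHolder holder = new LoadHolder(location, load);
    byMember.put(member, holder);
    byLocation.put(location, holder);
  }

  // Removing a departed member drops only that member's entry;
  // its ServerLocation is still recoverable from the removed value.
  ServerLocation removeServer(MemberId member) {
    LoadHolder removed = byMember.remove(member);
    return removed == null ? null : removed.location();
  }

  public static void main(String[] args) {
    LoadSnapshotSketch snapshot = new LoadSnapshotSketch();
    ServerLocation shared = new ServerLocation("receivers.example.com", 5000);
    snapshot.addServer(new MemberId("receiver-1"), shared, 0.1f);
    snapshot.addServer(new MemberId("receiver-2"), shared, 0.2f);
    System.out.println(snapshot.byLocation.size()); // 1: receiver-2 overwrote receiver-1
    System.out.println(snapshot.byMember.size());   // 2: both receivers are tracked
    snapshot.removeServer(new MemberId("receiver-1"));
    System.out.println(snapshot.byMember.keySet()); // [MemberId[name=receiver-2]]
  }
}
```

With member keys, a receiver going down removes only its own entry, while the location-keyed map cannot tell the two receivers apart at all.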
...
Gw sender pings not reaching gw receivers
We think the ping issue is caused by the way pings are created. When a new server connection is established, a new PingTask is created, and it is in charge of running the PingOp:
...
When LiveServerPinger runs a PingTask, the task calls PingOp.execute(ExecutablePool pool, ServerLocation server). PingOp uses only the hostname and port (the ServerLocation) to obtain the connection on which it sends the ping message. Because all receivers share the same host and port, there is no guarantee that the connection actually points to the server we want to reach.
To solve this, we have added a new method, PingOp.execute(ExecutablePool pool, Endpoint endpoint). With it, if the obtained connection does not point to the required Endpoint, the connection can be discarded and a new one requested.
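The Endpoint check can be sketched as a bounded retry loop. This is a minimal illustration, not the code in the PR: Endpoint, Conn, ConnectionSource, RoundRobinSource and MAX_ATTEMPTS are all hypothetical, and the real PingOp works with ExecutablePool and Geode's own connection classes. The loop also shows where extra retries during connection acquisition would come from.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified types; not Geode's Connection, Endpoint or ExecutablePool API.
record Endpoint(String memberId, String host, int port) {}
record Conn(String memberId) {}

interface ConnectionSource {
  Conn acquire(String host, int port); // may return any server behind host:port
  void discard(Conn conn);
}

class EndpointPinger {
  static final int MAX_ATTEMPTS = 5; // hypothetical bound on retries

  // Returns a connection to the intended member, or empty if none was obtained in time.
  static Optional<Conn> connectionFor(ConnectionSource source, Endpoint target) {
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
      Conn conn = source.acquire(target.host(), target.port());
      if (conn.memberId().equals(target.memberId())) {
        return Optional.of(conn); // reached the receiver we meant to ping
      }
      source.discard(conn); // wrong receiver behind the shared address: drop it and retry
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    RoundRobinSource lb = new RoundRobinSource("receiver-1", "receiver-2", "receiver-3");
    Endpoint target = new Endpoint("receiver-3", "receivers.example.com", 5000);
    Optional<Conn> conn = connectionFor(lb, target);
    // Prints "receiver-3 after 2 discards"
    System.out.println(conn.map(Conn::memberId).orElse("none") + " after " + lb.discarded + " discards");
  }
}

// Test double simulating a load balancer that hands out receivers round-robin.
class RoundRobinSource implements ConnectionSource {
  private final Deque<String> members;
  int discarded = 0;

  RoundRobinSource(String... memberIds) {
    members = new ArrayDeque<>(List.of(memberIds));
  }

  @Override
  public Conn acquire(String host, int port) {
    String id = members.pollFirst();
    members.addLast(id);
    return new Conn(id);
  }

  @Override
  public void discard(Conn conn) {
    discarded++;
  }
}
```

Bounding the attempts (an assumption of this sketch) keeps a ping from looping indefinitely when the intended receiver is unreachable.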
Changes and Additions to Public Interfaces
If you are proposing to add or modify public interfaces, those changes should be outlined here in detail.
N/A
Performance Impact
Do you anticipate the proposed changes to impact performance in any way? Are there plans to measure and/or mitigate the impact?
Obtaining the connection used to execute the ping may require some retries until the right connection is obtained, so this operation can take longer. However, we do not expect it to impact performance.
Backwards Compatibility and Upgrade Path
Will the regular rolling upgrade process work with these changes?
How do the proposed changes impact backwards-compatibility? Are message or file formats changing?
Is there a need for a deprecation process to provide an upgrade path to users who will need to adjust their applications?
N/A
Prior Art
After checking with the dev mailing list, we received the suggestion to configure session affinity (sessionAffinity) in Kubernetes to solve the issue with the pings. We tried it, but that option broke the failover of gw senders when a gw receiver is down.
FAQ
Answers to questions you’ve commonly been asked after requesting comments for this proposal.
TBD
Errata
What are minor adjustments that had to be made to the proposal since it was approved?
N/A