...

Any proposed solution that did not use the existing rebalance logic would have to reimplement large, complicated areas of code in order to correctly create redundant copies on members. One alternative that would reuse the existing rebalance logic was to add arguments to the rebalance operation that prevent moving buckets and prevent moving primaries. Given that the rebalance operation is already complicated, and that it would be confusing from a user's perspective to use the name "rebalance" for an operation that does not actually balance any data load, this solution was rejected in favour of creating a new, dedicated operation to restore redundancy.

Errata

...

The section describing the Status enum should now read: 

The Status returned by the RestoreRedundancyResults will be:

  • FAILURE if at least one bucket in one region has fewer than the configured number of redundant copies.
  • ERROR if the restore redundancy operation failed to start or encountered an exception.
  • SUCCESS otherwise.

This change raises the threshold for what is considered a successful operation from one that results in any level of redundancy for all regions to one that results in fully satisfied redundancy for all regions.
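The revised Status rules can be sketched as follows. Only the three Status values come from this document; the helper method and the per-region summary type are hypothetical stand-ins for illustration.

```java
import java.util.List;

public class StatusSketch {
    // The three values named in the errata.
    enum Status { SUCCESS, FAILURE, ERROR }

    // Hypothetical per-region summary: fullySatisfied is true only when every
    // bucket in the region has the configured number of redundant copies.
    record RegionResult(String name, boolean fullySatisfied) {}

    // ERROR if the operation failed to start or threw; FAILURE if any region
    // falls short of configured redundancy; SUCCESS otherwise.
    static Status computeStatus(List<RegionResult> regions, boolean operationFailed) {
        if (operationFailed) {
            return Status.ERROR;
        }
        for (RegionResult region : regions) {
            if (!region.fullySatisfied()) {
                return Status.FAILURE;
            }
        }
        return Status.SUCCESS;
    }

    public static void main(String[] args) {
        List<RegionResult> mixed = List.of(
            new RegionResult("orders", true),
            new RegionResult("customers", false));
        System.out.println(computeStatus(mixed, false));  // FAILURE: one region short
        System.out.println(computeStatus(
            List.of(new RegionResult("orders", true)), false)); // SUCCESS
        System.out.println(computeStatus(mixed, true));   // ERROR takes precedence
    }
}
```

Note that under the old threshold the first case would have counted as a success; under the revised rules any region short of its configured redundancy forces FAILURE.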


The section describing the success/error status of the restore redundancy gfsh command should now read:

The command will return success status if:

  • Redundancy is fully satisfied for all regions that were included, either explicitly or implicitly.
  • No partitioned regions were found and none were explicitly included.

The command will return error status if:

  • At least one bucket in a region has zero redundant copies, and that region has redundancy configured.
  • At least one bucket in a region has fewer than the configured number of redundant copies.
  • At least one of the explicitly included partitioned regions is not found.
  • There is a member in the system with a version of Geode older than 1.13.0 (assuming that is the version in which this feature is implemented).
  • The restore redundancy function encounters an exception.

This change brings the gfsh command output in line with the Status returned by the RestoreRedundancyResults.
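The command-level rules above can be condensed into a single predicate. All names and the boolean inputs here are illustrative, not the actual gfsh implementation.

```java
public class CommandStatusSketch {
    // Each parameter mirrors one of the bullet criteria above.
    static boolean isSuccess(boolean redundancyFullySatisfied,
                             boolean partitionedRegionsFound,
                             boolean regionsExplicitlyIncluded,
                             boolean includedRegionMissing,
                             boolean olderMemberPresent,
                             boolean functionThrewException) {
        // Any of these conditions forces error status outright.
        if (includedRegionMissing || olderMemberPresent || functionThrewException) {
            return false;
        }
        // Success when no partitioned regions exist and none were explicitly included.
        if (!partitionedRegionsFound && !regionsExplicitlyIncluded) {
            return true;
        }
        // Otherwise success requires fully satisfied redundancy for all regions.
        return redundancyFullySatisfied;
    }

    public static void main(String[] args) {
        System.out.println(isSuccess(true, true, false, false, false, false));   // fully satisfied
        System.out.println(isSuccess(false, true, false, false, false, false));  // a region is short
        System.out.println(isSuccess(false, false, false, false, false, false)); // no PRs at all
        System.out.println(isSuccess(true, true, true, true, false, false));     // included region missing
    }
}
```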


The RestoreRedundancyBuilder interface should now be: 

import java.util.Set;
import java.util.concurrent.CompletableFuture;

public interface RestoreRedundancyBuilder {
  // Restrict the operation to the given regions.
  RestoreRedundancyBuilder includeRegions(Set<String> regions);

  // Exclude the given regions from the operation.
  RestoreRedundancyBuilder excludeRegions(Set<String> regions);

  // Control whether primary buckets are reassigned while restoring redundancy.
  RestoreRedundancyBuilder setReassignPrimaries(boolean shouldReassign);

  // Start the restore redundancy operation asynchronously.
  CompletableFuture<RestoreRedundancyResults> start();

  // Report current redundancy status without starting an operation.
  RestoreRedundancyResults redundancyStatus();
}

This change makes code that uses the interface easier to read and understand.
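A caller would chain the builder methods and then either start the operation or query current status. The sketch below is runnable only because it supplies a trivial stub implementation and a minimal stand-in for RestoreRedundancyResults; both are hypothetical, and the real objects would be obtained from the cache rather than constructed directly.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;

public class BuilderUsageSketch {
    // Minimal stand-in for the results object; the real interface is richer.
    interface RestoreRedundancyResults {
        String getStatus();
    }

    // The interface from the errata, reproduced so the sketch compiles on its own.
    interface RestoreRedundancyBuilder {
        RestoreRedundancyBuilder includeRegions(Set<String> regions);
        RestoreRedundancyBuilder excludeRegions(Set<String> regions);
        RestoreRedundancyBuilder setReassignPrimaries(boolean shouldReassign);
        CompletableFuture<RestoreRedundancyResults> start();
        RestoreRedundancyResults redundancyStatus();
    }

    // Hypothetical stub so the chained calls below actually run.
    static class StubBuilder implements RestoreRedundancyBuilder {
        public RestoreRedundancyBuilder includeRegions(Set<String> regions) { return this; }
        public RestoreRedundancyBuilder excludeRegions(Set<String> regions) { return this; }
        public RestoreRedundancyBuilder setReassignPrimaries(boolean shouldReassign) { return this; }
        public CompletableFuture<RestoreRedundancyResults> start() {
            return CompletableFuture.completedFuture(() -> "SUCCESS");
        }
        public RestoreRedundancyResults redundancyStatus() { return () -> "SUCCESS"; }
    }

    public static void main(String[] args) throws Exception {
        RestoreRedundancyResults results = new StubBuilder()
            .includeRegions(Set.of("orders"))
            .setReassignPrimaries(true)
            .start()
            .get(); // block until the asynchronous operation completes
        System.out.println(results.getStatus());
    }
}
```

Because each configuration method returns the builder, includes, excludes, and the primary-reassignment flag read as one fluent expression, which is the readability benefit the errata refers to.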


The --dont-reassign-primaries argument should be renamed --reassign-primaries throughout. The default value of the argument will be true, so the behaviour described in the RFC will be unchanged.

This change brings the gfsh arguments in line with the RestoreRedundancyBuilder interface. 


The backwards compatibility and upgrade path section should now read:

Members running older versions of Geode will not be able to execute the restore redundancy function, so if any such members are detected in the system, the gfsh commands will fail to start and return an error status.

This change takes into account the fact that individual members can successfully start a restore redundancy operation regardless of the other members in the system, but that attempting to send a function to an older member during execution of the gfsh command will result in an exception.
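The fail-fast behaviour amounts to a pre-flight version check before any function is sent. The member model and the version-comparison helper below are illustrative, and 1.13.0 stands in for whichever release actually ships the feature.

```java
import java.util.List;

public class VersionGateSketch {
    // Hypothetical view of a cluster member: just a name and a Geode version.
    record Member(String name, String version) {}

    // Compare dotted version strings numerically, segment by segment.
    static int compareVersions(String a, String b) {
        String[] as = a.split("\\."), bs = b.split("\\.");
        for (int i = 0; i < Math.max(as.length, bs.length); i++) {
            int ai = i < as.length ? Integer.parseInt(as[i]) : 0;
            int bi = i < bs.length ? Integer.parseInt(bs[i]) : 0;
            if (ai != bi) return Integer.compare(ai, bi);
        }
        return 0;
    }

    // Fail fast: only proceed if every member meets the minimum version.
    static boolean allMembersSupport(List<Member> members, String minimumVersion) {
        return members.stream()
            .allMatch(m -> compareVersions(m.version(), minimumVersion) >= 0);
    }

    public static void main(String[] args) {
        List<Member> cluster = List.of(
            new Member("server-1", "1.13.0"),
            new Member("server-2", "1.12.4"));
        // server-2 predates the feature, so the command would return error status.
        System.out.println(allMembersSupport(cluster, "1.13.0"));
    }
}
```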

References

[1] https://issues.apache.org/jira/projects/GEODE/issues/GEODE-4250 

...