Introduction

Rebalancing relocates data from heavily loaded members to lightly loaded members. Currently, Geode supports only manual rebalancing, triggered by a gfsh command or a Java function call. In most cases, the decision to rebalance is based on how much data each member holds. Because Geode already monitors data size, it can also trigger rebalancing automatically. Auto-Rebalancing (AR) is expected to prevent failures caused by an unbalanced cluster more reliably than manual rebalancing.

Requirements

  1. Configurable threshold to qualify system as unbalanced
  2. Minimize the impact on concurrent operations caused by continuous rebalancing
    1. Configurable schedule
    2. Ability to disable AR
    3. Minimum size qualification
  3. Ability to plug a custom AR manager

Alternatives

The user can schedule a cron job to invoke the gfsh rebalance command on a periodic basis.

Description

We would like to implement this as an independent module without modifying existing code, so that it can be easily applied to any version of the system. To enable AR, the user will place the auto-rebalance jar on their classpath and add an initializer to their cache.xml. The initializer will accept the following configuration:

  1. Schedule: In order to minimize the impact on concurrent operations, we feel it’s important to provide the user with the ability to configure the frequency and timing of automatic rebalancing. Bucket movement does add load to the system and in our performance tests we can see that the throughput of concurrent operations drops during bucket movement. Schedule can be provided as a cron string. Default: Once a week on Saturday
  2. AR manager: the class to check the system state and trigger rebalance. Default: AutoRebalance
    1. Threshold: Rebalancing will be triggered if the load difference between the most-loaded and least-loaded member is more than this threshold. Default: 20%
    2. Size threshold: Rebalancing could be harmful when the cache is initially being populated, because bucket sizes may vary wildly when there is very little data. Because of that, we will also provide a minimum size threshold that must be reached before automatic rebalancing kicks in. Default: 50%

E.g.

<cache>
...
 <initializer>
  <!-- Optional. Default: Once a week on Saturday. E.g. check at 3 am every night -->
  <parameter name="schedule"> 0 0 3 * * * </parameter>
 
  <!-- Optional auto-rebalance manager -->
  <class-name> org.apache.geode.rebalance.AutoBalance </class-name>
 
  <!-- Optional. Default: 20%. E.g. Don’t rebalance until the variation in size between members is more than 10% -->
  <parameter name="unbalance-percentage"> 10 </parameter>
 
  <!-- Optional. Default: 50%. E.g. Don’t rebalance a region until at least one member is 50% full -->
  <parameter name="size-threshold-percentage"> 50 </parameter>
 </initializer>
... 
</cache>
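
As a rough illustration of how the initializer parameters above might be read, the following sketch parses them from a java.util.Properties object (the form in which Geode passes cache.xml parameters to a Declarable) and applies the defaults listed in this proposal. The class name AutoRebalanceConfig and its fields are hypothetical and not part of Geode. A missing schedule is represented as null, meaning "use the default weekly schedule", because the proposal does not give the default as a cron string.

```java
import java.util.Properties;

/** Hypothetical holder for the auto-rebalance settings; not part of Geode. */
public class AutoRebalanceConfig {
    /** Cron string, or null when the default (once a week on Saturday) applies. */
    public final String schedule;
    /** Load difference, in percent, above which a rebalance is triggered. */
    public final int unbalancePercentage;
    /** Minimum fill level, in percent, before auto-rebalancing is considered. */
    public final int sizeThresholdPercentage;

    public AutoRebalanceConfig(Properties props) {
        String cron = props.getProperty("schedule");
        this.schedule = (cron == null || cron.trim().isEmpty()) ? null : cron.trim();
        // Defaults of 20% and 50% are taken from the proposal text.
        this.unbalancePercentage = percentage(props, "unbalance-percentage", 20);
        this.sizeThresholdPercentage = percentage(props, "size-threshold-percentage", 50);
    }

    /** Reads a 0-100 integer parameter, falling back to the default when absent. */
    private static int percentage(Properties props, String name, int defaultValue) {
        String raw = props.getProperty(name);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException(name + " must be between 0 and 100: " + value);
        }
        return value;
    }
}
```

Values are trimmed because the cache.xml example surrounds them with whitespace.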

Design

We want only one member at a time to automatically rebalance a given region, so each member that starts auto rebalancing will try to acquire a distributed lock. If the member obtains the lock, it will perform the auto rebalancing until the member dies. Otherwise, it continues to wait for the lock to become available.

Note that this imposes a limitation: every member configured with the auto rebalancer must have all of the regions defined, because otherwise some regions may not be rebalanced.
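
The single-rebalancer rule described above can be sketched in one JVM as follows. This is only an illustration: a java.util.concurrent.Semaphore with one permit stands in for the distributed lock, and the class SingleRebalancerSketch and its method names are hypothetical. A real implementation would need a cluster-wide lock (for example, Geode's DistributedLockService) and would keep waiting for the lock rather than giving up after one attempt.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical single-JVM model of "only one member rebalances at a time". */
public class SingleRebalancerSketch {
    // Stand-in for the distributed lock; one permit means one active rebalancer.
    private final Semaphore clusterLock;
    private boolean holdsLock;

    public SingleRebalancerSketch(Semaphore clusterLock) {
        this.clusterLock = clusterLock;
    }

    /** Non-blocking attempt to become the active rebalancer. */
    public boolean tryBecomeRebalancer() {
        if (!holdsLock && clusterLock.tryAcquire()) {
            holdsLock = true;
        }
        return holdsLock;
    }

    /** Models the member dying: the lock becomes available to a waiting member. */
    public void shutdown() {
        if (holdsLock) {
            holdsLock = false;
            clusterLock.release();
        }
    }
}
```

With two instances sharing one semaphore, the first to call tryBecomeRebalancer() wins, and the second succeeds only after the first calls shutdown().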

At the scheduled interval the AR will check the balance of the system. It will do that by calling PartitionRegionHelper.getPartitionRegionInfo and fetching the size of all of the regions in bytes from all members. It will sum the colocated regions together (like rebalancing does). For each region, it will then compute

(largest_member_size - smallest_member_size) / largest_member_size. 

If that is greater than the threshold, it will invoke a rebalance on that region.
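
A minimal sketch of this check, assuming the per-member sizes (in bytes, with colocated regions already summed) have been collected beforehand. The class ImbalanceCheck and its method names are hypothetical, and the threshold is passed as a whole percentage to match the unbalance-percentage parameter.

```java
import java.util.Collection;
import java.util.Collections;

/** Hypothetical helper implementing the imbalance formula from this proposal. */
public class ImbalanceCheck {

    /** Returns (largest - smallest) / largest, or 0 when there is no data. */
    public static double imbalance(Collection<Long> memberSizesInBytes) {
        if (memberSizesInBytes.isEmpty()) {
            return 0.0;
        }
        long largest = Collections.max(memberSizesInBytes);
        long smallest = Collections.min(memberSizesInBytes);
        if (largest == 0) {
            return 0.0; // empty region: nothing to rebalance
        }
        return (double) (largest - smallest) / largest;
    }

    /** True when the imbalance is strictly greater than the threshold percentage. */
    public static boolean shouldRebalance(Collection<Long> memberSizesInBytes,
                                          int thresholdPercent) {
        if (memberSizesInBytes.isEmpty()) {
            return false;
        }
        long largest = Collections.max(memberSizesInBytes);
        long smallest = Collections.min(memberSizesInBytes);
        // Integer comparison avoids floating-point rounding at the exact threshold.
        return (largest - smallest) * 100 > (long) thresholdPercent * largest;
    }
}
```

For example, with member sizes of 100, 80, and 70 bytes the imbalance is (100 - 70) / 100 = 0.30, which exceeds the default 20% threshold, so a rebalance would be triggered. With sizes of 100 and 80 the imbalance is exactly 20%, which is not more than the threshold, so no rebalance occurs.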

Testing

We will need to add auto rebalancing to some existing tests and give it a schedule that will cause it to run during the test. We will also need to write unit tests for the rebalancing triggering and scheduling logic.

Limitations

  • Only regions that are defined on the auto rebalancer node will be rebalanced. Users can add accessors if there is a region they want to make sure gets rebalanced but is not available everywhere.
  • Rebalancing always recovers redundancy, moves buckets, and moves primaries. This means that when the rebalancer kicks in, redundancy will be recovered, regardless of the settings for recovery-delay.
  • There is no way to disable or modify the automatic rebalancing without restarting members, since the configuration is part of the member configuration.