...

Rebalancing relocates data from heavily loaded members to lightly loaded members. Currently Geode only supports manual rebalancing, issued through a gfsh command or a Java API call. In most cases, the decision to rebalance is based on the data distribution in the cluster and the max memory configuration of the members. Since Geode already monitors the data size, it can also trigger rebalancing automatically. Auto-balancing is expected to periodically redistribute data load in the cluster and prevent conditions leading to failures.
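
Today's manual flow, for reference, is either the gfsh "rebalance" command or the ResourceManager API from Java. A minimal sketch of the Java path (the helper class is made up for illustration; package names are the GemFire-era ones, newer releases use org.apache.geode.cache.control instead):

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.control.RebalanceOperation;
import com.gemstone.gemfire.cache.control.RebalanceResults;

public class ManualRebalanceExample {  // hypothetical helper, for illustration only
  public static RebalanceResults rebalance(Cache cache) throws InterruptedException {
    // Kick off a rebalance of all partitioned regions in this cache
    RebalanceOperation op = cache.getResourceManager().createRebalanceFactory().start();
    // Block until the operation finishes and return the summary statistics
    return op.getResults();
  }
}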

...

  1. Configurable size threshold to qualify the system as off-balance
  2. Configurable data-distribution skew to trigger rebalance
  3. Reuse existing manual-rebalancing flow
  4. Minimize the impact on concurrent operations caused by continuous rebalancing
    1. Configurable schedule
    2. Ability to disable auto-balancing
  5. Ability to plug in a custom auto-rebalance manager

...

The user can schedule a cron job to invoke the gfsh rebalance command on a periodic basis.
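
For example, a crontab entry along these lines would trigger a rebalance every night at 3 am (the gfsh path, locator address and log file are placeholders):

0 3 * * *  /opt/geode/bin/gfsh -e "connect --locator=locator1[10334]" -e "rebalance" >> /var/log/geode-rebalance.log 2>&1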

Description

 

Unhealthy-member: A member is unhealthy if its heap is critical (included in ResourceAdvisor().adviseCriticalMembers())

Acceptor-member: A member can accept a new bucket if it has enough memory available to host the bucket, i.e. totalBytes + newBucket.getBytes() << localMaxMemory

 Transfer-size: The total number of bytes that will be transferred during a rebalance operation 

How is load defined?

Load on a member is a function of

  1. Total number of buckets hosted on the member
  2. Number of primary buckets on the member
  3. Number of secondary buckets on the member
  4. Size of the buckets
  5. Maximum memory
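
All of these inputs are already exposed through the existing partitioned-region API. A minimal sketch that collects them per member (the class and method names below are made up; it assumes the region is partitioned, otherwise getPartitionRegionInfo returns null):

import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.partition.PartitionMemberInfo;
import com.gemstone.gemfire.cache.partition.PartitionRegionHelper;
import com.gemstone.gemfire.cache.partition.PartitionRegionInfo;

public class MemberLoadSketch {  // illustrative only
  public static void printLoad(Region<?, ?> region) {
    PartitionRegionInfo info = PartitionRegionHelper.getPartitionRegionInfo(region);
    for (PartitionMemberInfo member : info.getPartitionMemberInfo()) {
      System.out.println(member.getDistributedMember()
          + " bytes=" + member.getSize()                     // size of hosted buckets
          + " buckets=" + member.getBucketCount()            // total buckets hosted
          + " primaries=" + member.getPrimaryCount()         // primary buckets hosted
          + " maxMemory=" + member.getConfiguredMaxMemory()); // configured local max memory
    }
  }
}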

When is a cluster off-balance?

A member is unhealthy or heavily loaded if

  1. its heap is critical (included in ResourceAdvisor().adviseCriticalMembers())
  2. the node is misconfigured, e.g. max memory is not sufficient to host even one bucket

When is a member lightly loaded?

  1. if the member has enough memory, i.e. totalBytes + newBucket.getBytes() << localMaxMemory

...

  1. if transfer-size is more than X% of the total data size; rebalancing can then result in a consistent data distribution and create comparable free space on all nodes
  2. if some nodes in the cluster are heavily loaded while most other nodes are free, i.e. their percentage heap utilization is much higher than that of other members in the cluster
  3. if the cluster is not running at configured redundancy levels
  4. if any unhealthy node exists in the cluster
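
The transfer-size criterion above can be evaluated without actually moving data, because the existing rebalance code already supports simulation. A hedged sketch (the class name and threshold handling are illustrative, not the final design):

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.control.RebalanceResults;

public class OffBalanceCheckSketch {  // illustrative only
  // True if the simulated transfer-size exceeds thresholdPercent of the total data size
  public static boolean isOffBalance(Cache cache, long totalDataBytes, int thresholdPercent)
      throws InterruptedException {
    RebalanceResults simulation =
        cache.getResourceManager().createRebalanceFactory().simulate().getResults();
    long transferBytes = simulation.getTotalBucketTransferBytes();
    return transferBytes * 100 > totalDataBytes * (long) thresholdPercent;
  }
}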

Where can a bucket be moved?

  1. For a bucket B, to any lightly loaded member which is not already hosting B

Use Cases

  1. Adding a node to an existing cluster after loading data. In this case the new node will be lightly loaded and may not host any buckets. In this scenario the total number of bytes rebalanced may not be large.
  2. Node recovery after a few node failures. In this case some buckets may not have enough redundancy or primary ownership may be limited to a few nodes only.
  3. In such cases the transfer-size will be a small percentage of the total data size. Auto-rebalance will detect the skew and take action.
  4. After node failure and recovery, the gfsh command "rebalance --simulate" reports a high transfer-size. In this case the nodes may have comparable utilization, but a rebalance would result in a uniform region data distribution, so action would be taken.

  5. Over time, some buckets may grow much larger than other buckets in the region, or some regions may grow more than others. Rebalance would get triggered, resulting in a uniform distribution.

Design

We would like to implement this as an independent module without modifying existing code, so that it can be easily applied to any version of the system. To enable auto-balancing, the user will place the auto-balance jar on their classpath and add an initializer to their cache.xml. The initializer will provide the following configuration

  1. Schedule - cron string: In order to minimize the impact on concurrent operations, we feel it's important to provide the user with the ability to configure the frequency and timing of automatic rebalancing. Bucket movement does add load to the system, and in our performance tests we can see that the throughput of concurrent operations drops during bucket movement. A user is expected to configure off-peak hours for rebalancing, so a schedule based on a cron-like configuration is useful.
  2. Size-threshold-percent - int between 1 and 99: Rebalancing will be triggered if the transfer-size is more than this threshold, expressed as a percentage of the total data size. The rebalance operation computes the transfer-size based on the relationship between regions, primary ownership and redundancy.

  3. Rebalancing could be harmful when the cache is initially being populated, because bucket sizes may vary wildly when there is very little data. Because of that, we will also provide a size threshold (how full a member must be) that must be crossed before automatic rebalancing will kick in.

E.g.

<cache>
...
 <initializer>
  <!-- Optional auto-rebalance manager -->
  <class-name> com.gemstone.gemfire.cache.util.AutoBalancer </class-name>
 
  <!-- Optional. Default: Once a week on Saturday. E.g. check at 3 am every night -->
  <parameter name="schedule"> 0 0 3 * * ? </parameter>
 
  <!-- Optional. Default: 20%. E.g. Don't rebalance until the transfer-size is more than 10% of the total data size -->
  <parameter name="size-threshold-percent"> 10 </parameter>
 
  <!-- Optional. Default: 50%. E.g. Don't rebalance a region until at least one member is 50% full -->
  <parameter name="size-threshold-percentage"> 50 </parameter>
 </initializer>
... 
</cache>
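
The initializer class is expected to implement Declarable so that the parameters above are handed to it at cache creation time. A rough skeleton (the class name, fields and defaults are illustrative only):

import java.util.Properties;
import com.gemstone.gemfire.cache.Declarable;

public class AutoBalancerSketch implements Declarable {  // illustrative skeleton only
  private String schedule = "0 0 3 * * ?";  // assumed default for this sketch
  private int sizeThresholdPercent = 20;    // default 20% as described above

  public void init(Properties props) {
    // Parameters from the <initializer> block in cache.xml arrive here
    schedule = props.getProperty("schedule", schedule);
    sizeThresholdPercent = Integer.parseInt(
        props.getProperty("size-threshold-percent", String.valueOf(sizeThresholdPercent)));
    // Next step: acquire the cluster-wide lock and start the scheduler (see below)
  }
}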


We only want one member to automatically rebalance a given region, so each member that starts auto-rebalancing will try to acquire a distributed lock. If a member obtains the lock, it will perform the auto-rebalancing until the rebalance completes. Otherwise it will wait for the next cycle and try again.
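
Geode's existing distributed lock service can be used for this election. A possible sketch (the service and lock names are made up):

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.distributed.DistributedLockService;

public class AutoBalanceLockSketch {  // illustrative only
  static final String SERVICE_NAME = "auto-balance-service";  // hypothetical name
  static final String LOCK_NAME = "auto-balance-lock";        // hypothetical name

  // Returns true if this member wins the right to run the rebalance cycle
  public static boolean tryBecomeBalancer(Cache cache) {
    DistributedLockService dls = DistributedLockService.getServiceNamed(SERVICE_NAME);
    if (dls == null) {
      dls = DistributedLockService.create(SERVICE_NAME, cache.getDistributedSystem());
    }
    // waitTimeMillis = 0: do not block; losers simply try again on the next scheduled cycle
    return dls.lock(LOCK_NAME, 0, -1);
  }
}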

At the scheduled interval the auto-balancer will check the balance of the system. It will do that by calling PartitionRegionHelper.getPartitionRegionInfo and fetching the size of all of the regions in bytes from all members. It will sum the colocated regions together (like rebalancing does).
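
A sketch of that size check, grouping colocated regions under the region they are colocated with (the class name is made up, and it is simplified to a single level of colocation):

import java.util.HashMap;
import java.util.Map;
import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.partition.PartitionMemberInfo;
import com.gemstone.gemfire.cache.partition.PartitionRegionHelper;
import com.gemstone.gemfire.cache.partition.PartitionRegionInfo;

public class RegionSizeAuditSketch {  // illustrative only
  // Total bytes per colocation group, keyed by the leader region's path
  public static Map<String, Long> totalBytesByGroup(Cache cache) {
    Map<String, Long> totals = new HashMap<String, Long>();
    for (PartitionRegionInfo info : PartitionRegionHelper.getPartitionRegionInfo(cache)) {
      // Regions colocated with another region are counted under that region
      String group = info.getColocatedWith() != null
          ? info.getColocatedWith() : info.getRegionPath();
      long bytes = 0;
      for (PartitionMemberInfo member : info.getPartitionMemberInfo()) {
        bytes += member.getSize();  // bytes hosted on this member for this region
      }
      Long previous = totals.get(group);
      totals.put(group, previous == null ? bytes : previous + bytes);
    }
    return totals;
  }
}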

Note that this means there is a limitation: members configured with the auto-rebalancer must have all of the regions defined, because otherwise some regions may not be rebalanced.


...