Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Status

Current stateDone

Released: 4.5.0

Problem

It is likely that in some Bookkeeper clusters, we’ll have machines with different hardware capabilities. Due to the need for expanding capacity in a cluster, it is possible that we end up having bookies of different SKUs(No. of CPUs, memory/storage capacity, etc) in a single cluster. In such environments, we can have a policy that makes sure that bookies with smaller storage capacity don’t run out of disk space and turn read-only quickly. Instead we want the storage usage on those bookies to grow at a pace that is proportional to their capacity. Another policy could be to select bookies that are experiencing lower system load more often during ledger creation, than the ones that are experiencing higher load. Yet another approach could take each bookies network bandwidth into account and use it as a weight. 

...

Operations like auto recovery could cause imbalance if the weight is not taken into account. During recovery, bookies get hold of an under replicated ledger and copy data to themselves from other bookies. Since the bookies use bookkeeper client libraries to do the read and write, they should have access to the free disk space usage on all the bookies. One simple solution would be that bookies with lower weight pause after replicating a ledger. Whereas the bookies that have higher weight will take shorter pauses or no pauses at all. This guarantees that bookies that have more free disk space will end up copying more under replicated ledgers to themselves.

1.4 Implementation

1.4.1 Ensemble Placement Policy

EnsemblePlacementPolicy is an interface between BookieWatcher and the various placement policies. This interface will be enhanced and a new method will be added called 'updateBookieInfo()' to provide the up to date mapping of bookies to their metrics such as free disk space, load, etc. 

The following ensemble placement policies are supported in Bookkeeper. We describe how this proposal affects these policies:

  1. DefaultEnsemblePlacementPolicy: Changes will be made to newEnsemble() and replaceBookie() interfaces to select the bookies based on their free disk space based weight when the configuration suggests that weight based placement is to be honored. 
  2. RackawareEnsemblePlacementPolicy: Enhancements will be made to selectRandomFromRack() and selectRandomInternal() to take into account the weights of the bookies. No changes will be made to the rack selection logic. But once a rack has been selected the bookie selection within that rack will be based on disk weight. 
  3. RegionAwareEnsemblePlacementPolicy: No changes are needed to the region selection logic. In this policy  once a region is selected, the bookie selection is done using RackawareEnsemblePlacementPolicy. 

1.4.2 BookieInfo

The following protocol message is being added to retrieve the BookieInfo details. This message can be extended to retrieve other information such as network bandwidth in the future.

message GetBookieInfoRequest {
enum Flags {
TOTAL_DISK_CAPACITY = 0x01;
FREE_DISK_SPACE = 0x02;
}
// bitwise OR of Flags
optional int64 requested = 1;
}
message GetBookieInfoResponse {
required StatusCode status = 1;
optional int64 totalDiskCapacity = 2;
optional int64 freeDiskSpace = 3;
}

1.4.3 BookieInfoReader

An instance of BookieInfoReader is created to read periodically the bookie information such as free disk space, bookie load etc. This singleton is instantiated once per Bookkeeper object. It uses a SingleThreadedExecutor to query the bookie info from all the bookies. It then communicates the updated information to the ensemble policies via the updateBookieInfo() interface described above. Changes in the number of bookies in the cluster also triggers this operation. The event is triggered from BookieWatcher singleton.

1.5 Alternatives and Strategies

1.

...

5.1 Using zookeeper for collecting free disk space info:

One of the alternative mechanisms considered for getting the free disk space info from the bookies was to use zookeeper for storing the free disk space info. This is how it would work:

...

When there are 100s or 1000s of bookies in the cluster and an equivalent number of clients, the load on zookeeper could be very high. Hence we decided against this approach.

1.

...

6 External Interface

1.

...

6.1 External APIs exposed

The following new client configuration parameters will be exposed:

...