Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Operations like auto recovery could cause imbalance if the weight is not taken into account. During recovery, bookies get hold of an under replicated ledger and copy data to themselves from other bookies. Since the bookies use bookkeeper client libraries to do the read and write, they should have access to the free disk space usage on all the bookies. One simple solution would be that bookies with lower weight pause after replicating a ledger. Whereas the bookies that have higher weight will take shorter pauses or no pauses at all. This guarantees that bookies that have more free disk space will end up copying more under replicated ledgers to themselves.

1.4 Implementation

1.4.1 Ensemble Placement Policy

EnsemblePlacementPolicy is an interface between BookieWatcher and the various placement policies. This interface will be enhanced and a new method will be added called 'updateBookieInfo()' to provide the up to date mapping of bookies to their metrics such as free disk space, load, etc. 

The following ensemble placement policies are supported in Bookkeeper. We describe how this proposal affects these policies:

  1. DefaultEnsemblePlacementPolicy: Changes will be made to newEnsemble() and replaceBookie() interfaces to select the bookies based on their free disk space based weight when the configuration suggests that weight based placement is to be honored. 
  2. RackawareEnsemblePlacementPolicy: Enhancements will be made to selectRandomFromRack() and selectRandomInternal() to take into account the weights of the bookies. No changes will be made to the rack selection logic. But once a rack has been selected the bookie selection within that rack will be based on disk weight. 
  3. RegionAwareEnsemblePlacementPolicy: No changes are needed to the region selection logic. In this policy  once a region is selected, the bookie selection is done using RackawareEnsemblePlacementPolicy. 

1.4.2 BookieInfoReader

An instance of BookieInfoReader is created to read periodically the bookie information such as free disk space, bookie load etc. This singleton is instantiated once per Bookkeeper object. It uses a SingleThreadedExecutor to query the bookie info from all the bookies. It then communicates the updated information to the ensemble policies via the updateBookieInfo() interface described above. Changes in the number of bookies in the cluster also triggers this operation. The event is triggered from BookieWatcher singleton.

1.5 Alternatives and Strategies

1.

...

5.1 Using zookeeper for collecting free disk space info:

One of the alternative mechanisms considered for getting the free disk space info from the bookies was to use zookeeper for storing the free disk space info. This is how it would work:

...

When there are 100s or 1000s of bookies in the cluster and an equivalent number of clients, the load on zookeeper could be very high. Hence we decided against this approach.

1.

...

6 External Interface

1.

...

6.1 External APIs exposed

The following new client configuration parameters will be exposed:

...