Manager Functionality

The Accumulo manager currently has the following management functionality.

Primary functionSub functionStoragePrimary KeyDescription
Tablet server heart beat


Periodically pings all tablet servers and gathers stats that are aggregated for the monitor.  This entire functionality needs to be reevaluated.
Tablet managementFind tablets needing attentionZK/RT/MT<table id>;<end row>

Finds tablets that need split, compaction, assignment, assignment, unassignment, are assigned to a dead tserver.  This functionality already scales out in that this search is run in parallel in tablet server iterators.

Assign and balance tabletsZK/RT/MT<table id>;<end row>The balancer plugin is currently called in the manager process to get tablet assignments and tablet migrations.  This plugin is given per tablet sever stats. The manager tracks active migrations in memory. The manager makes RPCs to tablet servers to load and unload tablets and also writes location information to the metadata table.
Compaction coordinationjob priority queuesMemory<queue id>:<priority>Stores compaction jobs in multiple priority queues used for compactors to find work
reservation/commitZK/RT/MT<table id>;<end row>Atomically reserves and commits files in compaction job for a tablet on behalf of compactors (maybe compactors could commit their self?)
refreshZK/RT/MT~refresh<???>Ensures that after a system compaction happens that hosted tablets are reliably asked to reload their metadata
FATEExecutionZK<64bit int>Reliably executes (even in the case of process faults) a multistep internal operation in Accumulo.  Each fate transaction has a random 64bit id and in zookeeper it stores an operation stack under this id.
Table locksZK<table id>Some fate transactions acquire a table lock that give it exclusive access to the table.  This lock is stored in ZK for the lifetime of the fate transaction.
Service client requestmanage FATEsZK<64bit int>The internal implementation of many Accumulo client operations use common RPCs to initiate and get the status of FATE ops.
Modify ZKZKvarious(todo expand?)There are some client request RPC the manager handles that are not FATEs.
AdvertisementZK<host>:<port>The manager process advertises itself in ZK.
Upgrade
ZK/RT/MTvariousUpdates internal metadata in ZK/RT/MT when a new version of Accumulo starts for the first time.
GCcandidate collectionZK/RK/MT~del<hash><file>Loads a batch of candidate files to delete (todo use % for GC batch size) into memory

candidate use detectionZK/RK/MT<table id>Scans all in use files removing them from the in memory candidate set

file deletionDFS<file path>Deletes any candidates that are in use.  Currently a thread pool is used for this to process the deletions in parallel in the single process.

Accumulo's management functionality deals with meta data, meta operations (like creating a table), and managing the worker processes.  GC and the Monitor have historically been separate processes from the manger process, however their functionality is definitely related to meta data and meta operations in Accumulo.  Since a complete redesign of the manager is being considered, the GC and monitor are included as management functional components for consideration.  The non management components of Accumulo are those that deal directly with data and are the worker processes (compactors, tablet servers, and scan servers).  

The functional management components have the following dependencies on each other.

FunctionDependencyDescription
UpgradeTablet managmentThe upgrade process relies on tablet management to assign tablets during the upgrade process
Tablet managmentUpgradeThe tablet management can not begin to bring a data level of Accumulo online until it has been upgraded
FATETablet managmentSome FATE operations (like split, merge, and delete table) need tablets to be unassigned during the fate transaction. These operation rely on the tablet management functionality to do this.  The merge code is currently too tightly coupled with tablet management, it would be nice to have it less tightly coupled like the new split code.
service client requestFATEMany (todo use % for GC batch size) client request to the manager initiate and check on FATEs
Monitortablet server heart beatThe monitor ask the manger for statistics about the current tservers.
Tablet managementtablet server heart beatThe balancer plugin is given stats gathered from the tablet server heart beat

Multiple active manager processes

Warning, this is an incomplete idea of how to scale the manager and should not be viewed as a definite way forward. Its a proposal that needs review.

For scalability, Accumulo has a need for multiple processes executing management functionality.  One way to achieve this to take each of the functional components above and to make them an independently scalable service inside of Accumulo.  These independent services would have a internal APIs (not SPI or public API) that dependent services could use. For example the Tablet Management service in Accumulo could have an internal API for requesting tablet unassignment that the split, merge, and delete table fate operations running FATE could call.  The callers of this internal API would not know or care how or where that functionality is processed.  It could be processed by tablet management code running in another manager process.

There are two advantages to making these internal functionality scale independently. 

  1. Different functionality may benefit from being able to partition its data differently among multiple processes.  For example tablet management and FATE may want to partition work very differently.
  2. Different functionality may benefit from being able to have different numbers of processes.  For example maybe 10 GC processes are needed and 1 client service handler.

In implementation this would look like the following for an Accumulo instance with a single manager process. Key : GC=Garbage Collection, TM=tablet management, CC=compaction coordinator

Single manager process

An Accumulo system with multiple manager processes could look like the following. Maybe the monitor functionality is not a scalable internal service, so it just runs in one of the manager processes.

multiple manager processes

Accumulo processes are started with a resource group option.  These internal management services could be configured to run inside a resource group.   For example FATE could be configured for resource group RG2, GC for resource group RG3, and everything else for resource group RG1.  This would look like the following.

Managers running in resource groups

Next steps

This design may not actually be workable.  The following needs to be done to further explore this concept and determine if it could be implemented.

  1. For each functional service explore in more detail how it would scale
  2. For each functional service define everything it would need to offer to other functional services and how this might be implemented.
  • No labels