Background

On-Demand tablet hosting and Resource Groups, described in A More Cost Efficient Accumulo, are two new features that aim to lower the cost of using Accumulo and increase the efficiency of compute resources. On-Demand tablet hosting could allow users to reduce their overall TabletServer footprint if they determine that they do not need immediate access to the data in the tablets of a table. Resource Groups will enable users to assign table workloads to specific groups of compute resources. Metrics will enable third-party schedulers (e.g., HPA, KEDA) to dynamically scale the compute resources within a resource group as demand changes. These features will require that users analyze their application requirements and perform some level of resource planning.

Note

Given the analysis and planning required, we assume that resource group definitions will change infrequently. 


draw.io Diagram: ResourceGroupsCurrent

The diagram above is an example layout of an Accumulo instance with three resource groups.

  • The resource group on the left is the default resource group, which is mandatory. The default resource group must, at a minimum, contain one TabletServer and one Compactor to host and compact the root and metadata tables. Tablets for other tables are hosted in the TabletServers in the default group unless configured otherwise*. Applications using the default resource group can insert data via BatchWriter or BulkImport and can access their data using the IMMEDIATE consistency level on the Scanner (see the sketch following the footnotes below). Because there are TabletServers in this group, users can assign ranges of their tables to always be hosted instead of being hosted on-demand (the new default behavior in 4.0). The Compactors in this resource group will perform major compactions for all tables that have not been configured to use a different group**.

  • The resource group in the middle contains no TabletServers, only ScanServers and Compactors. Users would configure a group like this when their application does not need immediate consistency for their data. Applications using a resource group like this would only insert data via BulkImport and would access their data using the EVENTUAL consistency level on the Scanner***. Tablets will never be hosted in this resource group because there are no TabletServers. The Compactors in this resource group will perform major compactions for all tables that have been configured to use this group.

  • The resource group on the right contains a mix of TabletServers and ScanServers. Users would configure a group like this when their application requires immediate consistency in some cases and eventual consistency in others.

* See A More Cost Efficient Accumulo, Resource Groups, bullet #1
** See A More Cost Efficient Accumulo, Resource Groups, bullet #3
*** See A More Cost Efficient Accumulo, Resource Groups, bullet #2
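
The consistency levels referenced in the bullets above are chosen per scanner in the client API. Below is a minimal sketch; the client properties path and table name are placeholders, and it assumes an Accumulo version (2.1 or later) where ScannerBase.setConsistencyLevel is available.

Code Block
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ScannerBase.ConsistencyLevel;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class ConsistencyLevelExample {
  public static void main(String[] args) throws Exception {
    // Placeholder client properties path and table name.
    try (AccumuloClient client =
            Accumulo.newClient().from("/path/to/accumulo-client.properties").build();
        Scanner scanner = client.createScanner("mytable", Authorizations.EMPTY)) {

      // EVENTUAL allows the scan to be serviced by a ScanServer, so the tablet does
      // not need to be hosted by a TabletServer (the middle group in the diagram).
      // IMMEDIATE (the default) requires the tablet to be hosted by a TabletServer.
      scanner.setConsistencyLevel(ConsistencyLevel.EVENTUAL);

      for (Entry<Key,Value> entry : scanner) {
        System.out.println(entry.getKey() + " -> " + entry.getValue());
      }
    }
  }
}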


----

The Single Manager Problem

With the introduction of On-Demand tablets, most tablet management functions (split, merge, delete, compact, etc.) had to be moved out of the TabletServer and into the Manager. Prior to Accumulo 4.0, TabletServers constantly scanned the tablets under their control looking for tablets that needed to be compacted, split, etc., and this activity was highly parallelized. With this functionality moved to the Manager, there is now a single process continually scanning all of the metadata looking for work, using the TabletManagementIterator on the metadata tablets to offload as much of the computation as possible. User-initiated tablet management functions are still FATE operations, but they run within the Manager instead of being delegated to the TabletServer hosting the tablet. Additionally, the CompactionCoordinator functionality was moved into the Manager. Because of the increased tablet management, FATE, and CompactionCoordinator workloads, there is a concern that a single Manager will not be able to manage an Accumulo instance as well as it could in earlier Accumulo versions.

----


Multiple Managers - Attempt #1

PR #3262 is an incomplete solution for allowing multiple active Manager processes. One of the Managers would grab the lock, making it the primary Manager, and would be responsible for executing FATE operations and acting as the CompactionCoordinator. The primary and other Managers would determine which tables they are responsible for managing via some stable algorithm that recomputes the distribution as Managers are added and removed. Recent comments on PR #3262 suggest that the primary Manager should instead calculate the distribution of tables/tablets and publish it for the other Managers to consume.
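
As an illustration only (this is not necessarily the approach taken in PR #3262), one stable scheme is rendezvous (highest-random-weight) hashing: each Manager independently computes the same table-to-Manager assignment, and only the tables owned by an added or removed Manager move. The Manager addresses and table IDs below are hypothetical.

Code Block
import java.util.List;
import java.util.UUID;

public class StableTableDistributionSketch {

  // Assign a table to the live Manager with the highest hash score for that
  // (table, manager) pair. Every Manager computing this over the same list of
  // live Managers arrives at the same answer without any coordination.
  static String assignManager(String tableId, List<String> liveManagers) {
    String best = null;
    long bestScore = Long.MIN_VALUE;
    for (String manager : liveManagers) {
      long score = UUID.nameUUIDFromBytes((tableId + "|" + manager).getBytes())
          .getMostSignificantBits();
      if (score > bestScore) {
        bestScore = score;
        best = manager;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<String> managers = List.of("manager1:9999", "manager2:9999", "manager3:9999");
    System.out.println("table 2a -> " + assignManager("2a", managers));
    System.out.println("table 3b -> " + assignManager("3b", managers));
  }
}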

Positives with Attempt #1

  1. With only the primary Manager advertising the FATE and CompactionCoordinator Thrift services, clients need to communicate with just a single Manager.

Negatives with Attempt #1

  1. FATE and CompactionCoordinator operations are not currently distributed, although there is a WIP PR (3964) for distributing FATE.
  2. As noted, the current distribution algorithm may not be stable; depending on the final implementation, it could introduce a layer of complexity involving ZooKeeper or RPCs.
  3. There is an issue with the Monitor and multiple Managers: does the Monitor need to communicate with all of them?
  4. Do we also have a single GC problem? We have talked about partitioning the GC process, potentially by namespace, to reduce memory pressure on the GC process and to make it faster.

Info

If the FATE distribution problem is solved, then it may be possible to create a FATE server component. The Fate class in the Manager contains the TransactionRunners and is very modular. It would be trivial to create a FateServer from it. The Manager would retain the FATE API and create transactions, but the execution of those transactions could be scaled independently.
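
A rough sketch of what such a component might look like. Everything here (FateWorkSource, FateTask, the polling loop) is a hypothetical placeholder and not the actual Accumulo FATE API; it only illustrates the Manager retaining transaction creation while execution runs in a separately scalable process that pulls work from a shared store (e.g. ZooKeeper).

Code Block
import java.util.Optional;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only; these interfaces are placeholders, not Accumulo classes.
public class FateServerSketch {

  interface FateTask {
    void execute() throws Exception; // run the next step (repo) of a transaction
  }

  interface FateWorkSource {
    Optional<FateTask> reserveRunnableTransaction(); // claim runnable work, if any
  }

  private final FateWorkSource workSource;
  private final ExecutorService runners;
  private final int numRunners;

  FateServerSketch(FateWorkSource workSource, int numRunners) {
    this.workSource = workSource;
    this.numRunners = numRunners;
    this.runners = Executors.newFixedThreadPool(numRunners);
  }

  // Mirrors the TransactionRunner threads that currently live inside the Manager,
  // but running in a separate, independently scalable process.
  void start() {
    for (int i = 0; i < numRunners; i++) {
      runners.submit(() -> {
        while (!Thread.currentThread().isInterrupted()) {
          Optional<FateTask> task = workSource.reserveRunnableTransaction();
          if (task.isPresent()) {
            try {
              task.get().execute();
            } catch (Exception e) {
              e.printStackTrace(); // real code would mark the transaction as failed
            }
          } else {
            try {
              TimeUnit.SECONDS.sleep(1); // back off when there is no work
            } catch (InterruptedException ie) {
              Thread.currentThread().interrupt();
            }
          }
        }
      });
    }
  }
}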

----



Possible Solution #2

One possible solution to the Single Manager problem is to push the management layer (Monitor, Manager, GC) into each resource group. From a management perspective this appears as different Accumulo databases, but they are not; they are different groups of Accumulo server processes serving up data from the same Accumulo instance. This may or may not solve the single Manager problem, depending on the number of tablets that the Manager has to manage within a single resource group. However, it does solve the Monitor issue and the GC and CompactionCoordinator partitioning problems. To make this work, I believe that:

  1. There would need to be a table property for the resource group. When the Manager for a group starts up, it would find all of the tables with the same group name and manage them (a sketch of such a property appears after the note below).
  2. The Accumulo client code would also need to determine the group name for the table that it needs to operate on so that it communicates with the correct Manager.
  3. The ZooKeeper paths for some components may need to change. For example, the paths for FATE transactions currently contain no information identifying which resource group a transaction is related to.

    Code Block
    /accumulo/01af27af-ad10-416e-96e2-5fc35217022d/fate
    /accumulo/01af27af-ad10-416e-96e2-5fc35217022d/fate/tx_1fbc9ea5cc0b473f
    /accumulo/01af27af-ad10-416e-96e2-5fc35217022d/fate/tx_1fbc9ea5cc0b473f/TX_NAME
    /accumulo/01af27af-ad10-416e-96e2-5fc35217022d/fate/tx_1fbc9ea5cc0b473f/repo_0000000000
    /accumulo/01af27af-ad10-416e-96e2-5fc35217022d/fate/tx_1fbc9ea5cc0b473f/repo_0000000002
    /accumulo/01af27af-ad10-416e-96e2-5fc35217022d/fate/tx_1fbc9ea5cc0b473f/repo_0000000003
    /accumulo/01af27af-ad10-416e-96e2-5fc35217022d/fate/tx_1fbc9ea5cc0b473f/repo_0000000004


    Note

    It may not be beneficial to put the resource group name into the ZK path as cleanup will need to be done for old resource group names.
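
A minimal sketch of items #1 and #2 above. The property name table.custom.resource.group, the table name, and the group name are all hypothetical; setProperty and getTableProperties are existing TableOperations methods (the latter assumes Accumulo 2.1 or later).

Code Block
import java.util.Map;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

public class ResourceGroupPropertyExample {
  // Hypothetical property name; nothing like this exists today.
  static final String RG_PROP = "table.custom.resource.group";

  public static void main(String[] args) throws Exception {
    try (AccumuloClient client =
        Accumulo.newClient().from("/path/to/accumulo-client.properties").build()) {

      // Item #1: an administrator tags the table with its resource group. A group's
      // Manager would scan table configurations at startup for its own group name.
      client.tableOperations().setProperty("mytable", RG_PROP, "groupB");

      // Item #2: client code reads the property back to decide which group's
      // Manager it should contact for operations on this table.
      Map<String,String> props = client.tableOperations().getTableProperties("mytable");
      String group = props.getOrDefault(RG_PROP, "default");
      System.out.println("mytable is managed by resource group: " + group);
    }
  }
}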



draw.io Diagram: ResourceGroupsProposed