...

  • POST:  /admin/rebalance
    parameters:
      – dataverseName (optional):  the name of the dataverse to be rebalanced;
      – datasetName (optional): the name of the dataset to be rebalanced;
      – nodes: a comma separated list of node names where the dataset gets rebalanced to.

    If neither dataverseName nor datasetName is provided, we will rebalance all datasets except Metadata datasets.
    If dataverseName is provided but datasetName is not, we will rebalance all datasets in the given dataverse.
     
    Example: curl -X POST "http://localhost:19002/admin/rebalance?dataverseName=tpch&datasetName=LineItem&nodes=asterix_nc1"

  • DELETE: /admin/rebalance
    It cancels all running or pending rebalance requests.
    Example: curl -X DELETE http://localhost:19002/admin/rebalance
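
As a rough illustration of how a client might assemble these two calls, the sketch below builds the POST and DELETE requests with Java's built-in `java.net.http` API. The class and method names (`RebalanceRequests`, `rebalance`, `cancel`) are hypothetical and not part of AsterixDB; the host and port are taken from the curl examples above, and `nodes` is treated as required only because it is not marked optional. The requests are constructed but not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper; not part of AsterixDB.
final class RebalanceRequests {
    // Endpoint taken from the curl examples; adjust host and port as needed.
    private static final String ENDPOINT = "http://localhost:19002/admin/rebalance";

    private static String param(String key, String value) {
        return key + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // dataverseName and datasetName may be null because they are optional.
    static HttpRequest rebalance(String dataverseName, String datasetName, List<String> nodes) {
        List<String> params = new ArrayList<>();
        if (dataverseName != null) {
            params.add(param("dataverseName", dataverseName));
        }
        if (datasetName != null) {
            params.add(param("datasetName", datasetName));
        }
        // The node names are sent as one comma-separated value.
        params.add(param("nodes", String.join(",", nodes)));
        URI uri = URI.create(ENDPOINT + "?" + String.join("&", params));
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Builds the request that cancels all running or pending rebalance requests.
    static HttpRequest cancel() {
        return HttpRequest.newBuilder(URI.create(ENDPOINT)).DELETE().build();
    }
}
```

Either request could then be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Building the query string in code also avoids the shell treating `&` as a command separator.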

...

    • rebalance requests are processed one-at-a-time by a singleton thread executor;
    • each time a request comes, we submit the rebalance task into the thread executor and add it to the task queue;
    • once a rebalance task completes, we remove the task from the task queue;
    • a cancellation request cancels all running or pending requests in the rebalance task queue.
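
The queueing behavior described above can be sketched roughly as follows. This is a minimal illustration built on `java.util.concurrent`, not the actual AsterixDB implementation; the class name `RebalanceTaskQueue` and its methods are hypothetical, and the rebalance work itself is represented by a plain `Runnable`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names do not come from the AsterixDB code base.
final class RebalanceTaskQueue {
    // A single worker thread processes rebalance tasks one at a time.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // Running or pending tasks, keyed by an internal id.
    private final Map<Long, Future<?>> tasks = new LinkedHashMap<>();
    private long nextId = 0;

    // Submits a task to the executor and records it in the task queue.
    synchronized Future<?> submit(Runnable rebalance) {
        long id = nextId++;
        Future<?> future = executor.submit(() -> {
            try {
                rebalance.run();
            } finally {
                remove(id); // drop the task from the queue once it finishes
            }
        });
        tasks.put(id, future);
        return future;
    }

    private synchronized void remove(long id) {
        tasks.remove(id);
    }

    // Cancels every running or pending task; a running task is interrupted.
    synchronized void cancelAll() {
        for (Future<?> future : tasks.values()) {
            future.cancel(true);
        }
        tasks.clear();
    }

    synchronized int size() {
        return tasks.size();
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Both `submit` and `remove` are synchronized so that a task which finishes very quickly cannot remove its entry before `submit` has recorded it.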

...

    • The locks in metadata transactions a to c make sure that read-only queries are allowed most of the time, except during metadata transaction b.

    • We make sure steps 5 and 6 can sustain InterruptedException, which means that if they are interrupted we keep retrying them until they succeed.
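
One way to picture "sustaining" an InterruptedException is a retry loop like the one below. This is a sketch under assumptions, not the AsterixDB code: `UninterruptibleRetry` and `InterruptibleAction` are hypothetical names, the action is assumed to be safe to repeat, and restoring the thread's interrupt flag after success is a design choice made here rather than something the text above specifies.

```java
// Hypothetical sketch; names do not come from the AsterixDB code base.
final class UninterruptibleRetry {
    // Stand-in for a step (such as a metadata transaction) that may be interrupted.
    interface InterruptibleAction {
        void run() throws InterruptedException;
    }

    // Retries the action until it completes without being interrupted.
    // Returns the number of attempts that were needed.
    static int runUntilSuccess(InterruptibleAction action) {
        boolean interrupted = false;
        int attempts = 0;
        try {
            while (true) {
                attempts++;
                try {
                    action.run();
                    return attempts;
                } catch (InterruptedException e) {
                    // Remember the interrupt and try again.
                    interrupted = true;
                }
            }
        } finally {
            // Assumption: re-assert the interrupt so callers can still observe it.
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

With this shape, a cancellation that arrives during the step delays its effect until the step has finished, so the step is never left half-done by an interrupt.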

  • Concurrency:
    Since we cut the rebalance process into three metadata transactions, other metadata write operations could potentially interleave with the rebalance process.
    • CASE 1: foo is dropped between metadata transactions a and b.  At the beginning of step 5, we check whether foo still exists and, if it has been dropped, we drop the target files.
    • CASE 2: foo is dropped between metadata transactions b and c.  This time, it's the rebalance target that gets dropped.  Therefore, step 6 is independent of the drop operation.

  • Idempotent property:
    • The whole rebalance process is idempotent in case it fails or the node crashes in the middle:
      • In metadata transaction a, since (1) only the new node group creation is a persistent metadata write, (2) we use a uuid to name a node group if the desired name exists, and (3) step 3 drops leftover files for the rebalance target, it is idempotent if a failure or crash happens at any point.
      • In metadata transactions b and c, since they sustain InterruptedException, they will never be interrupted in the middle as long as the node doesn't crash. In this way, a user cancellation request will not waste the bulk part of the rebalance work, i.e., metadata transaction a.
      • In metadata transactions b and c, if the node or JVM crashes:
        • if the metadata entity was not switched (depending on what's there in the metadata node), the system uses the rebalance source as foo;
        • if the metadata entity was switched, the system uses the rebalance target as foo;
      • In the event of failures, there could be leaked source files (from metadata transaction c) which will be reclaimed in the next rebalance operation, or leaked target files (from metadata transaction a) which will not be reclaimed, or a leaked node group name (from metadata transaction a) which doesn't prevent the success of the next rebalance operation.  (ASTERIXDB-1948)
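
To illustrate why a leaked node group name does not block a later rebalance, here is a small sketch of the naming rule from metadata transaction a. The class `NodeGroupNames`, the method `choose`, and the `desired + "_" + uuid` format are all assumptions made for illustration; the text above only states that a uuid is used when the desired name already exists.

```java
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch; the actual naming scheme in AsterixDB may differ.
final class NodeGroupNames {
    // Returns the desired name if it is free, otherwise a uuid-based name
    // that does not collide with a name leaked by an earlier failed attempt.
    static String choose(String desired, Set<String> existing) {
        if (!existing.contains(desired)) {
            return desired;
        }
        return desired + "_" + UUID.randomUUID();
    }
}
```

Because a fresh uuid is generated on every call, repeating metadata transaction a after a crash still finds a usable node group name.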