...

  • Metadata entity:
    • Dataset: 
      Each dataset record has one additional open field called "rebalanceCount". If the dataset has never been rebalanced, the field is missing.
    • Nodegroup:
      When a dataset foo is created, we internally create a nodegroup named foo_i (or just foo if i is missing), where i = foo.rebalanceCount, from the currently available nodes. We then set the nodegroup of foo to foo_i.
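
The naming rule above can be sketched as a small helper. This is only an illustration: the function name nodegroup_name is hypothetical, and a missing rebalanceCount is modelled here as Python's None.

```python
from typing import Optional


def nodegroup_name(dataset: str, rebalance_count: Optional[int]) -> str:
    """Return the internal nodegroup name for a dataset.

    Hypothetical helper: None stands in for a missing rebalanceCount field.
    """
    if rebalance_count is None:
        # Never rebalanced: the nodegroup carries the dataset name itself.
        return dataset
    # Rebalanced i times: the nodegroup is named <dataset>_<i>.
    return f"{dataset}_{rebalance_count}"
```

For example, nodegroup_name("foo", None) gives "foo", while nodegroup_name("foo", 2) gives "foo_2".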

  • Primary/secondary index file directory layout:
    • If the rebalanceCount of the dataset is missing, the index file directory layout is the same as before: index files are directly in the dataset's directory.
    • If the rebalanceCount of the dataset is larger than 0, index files are in a nested directory inside the dataset's directory, and that nested directory is named after the rebalanceCount value.
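
A minimal sketch of the layout rule, assuming a POSIX-style path and a hypothetical function name index_directory. The storage root used in the example is made up, and rejecting counts below 1 is an assumption of this sketch, since the text only describes the missing and larger-than-0 cases.

```python
from pathlib import PurePosixPath
from typing import Optional


def index_directory(dataset_dir: str, rebalance_count: Optional[int]) -> PurePosixPath:
    """Return the directory that holds a dataset's index files.

    Hypothetical helper: None stands in for a missing rebalanceCount field.
    """
    base = PurePosixPath(dataset_dir)
    if rebalance_count is None:
        # Legacy layout: index files sit directly in the dataset's directory.
        return base
    if rebalance_count < 1:
        # Assumption of this sketch: a present count is always larger than 0.
        raise ValueError("rebalanceCount must be missing or larger than 0")
    # Rebalanced layout: a nested directory named after the rebalanceCount.
    return base / str(rebalance_count)
```

With these assumptions, index_directory("storage/foo", None) is storage/foo and index_directory("storage/foo", 3) is storage/foo/3.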

  • For each shadow dataset foo, repeat the following process:
    1. Create a new node group foo_i (where i = 1 if foo.rebalanceCount is missing, otherwise foo.rebalanceCount + 1) that contains the currently available nodes. If a node group with that name already exists, name the new node group foo_<uuid> instead.

    2. Create an uncommitted dataset with the same name foo on node group foo_<i>, with the same rebalanceCount i. (In the following description, we call this dataset the "rebalance target" and the original dataset foo the "rebalance source".)

    3. Drop any leftover files for the rebalance target.

    4. Upsert all documents from the rebalance source to the rebalance target on all partitions.

    5. Check whether foo still exists. If foo does not exist in the metadata, drop the files of the rebalance target. Otherwise, update the metadata entity of dataset foo so that it switches to the rebalance target.

    6. Drop the files of the rebalance source and drop node group foo_<i-1>.


    • Steps 1 to 6 run in three metadata transactions:

      1. Steps 1-4: read lock on foo and read lock on node group foo.nodegroup.

      2. Step 5: write lock on foo and conditional read lock on node group foo.nodegroup.

      3. Step 6: read lock on foo and on node group foo_<i-1> (the same as foo.nodegroup in metadata transaction 1).
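
The bookkeeping in step 1 and the grouping of steps into metadata transactions can be sketched as follows. All names here (next_rebalance_count, target_nodegroup, TRANSACTIONS, transaction_for_step) are hypothetical, the set of existing node group names is passed in explicitly, and using a random UUID4 for the fallback name is an assumption, since the text only says foo_<uuid>.

```python
import uuid
from typing import Optional, Set, Tuple


def next_rebalance_count(current: Optional[int]) -> int:
    """Step 1: i is 1 when rebalanceCount is missing (None), else count + 1."""
    return 1 if current is None else current + 1


def target_nodegroup(dataset: str, current: Optional[int],
                     existing: Set[str]) -> Tuple[int, str]:
    """Pick the rebalance count and node group name for the rebalance target."""
    i = next_rebalance_count(current)
    name = f"{dataset}_{i}"
    if name in existing:
        # The preferred name is taken: fall back to <dataset>_<uuid>.
        name = f"{dataset}_{uuid.uuid4()}"
    return i, name


# Steps covered by each metadata transaction and the locks it takes,
# transcribed from the list above.
TRANSACTIONS = (
    ((1, 2, 3, 4), {"foo": "read", "foo.nodegroup": "read"}),
    ((5,), {"foo": "write", "foo.nodegroup": "conditional read"}),
    ((6,), {"foo": "read", "foo_<i-1>": "read"}),
)


def transaction_for_step(step: int) -> int:
    """Return the 1-based number of the metadata transaction that runs a step."""
    for number, (steps, _locks) in enumerate(TRANSACTIONS, start=1):
        if step in steps:
            return number
    raise ValueError(f"unknown step: {step}")
```

For a dataset that has never been rebalanced and has no name clash, target_nodegroup("foo", None, set()) yields (1, "foo_1"). Keeping the lock table as plain data makes it easy to see that only step 5 takes a write lock on foo.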


...