...

  • Metadata entity
    • Dataset:
      each dataset record has a field called "rebalanceCount". If the dataset has never been rebalanced, the field is Missing.
    • Nodegroup:
      when a dataset foo is created, we internally create a nodegroup named foo_i (or just foo if i=0), where i=foo.rebalanceCount, from the currently available nodes. Then, we let the nodegroup of foo be foo_i.
  • For each shadow dataset foo, repeat the following process:
    1. create a new node group foo_i (where i = foo.rebalanceCount + 1) that contains the currently available nodes; if the name foo_i is already taken, we name the new node group foo_<uuid> instead;

    2. create an uncommitted dataset foo on node group foo_i, with the same rebalanceCount;

    3. drop any leftover files for the uncommitted dataset foo;

    4. upsert all documents from the committed foo into the uncommitted foo (on node group foo_i) on all partitions;

    5. update the metadata entity for dataset foo so that the uncommitted foo becomes the committed foo in metadata;

    6. drop the files of the old foo (on node group foo_(i-1)) and drop node group foo_(i-1).
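
The six steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the actual implementation: the dictionaries `meta` and `files` and the functions `next_node_group_name` and `rebalance` are hypothetical names, a Missing rebalanceCount is modelled as `None` and treated as 0, and incrementing rebalanceCount at commit time (step 5) is an assumption inferred from the naming rule rather than something the steps state.

```python
import uuid


def next_node_group_name(dataset, rebalance_count, existing_names):
    """Step 1 naming rule: foo_i with i = rebalanceCount + 1, else foo_<uuid>."""
    i = (rebalance_count or 0) + 1  # assumption: Missing (None) counts as 0
    name = f"{dataset}_{i}"
    if name in existing_names:
        # The preferred name is already taken, so fall back to a unique suffix.
        name = f"{dataset}_{uuid.uuid4().hex}"
    return name


def rebalance(meta, files, dataset, available_nodes):
    """Toy walk-through of steps 1-6 on hypothetical in-memory structures.

    meta  = {"nodeGroups": {name: [nodes]}, "datasets": {name: record}}
    files = {(dataset, node_group): [documents]}
    """
    committed = meta["datasets"][dataset]
    old_group = committed["nodeGroup"]
    count = committed["rebalanceCount"]

    # 1. Create a new node group from the currently available nodes.
    new_group = next_node_group_name(dataset, count, meta["nodeGroups"])
    meta["nodeGroups"][new_group] = list(available_nodes)

    # 2. Create the uncommitted dataset record with the same rebalanceCount.
    uncommitted = {"nodeGroup": new_group, "rebalanceCount": count}

    # 3. Drop leftover files of a previous, failed attempt (if any).
    files.pop((dataset, new_group), None)

    # 4. Copy all documents from the committed to the uncommitted dataset.
    files[(dataset, new_group)] = list(files.get((dataset, old_group), []))

    # 5. Commit: the uncommitted record replaces the committed one.
    #    Assumption: rebalanceCount is incremented at this point.
    uncommitted["rebalanceCount"] = (count or 0) + 1
    meta["datasets"][dataset] = uncommitted

    # 6. Drop the old files and the old node group.
    files.pop((dataset, old_group), None)
    del meta["nodeGroups"][old_group]
    return new_group
```

In this sketch the metadata update in step 5 is the only point at which readers of `meta` switch from the old node group to the new one, so a failure before it leaves the committed dataset untouched, and step 3 cleans up on the next attempt.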

...