...

  • A PR comprises a number of distributed bucket regions with zero or more secondary (redundant) copies; clear has to be performed on both primary and secondary copies, keeping the data consistent between them.
  • During the operation, a bucket region could be moved by a rebalance operation or by members joining/leaving the cluster, or its state could change (primary/secondary).
  • Updating/clearing OQL indexes (which are maintained both synchronously and asynchronously).
  • Clearing persistent/overflowed data and managing data consistency across primary and secondary copies and disk stores, which could be offline.
  • Handling clear during region initialization, from both the initial image provider's and the requester's points of view.
  • Handling concurrent clear and cache operations (put/destroy), in synchronization with primary and secondary copies.
  • Notifying clients subscribed to the PR on a clear event, and keeping them consistent with server-side data.
  • Managing transactional operations with and during clear.
  • Updating/clearing Lucene indexes, which are asynchronously maintained via AEQ. We rely on GEODE-9133 to allow a RegionEvent to be sent to the gateway sender; until then, PR clear will throw UnsupportedOperationException.

Anti-Goals

  • Sending a clear event for WAN replication. As this has implications for the entire region's data on the remote cluster, it is kept out of the scope of this work, in line with WAN GII and region-destroy operations.

...

After a successful clear() operation on the PR:

  • Send clear event to recreate Lucene index at each bucket.
  • Update OQL indexes
  • Notify clients at PR level (with RVV?)
  • Return to caller
  • Before GEODE-9133 is implemented, clear on a PR with a Lucene index will throw UnsupportedOperationException.


Scenarios/options considered in supporting clear operation:

...

  • The member, after receiving the clear request: 

    Acquires the distributed lock and elects itself as the coordinator member. This prevents multiple clear() operations from executing concurrently.

    Gets the primary bucket list and sends the clear message to the primary buckets (members).

    The primary buckets, upon receiving (processing) the message, take a distributed lock to prevent losing primary bucket status, then take RVV.lockForClear (a local write lock that blocks on-going operations, GII, and transactions). 

    Upon completion of clear on the primary bucket, the clear message is sent to the secondary buckets (members). 

    When a secondary bucket receives the RVV, it waits/checks for its local RVV to dominate the received RVV, which ensures that concurrent cache operations are applied or rejected accordingly.

    NOTE:

    Cache operations are synchronized under the RVV lock. For a non-off-heap region, clear calls map.clear(), which should have minimal or no impact on performance. For an off-heap region, map.clear() iterates over the entries to free the off-heap entries; this could impact cache operation performance and will be documented. In the future, iterating over region entries could be done in a background thread.
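    The secondary-side dominance check described above can be sketched as follows. This is an illustrative stand-in: SimpleRVV and its methods are hypothetical names, not Geode's actual RegionVersionVector API.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch of the check a secondary bucket performs before
    // applying a clear: apply only once the local RVV dominates the RVV
    // received from the primary.
    public class SimpleRVV {
        // memberId -> highest version seen from that member
        private final Map<String, Long> versions = new HashMap<>();

        public void record(String memberId, long version) {
            versions.merge(memberId, version, Math::max);
        }

        // The local RVV dominates the received one when it has seen at least
        // as high a version from every member the received RVV knows about.
        public boolean dominates(SimpleRVV received) {
            for (Map.Entry<String, Long> e : received.versions.entrySet()) {
                if (versions.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                    return false;
                }
            }
            return true;
        }
    }
    ```

    A secondary that does not yet dominate the received RVV would wait for the outstanding operations to arrive (or be rejected) before clearing.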

  • Handling transactions (TX) during clear

    As clear and transactions are both handled at the bucket region level, they coordinate by taking the rvvLock.

    If the TX gets the lock first, the clear thread will wait until the TX finishes and releases the rvvLock. 

    If clear gets the rvvLock first, the TX will fail and roll back.
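    The lock ordering above can be sketched with a standard ReentrantReadWriteLock. Class and method names here are hypothetical, not Geode's actual TX or clear code: a TX holds the read lock for its duration, clear takes the write lock, and a TX that cannot get the read lock fails and rolls back rather than blocking.

    ```java
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Minimal sketch (assumed names) of rvvLock ordering between a
    // transaction and clear on one bucket.
    public class RvvLockSketch {
        private final ReentrantReadWriteLock rvvLock = new ReentrantReadWriteLock();

        // TX commit path: try the read lock; if clear already holds the
        // write lock, return false so the caller rolls the TX back.
        public boolean tryCommitTx(Runnable commit) {
            if (!rvvLock.readLock().tryLock()) {
                return false; // clear in progress -> TX fails and rolls back
            }
            try {
                commit.run();
                return true;
            } finally {
                rvvLock.readLock().unlock();
            }
        }

        // Clear path: the write lock waits for in-flight TXs to finish,
        // then blocks new ones until the bucket is cleared.
        public void clearBucket(Runnable clear) {
            rvvLock.writeLock().lock();
            try {
                clear.run();
            } finally {
                rvvLock.writeLock().unlock();
            }
        }
    }
    ```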

  • Updating OQL indexes

    The indexes are maintained both synchronously and asynchronously. Clear will update both synchronous and asynchronous indexes under lock, by clearing both the index data structures and the queues used for asynchronous maintenance.
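    A minimal sketch of the index-clearing step above: both the synchronous index's data structure and the queue feeding the asynchronous index are emptied under the same lock, so no stale queued operation is applied after the clear. The names (IndexSketch, asyncOps) are illustrative placeholders, not Geode's IndexManager API.

    ```java
    import java.util.Map;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedQueue;

    public class IndexSketch {
        private final Map<Object, Object> syncIndex = new ConcurrentHashMap<>();
        private final Queue<Object> asyncOps = new ConcurrentLinkedQueue<>();

        public void put(Object key, Object value) {
            syncIndex.put(key, value);   // maintained synchronously
            asyncOps.add(key);           // queued for asynchronous maintenance
        }

        // Called while clear holds the bucket's lock: drop the indexed data
        // and discard the queued async operations for the cleared entries.
        public synchronized void clearAll() {
            syncIndex.clear();
            asyncOps.clear();
        }

        public int indexedEntries() { return syncIndex.size(); }
        public int pendingAsyncOps() { return asyncOps.size(); }
    }
    ```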

  • Disk region: the clear version tag is written to the oplog for each bucket region.
  • CacheListener.afterRegionClear(), CacheWriter.beforeRegionClear() will be invoked at PR level.
  • Update to client queues, i.e., notify clients

    Subscription clients will be notified with a clear region event (with version info?) at the PR level. 

  • Off-heap entries: reuse the current non-partitioned region clear logic to release off-heap entries in an iteration.
  • GII: reuse the current non-partitioned region clear logic, in which GII competes with clear for the rvvLock.
  • Region Destroy  

    Clear will throw RegionDestroyedException at the bucket region level. The coordinator should collect the exception and rethrow it to the caller. 

  • LRU: Clear will remove all the expiration tasks at bucket.
  • PartitionOfflineException: if some buckets' clear fails with PartitionOfflineException, PR.clear should return PartialResultException to the caller, letting the user decide whether to call clear again.
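  • A minimal sketch of the result aggregation described above, assuming a hypothetical PartialClearException in place of Geode's PartialResultException: the coordinator collects per-bucket outcomes and, if any bucket failed, reports which ones so the caller can decide whether to retry.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class ClearResultAggregator {
        // Illustrative placeholder for the partial-result exception the
        // coordinator would surface to the caller.
        public static class PartialClearException extends RuntimeException {
            public final List<Integer> failedBuckets;
            PartialClearException(List<Integer> failed) {
                super("clear failed on buckets " + failed);
                this.failedBuckets = failed;
            }
        }

        // successByBucket[i] == false means bucket i reported a failure
        // (e.g. PartitionOfflineException) during clear.
        public static void checkResults(boolean[] successByBucket) {
            List<Integer> failed = new ArrayList<>();
            for (int i = 0; i < successByBucket.length; i++) {
                if (!successByBucket[i]) failed.add(i);
            }
            if (!failed.isEmpty()) {
                throw new PartialClearException(failed);
            }
        }
    }
    ```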
  • Updating Lucene Index

    As part of clear(), the Lucene indexes will need to be recreated. Any events in the AEQ prior to clear (for the cleared region entries) will be rejected/handled via a RegionEvent passing through the AEQ (this logic exists in the product). Until GEODE-9133 supports a RegionEvent on the AEQ, we will throw UnsupportedOperationException for the time being.

Test Cases to consider

  1. TX in PR
    1. Test TX with one or more primary buckets in the same or different member, when clear is on-going
  2. Retry at Accessor (a member is killed, need to find new primary target)
  3. Retry at client (coordinator is killed)
  4. PR ends with PartialResult, due to PartitionOfflineException
  5. When PR.clear is on-going, restart some members.
  6. When PR.clear is on-going, rebalance to move primary.
  7. Benchmark test to measure how long the local clear takes vs. distribution when off-heap is configured.
  8. Region destroy (bucket or the whole PR) when clear is on-going
  9. When PR.clear is on-going, try to create OQL index on value/key, or try to create Lucene Index, or try to alter region

...

Based on our new understanding, to satisfy the goal of “Notifying clients subscribed to the PR on clear events, and keeping them consistent with server side data”, we plan to adopt “Solution 1”, detailed in the following section. 

...

  • Cache Writer invocation (before cache op) - On any one node hosting the PR. 
  • Cache Listener invocation (after cache op) - On all the nodes hosting the PR and with listeners.
  • Client event delivery (notify clients) - Add a clear event to the clients' HA region queues (both primary and redundant). With the region destroy operation, the events are created and added into the client's queue (if present) on all the nodes hosting the PR.
  • Lucene index update - No notification sent via AEQ. Instead, the clear operation will recreate the Lucene index for all the buckets. 
  • AEQ - Clear events will NOT be added to primary and secondary AEQs. 
  • WAN - No clear event sent across the WAN gateway.

...