
...

A PR comprises a number of distributed bucket regions, each with zero or more secondary (redundant) copies; clear has to be performed on both primary and secondary copies, keeping the data consistent between them.
During the operation, a bucket region could be moved by a rebalance operation or by members joining/leaving the cluster, or its state could change (primary/secondary).
Updating/clearing OQL indexes (which are synchronously and asynchronously maintained).
Updating/clearing Lucene indexes which are asynchronously maintained via AEQ.
Clearing persistent/overflowed data and managing data consistency across primary and secondary copies and disk stores which could be offline.
Handling clear during the region initialization operation, from the point of view of both the initial image provider and the requester.
Handling concurrent clear and cache operations (put/destroy) - in synchronization with primary and secondary copies.
Notifying clients subscribed to the PR of the clear event, and keeping them consistent with server-side data.
Managing Transactional operations with and during clear.

...

The co-ordinator member takes a distributed lock at PR level, making sure only one clear() operation on that region is in progress.
Send the clear message to the other members hosting the PR primary buckets. Re-send/retry the clear message if the primary bucket region is moved. If the primary bucket region is not moved, lock it from moving.
At the primary bucket, do a local clear and distribute it to the secondary buckets with region version (RVV) info.
The RVV info is used to allow or reject concurrent cache operations (operations in flight during clear), thus keeping primary and secondary bucket data consistent.
Persist the clear event (with version info) for persistent regions.
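The re-send/retry step above can be sketched as a small loop. This is a minimal illustration only: `ClearRetrySketch`, `ClearSender`, `primaryLookup`, and `maxAttempts` are hypothetical names invented for this sketch and are not part of the product API; it assumes the receiving member can report whether it still hosts the primary bucket.

```java
import java.util.function.IntFunction;

final class ClearRetrySketch {
    /** Hypothetical transport: returns true if the member still hosts the primary and cleared it. */
    interface ClearSender {
        boolean sendClear(String member, int bucketId);
    }

    /**
     * Sends clear to the current primary of a bucket. If the bucket has moved,
     * the primary is looked up again and the message is re-sent.
     * Returns the number of attempts used, or -1 if every attempt failed.
     */
    static int clearWithRetry(int bucketId, IntFunction<String> primaryLookup,
                              ClearSender sender, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Re-resolve the primary on every attempt; it may have moved.
            String primary = primaryLookup.apply(bucketId);
            if (sender.sendClear(primary, bucketId)) {
                return attempt;
            }
        }
        return -1;
    }
}
```

A bounded attempt count is an assumption of this sketch; the proposal itself does not say how many retries are made.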

...

The member after receiving the clear request:
Acquires the distributed lock and elects itself as the co-ordinator member. This prevents multiple clear() operations from executing concurrently.
Gets the primary bucket list and sends the clear message to the primary buckets (members).
Each primary bucket, upon receiving (processing) the message, takes a distributed lock to prevent losing primary status. It then takes RVV.lockForClear (a local write lock that blocks ongoing operations, GII, and transactions).
Upon completion of clear on the primary bucket, it sends the clear message to the secondary buckets (members).
When a secondary bucket receives the RVV, it waits until its local RVV dominates the received RVV, which ensures that concurrent cache operations are applied or rejected accordingly.
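The dominance check in the last step can be illustrated with a deliberately simplified version vector. This is a sketch under stated assumptions, not the product's RVV implementation: `SimpleVersionVector` is a hypothetical class that tracks only the highest version seen per member and ignores the gaps (exceptions) a real region version vector has to record.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for a region version vector: member id -> highest version seen. */
final class SimpleVersionVector {
    private final Map<String, Long> versions = new HashMap<>();

    /** Records that an operation with this version from this member has been applied. */
    SimpleVersionVector record(String memberId, long version) {
        versions.merge(memberId, version, Long::max);
        return this;
    }

    /** True if this vector has seen at least everything the other vector has seen. */
    boolean dominates(SimpleVersionVector other) {
        for (Map.Entry<String, Long> e : other.versions.entrySet()) {
            long local = versions.getOrDefault(e.getKey(), 0L);
            if (local < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

In this model, a secondary would hold off applying the clear while `local.dominates(received)` is false, because some operation the primary had already seen has not yet arrived locally.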
NOTE:
The cache operations are synchronized under the RVV lock. For a non-off-heap region, clear calls map.clear(), which should have minimal or no impact on performance. For an off-heap region, map.clear() iterates over the entries to release the off-heap entries; this could impact cache operation performance and will be documented. In the future, iterating over the region entries could be done in a background thread.
Handling Transaction (TX) during clear
Because clear and transactions are both handled at the bucket region level, they coordinate by taking the rvvLock.
If TX gets lock first, clear thread will wait until TX finishes and releases the rvvLock.
If clear gets the rvvLock first, TX will fail and rollback.
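The two orderings above can be sketched with a standard read/write lock. This is an illustration only: `BucketLockSketch` is a hypothetical class, and using `java.util.concurrent.locks.StampedLock` is a choice made for this sketch, not a statement about how the product implements rvvLock. Clear takes the write lock and blocks until in-flight transactions release it; a transaction only tries for the lock and reports failure (so the caller can roll back) when clear already holds it.

```java
import java.util.concurrent.locks.StampedLock;

final class BucketLockSketch {
    private final StampedLock rvvLock = new StampedLock();

    /** Clear side: blocks until no transaction holds the lock, then returns a stamp. */
    long beginClear() {
        return rvvLock.writeLock();
    }

    /** Clear side: releases the lock taken by beginClear(). */
    void endClear(long stamp) {
        rvvLock.unlockWrite(stamp);
    }

    /**
     * Transaction side: runs the commit work if the lock is available.
     * Returns false without running it if clear holds the lock, so the caller can roll back.
     */
    boolean runTransaction(Runnable commitWork) {
        long stamp = rvvLock.tryReadLock();
        if (stamp == 0L) {
            return false; // clear got the lock first
        }
        try {
            commitWork.run();
            return true;
        } finally {
            rvvLock.unlockRead(stamp);
        }
    }
}
```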
Updating OQL Index
The indexes are maintained both synchronously and asynchronously. The clear will update both synchronous and asynchronous indexes under lock, by clearing both the index data structures and the queues used for asynchronous maintenance.
Disk Region: The clear version tag is written to the oplog for each bucket region.
CacheListener.afterRegionClear(), CacheWriter.beforeRegionClear() will be invoked at PR level.
Update to client queue i.e. notify client
Subscribed clients will be notified with a clear region event (with version info?) at PR level.
Off-heap entries: Reuse the existing non-partitioned region clear logic to release off-heap entries by iterating over them.
GII: Reuse the existing non-partitioned region clear logic, in which GII competes with clear for the rvvLock.
Region Destroy
The clear will throw RegionDestroyedException at the bucket region level. The coordinator should collect the exception and throw it to the caller.
LRU: Clear will remove all the expiration tasks at the bucket level.
PartitionOfflineException: If clear fails on some buckets with PartitionOfflineException, PR.clear should return PartialResultException to the caller and let the user decide whether to call clear again.
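The partial-failure handling above can be sketched as a coordinator that records per-bucket outcomes instead of stopping at the first failure. All names here (`ClearOutcome`, `ClearCoordinatorSketch`, `BucketClear`) are hypothetical and exist only for this illustration; the sketch catches a generic `Exception` in place of the product's exception types and leaves the decision about what to throw to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Per-bucket results of one clear attempt. */
final class ClearOutcome {
    final List<Integer> cleared = new ArrayList<>();
    final List<Integer> failed = new ArrayList<>();

    /** True if at least one bucket could not be cleared. */
    boolean isPartial() {
        return !failed.isEmpty();
    }
}

final class ClearCoordinatorSketch {
    /** Hypothetical per-bucket clear action; throws if the bucket cannot be cleared. */
    interface BucketClear {
        void clear(int bucketId) throws Exception;
    }

    /** Attempts clear on every bucket and records which ones succeeded or failed. */
    static ClearOutcome clearAll(List<Integer> bucketIds, BucketClear action) {
        ClearOutcome outcome = new ClearOutcome();
        for (int id : bucketIds) {
            try {
                action.clear(id);
                outcome.cleared.add(id);
            } catch (Exception e) {
                // Keep going so the caller learns the full set of failed buckets.
                outcome.failed.add(id);
            }
        }
        return outcome;
    }
}
```

A caller could inspect `isPartial()` and the `failed` list to decide whether to report a partial result or call clear again.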
Updating Lucene Index
As part of clear(), the Lucene indexes will be recreated. Any events which are in the AEQ prior to clear (for the cleared region entries) will be rejected/handled (this logic exists in the product).

...

