Background

Recent GC changes aim to reduce processing time and lower the risk of accidentally deleting files that are still required.

GC Metadata Validation: https://github.com/apache/accumulo/pull/3755
Ensures that GC returns all information from a row when processing gcCandidates for deletion. 

Delete Referenced Candidates: https://github.com/apache/accumulo/issues/3693
Removes gcCandidates from the metadata that match current tablet file references.

This work has drawn attention to the GC and how it performs its functions.
Reference issue: https://github.com/apache/accumulo/issues/608

Possible Improvements

Migrate gcCandidate creation into tablet section 

Add preliminary gcCandidates to the tablet section so that a single mutation can move references to a "del-candidate" section of the same tablet.
This ensures that a single mutation and WAL entry would contain both the reference modification and the creation of the "del-candidate".

Under this proposal, references in a tablet migrate between three categories: file ref, scan ref, and del candidate.
The expectation is that in a single mutation, a file reference would move to either a scan ref or a del candidate. 
Likewise, a scan ref would move to a del candidate. 
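
The transitions described above can be sketched as a small in-memory model. This is only an illustration of the idea, not Accumulo code: the class `TabletRefs`, the `Section` enum, and the `move` method are hypothetical names, and the synchronized method merely stands in for the atomicity a single row mutation would provide.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of one tablet's reference sections (not Accumulo's API).
final class TabletRefs {
  enum Section { FILE_REF, SCAN_REF, DEL_CANDIDATE }

  private final Map<Section, Set<String>> sections = new EnumMap<>(Section.class);

  TabletRefs() {
    for (Section s : Section.values()) {
      sections.put(s, new HashSet<>());
    }
  }

  // Registers a new file reference for this tablet.
  synchronized void addFile(String path) {
    sections.get(Section.FILE_REF).add(path);
  }

  synchronized boolean contains(Section section, String path) {
    return sections.get(section).contains(path);
  }

  // Allowed moves: file ref -> scan ref, file ref -> del candidate, scan ref -> del candidate.
  static boolean allowed(Section from, Section to) {
    return (from == Section.FILE_REF && to != Section.FILE_REF)
        || (from == Section.SCAN_REF && to == Section.DEL_CANDIDATE);
  }

  // Stands in for a single row mutation: the removal from one section and the
  // addition to the other are applied together or not at all.
  synchronized void move(String path, Section from, Section to) {
    if (!allowed(from, to)) {
      throw new IllegalArgumentException("Transition not allowed: " + from + " -> " + to);
    }
    if (!sections.get(from).contains(path)) {
      throw new IllegalStateException(path + " is not in section " + from);
    }
    sections.get(from).remove(path);
    sections.get(to).add(path);
  }
}
```

Because the removal and the addition happen in one step, a reference never leaves the file or scan section without a matching del candidate being recorded.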

Deletion candidates would be checked across tablets' file and scan ref sections and removed if "InUse".
This would ensure that candidates are always created when a tablet's file reference is modified. 
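
A minimal sketch of the "InUse" check follows, assuming each tablet exposes its file refs, scan refs, and del candidates as sets of paths. The names `CandidateCheck`, `TabletView`, and `deletable` are hypothetical, and the sketch holds everything in memory, whereas the real GC would read this information from the metadata table.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper illustrating the in-use check; not part of Accumulo.
final class CandidateCheck {

  // Read-only view of the three sections of one tablet.
  static final class TabletView {
    final Set<String> fileRefs;
    final Set<String> scanRefs;
    final Set<String> delCandidates;

    TabletView(Set<String> fileRefs, Set<String> scanRefs, Set<String> delCandidates) {
      this.fileRefs = fileRefs;
      this.scanRefs = scanRefs;
      this.delCandidates = delCandidates;
    }
  }

  // Returns the candidates that no tablet still references through a file or scan ref.
  static Set<String> deletable(Collection<TabletView> tablets) {
    Set<String> inUse = new HashSet<>();
    Set<String> candidates = new TreeSet<>();
    for (TabletView tablet : tablets) {
      inUse.addAll(tablet.fileRefs);
      inUse.addAll(tablet.scanRefs);
      candidates.addAll(tablet.delCandidates);
    }
    // A candidate referenced by any tablet is "InUse" and must not be deleted.
    candidates.removeAll(inUse);
    return candidates;
  }
}
```

The check spans all tablets rather than only the tablet that holds the candidate, since another tablet may still hold a file or scan ref to the same path.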

Whether the ~del metadata table section can be removed is currently unknown, as deletion candidates for non-tablet information may still need to be created.

Results of meeting on 10/4/2023

  1. Will test out the second addition of the gc candidates after the tablet mutation around ManagerMetadataUtils line 206
  2. Will do some scans to see what other files may be left around where this feature is currently enabled
    1. Also try to determine, if possible, why files have historically been left around
  3. Create a prototype that adds in-tablet delete entries that can be written at the same time the major compaction mutation is made.