Overview

Deterministic Range Request Caching is directly related to partial object caching. We are taking a slightly different approach, building this as a per-service (remap) plugin instead of building it directly into the core. This is being done to mitigate risk and to speed up acceptance.

Interested Parties:

  • Jeff Bevill - Developer
  • Brian Olsen - Developer 
  • Ryan Durfey - Stakeholder

Feature Request

 https://github.com/apache/trafficserver/issues/2662

Create a new feature that allows the cache to support fixed deterministic ranges, for example 1 MB blocks of content stored as individual objects. If a client requests bytes 500,000-1,500,000, the cache would first check for existing cached objects; if they are not in cache it would round the request to the deterministic boundaries and request any specific ranges that this request overlaps. In this case it would request ranges 1-1,000,000 and 1,000,001-2,000,000 and use these to serve the client request. We have also found that it is inefficient to perform huge reads and writes to a single disk in cache, which is what happens when a single .mp4 for a UHD movie (up to 15 GB) is stored whole. We want this feature to break such large files into reasonable chunks, on the order of 10 MB, so they can be striped across different disks in the cache.
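
A rough sketch of the boundary rounding described above (the block size, struct, and function names here are hypothetical, and byte offsets are zero-based as in HTTP Range headers):

  #include <cstdint>
  #include <cstdio>
  #include <vector>

  // Hypothetical block size; the real value would be configurable per remap.
  constexpr int64_t BLOCK_SIZE = 1000000; // 1 MB blocks as in the example above

  struct Block {
    int64_t index; // block number, starting at 0
    int64_t start; // first byte of the block
    int64_t end;   // last byte of the block (inclusive)
  };

  // Round a client range (inclusive byte offsets) out to deterministic block
  // boundaries and return every block that the range overlaps.
  std::vector<Block> blocks_for_range(int64_t range_start, int64_t range_end) {
    std::vector<Block> blocks;
    int64_t first = range_start / BLOCK_SIZE;
    int64_t last  = range_end / BLOCK_SIZE;
    for (int64_t i = first; i <= last; ++i) {
      blocks.push_back({i, i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE - 1});
    }
    return blocks;
  }

  int main() {
    // The example from the feature request: bytes 500,000-1,500,000
    for (const Block &b : blocks_for_range(500000, 1500000)) {
      std::printf("fetch block %lld: bytes %lld-%lld\n",
                  (long long)b.index, (long long)b.start, (long long)b.end);
    }
  }

With zero-based offsets the two blocks come out as bytes 0-999,999 and 1,000,000-1,999,999, the same two 1 MB slices described in the example above.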

 

NGINX Feature Comparison - Range Slicing

https://www.nginx.com/blog/smart-efficient-byte-range-caching-nginx/

Issues with Current Approaches to Range Request Handling

  1. Background Fetch 
    1. For very large files (>5 GB) this can be problematic, blocking subsequent range requests during the fill. These can be unblocked, but that risks thundering-herd issues at the origin.
    2. It can lead to caching very large files when not needed, such as 100 GB software patches.
    3. It can lead to caching massive files on a single disk. If these files are popular, the resulting long reads/writes can cause disk-usage problems.
  2. Cache Ranges as Individual Files
    1. This can be an issue if clients send overlapping ranges, which leads to caching the same content many times and filling up the cache.

Basic Logic

  1. Feature configurable (off/on) on a per-service (remap) basis
  2. The cache should first check for existing cached objects or ranges that can serve the request
  3. If any deterministic ranges that cover the client request are missing, the cache would fill those ranges as individually cached objects and use them to fulfill the client request (see the sketch after this list)
  4. Force all files over a certain size to be broken down into range files, and make this size configurable. For example, all files over 100 MB would be broken down into ranges.
  5. Allow small files under the configured size to be stored whole and filled through background fetch, so as not to waste resources breaking small files into ranges. This may itself be optional if it doesn't save resources.
  6. Because we are making choices about request and storage size based on file size, we may need to do a preliminary HEAD request for non-cached files to get the file size, unless this can be determined on the fly after the headers are returned for the first request
  7. Blocking: Requests for a full file or a large range that must fill from the origin should not block requests for individual ranges occurring around the same time. Due to bandwidth constraints, a 15 GB file may take several minutes or more to load into cache. We want other users to be able to request ranges further into a file while other users request the full file. Ideally the client requesting the full file or large range would make sequential requests over time and check, each time it makes a range request, whether any sub-ranges have already been filled. This would allow us to avoid thundering-herd scenarios while preventing blocking due to large requests.
  8. If a client requests a full file or a massive range and then aborts the request part way through the download, the cache should stop filling the rest of the asset for efficiency. This may affect options for cache warming, so we might want this behavior to be configurable on/off on a per-service basis.
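
A minimal sketch of the fill decision described in items 1-6, assuming hypothetical configuration names and a stub/manifest that records which block indices are already cached; this illustrates the intended logic, not the plugin's actual interface:

  #include <algorithm>
  #include <cstdint>
  #include <set>
  #include <vector>

  // Hypothetical per-remap settings; names and defaults are illustrative only.
  struct SliceConfig {
    bool    enabled         = true;        // item 1: feature on/off per remap
    int64_t block_size      = 10LL << 20;  // 10 MB deterministic blocks
    int64_t slice_threshold = 100LL << 20; // item 4: only slice files over 100 MB
  };

  // Decide how to satisfy a client range given the object's total size and the
  // set of block indices already in cache (e.g. read from a stub/manifest).
  // Returns the block indices that still need to be filled from the origin;
  // an empty result means the sliced request can be served entirely from cache.
  std::vector<int64_t> blocks_to_fill(const SliceConfig &cfg, int64_t content_length,
                                      int64_t range_start, int64_t range_end,
                                      const std::set<int64_t> &cached_blocks) {
    std::vector<int64_t> missing;
    if (!cfg.enabled || content_length < cfg.slice_threshold) {
      return missing; // item 5: small objects are stored whole, not sliced
    }
    range_end = std::min(range_end, content_length - 1); // clamp open-ended ranges
    int64_t first = range_start / cfg.block_size;
    int64_t last  = range_end / cfg.block_size;
    for (int64_t i = first; i <= last; ++i) {
      if (cached_blocks.count(i) == 0) {
        missing.push_back(i); // item 3: fill only the blocks we do not have
      }
    }
    return missing;
  }

Note that this sketch assumes the content length is already known, which is exactly why item 6 raises the question of a preliminary HEAD request for non-cached files.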

Use Case Testing

Full Object Tests

  • Non-range request with cache-hit of full asset
  • Non-range request with cache-miss with range blocks present (incomplete)
  • Non-range request with cache-miss for unknown object

Range Tests

  • Range request cache-hit from full asset
  • Range request cache-hit from cached blocks only [one, two or all]
  • Range request cache-miss for a mix of cached and un-cached blocks
    • Ensure data cached after request
  • Range request cache-miss for only un-cached blocks [including first or last block] 
    • Ensure data cached after request
  • Range request cache-miss for unknown object
    • Ensure data cached after request

Stub File Tests

  • Any read failure of the stub file ensures that valid blocks are not lost when the stub file is re-created
  • Any read failure of a data block does not prevent use of other data blocks that read without failure
  • The age of the stub file reflects the last time all cached blocks were confirmed to match the origin's range request data
  • Any range response from the origin that does not match the cached blocks invalidates all blocks from the previous version (see the sketch after this list)
    • Ensure any match includes matching of all alternate variables (Encoding, Language, etc.)
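
One way to picture the stub file these tests exercise is as a small manifest carrying the origin validators and a block map for the current version of the object. The fields below are hypothetical and not an actual on-disk format:

  #include <cstdint>
  #include <string>
  #include <vector>

  // Hypothetical stub-file metadata; illustrates what the tests above exercise,
  // not the actual on-disk layout.
  struct StubFile {
    std::string url;              // cache key of the full object
    std::string etag;             // origin validators: a mismatch on any range
    std::string last_modified;    // response must invalidate all cached blocks
    std::string content_encoding; // alternate selector (Encoding/Language, etc.)
    int64_t     content_length = 0;
    int64_t     block_size     = 0;
    int64_t     confirmed_at   = 0;  // last time cached blocks matched the origin
    std::vector<bool> block_present; // which deterministic blocks are cached
  };

  // A range response that disagrees with the stub's validators means the origin
  // object changed, so every block from the previous version must be dropped.
  bool response_matches(const StubFile &stub, const std::string &etag,
                        const std::string &last_modified) {
    return stub.etag == etag && stub.last_modified == last_modified;
  }

Under a model like this, a validator mismatch is what triggers the "invalidate all blocks from the previous version" behavior tested above.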

Invalidation / Purge

  • Any invalidation of a file attempts to free as many blocks as possible from the older version
    • Ensure that blocks left over after invalidation cannot be re-used with a mismatched version
  • Purge of a file results in purge of the stub file and all blocks

Encoding

  • Requests for objects with encoding variances don't affect caching
    • This one is tricky since our version of ATS only allows one version of a file to be cached (i.e., it doesn't count encoding as part of the cache key when considering file uniqueness).
    • We need to account for our version as well as ATS versions where the encoding is considered part of the cache key (see the sketch after this list).
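
A minimal sketch of how the block cache key might fold the encoding alternate into the key on versions that don't do this natively, so gzip and identity variants of the same block can't collide; the key format is purely illustrative:

  #include <cstdint>
  #include <string>

  // Hypothetical block cache key: on ATS versions where encoding is not part of
  // the native cache key, the plugin would have to fold the alternate into the
  // key itself so variants of the same block do not overwrite each other.
  std::string block_cache_key(const std::string &url, const std::string &content_encoding,
                              int64_t block_index) {
    return url + "#" + (content_encoding.empty() ? "identity" : content_encoding) +
           "#block=" + std::to_string(block_index);
  }

  // e.g. block_cache_key("http://example.com/movie.mp4", "gzip", 3)
  //      -> "http://example.com/movie.mp4#gzip#block=3"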

On / Off Testing

  • Partial object caching is turned on for a service after some full objects are already in cache; test requesting ranges.
  • Partial object caching is turned off for a service while some partial objects are already in cache; do we lose those objects from cache?

Via Header Codes

  • Generate new Via header codes for partial object caching

Testing for ATS Version

  • This was originally developed for ATS 6.2; we need to test on ATS 7 since we are transitioning.

Failed / Failing Disks

  • Test on a server with failed/failing disks to make sure the feature behaves as expected on write failures for both stub and fragment objects

UML Diagrams

 

Block Cache Initialization

block-cache-init.svg

Block Cache Store

block-cache-store.svg

Block Cache Read

block-cache-read.svg
