...

Presuming the range information would be stored in the root segment, memory bounding may not be necessary. This data is rewritten frequently in normal operation, such as for a refresh or the addition of an alternate. If we are willing to do a rewrite for a range request then we can have much weaker space bounds on the memory consumed by the range tracking. This would require a read from disk on a cache hit, but that is commonly done now. The key difference would be in how a cache "miss" is handled, although the mechanism used for an alternate miss may work here as well.

Plugin Implementation

An alternative to modifying the cache is to implement a partial object caching plugin on top of the existing cache. The fundamental idea here is that the plugin fetches fixed-size "chunks" from the origin and uses those to compose the range requested by the client. For each client request, the plugin

  • determines the set of chunks that overlap the range
  • issues a new set of origin requests (in parallel) to fetch the corresponding chunks
  • rewrites the cache key for each chunk so they are all stored independently
  • creates a transformed client response by composing the correct byte ranges from the chunks

This would be a fairly complex plugin, but it has the advantage of not changing the cache format and so minimizes the risk of breaking other features.
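
As a rough sketch of the chunk arithmetic such a plugin would need (the ``ChunkSlice`` type, half-open ranges, and a fixed ``chunk_size`` are assumptions of this illustration, not an existing plugin API)::

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  // One origin fetch: which fixed-size chunk to request and which bytes of
  // that chunk belong in the composed client response.
  struct ChunkSlice {
    int64_t chunk_index; // index of the chunk, used to derive its rewritten cache key
    int64_t chunk_begin; // first byte of the chunk in the original object
    int64_t copy_begin;  // offset within the chunk where the client range starts
    int64_t copy_end;    // offset within the chunk where the client range ends (exclusive)
  };

  // Chunks that overlap the client range [range_begin, range_end) and the
  // slice of each chunk needed to compose the client response.
  std::vector<ChunkSlice>
  chunks_for_range(int64_t range_begin, int64_t range_end, int64_t chunk_size)
  {
    std::vector<ChunkSlice> slices;
    for (int64_t idx = range_begin / chunk_size; idx * chunk_size < range_end; ++idx) {
      int64_t chunk_begin = idx * chunk_size;
      slices.push_back({idx, chunk_begin,
                        std::max<int64_t>(0, range_begin - chunk_begin),
                        std::min<int64_t>(chunk_size, range_end - chunk_begin)});
    }
    return slices;
  }

Each slice would then map to a rewritten cache key (for example, the original key plus the chunk index) and, on a miss, an origin request for the bytes of that chunk.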

 

Summit Proposed Implementation

 

Partial Object Caching

Partial object caching means caching the responses to range requests for the same object as parts of a single cached object. It is now common for large objects that should be cached to never be retrieved in full, only one range at a time.

This design should enable caching range requests as part of a single object with only minimal changes to the cache internals.

We want this design to have a few essential properties.

* Data requests to the origin server must not be much larger than the original client request.
* Handle range requests for an object in any order with any overlap.
* Serve range requests from cache if the requested range is already in the cache.
* Minimal additional overhead of data and computation.

Normally a large object is stored as an ordered list of fragments, each of roughly the same size. For a partial object we fix this size for a particular object. Different objects may have different fragment sizes, but we require all fragments of a specific partial object to be identical in size, except possibly for the last fragment. This is normally the case for all multi-fragment objects, although it is not an explicit requirement. The only change here is that we will need to store the fixed fragment size in the metadata of the first fragment, per alternate, for a partial object.
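
As an illustration only (these are not the current cache structures), the new per-alternate metadata amounts to little more than the fixed fragment size::

  #include <cstdint>

  // Hypothetical sketch of the per-alternate first fragment metadata for a
  // partial object. Only fixed_fragment_size (and, later, a flag for the empty
  // earliest fragment) is new; the rest stands in for existing fields.
  struct PartialObjectMeta {
    int64_t fixed_fragment_size; // size used for every fragment of this alternate
    int64_t total_object_size;   // full object size, when known (e.g. from Content-Range)
    bool    earliest_is_empty;   // set if the earliest fragment was written empty (see below)
    // ... existing alternate metadata: response headers, fragment offset table, ...
  };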

Given that fixed fragment size we then (conceptually) divide the object into fragments of that size. We say such a fragment is *touched* by a range request if any byte of that request is in the fragment. Because we fix the fragment size we can skip untouched fragments without losing the ability to insert those fragments later.

..sidebar:
Because fragments are computationally chained it is not possible to insert a fragment. The ordering of fragment cache ids is fixed by the cache id of the earliest fragment.
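
A minimal sketch of the *touched* computation, assuming half-open client ranges and the fixed fragment size from the first fragment's metadata (names are illustrative)::

  #include <cstdint>
  #include <utility>

  // Indices of the first and last (inclusive) fragments touched by the client
  // range [range_begin, range_end). A fragment is touched if at least one
  // byte of the range falls inside it.
  std::pair<int64_t, int64_t>
  touched_fragments(int64_t range_begin, int64_t range_end, int64_t fragment_size)
  {
    return {range_begin / fragment_size, (range_end - 1) / fragment_size};
  }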

Range requests are processed by adjusting the size seen by the origin server to always be an integral multiple of the fragment size, with an offset that is also an integral multiple of the fragment size. This can mean expanding or shrinking the request, depending on how it intersects already cached fragments. The altered size is chosen to be the minimum size that covers all of the untouched fragments while satisfying the size and offset requirements. If this yields a size that covers the entire object then the object will not be partially cached but will be retrieved in full.

The client always receives the originally requested range.
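
A sketch of that adjustment, ignoring the trimming of already cached fragments at either end for brevity (``ByteRange``, half-open ranges, and a known ``object_size`` are assumptions of this illustration)::

  #include <algorithm>
  #include <cstdint>

  struct ByteRange {
    int64_t begin; // inclusive
    int64_t end;   // exclusive
  };

  // Expand the client range so that both the offset and the size sent to the
  // origin server are integral multiples of the fixed fragment size. Fragments
  // already in cache at either end would be trimmed off before this step; if
  // the result spans the whole object, the object is simply fetched in full.
  ByteRange
  align_to_fragments(ByteRange client, int64_t fragment_size, int64_t object_size)
  {
    ByteRange origin;
    origin.begin = (client.begin / fragment_size) * fragment_size;                     // round down
    origin.end   = ((client.end + fragment_size - 1) / fragment_size) * fragment_size; // round up
    if (object_size >= 0) {
      origin.end = std::min(origin.end, object_size); // the last fragment may be short
    }
    return origin;
  }

Note that the rounding adds strictly less than one fragment size at each end, which is where the "no more than 2 fragment sizes larger" bound in the notes below comes from.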

With this change we then use the fragment offset table to store fragment presence. Because we have adjusted the request to the origin server, a fragment will always be either entirely present in the cache or entirely absent from it. Because the fragment size is fixed, we can compute the fragment offsets without having to store them; therefore we can use the table entries purely as presence markers while still allowing normal range acceleration operation if the requested range is covered by cached fragments.

Another minor change is that we permit an empty earliest fragment if the first range request does not touch that fragment. This is necessary because the cache logic assumes the earliest fragment is also closest to the stripe write cursor and will therefore be overwritten before any other fragment. This allows validity checking before any cache data is served. We must preserve that, and so will write an empty earliest fragment even if it is not retrieved from the origin server. The current logic always reads the earliest fragment (if distinct from the first fragment), so doing this keeps the same logic for checking a partial object. We will need to make a note in the first fragment metadata about this so that the fragment table offsets can be computed correctly (in effect this will shift the indices by one). Note that the earliest fragment is either entirely empty (no content), a full fragment (if there are other fragments), or a partial fragment **only if** it contains the entire object (i.e. it is also the last fragment).
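
Since offsets are computed rather than stored, the bookkeeping reduces to simple arithmetic; a hypothetical helper showing the offset computation and the one-slot shift caused by an empty earliest fragment::

  #include <cstdint>

  // Byte offset of content fragment `index` within the object; no per-fragment
  // offsets need to be stored because the fragment size is fixed.
  int64_t
  fragment_byte_offset(int64_t index, int64_t fragment_size)
  {
    return index * fragment_size;
  }

  // Slot in the fragment offset table that serves as the presence marker for
  // content fragment `index`. If the earliest fragment was written empty,
  // every content fragment occupies the slot one past its natural index.
  int64_t
  fragment_table_slot(int64_t index, bool earliest_is_empty)
  {
    return earliest_is_empty ? index + 1 : index;
  }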

As range requests are processed for a partial object, additional fragments are written and the first fragment is updated. This keeps all of the content fragments bracketed between the earliest fragment and the first fragment, as for full objects. It also allows the fragment offset table to be extended as needed if fragments beyond the current end are retrieved.

..note:
An open question is the handling of 304 responses and object refresh in general. If the client makes a range request that touches previously untouched fragments we can't attach an ``If-Modified-Since`` field to it. For a range request that can be satisfied from data in the cache, I don't know if we can attach an ``If-Modified-Since``. If that is possible then I presume we would treat the entire object uniformly, that is, all bytes are valid or stale, not just the fragments touched by the client request.

A related issue is what to do if we get different expiration data for different requests to the origin server. Should the object be updated with the new expiration data, or should it keep the original?

..note:
A serious issue with caching range requests is tracking the valid bytes. It is to avoid this problem that we adjust the size of the range request to the origin server. By limiting the adjusted request to being no more than 2 fragment sizes larger than the original request we avoid excessive network traffic beyond the client request while guaranteeing we need only track the fragments, as is already done for full objects. For example, with 1 MB fragments a 100 KB client request that straddles a fragment boundary expands to at most a 2 MB origin request.

..note:
Although we must rewrite the first fragment on every request that adds fragments to the partial object, we still need to be a bit careful about the full object size so that we don't write a partial fragment in the false belief that it is the last fragment.
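
For instance, a guard of the sort implied here (illustrative only, not existing cache code)::

  #include <cstdint>

  // A short (smaller than fragment_size) fragment may be written only when it
  // is certainly the last one: the total object size is known and the fragment
  // ends exactly at it.
  bool
  is_definitely_last_fragment(int64_t fragment_offset, int64_t fragment_length,
                              int64_t total_object_size)
  {
    return total_object_size >= 0 && fragment_offset + fragment_length == total_object_size;
  }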

..note:
It is easiest to use the target fragment size for the fragment size of a partial object but this is in no way required. Currently I don't think it is a useful feature to be able to tune this independently but it would require very little extra work to implement. The main concern would be the additional complexity of configuration as seen by the user, although we could default it to zero which would mean "use the target fragment size". Then it would need to be configured only by those who had an interest in doing so.