
This plugin exploits RFC 6249, Metalink/HTTP: Mirrors and Hashes, and RFC 3230, Instance Digests in HTTP, to detect duplicate downloads from different mirrors and redirect clients to mirrors that are already cached.

Many download sites and content distribution networks serve the same files from different mirrors, and users do not know which mirrors are already cached. These sites often present users with a simple download button that redirects to a mirror, but not predictably to the same mirror, or to a mirror that is already cached, so users cannot predict whether a download will take seconds or hours, which is frustrating.

Given a response with a "Location: ..." header and a "Digest: SHA-256=..." header, such as from MirrorBrain, the plugin checks whether the URL in the "Location: ..." header is already cached. If it is not, but the cache already contains content that matches the digest in the "Digest: SHA-256=..." header, then the plugin rewrites the "Location: ..." header with the cached URL. This should redirect clients to mirrors that are already cached.
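
For example (the mirror hostnames here are hypothetical), suppose a download site responds with a redirect like this:

  HTTP/1.1 302 Found
  Location: http://mirror-a.example.org/pub/distro.iso
  Digest: SHA-256=...

If http://mirror-a.example.org/pub/distro.iso is not cached, but the cache already holds content with the same SHA-256 digest that was previously fetched from, say, http://mirror-b.example.net/pub/distro.iso, the plugin rewrites the header before the response reaches the client:

  Location: http://mirror-b.example.net/pub/distro.iso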

The code is up on GitHub. You can either download a zip archive or clone the repository with Git:


  $ git clone https://github.com/jablko/dedup.git

Follow the instructions in the Programmer's Guide to compile the plugin:


  $ tsxs -I proxy/api -C metalink.cc -o metalink.so

Finally, add "metalink.so" to the Traffic Server plugin.config file to load the plugin.
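
A plugin.config entry is just the plugin file name, one plugin per line (a path relative to the Traffic Server plugin directory also works):

  metalink.so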

The plugin implements TS_HTTP_SEND_RESPONSE_HDR_HOOK to check, and potentially rewrite, the "Location: ..." and "Digest: SHA-256=..." headers after responses are cached. It waits until this late stage, rather than checking when the response is first received, because the contents of the cache can change in the meantime. It uses TSCacheRead() to check whether the URL in the "Location: ..." header is already cached. In the future, the plugin should also check whether the cached copy is still fresh.
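
A minimal sketch of this hook, assuming the Traffic Server C API; the handler names are hypothetical, error handling and the rewrite itself are omitted, and a real plugin must hold off reenabling the transaction until the cache read completes:

  #include <ts/ts.h>

  /* Probe result: TS_EVENT_CACHE_OPEN_READ means the Location URL is
     already cached, so there is nothing to rewrite. */
  static int
  cache_probe_handler(TSCont contp, TSEvent event, void *edata)
  {
    if (event == TS_EVENT_CACHE_OPEN_READ) {
      TSVConnClose((TSVConn) edata);
    } else {
      /* TS_EVENT_CACHE_OPEN_READ_FAILED: not cached, so fall back to
         the "Digest: SHA-256=..." lookup (not shown) */
    }
    return 0;
  }

  static int
  send_response_handler(TSCont contp, TSEvent event, void *edata)
  {
    TSHttpTxn txnp = (TSHttpTxn) edata;
    TSMBuffer bufp;
    TSMLoc hdr_loc;

    if (TSHttpTxnClientRespGet(txnp, &bufp, &hdr_loc) == TS_SUCCESS) {
      TSMLoc field = TSMimeHdrFieldFind(bufp, hdr_loc, TS_MIME_FIELD_LOCATION, TS_MIME_LEN_LOCATION);
      if (field != TS_NULL_MLOC) {
        int length;
        const char *value = TSMimeHdrFieldValueStringGet(bufp, hdr_loc, field, -1, &length);

        /* Key a cache read on the URL in the Location header */
        TSCacheKey key;
        TSCacheKeyCreate(&key);
        TSCacheKeyDigestSet(key, value, length);
        TSCacheRead(TSContCreate(cache_probe_handler, TSMutexCreate()), key);

        TSHandleMLocRelease(bufp, hdr_loc, field);
      }
      TSHandleMLocRelease(bufp, TS_NULL_MLOC, hdr_loc);
    }

    TSHttpTxnReenable(txnp, TS_EVENT_HTTP_CONTINUE);
    return 0;
  }

  void
  TSPluginInit(int argc, const char *argv[])
  {
    TSHttpHookAdd(TS_HTTP_SEND_RESPONSE_HDR_HOOK, TSContCreate(send_response_handler, NULL));
  }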

The plugin implements TS_HTTP_READ_RESPONSE_HDR_HOOK and a null transform to compute the SHA-256 digest of content as it is added to the cache. It uses SHA256_Init(), SHA256_Update(), and SHA256_Final() from OpenSSL to compute the digest, and then uses TSCacheWrite() to associate the digest with the request URL. This adds a new cache object whose key is the digest and whose object is the request URL.
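
A sketch of the digest bookkeeping, assuming OpenSSL and leaving out the null-transform plumbing (the function names here are hypothetical):

  #include <openssl/sha.h>
  #include <ts/ts.h>

  typedef struct {
    SHA256_CTX ctx;
  } TransformData;

  /* Called once when the transform starts */
  static void
  digest_start(TransformData *data)
  {
    SHA256_Init(&data->ctx);
  }

  /* Called for each block of content as it streams through */
  static void
  digest_consume(TransformData *data, const void *block, size_t length)
  {
    SHA256_Update(&data->ctx, block, length);
  }

  /* Called when the content is complete: finish the digest and use it
     as the key of a new cache object; the continuation's write handler
     then stores the request URL as the object */
  static void
  digest_finish(TransformData *data, TSCont contp)
  {
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256_Final(digest, &data->ctx);

    TSCacheKey key;
    TSCacheKeyCreate(&key);
    TSCacheKeyDigestSet(key, (const char *) digest, sizeof(digest));
    TSCacheWrite(contp, key);
  }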

To check whether the cache already contains content that matches a digest, the plugin must call TSCacheRead() with the digest as the key, read the URL stored in the resulting object, and then call TSCacheRead() again with that URL as the key. This double lookup is probably inefficient and should be improved.
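
A minimal sketch of that chain, assuming the Traffic Server C API, with a small state struct (the names are hypothetical) to distinguish the two reads:

  #include <string.h>
  #include <ts/ts.h>

  typedef enum { LOOKUP_DIGEST, LOOKUP_URL } LookupStage;

  typedef struct {
    LookupStage stage;
    char url[1024]; /* URL read out of the digest object */
  } LookupState;

  static int
  lookup_handler(TSCont contp, TSEvent event, void *edata)
  {
    LookupState *state = (LookupState *) TSContDataGet(contp);

    if (event == TS_EVENT_CACHE_OPEN_READ && state->stage == LOOKUP_DIGEST) {
      /* First read hit: the object holds a URL. Read it into
         state->url with TSVConnRead() (buffering not shown), then
         issue the second read, keyed on that URL. */
      state->stage = LOOKUP_URL;
      TSCacheKey key;
      TSCacheKeyCreate(&key);
      TSCacheKeyDigestSet(key, state->url, strlen(state->url));
      TSCacheRead(contp, key);
    } else if (event == TS_EVENT_CACHE_OPEN_READ && state->stage == LOOKUP_URL) {
      /* Second read hit: the cache holds matching content, so the
         "Location: ..." header can be rewritten to state->url */
    } else if (event == TS_EVENT_CACHE_OPEN_READ_FAILED) {
      /* Miss at either stage: leave the response untouched */
    }
    return 0;
  }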

An early version of the plugin scanned "Link: <...>; rel=duplicate" headers: if the URL in the "Location: ..." header was not already cached, it scanned the "Link: <...>; rel=duplicate" headers for a URL that was. The "Digest: SHA-256=..." header is superior because it finds content that already exists in the cache in every case that a "Link: <...>; rel=duplicate" header would, plus cases where the cached URL is not listed among the "Link: <...>; rel=duplicate" headers, perhaps because the content was downloaded from a URL that does not participate in the content distribution network, or because there are too many mirrors to list them all.

The "Digest: SHA-256=..." header is also more efficient than "Link: <...>; rel=duplicate" headers because it involves a constant number of cache lookups. RFC 6249 requires a "Digest: SHA-256=..." header or "Link: <...>; rel=duplicate" headers MUST be ignored:

If Instance Digests are not provided by the Metalink servers, the Link header fields pertaining to this specification MUST be ignored.

Metalinks contain whole file hashes as described in Section 6, and MUST include SHA-256, as specified in FIPS-180-3.

Alex Rousskov pointed out a project for Squid to implement Duplicate Transfer Detection.

Per Jessen is working on another project for Squid with a similar goal: http://wiki.jessen.ch/index/How_to_cache_openSUSE_repositories_with_Squid
