...

Take standard headers and knowledge about objects in the cache and potentially rewrite those headers so that a client will use a URL that's already cached instead of one that isn't. The headers are specified in RFC 6249 (Metalink/HTTP: Mirrors and Hashes) and RFC 3230 (Instance Digests in HTTP) and are sent by various download redirectors or content distribution networks.
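
For example, a download redirector that implements these RFCs might answer a download request with a redirect like this (the URLs and digest value here are made up for illustration):

Code Block

HTTP/1.1 302 Found
Location: http://download.example.com/pub/example-1.0.tar.gz
Digest: SHA-256=qKVcRZAyZ8lIN2uFCSPa/Dd1RcoSorKrau6fHTQZbkE=

The Digest header carries the base64 encoding of the file's SHA-256 hash, which identifies the same content no matter which mirror URL it lives at.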

...

More important than saving a little bit of bandwidth, this saves users from frustration.

A lot of download sites distribute the same files from many different mirrors and users don't know which mirrors are already cached. These sites often present users with a simple download button, but the button doesn't predictably access the same mirror, or a mirror that's already cached. To users it seems like the download works sometimes (takes seconds) and not others (takes hours), which is frustrating.

...

When it sees a response with a "Location: ..." header and a "Digest: SHA-256=..." header, it checks to see if the URL in the Location header is already cached. If it isn't, then it tries to find a URL that is cached to use instead. It looks in the cache for an object that matches the digest in the Digest header and, if it succeeds, rewrites the Location header with that object's URL.
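
A minimal sketch of that logic as a Traffic Server plugin hook might look like the following. This is not the shipped plugin code: the base64 decoding and the asynchronous cache lookup are reduced to a comment, and the sketch only shows where the headers are found and where the rewrite would happen.

Code Block

/* Sketch only: shows where a plugin would find the Location and Digest
 * headers just before the response is sent to the client. The real
 * metalink plugin additionally decodes the digest and does an
 * asynchronous cache lookup keyed on it. */
#include <ts/ts.h>

static int
check_response(TSCont contp, TSEvent event, void *edata)
{
  TSHttpTxn txnp = (TSHttpTxn)edata;
  TSMBuffer bufp;
  TSMLoc hdr_loc;

  if (TSHttpTxnClientRespGet(txnp, &bufp, &hdr_loc) == TS_SUCCESS) {
    TSMLoc location = TSMimeHdrFieldFind(bufp, hdr_loc, TS_MIME_FIELD_LOCATION, TS_MIME_LEN_LOCATION);
    TSMLoc digest   = TSMimeHdrFieldFind(bufp, hdr_loc, "Digest", 6);

    if (location != TS_NULL_MLOC && digest != TS_NULL_MLOC) {
      int len;
      const char *value = TSMimeHdrFieldValueStringGet(bufp, hdr_loc, digest, 0, &len);

      /* The value is "SHA-256=" followed by the base64 of the file's
       * SHA-256 hash. Here the real plugin decodes it, looks in the cache
       * for an object with that digest and, on a hit, rewrites the
       * Location value with the cached object's URL, roughly:
       *
       *   TSMimeHdrFieldValueStringSet(bufp, hdr_loc, location, 0,
       *                                cached_url, cached_url_len);
       */
      TSDebug("metalink-sketch", "Digest: %.*s", len, value);
    }

    if (location != TS_NULL_MLOC) {
      TSHandleMLocRelease(bufp, hdr_loc, location);
    }
    if (digest != TS_NULL_MLOC) {
      TSHandleMLocRelease(bufp, hdr_loc, digest);
    }
    TSHandleMLocRelease(bufp, TS_NULL_MLOC, hdr_loc);
  }

  TSHttpTxnReenable(txnp, TS_EVENT_HTTP_CONTINUE);
  return 0;
}

void
TSPluginInit(int argc, const char *argv[])
{
  TSHttpHookAdd(TS_HTTP_SEND_RESPONSE_HDR_HOOK, TSContCreate(check_response, NULL));
}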

This way a client should get sent to a URL that's already cached and the user won't end up downloading the file again.

Just build the plugin and then add it to your plugin.config file.

The code is distributed along with recent versions of Traffic Server, in the "plugins/experimental/metalink" directory. To build it, pass the "--enable-experimental-plugins" option to the Traffic Server configure script when you build Traffic Server:

Code Block

...

When you're done building Traffic Server, add "metalink.so" to your plugin.config file to start using the plugin.
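
For example, if this is the only plugin you use, plugin.config is just the one line:

Code Block

metalink.so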

...

An early version of the plugin scanned "Link: <...>; rel=duplicate" headers. If the URL in the "Location: ..." header wasn't already cached, it scanned the "Link: <...>; rel=duplicate" headers for a URL that was. The "Digest: SHA-256=..." header is superior because it will find content that already exists in the cache in every case that a "Link: <...>; rel=duplicate" header would, plus in cases where the URL isn't listed among the "Link: <...>; rel=duplicate" headers, maybe because the content was downloaded from a URL not participating in the content distribution network, or maybe because there are too many mirrors to list in "Link: <...>; rel=duplicate" headers. An example of both kinds of headers together follows below.
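
For comparison, a response that participates in a mirror network per RFC 6249 carries both kinds of headers (the mirrors and digest value here are hypothetical):

Code Block

HTTP/1.1 302 Found
Location: http://download.example.com/example-1.0.tar.gz
Link: <http://mirror1.example.net/example-1.0.tar.gz>; rel=duplicate
Link: <http://mirror2.example.org/example-1.0.tar.gz>; rel=duplicate
Digest: SHA-256=qKVcRZAyZ8lIN2uFCSPa/Dd1RcoSorKrau6fHTQZbkE=

A cached copy fetched from some other URL entirely would be missed by the Link headers but still matched by its digest.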

...

Metalinks contain whole file hashes as described in Section 6, and MUST include SHA-256, as specified in [FIPS-180-3].

Alex Rousskov pointed out a project for Squid to implement Duplicate Transfer Detection:

  • http://thread.gmane.org/gmane.comp.web.squid.devel/15803
  • http://thread.gmane.org/gmane.comp.web.squid.devel/16335
  • http://www.hpl.hp.com/techreports/2004/HPL-2004-29.pdf

Per Jessen is working on another project for Squid with a similar goal: http://wiki.jessen.ch/index/How_to_cache_openSUSE_repositories_with_Squid