
Design Doc: Rewrite Result Metadata Caching

Jeff Kaufman edited this page Jan 5, 2017 · 2 revisions

Rewrite Result Metadata Caching

Maksim Orlovich, February 2011

Overview and rationale

Consider, for example, the case of optimizing CSS. When we first perform the optimization we create an output resource with a filename like a.css.pagespeed.cf.somehash.css and write it to the data cache. When the browser then requests the optimized version, we can serve it without recomputation. But now suppose the same page is loaded again. We would like to avoid re-optimizing the CSS, but with the system as described we cannot determine the output URL without computing the output contents, because the URL incorporates the content hash.

The solution we use is a secondary cache, keyed by the filter id and the output URL minus the hash and the extension, which gives the hash and extension back. With this, we can simply look up the target URL in this cache and avoid re-optimization. Notice also that we do not need to touch the input data at all. This is very nice since a metadata cache entry is pretty small, while some inputs can get pretty large. We keep track of the validity of a cache entry by giving it the same TTL as the input (or inputs) used to construct it. (We need the extension since we want image rewriting to be able to change formats, so for example a gif input might produce a png output.)
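
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the PageSpeed implementation: MetadataEntry, MetadataCache, Put(), LookupUrl(), the "|" key separator, and the in-memory std::map are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical metadata entry: just enough to rebuild the full output URL
// without touching the input data.
struct MetadataEntry {
  std::string hash;       // content hash of the optimized output
  std::string extension;  // output extension; may differ from the input's
  int64_t expiration_ms;  // same expiry as the input(s) used
};

// Hypothetical in-memory stand-in for the secondary cache.
class MetadataCache {
 public:
  // Key: filter id plus the output URL minus hash and extension.
  static std::string Key(const std::string& filter_id,
                         const std::string& url_prefix) {
    return filter_id + "|" + url_prefix;
  }

  void Put(const std::string& filter_id, const std::string& url_prefix,
           const MetadataEntry& entry) {
    entries_[Key(filter_id, url_prefix)] = entry;
  }

  // On a fresh hit, writes the full output URL to *url and returns true.
  bool LookupUrl(const std::string& filter_id, const std::string& url_prefix,
                 int64_t now_ms, std::string* url) const {
    assert(url != nullptr);
    std::map<std::string, MetadataEntry>::const_iterator it =
        entries_.find(Key(filter_id, url_prefix));
    if (it == entries_.end() || now_ms >= it->second.expiration_ms) {
      return false;  // miss or expired: the filter must re-optimize
    }
    *url = url_prefix + "." + it->second.hash + "." + it->second.extension;
    return true;
  }

 private:
  std::map<std::string, MetadataEntry> entries_;
};
```

With an entry stored under filter id "cf" and prefix "a.css.pagespeed.cf", a fresh lookup rebuilds a.css.pagespeed.cf.somehash.css without reading or re-optimizing the CSS itself.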

Entries are written to the cache from ResourceManager::Write(), and read in CreateOutputResourceWithPath() and CreateOutputResourceFromResource(). The information in an entry is accessible via the OutputResource::CachedResult object obtained from OutputResource::cached_result().

The optimizable flag

The above technique is insufficient in one simple scenario: suppose the image rewriter tried resizing and optimizing an image, and it turned out that this did not make things better. It is clearly desirable to avoid repeating this work every time the page is processed. To accomplish that, an optimizable() bit is stored in this cache; if it is false, the filter can avoid doing any work, and the usual TTL mechanisms will let us know when to retry. This is presently encoded via the presence or absence of an X-ModPagespeed-Unoptimizable header in the cache entry. The bit is set by ResourceManager::Write(), cleared by ResourceManager::WriteUnoptimizable(), and accessible via the optimizable() method of CachedResult.
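
One way a filter might act on this bit is sketched below. CachedResultSketch, Action, and Decide() are hypothetical names for illustration; the real OutputResource::CachedResult interface differs, and the expiry check here only stands in for "the usual TTL mechanisms".

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified model of a cached result.
struct CachedResultSketch {
  bool optimizable;       // false: a previous attempt did not help
  std::string url;        // only meaningful when optimizable is true
  int64_t expiration_ms;  // when the entry stops being valid
};

enum Action { kUseCachedUrl, kLeaveUntouched, kRewrite };

// Decides what the filter should do for one resource.
// cached may be null to model a metadata cache miss.
Action Decide(const CachedResultSketch* cached, int64_t now_ms) {
  if (cached == nullptr || now_ms >= cached->expiration_ms) {
    return kRewrite;  // nothing usable cached: try the optimization (again)
  }
  // A fresh entry either points at the optimized output or records that
  // optimizing was not worthwhile, so no work is needed in either case.
  return cached->optimizable ? kUseCachedUrl : kLeaveUntouched;
}
```

The point of the third outcome, kLeaveUntouched, is that a negative result is cached just like a positive one and is only retried once the entry expires.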

Custom metadata

Rationale

Recall that one of the stated advantages of having this cache is being able to rewrite the HTML without looking at the data of resources, so that one doesn't load in megabytes of image data. Unfortunately, the design as stated thus far is insufficient for at least two things we want to be able to do: insert image dimensions and inline small images. To permit this, CachedResult provides a key/value store that persists in the cache, so that filters can record information they compute, such as image dimensions, and reuse it without recomputation. The relevant methods are SetRemembered() and Remembered().
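
A rough sketch of such a key/value store follows. Only the method names SetRemembered() and Remembered() come from this document; their signatures, the RememberedStore class, the string-based storage, and the integer helpers are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for the key/value part of CachedResult.
class RememberedStore {
 public:
  void SetRemembered(const std::string& key, const std::string& value) {
    values_[key] = value;
  }

  // Returns true and fills *out when the key is present.
  bool Remembered(const std::string& key, std::string* out) const {
    assert(out != nullptr);
    std::map<std::string, std::string>::const_iterator it = values_.find(key);
    if (it == values_.end()) return false;
    *out = it->second;
    return true;
  }

  // Integer convenience wrappers (assumed; values are stored as text).
  void SetRememberedInt(const std::string& key, int value) {
    SetRemembered(key, std::to_string(value));
  }

  bool RememberedInt(const std::string& key, int* out) const {
    assert(out != nullptr);
    std::string text;
    if (!Remembered(key, &text) || text.empty()) return false;
    char* end = nullptr;
    long parsed = std::strtol(text.c_str(), &end, 10);
    if (*end != '\0') return false;  // stored value was not a clean integer
    *out = static_cast<int>(parsed);
    return true;
  }

 private:
  std::map<std::string, std::string> values_;
};
```

A filter would record, say, an image width once when it first decodes the image, and on later page loads read it back from the store instead of fetching and decoding the image again.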

Metadata used by OutputResource::CachedResult

| Key | Format | Description |
|---|---|---|
| "OutputResource_OriginExpiration" | int64 | The time at which this cached result should expire due to the inputs expiring. Note that there is also a separate HttpCache expiration time for this entry, which may be later if the filter is using ReuseByContentHash() == true. Older versions used just the cache expiration and not this field. |

Metadata used by RewriteSingleResourceFilter

| Key | Format | Description |
|---|---|---|
| "RewriteSingleResourceFilter_InputTimestamp" | int64 | Stores the timestamp_ms() of the input resource used to construct this entry. This is used to determine when the resource needs freshening. |
| "RewriteSingleResourceFilter_CacheVer" | int | Encodes the value of FilterCacheFormatVersion() of the filter that generated this cache entry. If this does not match the value returned by the filter at runtime, this entry will be ignored. |
| "RewriteSingleResourceFilter_InputHash" | string | For filters with ReuseByContentHash() == true, stores a hash of the contents of the input that was used to produce the output this cached entry refers to. If the input expires as per cache policy, but still has the same hash, RewriteSingleResourceFilter will be able to reuse the result. |

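Taken together, the version, origin expiration, and input hash fields suggest a validity check along the following lines. This is a sketch under assumptions: SingleResourceEntry, Verdict, and CheckEntry() are hypothetical names, the order of the checks is a guess, and the freshening logic driven by the input timestamp is left out because the text does not spell out its rule.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical decoded view of the metadata keys for one cache entry.
struct SingleResourceEntry {
  int cache_version;             // "RewriteSingleResourceFilter_CacheVer"
  std::string input_hash;        // "RewriteSingleResourceFilter_InputHash"
  int64_t origin_expiration_ms;  // "OutputResource_OriginExpiration"
};

enum Verdict { kReuse, kReuseAfterHashMatch, kRecompute };

// current_input_hash is the hash of the freshly fetched input; it only
// matters once the entry has expired and the filter reuses by content hash.
Verdict CheckEntry(const SingleResourceEntry& entry, int filter_version,
                   bool reuse_by_content_hash,
                   const std::string& current_input_hash, int64_t now_ms) {
  // Entries written in a different cache format are ignored outright.
  if (entry.cache_version != filter_version) return kRecompute;
  // Inputs have not expired yet: the cached result is still valid.
  if (now_ms < entry.origin_expiration_ms) return kReuse;
  // Inputs expired, but identical content means the old output still applies.
  if (reuse_by_content_hash && !entry.input_hash.empty() &&
      entry.input_hash == current_input_hash) {
    return kReuseAfterHashMatch;
  }
  return kRecompute;
}
```

In the hash-match case the input still has to be fetched and hashed, but the rewrite itself, which is usually the expensive part, is skipped.
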
Metadata used by ImgRewriteFilter

| Key | Format | Description |
|---|---|---|
| "ImgRewriteFilter_W" | int | Stores the width of the image the CachedResult is pointing to. |
| "ImgRewriteFilter_H" | int | Stores the height of the image the CachedResult is pointing to. |
| "ImgRewriteFilter_DataUrl" | string | If the image is small enough to inline, this key exists and contains the text of the data: url. Note that the main url() still has the http:// version, as we need it for old IE. |
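
As an illustration of how the ImgRewriteFilter keys let the HTML be rewritten without loading image data, the sketch below builds an img tag from cached metadata alone. ImageMetadata, ChooseSrc(), RenderImgTag(), and the browser_supports_data_urls flag are hypothetical; the real filter edits parsed HTML elements rather than concatenating strings, and no attribute escaping is done here.

```cpp
#include <cassert>
#include <string>

// Hypothetical decoded view of the cached image metadata.
struct ImageMetadata {
  std::string url;       // main url(): always the http:// version
  int width;             // "ImgRewriteFilter_W"
  int height;            // "ImgRewriteFilter_H"
  std::string data_url;  // "ImgRewriteFilter_DataUrl"; empty if key absent
};

// Picks the inlined form only when it exists and the browser can use it;
// otherwise falls back to the http:// URL (the case old IE needs).
std::string ChooseSrc(const ImageMetadata& m, bool browser_supports_data_urls) {
  if (browser_supports_data_urls && !m.data_url.empty()) return m.data_url;
  return m.url;
}

// Builds the tag purely from metadata; the image bytes are never read.
std::string RenderImgTag(const ImageMetadata& m,
                         bool browser_supports_data_urls) {
  return "<img src=\"" + ChooseSrc(m, browser_supports_data_urls) +
         "\" width=\"" + std::to_string(m.width) +
         "\" height=\"" + std::to_string(m.height) + "\">";
}
```

Keeping the http:// URL as the main url() and the data: URL as an optional extra means the same cache entry can serve both kinds of browser.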