
Design Doc: Image rewriting specification

Otto van der Schaaf edited this page Oct 25, 2023 · 3 revisions

Image rewriting specification

Joshua Marantz, November 2010

Mostly current design doc: Using Scanline as Intermediate Data for Image Rewriting

Often a web page contains scaled-down versions of images but relies on the browser to rescale the images at page load. Even more often, a web site contains naively compressed versions of images, and these image files often contain irrelevant metadata inserted by the image-processing tool chain (such as the version of the software that generated the image, or information about the camera used). This results in extra image data being sent. If instead we strip metadata, rescale, and recompress on the server side, we can reduce the amount of image data on the wire, and thus reduce latency and bandwidth demands. Instaweb-apache does this automatically. Site maintainers therefore don't need to add a separate image-recompression step to their workflow when performing site updates. Images on the site can also keep metadata for editing without that metadata increasing bandwidth costs. Server-side rescaling was used with success in Google products.

Instaweb-apache deals with image references in four distinct stages:

  1. Identify img URLs and attempt to fetch and locally cache the corresponding image data. This fetch can be initiated asynchronously, so the first time an HTML page is fetched the image data may not yet be available, in which case no further image processing occurs for that request.

  2. When the image is available, its dimensions are obtained and compared to the dimensions given in the img tag (if any). If the actual dimensions of the image are larger than the dimensions specified in the web page, we rescale the image data to the specified size.

  3. The (possibly rescaled) image is run through a re-compression/optimization library. At the moment we use the image optimizers from PageSpeed, which rely on libjpeg-turbo, libpng, and libwebp. Note that the results of re-compression are intended to be visually indistinguishable from the original source image; for JPEGs and lossy WebP this means we must generally respect the quality setting chosen for the original source image.

  4. If the re-compressed (and possibly rescaled) image file is smaller than the original image file, the HTML reference is rewritten to point to the re-compressed image. Otherwise, a notation is left in the server cache to indicate that this image file should not be rewritten in the future.
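The decision logic in stages 2 through 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the instaweb-apache implementation: `target_dims`, `RewriteCache`, and `rewrite_image` are hypothetical names. The actual resize and recompression steps are passed in as callables standing in for the PageSpeed optimizers. Keying the cache by original URL plus target size, and only ever downscaling, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple

Dims = Tuple[int, int]  # (width, height) in pixels
CacheKey = Tuple[str, Optional[Dims]]  # hypothetical key: (original URL, target dims)


def target_dims(actual: Dims, tag: Optional[Dims]) -> Optional[Dims]:
    """Stage 2: the size to rescale to, or None to keep the original size.

    Assumption: only downscale, i.e. the image must be at least as large as
    the img tag's width/height in both directions and strictly larger in one.
    """
    if tag is None:
        return None
    (aw, ah), (tw, th) = actual, tag
    if aw >= tw and ah >= th and (aw, ah) != (tw, th):
        return tag
    return None


@dataclass
class RewriteCache:
    """Hypothetical stand-in for the server cache."""
    optimized: Dict[CacheKey, bytes] = field(default_factory=dict)
    do_not_rewrite: Set[CacheKey] = field(default_factory=set)


def rewrite_image(
    url: str,
    original: bytes,
    actual: Dims,
    tag: Optional[Dims],
    resize: Callable[[bytes, Dims], bytes],
    recompress: Callable[[bytes], bytes],
    cache: RewriteCache,
) -> Optional[bytes]:
    """Stages 2-4: return the bytes to serve from a rewritten URL, or None if
    the HTML should keep referencing the original image."""
    dims = target_dims(actual, tag)
    key = (url, dims)
    if key in cache.do_not_rewrite:
        return None  # an earlier attempt did not save any bytes
    if key in cache.optimized:
        return cache.optimized[key]
    data = resize(original, dims) if dims is not None else original  # stage 2
    candidate = recompress(data)  # stage 3
    if len(candidate) < len(original):  # stage 4: keep only if smaller
        cache.optimized[key] = candidate
        return candidate
    cache.do_not_rewrite.add(key)  # remember not to try again
    return None
```

For example, with a toy `resize` that halves the byte string and a `recompress` that returns its input unchanged, a 200x200 image displayed at 100x100 is rewritten. An image with no size in its tag and no recompression savings is instead recorded in `do_not_rewrite`, so later requests skip the work.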

The URLs for rewritten images encode the original image URL in their file name. This permits us to serve rewritten images statelessly from multiple web server instances. In particular, it is possible to request rewritten HTML from server A and the accompanying rewritten image file from server B. In this case, server B fetches the original image and performs the same rescaling and recompression steps that A performed before it rewrote the URL. If B fails to obtain the resource (either because it is still being fetched asynchronously or because of network problems), it issues a temporary redirect to the original, unrewritten source URL for the image.
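The stateless scheme can be illustrated with a small Python sketch. The URL layout here is invented for illustration and is not the format instaweb-apache uses. It consists of a hypothetical `/rewritten/` prefix, a `WxH` size field, and the original URL in URL-safe base64. `fetch` and `optimize` are placeholder callables for the cache lookup/fetch and the rescale-and-recompress pipeline. Carrying the target size in the name alongside the original URL is an assumption, made so that server B has everything it needs to repeat A's work.

```python
import base64
from typing import Callable, Dict, Optional, Tuple

Dims = Tuple[int, int]  # (width, height) in pixels
PREFIX = "/rewritten/"  # hypothetical path for rewritten images


def encode_rewritten_url(original_url: str, dims: Optional[Dims]) -> str:
    """Build a rewritten-image URL whose file name carries the target size
    and the original image URL (URL-safe base64 never contains '.')."""
    size = f"{dims[0]}x{dims[1]}" if dims is not None else "orig"
    token = base64.urlsafe_b64encode(original_url.encode("utf-8")).decode("ascii")
    return f"{PREFIX}{size}.{token}"


def decode_rewritten_url(path: str) -> Tuple[str, Optional[Dims]]:
    """Recover the original URL and target size from a rewritten URL."""
    if not path.startswith(PREFIX):
        raise ValueError(f"not a rewritten image URL: {path}")
    size, token = path[len(PREFIX):].split(".", 1)
    original_url = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
    if size == "orig":
        return original_url, None
    width, height = size.split("x")
    return original_url, (int(width), int(height))


def serve_rewritten(
    path: str,
    fetch: Callable[[str], Optional[bytes]],
    optimize: Callable[[bytes, Optional[Dims]], bytes],
) -> Tuple[int, Dict[str, str], bytes]:
    """Serve a rewritten image on any server: decode the URL, fetch the
    original, and redo the rescale/recompress work. If the original is not
    available, issue a temporary redirect to the unrewritten image."""
    original_url, dims = decode_rewritten_url(path)
    data = fetch(original_url)  # None: still being fetched, or fetch failed
    if data is None:
        return 302, {"Location": original_url}, b""
    return 200, {}, optimize(data, dims)
```

Because everything needed to regenerate the image travels in the URL, the servers share no state. The sketch assumes only that `optimize` produces an equivalent result on every server.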
