Design Doc: Cache Html Rewriter Flow

Otto van der Schaaf edited this page Oct 25, 2023 · 2 revisions

Cache Html Rewriter Flow

Megha Mohabey, 2012-11-28

Objective

  • Make HTML cacheable by removing the noncacheable parts specified by the publisher.
  • Work seamlessly with the Split HTML rewriter and other rewriters.

Background

HTML is usually noncacheable. This rewriter allows caching of HTML by removing the noncacheable parts. When a request is received, the server responds with the rewritten cached HTML and initiates a fetch to get the noncacheable parts of the page. (Note: the noncacheable parts are the portions of the page that are not cached on servers. The publisher specifies them using id or class attributes in the HTML. Ref: https://developers.google.com/speed/docs/pss/PrioritizeAboveTheFold) The noncacheable parts are then extracted from the rewritten fetch response and sent to the browser. JavaScript on the client side then stitches the cacheable and noncacheable parts together in the browser.
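
The stripping step described above can be sketched as follows. This is a minimal illustration in Python, not the actual StripNonCacheableFilter implementation: the class name `NonCacheableStripper`, the function `strip_noncacheable`, the `panel-N` keys, and the `<!--noncacheable:...-->` placeholder format are all assumptions made for the example. It removes elements whose id or class matches the publisher's list, leaves a placeholder in the cacheable HTML, and collects the removed markup so it could later be sent to the browser separately.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NonCacheableStripper(HTMLParser):
    """Hypothetical sketch: splits HTML into a cacheable shell and panels."""

    def __init__(self, ids=(), classes=()):
        super().__init__(convert_charrefs=False)
        self.ids = set(ids)
        self.classes = set(classes)
        self.cacheable = []    # pieces of the cacheable HTML
        self.panels = {}       # panel key -> stripped noncacheable markup
        self._buf = None       # buffer for the panel being stripped, if any
        self._depth = 0        # open-element depth inside the current panel
        self._panel_id = None

    def _is_noncacheable(self, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id") in self.ids:
            return True
        class_names = (attr_map.get("class") or "").split()
        return any(name in self.classes for name in class_names)

    def _emit(self, text):
        target = self.cacheable if self._buf is None else self._buf
        target.append(text)

    def handle_starttag(self, tag, attrs):
        # Only non-void elements outside a panel can open a new panel.
        if (self._buf is None and tag not in VOID_TAGS
                and self._is_noncacheable(attrs)):
            self._panel_id = "panel-%d" % len(self.panels)
            self.cacheable.append("<!--noncacheable:%s-->" % self._panel_id)
            self._buf = []
            self._depth = 0
        self._emit(self.get_starttag_text())
        if self._buf is not None and tag not in VOID_TAGS:
            self._depth += 1

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit("</%s>" % tag)
        if self._buf is not None and tag not in VOID_TAGS:
            self._depth -= 1
            if self._depth == 0:  # the panel's root element just closed
                self.panels[self._panel_id] = "".join(self._buf)
                self._buf = None

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit("&%s;" % name)

    def handle_charref(self, name):
        self._emit("&#%s;" % name)

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._emit("<!%s>" % decl)


def strip_noncacheable(html, ids=(), classes=()):
    """Returns (cacheable_html, panels) for the given publisher config."""
    parser = NonCacheableStripper(ids, classes)
    parser.feed(html)
    parser.close()
    if parser._buf is not None:  # panel left unclosed at end of input
        parser.panels[parser._panel_id] = "".join(parser._buf)
    return "".join(parser.cacheable), parser.panels
```

In this sketch the `panels` dictionary plays the role of the noncacheable JSON: serializing it and letting client-side script replace each placeholder with its panel would correspond to the stitching step.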

Blink caches a split version of the origin HTML comprising above the fold (ATF) and below the fold (BTF) content. It also sends the ATF HTML first, resulting in faster rendering. In Blink, the publisher HTML is passed asynchronously to a headless browser, which does the split. With the Cache HTML rewriter, we want to get the benefits of caching the HTML without prioritizing specific content. Since prioritization is done independently in the Split HTML rewriter, we can update the cached HTML more frequently and rewrite it in the line of request, thus getting browser-related optimizations.

Cache Html Rewriter Flow

When a request arrives, the following sequence of events happens.

A Property Cache lookup checks whether the HTML is cached and whether the split is defined. One of the following four flows is then taken:

  • Cache Html Miss and Split Miss Flow:
    • In the line of request
      • Start ProxyFetch
      • Rewrite Page
      • Serve Rewritten Page
      • Split Filter will initiate a headless browser render to update the split information in the cache.
    • In async flow
      • Create a secondary fetch which buffers the html from the ProxyFetch and passes it through a separate rewrite driver which has only StripNonCacheableFilter (strip_non_cacheable_filter.h) enabled.
  • Cache Html Miss and Split Hit Flow
    • In the line of request
      • Start ProxyFetch
      • Rewrite page with Split Filter enabled
      • Serve rewritten page
    • In async flow (same as Cache Html Miss and Split Miss Flow)
  • Cache Html Hit and Split Miss Flow
    • In the line of request
      • Rewrite cached html using rewrite driver 1 (with Split disabled).
      • Serve the rewritten page.
      • Start ProxyFetch and rewrite the origin HTML. Extract the noncacheable JSON and send it to the browser.
    • In async flow (same as Cache Html Miss and Split Miss Flow)
  • Cache Html Hit and Split Hit Flow
    • In the line of request
      • Rewrite cached html using rewrite driver 1 (with Split Filter enabled).
      • Serve the ATF rewritten page.
      • Set the BTF json in rewrite driver 2
      • Start ProxyFetch and rewrite the origin HTML. Extract the noncacheable JSON and send it to the browser.
      • Send the BTF json to the browser.
    • In async flow (same as Cache Html Miss and Split Miss Flow)
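
The four flows above can be summarized as a lookup keyed on the two Property Cache results. The sketch below is only an illustration of the branching: the names `INLINE_FLOWS`, `ASYNC_FLOW`, and `plan_request` are hypothetical, and the step strings are labels paraphrasing the bullets, not real API calls.

```python
from typing import Dict, List, Tuple

# The async flow is the same in all four cases.
ASYNC_FLOW: List[str] = [
    "create secondary fetch that buffers the HTML from ProxyFetch",
    "pass buffered HTML through a driver with only StripNonCacheableFilter",
]

# In-line steps, keyed by (cache_html_hit, split_hit).
INLINE_FLOWS: Dict[Tuple[bool, bool], List[str]] = {
    (False, False): [
        "start ProxyFetch",
        "rewrite page",
        "serve rewritten page",
        "Split Filter initiates headless browser render to update split info",
    ],
    (False, True): [
        "start ProxyFetch",
        "rewrite page with Split Filter enabled",
        "serve rewritten page",
    ],
    (True, False): [
        "rewrite cached HTML with driver 1 (Split disabled)",
        "serve rewritten page",
        "start ProxyFetch, rewrite origin HTML, send noncacheable JSON",
    ],
    (True, True): [
        "rewrite cached HTML with driver 1 (Split Filter enabled)",
        "serve ATF rewritten page",
        "set BTF JSON in rewrite driver 2",
        "start ProxyFetch, rewrite origin HTML, send noncacheable JSON",
        "send BTF JSON",
    ],
}


def plan_request(cache_html_hit: bool, split_hit: bool) -> Dict[str, List[str]]:
    """Returns the in-line and async steps for one Property Cache result."""
    return {
        "inline": list(INLINE_FLOWS[(cache_html_hit, split_hit)]),
        "async": list(ASYNC_FLOW),
    }
```

For example, `plan_request(True, True)["inline"]` lists the five in-line steps of the Cache Html Hit and Split Hit flow in the order they are described above.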

JS serving

This section explains when the JS snippets and JSON are served in each flow.

  • Cache Html Miss and Split Miss Flow:
    • No JS is sent by either filter.
  • Cache Html Miss and Split Hit Flow
    • JS sent by SplitFilter
  • Cache Html Hit and Split Miss Flow
    • After sending cached html, CacheHtmlFlow sends blink_js and noncacheable json.
  • Cache Html Hit and Split Hit Flow
    • Rewrite driver 1 has split enabled, so split will send all the initial JS that it needs.
    • Split Filter will not call SplitHtmlFilter::ServeNonCriticalPanelContents (split_html_filter.cc) when CacheHtmlFilter is enabled and it is a cache_html_hit.
    • A modified version of the above function will be called by CacheHtmlFlow, which will send the noncacheable JSON before the non-critical JSON.
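
The serving order above can be written down as a small function, which makes the ordering constraint in the hit/hit case explicit. This is a sketch only: `client_payloads` and the payload labels are hypothetical names, and the function lists only what the bullets above state.

```python
from typing import List


def client_payloads(cache_html_hit: bool, split_hit: bool) -> List[str]:
    """Returns, in send order, the JS/JSON payloads served after the HTML."""
    if not cache_html_hit:
        # On a cache miss only SplitFilter may send JS, and only on a split hit.
        return ["split_filter_js"] if split_hit else []
    if not split_hit:
        # CacheHtmlFlow sends blink_js and the noncacheable JSON itself.
        return ["blink_js", "noncacheable_json"]
    # Hit/hit: split sends its initial JS from driver 1, then CacheHtmlFlow
    # sends the noncacheable JSON before the non-critical JSON.
    return ["split_filter_js", "noncacheable_json", "non_critical_json"]
```

The hit/hit branch is the reason SplitHtmlFilter must not serve the non-critical panel contents on its own: CacheHtmlFlow has to interleave the noncacheable JSON ahead of them.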

Flow With Diff Detection Logic

  • The diff detection flow will remain the same as in the Blink critical line flow. However, when a diff is detected:
    • CacheHtmlFlow will send notifications to the other subscribers. Please note that CacheHtmlFilter will use a different proto to store the information rather than using blink_critical_line_data.proto but it will have similar fields for diff detection logic.
    • CacheHtmlFlow will initiate a delete of CriticalLineInfo used by SplitHtmlFilter.
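
A minimal sketch of the reaction to a diff is shown below. It assumes, for illustration only, that a diff is detected by comparing a hash of the stripped origin HTML with a previously stored hash; this document does not spell out the comparison. The function `on_origin_fetched`, the plain dictionaries standing in for the Property Cache entries, and the subscriber callbacks are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(stripped_html: str) -> str:
    """Hash of the HTML after noncacheable parts were removed (assumption)."""
    return hashlib.sha256(stripped_html.encode("utf-8")).hexdigest()


def on_origin_fetched(url: str,
                      stripped_html: str,
                      cache_html_info: Dict[str, str],
                      critical_line_info: Dict[str, str],
                      subscribers: List[Callable[[str], None]]) -> bool:
    """Updates the stored hash; returns True if a diff was detected."""
    new_hash = fingerprint(stripped_html)
    old_hash = cache_html_info.get(url)
    if old_hash == new_hash:
        return False  # cached HTML is still valid
    cache_html_info[url] = new_hash
    if old_hash is None:
        return False  # first time this URL is seen, nothing to invalidate
    # Diff detected: notify the other subscribers, then delete the
    # CriticalLineInfo used by SplitHtmlFilter so the split is recomputed.
    for notify in subscribers:
        notify(url)
    critical_line_info.pop(url, None)
    return True
```

Keeping the hash in a store separate from `critical_line_info` mirrors the point above that CacheHtmlFilter uses its own proto rather than blink_critical_line_data.proto.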

Details Still to Be Resolved

  • Cache Invalidation

Advantages

  • More integrated with the default request processing path and other rewriters.
  • Removes the dependency on headless browser.
  • Rewriting happens in the line of request, which allows applying browser-dependent features.