Error page optimisations #299

Draft
wants to merge 5 commits into
base: main

RealOrangeOne
Member

@RealOrangeOne RealOrangeOne commented Jul 25, 2024

Description of Changes Made

This PR makes 2 notable changes:

Serve simpler 404 pages when possible

Our 404 page is a fancy HTML page, composed of multiple templates, and it needs a number of DB queries to render (not many queries, granted). If a person loads a page in a browser, we want to show them this "fancy" 404 page for a better user experience. However, if the request shouldn't return HTML (e.g. it's a missing static file) or the user never asked for HTML, we shouldn't spend time building a fancy 404 page that will never be viewed.

Instead, when possible, we show a simplified HTML page, which just contains text. This requires far fewer resources to generate, and is quicker to serve.
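To make the decision above concrete, here is a minimal, framework-free sketch of how a request could be classified. It is not the code in this PR: `wants_fancy_404`, `parse_accept` and `STATIC_SUFFIXES` are hypothetical names, and the policy of only serving the fancy page when `text/html` is listed explicitly (so a bare `*/*` gets the simple page) is an assumption.

```python
from pathlib import PurePosixPath

# Hypothetical list of suffixes that are never rendered as a page.
STATIC_SUFFIXES = {".css", ".js", ".map", ".png", ".jpg", ".svg", ".ico", ".woff2"}


def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_type, *params = [piece.strip() for piece in part.split(";")]
        quality = 1.0  # q defaults to 1 when it is not given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type.lower(), quality))
    return entries


def wants_fancy_404(path, accept_header="*/*"):
    """Return True only when the client explicitly asked for HTML.

    Assumption: "*/*" alone (e.g. curl, or no Accept header at all)
    is not treated as a request for the fancy page.
    """
    if PurePosixPath(path).suffix.lower() in STATIC_SUFFIXES:
        return False
    return any(
        media_type == "text/html" and quality > 0
        for media_type, quality in parse_accept(accept_header)
    )
```

A browser typically sends something like `text/html,application/xhtml+xml,*/*;q=0.8`, which this sketch classifies as wanting the fancy page, while a missing `/static/app.js` gets the simple one regardless of the header.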

Cache 404 pages

This one might be controversial. 😬

If a page returns a 404, chances are it'll still be a 404 in 10 minutes' time, or even longer. Therefore, it's probably something which can be cached to reduce system load.

According to RFC 2616, 404s should not be cached. However, for our use case, I think it's worth it. The TTL is intentionally shorter than it probably could be, but this could be increased in future.

In Wagtail, a request will always do a database query, and potentially several, depending on how much of the path exists. Missing pages can therefore cause higher than expected load, and won't be cached by an edge cache. Worse still, because the 404 pages usually shown are fancy HTML versions, they may run queries of their own (e.g. for navigation), making 404s more expensive still.

Caching the 404 reduces the cost for anyone who requests it again. This is especially useful if a site is being crawled, as many frontend caches will normalise URLs before caching (ours sure does).

If a 404 has been cached, and a page is created in its place, Wagtail's existing frontend caching will purge the cached 404 during publishing.
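As a rough illustration of what a cached 404 would carry on the wire, the sketch below builds the two relevant response headers by hand. It is an assumption-laden stand-in for the decorators quoted in the review further down: the 900 second default mirrors `max_age=900`, the optional shared-cache value reflects the reviewer's `s_maxage` question, and `cached_404_headers` is a hypothetical name.

```python
def cached_404_headers(max_age=900, s_maxage=None):
    """Build the caching headers for a 404 response.

    max_age applies to every cache (browser included); s_maxage, when
    given, overrides it for shared caches such as a CDN. In the header
    itself the directive is spelled "s-maxage".
    """
    directives = [f"max-age={max_age}"]
    if s_maxage is not None:
        directives.append(f"s-maxage={s_maxage}")
    return {
        "Cache-Control": ", ".join(directives),
        # The body depends on the Accept header, so caches must key on it.
        "Vary": "Accept",
    }
```

Calling `cached_404_headers(s_maxage=3600)` would let an edge cache hold the response for an hour while browsers revalidate after 15 minutes; whether that split is wanted here is an open question in the review.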

Related reading:

How to Test

This can be tested in the browser, by confirming the correct 404 is shown. The unit tests give a few useful examples. Similarly, curl can be used to manually exercise the header.

Note: If no Accept header is passed, Django assumes */*.
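For a scripted version of the manual check, the following sketch uses only the standard library to request a missing URL with a chosen `Accept` header and report what came back. The helper names `build_probe` and `probe` and any URL passed to them are hypothetical and not part of this PR.

```python
import urllib.error
import urllib.request


def build_probe(url, accept=None):
    """Create a GET request, optionally with an explicit Accept header.

    With accept=None no Accept header is set on the request at all.
    """
    headers = {} if accept is None else {"Accept": accept}
    return urllib.request.Request(url, headers=headers, method="GET")


def probe(url, accept=None, timeout=5):
    """Return (status, Content-Type, Cache-Control) for the given URL."""
    request = build_probe(url, accept)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            headers = response.headers
            return response.status, headers.get("Content-Type"), headers.get("Cache-Control")
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; the error object still carries the headers.
        headers = error.headers
        return error.code, headers.get("Content-Type"), headers.get("Cache-Control")
```

Against a local development server, calling `probe(url, "text/html")` and `probe(url, "application/json")` for the same missing path should both report a 404, and comparing the returned `Content-Type` values shows which variant was served.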

MR Checklist

  • Add a description of your pull request and instructions for the reviewer to verify your work.
  • If your pull request is for a specific ticket, link to it in the description.
  • Stay on point and keep it small so the merge request can be easily reviewed.
  • Tests and linting pass.

Unit tests

  • Added
  • Not required

Documentation

Browser testing

  • I have tested in the following browsers and environments (edit the list as required)
    • Latest version of Chrome on mac
    • Latest version of Firefox on mac
    • Latest version of Safari on mac
    • Safari on last two versions of iOS
    • Chrome on last two versions of Android
  • Not required

Data protection

  • Not relevant
  • This adds new sources of PII and documents it and modifies Birdbath processors accordingly

Accessibility

  • Automated WCAG 2.1 tests pass
  • HTML validation passes
  • Manual WCAG 2.1 tests completed
  • I have tested in a screen reader
  • I have tested in high-contrast mode
  • Any animations removed for prefers-reduced-motion
  • Not required

Sustainability

  • Images are optimised and lazy-loading used where appropriate
  • SVGs have been optimised
  • Performance and transfer of data considered
  • If JavaScript is needed alternatives have been considered
  • Not required

Pattern library

  • The pattern library component for this template displays correctly, and does not break parent templates
  • The styleguide is updated if relevant
  • Changes are not relevant to the pattern library

I've upstreamed some helper methods which would make this kind of content negotiation much simpler in future: django/django#18415

@helenb
Member

helenb commented Jul 29, 2024

I will have a go at reviewing this, but probably worth a second pair of eyes from a 'proper' back-end dev - @zerolab maybe you if you have time?

Re caching the 404 page: could you check with Liv if there are any SEO implications?

@RealOrangeOne
Member Author

This is still very draft (hence the status). It's not really reviewable or shippable yet. I need to do some more thinking and testing to make sure the Accept handling is correct, and how it plays with caching.

SEO-wise, I highly doubt bots pay much attention to the content. Sending the correct MIME type ought to handle most things. But I'll check.

This saves queries and processing time, if the HTML isn't ever going to be rendered, and simple text would be enough.
This should reduce the impact on missing pages being crawled.
@RealOrangeOne RealOrangeOne force-pushed the error-page-optimisations branch from 949a272 to 489ae30 Compare September 16, 2024 16:14

@requires_csrf_token
@vary_on_headers("Accept")
@cache_control(max_age=900) # 15 minutes
Member


Should this maybe set s_maxage too?

"s_maxage": s_maxage,

return True


@requires_csrf_token
Member


Why do these need csrf? Doesn't that skip the cache completely?

3 participants