Wasteback Machine

Overview

Wasteback Machine is a JavaScript library for analysing archived web pages, measuring their size and composition to enable retrospective, quantitative web research.

Features

Multi-archive support: Works with 20+ web archives and can be extended to additional archives.
Aggregate mementos: Retrieve memento-datetimes for a target URL from an archive’s CDX server.
Analyse page composition: Break down archived web pages by resource type, including HTML, stylesheets, scripts, images, etc.
Calculate size metrics: Compute total and per-type sizes, including counts and bytes.
Generate resource inventory: Optionally produce an inventory of all resources with metadata.
Completeness scoring: Assess how fully an archived web page was retrieved.
CLI utility: Analyse archived web pages directly from the command line.

Installation

npm i @overbrowsing/wasteback-machine

Usage

Wasteback Machine provides two functions:

getMementos: Fetch all memento-datetimes from the CDX server of a supported web archive for a given URL.
analyseMemento: Analyse the size and composition of an archived web page from a supported web archive.

1. Fetch Available Memento-datetimes (`getMementos`)

Fetch all memento-datetimes for https://nytimes.com from the Internet Archive (🆔 = ia).

import { getMementos } from "@overbrowsing/wasteback-machine";

const mementos = await getMementos(
  "ia", // Web archive ID (🆔 = ia, Internet Archive)
  "https://nytimes.com", // Target URL
);

console.log(mementos);

Example Output

[
  '19961112181513', '19961121230155', '19961219002950', '19961220073509',
  '19961226135029', '19961228014508', '19961230230427', '19970209220858',
  '19970303103041', '19970414192930', '19970414210143', '19970415180120',
  ... 688983 more items
]
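
Because the list can be very long, it often helps to pick a single memento-datetime from it before analysing. The following is a minimal sketch, not part of the Wasteback Machine API: `parseMementoDatetime` and `closestMemento` are hypothetical helper names, and the code assumes the 14-digit values are UTC timestamps. The sample array reuses values from the example output above.

```javascript
// Hypothetical helper (not part of Wasteback Machine): convert a 14-digit
// memento-datetime (YYYYMMDDhhmmss, assumed to be UTC) into a JavaScript Date.
function parseMementoDatetime(datetime) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(datetime);
  if (!match) {
    throw new Error(`Expected a 14-digit memento-datetime, got "${datetime}"`);
  }
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  // Date.UTC takes a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

// Hypothetical helper: pick the memento-datetime closest to a target Date.
function closestMemento(mementos, target) {
  if (mementos.length === 0) return undefined;
  return mementos.reduce((best, current) => {
    const bestGap = Math.abs(parseMementoDatetime(best) - target);
    const currentGap = Math.abs(parseMementoDatetime(current) - target);
    return currentGap < bestGap ? current : best;
  });
}

// Sample values copied from the example output above.
const sample = ["19961112181513", "19961121230155", "19961219002950", "19970209220858"];

const nearest = closestMemento(sample, new Date(Date.UTC(1996, 11, 25)));
console.log(nearest); // 19961219002950
console.log(parseMementoDatetime(nearest).toISOString()); // 1996-12-19T00:29:50.000Z
```

The selected string is a full 14-digit memento-datetime, so it can be passed as the target memento-datetime to analyseMemento.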

2. Analyse An Archived Web Page (`analyseMemento`)

Analyse the archived snapshot of https://nytimes.com from November 12, 1996, held by the Internet Archive (🆔 = ia).

Tip

If you provide a full 14-digit memento-datetime (YYYYMMDDhhmmss), such as one returned by getMementos, Wasteback Machine skips the TimeGate (URI-G) lookup, which improves performance.

import { analyseMemento } from "@overbrowsing/wasteback-machine";

const mementoData = await analyseMemento(
  "ia", // Web archive ID (🆔 = ia, Internet Archive)
  "https://nytimes.com", // Target URL
  "19961112", // Target memento-datetime (YYYYMMDDhhmmss); minimum input: YYYY
  { includeResources: true } // Resource list (true/false)
);

console.log(mementoData);

Example Output

{
  target: {
    url: 'https://nytimes.com', 
    datetime: '19961112'
  },
  memento: {
    url: 'https://web.archive.org/web/19961112181513if_/https://nytimes.com',
    datetime: '19961112181513',
  },
  archive: {
    name: 'Internet Archive (Wayback Machine)',
    organisation: 'Internet Archive',
    country: 'United States of America',
    continent: 'North America',
    url: 'https://web.archive.org',
  },
  sizes: {
    html: { bytes: 1653, count: 1 },
    stylesheet: { bytes: 0, count: 0 },
    script: { bytes: 0, count: 0 },
    image: { bytes: 46226, count: 2 },
    video: { bytes: 0, count: 0 },
    audio: { bytes: 0, count: 0 },
    font: { bytes: 0, count: 0 },
    flash: { bytes: 0, count: 0 },
    plugin: { bytes: 0, count: 0 },
    data: { bytes: 0, count: 0 },
    document: { bytes: 0, count: 0 },
    other: { bytes: 0, count: 0 },
    total: { bytes: 47879, count: 3 }
  },
  completeness: '100%',
  resources: [
    {
      url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/index.gif',
      type: 'image',
      size: 45259
    },
    {
      url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/free-images/marker.gif',
      type: 'image',
      size: 967
    }
  ]
}
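
The `sizes` object lends itself to simple post-processing. As a minimal sketch (the `compositionShares` helper is hypothetical and not part of the library), the snippet below expresses each resource type's bytes as a percentage of the page total, using a subset of the values from the example output above.

```javascript
// Hypothetical helper (not part of Wasteback Machine): express each resource
// type's bytes as a percentage of the page total, skipping empty types.
function compositionShares(sizes) {
  const totalBytes = sizes.total.bytes;
  const shares = {};
  for (const [type, { bytes }] of Object.entries(sizes)) {
    if (type === "total" || bytes === 0) continue;
    // Round to one decimal place.
    shares[type] = Number(((bytes / totalBytes) * 100).toFixed(1));
  }
  return shares;
}

// Subset of the `sizes` object from the example output above.
const sizes = {
  html: { bytes: 1653, count: 1 },
  stylesheet: { bytes: 0, count: 0 },
  image: { bytes: 46226, count: 2 },
  total: { bytes: 47879, count: 3 },
};

console.log(compositionShares(sizes)); // { html: 3.5, image: 96.5 }
```

Types with zero bytes are skipped so the result only lists resource types that actually appear in the memento.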

Supported Web Archives

Each supported web archive has a unique web archive ID (🆔) required for API calls. The table also indicates which functions each archive supports.

Web Archive	Organisation	🆔	`getMementos`	`analyseMemento`
Arquivo.pt	🇵🇹 FCCN/FCT	arq	✓	✓
National Library and Archives of Quebec (BAnQ) Web Archiving	🇨🇦 National Library and Archives of Quebec (BAnQ)	banq	✕	✓
Columbia University Libraries Web Archives	🇺🇸 Columbia University Libraries	cul	✓	✓
Webarchiv	🇨🇿 National Library of the Czech Republic	cz	✓	✓
European Union Web Archive	🇪🇺 European Union	euwa	✓	✓
Estonian Web Archive	🇪🇪 National Library of Estonia	ewa	✓	✓
Government of Canada Web Archive	🇨🇦 Library and Archives Canada	gcwa	✓	✓
Croatian Web Archives (HAW)	🇭🇷 National and University Library in Zagreb	haw	✓	✓
Internet Archive (Wayback Machine)	🇺🇸 Internet Archive	ia	✓	✓
Icelandic Web Archive (Vefsafn.is)	🇮🇸 National and University Library of Iceland	iwa	✓	✓
Library of Congress Web Archive	🇺🇸 Library of Congress	loc	✕	✓
National Library of Ireland Web Archive	🇮🇪 National Library of Ireland	nliwa	✓	✓
National Library of Medicine	🇺🇸 National Library of Medicine	nlm	✓	✓
National Records of Scotland Web Archive	🏴󠁧󠁢󠁳󠁣󠁴󠁿 National Records of Scotland	nrs	✓	✓
Norwegian Web Archive	🇳🇴 National Library of Norway	nwa	✓	✓
New Zealand Web Archive	🇳🇿 National Library of New Zealand	nzwa	✓	✓
The Web Archive of Catalonia (Padicat)	🇪🇸 Library of Catalonia	padicat	✓	✓
PRONI Web Archive	🇬🇧 The Public Record Office of Northern Ireland	proni	✓	✓
Smithsonian Institution Archives	🇺🇸 Smithsonian Libraries and Archives	sia	✓	✓
Spletni Arhiv	🇸🇮 National and University Library of Slovenia	slo	✕	✓
Australia Web Archive (Trove)	🇦🇺 National Library of Australia	trove	✕	✓
UK Government Web Archive (UKGWA)	🇬🇧 The National Archives	ukgwa	✓	✓
University of North Texas Web Archives	🇺🇸 University of North Texas University Libraries	untwa	✓	✓
York University Digital Library	🇨🇦 York University Libraries	yudl	✓	✓

Adding Web Archives

Wasteback Machine can support additional web archives if they meet the following criteria:

Provide a CDX server API (required for getMementos).
Support the Memento Protocol (RFC 7089) (required for analyseMemento).
Support replay modifier endpoints (URL rewrite type modifiers) for both:
- Raw content (see example).
- Navigational toolbars suppressed (see example).
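
To illustrate what a replay modifier looks like, the memento URLs in the example output earlier in this README place a short suffix such as `if_` or `im_` between the datetime and the target URL. The sketch below builds URLs of that shape. It is illustrative only: `buildReplayUrl` is a hypothetical helper, and the use of `id_` for raw content is an assumption based on the Wayback Machine convention, so other archives may use different modifiers or URL layouts.

```javascript
// Hypothetical helper: build a Wayback-style replay URL of the form
//   <replayBase>/<datetime><modifier>/<targetUrl>
function buildReplayUrl(replayBase, datetime, modifier, targetUrl) {
  if (!/^\d{14}$/.test(datetime)) {
    throw new Error("datetime must be 14 digits (YYYYMMDDhhmmss)");
  }
  // Trim trailing slashes so the parts join with exactly one "/".
  return `${replayBase.replace(/\/+$/, "")}/${datetime}${modifier}/${targetUrl}`;
}

const base = "https://web.archive.org/web";
const datetime = "19961112181513";
const target = "https://nytimes.com";

// "if_" appears in the example memento URL earlier in this README.
console.log(buildReplayUrl(base, datetime, "if_", target));
// "id_" is assumed here to request raw content (Wayback Machine convention).
console.log(buildReplayUrl(base, datetime, "id_", target));
```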

To request support for an archive that meets these criteria, submit an issue using the template.

CLI

The Wasteback Machine CLI analyses an archived web page and reports its size, composition, and emissions, which are estimated with CO2.js and the Sustainable Web Design Model.

Quick Start

After installation, start the CLI:

npx cli

CLI Prompts

1. Enter web archive ID ('help' to list archives or [Enter ↵] = Internet Archive):
2. Enter URL to analyse:
3. Enter target year (YYYY):
4. Enter target month (MM or [Enter ↵] = 01):
5. Enter target day (DD or [Enter ↵] = 01):
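
The year, month, and day answers together describe a target memento-datetime. As a rough sketch of how such answers could be combined (an assumption for illustration, not the CLI's actual implementation; `buildTargetDatetime` is a hypothetical name), empty month and day answers fall back to 01 as the prompts indicate:

```javascript
// Hypothetical helper: combine the year, month and day prompt answers into a
// target memento-datetime (YYYYMMDD). Empty month/day answers fall back to
// "01", mirroring the [Enter ↵] defaults shown in the prompts.
function buildTargetDatetime(year, month = "", day = "") {
  if (!/^\d{4}$/.test(year)) {
    throw new Error("Year must be four digits (YYYY)");
  }
  const mm = (month.trim() || "01").padStart(2, "0");
  const dd = (day.trim() || "01").padStart(2, "0");
  if (!/^(0[1-9]|1[0-2])$/.test(mm)) throw new Error("Month must be 01-12");
  if (!/^(0[1-9]|[12]\d|3[01])$/.test(dd)) throw new Error("Day must be 01-31");
  return `${year}${mm}${dd}`;
}

console.log(buildTargetDatetime("1996", "11", "12")); // 19961112
console.log(buildTargetDatetime("1996"));             // 19960101
```

The resulting string has the same shape as the target memento-datetime passed to analyseMemento in the Usage section.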

Example Output

________________________________________________________

MEMENTO INFO

  Memento URL:    https://web.archive.org/web/19961112181513if_/https://nytimes.com
  Web Archive:    Internet Archive (Wayback Machine)
  Organisation:   Internet Archive
  Website:        https://web.archive.org

________________________________________________________

PAGE SIZE

  Data:           46.76 KB
  Emissions:      0.014 g CO₂e
  Completeness:   100%

________________________________________________________

PAGE COMPOSITION

  HTML
      Count:      1
      Data:       1653 bytes (3.5%)
      Emissions:  0.000 g CO₂e

  IMAGE
      Count:      2
      Data:       46226 bytes (96.5%)
      Emissions:  0.013 g CO₂e

________________________________________________________

Credits

Developed by the Overbrowsing Research Group at the Institute for Design Informatics, The University of Edinburgh, with support in part from the European Association for Digital Humanities (EADH).

Citing

Results generated with Wasteback Machine may be freely cited, quoted, analysed, or republished with attribution to 'Wasteback Machine'. No special permission is required for academic, journalistic, or personal use.

A publication related to this project appeared in the Proceedings of iConference 2026 (view PDF). Please cite as:

Mahoney, D. (2026). Wasteback Machine: a method for quantitative measurement of the archived web. Information Research: An International Electronic Journal, 31(iConf), 448–464. https://doi.org/10.47989/ir31iConf64185

@article{Mahoney_2026,
  author  = {Mahoney, David},
  title   = {Wasteback Machine: a method for quantitative measurement of the archived web},
  journal = {Information Research: An International Electronic Journal},
  volume  = {31},
  number  = {iConf},
  pages   = {448--464},
  year    = {2026},
  month   = {Mar},
  url     = {https://publicera.kb.se/ir/article/view/64185},
  doi     = {10.47989/ir31iConf64185}
}

Licenses

Wasteback Machine is licensed under Apache 2.0. For full licensing details, see the LICENSE file.

Use of Wasteback Machine is subject to the terms, policies and licenses of each respective supported web archive.

Terms

All results generated by Wasteback Machine are provided "as-is" without warranties of any kind, express or implied, including but not limited to accuracy, completeness, or reliability. The authors and contributors accept no liability for any errors, omissions, or consequences arising from the use of this software or the results it produces.
