Agent-Accessible Content Auditor — six industry-standard pillars, fully traceable evidence.
Clipper audits live URLs against six published web standards (W3C Semantic HTML, Mozilla Readability, Schema.org, WCAG 2.1 / axe-core, Dublin Core / OpenGraph, IETF RFC 7231) and returns a per-pillar evidence report with a complete audit trail — no APIs, no credentials, no external dependencies.
What Clipper is and is not (v2.1, May 2026). Clipper is an agent-accessibility auditor: it measures whether a page meets each of the six published standards above. The per-pillar scores are real signals; that is what Clipper's evidence speaks to.
Clipper is not a validated predictor of AI-citation behavior or retrieval-augmented generation accuracy. Earlier versions advertised a composite "agent-readiness" headline (`parseability_score`, `universal_score`) that aggregated the six pillars; that composite was pre-registered, ship-gated, and falsified on its first clean held-out test:
- Pre-registered in findings/post-v2-roadmap.md §6 with a Pearson r ≥ +0.35 ship gate, before validation data was collected.
- Calibrated on corpus-002 (n=43) where it produced r=+0.62 — but the same corpus also selected the pillar weights via scripts/gamma-experiments.py, so the +0.62 was an in-sample maximum, not held-out.
- Falsified on held-out corpus-003 (n=171) under three independent judges (Llama-3.3-70B, GPT-4o, DeepSeek-V3.2) — see evaluation/phase5-results/corpus-003-analysis/session-9.5-verdict.md. r ≈ −0.20 across all three judges; the gate was missed by ~0.55 points and the sign flipped.
v2.1 keeps the falsified composites available behind `--include-composite` for backward compatibility with existing automation. The default is `--diagnostic-mode`: composites are suppressed, and pillar-level evidence is what you get. The retrieval-prediction research line is closed; see findings/v2.1-release-scope.md and findings/clipper-next-design.md (parked).

Use Clipper to answer: "Does this page meet WCAG / Schema.org / RFC 7231 / Mozilla Readability / HTML5 semantic / Dublin Core expectations, and where does it fall short?" — questions Clipper has evidence for. Do not use it as a single-number ranker of pages by AI-readiness.
- Overview
- Clipper Standards Framework
- Quick Start
- Installation
- CLI Usage
- Enterprise Features
- Quick Start Demo
- Standards Authority Mapping
- Example: Audit Trail Reports
- Scoring System
- GitHub Integration
- File Structure
- Real-World Use Cases
- Contributing
Clipper provides standards-based content auditing with:
- Industry Standards: Every pillar score traceable to a recognized authority (W3C, Schema.org, Mozilla Readability, WCAG, RFC 7231, Dublin Core).
- Zero API Dependencies: Local evaluation using established standards frameworks.
- Immediate Usability: Runs directly from the command line; no setup beyond `pip install`.
- Enterprise Defensible: Per-pillar audit trails identifying what was evaluated and why each score was assigned.
- Pre-registration trail: The composites rolled up from the pillars were pre-registered, ship-gated, and falsified on a held-out corpus before this release; the falsification record is in the repo.
Core question (auditor framing): Does this page meet each of the six published web standards Clipper audits, and which fail? — answered with per-pillar evidence.
Question Clipper does not answer: Will an LLM agent cite this page over another one? — Clipper has tested whether its pillar composite predicts that on held-out data and the answer was no. Cross-page ranking by single-number AI-readiness is not a supported use.
```
# Standards-based dependencies - no APIs required
axe-selenium-python # WCAG 2.1 DOM navigability (Deque Systems)
selenium # W3C WebDriver standard
extruct # Schema.org structured data (W3C)
readability-lxml # Mozilla Readability content extraction
httpx # Modern HTTP standard (RFC compliance)
beautifulsoup4         # HTML parsing standard
```

- 🏗️ W3C Semantic HTML (25%) - HTML5 semantic elements, ARIA roles
- 📄 Content Extractability (20%) - Mozilla Readability signal-to-noise analysis
- 📊 Schema.org Structured Data (20%) - JSON-LD quality, type validation, field completeness
- 🛡️ DOM Navigability (15%) - WCAG 2.1 / Deque axe-core DOM evaluation
- 🏷️ Metadata Completeness (10%) - Dublin Core, Schema.org, OpenGraph field coverage
- 🌐 HTTP Compliance (10%) - Reachability, redirects, robots.txt, cache headers, agent content hints
```
STANDARDS_AUTHORITY = {
'semantic_html': 'HTML5 Semantic Elements (W3C)',
'content_extractability': 'Mozilla Readability (Firefox Reader View algorithm)',
'structured_data': 'Schema.org (Google/Microsoft/Yahoo)',
'dom_navigability': 'WCAG 2.1 AA (W3C) + axe-core (Deque Systems)',
'metadata_completeness': 'Dublin Core + Schema.org + OpenGraph',
'http_compliance': 'RFC 7231 + robots.txt + Cache headers'
}
```

- ✅ Every Score Traceable - No black box algorithms
- ✅ Audit Trail Generated - Complete evaluation methodology documented
- ✅ Standards Compliance - Built on recognized industry authorities
- ✅ Reproducible Results - Same evaluation across different environments
Try Clipper immediately: No API keys, no setup required.
```
# Works from any Copilot conversation (performance mode default)
python main.py express --urls https://your-docs.com --out results/
```

```
# 1. Clone repository
git clone https://github.com/your-org/clipper-content-evaluation.git
cd clipper-content-evaluation
# 2. Install standards-based dependencies
pip install -r requirements.txt
# 3. Ready to evaluate immediately!
python main.py express --help
```

- Python 3.7+
- No API keys required ✅
- No external services needed ✅
- Works completely offline ✅
Clipper is designed for immediate use from GitHub Copilot conversations:
```
# Just run it - no configuration needed
python main.py express urls/clipper-demo-urls.txt --out evaluation-results
```

Clipper provides a complete standards-based evaluation pipeline:
Run complete Access Gate evaluation in one command:
```
# Single URL evaluation
python main.py express --urls https://developer.upsun.com/api/sdk/php --out results/
# Multiple URLs from file (batch optimized)
python main.py express samples/urls.txt --out comprehensive-results/ --name evaluation
# Copilot-friendly (minimal output, maximum speed)
python main.py express urls.txt --out results/ --quiet
# Debug mode (slower, detailed analysis)
python main.py express urls.txt --out results/ --standard
# Performance benchmarking
python main.py express urls.txt --out results/ --benchmark
# Rendering-mode dimension (Phase 3.1)
# raw: models non-JS agents (RAG crawlers, indexers)
# rendered: models JS-executing agents (default)
# both: produces a per-URL delta and flags JS-dependent pages
python main.py express urls.txt --out results/ --render-mode both
```

`clipper history <url>` walks every `*_scores.json` file under `--root` (default `evaluation/`) and prints one row per prior evaluation of that URL, sorted by score-file mtime, with the parseability delta vs. the previous row. Use it to confirm a page has actually improved rather than just regressed and recovered:
```
python main.py history https://learn.microsoft.com/en-us/azure/aks/faq
# Machine-readable
python main.py history https://learn.microsoft.com/en-us/azure/aks/faq --json
```
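The traversal is simple enough to sketch. A minimal reimplementation, assuming each score file records a `url` and a `parseability_score` (field names here are assumptions, not the shipped schema):

```python
# Sketch of the history traversal: walk *_scores.json files under a
# root, keep entries for one URL, sort by file mtime, print deltas.
import json
from pathlib import Path

def history(url, root="evaluation"):
    rows = []
    for path in Path(root).rglob("*_scores.json"):
        data = json.loads(path.read_text())
        for entry in data if isinstance(data, list) else [data]:
            if entry.get("url") == url:
                rows.append((path.stat().st_mtime, entry.get("parseability_score"), path))
    rows.sort(key=lambda r: r[0])  # oldest evaluation first
    prev = None
    for mtime, score, path in rows:
        delta = f"{score - prev:+.1f}" if prev is not None and score is not None else "-"
        print(f"{path}  score={score}  delta={delta}")
        prev = score if score is not None else prev
    return rows
```

For detailed analysis, run individual components: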
```
# 1. Crawl URLs (capture HTML snapshots)
python main.py crawl samples/urls.txt --out snapshots/
# 2. Parse Content (extract structural signals)
python main.py parse snapshots/ --out parse-results.json
# 3. Standards Evaluation (Clipper methodology)
python main.py score parse-results.json --out scores.json
# 4. Generate Report (actionable insights)
python main.py report scores.json --md comprehensive-report.md
```

Test for agent-friendly content formats:
```
# HTTP content negotiation analysis
python main.py negotiate urls.txt --out negotiation-results/
```

Every evaluation generates comprehensive documentation:
```
{
"standards_authority": {
"semantic_html": "HTML5 Semantic Elements (W3C)",
"content_extractability": "Mozilla Readability (Firefox Reader View algorithm)",
"structured_data": "Schema.org (Google/Microsoft/Yahoo)",
"dom_navigability": "WCAG 2.1 AA (W3C) + axe-core (Deque Systems)",
"metadata_completeness": "Dublin Core + Schema.org + OpenGraph",
"http_compliance": "RFC 7231 + robots.txt + Cache headers"
},
"audit_trail": {
"dom_navigability": {
"standard": "WCAG 2.1 AA (W3C) + axe-core (Deque Systems)",
"method": "Automated DOM navigability evaluation",
"violations_count": 3,
"passes_count": 47
},
"content_extractability": {
"standard": "Mozilla Readability",
"extraction_ratio": 0.45,
"extracted_text_length": 12340,
"structure_preservation": 28
}
},
"evaluation_methodology": "Clipper Standards-Based Access Gate"
}
```

- Standards mapping for each component
- Evaluation methodology documentation
- Score calculation transparency
- Industry authority references
```
# Quality gate integration
python main.py express staging-urls.txt --out quality-gate/ --quiet
# Gate on a pillar, not the falsified composite (see Scoring System below)
if jq -e '.component_scores.http_compliance >= 70' quality-gate/report_scores.json > /dev/null; then
  echo "✅ Quality gate passed"
else
  echo "❌ Quality gate failed - see audit trail"
fi
# Batch evaluation (optimized performance)
python main.py express production-urls.txt --out batch-audit/ --name prod-audit
# Debug mode for detailed analysis
python main.py express problem-urls.txt --out debug-analysis/ --standard
```

5-Minute Clipper Validation:
```
# 1. Test with a documentation URL
echo "https://learn.microsoft.com/en-us/azure/azure-functions/functions-overview" > test-url.txt
python main.py express test-url.txt --out demo-results --name validation
# 2. Review standards-based results
cat demo-results/validation.md
# 3. Examine audit trail
jq '.audit_trail' demo-results/validation_scores.json
```

Expected Output:
```
Clipper Evaluation Results:
├─ Total URLs: 1
├─ Average Score: 60.7/100
└─ Agent-Ready: 0/1 (0.0%)
Component Breakdown:
semantic_html: 72.7/100 (HTML5 Semantic Elements)
content_extractability: 74.5/100 (Mozilla Readability)
structured_data: 12.0/100 (Schema.org)
dom_navigability: 35.0/100 (WCAG 2.1 / axe-core)
metadata_completeness: 100.0/100 (Dublin Core / OpenGraph)
http_compliance: 100.0/100 (RFC 7231 / robots / cache)
```
Clipper builds on established industry standards:
| Pillar | Authority | Implementation | Weight |
|---|---|---|---|
| Semantic HTML | W3C HTML5 Specification | BeautifulSoup + html5lib | 25% |
| Content Extractability | Mozilla Readability | readability-lxml | 20% |
| Structured Data | Schema.org Consortium | extruct library | 20% |
| DOM Navigability | W3C + Deque Systems | axe-selenium-python | 15% |
| Metadata Completeness | Dublin Core / Schema.org / OpenGraph | BeautifulSoup | 10% |
| HTTP Compliance | IETF RFC 7231 + robots.txt | httpx | 10% |
🏛️ No Custom Algorithms: Every score component is traceable to recognized industry standards.
Clipper generates comprehensive audit documentation:
```
## Clipper Access Gate Evaluation
**Final Score:** 60.7/100 (moderate_issues)
**Evaluation Methodology:** Standards-Based Access Gate
**Standards Compliance:** 6/6 frameworks evaluated
### Pillar Analysis
- **Semantic HTML**: 72.7/100 (Good semantic coverage, ARIA roles present)
- **Content Extractability**: 74.5/100 (Clean extraction via Readability, structure preserved)
- **Structured Data**: 12.0/100 (Limited JSON-LD quality, missing key fields)
- **DOM Navigability**: 35.0/100 (Accessibility violations detected, capped per-rule)
- **Metadata Completeness**: 100.0/100 (All metadata fields present)
- **HTTP Compliance**: 100.0/100 (Reachable, no robots blocks, cache headers present)

### Priority Fixes (Standards-Based)
🔥 **Critical - Structured Data Quality**
- Add complete JSON-LD with @type, name, author, dateModified, description
- Validate Schema.org required properties for declared types
- Include OpenGraph and microdata alongside JSON-LD
⚠️ **Important - DOM Navigability**
- Add `aria-label` attributes to navigation elements
- Ensure color contrast ratios meet WCAG AA standards
- Fix heading hierarchy violations
📋 **Recommended - Semantic HTML**
- Add `<main>` element wrapper (HTML5 semantic requirement)
- Implement proper heading hierarchy (h1 → h2 → h3)
- Use `<article>` elements for content sections
```
⚠️ Composite generalization status (v2.1, May 2026). The composite headline scores below (`parseability_score` and `universal_score`) were calibrated on Clipper's corpus-002 (n=43, single grader architecture). On held-out corpus-003 (n=171, three independent judges, variance restored via a Phi-4-mini scorer) the composite does not generalize: Pearson r between composite and judged QA accuracy is approximately −0.20 under all three judges, against a pre-registered ship-gate target of r ≥ +0.35. The composite was pre-registered, ship-gated, and falsified on its first clean held-out test. The corpus-002 r=+0.62 was an in-sample maximum (the same corpus selected the pillar weights via the γ experiments), not a held-out validation result.

The per-pillar measurements that feed the composite remain real signals against published standards (W3C, Schema.org, Mozilla, WCAG, RFC 7231, Dublin Core) and are unchanged. Composites are suppressed by default in v2.1 (`--diagnostic-mode` is the default; pass `--include-composite` to opt in for backward compatibility). The "Access Gate Classification" bands below are corpus-002 internal-consistency diagnostics; they are not validated to predict retrieval or AI-citation behavior on arbitrary pages and should not be used to rank pages against each other.

See findings/post-v2-roadmap.md, findings/v2.1-release-scope.md, and evaluation/phase5-results/corpus-003-analysis/session-9.5-verdict.md for the full pre-registration / falsification trail.
Clipper's primary output is the per-pillar `component_scores` block — six numbers, each measuring conformance to one published standard. Each pillar score is independently meaningful and should be the basis of any cross-page comparison.
For backward compatibility, Clipper can also emit two 0–100 composite numbers (suppressed by default in v2.1):
- `parseability_score` — the type-adjusted composite. Clipper detects whether the page is an article, landing page, tutorial, FAQ, reference, or code sample, and reweights the six pillars accordingly. Available with `--include-composite`. Did not generalize on corpus-003.
- `universal_score` — the same pillar scores under the default article weights. Available with `--include-composite`. Did not generalize on corpus-003.
The content type, detection signal, and full weight table used for each page are recorded under `audit_trail._content_type`. See docs/scoring.md#content-type-profiles for the profile table and detection precedence.
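For illustration, the composite is just a weighted sum of the six pillar scores. The article-profile weights below are the ones documented in the pillar table earlier in this README; the per-type profiles live in docs/scoring.md. A sketch only — composites are suppressed by default in v2.1:

```python
# Illustrative composite computation. Not validated for ranking pages;
# the weights shown are the documented article profile.
ARTICLE_WEIGHTS = {
    "semantic_html": 0.25,
    "content_extractability": 0.20,
    "structured_data": 0.20,
    "dom_navigability": 0.15,
    "metadata_completeness": 0.10,
    "http_compliance": 0.10,
}

def composite(component_scores, weights=ARTICLE_WEIGHTS):
    # component_scores: pillar name -> 0-100 score
    return sum(component_scores[pillar] * w for pillar, w in weights.items())
```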
Every result also carries a `methodology` block (always present, in both default and `--include-composite` modes) stating the calibration corpus, generalization status, and recommended use:
"methodology": {
"scoring_version": "v2-evidence-partial",
"calibration_corpus": "corpus-002",
"generalization_status": "falsified on corpus-003 (Pearson r ≈ −0.20 across three judges vs ship gate +0.35); see evaluation/phase5-results/corpus-003-analysis/session-9.5-verdict.md",
"recommended_use": "per-pillar audit; composite is not validated for ranking pages",
"release": "v2.1"
}
```

Access Gate Classification (internal-consistency diagnostic on corpus-002 — see calibration note above)
- 90-100: `clean` - Fully agent-ready
- 75-89: `minor_issues` - Nearly agent-ready
- 60-74: `moderate_issues` - Improvements needed
- 40-59: `significant_issues` - Major optimization required
- 0-39: `severe_issues` - Substantial restructuring needed
- `partial_evaluation` - One or more pillars could not be evaluated (e.g., network timeout). The final score is a weighted average over the surviving pillars and the dropped pillars are listed in `failed_pillars`. See docs/scoring.md for the full contract.
- `evaluation_error` - Every pillar failed; no usable score.
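A sketch of the `partial_evaluation` contract described above, assuming dropped pillars forfeit their weight and the survivors are renormalized (docs/scoring.md is authoritative):

```python
# Weighted average over surviving pillars; failed pillars drop out and
# the remaining weights are renormalized to sum to 1.0.
def partial_score(scores, weights, failed_pillars):
    surviving = {p: w for p, w in weights.items() if p not in failed_pillars}
    total_weight = sum(surviving.values())
    if total_weight == 0:
        raise ValueError("evaluation_error: every pillar failed")
    return sum(scores[p] * (w / total_weight) for p, w in surviving.items())
```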
Based on agent retrievability impact:
- Semantic HTML (25%) - Essential for content structure and agent parsing
- Content Extractability (20%) - Can agents cleanly extract the content?
- Structured Data (20%) - Machine-readable metadata for agent understanding
- DOM Navigability (15%) - Accessible DOM structure for crawlers
- Metadata Completeness (10%) - Identity, authorship, and currency signals
- HTTP Compliance (10%) - Reachability, crawl permissions, cacheability, agent content hints
Clipper can evaluate each URL under two assumptions via `--render-mode raw|rendered|both`:
- `rendered` (default) — models agents that execute JavaScript. DOM navigability runs in headless Chrome via axe-core.
- `raw` — models non-JS agents (RAG crawlers, search indexers, API clients). DOM navigability falls back to static analysis.
- `both` — produces two `ScoreResult` entries per URL and a "Rendering-Mode Deltas" section. Pages with `|rendered - raw| >= 15` are flagged as JS-dependent. Treat `min(rendered, raw)` as the pessimistic score of record.
See docs/scoring.md#rendering-modes for the full explanation.
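The delta rule itself is small enough to show inline; a sketch of the flagging logic and the pessimistic score of record described above:

```python
# Rendering-mode delta: flag JS-dependent pages and keep the
# pessimistic score of record, per the guidance above.
def render_delta(raw_score, rendered_score, threshold=15.0):
    delta = rendered_score - raw_score
    return {
        "delta": delta,
        "js_dependent": abs(delta) >= threshold,
        "score_of_record": min(raw_score, rendered_score),
    }
```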
The Content Extractability score (20% of overall) uses Mozilla Readability to measure extraction quality:
| Sub-signal | Max Points | What it measures |
|---|---|---|
| Signal-to-Noise Ratio | 40 | Ratio of extracted meaningful text to total page text. Optimal range: 0.3-0.8. |
| Structure Preservation | 30 | Do headings, lists, and code blocks survive extraction? (10 pts each category) |
| Boundary Detection | 30 | Did Readability find a clear article boundary? Checks title extraction, content length, and `<main>`/`<article>` overlap. |
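As an illustration of the first row: full credit inside the documented 0.3-0.8 optimal range, with a linear falloff outside it. The falloff shape is an assumption of this sketch, not the shipped curve:

```python
# Map an extraction ratio to the 40-point signal-to-noise sub-signal.
def signal_to_noise_points(extraction_ratio, max_points=40):
    lo, hi = 0.3, 0.8  # documented optimal range
    if lo <= extraction_ratio <= hi:
        return float(max_points)
    if extraction_ratio < lo:  # too little meaningful text extracted
        return max_points * (extraction_ratio / lo)
    # a ratio near 1.0 usually means extraction failed to discriminate
    return max_points * max(0.0, (1.0 - extraction_ratio) / (1.0 - hi))
```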
The Structured Data score (20% of overall) evaluates schema quality, not just presence:
| Sub-signal | Max Points | What it measures |
|---|---|---|
| Type Appropriateness | 20 | Does the `@type` match recognized content types (`Article`, `WebPage`, `HowTo`, etc.)? |
| Field Completeness | 30 | Per-type required + recommended fields for the four validated `@type` values. See below. |
| Multiple Formats | 20 | Are JSON-LD, OpenGraph, and microdata all present? |
| Schema Validation | 30 | Are required properties present for the declared Schema.org type? |
Per-type field expectations (Field Completeness is computed per JSON-LD item, averaged across validated items):

| `@type` | Required | Recommended |
|---|---|---|
| `Article` | `headline`, `datePublished` | `author`, `dateModified`, `description`, `publisher` |
| `FAQPage` | `mainEntity` (non-empty list of `Question` entries with `acceptedAnswer`) | — |
| `HowTo` | `name`, `step` (non-empty list) | `description`, `totalTime` |
| `BreadcrumbList` | `itemListElement` (list with ≥2 items) | — |
Items of other `@type` values fall back to a generic key-field check. Missing and structurally invalid fields are logged in `audit_trail.structured_data.field_completeness_detail`. See docs/scoring.md for the full specification.
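A sketch of the per-item check against the table above. The 70/30 required/recommended split is an assumption, and truthiness stands in for the structural checks (non-empty lists, `acceptedAnswer` presence, ≥2 breadcrumb items):

```python
# Per-item field completeness against the documented expectations.
EXPECTATIONS = {
    "Article": (["headline", "datePublished"],
                ["author", "dateModified", "description", "publisher"]),
    "FAQPage": (["mainEntity"], []),
    "HowTo": (["name", "step"], ["description", "totalTime"]),
    "BreadcrumbList": (["itemListElement"], []),
}

def field_completeness(item, max_points=30):
    required, recommended = EXPECTATIONS.get(item.get("@type"), ([], []))
    if not required:
        return 0.0  # other @type values use the generic key-field fallback
    req_hit = sum(bool(item.get(f)) for f in required) / len(required)
    rec_hit = (sum(bool(item.get(f)) for f in recommended) / len(recommended)
               if recommended else 1.0)
    return max_points * (0.7 * req_hit + 0.3 * rec_hit)  # split is illustrative
```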
The HTTP Compliance score (10% of overall) is split into five sub-signals:
| Sub-signal | Max Points | What it measures |
|---|---|---|
| HTML Reachability | 15 | Does the URL serve a 200 response to `Accept: text/html`? |
| Redirect Efficiency | 25 | Chain length (0 hops optimal, >4 penalized), proper status codes, performance impact. |
| Crawl Permissions | 20 | robots.txt allows access + no `<meta name="robots" content="noindex">` blocking. |
| Cache Headers | 20 | Presence of `ETag`, `Last-Modified`, and `Cache-Control` headers. |
| Agent Content Hints | 20 | Signals that the page offers machine-readable alternate formats or LLM-specific endpoints. |
Agent Content Hints detects:
- `<link rel="alternate" type="text/markdown">` (6 pts) — markdown alternate link
- `<meta name="markdown_url">` (4 pts) — markdown URL metadata (e.g. Microsoft Learn)
- `data-llm-hint` attributes (4 pts) — explicit LLM guidance in HTML
- `llms.txt` references (3 pts) — site-level LLM endpoint declaration
- Non-HTML `<link rel="alternate">` (3 pts) — any non-HTML alternate format (JSON, XML, etc.)
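A sketch of these checks with BeautifulSoup. The selectors and the crude `llms.txt` substring test are sketch-level approximations, not the shipped detector:

```python
from bs4 import BeautifulSoup

def agent_content_hints(html):
    soup = BeautifulSoup(html, "html.parser")
    points = 0
    if soup.select_one('link[rel~="alternate"][type="text/markdown"]'):
        points += 6  # markdown alternate link
    if soup.select_one('meta[name="markdown_url"]'):
        points += 4  # markdown URL metadata
    if soup.select_one("[data-llm-hint]"):
        points += 4  # explicit LLM guidance attribute
    if "llms.txt" in html:
        points += 3  # site-level LLM endpoint reference
    if any(link.get("type") and "html" not in link["type"]
           for link in soup.select('link[rel~="alternate"]')):
        points += 3  # non-HTML alternate (may overlap the markdown check)
    return min(points, 20)  # sub-signal is capped at 20 points
```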
The Metadata Completeness score (10% of overall) checks for 7 key fields across Dublin Core, Schema.org, and OpenGraph:
| Field | Max Points | Sources checked |
|---|---|---|
| Title | 15 | `<title>`, `og:title`, Schema.org `name`/`headline` |
| Description | 15 | `<meta name="description">`, `og:description`, Schema.org `description` |
| Author/Publisher | 15 | `<meta name="author">`, Schema.org `author`/`publisher` |
| Date | 15 | `<meta>` date tags, Schema.org `dateModified`/`datePublished`, `<time>` elements |
| Topic/Category | 15 | `<meta name="ms.topic">`, Schema.org `articleSection`, `<meta name="keywords">` |
| Language | 10 | `<html lang="">`, `<meta http-equiv="content-language">` |
| Canonical URL | 15 | `<link rel="canonical">` |
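Each field is satisfied by any one of its documented sources. A sketch of the Title row; the other rows follow the same any-source pattern:

```python
from bs4 import BeautifulSoup

def title_points(html, jsonld_items):
    # Any one documented Title source earns the full 15 points (sketch).
    soup = BeautifulSoup(html, "html.parser")
    if soup.title and soup.title.get_text(strip=True):
        return 15
    if soup.select_one('meta[property="og:title"][content]'):
        return 15
    if any(item.get("name") or item.get("headline") for item in jsonld_items):
        return 15
    return 0
```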
Clipper integrates seamlessly with CI/CD workflows:
```
# .github/workflows/clipper-quality-gate.yml
name: Clipper Quality Gate
on: [pull_request]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Clipper Standards Evaluation
        run: |
          python main.py express docs-urls.txt --out pr-evaluation/ --quiet
          # Gate on pillars individually — each maps to a published standard.
          # Don't gate on composite (parseability_score) — it failed corpus-003.
          jq -e '.[] | select(.component_scores.http_compliance < 70 or .component_scores.semantic_html < 60)' \
            pr-evaluation/report_scores.json && exit 1 || exit 0
```

No API keys required - works immediately in any CI environment!
```
clipper/
├─ README.md # This comprehensive guide
├─ USER-INSTRUCTIONS.md # End-user walkthrough
├─ main.py # CLI entry point
├─ requirements.txt # Standards-based dependencies
├─ retrievability/
│ ├─ cli.py # Clipper CLI interface
│ ├─ access_gate_evaluator.py # Standards-based evaluation engine
│ ├─ performance_evaluator.py # Async/parallel evaluator
│ ├─ score.py / performance_score.py # Scoring orchestration
│ ├─ crawl.py # URL acquisition + redirect tracking
│ ├─ parse.py # Content signal extraction
│ ├─ report.py # Audit trail + markdown report
│ └─ schemas.py # Clipper data structures
├─ urls/ # Curated URL lists for evaluation
├─ samples/ # Sample URLs and snapshots
├─ scripts/ # Automation utilities
└─ docs/ # Technical documentation
```
```
# Pre-publication quality gates
python main.py express staging-docs-urls.txt --out quality-check/
# Get standards-based compliance report immediately

# Quarterly accessibility audits
python main.py express corporate-docs.txt --out compliance-audit/
jq '.audit_trail.dom_navigability' compliance-audit/report_scores.json

# Validate agent-ready content
python main.py express api-documentation.txt --out agent-readiness/
# Verify content extractability, structured data, and metadata coverage

# Regression testing for content changes
python main.py express --urls https://docs.updated-site.com --out regression-test/
# Compare against baseline standards compliance
```

- ✅ Per-pillar evidence against published standards — each score traceable to a named authority (W3C, Schema.org, Mozilla, WCAG/axe-core, IETF RFC 7231, Dublin Core).
- ✅ API-free operation — works in any environment with Python and Chrome.
- ✅ Audit trail per pillar — the JSON output records which checks fired and why.
- ✅ Pre-registration discipline — v2's composite was pre-registered, ship-gated, and falsified before this release; the trail is in findings/post-v2-roadmap.md and evaluation/phase5-results/corpus-003-analysis/session-9.5-verdict.md. That makes Clipper's claims about what its pillars do (and don't) measure unusually defensible.
- ❌ Single-number AI-readiness ranking. The composite that aggregated the six pillars failed its held-out validation (corpus-003, three judges, r ≈ −0.20). Don't use it to rank pages.
- ❌ Predict citation share or RAG accuracy. Tracks for those measurements were scoped in findings/clipper-next-design.md and parked — they need a different scale of evidence than Clipper has.
- ❌ Replace WCAG conformance review. Clipper runs axe-core in headless Chrome, which catches a defined automated subset of WCAG issues. Manual conformance review remains separate.
🎯 Result: A defensible, per-pillar standards auditor for documentation and content teams. Use it to find specific failures against specific standards. Don't use it as a popularity oracle.
Clipper welcomes contributions that enhance standards-based evaluation:
- Standards Integration - Add support for additional industry standards
- Evaluation Enhancement - Improve component-specific analysis
- Enterprise Features - Expand audit trail and compliance documentation
- Agent Optimization - Enhance agent-focused content quality metrics
Pillar behavior is locked in by a small offline fixture suite. Install dev dependencies and run pytest:
```
pip install -r requirements-dev.txt
pytest -q
```

The suite completes in under a second, requires no network or browser, and runs automatically in CI on every pull request. See docs/testing.md for the fixture layout and guidance on adding new fixtures.
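A hypothetical fixture test in that style; the fixture path, entry-point import, and pinned value are illustrative assumptions, not repo contents:

```python
import pytest
from pathlib import Path

# Hypothetical pillar entry point: the real name lives somewhere in
# retrievability/access_gate_evaluator.py and may differ.
from retrievability.access_gate_evaluator import evaluate_semantic_html

PINNED = 72.7  # changing a pinned value must be a deliberate act

def test_semantic_html_pillar_is_stable():
    html = Path("tests/fixtures/article.html").read_text()
    assert evaluate_semantic_html(html) == pytest.approx(PINNED)
```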
Clipper - Standards-Based Access Gate Evaluator. Licensed under the MIT License.
🚀 Clipper: Where industry standards meet agent-ready content evaluation.