Clipper - Command-Line Interface Progressive Performance Evaluation & Reporting

Agent-accessible content auditor — six industry-standard pillars, fully traceable evidence.

Clipper audits live URLs against six published web standards (W3C Semantic HTML, Mozilla Readability, Schema.org, WCAG 2.1 / axe-core, Dublin Core / OpenGraph, IETF RFC 7231) and returns a per-pillar evidence report with a complete audit trail — no APIs, no credentials, no external dependencies.

What Clipper is and is not (v2.1, May 2026). Clipper is an agent-accessibility auditor: it measures whether a page meets each of the six published standards above. The per-pillar scores are real signals; that is what Clipper's evidence speaks to.

Clipper is not a validated predictor of AI-citation behavior or retrieval-augmented generation accuracy. Earlier versions advertised a composite "agent-readiness" headline (parseability_score, universal_score) that aggregated the six pillars; that composite was pre-registered, ship-gated, and falsified on its first clean held-out test:

  • Pre-registered in findings/post-v2-roadmap.md §6 with a Pearson r ≥ +0.35 ship gate, before validation data was collected.
  • Calibrated on corpus-002 (n=43) where it produced r=+0.62 — but the same corpus also selected the pillar weights via scripts/gamma-experiments.py, so the +0.62 was an in-sample maximum, not held-out.
  • Falsified on held-out corpus-003 (n=171) under three independent judges (Llama-3.3-70B, GPT-4o, DeepSeek-V3.2) — see evaluation/phase5-results/corpus-003-analysis/session-9.5-verdict.md. r ≈ −0.20 across all three judges; the gate was missed by ~0.55 points and the sign flipped.

v2.1 keeps the falsified composites available behind --include-composite for backward compatibility with existing automation. The default is --diagnostic-mode: composites are suppressed, pillar-level evidence is what you get. The retrieval-prediction research line is closed; see findings/v2.1-release-scope.md and findings/clipper-next-design.md (parked).

Use Clipper to answer: "Does this page meet WCAG / Schema.org / RFC 7231 / Mozilla Readability / HTML5 semantic / Dublin Core expectations, and where does it fall short?" — questions Clipper has evidence for. Do not use it as a single-number ranker of pages by AI-readiness.

Overview

Clipper provides standards-based content auditing with:

  • Industry Standards: Every pillar score traceable to a recognized authority (W3C, Schema.org, Mozilla Readability, WCAG, RFC 7231, Dublin Core).
  • Zero API Dependencies: Local evaluation using established standards frameworks.
  • Immediate Usability: Runs directly from the command line; no setup beyond pip install.
  • Enterprise Defensible: Per-pillar audit trails identifying what was evaluated and why each score was assigned.
  • Pre-registration trail: The composites that were rolled up from the pillars were ship-gated and falsified on a held-out corpus before this release; the falsification record is in the repo.

Core question (auditor framing): Does this page meet each of the six published web standards Clipper audits, and which fail? — answered with per-pillar evidence.

Question Clipper does not answer: Will an LLM agent cite this page over another one? — Clipper has tested whether its pillar composite predicts that on held-out data and the answer was no. Cross-page ranking by single-number AI-readiness is not a supported use.

Clipper Standards Framework

Industry-Standard Evaluation Stack (API-Free)

# Standards-based dependencies - no APIs required
axe-selenium-python    # WCAG 2.1 DOM navigability (Deque Systems)
selenium              # W3C WebDriver standard  
extruct               # Schema.org structured data (W3C)
readability-lxml      # Mozilla Readability content extraction
httpx                 # Modern HTTP standard (RFC compliance)
beautifulsoup4        # HTML parsing standard

6-Pillar Evaluation Framework

  1. 🏗️ W3C Semantic HTML (25%) - HTML5 semantic elements, ARIA roles
  2. 📄 Content Extractability (20%) - Mozilla Readability signal-to-noise analysis
  3. 📊 Schema.org Structured Data (20%) - JSON-LD quality, type validation, field completeness
  4. 🛡️ DOM Navigability (15%) - WCAG 2.1 / Deque axe-core DOM evaluation
  5. 🏷️ Metadata Completeness (10%) - Dublin Core, Schema.org, OpenGraph field coverage
  6. 🌐 HTTP Compliance (10%) - Reachability, redirects, robots.txt, cache headers, agent content hints

Standards Authority Mapping

STANDARDS_AUTHORITY = {
    'semantic_html': 'HTML5 Semantic Elements (W3C)',
    'content_extractability': 'Mozilla Readability (Firefox Reader View algorithm)',
    'structured_data': 'Schema.org (Google/Microsoft/Yahoo)',
    'dom_navigability': 'WCAG 2.1 AA (W3C) + axe-core (Deque Systems)',
    'metadata_completeness': 'Dublin Core + Schema.org + OpenGraph',
    'http_compliance': 'RFC 7231 + robots.txt + Cache headers'
}

Enterprise Defensibility

  • ✅ Every Score Traceable - No black box algorithms
  • ✅ Audit Trail Generated - Complete evaluation methodology documented
  • ✅ Standards Compliance - Built on recognized industry authorities
  • ✅ Reproducible Results - Same evaluation across different environments

Quick Start

Try Clipper immediately: No API keys, no setup required.

# Works from any Copilot conversation (performance mode default)
python main.py express --urls https://your-docs.com --out results/

Installation

Instant Setup (API-Free)

# 1. Clone repository
git clone https://github.com/your-org/clipper-content-evaluation.git
cd clipper-content-evaluation

# 2. Install standards-based dependencies  
pip install -r requirements.txt

# 3. Ready to evaluate immediately!
python main.py express --help

Prerequisites

  • Python 3.7+
  • No API keys required ✅
  • No external services needed ✅
  • No network access beyond the URLs being evaluated ✅

Copilot Integration

Clipper is designed for immediate use from GitHub Copilot conversations:

# Just run it - no configuration needed
python main.py express urls/clipper-demo-urls.txt --out evaluation-results

CLI Usage

Clipper provides a complete standards-based evaluation pipeline:

🚀 Express Mode (Recommended)

Run complete Access Gate evaluation in one command:

# Single URL evaluation
python main.py express --urls https://developer.upsun.com/api/sdk/php --out results/

# Multiple URLs from file (batch optimized)  
python main.py express samples/urls.txt --out comprehensive-results/ --name evaluation

# Copilot-friendly (minimal output, maximum speed)
python main.py express urls.txt --out results/ --quiet

# Debug mode (slower, detailed analysis)
python main.py express urls.txt --out results/ --standard

# Performance benchmarking
python main.py express urls.txt --out results/ --benchmark

# Rendering-mode dimension (Phase 3.1)
# raw:      models non-JS agents (RAG crawlers, indexers)
# rendered: models JS-executing agents (default)
# both:     produces a per-URL delta and flags JS-dependent pages
python main.py express urls.txt --out results/ --render-mode both

Trend view: how has a URL scored over time?

clipper history <url> walks every *_scores.json file under --root (default evaluation/) and prints one row per prior evaluation of that URL, sorted by score-file mtime, with the parseability delta vs. the previous row. Use it to confirm a page has actually improved rather than just regressed and recovered:

python main.py history https://learn.microsoft.com/en-us/azure/aks/faq

# Machine-readable
python main.py history https://learn.microsoft.com/en-us/azure/aks/faq --json

Step-by-Step Pipeline

For detailed analysis, run individual components:

# 1. Crawl URLs (capture HTML snapshots)
python main.py crawl samples/urls.txt --out snapshots/

# 2. Parse Content (extract structural signals)  
python main.py parse snapshots/ --out parse-results.json

# 3. Standards Evaluation (Clipper methodology)
python main.py score parse-results.json --out scores.json

# 4. Generate Report (actionable insights)
python main.py report scores.json --md comprehensive-report.md

Content Negotiation Testing

Test for agent-friendly content formats:

# HTTP content negotiation analysis
python main.py negotiate urls.txt --out negotiation-results/

🎯 Enterprise Features

Audit Trail Generation

Every evaluation generates comprehensive documentation:

{
  "standards_authority": {
    "semantic_html": "HTML5 Semantic Elements (W3C)",
    "content_extractability": "Mozilla Readability (Firefox Reader View algorithm)",
    "structured_data": "Schema.org (Google/Microsoft/Yahoo)",
    "dom_navigability": "WCAG 2.1 AA (W3C) + axe-core (Deque Systems)",
    "metadata_completeness": "Dublin Core + Schema.org + OpenGraph",
    "http_compliance": "RFC 7231 + robots.txt + Cache headers"
  },
  "audit_trail": {
    "dom_navigability": {
      "standard": "WCAG 2.1 AA (W3C) + axe-core (Deque Systems)",
      "method": "Automated DOM navigability evaluation",
      "violations_count": 3,
      "passes_count": 47
    },
    "content_extractability": {
      "standard": "Mozilla Readability",
      "extraction_ratio": 0.45,
      "extracted_text_length": 12340,
      "structure_preservation": 28
    }
  },
  "evaluation_methodology": "Clipper Standards-Based Access Gate"
}
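
Because the trail is plain JSON, it is easy to consume in downstream tooling. A minimal sketch, assuming the key layout of the example above and the score-file path used in the Quick Start Demo below:

```python
import json
from pathlib import Path

# Load a Clipper score file and print a per-pillar summary of the audit trail.
# Key names follow the example above; real output may carry additional fields.
scores = json.loads(Path("demo-results/validation_scores.json").read_text())

for pillar, trail in scores.get("audit_trail", {}).items():
    print(f"{pillar}: {trail.get('standard', 'unknown standard')}")
    if "violations_count" in trail:
        print(f"  violations: {trail['violations_count']}, passes: {trail['passes_count']}")
    if "extraction_ratio" in trail:
        print(f"  extraction ratio: {trail['extraction_ratio']}")
```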

Compliance Documentation

  • Standards mapping for each component
  • Evaluation methodology documentation
  • Score calculation transparency
  • Industry authority references

Enterprise Workflows

# Quality gate integration (composite gating requires --include-composite; prefer per-pillar gates)
python main.py express staging-urls.txt --out quality-gate/ --quiet --include-composite
if jq -e '.parseability_score >= 70' quality-gate/report_scores.json; then
  echo "✅ Quality gate passed"
else
  echo "❌ Quality gate failed - see audit trail"
fi

# Batch evaluation (optimized performance)
python main.py express production-urls.txt --out batch-audit/ --name prod-audit

# Debug mode for detailed analysis  
python main.py express problem-urls.txt --out debug-analysis/ --standard

🚀 Quick Start Demo

5-Minute Clipper Validation:

# 1. Test with a documentation URL
echo "https://learn.microsoft.com/en-us/azure/azure-functions/functions-overview" > test-url.txt
python main.py express test-url.txt --out demo-results --name validation

# 2. Review standards-based results  
cat demo-results/validation.md

# 3. Examine audit trail
jq '.audit_trail' demo-results/validation_scores.json

Expected Output:

Clipper Evaluation Results:
├─ Total URLs: 1
├─ Average Score: 60.7/100  
└─ Agent-Ready: 0/1 (0.0%)

Component Breakdown:
  semantic_html: 72.7/100 (HTML5 Semantic Elements)
  content_extractability: 74.5/100 (Mozilla Readability)
  structured_data: 12.0/100 (Schema.org)
  dom_navigability: 35.0/100 (WCAG 2.1 / axe-core)
  metadata_completeness: 100.0/100 (Dublin Core / OpenGraph)
  http_compliance: 100.0/100 (RFC 7231 / robots / cache)

Standards Authority Mapping

Clipper builds on established industry standards:

| Pillar | Authority | Implementation | Weight |
|---|---|---|---|
| Semantic HTML | W3C HTML5 Specification | BeautifulSoup + html5lib | 25% |
| Content Extractability | Mozilla Readability | readability-lxml | 20% |
| Structured Data | Schema.org Consortium | extruct library | 20% |
| DOM Navigability | W3C + Deque Systems | axe-selenium-python | 15% |
| Metadata Completeness | Dublin Core / Schema.org / OpenGraph | BeautifulSoup | 10% |
| HTTP Compliance | IETF RFC 7231 + robots.txt | httpx | 10% |

🏛️ No Custom Algorithms: Every score component is traceable to recognized industry standards.

Example: Audit Trail Reports

Clipper generates comprehensive audit documentation:

Standards Compliance Summary

## Clipper Access Gate Evaluation

**Final Score:** 60.7/100 (moderate_issues)
**Evaluation Methodology:** Standards-Based Access Gate
**Standards Compliance:** 6/6 frameworks evaluated

### Pillar Analysis
- **Semantic HTML**: 72.7/100 (Good semantic coverage, ARIA roles present)
- **Content Extractability**: 74.5/100 (Clean extraction via Readability, structure preserved)
- **Structured Data**: 12.0/100 (Limited JSON-LD quality, missing key fields)
- **DOM Navigability**: 35.0/100 (Accessibility violations detected, capped per-rule)
- **Metadata Completeness**: 100.0/100 (All metadata fields present)
- **HTTP Compliance**: 100.0/100 (Reachable, no robots blocks, cache headers present)

Actionable Recommendations

### Priority Fixes (Standards-Based)

🔥 **Critical - Structured Data Quality**
- Add complete JSON-LD with @type, name, author, dateModified, description
- Validate Schema.org required properties for declared types
- Include OpenGraph and microdata alongside JSON-LD

⚠️ **Important - DOM Navigability**
- Add `aria-label` attributes to navigation elements
- Ensure color contrast ratios meet WCAG AA standards
- Fix heading hierarchy violations

📋 **Recommended - Semantic HTML**
- Add `<main>` element wrapper (HTML5 semantic requirement)
- Implement proper heading hierarchy (h1 → h2 → h3)
- Use `<article>` elements for content sections

Scoring System

⚠️ Composite generalization status (v2.1, May 2026). The composite headline scores below (parseability_score and universal_score) were calibrated on Clipper's corpus-002 (n=43, single grader architecture). On a held-out corpus-003 (n=171, three independent judges, variance restored via Phi-4-mini scorer) the composite does not generalize: Pearson r between composite and judged QA accuracy is approximately −0.20 under all three judges, against a pre-registered ship-gate target of r ≥ +0.35. The composite was pre-registered, ship-gated, and falsified on its first clean held-out test. The corpus-002 r=+0.62 was an in-sample maximum (the same corpus selected the pillar weights via the γ experiments), not a held-out validation result.

The per-pillar measurements that feed the composite remain real signals against published standards (W3C, Schema.org, Mozilla, WCAG, RFC 7231, Dublin Core) and are unchanged. Composites are suppressed by default in v2.1 (--diagnostic-mode is the default; pass --include-composite to opt in for backward compatibility). The "Access Gate Classification" bands below are corpus-002 internal-consistency diagnostics; they are not validated to predict retrieval or AI-citation behavior on arbitrary pages and should not be used to rank pages against each other.

See findings/post-v2-roadmap.md, findings/v2.1-release-scope.md, and evaluation/phase5-results/corpus-003-analysis/session-9.5-verdict.md for the full pre-registration / falsification trail.

Clipper's primary output is the per-pillar component_scores block — six numbers, each measuring conformance to one published standard. Each pillar score is independently meaningful and should be the basis of any cross-page comparison.

For backward compatibility, Clipper can also emit two 0–100 composite numbers (suppressed by default in v2.1):

  • parseability_score — the type-adjusted composite. Clipper detects whether the page is an article, landing page, tutorial, FAQ, reference, or code sample, and reweights the six pillars accordingly. Available with --include-composite. Did not generalize on corpus-003.
  • universal_score — the same pillar scores under the default article weights. Available with --include-composite. Did not generalize on corpus-003.

The content type, detection signal, and full weight table used for each page are recorded under audit_trail._content_type. See docs/scoring.md#content-type-profiles for the profile table and detection precedence.
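
To illustrate the reweighting mechanism (the profile values below are placeholders, not Clipper's real profiles — the actual table is in docs/scoring.md#content-type-profiles), a minimal sketch of per-type reweighting:

```python
# Illustrative sketch of type-adjusted weighting. Profile values here are
# placeholders, NOT the real Clipper profiles (see docs/scoring.md).
ARTICLE_WEIGHTS = {
    "semantic_html": 0.25, "content_extractability": 0.20, "structured_data": 0.20,
    "dom_navigability": 0.15, "metadata_completeness": 0.10, "http_compliance": 0.10,
}
PROFILES = {
    "article": ARTICLE_WEIGHTS,
    # hypothetical profile: a code sample might weight clean extraction higher
    "code_sample": {**ARTICLE_WEIGHTS, "content_extractability": 0.30, "metadata_completeness": 0.0},
}

def type_adjusted_composite(component_scores, content_type):
    weights = PROFILES.get(content_type, PROFILES["article"])
    return sum(component_scores[pillar] * w for pillar, w in weights.items())
```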

Every result also carries a methodology block (always present, in default and --include-composite mode) stating the calibration corpus, generalization status, and recommended use:

"methodology": {
  "scoring_version": "v2-evidence-partial",
  "calibration_corpus": "corpus-002",
  "generalization_status": "falsified on corpus-003 (Pearson r ≈ −0.20 across three judges vs ship gate +0.35); see evaluation/phase5-results/corpus-003-analysis/session-9.5-verdict.md",
  "recommended_use": "per-pillar audit; composite is not validated for ranking pages",
  "release": "v2.1"
}

Access Gate Classification (internal-consistency diagnostic on corpus-002 — see calibration note above)

  • 90-100: clean - Fully agent-ready
  • 75-89: minor_issues - Nearly agent-ready
  • 60-74: moderate_issues - Improvements needed
  • 40-59: significant_issues - Major optimization required
  • 0-39: severe_issues - Substantial restructuring needed
  • partial_evaluation - One or more pillars could not be evaluated (e.g., network timeout). The final score is a weighted average over the surviving pillars and the dropped pillars are listed in failed_pillars. See docs/scoring.md for the full contract.
  • evaluation_error - Every pillar failed; no usable score.

Pillar Weight Distribution

Based on agent retrievability impact:

  • Semantic HTML (25%) - Essential for content structure and agent parsing
  • Content Extractability (20%) - Can agents cleanly extract the content?
  • Structured Data (20%) - Machine-readable metadata for agent understanding
  • DOM Navigability (15%) - Accessible DOM structure for crawlers
  • Metadata Completeness (10%) - Identity, authorship, and currency signals
  • HTTP Compliance (10%) - Reachability, crawl permissions, cacheability, agent content hints
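
For reference, a minimal sketch of the weighted average these percentages imply, including the renormalization used for partial_evaluation when a pillar drops out (function names are illustrative; the real orchestration lives in retrievability/score.py):

```python
# Default pillar weights from the list above. When a pillar fails
# (partial_evaluation), its weight is dropped and the remaining weights are
# renormalized so the surviving pillars form a proper weighted average.
WEIGHTS = {
    "semantic_html": 0.25, "content_extractability": 0.20, "structured_data": 0.20,
    "dom_navigability": 0.15, "metadata_completeness": 0.10, "http_compliance": 0.10,
}

def weighted_average(component_scores, failed_pillars=()):
    surviving = {p: w for p, w in WEIGHTS.items() if p not in failed_pillars}
    total = sum(surviving.values())
    return sum(component_scores[p] * w for p, w in surviving.items()) / total
```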

Rendering Modes

Clipper can evaluate each URL under two assumptions via --render-mode raw|rendered|both:

  • rendered (default) — models agents that execute JavaScript. DOM navigability runs in headless Chrome via axe-core.
  • raw — models non-JS agents (RAG crawlers, search indexers, API clients). DOM navigability falls back to static analysis.
  • both — produces two ScoreResult entries per URL and a "Rendering-Mode Deltas" section. Pages with |rendered - raw| >= 15 are flagged as JS-dependent. Treat min(rendered, raw) as the pessimistic score of record.

See docs/scoring.md#rendering-modes for the full explanation.
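
The bookkeeping for --render-mode both is straightforward; a minimal sketch of the delta and flag described above (names illustrative):

```python
# Sketch of the rendering-mode delta: two scores per URL, a JS-dependence
# flag when they diverge by 15 points or more, and the lower score treated
# as the pessimistic score of record.
def rendering_delta(rendered, raw, threshold=15.0):
    return {
        "delta": rendered - raw,
        "js_dependent": abs(rendered - raw) >= threshold,
        "score_of_record": min(rendered, raw),
    }
```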

Content Extractability Sub-Signals

The Content Extractability score (20% of overall) uses Mozilla Readability to measure extraction quality:

| Sub-signal | Max Points | What it measures |
|---|---|---|
| Signal-to-Noise Ratio | 40 | Ratio of extracted meaningful text to total page text. Optimal range: 0.3-0.8. |
| Structure Preservation | 30 | Do headings, lists, and code blocks survive extraction? (10 pts each category) |
| Boundary Detection | 30 | Did Readability find a clear article boundary? Checks title extraction, content length, and `<main>`/`<article>` overlap. |
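
To make the point allocation concrete, here is a sketch of the signal-to-noise sub-signal: full points inside the documented 0.3-0.8 band, with an assumed (illustrative, not documented) linear fall-off outside it:

```python
# Illustrative scoring of the Signal-to-Noise Ratio sub-signal (max 40 pts).
# The optimal band (0.3-0.8) is from the table above; the linear fall-off
# outside the band is an assumption for illustration only.
def signal_to_noise_points(extracted_chars, total_chars, max_points=40):
    if total_chars == 0:
        return 0.0
    ratio = extracted_chars / total_chars
    if 0.3 <= ratio <= 0.8:
        return float(max_points)
    distance = (0.3 - ratio) if ratio < 0.3 else (ratio - 0.8)
    return max(0.0, max_points * (1 - distance / 0.3))
```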

Structured Data Sub-Signals

The Structured Data score (20% of overall) evaluates schema quality, not just presence:

| Sub-signal | Max Points | What it measures |
|---|---|---|
| Type Appropriateness | 20 | Does the @type match recognized content types (Article, WebPage, HowTo, etc.)? |
| Field Completeness | 30 | Per-type required + recommended fields for the four validated @type values. See below. |
| Multiple Formats | 20 | Are JSON-LD, OpenGraph, and microdata all present? |
| Schema Validation | 30 | Are required properties present for the declared Schema.org type? |

Per-type field expectations (Field Completeness is computed per JSON-LD item, averaged across validated items):

| @type | Required | Recommended |
|---|---|---|
| Article | headline, datePublished | author, dateModified, description, publisher |
| FAQPage | mainEntity (non-empty list of Question entries with acceptedAnswer) | — |
| HowTo | name, step (non-empty list) | description, totalTime |
| BreadcrumbList | itemListElement (list with ≥2 items) | — |

Items of other @type values fall back to a generic key-field check. Missing and structurally invalid fields are logged in audit_trail.structured_data.field_completeness_detail. See docs/scoring.md for the full specification.
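
A sketch of the per-item check and averaging described above (the 2:1 weighting between required and recommended fields is an assumption here, not the documented split):

```python
# Illustrative per-item field completeness (max 30 pts), averaged across
# validated JSON-LD items as described above. Only two @type specs are shown.
FIELD_EXPECTATIONS = {
    "Article": {"required": ["headline", "datePublished"],
                "recommended": ["author", "dateModified", "description", "publisher"]},
    "HowTo": {"required": ["name", "step"],
              "recommended": ["description", "totalTime"]},
}

def field_completeness(items, max_points=30):
    per_item = []
    for item in items:
        spec = FIELD_EXPECTATIONS.get(item.get("@type"))
        if spec is None:
            continue  # other @type values fall back to a generic key-field check (not shown)
        req_frac = sum(1 for f in spec["required"] if item.get(f)) / len(spec["required"])
        rec_frac = (sum(1 for f in spec["recommended"] if item.get(f)) / len(spec["recommended"])
                    if spec["recommended"] else 1.0)
        per_item.append(max_points * (2 * req_frac + rec_frac) / 3)  # assumed 2:1 split
    return sum(per_item) / len(per_item) if per_item else 0.0
```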

HTTP Compliance Sub-Signals

The HTTP Compliance score (10% of overall) is split into five sub-signals:

| Sub-signal | Max Points | What it measures |
|---|---|---|
| HTML Reachability | 15 | Does the URL serve a 200 response to `Accept: text/html`? |
| Redirect Efficiency | 25 | Chain length (0 hops optimal, >4 penalized), proper status codes, performance impact. |
| Crawl Permissions | 20 | robots.txt allows access + no `<meta name="robots" content="noindex">` blocking. |
| Cache Headers | 20 | Presence of ETag, Last-Modified, and Cache-Control headers. |
| Agent Content Hints | 20 | Signals that the page offers machine-readable alternate formats or LLM-specific endpoints. |

Agent Content Hints detects:

  • <link rel="alternate" type="text/markdown"> (6 pts) — markdown alternate link
  • <meta name="markdown_url"> (4 pts) — markdown URL metadata (e.g. Microsoft Learn)
  • data-llm-hint attributes (4 pts) — explicit LLM guidance in HTML
  • llms.txt references (3 pts) — site-level LLM endpoint declaration
  • Non-HTML <link rel="alternate"> (3 pts) — any non-HTML alternate format (JSON, XML, etc.)
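
A sketch of how these hints can be detected with BeautifulSoup, using the point values listed above (selectors are simplified relative to the real evaluator):

```python
from bs4 import BeautifulSoup

# Simplified sketch of the Agent Content Hints sub-signal (max 20 pts).
# Point values follow the list above; detection logic is illustrative.
def agent_content_hints(html):
    soup = BeautifulSoup(html, "html.parser")
    points = 0
    if soup.find("link", rel="alternate", type="text/markdown"):
        points += 6  # markdown alternate link
    if soup.find("meta", attrs={"name": "markdown_url"}):
        points += 4  # markdown URL metadata
    if soup.find(attrs={"data-llm-hint": True}):
        points += 4  # explicit LLM guidance attribute
    if "llms.txt" in html:
        points += 3  # site-level LLM endpoint declaration referenced in the page
    if soup.find("link", rel="alternate",
                 type=lambda t: t and "html" not in t and "markdown" not in t):
        points += 3  # any other non-HTML alternate format (JSON, XML, ...)
    return min(points, 20)
```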

Metadata Completeness Fields

The Metadata Completeness score (10% of overall) checks for 7 key fields across Dublin Core, Schema.org, and OpenGraph:

| Field | Max Points | Sources checked |
|---|---|---|
| Title | 15 | `<title>`, og:title, Schema.org name/headline |
| Description | 15 | `<meta name="description">`, og:description, Schema.org description |
| Author/Publisher | 15 | `<meta name="author">`, Schema.org author/publisher |
| Date | 15 | `<meta>` date tags, Schema.org dateModified/datePublished, `<time>` elements |
| Topic/Category | 15 | `<meta name="ms.topic">`, Schema.org articleSection, `<meta name="keywords">` |
| Language | 10 | `<html lang="">`, `<meta http-equiv="content-language">` |
| Canonical URL | 15 | `<link rel="canonical">` |
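
The same first-source-wins pattern applies to every row above; a sketch for the Title and Description fields only (selectors simplified, Schema.org sources omitted):

```python
from bs4 import BeautifulSoup

# Simplified sketch of the multi-source lookup behind Metadata Completeness:
# a field counts as present if any of its listed sources yields a value.
def has_title(soup):
    return bool((soup.title and soup.title.get_text(strip=True))
                or soup.find("meta", property="og:title"))

def has_description(soup):
    return bool(soup.find("meta", attrs={"name": "description"})
                or soup.find("meta", property="og:description"))

soup = BeautifulSoup("<html><head><title>Example</title></head></html>", "html.parser")
print(has_title(soup), has_description(soup))  # True False
```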

GitHub Integration

Clipper integrates seamlessly with CI/CD workflows:

# .github/workflows/clipper-quality-gate.yml
name: Clipper Quality Gate
on: [pull_request]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Clipper Standards Evaluation
        run: |
          python main.py express docs-urls.txt --out pr-evaluation/ --quiet
          # Gate on pillars individually — each maps to a published standard.
          # Don't gate on composite (parseability_score) — it failed corpus-003.
          jq -e '.[] | select(.component_scores.http_compliance < 70 or .component_scores.semantic_html < 60)' \
            pr-evaluation/report_scores.json && exit 1 || exit 0

No API keys required - works immediately in any CI environment!

File Structure

clipper/
├─ README.md                           # This comprehensive guide
├─ USER-INSTRUCTIONS.md                # End-user walkthrough
├─ main.py                             # CLI entry point
├─ requirements.txt                    # Standards-based dependencies
├─ retrievability/
│  ├─ cli.py                           # Clipper CLI interface
│  ├─ access_gate_evaluator.py         # Standards-based evaluation engine
│  ├─ performance_evaluator.py         # Async/parallel evaluator
│  ├─ score.py / performance_score.py  # Scoring orchestration
│  ├─ crawl.py                         # URL acquisition + redirect tracking
│  ├─ parse.py                         # Content signal extraction
│  ├─ report.py                        # Audit trail + markdown report
│  └─ schemas.py                       # Clipper data structures
├─ urls/                               # Curated URL lists for evaluation
├─ samples/                            # Sample URLs and snapshots
├─ scripts/                            # Automation utilities
└─ docs/                               # Technical documentation

Real-World Use Cases

📚 Documentation Teams

# Pre-publication quality gates
python main.py express staging-docs-urls.txt --out quality-check/
# Get standards-based compliance report immediately

🏢 Enterprise Compliance

# Quarterly accessibility audits
python main.py express corporate-docs.txt --out compliance-audit/
jq '.audit_trail.dom_navigability' compliance-audit/report_scores.json

🤖 Agent Integration Teams

# Validate agent-ready content
python main.py express api-documentation.txt --out agent-readiness/
# Verify content extractability, structured data, and metadata coverage

🔍 Quality Assurance

# Regression testing for content changes
python main.py express --urls https://docs.updated-site.com --out regression-test/
# Compare against baseline standards compliance

Why Clipper?

What Clipper does well

  • Per-pillar evidence against published standards — each score traceable to a named authority (W3C, Schema.org, Mozilla, WCAG/axe-core, IETF RFC 7231, Dublin Core).
  • API-free operation — works in any environment with Python and Chrome.
  • Audit trail per pillar — the JSON output records which checks fired and why.
  • Pre-registration discipline — v2's composite was pre-registered, ship-gated, and falsified before this release; the trail is in findings/post-v2-roadmap.md and evaluation/phase5-results/corpus-003-analysis/session-9.5-verdict.md. That makes Clipper's claims about what its pillars do (and don't) measure unusually defensible.

What Clipper deliberately does not do

  • Single-number AI-readiness ranking. The composite that aggregated the six pillars failed its held-out validation (corpus-003, three judges, r ≈ −0.20). Don't use it to rank pages.
  • Predict citation share or RAG accuracy. Tracks for those measurements were scoped in findings/clipper-next-design.md and parked — they need a different scale of evidence than Clipper has.
  • Replace WCAG conformance review. Clipper runs axe-core in headless Chrome, which catches a defined automated subset of WCAG issues. Manual conformance review remains separate.

🎯 Result: A defensible, per-pillar standards auditor for documentation and content teams. Use it to find specific failures against specific standards. Don't use it as a popularity oracle.

Contributing

Clipper welcomes contributions that enhance standards-based evaluation:

  1. Standards Integration - Add support for additional industry standards
  2. Evaluation Enhancement - Improve component-specific analysis
  3. Enterprise Features - Expand audit trail and compliance documentation
  4. Agent Optimization - Enhance agent-focused content quality metrics

Running tests

Pillar behavior is locked in by a small offline fixture suite. Install dev dependencies and run pytest:

pip install -r requirements-dev.txt
pytest -q

The suite completes in under a second, requires no network or browser, and runs automatically in CI on every pull request. See docs/testing.md for the fixture layout and guidance on adding new fixtures.

License

Clipper - Standards-Based Access Gate Evaluator. Licensed under the MIT License.


🚀 Clipper: Where industry standards meet agent-ready content evaluation.
