diff --git a/CHANGELOG.md b/CHANGELOG.md index 5008e8a..a26b22e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,27 @@ # Changelog +## 2.1.0 (2026-06-03) + +### Added +- **Stable finding codes.** Every `AuditFinding` now carries a `code` namespaced as `<factor-id>.<check>[.<variant>]` (e.g. `technical-seo.h1.multiple`, `schema-validity.singleton.duplicate`), so agents and integrations key on a stable machine identifier instead of regex-matching the human `message` (which can change between releases). 212 codes across all 19 analyzers; the full registry is in [docs/finding-codes.md](docs/finding-codes.md). Codes follow a documented convention and are unique across the tool (enforced by a test). `AuditFinding.code` is required, so the compiler guarantees no finding ships without one. +- `hasMissingMetaDescription` (the `--require-meta` gate) now keys on `technical-seo.meta-description.missing` rather than a message prefix — the first consumer migrated to codes. + +### Changed +- **`schemaVersion` bumped to `1.1`** (additive: findings gained the `code` field). Report shapes are otherwise unchanged. + +## 2.0.0 (2026-06-03) + +### Breaking +- **`SitemapAuditReport.prioritizedFixes` is now a structured `PrioritizedFix[]`, not `string[]`.** Each entry is a typed object — `{ kind, id, title, recommendation, severity?, affectedPages, affectsHomepage, prevalencePct, avgGrade?, summary }` — so an AI agent can act on the ranked to-do list without regex-parsing prose. The human-readable one-liner is preserved on `.summary`; migrate by reading `prioritizedFixes.map(f => f.summary)`. The text/markdown reports are unchanged in spirit (they render the structured fixes, now spelling out every affected page). +- **New `schemaVersion` field on `AuditReport` and `SitemapAuditReport`** (exported `SCHEMA_VERSION`, currently `"1.0"`). It versions the report's JSON shape independently of the npm package version so agent parsers can detect breaking drift instead of failing silently. 
Treat the absence of the field as "pre-2.0 / legacy shape." + +### Added +- **`--format agent` — a slim, agent-native decision output.** Returns `{ schemaVersion, tool, mode, url, score, grade, pass, criticalDefectCount, issues }` as JSON, where `issues` is the ranked `PrioritizedFix[]`, omitting the per-factor and per-page detail an agent would otherwise have to average and re-rank. Works for single-URL, sitemap, and static-output audits (single-page reuses the same critical-defect and cross-cutting aggregation over a one-page "site"); `--detect-platform` falls back to structured JSON. New `agentSummaryFromAudit()` / `agentSummaryFromSitemap()` exports, `AgentSummary` type, and `formatAgent` / `formatSitemapAgent` formatters. +- **Critical per-page defects surfaced by impact, not prevalence (#42).** Sitemap and static-directory reports now include a `criticalDefects` rollup and a **Critical Defects** section (text + markdown) that lists binary, one-line-fix structural defects — an `<h1>` count other than one, a missing `<title>`, a missing meta description — **regardless of how few pages exhibit them**. Previously these were detected per page but lost in aggregation: `prioritizedFixes` ranked only by prevalence (so a defect on a single page was structurally excluded), the factor score averaged the defect away to a passing grade, and `crossCuttingIssues` was keyed by factor, never the specific defect. An unambiguous, high-impact defect on the most important page (e.g. a homepage split across four `<h1>`s, or a `/contact-us` page with none) appeared nowhere in the top-level summary. Now each defect names **every** offending page (homepage and high sitemap-`priority` pages first), and critical-severity defects are promoted to the **top** of `prioritizedFixes`. Shown even with `--top-issues`. + - The end-of-report summaries no longer truncate: the Critical Defects block and each prioritized fix list **every** affected page (no "+N more"), and `prioritizedFixes` reports every cross-cutting issue ordered by prevalence rather than a top-5 slice — a fix the audit computed always reaches the report. + - New `detectCriticalDefects()`, `buildCriticalDefects()`, and `SCHEMA_VERSION` exports plus `CriticalDefect`, `CriticalDefectGroup`, `CriticalDefectAffectedPage`, `CriticalDefectId`, `CriticalDefectSeverity`, and `PrioritizedFix` types. `AuditReport` gains `criticalDefects` and `schemaVersion`; `SitemapAuditReport` gains `criticalDefects` and `schemaVersion`; `SitemapPageResult` gains the page's sitemap `priority`. + - Detection is independent of the weighted factor scores, so **no existing audit scores or grades change** (and exit codes are unaffected). + ## 1.13.0 (2026-05-31) ### Added diff --git a/README.md b/README.md index c622757..e2ccc43 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,8 @@ - Audit **built HTML offline** in CI: a `next export` / `dist` / `out` directory, no network. 
[Static output](docs/cli.md#static-output-mode) - Detect the **platform / CMS / framework**: WordPress, Webflow, Shopify, Next.js, Vercel. [Platform detection](docs/cli.md#platform-detection) - Opt in to **Lighthouse, geographic, and agent-skill** factors. [Optional factors](docs/scoring.md#optional-factors) -- `text`, `json`, and `markdown` output with **CI-friendly exit codes**. [CLI reference](docs/cli.md) +- `text`, `json`, `markdown`, and `agent` output with **CI-friendly exit codes**. [CLI reference](docs/cli.md) +- **Agent-native output**: a versioned `schemaVersion`, a slim `--format agent` decision, ranked structured fixes, and stable [finding codes](docs/finding-codes.md) so integrations key on codes, not prose. [API](docs/api.md#machine-readable-output-for-ai-agents) - Use as a **library** ([API](docs/api.md)) or from Claude Code via the **`/aeo` skill** ([skill](docs/skill.md)). Website: [canonry.ai](https://canonry.ai) diff --git a/docs/api.md b/docs/api.md index 8f77b11..54e5979 100644 --- a/docs/api.md +++ b/docs/api.md @@ -33,14 +33,26 @@ const report = await runSitemapAudit('https://example.com', { factors: ['schema-validity', 'structured-data'], // Optional subset }) -console.log(report.aggregateGrade) // 'B+' -console.log(report.pagesAudited) // 22 +console.log(report.schemaVersion) // '1.1', JSON shape version (see "Machine-readable output") +console.log(report.aggregateGrade) // 'B+' +console.log(report.pagesAudited) // 22 +console.log(report.criticalDefects) // Binary per-page defects (multiple/missing H1, missing title/meta), grouped by defect console.log(report.crossCuttingIssues) // Per-factor rollup with affectedUrls for every recommendation -console.log(report.prioritizedFixes) // Top 5 fixes ranked by site-wide impact +console.log(report.prioritizedFixes) // Ranked PrioritizedFix[]: critical defects first, then cross-cutting by impact ``` Each entry in `crossCuttingIssues[].topIssues` carries a `recommendation` plus the exact 
`affectedUrls` so you can attribute each problem to specific pages, e.g. "FAQPage duplicate" pointing at every blog post that has it. +`criticalDefects` surfaces **binary structural defects by impact, not prevalence**. The cross-cutting rollup ranks by how many pages a factor affects, so an unambiguous one-line-fix defect on a single important page (a homepage split across four `<h1>`s, or a `/contact-us` page with none) would otherwise be averaged into a passing factor grade and excluded from `prioritizedFixes`. Each group names the offending pages (homepage and high sitemap-`priority` pages first), and the critical-severity ones lead `prioritizedFixes`. + +### Machine-readable output (for AI agents) + +`--format json` and these return values are the contract for programmatic use. The report is built to be acted on, not just rendered: + +- **`schemaVersion`** (on `AuditReport` and `SitemapAuditReport`, exported as `SCHEMA_VERSION`) versions the JSON shape independently of the npm version. Pin to it and treat a major bump as breaking; treat its absence as a pre-2.0 report. +- **`prioritizedFixes: PrioritizedFix[]`** is the ranked, pre-computed to-do list, so an agent need not average factor scores and re-rank. Each fix carries a stable `id` (a defect id like `"multiple-h1"` or a factor id like `"technical-seo"`), `kind`, an optional `severity`, the complete `affectedPages` array (never truncated), `affectsHomepage`, `prevalencePct`, and a human `summary`. +- **Stable identifiers** everywhere: the decision surface (`criticalDefects[].id`, `prioritizedFixes[].id` / `kind`) and every individual factor finding (`factors[].findings[].code`, e.g. `technical-seo.h1.multiple`) carry stable codes, so integrations key on codes, not on matching message strings. The full code registry is in [finding-codes.md](finding-codes.md). 
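As a rough illustration of the consumer side described above, the sketch below gates on `schemaVersion` and then reads the ranked fixes. `ReportLike`, `FixLike`, `SUPPORTED_MAJOR`, and `urgentFixIds` are hypothetical names that mirror only the fields documented here (the package's exported types remain the authority), and the policy of selecting critical-severity or homepage-affecting fixes is an assumption made for the example.

```typescript
// Hypothetical, trimmed mirror of the documented PrioritizedFix fields.
interface FixLike {
  id: string
  severity?: string
  affectedPages: unknown[] // element shape is not assumed here
  affectsHomepage: boolean
  summary: string
}

// Hypothetical, trimmed mirror of a report: only the fields this sketch reads.
interface ReportLike {
  schemaVersion?: string
  prioritizedFixes?: FixLike[]
}

// Assumption: the major schema version this parser was written against.
const SUPPORTED_MAJOR = 1

// Returns the major component of schemaVersion, or null when it is absent
// (documented as a pre-2.0 / legacy report) or not parseable.
function majorOf(schemaVersion: string | undefined): number | null {
  if (schemaVersion === undefined) return null
  const [head = ''] = schemaVersion.split('.')
  const major = Number.parseInt(head, 10)
  return Number.isNaN(major) ? null : major
}

// Returns the ids of fixes an agent might handle first, keeping report order.
function urgentFixIds(report: ReportLike): string[] {
  const major = majorOf(report.schemaVersion)
  if (major === null) {
    throw new Error('Unversioned report: treat as the legacy shape')
  }
  if (major !== SUPPORTED_MAJOR) {
    throw new Error(`Unsupported schema major version: ${major}`)
  }
  return (report.prioritizedFixes ?? [])
    .filter((fix) => fix.severity === 'critical' || fix.affectsHomepage)
    .map((fix) => fix.id)
}
```

Because the ids are documented as stable, a caller could map values such as `"multiple-h1"` to its own remediation routines instead of matching on `summary`.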
+ ## Static output (offline, from disk) ```ts @@ -55,6 +67,7 @@ if (result.kind === 'single') { console.log(result.report.overallGrade) // single .html file → AuditReport } else { console.log(result.report.aggregateGrade) // directory → SitemapAuditReport shape + console.log(result.report.criticalDefects) console.log(result.report.crossCuttingIssues) } ``` diff --git a/docs/cli.md b/docs/cli.md index d700aff..d99594e 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -19,8 +19,15 @@ npx @ainyc/aeo-audit https://example.com --format json # Markdown report npx @ainyc/aeo-audit https://example.com --format markdown + +# Agent summary: the slim JSON decision, not the full report +npx @ainyc/aeo-audit https://example.com --sitemap --format agent ``` +`--format json` is the contract for programmatic and agent consumers: every report carries a `schemaVersion` (so a parser can detect breaking shape drift) and sitemap reports expose a `criticalDefects` rollup plus a ranked `prioritizedFixes` array of structured objects. See [api.md](api.md#machine-readable-output-for-ai-agents) for the field shapes. + +`--format agent` returns just the decision, not the report: `{ schemaVersion, tool, mode, url, score, grade, pass, criticalDefectCount, issues }`, where `issues` is the ranked `PrioritizedFix[]` (critical defects first, then cross-cutting by prevalence). It omits the per-factor and per-page detail so an agent can act without averaging and re-ranking scores itself. Works for single-URL, sitemap, and static-output audits; in `--detect-platform` mode it falls back to the structured JSON. 
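One way an agent might consume the `--format agent` payload is sketched below. `AgentSummaryLike`, `Decision`, and `decideFromAgentOutput` are hypothetical names covering only the fields listed above, and the policy (proceed only when `pass` is true and `criticalDefectCount` is zero) is an assumption for illustration, not the tool's own rule.

```typescript
// Hypothetical, trimmed mirror of the `--format agent` payload: only the fields used here.
interface AgentSummaryLike {
  schemaVersion: string
  score: number
  grade: string
  pass: boolean
  criticalDefectCount: number
  issues: Array<{ id: string; summary: string }>
}

// Hypothetical result shape for this sketch.
interface Decision {
  ok: boolean
  todo: string[]
}

// Parses the captured JSON output and turns it into a go/no-go plus a to-do list.
function decideFromAgentOutput(stdout: string): Decision {
  const summary = JSON.parse(stdout) as AgentSummaryLike
  // Assumed policy: the score gate passed and no critical defect was reported.
  const ok = summary.pass && summary.criticalDefectCount === 0
  // `issues` arrives already ranked, so the order is kept as-is.
  const todo = summary.issues.map((issue) => `${issue.id}: ${issue.summary}`)
  return { ok, todo }
}
```

In a script, `stdout` would be the captured output of a run with `--format agent`; the cast performs no runtime validation, so a stricter consumer may want to check the fields before trusting them.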
+ ## Running a subset of factors ```bash @@ -76,7 +83,7 @@ npx @ainyc/aeo-audit https://example.com --sitemap https://example.com/sitemap.x # Cap the number of pages (default 200, sorted by sitemap priority) npx @ainyc/aeo-audit https://example.com --sitemap --limit 50 -# Skip per-page output and show only cross-cutting issues +# Skip per-page output and show only the cross-cutting issues and critical defects npx @ainyc/aeo-audit https://example.com --sitemap --top-issues # Rewrite each <loc>'s origin to the target you named (audit staging with prod's sitemap) @@ -92,6 +99,8 @@ Auto-discovery checks `/sitemap.xml` → `/sitemap-index.xml` → `Sitemap:` dir When the sitemap has more URLs than `--limit`, the run audits the highest-priority pages and prints a notice to stderr listing how many were skipped and how to audit them all. +A **Critical Defects** section lists binary, one-line-fix structural defects (an `<h1>` count other than one, a missing `<title>`, a missing meta description) surfaced **regardless of how few pages they affect**, with the offending pages named (homepage and high sitemap-`priority` pages first). These would otherwise be averaged into a passing factor grade and excluded from the prevalence-ranked fixes; the critical-severity ones also lead the prioritized fix list. The section is shown even with `--top-issues`. See the machine-readable shapes in [api.md](api.md#machine-readable-output-for-ai-agents). + The optional in-process factors are honored per page: pass `--include-geo` and/or `--include-agent-skills` to add them to every audited page. `--lighthouse` is the exception: it cannot be combined with `--sitemap` because each PageSpeed Insights call takes 15-30s. 
## Static-output mode @@ -184,14 +193,14 @@ When fetching `/llms.txt`, `/llms-full.txt`, `/robots.txt`, and `/sitemap.xml` t | Flag | Description | |------|-------------| -| `--format <type>` | Output format: `text` (default), `json`, `markdown` | +| `--format <type>` | Output format: `text` (default), `json`, `markdown`, `agent`. `agent` emits the slim JSON decision (score, pass gate, `criticalDefectCount`, ranked `issues`) for AI agents. | | `--factors <list>` | Comma-separated factor IDs to run (runs all if omitted) | | `--include-geo` | Include the optional geographic signals factor | | `--include-agent-skills` | Include the optional agent skill exposure factor | | `--lighthouse` | Include the optional Lighthouse factor (Performance + Accessibility + Best Practices, mobile strategy) via Google PageSpeed Insights. Single-URL only; cannot combine with `--sitemap` or `--detect-platform`. Adds ~15-30s. Set `PAGESPEED_API_KEY` env var to lift anonymous rate limits. | | `--sitemap [url]` | Audit all pages from the sitemap. Auto-discovery tries `/sitemap.xml`, then `/sitemap-index.xml`, then `Sitemap:` directives in `/robots.txt`. Pass an explicit URL to override. | | `--limit <n>` | Max pages to audit in sitemap mode (default 200, sorted by sitemap priority) | -| `--top-issues` | In sitemap mode, skip per-page output and show only cross-cutting issues | +| `--top-issues` | In sitemap mode, skip per-page output and show only the cross-cutting issues and critical defects | | `--detect-platform` | Identify the platform/CMS/framework powering the site instead of running an audit | | `--urls <src>` | In `--detect-platform` mode, run on multiple URLs. 
`<src>` is a file path (one URL per line), a comma-separated list, or `-` for stdin | | `--concurrency <n>` | In `--detect-platform` batch mode, max in-flight fetches (default 5) | diff --git a/docs/finding-codes.md b/docs/finding-codes.md new file mode 100644 index 0000000..0eec9a9 --- /dev/null +++ b/docs/finding-codes.md @@ -0,0 +1,278 @@ +# Finding codes + +Every `AuditFinding` carries a stable `code` so integrations can key on a machine identifier instead of matching the human `message` string (which may change between releases). + +## Convention + +`<factor-id>.<check>[.<variant>]` — lowercase kebab-case, dot-separated. `<check>` names the sub-check (e.g. `h1`, `meta-description`); `<variant>` distinguishes the outcomes of one check (e.g. `missing`, `multiple`, `single`). All branches of one check share the `<check>` segment. Codes are stable across releases and unique across the tool. + +## Registry + +### Structured Data (JSON-LD) + +- `structured-data.json-ld.found` +- `structured-data.json-ld.missing` +- `structured-data.schema.found` +- `structured-data.schema.missing` +- `structured-data.schema-depth.strong` +- `structured-data.schema-depth.moderate` +- `structured-data.schema-depth.low` + +### Content Depth + +- `content-depth.word-count.strong` +- `content-depth.word-count.moderate` +- `content-depth.word-count.low` +- `content-depth.h1.single` +- `content-depth.h1.multiple` +- `content-depth.h1.missing` +- `content-depth.headings.strong` +- `content-depth.headings.moderate` +- `content-depth.headings.low` +- `content-depth.paragraphs.strong` +- `content-depth.paragraphs.moderate` +- `content-depth.paragraphs.low` +- `content-depth.lists.present` +- `content-depth.lists.none` + +### AI-Readable Content + +- `ai-readable-content.content-negotiation.found` +- `ai-readable-content.aux-resource.missing` +- `ai-readable-content.aux-resource.timeout` +- `ai-readable-content.aux-resource.unreachable` +- `ai-readable-content.aux-resource.not-html` +- 
`ai-readable-content.aux-resource.found` +- `ai-readable-content.llms-txt.strong` +- `ai-readable-content.llms-txt.short` +- `ai-readable-content.llms-full-txt.strong` +- `ai-readable-content.llms-full-txt.short` +- `ai-readable-content.robots-txt.found` +- `ai-readable-content.robots-txt.unreachable` +- `ai-readable-content.robots-txt.missing` +- `ai-readable-content.sitemap.found` +- `ai-readable-content.sitemap.unreachable` +- `ai-readable-content.sitemap.missing` +- `ai-readable-content.llms-txt-link.found` +- `ai-readable-content.llms-txt-link.missing` +- `ai-readable-content.markdown-endpoint.found` +- `ai-readable-content.markdown-endpoint.missing` + +### E-E-A-T Signals + +- `eeat-signals.author.credentialed` +- `eeat-signals.author.no-credentials` +- `eeat-signals.author.missing` +- `eeat-signals.author-meta.found` +- `eeat-signals.author-meta.missing` +- `eeat-signals.review.found` +- `eeat-signals.review.missing` +- `eeat-signals.trust-links.strong` +- `eeat-signals.trust-links.partial` +- `eeat-signals.trust-links.missing` +- `eeat-signals.organization.with-people` +- `eeat-signals.organization.no-people` +- `eeat-signals.organization.missing` + +### FAQ Content + +- `faq-content.faqpage.present` +- `faq-content.faqpage.missing` +- `faq-content.details.multiple` +- `faq-content.details.single` +- `faq-content.details.none` +- `faq-content.headings.multiple` +- `faq-content.headings.low` +- `faq-content.headings.missing` +- `faq-content.qa-pairs.multiple` +- `faq-content.qa-pairs.low` +- `faq-content.qa-pairs.none` + +### Citations & Authority Signals + +- `citations.external-links.strong` +- `citations.external-links.moderate` +- `citations.external-links.low` +- `citations.authoritative-domains.found` +- `citations.authoritative-domains.none` +- `citations.sameas.strong` +- `citations.sameas.moderate` +- `citations.sameas.missing` +- `citations.anchor-text.strong` +- `citations.anchor-text.moderate` +- `citations.anchor-text.low` + +### Schema 
Completeness + +- `schema-completeness.schema.none` +- `schema-completeness.local-business.strong` +- `schema-completeness.local-business.partial` +- `schema-completeness.local-business.low` +- `schema-completeness.faqpage.strong` +- `schema-completeness.faqpage.partial` +- `schema-completeness.faqpage.low` +- `schema-completeness.howto.strong` +- `schema-completeness.howto.partial` +- `schema-completeness.organization.strong` +- `schema-completeness.organization.partial` +- `schema-completeness.organization.low` +- `schema-completeness.schema-depth.moderate` +- `schema-completeness.schema-depth.low` + +### Schema Validity + +- `schema-validity.json-ld.none` +- `schema-validity.block.empty` +- `schema-validity.block.invalid` +- `schema-validity.singleton.duplicate` +- `schema-validity.block.valid` + +### Entity Consistency + +- `entity-consistency.name.missing` +- `entity-consistency.name.single` +- `entity-consistency.name.moderate` +- `entity-consistency.name.multiple` +- `entity-consistency.title.ok` +- `entity-consistency.title.long` +- `entity-consistency.canonical.present` +- `entity-consistency.canonical.missing` +- `entity-consistency.contact.ok` +- `entity-consistency.contact.partial` +- `entity-consistency.contact.missing` + +### Content Freshness + +- `content-freshness.date-modified.recent` +- `content-freshness.date-modified.moderate` +- `content-freshness.date-modified.stale` +- `content-freshness.date-modified.missing` +- `content-freshness.last-modified.recent` +- `content-freshness.last-modified.older` +- `content-freshness.last-modified.missing` +- `content-freshness.sitemap.recent` +- `content-freshness.sitemap.stale` +- `content-freshness.sitemap.no-match` +- `content-freshness.sitemap.timeout` +- `content-freshness.sitemap.unreachable` +- `content-freshness.sitemap.missing` +- `content-freshness.copyright.recent` +- `content-freshness.copyright.older` +- `content-freshness.copyright.missing` + +### Content Extractability + +- 
`content-extractability.content-ratio.strong` +- `content-extractability.content-ratio.moderate` +- `content-extractability.content-ratio.low` +- `content-extractability.citable-blocks.strong` +- `content-extractability.citable-blocks.moderate` +- `content-extractability.citable-blocks.missing` +- `content-extractability.paywall.found` +- `content-extractability.paywall.none` +- `content-extractability.ad-density.high` +- `content-extractability.ad-density.low` +- `content-extractability.ad-density.none` +- `content-extractability.direct-answer.strong` +- `content-extractability.direct-answer.moderate` +- `content-extractability.direct-answer.none` + +### Definition Blocks + +- `definition-blocks.headings.multiple` +- `definition-blocks.headings.single` +- `definition-blocks.headings.missing` +- `definition-blocks.lists.found` +- `definition-blocks.lists.none` +- `definition-blocks.schema.found` +- `definition-blocks.schema.missing` +- `definition-blocks.dl.found` +- `definition-blocks.dl.none` + +### AI Crawler Access + +- `ai-crawler-access.robots-txt.missing` +- `ai-crawler-access.robots-txt.unreachable` +- `ai-crawler-access.crawler.allowed` +- `ai-crawler-access.crawler.blocked` +- `ai-crawler-access.sitemap.found` +- `ai-crawler-access.content-signal.found` + +### Named Entities + +- `named-entities.brand-name.strong` +- `named-entities.brand-name.low` +- `named-entities.brand-name.missing` +- `named-entities.entity-name.missing` +- `named-entities.knows-about.present` +- `named-entities.knows-about.missing` +- `named-entities.proper-noun-density.strong` +- `named-entities.proper-noun-density.moderate` +- `named-entities.proper-noun-density.low` + +### Technical SEO + +- `technical-seo.h1.single` +- `technical-seo.h1.missing` +- `technical-seo.h1.multiple` +- `technical-seo.alt-text.none` +- `technical-seo.alt-text.ok` +- `technical-seo.alt-text.missing` +- `technical-seo.alt-text.empty` +- `technical-seo.meta-description.missing` +- 
`technical-seo.meta-description.short` +- `technical-seo.meta-description.long` +- `technical-seo.meta-description.present` +- `technical-seo.canonical.missing` +- `technical-seo.canonical.present` + +### Snippet Eligibility + +- `snippet-eligibility.directives.none` +- `snippet-eligibility.noindex.present` +- `snippet-eligibility.nosnippet.present` +- `snippet-eligibility.max-snippet.zero` +- `snippet-eligibility.max-snippet.low` +- `snippet-eligibility.noarchive.present` +- `snippet-eligibility.noimageindex.present` +- `snippet-eligibility.directives.not-restrictive` + +### Geographic Signals (optional) + +- `geographic-signals.localbusiness-schema.found` +- `geographic-signals.localbusiness-schema.missing` +- `geographic-signals.geo-coordinates.found` +- `geographic-signals.geo-coordinates.missing` +- `geographic-signals.postal-address.found` +- `geographic-signals.postal-address.missing` +- `geographic-signals.area-served.found` +- `geographic-signals.area-served.missing` +- `geographic-signals.geo-meta.found` +- `geographic-signals.geo-meta.missing` +- `geographic-signals.visible-location.found` +- `geographic-signals.visible-location.missing` + +### Agent Skill Exposure (optional) + +- `agent-skill-exposure.schema-action.well-formed` +- `agent-skill-exposure.schema-action.partial` +- `agent-skill-exposure.schema-action.missing` +- `agent-skill-exposure.mcp-discovery.found` +- `agent-skill-exposure.mcp-discovery.missing` +- `agent-skill-exposure.a2a-agent-card.found` +- `agent-skill-exposure.a2a-agent-card.missing` +- `agent-skill-exposure.openapi.found` +- `agent-skill-exposure.openapi.missing` +- `agent-skill-exposure.microdata.found` +- `agent-skill-exposure.microdata.missing` +- `agent-skill-exposure.forms.none` +- `agent-skill-exposure.forms.strong` +- `agent-skill-exposure.forms.partial` +- `agent-skill-exposure.forms.weak` + +### Lighthouse (optional) + +- `lighthouse.psi.unreachable` +- `lighthouse.category.missing` +- `lighthouse.category.score` +- 
`lighthouse.category.none` diff --git a/package.json b/package.json index 454d56a..cc3c13a 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "@ainyc/aeo-audit", - "version": "1.13.0", + "version": "2.1.0", "description": "The most comprehensive open-source Answer Engine Optimization (AEO) audit tool. Scores websites across 16 ranking factors that determine AI citation.", "type": "module", "main": "./dist/index.js", diff --git a/skills/aeo/SKILL.md b/skills/aeo/SKILL.md index 295f749..3b43817 100644 --- a/skills/aeo/SKILL.md +++ b/skills/aeo/SKILL.md @@ -52,6 +52,7 @@ If no mode is provided, default to `audit`. - `audit https://example.com --sitemap` - `audit https://example.com --sitemap --limit 10` - `audit https://example.com --sitemap --top-issues` +- `audit https://example.com --sitemap --format agent` (slim decision for agents) - `audit https://example.com --lighthouse` - `audit https://example.com --require-meta` - `audit https://example.com --sitemap --require-meta` @@ -108,7 +109,7 @@ npx @ainyc/aeo-audit@1 "<url>" --sitemap --top-issues --format json Flags: - `--sitemap [url]` — auto-discover the sitemap (tries `/sitemap.xml`, then `/sitemap-index.xml`, then `Sitemap:` directives in `/robots.txt`) or provide an explicit URL - `--limit <n>` — cap pages audited (default 200, sorted by sitemap priority) -- `--top-issues` — skip per-page output, show only cross-cutting patterns +- `--top-issues` — skip per-page output, show only cross-cutting patterns and critical defects - `--rewrite-sitemap-origin` — rewrite every `<loc>`'s origin to the target URL's origin (preserving path/query) before crawling. Use when the sitemap hardcodes the prod/canonical domain but you want to audit a staging host or local dev server. 
- `--require-meta` — force exit `1` if any audited page is missing `<meta name="description">`, regardless of overall score (useful as a CI gate) - `--include-geo` / `--include-agent-skills` — honored per page in sitemap mode (adds the optional geographic-signals / agent-skill-exposure factors). `--lighthouse` is not available with `--sitemap`. @@ -117,9 +118,17 @@ Pages are audited with bounded concurrency (5 in flight) to avoid hammering the Returns: - Per-page scores and grades +- **Critical defects** — binary, one-line-fix structural defects (an `<h1>` count other than one, a missing `<title>`, a missing meta description) surfaced **regardless of how few pages they affect**, with the offending pages named (homepage and high sitemap-`priority` pages first). These would otherwise be averaged into a passing factor grade; the JSON field is `criticalDefects` and critical-severity ones are also promoted to the top of `prioritizedFixes`. Shown even with `--top-issues`. - Cross-cutting issues (factors failing across multiple pages) - Aggregate score and grade -- Prioritized fixes ranked by site-wide impact +- Prioritized fixes (critical defects first, then ranked by site-wide impact) + +#### Machine-readable output (for agents) + +Use `--format json` for the full report, or **`--format agent`** for just the decision: `{ schemaVersion, tool, mode, url, score, grade, pass, criticalDefectCount, issues }`, where `issues` is the ranked `prioritizedFixes` and the per-factor/per-page detail is omitted. Prefer `--format agent` when you only need to decide and act. Key fields for acting on the result without parsing prose: +- `schemaVersion` (on every audit report) versions the JSON shape independently of the package version — pin to it and treat a major bump as breaking; absence means a pre-2.0 report. 
+- `prioritizedFixes` is a ranked array of objects, each with a stable `id`, `kind`, optional `severity`, the complete `affectedPages` list (never truncated), `affectsHomepage`, `prevalencePct`, and a human `summary`. It's the pre-computed to-do list — no need to re-rank factor scores yourself. +- Stable identifiers everywhere — `criticalDefects[].id`, `prioritizedFixes[].id`, and every factor finding's `code` (e.g. `technical-seo.h1.multiple`) — let integrations key on codes rather than message strings. #### Auxiliary File Diagnostics diff --git a/src/agent-summary.ts b/src/agent-summary.ts new file mode 100644 index 0000000..0b70708 --- /dev/null +++ b/src/agent-summary.ts @@ -0,0 +1,48 @@ +import { buildCriticalDefects } from './critical-defects.js' +import { buildCrossCuttingIssues, buildPrioritizedFixes } from './sitemap.js' +import type { AgentSummary, AuditReport, SitemapAuditReport } from './types.js' + +const TOOL = '@ainyc/aeo-audit' + +// The score >= 70 gate, mirrored from the CLI's exit-code rule. Kept as a named +// constant so the agent surface and the exit code can't drift apart. +const PASS_THRESHOLD = 70 + +/** + * Reduce a single-page `AuditReport` to the decision an agent acts on. The ranked + * `issues` list is computed by running the same critical-defect and cross-cutting + * aggregation used for sitemaps over a one-page "site", so single-URL and sitemap + * runs return the identical `PrioritizedFix` shape. 
+ */ +export function agentSummaryFromAudit(report: AuditReport): AgentSummary { + const criticalDefects = buildCriticalDefects([report]) + const crossCutting = buildCrossCuttingIssues([report]) + const issues = buildPrioritizedFixes(crossCutting, 1, criticalDefects) + + return { + schemaVersion: report.schemaVersion, + tool: TOOL, + mode: 'single', + url: report.finalUrl, + score: report.overallScore, + grade: report.overallGrade, + pass: report.overallScore >= PASS_THRESHOLD, + criticalDefectCount: criticalDefects.filter((g) => g.severity === 'critical').length, + issues, + } +} + +/** Reduce a multi-page `SitemapAuditReport` to the same decision shape. */ +export function agentSummaryFromSitemap(report: SitemapAuditReport): AgentSummary { + return { + schemaVersion: report.schemaVersion, + tool: TOOL, + mode: 'sitemap', + url: report.sitemapUrl, + score: report.aggregateScore, + grade: report.aggregateGrade, + pass: report.aggregateScore >= PASS_THRESHOLD, + criticalDefectCount: report.criticalDefects.filter((g) => g.severity === 'critical').length, + issues: report.prioritizedFixes, + } +} diff --git a/src/analyzers/agent-skill-exposure.ts b/src/analyzers/agent-skill-exposure.ts index 5eeafac..38ec5cf 100644 --- a/src/analyzers/agent-skill-exposure.ts +++ b/src/analyzers/agent-skill-exposure.ts @@ -124,15 +124,15 @@ export function analyzeAgentSkillExposure(context: AuditContext): AnalysisResult if (wellFormed.length > 0) { score += 35 const types = [...new Set(wellFormed.map((a) => a.type))].slice(0, 3).join(', ') - findings.push({ type: 'found', message: `Schema.org Action markup declared with target and inputs: ${types}.` }) + findings.push({ type: 'found', code: 'agent-skill-exposure.schema-action.well-formed', message: `Schema.org Action markup declared with target and inputs: ${types}.` }) } else { score += 18 const types = [...new Set(actions.map((a) => a.type))].slice(0, 3).join(', ') - findings.push({ type: 'info', message: `Schema.org Action types 
present (${types}) but missing target/urlTemplate or query-input/object shape.` }) + findings.push({ type: 'info', code: 'agent-skill-exposure.schema-action.partial', message: `Schema.org Action types present (${types}) but missing target/urlTemplate or query-input/object shape.` }) recommendations.push('Add target (with urlTemplate) and query-input/object to Action schema so agents know how to invoke it.') } } else { - findings.push({ type: 'missing', message: 'No Schema.org Action markup detected (PotentialAction / SearchAction / OrderAction / etc.).' }) + findings.push({ type: 'missing', code: 'agent-skill-exposure.schema-action.missing', message: 'No Schema.org Action markup detected (PotentialAction / SearchAction / OrderAction / etc.).' }) recommendations.push('Declare interactive affordances with Schema.org Action markup (e.g. SearchAction with urlTemplate and query-input) so agents can invoke them as tools.') } @@ -149,9 +149,9 @@ export function analyzeAgentSkillExposure(context: AuditContext): AnalysisResult : mcpMeta.length ? `<meta name="${mcpMeta.attr('name')}">` : 'Link header' - findings.push({ type: 'found', message: `Agent protocol discovery present (${src}).` }) + findings.push({ type: 'found', code: 'agent-skill-exposure.mcp-discovery.found', message: `Agent protocol discovery present (${src}).` }) } else { - findings.push({ type: 'missing', message: 'No MCP / WebMCP / ai-plugin discovery link or header.' }) + findings.push({ type: 'missing', code: 'agent-skill-exposure.mcp-discovery.missing', message: 'No MCP / WebMCP / ai-plugin discovery link or header.' 
}) recommendations.push('Expose an MCP server card via <link rel="mcp" href="/.well-known/mcp.json"> or a Link header so agents can discover your tools.') } @@ -166,9 +166,9 @@ export function analyzeAgentSkillExposure(context: AuditContext): AnalysisResult const agentCardHeader = /rel="?(agent-card|a2a)"?/i.test(linkHeader) if (agentCardLink.length || agentCardMeta.length || agentCardHeader) { score += 12 - findings.push({ type: 'found', message: 'A2A agent card discovery present — agents can fetch an agent card to negotiate capabilities.' }) + findings.push({ type: 'found', code: 'agent-skill-exposure.a2a-agent-card.found', message: 'A2A agent card discovery present — agents can fetch an agent card to negotiate capabilities.' }) } else { - findings.push({ type: 'info', message: 'No A2A agent card discovery (no link/meta/Link header pointing to an agent card).' }) + findings.push({ type: 'info', code: 'agent-skill-exposure.a2a-agent-card.missing', message: 'No A2A agent card discovery (no link/meta/Link header pointing to an agent card).' }) recommendations.push( `Publish an A2A agent card and advertise it via <link rel="agent-card" href="/.well-known/agent.json"> or a Link header. ${specCitation('a2a-agent-cards')}`, ) @@ -180,9 +180,9 @@ export function analyzeAgentSkillExposure(context: AuditContext): AnalysisResult ).first() if (openapiLink.length) { score += 10 - findings.push({ type: 'found', message: `Service description link found (type="${openapiLink.attr('type') || 'unspecified'}").` }) + findings.push({ type: 'found', code: 'agent-skill-exposure.openapi.found', message: `Service description link found (type="${openapiLink.attr('type') || 'unspecified'}").` }) } else { - findings.push({ type: 'info', message: 'No OpenAPI / service-description link found.' }) + findings.push({ type: 'info', code: 'agent-skill-exposure.openapi.missing', message: 'No OpenAPI / service-description link found.' 
}) recommendations.push('Link to an OpenAPI document via <link rel="describedby" type="application/openapi+json"> so agents can see the underlying endpoint shape.') } @@ -191,9 +191,9 @@ export function analyzeAgentSkillExposure(context: AuditContext): AnalysisResult const itemtypeCount = $('[itemtype]').length if (itempropCount >= 3 || itemtypeCount >= 1) { score += 10 - findings.push({ type: 'found', message: `Microdata present (${itempropCount} itemprop, ${itemtypeCount} itemtype) — helps agents map semantic meaning.` }) + findings.push({ type: 'found', code: 'agent-skill-exposure.microdata.found', message: `Microdata present (${itempropCount} itemprop, ${itemtypeCount} itemtype) — helps agents map semantic meaning.` }) } else { - findings.push({ type: 'info', message: 'Little or no microdata (itemprop / itemtype) found on the page.' }) + findings.push({ type: 'info', code: 'agent-skill-exposure.microdata.missing', message: 'Little or no microdata (itemprop / itemtype) found on the page.' }) } // ── Form structural fallback (up to 25) ───────────────────────────────── @@ -207,7 +207,7 @@ export function analyzeAgentSkillExposure(context: AuditContext): AnalysisResult }) if (candidateForms.length === 0) { - findings.push({ type: 'info', message: 'No interactive forms detected on this page.' }) + findings.push({ type: 'info', code: 'agent-skill-exposure.forms.none', message: 'No interactive forms detected on this page.' 
}) } else { const perFormScores: number[] = [] candidateForms.each((_, el) => { @@ -218,12 +218,12 @@ export function analyzeAgentSkillExposure(context: AuditContext): AnalysisResult score += formContribution if (avg >= 80) { - findings.push({ type: 'found', message: `${candidateForms.length} form(s) with strong agent-usable structure (labels, autocomplete, semantic types).` }) + findings.push({ type: 'found', code: 'agent-skill-exposure.forms.strong', message: `${candidateForms.length} form(s) with strong agent-usable structure (labels, autocomplete, semantic types).` }) } else if (avg >= 40) { - findings.push({ type: 'info', message: `${candidateForms.length} form(s) partially agent-usable. Average structure score ${Math.round(avg)}/100.` }) + findings.push({ type: 'info', code: 'agent-skill-exposure.forms.partial', message: `${candidateForms.length} form(s) partially agent-usable. Average structure score ${Math.round(avg)}/100.` }) recommendations.push('Strengthen forms with aria-label / <label for>, autocomplete tokens (email, tel, street-address…), and semantic input types (email, tel, number, date).') } else { - findings.push({ type: 'missing', message: `${candidateForms.length} form(s) have weak structure for agent use (avg ${Math.round(avg)}/100). Inputs lack labels, autocomplete, or semantic types.` }) + findings.push({ type: 'missing', code: 'agent-skill-exposure.forms.weak', message: `${candidateForms.length} form(s) have weak structure for agent use (avg ${Math.round(avg)}/100). 
Inputs lack labels, autocomplete, or semantic types.` }) recommendations.push('Add <label for> or aria-label to every input, set autocomplete tokens, and use semantic input types so agents can identify each field without guessing.') } } diff --git a/src/analyzers/ai-crawler-access.ts b/src/analyzers/ai-crawler-access.ts index 4323a7b..8e0f22f 100644 --- a/src/analyzers/ai-crawler-access.ts +++ b/src/analyzers/ai-crawler-access.ts @@ -116,11 +116,11 @@ export function analyzeAiCrawlerAccess(context: AuditContext): AnalysisResult { if (robotsState === 'missing') { // No robots.txt means everything is allowed score += 80 - findings.push({ type: 'info', message: 'No robots.txt found — AI crawlers are implicitly allowed.' }) + findings.push({ type: 'info', code: 'ai-crawler-access.robots-txt.missing', message: 'No robots.txt found — AI crawlers are implicitly allowed.' }) recommendations.push('Add a robots.txt that explicitly allows AI crawlers for clarity.') } else { score += 30 - findings.push({ type: robotsState === 'timeout' ? 'timeout' : 'unreachable', message: 'Could not reliably fetch robots.txt.' }) + findings.push({ type: robotsState === 'timeout' ? 'timeout' : 'unreachable', code: 'ai-crawler-access.robots-txt.unreachable', message: 'Could not reliably fetch robots.txt.' 
}) } return { score: clampScore(score), findings, recommendations } @@ -146,10 +146,10 @@ export function analyzeAiCrawlerAccess(context: AuditContext): AnalysisResult { if (allowed) { _allowedCount += 1 score += crawler.points - findings.push({ type: 'found', message: `${crawler.name} is allowed by robots.txt.` }) + findings.push({ type: 'found', code: 'ai-crawler-access.crawler.allowed', message: `${crawler.name} is allowed by robots.txt.` }) } else { blockedBots.push(crawler.name) - findings.push({ type: 'missing', message: `${crawler.name} is blocked by robots.txt.` }) + findings.push({ type: 'missing', code: 'ai-crawler-access.crawler.blocked', message: `${crawler.name} is blocked by robots.txt.` }) } } @@ -160,14 +160,14 @@ export function analyzeAiCrawlerAccess(context: AuditContext): AnalysisResult { // Bonus for explicit sitemap directive if (robotsTxt.toLowerCase().includes('sitemap:')) { score += 18 - findings.push({ type: 'found', message: 'Sitemap directive found in robots.txt.' }) + findings.push({ type: 'found', code: 'ai-crawler-access.sitemap.found', message: 'Sitemap directive found in robots.txt.' }) } // Content Signals — machine-readable AI usage preferences in robots.txt // (specification.website: content-signals). if (/^\s*content-signal\s*:/im.test(robotsTxt)) { score += 8 - findings.push({ type: 'found', message: 'robots.txt declares Content-Signal directives — AI search/input/train preferences are machine-readable.' }) + findings.push({ type: 'found', code: 'ai-crawler-access.content-signal.found', message: 'robots.txt declares Content-Signal directives — AI search/input/train preferences are machine-readable.' }) } else { recommendations.push( `Declare AI usage preferences with a Content-Signal directive in robots.txt (e.g. "Content-Signal: search=yes, ai-input=yes, ai-train=no"). 
${specCitation('content-signals')}`, diff --git a/src/analyzers/ai-readable-content.ts b/src/analyzers/ai-readable-content.ts index 2ffc1a4..0535530 100644 --- a/src/analyzers/ai-readable-content.ts +++ b/src/analyzers/ai-readable-content.ts @@ -26,6 +26,7 @@ function pushDiagnosticFindings( if (diagnostics.contentNegotiation) { findings.push({ type: 'info', + code: 'ai-readable-content.content-negotiation.found', message: `${label} returns a non-2xx response when fetched with \`Accept: text/markdown\` — content negotiation hides it from AI content extraction tools that prefer markdown.`, }) recommendations.push( @@ -42,27 +43,27 @@ function scoreAuxState( recommendations: string[], ): number { if (!auxEntry || auxEntry.state === 'missing') { - findings.push({ type: 'missing', message: missingMessage }) + findings.push({ type: 'missing', code: 'ai-readable-content.aux-resource.missing', message: missingMessage }) recommendations.push(`Create ${missingMessage.split(' ')[0]} at your site root.`) return 0 } if (auxEntry.state === 'timeout') { - findings.push({ type: 'timeout', message: unavailableMessage }) + findings.push({ type: 'timeout', code: 'ai-readable-content.aux-resource.timeout', message: unavailableMessage }) return 8 } if (auxEntry.state === 'unreachable') { - findings.push({ type: 'unreachable', message: unavailableMessage }) + findings.push({ type: 'unreachable', code: 'ai-readable-content.aux-resource.unreachable', message: unavailableMessage }) return 8 } if (auxEntry.state === 'not-html') { - findings.push({ type: 'info', message: `${missingMessage.split(' ')[0]} returned an unexpected content type.` }) + findings.push({ type: 'info', code: 'ai-readable-content.aux-resource.not-html', message: `${missingMessage.split(' ')[0]} returned an unexpected content type.` }) return 10 } - findings.push({ type: 'found', message: `${missingMessage.split(' ')[0]} is available.` }) + findings.push({ type: 'found', code: 'ai-readable-content.aux-resource.found', 
message: `${missingMessage.split(' ')[0]} is available.` }) return 24 } @@ -86,9 +87,9 @@ export function analyzeAiReadableContent(context: AuditContext): AnalysisResult const wordCount = countWords(auxiliary.llmsTxt.body || '') if (wordCount >= 100) { score += 8 - findings.push({ type: 'found', message: '/llms.txt has useful content depth.' }) + findings.push({ type: 'found', code: 'ai-readable-content.llms-txt.strong', message: '/llms.txt has useful content depth.' }) } else { - findings.push({ type: 'info', message: '/llms.txt is present but short.' }) + findings.push({ type: 'info', code: 'ai-readable-content.llms-txt.short', message: '/llms.txt is present but short.' }) recommendations.push('Expand /llms.txt with concise service and entity context.') } } @@ -107,9 +108,9 @@ export function analyzeAiReadableContent(context: AuditContext): AnalysisResult const wordCount = countWords(auxiliary.llmsFullTxt.body || '') if (wordCount >= 200) { score += 10 - findings.push({ type: 'found', message: '/llms-full.txt has strong long-form coverage.' }) + findings.push({ type: 'found', code: 'ai-readable-content.llms-full-txt.strong', message: '/llms-full.txt has strong long-form coverage.' }) } else { - findings.push({ type: 'info', message: '/llms-full.txt exists but lacks detail.' }) + findings.push({ type: 'info', code: 'ai-readable-content.llms-full-txt.short', message: '/llms-full.txt exists but lacks detail.' }) recommendations.push('Add complete offerings, FAQ, and service-area coverage to /llms-full.txt.') } } @@ -118,12 +119,12 @@ export function analyzeAiReadableContent(context: AuditContext): AnalysisResult const robotsState = auxiliary.robotsTxt?.state if (robotsState === 'ok') { score += 16 - findings.push({ type: 'found', message: 'robots.txt is accessible.' }) + findings.push({ type: 'found', code: 'ai-readable-content.robots-txt.found', message: 'robots.txt is accessible.' 
}) } else if (robotsState === 'timeout' || robotsState === 'unreachable') { score += 6 - findings.push({ type: robotsState, message: 'Could not reliably fetch /robots.txt.' }) + findings.push({ type: robotsState, code: 'ai-readable-content.robots-txt.unreachable', message: 'Could not reliably fetch /robots.txt.' }) } else { - findings.push({ type: 'missing', message: '/robots.txt is missing.' }) + findings.push({ type: 'missing', code: 'ai-readable-content.robots-txt.missing', message: '/robots.txt is missing.' }) recommendations.push('Add a robots.txt file.') } pushDiagnosticFindings('/robots.txt', auxiliary.robotsTxt, findings, recommendations) @@ -132,12 +133,12 @@ export function analyzeAiReadableContent(context: AuditContext): AnalysisResult const sitemapState = auxiliary.sitemapXml?.state if (sitemapState === 'ok') { score += 16 - findings.push({ type: 'found', message: 'sitemap.xml is accessible.' }) + findings.push({ type: 'found', code: 'ai-readable-content.sitemap.found', message: 'sitemap.xml is accessible.' }) } else if (sitemapState === 'timeout' || sitemapState === 'unreachable') { score += 6 - findings.push({ type: sitemapState, message: 'Could not reliably fetch /sitemap.xml.' }) + findings.push({ type: sitemapState, code: 'ai-readable-content.sitemap.unreachable', message: 'Could not reliably fetch /sitemap.xml.' }) } else { - findings.push({ type: 'missing', message: '/sitemap.xml is missing.' }) + findings.push({ type: 'missing', code: 'ai-readable-content.sitemap.missing', message: '/sitemap.xml is missing.' }) recommendations.push('Add a sitemap.xml file.') } pushDiagnosticFindings('/sitemap.xml', auxiliary.sitemapXml, findings, recommendations) @@ -146,9 +147,9 @@ export function analyzeAiReadableContent(context: AuditContext): AnalysisResult const llmsLink = context.$('link[href*="llms.txt"]').length > 0 if (llmsLink) { score += 10 - findings.push({ type: 'found', message: 'HTML head links to llms.txt.' 
}) + findings.push({ type: 'found', code: 'ai-readable-content.llms-txt-link.found', message: 'HTML head links to llms.txt.' }) } else { - findings.push({ type: 'info', message: 'No llms.txt link detected in <head>.' }) + findings.push({ type: 'info', code: 'ai-readable-content.llms-txt-link.missing', message: 'No llms.txt link detected in <head>.' }) recommendations.push('Add a <link> reference to /llms.txt in your document head.') } @@ -160,9 +161,9 @@ export function analyzeAiReadableContent(context: AuditContext): AnalysisResult const markdownLinkHeader = /type="?text\/markdown"?/i.test(linkHeader) if (markdownLinkTag || markdownLinkHeader) { score += 10 - findings.push({ type: 'found', message: 'Per-page Markdown source endpoint advertised (text/markdown alternate) — agents can fetch unrendered source.' }) + findings.push({ type: 'found', code: 'ai-readable-content.markdown-endpoint.found', message: 'Per-page Markdown source endpoint advertised (text/markdown alternate) — agents can fetch unrendered source.' }) } else { - findings.push({ type: 'info', message: 'No per-page Markdown source endpoint advertised (text/markdown alternate link or Link header).' }) + findings.push({ type: 'info', code: 'ai-readable-content.markdown-endpoint.missing', message: 'No per-page Markdown source endpoint advertised (text/markdown alternate link or Link header).' }) recommendations.push( `Expose a Markdown version of each page (a .md URL or content negotiation) and advertise it via <link rel="alternate" type="text/markdown">. 
${specCitation('markdown-source-endpoints')}`, ) diff --git a/src/analyzers/citations.ts b/src/analyzers/citations.ts index 01e8969..3cc8762 100644 --- a/src/analyzers/citations.ts +++ b/src/analyzers/citations.ts @@ -58,13 +58,13 @@ export function analyzeCitations(context: AuditContext): AnalysisResult { if (externalLinks.length >= 8) { score += 30 - findings.push({ type: 'found', message: `Strong external citation coverage (${externalLinks.length} links).` }) + findings.push({ type: 'found', code: 'citations.external-links.strong', message: `Strong external citation coverage (${externalLinks.length} links).` }) } else if (externalLinks.length >= 3) { score += 18 - findings.push({ type: 'info', message: `Moderate external citation coverage (${externalLinks.length} links).` }) + findings.push({ type: 'info', code: 'citations.external-links.moderate', message: `Moderate external citation coverage (${externalLinks.length} links).` }) } else { score += 6 - findings.push({ type: 'missing', message: 'Limited external citations detected.' }) + findings.push({ type: 'missing', code: 'citations.external-links.low', message: 'Limited external citations detected.' }) recommendations.push('Reference authoritative third-party sources to strengthen trust signals.') } @@ -76,10 +76,10 @@ export function analyzeCitations(context: AuditContext): AnalysisResult { if (authoritativeLinks.length > 0) { score += 24 - findings.push({ type: 'found', message: 'Authoritative domain citations detected (.gov/.edu/Wikipedia).' }) + findings.push({ type: 'found', code: 'citations.authoritative-domains.found', message: 'Authoritative domain citations detected (.gov/.edu/Wikipedia).' }) } else { score += 8 - findings.push({ type: 'info', message: 'No clearly authoritative domains detected in external links.' }) + findings.push({ type: 'info', code: 'citations.authoritative-domains.none', message: 'No clearly authoritative domains detected in external links.' 
}) } let sameAsCount = 0 @@ -94,13 +94,13 @@ export function analyzeCitations(context: AuditContext): AnalysisResult { if (sameAsCount >= 3) { score += 24 - findings.push({ type: 'found', message: `Structured data includes ${sameAsCount} sameAs link(s).` }) + findings.push({ type: 'found', code: 'citations.sameas.strong', message: `Structured data includes ${sameAsCount} sameAs link(s).` }) } else if (sameAsCount > 0) { score += 14 - findings.push({ type: 'info', message: `Structured data includes ${sameAsCount} sameAs link(s).` }) + findings.push({ type: 'info', code: 'citations.sameas.moderate', message: `Structured data includes ${sameAsCount} sameAs link(s).` }) } else { score += 4 - findings.push({ type: 'missing', message: 'No sameAs references found in structured data.' }) + findings.push({ type: 'missing', code: 'citations.sameas.missing', message: 'No sameAs references found in structured data.' }) recommendations.push('Add sameAs references for key profiles/directories in JSON-LD.') } @@ -111,13 +111,13 @@ export function analyzeCitations(context: AuditContext): AnalysisResult { if (quality >= 0.75) { score += 22 - findings.push({ type: 'found', message: 'External anchor text quality is strong.' }) + findings.push({ type: 'found', code: 'citations.anchor-text.strong', message: 'External anchor text quality is strong.' }) } else if (quality >= 0.45) { score += 12 - findings.push({ type: 'info', message: 'Anchor text quality is moderate.' }) + findings.push({ type: 'info', code: 'citations.anchor-text.moderate', message: 'Anchor text quality is moderate.' }) } else { score += 6 - findings.push({ type: 'info', message: 'Anchor text quality is weak or generic.' }) + findings.push({ type: 'info', code: 'citations.anchor-text.low', message: 'Anchor text quality is weak or generic.' 
}) recommendations.push('Use descriptive external anchor text instead of generic labels.') } diff --git a/src/analyzers/content-depth.ts b/src/analyzers/content-depth.ts index d5ac40f..3946830 100644 --- a/src/analyzers/content-depth.ts +++ b/src/analyzers/content-depth.ts @@ -16,57 +16,57 @@ export function analyzeContentDepth(context: AuditContext): AnalysisResult { if (wordCount >= 1200) { score += 35 - findings.push({ type: 'found', message: `Strong visible content depth (${wordCount} words).` }) + findings.push({ type: 'found', code: 'content-depth.word-count.strong', message: `Strong visible content depth (${wordCount} words).` }) } else if (wordCount >= 500) { score += 22 - findings.push({ type: 'info', message: `Moderate content depth (${wordCount} words).` }) + findings.push({ type: 'info', code: 'content-depth.word-count.moderate', message: `Moderate content depth (${wordCount} words).` }) recommendations.push('Increase topical depth with more explanatory content.') } else { score += 8 - findings.push({ type: 'missing', message: `Low content depth (${wordCount} words).` }) + findings.push({ type: 'missing', code: 'content-depth.word-count.low', message: `Low content depth (${wordCount} words).` }) recommendations.push('Add more comprehensive copy covering key user questions.') } if (h1Count === 1) { score += 15 - findings.push({ type: 'found', message: 'Exactly one H1 detected.' }) + findings.push({ type: 'found', code: 'content-depth.h1.single', message: 'Exactly one H1 detected.' }) } else if (h1Count > 1) { score += 6 - findings.push({ type: 'info', message: `Multiple H1 elements detected (${h1Count}).` }) + findings.push({ type: 'info', code: 'content-depth.h1.multiple', message: `Multiple H1 elements detected (${h1Count}).` }) recommendations.push('Use a single primary H1 and nest additional sections under H2/H3.') } else { - findings.push({ type: 'missing', message: 'No H1 heading detected.' 
}) + findings.push({ type: 'missing', code: 'content-depth.h1.missing', message: 'No H1 heading detected.' }) recommendations.push('Add an H1 that clearly defines the page topic.') } if (h2Count >= 3 && h3Count >= 2) { score += 22 - findings.push({ type: 'found', message: 'Heading hierarchy (H2/H3) is well developed.' }) + findings.push({ type: 'found', code: 'content-depth.headings.strong', message: 'Heading hierarchy (H2/H3) is well developed.' }) } else if (h2Count >= 2) { score += 14 - findings.push({ type: 'info', message: 'Basic heading hierarchy detected.' }) + findings.push({ type: 'info', code: 'content-depth.headings.moderate', message: 'Basic heading hierarchy detected.' }) recommendations.push('Expand section depth with additional H3 subsections.') } else { score += 4 - findings.push({ type: 'missing', message: 'Limited heading structure detected.' }) + findings.push({ type: 'missing', code: 'content-depth.headings.low', message: 'Limited heading structure detected.' }) recommendations.push('Break content into structured H2/H3 sections for parseability.') } if (paragraphCount >= 8) { score += 16 - findings.push({ type: 'found', message: 'Substantial paragraph-level content present.' }) + findings.push({ type: 'found', code: 'content-depth.paragraphs.strong', message: 'Substantial paragraph-level content present.' }) } else if (paragraphCount >= 4) { score += 10 - findings.push({ type: 'info', message: 'Some paragraph depth detected.' }) + findings.push({ type: 'info', code: 'content-depth.paragraphs.moderate', message: 'Some paragraph depth detected.' }) } else { - findings.push({ type: 'missing', message: 'Very few paragraph blocks detected.' }) + findings.push({ type: 'missing', code: 'content-depth.paragraphs.low', message: 'Very few paragraph blocks detected.' }) } if (listCount > 0) { score += 12 - findings.push({ type: 'found', message: 'Lists detected for structured information.' 
}) + findings.push({ type: 'found', code: 'content-depth.lists.present', message: 'Lists detected for structured information.' }) } else { - findings.push({ type: 'info', message: 'No list structures detected.' }) + findings.push({ type: 'info', code: 'content-depth.lists.none', message: 'No list structures detected.' }) recommendations.push('Use bullet/numbered lists for key concepts and process steps.') } diff --git a/src/analyzers/content-extractability.ts b/src/analyzers/content-extractability.ts index 558f45f..5426597 100644 --- a/src/analyzers/content-extractability.ts +++ b/src/analyzers/content-extractability.ts @@ -15,13 +15,13 @@ export function analyzeContentExtractability(context: AuditContext): AnalysisRes if (ratio > 0.3) { score += 25 - findings.push({ type: 'found', message: `Strong content-to-markup ratio (${(ratio * 100).toFixed(0)}%).` }) + findings.push({ type: 'found', code: 'content-extractability.content-ratio.strong', message: `Strong content-to-markup ratio (${(ratio * 100).toFixed(0)}%).` }) } else if (ratio > 0.15) { score += 15 - findings.push({ type: 'info', message: `Moderate content-to-markup ratio (${(ratio * 100).toFixed(0)}%).` }) + findings.push({ type: 'info', code: 'content-extractability.content-ratio.moderate', message: `Moderate content-to-markup ratio (${(ratio * 100).toFixed(0)}%).` }) } else { score += 5 - findings.push({ type: 'info', message: `Low content-to-markup ratio (${(ratio * 100).toFixed(0)}%).` }) + findings.push({ type: 'info', code: 'content-extractability.content-ratio.low', message: `Low content-to-markup ratio (${(ratio * 100).toFixed(0)}%).` }) recommendations.push('Reduce boilerplate HTML and increase content density.') } } @@ -38,14 +38,14 @@ export function analyzeContentExtractability(context: AuditContext): AnalysisRes if (citableBlocks >= 5) { score += 25 - findings.push({ type: 'found', message: `${citableBlocks} citation-ready text blocks found (40-200 words each).` }) + findings.push({ type: 
'found', code: 'content-extractability.citable-blocks.strong', message: `${citableBlocks} citation-ready text blocks found (40-200 words each).` }) } else if (citableBlocks >= 2) { score += 15 - findings.push({ type: 'info', message: `${citableBlocks} citation-ready text blocks found.` }) + findings.push({ type: 'info', code: 'content-extractability.citable-blocks.moderate', message: `${citableBlocks} citation-ready text blocks found.` }) recommendations.push('Add more substantive paragraphs (40-200 words) for citation extraction.') } else { score += 5 - findings.push({ type: 'missing', message: 'Few citation-ready text blocks detected.' }) + findings.push({ type: 'missing', code: 'content-extractability.citable-blocks.missing', message: 'Few citation-ready text blocks detected.' }) recommendations.push('Structure content into focused paragraphs of 40-200 words each.') } @@ -64,11 +64,11 @@ export function analyzeContentExtractability(context: AuditContext): AnalysisRes if (paywallSignals.length > 0) { score -= 20 - findings.push({ type: 'missing', message: `Content gate signals detected: ${paywallSignals.join(', ')}.` }) + findings.push({ type: 'missing', code: 'content-extractability.paywall.found', message: `Content gate signals detected: ${paywallSignals.join(', ')}.` }) recommendations.push('Ensure primary content is accessible without login/subscription for AI crawlers.') } else { score += 10 - findings.push({ type: 'found', message: 'No paywall or content gate signals detected.' }) + findings.push({ type: 'found', code: 'content-extractability.paywall.none', message: 'No paywall or content gate signals detected.' 
}) } // Ad density @@ -77,14 +77,14 @@ export function analyzeContentExtractability(context: AuditContext): AnalysisRes if (adCount >= 5) { score -= 15 - findings.push({ type: 'info', message: `High ad element density detected (${adCount} elements).` }) + findings.push({ type: 'info', code: 'content-extractability.ad-density.high', message: `High ad element density detected (${adCount} elements).` }) recommendations.push('Reduce ad density to improve content extractability for AI crawlers.') } else if (adCount > 0) { score += 5 - findings.push({ type: 'info', message: `Low ad density (${adCount} elements).` }) + findings.push({ type: 'info', code: 'content-extractability.ad-density.low', message: `Low ad density (${adCount} elements).` }) } else { score += 10 - findings.push({ type: 'found', message: 'No ad elements detected.' }) + findings.push({ type: 'found', code: 'content-extractability.ad-density.none', message: 'No ad elements detected.' }) } // Direct answer blocks (content immediately following H2/H3 that's 1-3 sentences) @@ -102,12 +102,12 @@ export function analyzeContentExtractability(context: AuditContext): AnalysisRes if (directAnswerCount >= 3) { score += 15 - findings.push({ type: 'found', message: `${directAnswerCount} direct-answer blocks follow headings.` }) + findings.push({ type: 'found', code: 'content-extractability.direct-answer.strong', message: `${directAnswerCount} direct-answer blocks follow headings.` }) } else if (directAnswerCount >= 1) { score += 8 - findings.push({ type: 'info', message: `${directAnswerCount} direct-answer block(s) follow headings.` }) + findings.push({ type: 'info', code: 'content-extractability.direct-answer.moderate', message: `${directAnswerCount} direct-answer block(s) follow headings.` }) } else { - findings.push({ type: 'info', message: 'No clear direct-answer blocks following headings.' 
}) + findings.push({ type: 'info', code: 'content-extractability.direct-answer.none', message: 'No clear direct-answer blocks following headings.' }) recommendations.push('Place concise 1-3 sentence answers immediately after H2/H3 headings.') } diff --git a/src/analyzers/content-freshness.ts b/src/analyzers/content-freshness.ts index 41a9b07..4105e7c 100644 --- a/src/analyzers/content-freshness.ts +++ b/src/analyzers/content-freshness.ts @@ -49,17 +49,17 @@ export function analyzeContentFreshness(context: AuditContext): AnalysisResult { if (months <= 3) { score += 35 - findings.push({ type: 'found', message: 'Structured data indicates recent updates (<= 3 months).' }) + findings.push({ type: 'found', code: 'content-freshness.date-modified.recent', message: 'Structured data indicates recent updates (<= 3 months).' }) } else if (months <= 12) { score += 22 - findings.push({ type: 'info', message: 'Structured data indicates updates within the last year.' }) + findings.push({ type: 'info', code: 'content-freshness.date-modified.moderate', message: 'Structured data indicates updates within the last year.' }) } else { score += 10 - findings.push({ type: 'info', message: 'Structured data suggests content may be stale.' }) + findings.push({ type: 'info', code: 'content-freshness.date-modified.stale', message: 'Structured data suggests content may be stale.' }) recommendations.push('Refresh key pages and update dateModified in structured data.') } } else { - findings.push({ type: 'missing', message: 'No dateModified field detected in structured data.' }) + findings.push({ type: 'missing', code: 'content-freshness.date-modified.missing', message: 'No dateModified field detected in structured data.' 
}) recommendations.push('Add dateModified to relevant structured data entities.') } @@ -69,13 +69,13 @@ export function analyzeContentFreshness(context: AuditContext): AnalysisResult { const months = monthsAgo(parsedHeaderDate) if (months <= 3) { score += 20 - findings.push({ type: 'found', message: 'HTTP Last-Modified header is recent.' }) + findings.push({ type: 'found', code: 'content-freshness.last-modified.recent', message: 'HTTP Last-Modified header is recent.' }) } else { score += 12 - findings.push({ type: 'info', message: 'HTTP Last-Modified header exists but is older.' }) + findings.push({ type: 'info', code: 'content-freshness.last-modified.older', message: 'HTTP Last-Modified header exists but is older.' }) } } else { - findings.push({ type: 'info', message: 'No usable Last-Modified response header detected.' }) + findings.push({ type: 'info', code: 'content-freshness.last-modified.missing', message: 'No usable Last-Modified response header detected.' }) } const sitemapState = context.auxiliary?.sitemapXml?.state @@ -87,24 +87,24 @@ export function analyzeContentFreshness(context: AuditContext): AnalysisResult { const months = monthsAgo(sitemapDate) if (months <= 3) { score += 22 - findings.push({ type: 'found', message: 'Sitemap lastmod indicates recent updates.' }) + findings.push({ type: 'found', code: 'content-freshness.sitemap.recent', message: 'Sitemap lastmod indicates recent updates.' }) } else { score += 12 - findings.push({ type: 'info', message: 'Sitemap lastmod exists but may be stale.' }) + findings.push({ type: 'info', code: 'content-freshness.sitemap.stale', message: 'Sitemap lastmod exists but may be stale.' }) } } else { score += 4 - findings.push({ type: 'info', message: 'Sitemap found but no matching lastmod for this URL.' }) + findings.push({ type: 'info', code: 'content-freshness.sitemap.no-match', message: 'Sitemap found but no matching lastmod for this URL.' 
}) recommendations.push('Add a <lastmod> entry for this URL in sitemap.xml.') } } else if (sitemapState === 'timeout') { score += 8 - findings.push({ type: 'timeout', message: 'Could not reliably fetch sitemap.xml.' }) + findings.push({ type: 'timeout', code: 'content-freshness.sitemap.timeout', message: 'Could not reliably fetch sitemap.xml.' }) } else if (sitemapState === 'unreachable') { score += 8 - findings.push({ type: 'unreachable', message: 'Could not reliably fetch sitemap.xml.' }) + findings.push({ type: 'unreachable', code: 'content-freshness.sitemap.unreachable', message: 'Could not reliably fetch sitemap.xml.' }) } else { - findings.push({ type: 'missing', message: 'sitemap.xml is missing or inaccessible.' }) + findings.push({ type: 'missing', code: 'content-freshness.sitemap.missing', message: 'sitemap.xml is missing or inaccessible.' }) } const yearMatch = context.textContent.match(/(?:©|copyright)?\s*(20\d{2})/i) @@ -113,14 +113,14 @@ export function analyzeContentFreshness(context: AuditContext): AnalysisResult { const currentYear = new Date().getUTCFullYear() if (year >= currentYear - 1) { score += 23 - findings.push({ type: 'found', message: `Recent copyright year detected (${year}).` }) + findings.push({ type: 'found', code: 'content-freshness.copyright.recent', message: `Recent copyright year detected (${year}).` }) } else { score += 12 - findings.push({ type: 'info', message: `Older copyright year detected (${year}).` }) + findings.push({ type: 'info', code: 'content-freshness.copyright.older', message: `Older copyright year detected (${year}).` }) } } else { score += 6 - findings.push({ type: 'info', message: 'No copyright year signal detected.' }) + findings.push({ type: 'info', code: 'content-freshness.copyright.missing', message: 'No copyright year signal detected.' 
}) } return { diff --git a/src/analyzers/definition-blocks.ts b/src/analyzers/definition-blocks.ts index 51f1ca4..b06e4ff 100644 --- a/src/analyzers/definition-blocks.ts +++ b/src/analyzers/definition-blocks.ts @@ -18,12 +18,12 @@ export function analyzeDefinitionBlocks(context: AuditContext): AnalysisResult { if (definitionHeadingCount >= 2) { score += 30 - findings.push({ type: 'found', message: 'Multiple definition-style headings detected.' }) + findings.push({ type: 'found', code: 'definition-blocks.headings.multiple', message: 'Multiple definition-style headings detected.' }) } else if (definitionHeadingCount === 1) { score += 18 - findings.push({ type: 'info', message: 'One definition-style heading detected.' }) + findings.push({ type: 'info', code: 'definition-blocks.headings.single', message: 'One definition-style heading detected.' }) } else { - findings.push({ type: 'missing', message: 'No definition-style headings detected.' }) + findings.push({ type: 'missing', code: 'definition-blocks.headings.missing', message: 'No definition-style headings detected.' }) recommendations.push('Add sections like "What is..." and "How to..." for direct-answer relevance.') } @@ -37,26 +37,26 @@ export function analyzeDefinitionBlocks(context: AuditContext): AnalysisResult { if (stepLists > 0) { score += 24 - findings.push({ type: 'found', message: 'Numbered step-by-step list(s) detected.' }) + findings.push({ type: 'found', code: 'definition-blocks.lists.found', message: 'Numbered step-by-step list(s) detected.' }) } else { - findings.push({ type: 'info', message: 'No substantial ordered step lists detected.' }) + findings.push({ type: 'info', code: 'definition-blocks.lists.none', message: 'No substantial ordered step lists detected.' 
}) recommendations.push('Include ordered steps for procedural topics.') } const schemaTypes = extractSchemaTypes(context.structuredData) if (schemaTypes.has('HowTo')) { score += 26 - findings.push({ type: 'found', message: 'HowTo schema detected.' }) + findings.push({ type: 'found', code: 'definition-blocks.schema.found', message: 'HowTo schema detected.' }) } else { - findings.push({ type: 'missing', message: 'HowTo schema not detected.' }) + findings.push({ type: 'missing', code: 'definition-blocks.schema.missing', message: 'HowTo schema not detected.' }) recommendations.push('Add HowTo schema where instructional content exists.') } if (context.$('dl').length > 0) { score += 20 - findings.push({ type: 'found', message: 'Definition list (<dl>) elements detected.' }) + findings.push({ type: 'found', code: 'definition-blocks.dl.found', message: 'Definition list (<dl>) elements detected.' }) } else { - findings.push({ type: 'info', message: 'No <dl> definition lists detected.' }) + findings.push({ type: 'info', code: 'definition-blocks.dl.none', message: 'No <dl> definition lists detected.' }) } return { diff --git a/src/analyzers/eeat-signals.ts b/src/analyzers/eeat-signals.ts index 584792f..d781975 100644 --- a/src/analyzers/eeat-signals.ts +++ b/src/analyzers/eeat-signals.ts @@ -35,13 +35,13 @@ export function analyzeEeatSignals(context: AuditContext): AnalysisResult { if (credentialedPersons.length > 0) { score += 25 - findings.push({ type: 'found', message: 'Person schema with credentials detected.' }) + findings.push({ type: 'found', code: 'eeat-signals.author.credentialed', message: 'Person schema with credentials detected.' }) } else if (persons.length > 0) { score += 12 - findings.push({ type: 'info', message: 'Person schema found but lacks credential properties.' }) + findings.push({ type: 'info', code: 'eeat-signals.author.no-credentials', message: 'Person schema found but lacks credential properties.' 
}) recommendations.push('Add jobTitle, alumniOf, or hasCredential to Person schema.') } else { - findings.push({ type: 'missing', message: 'No Person schema found.' }) + findings.push({ type: 'missing', code: 'eeat-signals.author.missing', message: 'No Person schema found.' }) recommendations.push('Add Person schema with expertise signals for key team members.') } @@ -49,9 +49,9 @@ export function analyzeEeatSignals(context: AuditContext): AnalysisResult { const authorMeta = context.$('meta[name="author"]').attr('content') if (authorMeta && authorMeta.trim()) { score += 15 - findings.push({ type: 'found', message: `Author meta tag found: "${authorMeta.trim()}".` }) + findings.push({ type: 'found', code: 'eeat-signals.author-meta.found', message: `Author meta tag found: "${authorMeta.trim()}".` }) } else { - findings.push({ type: 'missing', message: 'No <meta name="author"> tag detected.' }) + findings.push({ type: 'missing', code: 'eeat-signals.author-meta.missing', message: 'No <meta name="author"> tag detected.' }) recommendations.push('Add a meta author tag to identify content authorship.') } @@ -59,9 +59,9 @@ export function analyzeEeatSignals(context: AuditContext): AnalysisResult { const schemaTypes = extractSchemaTypes(context.structuredData) if (schemaTypes.has('Review') || schemaTypes.has('AggregateRating')) { score += 20 - findings.push({ type: 'found', message: 'Review or AggregateRating schema detected.' }) + findings.push({ type: 'found', code: 'eeat-signals.review.found', message: 'Review or AggregateRating schema detected.' }) } else { - findings.push({ type: 'info', message: 'No Review or AggregateRating schema found.' }) + findings.push({ type: 'info', code: 'eeat-signals.review.missing', message: 'No Review or AggregateRating schema found.' 
}) recommendations.push('Add Review or AggregateRating schema if customer reviews exist.') } @@ -81,13 +81,13 @@ export function analyzeEeatSignals(context: AuditContext): AnalysisResult { if (trustLinkCount >= 2) { score += 15 - findings.push({ type: 'found', message: 'Trust page links detected (privacy, terms, about).' }) + findings.push({ type: 'found', code: 'eeat-signals.trust-links.strong', message: 'Trust page links detected (privacy, terms, about).' }) } else if (trustLinkCount === 1) { score += 8 - findings.push({ type: 'info', message: 'Some trust page links detected.' }) + findings.push({ type: 'info', code: 'eeat-signals.trust-links.partial', message: 'Some trust page links detected.' }) recommendations.push('Add links to privacy policy, terms of service, and about page.') } else { - findings.push({ type: 'missing', message: 'No trust page links detected.' }) + findings.push({ type: 'missing', code: 'eeat-signals.trust-links.missing', message: 'No trust page links detected.' }) recommendations.push('Add footer links to privacy, terms, and about pages.') } @@ -105,13 +105,13 @@ export function analyzeEeatSignals(context: AuditContext): AnalysisResult { if (orgWithPeople.length > 0) { score += 25 - findings.push({ type: 'found', message: 'Organization schema includes founder/employee signals.' }) + findings.push({ type: 'found', code: 'eeat-signals.organization.with-people', message: 'Organization schema includes founder/employee signals.' }) } else if (orgs.length > 0) { score += 10 - findings.push({ type: 'info', message: 'Organization schema found but lacks people associations.' }) + findings.push({ type: 'info', code: 'eeat-signals.organization.no-people', message: 'Organization schema found but lacks people associations.' }) recommendations.push('Add founder or employee properties to Organization schema.') } else { - findings.push({ type: 'missing', message: 'No Organization schema detected.' 
}) + findings.push({ type: 'missing', code: 'eeat-signals.organization.missing', message: 'No Organization schema detected.' }) } return { diff --git a/src/analyzers/entity-consistency.ts b/src/analyzers/entity-consistency.ts index 9c36dfe..1c68ce0 100644 --- a/src/analyzers/entity-consistency.ts +++ b/src/analyzers/entity-consistency.ts @@ -57,18 +57,18 @@ export function analyzeEntityConsistency(context: AuditContext): AnalysisResult const uniqueCandidates = [...new Set(normalizedCandidates)] if (!uniqueCandidates.length) { - findings.push({ type: 'missing', message: 'Could not determine a consistent business entity name.' }) + findings.push({ type: 'missing', code: 'entity-consistency.name.missing', message: 'Could not determine a consistent business entity name.' }) recommendations.push('Expose business name consistently in title tags and JSON-LD.') } else if (uniqueCandidates.length === 1) { score += 40 - findings.push({ type: 'found', message: 'Business naming looks consistent across key metadata.' }) + findings.push({ type: 'found', code: 'entity-consistency.name.single', message: 'Business naming looks consistent across key metadata.' }) } else if (uniqueCandidates.length === 2) { score += 24 - findings.push({ type: 'info', message: 'Minor business name inconsistencies found across metadata.' }) + findings.push({ type: 'info', code: 'entity-consistency.name.moderate', message: 'Minor business name inconsistencies found across metadata.' }) recommendations.push('Align title, og:title, and schema name fields to the same canonical brand name.') } else { score += 12 - findings.push({ type: 'missing', message: 'Business naming appears inconsistent across sources.' }) + findings.push({ type: 'missing', code: 'entity-consistency.name.multiple', message: 'Business naming appears inconsistent across sources.' 
}) recommendations.push('Standardize brand/entity naming in HTML metadata and JSON-LD.') } @@ -76,18 +76,18 @@ export function analyzeEntityConsistency(context: AuditContext): AnalysisResult const rawTitle = (context.pageTitle || '').trim() if (rawTitle.length > 0 && rawTitle.length <= 70) { score += 10 - findings.push({ type: 'found', message: `Page title is ${rawTitle.length} characters (within 70-char limit).` }) + findings.push({ type: 'found', code: 'entity-consistency.title.ok', message: `Page title is ${rawTitle.length} characters (within 70-char limit).` }) } else if (rawTitle.length > 70) { - findings.push({ type: 'info', message: `Page title is ${rawTitle.length} characters (exceeds 70-char limit).` }) + findings.push({ type: 'info', code: 'entity-consistency.title.long', message: `Page title is ${rawTitle.length} characters (exceeds 70-char limit).` }) recommendations.push('Shorten the page title to 70 characters or fewer to avoid truncation in AI citations.') } const canonicalHref = context.$('link[rel="canonical"]').attr('href') if (canonicalHref) { score += 20 - findings.push({ type: 'found', message: 'Canonical URL tag is present.' }) + findings.push({ type: 'found', code: 'entity-consistency.canonical.present', message: 'Canonical URL tag is present.' }) } else { - findings.push({ type: 'missing', message: 'Canonical URL tag is missing.' }) + findings.push({ type: 'missing', code: 'entity-consistency.canonical.missing', message: 'Canonical URL tag is missing.' }) recommendations.push('Add a canonical link tag to declare the primary page URL.') } @@ -105,13 +105,13 @@ export function analyzeEntityConsistency(context: AuditContext): AnalysisResult if (emailOverlap || phoneOverlap) { score += 40 - findings.push({ type: 'found', message: 'Contact information appears consistent between schema and page content.' 
}) + findings.push({ type: 'found', code: 'entity-consistency.contact.ok', message: 'Contact information appears consistent between schema and page content.' }) } else if (schemaContacts.emails.length || schemaContacts.phones.length) { score += 16 - findings.push({ type: 'info', message: 'Schema contact details were found but consistency is unclear in visible content.' }) + findings.push({ type: 'info', code: 'entity-consistency.contact.partial', message: 'Schema contact details were found but consistency is unclear in visible content.' }) recommendations.push('Mirror key contact details in visible content and JSON-LD.') } else { - findings.push({ type: 'missing', message: 'No reliable contact details found in structured data.' }) + findings.push({ type: 'missing', code: 'entity-consistency.contact.missing', message: 'No reliable contact details found in structured data.' }) recommendations.push('Add email/telephone contact fields in LocalBusiness schema.') } diff --git a/src/analyzers/faq-content.ts b/src/analyzers/faq-content.ts index f86323b..ce116bb 100644 --- a/src/analyzers/faq-content.ts +++ b/src/analyzers/faq-content.ts @@ -9,21 +9,21 @@ export function analyzeFaqContent(context: AuditContext): AnalysisResult { const schemaTypes = extractSchemaTypes(context.structuredData) if (schemaTypes.has('FAQPage')) { score += 34 - findings.push({ type: 'found', message: 'FAQPage schema detected.' }) + findings.push({ type: 'found', code: 'faq-content.faqpage.present', message: 'FAQPage schema detected.' }) } else { - findings.push({ type: 'missing', message: 'FAQPage schema not detected.' }) + findings.push({ type: 'missing', code: 'faq-content.faqpage.missing', message: 'FAQPage schema not detected.' 
}) recommendations.push('Add FAQPage schema for key question-and-answer content.') } const detailsCount = context.$('details > summary').length if (detailsCount >= 3) { score += 24 - findings.push({ type: 'found', message: `Detected ${detailsCount} FAQ details blocks.` }) + findings.push({ type: 'found', code: 'faq-content.details.multiple', message: `Detected ${detailsCount} FAQ details blocks.` }) } else if (detailsCount > 0) { score += 14 - findings.push({ type: 'info', message: `Detected ${detailsCount} details-based FAQ block(s).` }) + findings.push({ type: 'info', code: 'faq-content.details.single', message: `Detected ${detailsCount} details-based FAQ block(s).` }) } else { - findings.push({ type: 'info', message: 'No details/summary FAQ blocks detected.' }) + findings.push({ type: 'info', code: 'faq-content.details.none', message: 'No details/summary FAQ blocks detected.' }) } let questionHeadingCount = 0 @@ -36,12 +36,12 @@ export function analyzeFaqContent(context: AuditContext): AnalysisResult { if (questionHeadingCount >= 3) { score += 24 - findings.push({ type: 'found', message: 'Multiple question-style headings detected.' }) + findings.push({ type: 'found', code: 'faq-content.headings.multiple', message: 'Multiple question-style headings detected.' }) } else if (questionHeadingCount > 0) { score += 12 - findings.push({ type: 'info', message: 'A small number of question headings detected.' }) + findings.push({ type: 'info', code: 'faq-content.headings.low', message: 'A small number of question headings detected.' }) } else { - findings.push({ type: 'missing', message: 'No explicit question headings detected.' }) + findings.push({ type: 'missing', code: 'faq-content.headings.missing', message: 'No explicit question headings detected.' 
}) recommendations.push('Use question-style headings to match conversational prompts.') } @@ -52,12 +52,12 @@ export function analyzeFaqContent(context: AuditContext): AnalysisResult { if (qaPairs >= 3) { score += 18 - findings.push({ type: 'found', message: 'FAQ content includes multiple question-answer pairs.' }) + findings.push({ type: 'found', code: 'faq-content.qa-pairs.multiple', message: 'FAQ content includes multiple question-answer pairs.' }) } else if (qaPairs > 0) { score += 10 - findings.push({ type: 'info', message: 'FAQ pairs exist but are limited in count.' }) + findings.push({ type: 'info', code: 'faq-content.qa-pairs.low', message: 'FAQ pairs exist but are limited in count.' }) } else { - findings.push({ type: 'info', message: 'Question-answer pairing appears limited.' }) + findings.push({ type: 'info', code: 'faq-content.qa-pairs.none', message: 'Question-answer pairing appears limited.' }) } return { diff --git a/src/analyzers/geographic-signals.ts b/src/analyzers/geographic-signals.ts index 9e9f4cf..8f0bb24 100644 --- a/src/analyzers/geographic-signals.ts +++ b/src/analyzers/geographic-signals.ts @@ -21,49 +21,49 @@ export function analyzeGeographicSignals(context: AuditContext): AnalysisResult const schemaTypes = extractSchemaTypes(context.structuredData) if (schemaTypes.has('LocalBusiness')) { score += 32 - findings.push({ type: 'found', message: 'LocalBusiness schema detected.' }) + findings.push({ type: 'found', code: 'geographic-signals.localbusiness-schema.found', message: 'LocalBusiness schema detected.' }) } else { - findings.push({ type: 'missing', message: 'LocalBusiness schema not detected.' }) + findings.push({ type: 'missing', code: 'geographic-signals.localbusiness-schema.missing', message: 'LocalBusiness schema not detected.' 
}) recommendations.push('Add LocalBusiness schema for local search relevance.') } if (hasGeoInSchema(context.structuredData)) { score += 22 - findings.push({ type: 'found', message: 'GeoCoordinates found in structured data.' }) + findings.push({ type: 'found', code: 'geographic-signals.geo-coordinates.found', message: 'GeoCoordinates found in structured data.' }) } else { - findings.push({ type: 'info', message: 'No geo coordinates found in structured data.' }) + findings.push({ type: 'info', code: 'geographic-signals.geo-coordinates.missing', message: 'No geo coordinates found in structured data.' }) recommendations.push('Add geo coordinates to LocalBusiness schema.') } if (hasAddressInSchema(context.structuredData)) { score += 18 - findings.push({ type: 'found', message: 'Postal address found in structured data.' }) + findings.push({ type: 'found', code: 'geographic-signals.postal-address.found', message: 'Postal address found in structured data.' }) } else { - findings.push({ type: 'info', message: 'No postal address found in structured data.' }) + findings.push({ type: 'info', code: 'geographic-signals.postal-address.missing', message: 'No postal address found in structured data.' }) } if (hasAreaServed(context.structuredData)) { score += 14 - findings.push({ type: 'found', message: 'areaServed signal detected in structured data.' }) + findings.push({ type: 'found', code: 'geographic-signals.area-served.found', message: 'areaServed signal detected in structured data.' }) } else { - findings.push({ type: 'missing', message: 'No areaServed signal detected.' }) + findings.push({ type: 'missing', code: 'geographic-signals.area-served.missing', message: 'No areaServed signal detected.' }) recommendations.push('Declare areaServed to clarify geographic coverage.') } const hasGeoMeta = context.$('meta[name^="geo."]').length > 0 if (hasGeoMeta) { score += 8 - findings.push({ type: 'found', message: 'Geo meta tags detected.' 
}) + findings.push({ type: 'found', code: 'geographic-signals.geo-meta.found', message: 'Geo meta tags detected.' }) } else { - findings.push({ type: 'info', message: 'Geo meta tags not detected.' }) + findings.push({ type: 'info', code: 'geographic-signals.geo-meta.missing', message: 'Geo meta tags not detected.' }) } const addressPattern = /(\b\d{1,5}\s+[A-Za-z0-9.'\-\s]+,?\s+[A-Za-z.'\-\s]+,?\s+[A-Z]{2}\b)/i if (addressPattern.test(context.textContent)) { score += 12 - findings.push({ type: 'found', message: 'Visible content includes geographic/location signals.' }) + findings.push({ type: 'found', code: 'geographic-signals.visible-location.found', message: 'Visible content includes geographic/location signals.' }) } else { - findings.push({ type: 'info', message: 'Visible location signals appear limited.' }) + findings.push({ type: 'info', code: 'geographic-signals.visible-location.missing', message: 'Visible location signals appear limited.' }) recommendations.push('Include service area/city signals in visible content.') } diff --git a/src/analyzers/lighthouse.ts b/src/analyzers/lighthouse.ts index 343ce72..33b13e0 100644 --- a/src/analyzers/lighthouse.ts +++ b/src/analyzers/lighthouse.ts @@ -98,7 +98,7 @@ export async function analyzeLighthouse(context: AuditContext): Promise<Analysis return { score: 0, - findings: [{ type: isAbort ? 'timeout' : 'unreachable', message }], + findings: [{ type: isAbort ? 'timeout' : 'unreachable', code: 'lighthouse.psi.unreachable', message }], recommendations: [ 'Confirm the URL is publicly reachable from Google\'s infrastructure (PSI cannot audit localhost or auth-walled pages). Set PAGESPEED_API_KEY to lift anonymous rate limits.', ], @@ -114,7 +114,7 @@ export async function analyzeLighthouse(context: AuditContext): Promise<Analysis const label = category?.title ?? 
id if (typeof rawScore !== 'number') { - findings.push({ type: 'info', message: `Lighthouse did not return a score for ${label}.` }) + findings.push({ type: 'info', code: 'lighthouse.category.missing', message: `Lighthouse did not return a score for ${label}.` }) continue } @@ -122,6 +122,7 @@ export async function analyzeLighthouse(context: AuditContext): Promise<Analysis categoryScores.push(percent) findings.push({ type: classifyByScore(percent), + code: 'lighthouse.category.score', message: `${label}: ${percent}/100`, }) } @@ -129,7 +130,7 @@ export async function analyzeLighthouse(context: AuditContext): Promise<Analysis if (categoryScores.length === 0) { return { score: 0, - findings: [...findings, { type: 'unreachable', message: 'Lighthouse returned no category scores.' }], + findings: [...findings, { type: 'unreachable', code: 'lighthouse.category.none', message: 'Lighthouse returned no category scores.' }], recommendations: ['Confirm the URL is publicly reachable from Google PageSpeed Insights.'], } } diff --git a/src/analyzers/named-entities.ts b/src/analyzers/named-entities.ts index aeb96dc..a80969c 100644 --- a/src/analyzers/named-entities.ts +++ b/src/analyzers/named-entities.ts @@ -25,18 +25,18 @@ export function analyzeNamedEntities(context: AuditContext): AnalysisResult { if (occurrences >= 3) { score += 36 - findings.push({ type: 'found', message: `Brand/entity name appears ${occurrences} times in content.` }) + findings.push({ type: 'found', code: 'named-entities.brand-name.strong', message: `Brand/entity name appears ${occurrences} times in content.` }) } else if (occurrences > 0) { score += 20 - findings.push({ type: 'info', message: `Brand/entity name appears ${occurrences} time(s) in content.` }) + findings.push({ type: 'info', code: 'named-entities.brand-name.low', message: `Brand/entity name appears ${occurrences} time(s) in content.` }) recommendations.push('Use consistent brand naming throughout key content sections.') } else { score += 6 - 
findings.push({ type: 'missing', message: 'Brand/entity name not clearly present in visible text.' }) + findings.push({ type: 'missing', code: 'named-entities.brand-name.missing', message: 'Brand/entity name not clearly present in visible text.' }) recommendations.push('Include business/entity name in key headings and explanatory text.') } } else { - findings.push({ type: 'missing', message: 'Could not infer a primary business/entity name.' }) + findings.push({ type: 'missing', code: 'named-entities.entity-name.missing', message: 'Could not infer a primary business/entity name.' }) recommendations.push('Ensure schema and titles expose a clear entity name.') } @@ -56,23 +56,23 @@ export function analyzeNamedEntities(context: AuditContext): AnalysisResult { if (knowsAboutCount > 0 || founderCount > 0) { score += 34 - findings.push({ type: 'found', message: 'Schema includes entity knowledge/founder signals.' }) + findings.push({ type: 'found', code: 'named-entities.knows-about.present', message: 'Schema includes entity knowledge/founder signals.' }) } else { score += 10 - findings.push({ type: 'info', message: 'No explicit knowsAbout/founder entity signals in schema.' }) + findings.push({ type: 'info', code: 'named-entities.knows-about.missing', message: 'No explicit knowsAbout/founder entity signals in schema.' }) recommendations.push('Add knowsAbout and founder/person associations in schema where relevant.') } const density = properNounDensity(text) if (density >= 0.08) { score += 30 - findings.push({ type: 'found', message: 'Proper noun density indicates strong entity context.' }) + findings.push({ type: 'found', code: 'named-entities.proper-noun-density.strong', message: 'Proper noun density indicates strong entity context.' }) } else if (density >= 0.04) { score += 18 - findings.push({ type: 'info', message: 'Moderate proper noun density detected.' 
}) + findings.push({ type: 'info', code: 'named-entities.proper-noun-density.moderate', message: 'Moderate proper noun density detected.' }) } else { score += 8 - findings.push({ type: 'info', message: 'Low proper noun density detected.' }) + findings.push({ type: 'info', code: 'named-entities.proper-noun-density.low', message: 'Low proper noun density detected.' }) recommendations.push('Add explicit entities: brands, places, people, and product/service names.') } diff --git a/src/analyzers/schema-completeness.ts b/src/analyzers/schema-completeness.ts index 38e9156..ec1b9ac 100644 --- a/src/analyzers/schema-completeness.ts +++ b/src/analyzers/schema-completeness.ts @@ -51,7 +51,7 @@ export function analyzeSchemaCompleteness(context: AuditContext): AnalysisResult const detection = detectSiteCategory(context) if (!structuredData.length) { - findings.push({ type: 'missing', message: 'No structured data found to evaluate completeness.' }) + findings.push({ type: 'missing', code: 'schema-completeness.schema.none', message: 'No structured data found to evaluate completeness.' }) // Issue #33: tailor the missing-schema recommendation to the detected // category so the suggestion is actually applicable. 
recommendations.push( @@ -79,15 +79,15 @@ export function analyzeSchemaCompleteness(context: AuditContext): AnalysisResult const pct = best.score if (pct >= 0.75) { checksScore += 100 - findings.push({ type: 'found', message: `LocalBusiness schema is ${Math.round(pct * 100)}% complete.` }) + findings.push({ type: 'found', code: 'schema-completeness.local-business.strong', message: `LocalBusiness schema is ${Math.round(pct * 100)}% complete.` }) } else if (pct >= 0.5) { checksScore += 60 const missing = LOCAL_BUSINESS_PROPS.filter((p) => !best.item?.[p]) - findings.push({ type: 'info', message: `LocalBusiness schema is ${Math.round(pct * 100)}% complete.` }) + findings.push({ type: 'info', code: 'schema-completeness.local-business.partial', message: `LocalBusiness schema is ${Math.round(pct * 100)}% complete.` }) recommendations.push(`Add missing LocalBusiness properties: ${missing.join(', ')}.`) } else { checksScore += 25 - findings.push({ type: 'missing', message: `LocalBusiness schema is only ${Math.round(pct * 100)}% complete.` }) + findings.push({ type: 'missing', code: 'schema-completeness.local-business.low', message: `LocalBusiness schema is only ${Math.round(pct * 100)}% complete.` }) recommendations.push('Expand LocalBusiness schema with address, telephone, openingHours, geo, etc.') } } @@ -110,15 +110,15 @@ export function analyzeSchemaCompleteness(context: AuditContext): AnalysisResult if (substantiveAnswers.length >= FAQ_MIN_PAIRS) { checksScore += 100 - findings.push({ type: 'found', message: `FAQPage has ${questions.length} Q&A pairs with substantive answers.` }) + findings.push({ type: 'found', code: 'schema-completeness.faqpage.strong', message: `FAQPage has ${questions.length} Q&A pairs with substantive answers.` }) } else { checksScore += 65 - findings.push({ type: 'info', message: `FAQPage has ${questions.length} questions but some answers are thin.` }) + findings.push({ type: 'info', code: 'schema-completeness.faqpage.partial', message: `FAQPage 
has ${questions.length} questions but some answers are thin.` }) recommendations.push('Expand FAQ answers to at least 15 words each for citation readiness.') } } else { checksScore += 35 - findings.push({ type: 'info', message: `FAQPage has only ${questions.length} Q&A pair(s) (recommend >= ${FAQ_MIN_PAIRS}).` }) + findings.push({ type: 'info', code: 'schema-completeness.faqpage.low', message: `FAQPage has only ${questions.length} Q&A pair(s) (recommend >= ${FAQ_MIN_PAIRS}).` }) recommendations.push(`Add at least ${FAQ_MIN_PAIRS} question-answer pairs to FAQPage schema.`) } } @@ -135,10 +135,10 @@ export function analyzeSchemaCompleteness(context: AuditContext): AnalysisResult if (stepsWithText.length >= HOWTO_MIN_STEPS) { checksScore += 100 - findings.push({ type: 'found', message: `HowTo schema has ${stepsWithText.length} detailed steps.` }) + findings.push({ type: 'found', code: 'schema-completeness.howto.strong', message: `HowTo schema has ${stepsWithText.length} detailed steps.` }) } else { checksScore += 40 - findings.push({ type: 'info', message: `HowTo schema has only ${stepsWithText.length} step(s).` }) + findings.push({ type: 'info', code: 'schema-completeness.howto.partial', message: `HowTo schema has only ${stepsWithText.length} step(s).` }) recommendations.push(`Add at least ${HOWTO_MIN_STEPS} steps with descriptive text to HowTo schema.`) } } @@ -156,15 +156,15 @@ export function analyzeSchemaCompleteness(context: AuditContext): AnalysisResult const pct = best.score if (pct >= 0.7) { checksScore += 100 - findings.push({ type: 'found', message: `Organization schema is ${Math.round(pct * 100)}% complete.` }) + findings.push({ type: 'found', code: 'schema-completeness.organization.strong', message: `Organization schema is ${Math.round(pct * 100)}% complete.` }) } else if (pct >= 0.4) { checksScore += 55 const missing = ORGANIZATION_PROPS.filter((p) => !best.item?.[p]) - findings.push({ type: 'info', message: `Organization schema is ${Math.round(pct * 
100)}% complete.` }) + findings.push({ type: 'info', code: 'schema-completeness.organization.partial', message: `Organization schema is ${Math.round(pct * 100)}% complete.` }) recommendations.push(`Add missing Organization properties: ${missing.join(', ')}.`) } else { checksScore += 20 - findings.push({ type: 'missing', message: `Organization schema is only ${Math.round(pct * 100)}% complete.` }) + findings.push({ type: 'missing', code: 'schema-completeness.organization.low', message: `Organization schema is only ${Math.round(pct * 100)}% complete.` }) } } @@ -173,10 +173,10 @@ export function analyzeSchemaCompleteness(context: AuditContext): AnalysisResult const avgProps = structuredData.reduce((sum, item) => sum + Object.keys(item).length, 0) / structuredData.length if (avgProps >= 8) { score = 70 - findings.push({ type: 'info', message: 'Structured data has reasonable depth but uses no recognized high-priority schema types.' }) + findings.push({ type: 'info', code: 'schema-completeness.schema-depth.moderate', message: 'Structured data has reasonable depth but uses no recognized high-priority schema types.' }) } else { score = 30 - findings.push({ type: 'info', message: 'Structured data present but shallow and uses no recognized schema types.' }) + findings.push({ type: 'info', code: 'schema-completeness.schema-depth.low', message: 'Structured data present but shallow and uses no recognized schema types.' }) } // Issue #33: recommendation reflects the detected site category instead of diff --git a/src/analyzers/schema-validity.ts b/src/analyzers/schema-validity.ts index b7aec82..e592b62 100644 --- a/src/analyzers/schema-validity.ts +++ b/src/analyzers/schema-validity.ts @@ -21,6 +21,7 @@ export function analyzeSchemaValidity(context: AuditContext): AnalysisResult { if (totalBlocks === 0) { findings.push({ type: 'info', + code: 'schema-validity.json-ld.none', message: 'No JSON-LD blocks found; nothing to validate. 
Presence of structured data is scored by the structured-data factor.', }) return { score: 100, findings, recommendations } @@ -33,6 +34,7 @@ export function analyzeSchemaValidity(context: AuditContext): AnalysisResult { score -= 5 findings.push({ type: 'missing', + code: 'schema-validity.block.empty', message: `JSON-LD block #${block.index + 1} is empty or whitespace-only.`, }) recommendations.push(`Remove the empty <script type="application/ld+json"> block at position ${block.index + 1}, or populate it with valid JSON-LD.`) @@ -44,6 +46,7 @@ export function analyzeSchemaValidity(context: AuditContext): AnalysisResult { score -= 15 findings.push({ type: 'missing', + code: 'schema-validity.block.invalid', message: `JSON-LD block #${block.index + 1} has invalid JSON syntax: ${block.parseError}`, }) recommendations.push(`Fix JSON syntax error in block #${block.index + 1} (${block.parseError}). Invalid JSON is silently dropped by Google and AI crawlers.`) @@ -68,6 +71,7 @@ export function analyzeSchemaValidity(context: AuditContext): AnalysisResult { score -= 25 findings.push({ type: 'missing', + code: 'schema-validity.singleton.duplicate', message: `Duplicate singleton @type "${type}" appears ${positions.length} times (blocks #${positions.join(', #')}). Google Search Console flags this as "Duplicate field ${type}" and invalidates rich results.`, }) recommendations.push(`Remove duplicate "${type}" — keep one canonical block. 
Duplicate "${type}" entries cause Google to drop both from rich results.`) @@ -88,6 +92,7 @@ export function analyzeSchemaValidity(context: AuditContext): AnalysisResult { if (findings.length === 0) { findings.push({ type: 'found', + code: 'schema-validity.block.valid', message: `All ${totalBlocks} JSON-LD block(s) are valid and unique.`, }) } diff --git a/src/analyzers/snippet-eligibility.ts b/src/analyzers/snippet-eligibility.ts index e2e7989..cc41f1c 100644 --- a/src/analyzers/snippet-eligibility.ts +++ b/src/analyzers/snippet-eligibility.ts @@ -138,6 +138,7 @@ export function analyzeSnippetEligibility(context: AuditContext): AnalysisResult if (sources.length === 0) { findings.push({ type: 'found', + code: 'snippet-eligibility.directives.none', message: 'No restrictive indexing directives found. Page is eligible for indexing and AI snippet features per Google.', }) return { score: 100, findings, recommendations } @@ -151,6 +152,7 @@ export function analyzeSnippetEligibility(context: AuditContext): AnalysisResult const label = directives.none ? '"none" (implies noindex, nofollow)' : '"noindex"' findings.push({ type: 'missing', + code: 'snippet-eligibility.noindex.present', message: `Page declares ${label} (${directives.raw}). Google explicitly requires a page to be indexed to appear in AI Overviews and AI Mode.`, }) recommendations.push( @@ -162,6 +164,7 @@ export function analyzeSnippetEligibility(context: AuditContext): AnalysisResult score = 0 findings.push({ type: 'missing', + code: 'snippet-eligibility.nosnippet.present', message: `Page declares "nosnippet" (${directives.raw}). 
Per Google's AI optimization guide, "a page must be indexed and eligible to be shown in Google Search with a snippet" to appear in AI features — nosnippet makes the page ineligible.`, }) recommendations.push( @@ -173,6 +176,7 @@ export function analyzeSnippetEligibility(context: AuditContext): AnalysisResult score = 0 findings.push({ type: 'missing', + code: 'snippet-eligibility.max-snippet.zero', message: `Page declares "max-snippet:0" (${directives.raw}), which is equivalent to nosnippet and blocks Google's AI features.`, }) recommendations.push( @@ -182,6 +186,7 @@ export function analyzeSnippetEligibility(context: AuditContext): AnalysisResult score = Math.min(score, 60) findings.push({ type: 'info', + code: 'snippet-eligibility.max-snippet.low', message: `Page declares "max-snippet:${directives.maxSnippet}" — Google can use at most ${directives.maxSnippet} characters of preview text, which heavily constrains AI snippets.`, }) recommendations.push( @@ -193,18 +198,21 @@ export function analyzeSnippetEligibility(context: AuditContext): AnalysisResult if (directives.noarchive) { findings.push({ type: 'info', + code: 'snippet-eligibility.noarchive.present', message: 'Page declares "noarchive" — Google won\'t show a cached copy, but this does not block AI features. Safe to keep if intentional.', }) } if (directives.noimageindex) { findings.push({ type: 'info', + code: 'snippet-eligibility.noimageindex.present', message: 'Page declares "noimageindex" — images on this page won\'t be indexed. 
This does not block AI text features.', }) } if (findings.length === 0) { findings.push({ type: 'found', + code: 'snippet-eligibility.directives.not-restrictive', message: `Indexing directives present but not restrictive: "${directives.raw}".`, }) } diff --git a/src/analyzers/structured-data.ts b/src/analyzers/structured-data.ts index 3e49afe..5e37c47 100644 --- a/src/analyzers/structured-data.ts +++ b/src/analyzers/structured-data.ts @@ -21,9 +21,9 @@ export function analyzeStructuredData(context: AuditContext): AnalysisResult { if (structuredData.length > 0) { score += 30 - findings.push({ type: 'found', message: `Detected ${structuredData.length} JSON-LD block(s).` }) + findings.push({ type: 'found', code: 'structured-data.json-ld.found', message: `Detected ${structuredData.length} JSON-LD block(s).` }) } else { - findings.push({ type: 'missing', message: 'No JSON-LD structured data found.' }) + findings.push({ type: 'missing', code: 'structured-data.json-ld.missing', message: 'No JSON-LD structured data found.' }) // Issue #33: recommend schemas that fit the detected site category instead // of always suggesting LocalBusiness/Service (which is wrong for SaaS, // dev tools, blogs, e-commerce, etc.). 
@@ -33,9 +33,9 @@ export function analyzeStructuredData(context: AuditContext): AnalysisResult { for (const type of PRIORITY_TYPES) { if (schemaTypes.has(type)) { score += 12 - findings.push({ type: 'found', message: `${type} schema detected.` }) + findings.push({ type: 'found', code: 'structured-data.schema.found', message: `${type} schema detected.` }) } else { - findings.push({ type: 'missing', message: `${type} schema not found.` }) + findings.push({ type: 'missing', code: 'structured-data.schema.missing', message: `${type} schema not found.` }) } } @@ -45,13 +45,13 @@ export function analyzeStructuredData(context: AuditContext): AnalysisResult { if (avgProperties >= 8) { score += 22 - findings.push({ type: 'found', message: 'Structured data has strong property depth.' }) + findings.push({ type: 'found', code: 'structured-data.schema-depth.strong', message: 'Structured data has strong property depth.' }) } else if (avgProperties >= 4) { score += 12 - findings.push({ type: 'info', message: 'Structured data exists but could be more detailed.' }) + findings.push({ type: 'info', code: 'structured-data.schema-depth.moderate', message: 'Structured data exists but could be more detailed.' }) recommendations.push('Expand schema properties (contact, areaServed, sameAs, etc.).') } else if (structuredData.length) { - findings.push({ type: 'info', message: 'Structured data appears shallow.' }) + findings.push({ type: 'info', code: 'structured-data.schema-depth.low', message: 'Structured data appears shallow.' 
}) recommendations.push('Increase schema completeness with richer properties.') } diff --git a/src/analyzers/technical-seo.ts b/src/analyzers/technical-seo.ts index dd17256..d6bbbda 100644 --- a/src/analyzers/technical-seo.ts +++ b/src/analyzers/technical-seo.ts @@ -13,13 +13,13 @@ export function analyzeTechnicalSeo(context: AuditContext): AnalysisResult { if (h1Count === 1) { score += 40 const h1Text = context.$(h1Elements[0]).text().trim() - findings.push({ type: 'found', message: `One H1 found: "${h1Text.slice(0, 80)}${h1Text.length > 80 ? '…' : ''}"` }) + findings.push({ type: 'found', code: 'technical-seo.h1.single', message: `One H1 found: "${h1Text.slice(0, 80)}${h1Text.length > 80 ? '…' : ''}"` }) } else if (h1Count === 0) { - findings.push({ type: 'missing', message: 'No H1 tag found. AI models and search engines use the H1 as the primary page topic signal.' }) + findings.push({ type: 'missing', code: 'technical-seo.h1.missing', message: 'No H1 tag found. AI models and search engines use the H1 as the primary page topic signal.' }) recommendations.push('Add exactly one H1 tag that clearly states the page topic or primary keyword.') } else { score += 20 - findings.push({ type: 'info', message: `${h1Count} H1 tags found. Pages should have exactly one H1 for clear topic signaling.` }) + findings.push({ type: 'info', code: 'technical-seo.h1.multiple', message: `${h1Count} H1 tags found. Pages should have exactly one H1 for clear topic signaling.` }) recommendations.push(`Consolidate to a single H1. Currently ${h1Count} H1 tags are present.`) } @@ -29,7 +29,7 @@ export function analyzeTechnicalSeo(context: AuditContext): AnalysisResult { if (totalImages === 0) { score += 30 - findings.push({ type: 'info', message: 'No images found on this page.' }) + findings.push({ type: 'info', code: 'technical-seo.alt-text.none', message: 'No images found on this page.' 
}) } else { const missingAlt: string[] = [] const emptyAlt: string[] = [] @@ -49,7 +49,7 @@ export function analyzeTechnicalSeo(context: AuditContext): AnalysisResult { if (problematic === 0) { score += 30 - findings.push({ type: 'found', message: `All ${totalImages} image(s) have descriptive alt text.` }) + findings.push({ type: 'found', code: 'technical-seo.alt-text.ok', message: `All ${totalImages} image(s) have descriptive alt text.` }) } else { const ratio = covered / totalImages score += Math.round(ratio * 30) @@ -58,6 +58,7 @@ export function analyzeTechnicalSeo(context: AuditContext): AnalysisResult { const preview = missingAlt.slice(0, 3).map((s) => s.split('/').pop() || s).join(', ') findings.push({ type: 'missing', + code: 'technical-seo.alt-text.missing', message: `${missingAlt.length} image(s) missing alt attribute entirely: ${preview}${missingAlt.length > 3 ? ` (+${missingAlt.length - 3} more)` : ''}.`, }) } @@ -66,6 +67,7 @@ export function analyzeTechnicalSeo(context: AuditContext): AnalysisResult { const preview = emptyAlt.slice(0, 3).map((s) => s.split('/').pop() || s).join(', ') findings.push({ type: 'info', + code: 'technical-seo.alt-text.empty', message: `${emptyAlt.length} image(s) have empty alt="" (acceptable for decorative images, but verify): ${preview}${emptyAlt.length > 3 ? ` (+${emptyAlt.length - 3} more)` : ''}.`, }) } @@ -87,12 +89,13 @@ export function analyzeTechnicalSeo(context: AuditContext): AnalysisResult { const metaDesc = context.$('meta[name="description"]').attr('content')?.trim() ?? '' if (!metaDesc) { - findings.push({ type: 'missing', message: 'No meta description found.' }) + findings.push({ type: 'missing', code: 'technical-seo.meta-description.missing', message: 'No meta description found.' }) recommendations.push('Add a meta description (150–160 characters) summarising the page. 
Short or missing descriptions reduce click-through rates and give AI crawlers less context about the page.') } else if (metaDesc.length < 120) { score += 8 findings.push({ type: 'info', + code: 'technical-seo.meta-description.short', message: `Meta description is too short (${metaDesc.length} chars; target 150–160): "${metaDesc}"`, }) recommendations.push( @@ -100,11 +103,11 @@ export function analyzeTechnicalSeo(context: AuditContext): AnalysisResult { ) } else if (metaDesc.length > 160) { score += 12 - findings.push({ type: 'info', message: `Meta description is long (${metaDesc.length} chars) and may be truncated in search results.` }) + findings.push({ type: 'info', code: 'technical-seo.meta-description.long', message: `Meta description is long (${metaDesc.length} chars) and may be truncated in search results.` }) recommendations.push('Trim the meta description to 150–160 characters so it isn\'t truncated in search snippets.') } else { score += 20 - findings.push({ type: 'found', message: `Meta description present (${metaDesc.length} chars).` }) + findings.push({ type: 'found', code: 'technical-seo.meta-description.present', message: `Meta description present (${metaDesc.length} chars).` }) } // ── Canonical tag ───────────────────────────────────────────────────────── @@ -112,11 +115,11 @@ export function analyzeTechnicalSeo(context: AuditContext): AnalysisResult { const canonicalHref = canonicalEl.attr('href')?.trim() ?? '' if (!canonicalHref) { - findings.push({ type: 'missing', message: 'No canonical tag found. Without a canonical, duplicate content issues can dilute crawl signals.' }) + findings.push({ type: 'missing', code: 'technical-seo.canonical.missing', message: 'No canonical tag found. Without a canonical, duplicate content issues can dilute crawl signals.' 
}) recommendations.push('Add <link rel="canonical" href="<page-url>"> to prevent duplicate content issues.') } else { score += 10 - findings.push({ type: 'found', message: `Canonical tag present: "${canonicalHref}"` }) + findings.push({ type: 'found', code: 'technical-seo.canonical.present', message: `Canonical tag present: "${canonicalHref}"` }) } return { diff --git a/src/cli.ts b/src/cli.ts index e0d4c8c..c581477 100644 --- a/src/cli.ts +++ b/src/cli.ts @@ -23,6 +23,7 @@ import { formatSitemapText, formatText, } from './formatters/text.js' +import { formatAgent, formatSitemapAgent } from './formatters/agent.js' import type { BatchPlatformDetectionReport, PlatformConfidence, @@ -33,28 +34,35 @@ import type { SitemapPageResult, } from './types.js' +// `agent` is the slim machine-readable decision (score, pass gate, ranked fixes) +// for audits. Platform-detection output has no decision list, so there `agent` +// falls back to the already-structured JSON. const FORMATTERS = { json: formatJson, markdown: formatMarkdown, text: formatText, + agent: formatAgent, } const SITEMAP_FORMATTERS = { json: (report: SitemapAuditReport, _topIssuesOnly: boolean) => formatSitemapJson(report), markdown: (report: SitemapAuditReport, topIssuesOnly: boolean) => formatSitemapMarkdown(report, topIssuesOnly), text: (report: SitemapAuditReport, topIssuesOnly: boolean) => formatSitemapText(report, topIssuesOnly), + agent: (report: SitemapAuditReport, _topIssuesOnly: boolean) => formatSitemapAgent(report), } const PLATFORM_FORMATTERS = { json: (report: PlatformDetectionReport) => formatPlatformJson(report), markdown: (report: PlatformDetectionReport) => formatPlatformMarkdown(report), text: (report: PlatformDetectionReport) => formatPlatformText(report), + agent: (report: PlatformDetectionReport) => formatPlatformJson(report), } const BATCH_PLATFORM_FORMATTERS = { json: (report: BatchPlatformDetectionReport) => formatBatchPlatformJson(report), markdown: (report: 
BatchPlatformDetectionReport) => formatBatchPlatformMarkdown(report), text: (report: BatchPlatformDetectionReport) => formatBatchPlatformText(report), + agent: (report: BatchPlatformDetectionReport) => formatBatchPlatformJson(report), } type FormatterName = keyof typeof FORMATTERS @@ -177,9 +185,9 @@ export function hasMissingMetaDescription(factors: ScoredFactor[] | undefined): if (!factors) return false const tech = factors.find((f) => f.id === 'technical-seo') if (!tech) return false - return tech.findings.some( - (f) => f.type === 'missing' && f.message.startsWith('No meta description found'), - ) + // Key on the stable finding code rather than the message prefix — that's the + // whole point of finding codes: gates don't break when copy changes. + return tech.findings.some((f) => f.code === 'technical-seo.meta-description.missing') } export function parseUrlList(text: string): string[] { @@ -251,7 +259,10 @@ Pass a URL to audit a live site, or a filesystem path (a .html file or a directory of built HTML, e.g. ./out) to audit static output offline. Options: - --format <type> Output format: text (default), json, markdown + --format <type> Output format: text (default), json, markdown, agent. + 'agent' emits a slim JSON decision (score, pass gate, + criticalDefectCount, ranked issues[]) for AI agents — + none of the per-factor/per-page detail. --factors <list> Comma-separated factor IDs to run (runs all if omitted) --include-geo Include optional geographic signals factor --include-agent-skills Include optional agent skill exposure factor (Schema.org Action, MCP, form affordances) @@ -264,7 +275,8 @@ Options: explicit URL to override. Pages are fetched with bounded concurrency (5). --limit <n> Max pages to audit in sitemap mode (default 200, sorted by sitemap priority). When the sitemap exceeds the limit, a notice is printed to stderr. 
- --top-issues In sitemap mode, skip per-page output and show only cross-cutting issues + --top-issues In sitemap mode, skip per-page output and show only the cross-cutting + issues and critical defects --detect-platform Detect what platform/CMS/framework the site is built on (WordPress, Webflow, Shopify, Next.js, etc.) instead of running a full audit. --urls <src> In --detect-platform mode, run on multiple URLs. <src> can be a path @@ -292,6 +304,8 @@ Options: Examples: aeo-audit https://example.com aeo-audit https://example.com --format json + aeo-audit https://example.com --format agent + aeo-audit https://example.com --sitemap --format agent aeo-audit https://example.com --factors structured-data,faq-content aeo-audit https://example.com --factors schema-validity aeo-audit https://example.com --include-geo @@ -334,7 +348,7 @@ export async function main(argv: string[] = process.argv): Promise<number> { } if (!isFormatterName(args.format)) { - console.error(`Error: Unknown format "${args.format}". Use: text, json, markdown`) + console.error(`Error: Unknown format "${args.format}". Use: text, json, markdown, agent`) return 1 } diff --git a/src/critical-defects.ts b/src/critical-defects.ts new file mode 100644 index 0000000..73af59f --- /dev/null +++ b/src/critical-defects.ts @@ -0,0 +1,148 @@ +import type { + AuditContext, + AuditReport, + CriticalDefect, + CriticalDefectGroup, + CriticalDefectId, + CriticalDefectSeverity, +} from './types.js' + +/** Human-readable labels for each defect, used in rollups and formatters. */ +const DEFECT_TITLES: Record<CriticalDefectId, string> = { + 'missing-h1': 'Missing H1', + 'multiple-h1': 'Multiple H1 tags', + 'missing-title': 'Missing <title>', + 'missing-meta-description': 'Missing meta description', +} + +const SEVERITY_RANK: Record<CriticalDefectSeverity, number> = { + critical: 0, + warning: 1, +} + +/** + * Detect binary structural defects on a single page straight from the DOM. 
+ * + * These are deliberately independent of the weighted factor scores. The technical + * factors already fold an H1-count or meta-description check into a bundled score + * that can read "healthy" (issue #42) even when one sub-check fails; here each + * defect is an unambiguous, one-line-fixable yes/no, so it can be surfaced on its + * own merits regardless of how the surrounding factor happened to average out. + */ +export function detectCriticalDefects(context: AuditContext): CriticalDefect[] { + const defects: CriticalDefect[] = [] + + const h1Count = context.$('h1').length + if (h1Count === 0) { + defects.push({ + id: 'missing-h1', + severity: 'critical', + detail: 'No H1 tag — AI models use the H1 as the primary page-topic signal.', + recommendation: 'Add exactly one H1 that clearly states the page topic.', + }) + } else if (h1Count > 1) { + defects.push({ + id: 'multiple-h1', + severity: 'critical', + detail: `${h1Count} H1 tags found (expected exactly one).`, + recommendation: `Consolidate to a single H1; ${h1Count} are present.`, + }) + } + + if (!context.pageTitle) { + defects.push({ + id: 'missing-title', + severity: 'critical', + detail: 'No <title> element — search and AI snippets have no canonical page name to use.', + recommendation: 'Add a concise <title> that names the page.', + }) + } + + const metaDesc = context.$('meta[name="description"]').attr('content')?.trim() ?? '' + if (!metaDesc) { + defects.push({ + id: 'missing-meta-description', + severity: 'warning', + detail: 'No meta description.', + recommendation: 'Add a meta description (150–160 characters) summarising the page.', + }) + } + + return defects +} + +/** A URL is the homepage when its path is the site root and it carries no query. 
*/ +export function isHomepageUrl(url: string): boolean { + try { + const parsed = new URL(url) + return (parsed.pathname === '/' || parsed.pathname === '') && parsed.search === '' + } catch { + return false + } +} + +// Sitemaps without an explicit <priority> default to 0.5 per the protocol, so we +// treat an absent priority the same way when ranking. +const effectivePriority = (priority: number | undefined): number => priority ?? 0.5 + +/** + * Roll per-page critical defects up across a sitemap/static run, grouped by + * defect. Pages within a group are ordered by importance (homepage first, then + * sitemap priority); groups are ordered by severity, then by whether they hit an + * important page — so the homepage's broken H1 leads even at 1-of-25 prevalence, + * which is exactly the case the prevalence-based ranking buries. + * + * `priorityByUrl` maps a page's final URL to its sitemap `<priority>`. It is + * optional: static-output mode has no sitemap priorities, and homepage detection + * (from the URL path) still works without it. + */ +export function buildCriticalDefects( + successPages: AuditReport[], + priorityByUrl: Map<string, number | undefined> = new Map(), +): CriticalDefectGroup[] { + const groups = new Map<CriticalDefectId, CriticalDefectGroup>() + + for (const page of successPages) { + for (const defect of page.criticalDefects ?? 
[]) { + let group = groups.get(defect.id) + if (!group) { + group = { + id: defect.id, + severity: defect.severity, + title: DEFECT_TITLES[defect.id], + recommendation: defect.recommendation, + pages: [], + } + groups.set(defect.id, group) + } + group.pages.push({ + url: page.finalUrl, + detail: defect.detail, + isHomepage: isHomepageUrl(page.finalUrl), + priority: priorityByUrl.get(page.finalUrl), + }) + } + } + + for (const group of groups.values()) { + group.pages.sort( + (a, b) => + Number(b.isHomepage) - Number(a.isHomepage) || + effectivePriority(b.priority) - effectivePriority(a.priority) || + a.url.localeCompare(b.url), + ) + } + + const hasHomepage = (g: CriticalDefectGroup): number => (g.pages.some((p) => p.isHomepage) ? 1 : 0) + const maxPriority = (g: CriticalDefectGroup): number => + g.pages.reduce((max, p) => Math.max(max, effectivePriority(p.priority)), 0) + + return [...groups.values()].sort( + (a, b) => + SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] || + hasHomepage(b) - hasHomepage(a) || + maxPriority(b) - maxPriority(a) || + b.pages.length - a.pages.length || + a.title.localeCompare(b.title), + ) +} diff --git a/src/formatters/agent.ts b/src/formatters/agent.ts new file mode 100644 index 0000000..4a704cc --- /dev/null +++ b/src/formatters/agent.ts @@ -0,0 +1,15 @@ +import { agentSummaryFromAudit, agentSummaryFromSitemap } from '../agent-summary.js' +import type { AuditReport, SitemapAuditReport } from '../types.js' + +/** + * `--format agent`: emit the pre-computed decision (score, pass gate, critical + * defect count, ranked fix list) as JSON, omitting the full per-factor and + * per-page detail an agent would otherwise have to average and re-rank itself. 
+ */ +export function formatAgent(report: AuditReport): string { + return JSON.stringify(agentSummaryFromAudit(report), null, 2) +} + +export function formatSitemapAgent(report: SitemapAuditReport): string { + return JSON.stringify(agentSummaryFromSitemap(report), null, 2) +} diff --git a/src/formatters/markdown.ts b/src/formatters/markdown.ts index 303c974..6e80f2e 100644 --- a/src/formatters/markdown.ts +++ b/src/formatters/markdown.ts @@ -1,3 +1,4 @@ +import { isHomepageUrl } from '../critical-defects.js' import type { AuditReport, BatchDetectionEntry, @@ -97,6 +98,27 @@ export function formatSitemapMarkdown(report: SitemapAuditReport, topIssuesOnly lines.push(``) } + if (report.criticalDefects.length > 0) { + lines.push(`## Critical Defects`) + lines.push(``) + lines.push(`High-impact, binary structural defects — surfaced regardless of how few pages they affect.`) + lines.push(``) + + for (const group of report.criticalDefects) { + const count = group.pages.length + lines.push(`### ${group.title} _(${group.severity}, ${count} page${count === 1 ? '' : 's'})_`) + lines.push(``) + lines.push(group.recommendation) + lines.push(``) + // List every affected page — a report must surface all issues, not a sample. + for (const page of group.pages) { + const home = page.isHomepage ? ' **(homepage)**' : '' + lines.push(`- \`${page.url}\`${home} — ${page.detail}`) + } + lines.push(``) + } + } + if (report.crossCuttingIssues.length > 0) { lines.push(`## Cross-Cutting Issues`) lines.push(``) @@ -130,10 +152,18 @@ export function formatSitemapMarkdown(report: SitemapAuditReport, topIssuesOnly } if (report.prioritizedFixes.length > 0) { - lines.push(`## Prioritized Fixes (by site-wide impact)`) + lines.push(`## Prioritized Fixes (critical defects first, then site-wide impact)`) lines.push(``) for (let i = 0; i < report.prioritizedFixes.length; i++) { - lines.push(`${i + 1}. ${report.prioritizedFixes[i]}`) + const fix = report.prioritizedFixes[i] + const tag = fix.severity ? 
`**[${fix.severity}]** ` : '' + const grade = fix.avgGrade ? ` (avg ${fix.avgGrade})` : '' + lines.push(`${i + 1}. ${tag}**${fix.title}**${grade} _(${fix.prevalencePct}% of pages)_ — ${fix.recommendation}`) + // Spell out every affected page — agents and humans both need the full set. + for (const url of fix.affectedPages) { + const home = isHomepageUrl(url) ? ' **(homepage)**' : '' + lines.push(` - \`${url}\`${home}`) + } } lines.push(``) } diff --git a/src/formatters/text.ts b/src/formatters/text.ts index 06e37d0..603576a 100644 --- a/src/formatters/text.ts +++ b/src/formatters/text.ts @@ -6,6 +6,7 @@ const YELLOW = '\x1b[33m' const RED = '\x1b[31m' const CYAN = '\x1b[36m' +import { isHomepageUrl } from '../critical-defects.js' import type { AuditReport, BatchDetectionEntry, @@ -118,6 +119,26 @@ export function formatSitemapText(report: SitemapAuditReport, topIssuesOnly = fa lines.push(``) } + if (report.criticalDefects.length > 0) { + lines.push(`${BOLD}Critical Defects${RESET} ${DIM}(high-impact, shown regardless of prevalence)${RESET}`) + lines.push(`${'─'.repeat(70)}`) + + for (const group of report.criticalDefects) { + const tag = group.severity === 'critical' ? `${RED}critical${RESET}` : `${YELLOW}warning${RESET}` + const count = group.pages.length + lines.push(` [${tag}] ${BOLD}${group.title}${RESET} ${DIM}(${count} page${count === 1 ? '' : 's'})${RESET}`) + lines.push(` ${DIM}→ ${group.recommendation}${RESET}`) + // List every affected page — a report must surface all issues, not a sample. + for (const page of group.pages) { + const home = page.isHomepage ? 
` ${CYAN}(homepage)${RESET}` : '' + lines.push(` ${DIM}- ${page.url}${home}: ${page.detail}${RESET}`) + } + } + + lines.push(`${'─'.repeat(70)}`) + lines.push(``) + } + if (report.crossCuttingIssues.length > 0) { lines.push(`${BOLD}Cross-Cutting Issues${RESET}`) lines.push(`${'─'.repeat(70)}`) @@ -140,9 +161,18 @@ export function formatSitemapText(report: SitemapAuditReport, topIssuesOnly = fa } if (report.prioritizedFixes.length > 0) { - lines.push(`${BOLD}Prioritized Fixes (by site-wide impact)${RESET}`) + lines.push(`${BOLD}Prioritized Fixes (critical defects first, then site-wide impact)${RESET}`) for (let i = 0; i < report.prioritizedFixes.length; i++) { - lines.push(` ${CYAN}${i + 1}.${RESET} ${report.prioritizedFixes[i]}`) + const fix = report.prioritizedFixes[i] + const tag = fix.severity ? `[${fix.severity === 'critical' ? RED : YELLOW}${fix.severity}${RESET}] ` : '' + const grade = fix.avgGrade ? `${DIM} avg ${fix.avgGrade}${RESET}` : '' + lines.push(` ${CYAN}${i + 1}.${RESET} ${tag}${BOLD}${fix.title}${RESET}${grade} ${DIM}(${fix.prevalencePct}% of pages)${RESET}`) + lines.push(` ${DIM}→ ${fix.recommendation}${RESET}`) + // Spell out every affected page — agents and humans both need the full set. + for (const url of fix.affectedPages) { + const home = isHomepageUrl(url) ? 
` ${CYAN}(homepage)${RESET}` : '' + lines.push(` ${DIM}- ${url}${home}${RESET}`) + } } lines.push(``) } diff --git a/src/index.ts b/src/index.ts index 161a655..32652be 100644 --- a/src/index.ts +++ b/src/index.ts @@ -21,6 +21,8 @@ import { analyzeSnippetEligibility } from './analyzers/snippet-eligibility.js' import { analyzeAgentSkillExposure } from './analyzers/agent-skill-exposure.js' import { analyzeLighthouse } from './analyzers/lighthouse.js' import { getVisibleText, parseJsonLdScripts, countWords } from './analyzers/helpers.js' +import { detectCriticalDefects } from './critical-defects.js' +import { SCHEMA_VERSION } from './schema.js' import { FACTOR_DEFINITIONS, OPTIONAL_FACTOR_DEFINITIONS, scoreFactors } from './scoring.js' import type { Analyzer, @@ -34,10 +36,22 @@ import type { export { runSitemapAudit } from './sitemap.js' export { runStaticAudit } from './static-audit.js' +export { detectCriticalDefects, buildCriticalDefects } from './critical-defects.js' +export { agentSummaryFromAudit, agentSummaryFromSitemap } from './agent-summary.js' +export { SCHEMA_VERSION } from './schema.js' export { detectPlatform, detectPlatformBatch } from './detect-platform.js' export { SPEC_RULES, FACTOR_SPEC_RULES, SPEC_SITE, specCitation } from './spec-references.js' export type { SpecRule, SpecRuleId, SpecStatus } from './spec-references.js' export type { SitemapAuditReport, SitemapAuditOptions } from './types.js' +export type { + AgentSummary, + CriticalDefect, + CriticalDefectAffectedPage, + CriticalDefectGroup, + CriticalDefectId, + CriticalDefectSeverity, + PrioritizedFix, +} from './types.js' export type { StaticAuditOptions, StaticAuditResult } from './static-audit.js' export type { BatchDetectionEntry, @@ -175,6 +189,7 @@ export async function auditHtmlPage(page: AuditHtmlPageInput, options: RunAeoAud const { overallScore, overallGrade, factors } = scoreFactors(rawFactorResults) return { + schemaVersion: SCHEMA_VERSION, url: page.inputUrl, finalUrl: 
page.finalUrl, auditedAt: new Date().toISOString(), @@ -182,6 +197,7 @@ export async function auditHtmlPage(page: AuditHtmlPageInput, options: RunAeoAud overallGrade, summary: buildSummary(factors, overallGrade), factors, + criticalDefects: detectCriticalDefects(context), metadata: { fetchTimeMs: page.fetchTimeMs, pageTitle: context.pageTitle, diff --git a/src/schema.ts b/src/schema.ts new file mode 100644 index 0000000..1d31ce3 --- /dev/null +++ b/src/schema.ts @@ -0,0 +1,9 @@ +/** + * Version of the report JSON shape (`AuditReport` / `SitemapAuditReport`), + * independent of the npm package version so agents can pin to a shape rather than + * a release. Bump the minor for additive fields, the major for breaking changes. + * + * Lives in its own module (not `index.ts`) so report builders can read it without + * importing the audit entry points — which test suites routinely mock. + */ +export const SCHEMA_VERSION = '1.1' diff --git a/src/sitemap.ts b/src/sitemap.ts index bac4c58..98ba5a2 100644 --- a/src/sitemap.ts +++ b/src/sitemap.ts @@ -1,10 +1,14 @@ import { AeoAuditError } from './errors.js' +import { buildCriticalDefects, isHomepageUrl } from './critical-defects.js' import { normalizeTargetUrl } from './fetch-page.js' import { runAeoAudit } from './index.js' +import { SCHEMA_VERSION } from './schema.js' import { scoreToGrade } from './scoring.js' import type { AuditReport, + CriticalDefectGroup, CrossCuttingIssue, + PrioritizedFix, RunAeoAuditOptions, SitemapAuditOptions, SitemapAuditReport, @@ -322,14 +326,65 @@ function buildCrossCuttingIssues(successPages: AuditReport[]): CrossCuttingIssue return issues } -function buildPrioritizedFixes(issues: CrossCuttingIssue[], totalPages: number): string[] { - return issues - .slice(0, 5) - .map((issue) => { - const pct = Math.round((issue.affectedPages / totalPages) * 100) - const rec = issue.topRecommendations[0] || 'Review and improve this factor.' 
- return `${issue.factorName} (avg ${issue.avgGrade}, affects ${pct}% of pages): ${rec}` +function buildPrioritizedFixes( + issues: CrossCuttingIssue[], + totalPages: number, + criticalDefects: CriticalDefectGroup[] = [], +): PrioritizedFix[] { + const pct = (n: number): number => (totalPages > 0 ? Math.round((n / totalPages) * 100) : 0) + + // Lead with high-impact binary defects (issue #42). These are excluded from the + // prevalence ranking below because they typically hit only one or two pages, but + // they're unambiguous and one-line-fixable, so they belong at the top. Only + // critical-severity defects are promoted; warnings (e.g. a missing meta + // description) already flow into the prevalence ranking via factor recommendations. + const criticalFixes: PrioritizedFix[] = criticalDefects + .filter((group) => group.severity === 'critical') + .map((group): PrioritizedFix => { + const affectedPages = group.pages.map((p) => p.url) + const affectsHomepage = group.pages.some((p) => p.isHomepage) + const count = affectedPages.length + return { + kind: 'critical-defect', + id: group.id, + title: group.title, + recommendation: group.recommendation, + severity: group.severity, + affectedPages, + affectsHomepage, + prevalencePct: pct(count), + summary: `${group.title} (${group.severity}) — ${count} page${count === 1 ? '' : 's'}${affectsHomepage ? ', incl. homepage' : ''}: ${group.recommendation}`, + } }) + + // Report every cross-cutting issue, ordered by prevalence — not a top-N slice. + // A fix the report computed must reach the report; truncating the tail silently + // drops real issues a consumer reading only this section would never see. + const crossCuttingFixes: PrioritizedFix[] = issues.map((issue): PrioritizedFix => { + const top = issue.topIssues[0] + const recommendation = issue.topRecommendations[0] ?? top?.recommendation ?? 'Review and improve this factor.' 
+ // Union every recommendation's pages — not just the top one's — so reach, + // prevalence, and the homepage flag describe the whole factor, which is what + // the entry is identified by (factorId / factorName). Sorted homepage-first. + const affectedPages = [...new Set(issue.topIssues.flatMap((d) => d.affectedUrls))].sort( + (a, b) => Number(isHomepageUrl(b)) - Number(isHomepageUrl(a)) || a.localeCompare(b), + ) + const affectsHomepage = affectedPages.some(isHomepageUrl) + const count = affectedPages.length + return { + kind: 'cross-cutting', + id: issue.factorId, + title: issue.factorName, + recommendation, + affectedPages, + affectsHomepage, + prevalencePct: pct(count), + avgGrade: issue.avgGrade, + summary: `${issue.factorName} (avg ${issue.avgGrade}) — ${count} page${count === 1 ? '' : 's'}: ${recommendation}`, + } + }) + + return [...criticalFixes, ...crossCuttingFixes] } export async function runSitemapAudit(rawUrl: string, options: SitemapAuditOptions = {}): Promise<SitemapAuditReport> { @@ -430,6 +485,7 @@ export async function runSitemapAudit(rawUrl: string, options: SitemapAuditOptio status: 'success', factors: report.factors, metadata: report.metadata, + priority: entry.priority, }, report, } @@ -442,6 +498,7 @@ export async function runSitemapAudit(rawUrl: string, options: SitemapAuditOptio overallGrade: 'F', status: 'error', error: message, + priority: entry.priority, }, report: null, } @@ -460,10 +517,19 @@ export async function runSitemapAudit(rawUrl: string, options: SitemapAuditOptio ? Math.round(successScores.reduce((a, b) => a + b, 0) / successScores.length) : 0 + // Map each successful page's final URL to its sitemap priority so the critical + // defect rollup can rank affected pages by importance (issue #42). 
+ const priorityByUrl = new Map<string, number | undefined>() + for (const page of pageResults) { + if (page.status === 'success') priorityByUrl.set(page.url, page.priority) + } + + const criticalDefects = buildCriticalDefects(successReports, priorityByUrl) const crossCuttingIssues = buildCrossCuttingIssues(successReports) - const prioritizedFixes = buildPrioritizedFixes(crossCuttingIssues, successReports.length) + const prioritizedFixes = buildPrioritizedFixes(crossCuttingIssues, successReports.length, criticalDefects) return { + schemaVersion: SCHEMA_VERSION, sitemapUrl, auditedAt: new Date().toISOString(), pagesDiscovered: discovered, @@ -475,6 +541,7 @@ export async function runSitemapAudit(rawUrl: string, options: SitemapAuditOptio aggregateScore, aggregateGrade: scoreToGrade(aggregateScore), pages: pageResults, + criticalDefects, crossCuttingIssues, prioritizedFixes, } diff --git a/src/static-audit.ts b/src/static-audit.ts index dce2377..13ad777 100644 --- a/src/static-audit.ts +++ b/src/static-audit.ts @@ -3,6 +3,8 @@ import path from 'node:path' import { AeoAuditError } from './errors.js' import { normalizeTargetUrl } from './fetch-page.js' import { auditHtmlPage } from './index.js' +import { buildCriticalDefects } from './critical-defects.js' +import { SCHEMA_VERSION } from './schema.js' import { buildCrossCuttingIssues, buildPrioritizedFixes, mapWithConcurrency } from './sitemap.js' import { scoreToGrade } from './scoring.js' import type { @@ -272,10 +274,14 @@ export async function runStaticAudit(targetPath: string, options: StaticAuditOpt ? Math.round(successScores.reduce((a, b) => a + b, 0) / successScores.length) : 0 + // Static output has no sitemap <priority>, so the rollup ranks by homepage + // (derived from the file path → URL) only — no priority map is passed. 
+ const criticalDefects = buildCriticalDefects(successReports) const crossCuttingIssues = buildCrossCuttingIssues(successReports) - const prioritizedFixes = buildPrioritizedFixes(crossCuttingIssues, successReports.length) + const prioritizedFixes = buildPrioritizedFixes(crossCuttingIssues, successReports.length, criticalDefects) const report: SitemapAuditReport = { + schemaVersion: SCHEMA_VERSION, sitemapUrl: resolved, auditedAt: new Date().toISOString(), pagesDiscovered: discovered, @@ -287,6 +293,7 @@ export async function runStaticAudit(targetPath: string, options: StaticAuditOpt aggregateScore, aggregateGrade: scoreToGrade(aggregateScore), pages: pageResults, + criticalDefects, crossCuttingIssues, prioritizedFixes, } diff --git a/src/types.ts b/src/types.ts index ab2449c..01be217 100644 --- a/src/types.ts +++ b/src/types.ts @@ -4,6 +4,13 @@ export type FindingType = 'found' | 'missing' | 'info' | 'timeout' | 'unreachabl export interface AuditFinding { type: FindingType + /** + * Stable machine code for this finding, namespaced as + * `<factor-id>.<check>[.<variant>]` (e.g. `technical-seo.h1.multiple`). Lets an + * agent key on the specific finding rather than regex-matching `message`. Codes + * are stable across releases; the full registry lives in docs/finding-codes.md. + */ + code: string message: string } @@ -123,7 +130,36 @@ export interface AuditMetadata { redirectChain: RedirectHop[] } +export type CriticalDefectId = + | 'missing-h1' + | 'multiple-h1' + | 'missing-title' + | 'missing-meta-description' + +export type CriticalDefectSeverity = 'critical' | 'warning' + +/** + * A binary, page-level structural defect (issue #42). Unlike the weighted factor + * scores — which bundle many sub-checks and can average a single bad signal away — + * these are detected directly from the DOM and are simply present or not. They are + * surfaced separately so a high-impact defect on one important page (e.g. 
a + * homepage with four `<h1>`s) is never hidden by low prevalence or a passing grade. + */ +export interface CriticalDefect { + id: CriticalDefectId + severity: CriticalDefectSeverity + /** Page-specific description, e.g. `"4 H1 tags found (expected exactly one)."` */ + detail: string + recommendation: string +} + export interface AuditReport { + /** + * Version of the report's JSON shape, independent of the package version, so an + * agent parser can detect breaking shape drift. Bumps minor for additive fields, + * major for breaking changes. See `SCHEMA_VERSION`. + */ + schemaVersion: string url: string finalUrl: string auditedAt: string @@ -131,6 +167,8 @@ export interface AuditReport { overallGrade: string summary: string factors: ScoredFactor[] + /** Binary structural defects on this page, detected independently of scoring. */ + criticalDefects: CriticalDefect[] metadata: AuditMetadata } @@ -172,6 +210,88 @@ export interface SitemapPageResult { error?: string factors?: ScoredFactor[] metadata?: AuditMetadata + /** Sitemap `<priority>` for this URL, when the sitemap declared one. Absent in static-output mode. */ + priority?: number +} + +export interface CriticalDefectAffectedPage { + url: string + /** Page-specific defect description carried up from the per-page audit. */ + detail: string + /** True when this URL is the site root (`/`). Such pages are ranked first. */ + isHomepage: boolean + /** Sitemap `<priority>` for this URL, when declared. */ + priority?: number +} + +/** + * A single binary defect (issue #42) rolled up across every page that exhibits + * it. Keyed by defect rather than by factor, so the specific actionable — and the + * exact pages it lives on — survives into the top-level report instead of being + * collapsed into a factor average. + */ +export interface CriticalDefectGroup { + id: CriticalDefectId + severity: CriticalDefectSeverity + /** Short human label, e.g. `"Multiple H1 tags"`. 
*/ + title: string + recommendation: string + /** Affected pages, most important first (homepage, then sitemap priority). */ + pages: CriticalDefectAffectedPage[] +} + +/** + * A single ranked, machine-readable fix — the unit of the prioritized to-do list. + * Carries stable identifiers and the complete affected-page set so an agent can + * act on it without parsing prose (issue #42). The ranking puts critical per-page + * defects first, then cross-cutting factor issues by prevalence. + */ +export interface PrioritizedFix { + /** Source of this fix: a binary per-page defect, or a cross-cutting factor issue. */ + kind: 'critical-defect' | 'cross-cutting' + /** Stable machine code: a `CriticalDefectId` (e.g. `"multiple-h1"`) or a factor id (e.g. `"technical-seo"`). */ + id: string + /** Short human label, e.g. `"Multiple H1 tags"` or `"Technical SEO"`. */ + title: string + /** The single highest-priority recommendation to apply for this entry. */ + recommendation: string + /** Severity, for critical-defect fixes. Cross-cutting entries are ranked by prevalence instead. */ + severity?: CriticalDefectSeverity + /** Every page this fix applies to — the complete list, never truncated. */ + affectedPages: string[] + /** Whether any affected page is the site homepage. */ + affectsHomepage: boolean + /** Share of audited pages this fix applies to (0–100). */ + prevalencePct: number + /** Average grade across audited pages for the factor (cross-cutting only). */ + avgGrade?: string + /** Ready-to-display one-line headline (does not inline the page list). */ + summary: string +} + +/** + * The slim, pre-computed decision an agent consumes via `--format agent`: the + * score, the pass/fail gate, and the ranked fix list, with none of the per-factor + * or per-page detail. Same underlying data as the full report, shaped as a + * decision an agent can act on directly instead of re-ranking factor scores. 
+ */ +export interface AgentSummary { + /** Report schema version (see `AuditReport.schemaVersion`). */ + schemaVersion: string + /** Package identity, for consumers aggregating output from multiple tools. */ + tool: string + /** `single` for a one-URL/one-file audit, `sitemap` for a multi-page run. */ + mode: 'single' | 'sitemap' + /** The audited page URL (single) or the sitemap/root URL (multi). */ + url: string + score: number + grade: string + /** True when the score meets the >= 70 gate (the default exit-0 threshold). */ + pass: boolean + /** Number of critical-severity binary defects (e.g. a missing or duplicated H1). */ + criticalDefectCount: number + /** The ranked to-do list: critical defects first, then cross-cutting by prevalence. */ + issues: PrioritizedFix[] } export interface CrossCuttingIssueDetail { @@ -191,6 +311,8 @@ export interface CrossCuttingIssue { } export interface SitemapAuditReport { + /** Version of the report's JSON shape; see `AuditReport.schemaVersion` and `SCHEMA_VERSION`. */ + schemaVersion: string sitemapUrl: string auditedAt: string pagesDiscovered: number @@ -202,8 +324,19 @@ export interface SitemapAuditReport { aggregateScore: number aggregateGrade: string pages: SitemapPageResult[] + /** + * High-impact binary defects surfaced regardless of prevalence (issue #42). + * These do not depend on the prevalence ranking that drives `prioritizedFixes`, + * so a defect on a single important page still appears here. + */ + criticalDefects: CriticalDefectGroup[] crossCuttingIssues: CrossCuttingIssue[] - prioritizedFixes: string[] + /** + * The ranked, machine-readable to-do list: critical per-page defects first, then + * cross-cutting factor issues by prevalence. Each entry carries stable ids and the + * full affected-page set so an agent can act without parsing prose. 
+ */ + prioritizedFixes: PrioritizedFix[] } export interface SitemapAuditPlan { diff --git a/test/agent-summary.test.ts b/test/agent-summary.test.ts new file mode 100644 index 0000000..903d7f7 --- /dev/null +++ b/test/agent-summary.test.ts @@ -0,0 +1,177 @@ +import { describe, it, expect } from 'vitest' + +import { agentSummaryFromAudit, agentSummaryFromSitemap } from '../src/agent-summary.js' +import { formatAgent, formatSitemapAgent } from '../src/formatters/agent.js' +import type { + AuditReport, + CriticalDefect, + CriticalDefectGroup, + PrioritizedFix, + ScoredFactor, + SitemapAuditReport, +} from '../src/types.js' + +function factor(overrides: Partial<ScoredFactor> & { id: string; name: string }): ScoredFactor { + return { + id: overrides.id, + name: overrides.name, + weight: 8, + score: overrides.score ?? 40, + grade: overrides.grade ?? 'F', + status: overrides.status ?? 'fail', + findings: overrides.findings ?? [], + recommendations: overrides.recommendations ?? [], + } +} + +function auditReport(overrides: Partial<AuditReport> = {}): AuditReport { + return { + schemaVersion: '1.1', + url: 'https://example.com/', + finalUrl: 'https://example.com/', + auditedAt: '2026-04-18T00:00:00.000Z', + overallScore: 60, + overallGrade: 'D-', + summary: '', + factors: [], + criticalDefects: [], + metadata: { + fetchTimeMs: 0, + pageTitle: '', + wordCount: 0, + auxiliary: { llmsTxt: 'missing', llmsFullTxt: 'missing', robotsTxt: 'missing', sitemapXml: 'missing' }, + redirectChain: [], + }, + ...overrides, + } +} + +const MULTIPLE_H1: CriticalDefect = { + id: 'multiple-h1', + severity: 'critical', + detail: '2 H1 tags found (expected exactly one).', + recommendation: 'Consolidate to a single H1; 2 are present.', +} + +describe('agentSummaryFromAudit', () => { + it('reduces a single-page report to a decision with a ranked issue list', () => { + const report = auditReport({ + criticalDefects: [MULTIPLE_H1], + factors: [factor({ id: 'faq-content', name: 'FAQ Content', score: 
40, recommendations: ['Add FAQPage schema.'] })], + }) + const summary = agentSummaryFromAudit(report) + + expect(summary.mode).toBe('single') + expect(summary.url).toBe('https://example.com/') + expect(summary.score).toBe(60) + expect(summary.pass).toBe(false) + expect(summary.criticalDefectCount).toBe(1) + // Critical defect leads, then the cross-cutting factor fix. + expect(summary.issues[0]).toMatchObject({ kind: 'critical-defect', id: 'multiple-h1' }) + expect(summary.issues.some((i) => i.id === 'faq-content')).toBe(true) + }) + + it('reports pass=true and no issues for a clean, passing page', () => { + const report = auditReport({ + overallScore: 92, + overallGrade: 'A', + factors: [factor({ id: 'structured-data', name: 'Structured Data', score: 95, status: 'pass', grade: 'A', recommendations: [] })], + }) + const summary = agentSummaryFromAudit(report) + + expect(summary.pass).toBe(true) + expect(summary.criticalDefectCount).toBe(0) + expect(summary.issues).toEqual([]) + }) +}) + +describe('agentSummaryFromSitemap', () => { + function sitemapReport( + criticalDefects: CriticalDefectGroup[], + prioritizedFixes: PrioritizedFix[], + ): SitemapAuditReport { + return { + schemaVersion: '1.1', + sitemapUrl: 'https://example.com/sitemap.xml', + auditedAt: '2026-04-18T00:00:00.000Z', + pagesDiscovered: 25, + pagesAudited: 25, + pagesSkipped: 0, + pagesFiltered: 0, + pagesTruncated: 0, + effectiveLimit: 200, + aggregateScore: 64, + aggregateGrade: 'D', + pages: [], + criticalDefects, + crossCuttingIssues: [], + prioritizedFixes, + } + } + + it('maps aggregate fields and forwards prioritizedFixes as issues', () => { + const group: CriticalDefectGroup = { + id: 'missing-h1', + severity: 'critical', + title: 'Missing H1', + recommendation: 'Add exactly one H1.', + pages: [{ url: 'https://example.com/contact', detail: 'No H1 tag.', isHomepage: false }], + } + const fix: PrioritizedFix = { + kind: 'critical-defect', + id: 'missing-h1', + title: 'Missing H1', + 
recommendation: 'Add exactly one H1.', + severity: 'critical', + affectedPages: ['https://example.com/contact'], + affectsHomepage: false, + prevalencePct: 4, + summary: 'Missing H1 (critical) — 1 page: Add exactly one H1.', + } + const summary = agentSummaryFromSitemap(sitemapReport([group], [fix])) + + expect(summary.mode).toBe('sitemap') + expect(summary.url).toBe('https://example.com/sitemap.xml') + expect(summary.score).toBe(64) + expect(summary.pass).toBe(false) + expect(summary.criticalDefectCount).toBe(1) + expect(summary.issues).toEqual([fix]) + }) +}) + +describe('formatAgent / formatSitemapAgent', () => { + it('emits valid JSON with the decision keys and none of the heavy detail', () => { + const parsed = JSON.parse(formatAgent(auditReport({ factors: [factor({ id: 'x', name: 'X' })] }))) + expect(Object.keys(parsed).sort()).toEqual( + ['criticalDefectCount', 'grade', 'issues', 'mode', 'pass', 'schemaVersion', 'score', 'tool', 'url'].sort(), + ) + // The point of agent mode: no 27 pages of factor/page detail. 
+ expect(parsed.factors).toBeUndefined() + expect(parsed.pages).toBeUndefined() + expect(parsed.tool).toBe('@ainyc/aeo-audit') + }) + + it('formatSitemapAgent emits valid JSON', () => { + const report: SitemapAuditReport = { + schemaVersion: '1.1', + sitemapUrl: 'https://example.com/sitemap.xml', + auditedAt: '2026-04-18T00:00:00.000Z', + pagesDiscovered: 1, + pagesAudited: 1, + pagesSkipped: 0, + pagesFiltered: 0, + pagesTruncated: 0, + effectiveLimit: 200, + aggregateScore: 80, + aggregateGrade: 'B-', + pages: [], + criticalDefects: [], + crossCuttingIssues: [], + prioritizedFixes: [], + } + const parsed = JSON.parse(formatSitemapAgent(report)) + expect(parsed.mode).toBe('sitemap') + expect(parsed.pass).toBe(true) + expect(parsed.pages).toBeUndefined() + }) +}) diff --git a/test/cli-require-meta.test.ts b/test/cli-require-meta.test.ts index 78d505f..2152da2 100644 --- a/test/cli-require-meta.test.ts +++ b/test/cli-require-meta.test.ts @@ -43,7 +43,9 @@ describe('hasMissingMetaDescription', () => { it('returns true when technical-seo has a missing-meta-description finding', () => { expect( hasMissingMetaDescription([ - technicalSeoFactor([{ type: 'missing', message: 'No meta description found.' }]), + technicalSeoFactor([ + { type: 'missing', code: 'technical-seo.meta-description.missing', message: 'No meta description found.' }, + ]), ]), ).toBe(true) }) @@ -52,7 +54,7 @@ describe('hasMissingMetaDescription', () => { expect( hasMissingMetaDescription([ technicalSeoFactor([ - { type: 'found', message: 'Meta description present (152 chars).' }, + { type: 'found', code: 'technical-seo.meta-description.present', message: 'Meta description present (152 chars).' 
}, ]), ]), ).toBe(false) @@ -64,6 +66,7 @@ describe('hasMissingMetaDescription', () => { technicalSeoFactor([ { type: 'info', + code: 'technical-seo.meta-description.short', message: 'Meta description is too short (90 chars; target 150–160): "..."', }, ]), @@ -71,11 +74,11 @@ describe('hasMissingMetaDescription', () => { ).toBe(false) }) - it('returns false when finding type is missing but unrelated message', () => { + it('returns false when a different finding is missing but the meta-description code is absent', () => { expect( hasMissingMetaDescription([ technicalSeoFactor([ - { type: 'missing', message: 'No canonical tag found.' }, + { type: 'missing', code: 'technical-seo.canonical.missing', message: 'No canonical tag found.' }, ]), ]), ).toBe(false) diff --git a/test/critical-defects.test.ts b/test/critical-defects.test.ts new file mode 100644 index 0000000..f17a267 --- /dev/null +++ b/test/critical-defects.test.ts @@ -0,0 +1,363 @@ +import { describe, it, expect } from 'vitest' +import { load } from 'cheerio' + +import { buildCriticalDefects, detectCriticalDefects, isHomepageUrl } from '../src/critical-defects.js' +import { buildPrioritizedFixes } from '../src/sitemap.js' +import { formatSitemapMarkdown } from '../src/formatters/markdown.js' +import { formatSitemapText } from '../src/formatters/text.js' +import { getVisibleText, parseJsonLdScripts } from '../src/analyzers/helpers.js' +import type { + AuditContext, + AuditReport, + AuxiliaryResources, + CriticalDefect, + CriticalDefectGroup, + CrossCuttingIssue, + PrioritizedFix, + SitemapAuditReport, +} from '../src/types.js' + +function aux(): AuxiliaryResources { + return { + llmsTxt: { state: 'missing', body: '' }, + llmsFullTxt: { state: 'missing', body: '' }, + robotsTxt: { state: 'missing', body: '' }, + sitemapXml: { state: 'missing', body: '' }, + } +} + +function buildContext(html: string): AuditContext { + const $ = load(html) + return { + $, + html, + url: 'https://example.com/', + headers: {}, + 
auxiliary: aux(), + structuredData: parseJsonLdScripts($), + textContent: getVisibleText($, html), + pageTitle: $('title').first().text().trim(), + } +} + +const HEAD = '<title>Page' + +function report(url: string, criticalDefects: CriticalDefect[]): AuditReport { + return { + schemaVersion: '1.1', + url, + finalUrl: url, + auditedAt: '2026-04-18T00:00:00.000Z', + overallScore: 75, + overallGrade: 'C', + summary: '', + factors: [], + criticalDefects, + metadata: { + fetchTimeMs: 0, + pageTitle: '', + wordCount: 0, + auxiliary: { llmsTxt: 'missing', llmsFullTxt: 'missing', robotsTxt: 'missing', sitemapXml: 'missing' }, + redirectChain: [], + }, + } +} + +const MULTIPLE_H1: CriticalDefect = { + id: 'multiple-h1', + severity: 'critical', + detail: '4 H1 tags found (expected exactly one).', + recommendation: 'Consolidate to a single H1; 4 are present.', +} +const MISSING_H1: CriticalDefect = { + id: 'missing-h1', + severity: 'critical', + detail: 'No H1 tag.', + recommendation: 'Add exactly one H1.', +} +const MISSING_META: CriticalDefect = { + id: 'missing-meta-description', + severity: 'warning', + detail: 'No meta description.', + recommendation: 'Add a meta description.', +} + +describe('detectCriticalDefects', () => { + it('returns no defects for a structurally healthy page', () => { + const html = `${HEAD}

Topic

` + expect(detectCriticalDefects(buildContext(html))).toEqual([]) + }) + + it('flags a missing H1 as critical', () => { + const html = `${HEAD}

No heading.

` + const defects = detectCriticalDefects(buildContext(html)) + const h1 = defects.find((d) => d.id === 'missing-h1') + expect(h1).toBeDefined() + expect(h1?.severity).toBe('critical') + }) + + it('flags multiple H1 tags as critical and reports the count', () => { + const html = `${HEAD}

A

B

C

D

` + const defects = detectCriticalDefects(buildContext(html)) + const h1 = defects.find((d) => d.id === 'multiple-h1') + expect(h1).toBeDefined() + expect(h1?.severity).toBe('critical') + expect(h1?.detail).toContain('4 H1 tags') + }) + + it('does not flag an H1 defect when exactly one H1 is present', () => { + const html = `${HEAD}

Only one

` + const defects = detectCriticalDefects(buildContext(html)) + expect(defects.some((d) => d.id === 'missing-h1' || d.id === 'multiple-h1')).toBe(false) + }) + + it('flags a missing <title> as critical', () => { + const html = `<!doctype html><html><head><meta name="description" content="${'x'.repeat(150)}"></head><body><h1>Topic</h1></body></html>` + const defects = detectCriticalDefects(buildContext(html)) + const title = defects.find((d) => d.id === 'missing-title') + expect(title).toBeDefined() + expect(title?.severity).toBe('critical') + }) + + it('flags a missing meta description as a warning', () => { + const html = `<!doctype html><html><head><title>Page</title></head><body><h1>Topic</h1></body></html>` + const defects = detectCriticalDefects(buildContext(html)) + const meta = defects.find((d) => d.id === 'missing-meta-description') + expect(meta).toBeDefined() + expect(meta?.severity).toBe('warning') + }) + + it('detects several defects on one page', () => { + const html = `

nothing

` + const ids = detectCriticalDefects(buildContext(html)).map((d) => d.id).sort() + expect(ids).toEqual(['missing-h1', 'missing-meta-description', 'missing-title']) + }) +}) + +describe('isHomepageUrl', () => { + it('treats the bare origin as the homepage', () => { + expect(isHomepageUrl('https://example.com/')).toBe(true) + expect(isHomepageUrl('https://example.com')).toBe(true) + }) + + it('rejects sub-paths and query strings', () => { + expect(isHomepageUrl('https://example.com/contact-us')).toBe(false) + expect(isHomepageUrl('https://example.com/?utm=1')).toBe(false) + }) + + it('returns false for unparseable input', () => { + expect(isHomepageUrl('not a url')).toBe(false) + }) +}) + +describe('buildCriticalDefects', () => { + it('returns no groups when no page has a defect', () => { + expect(buildCriticalDefects([report('https://example.com/', [])])).toEqual([]) + }) + + it('groups the same defect across pages and names each page', () => { + const pages = [ + report('https://example.com/a', [MISSING_H1]), + report('https://example.com/b', [MISSING_H1]), + ] + const groups = buildCriticalDefects(pages) + expect(groups).toHaveLength(1) + expect(groups[0].id).toBe('missing-h1') + expect(groups[0].pages.map((p) => p.url)).toEqual(['https://example.com/a', 'https://example.com/b']) + }) + + it('surfaces a single-page defect on the homepage (issue #42 scenario)', () => { + // Homepage has 4 H1s; one deep page is missing its H1. Both are 1-of-N + // prevalence yet must both appear, with the homepage defect ranked first. 
+ const pages = [ + report('https://example.com/contact-us', [MISSING_H1]), + report('https://example.com/', [MULTIPLE_H1]), + ...Array.from({ length: 23 }, (_, i) => report(`https://example.com/p${i}`, [])), + ] + const groups = buildCriticalDefects(pages) + expect(groups.map((g) => g.id)).toEqual(['multiple-h1', 'missing-h1']) + expect(groups[0].pages[0].url).toBe('https://example.com/') + expect(groups[0].pages[0].isHomepage).toBe(true) + }) + + it('ranks the homepage first within a group regardless of input order', () => { + const pages = [ + report('https://example.com/deep', [MISSING_H1]), + report('https://example.com/', [MISSING_H1]), + ] + const groups = buildCriticalDefects(pages) + expect(groups[0].pages[0].url).toBe('https://example.com/') + expect(groups[0].pages[0].isHomepage).toBe(true) + }) + + it('orders pages within a group by sitemap priority when no homepage is involved', () => { + const priorityByUrl = new Map([ + ['https://example.com/low', 0.2], + ['https://example.com/high', 0.9], + ]) + const pages = [ + report('https://example.com/low', [MISSING_H1]), + report('https://example.com/high', [MISSING_H1]), + ] + const groups = buildCriticalDefects(pages, priorityByUrl) + expect(groups[0].pages.map((p) => p.url)).toEqual(['https://example.com/high', 'https://example.com/low']) + expect(groups[0].pages[0].priority).toBe(0.9) + }) + + it('orders critical-severity groups ahead of warnings', () => { + const pages = [ + report('https://example.com/a', [MISSING_META]), + report('https://example.com/b', [MISSING_META, MISSING_H1]), + ] + const groups = buildCriticalDefects(pages) + expect(groups.map((g) => g.severity)).toEqual(['critical', 'warning']) + expect(groups[0].id).toBe('missing-h1') + }) +}) + +describe('buildPrioritizedFixes with critical defects', () => { + function crossCutting(factorName = 'FAQ Content', affectedPages = 20): CrossCuttingIssue { + const rec = `Improve ${factorName}.` + return { + factorId: 
factorName.toLowerCase().replace(/\s+/g, '-'), + factorName, + avgScore: 40, + avgGrade: 'F', + affectedPages, + totalPages: 25, + topRecommendations: [rec], + topIssues: [{ recommendation: rec, affectedUrls: [] }], + } + } + + it('reports every cross-cutting issue, not just the top five', () => { + const issues = Array.from({ length: 8 }, (_, i) => crossCutting(`Factor ${i}`, 20 - i)) + const fixes = buildPrioritizedFixes(issues, 25, []) + expect(fixes).toHaveLength(8) + for (let i = 0; i < 8; i++) { + expect(fixes.some((f) => f.title === `Factor ${i}`)).toBe(true) + } + }) + + it('returns structured fixes with stable ids and a kind', () => { + const fixes = buildPrioritizedFixes([crossCutting('Technical SEO')], 25, []) + expect(fixes[0]).toMatchObject({ + kind: 'cross-cutting', + id: 'technical-seo', + title: 'Technical SEO', + avgGrade: 'F', + }) + expect(typeof fixes[0].summary).toBe('string') + expect(typeof fixes[0].prevalencePct).toBe('number') + }) + + it('prepends critical-severity defects above the prevalence-ranked fixes', () => { + const defects = buildCriticalDefects([ + report('https://example.com/', [MULTIPLE_H1]), + report('https://example.com/contact-us', [MISSING_H1]), + ]) + const fixes = buildPrioritizedFixes([crossCutting()], 25, defects) + + expect(fixes[0]).toMatchObject({ kind: 'critical-defect', id: 'multiple-h1', severity: 'critical', affectsHomepage: true }) + expect(fixes[0].affectedPages).toContain('https://example.com/') + expect(fixes[1]).toMatchObject({ id: 'missing-h1', affectsHomepage: false }) + // The prevalence-ranked fix still follows the promoted defects. 
+ expect(fixes[fixes.length - 1]).toMatchObject({ kind: 'cross-cutting', title: 'FAQ Content' }) + }) + + it('does not promote warning-severity defects into prioritized fixes', () => { + const defects = buildCriticalDefects([report('https://example.com/a', [MISSING_META])]) + const fixes = buildPrioritizedFixes([crossCutting()], 25, defects) + expect(fixes.every((f) => f.id !== 'missing-meta-description')).toBe(true) + }) + + it('spells out every affected page rather than truncating with a count', () => { + const defects = buildCriticalDefects([ + report('https://example.com/', [MULTIPLE_H1]), + report('https://example.com/x', [MULTIPLE_H1]), + report('https://example.com/y', [MULTIPLE_H1]), + ]) + const fixes = buildPrioritizedFixes([], 25, defects) + expect(fixes[0].affectedPages).toEqual([ + 'https://example.com/', + 'https://example.com/x', + 'https://example.com/y', + ]) + expect(fixes[0].summary).not.toContain('more page') + }) +}) + +describe('formatters list every affected page (no truncation)', () => { + function sitemapReport( + criticalDefects: CriticalDefectGroup[], + prioritizedFixes: PrioritizedFix[] = [], + ): SitemapAuditReport { + return { + schemaVersion: '1.1', + sitemapUrl: 'https://example.com/sitemap.xml', + auditedAt: '2026-04-18T00:00:00.000Z', + pagesDiscovered: 0, + pagesAudited: 0, + pagesSkipped: 0, + pagesFiltered: 0, + pagesTruncated: 0, + effectiveLimit: 200, + aggregateScore: 50, + aggregateGrade: 'F', + pages: [], + criticalDefects, + crossCuttingIssues: [], + prioritizedFixes, + } + } + + // More pages than the old display cap (10) to prove the cap is gone. 
+  const manyPages = Array.from({ length: 14 }, (_, i) => ({
+    url: `https://example.com/page-${i}`,
+    detail: 'No H1 tag.',
+    isHomepage: false,
+  }))
+  const group: CriticalDefectGroup = {
+    id: 'missing-h1',
+    severity: 'critical',
+    title: 'Missing H1',
+    recommendation: 'Add exactly one H1.',
+    pages: manyPages,
+  }
+
+  it('renders all affected pages in text output without a "more pages" elision', () => {
+    const text = formatSitemapText(sitemapReport([group]))
+    for (const page of manyPages) expect(text).toContain(page.url)
+    expect(text).not.toMatch(/more page/i)
+  })
+
+  it('renders all affected pages in markdown output without a "more pages" elision', () => {
+    const md = formatSitemapMarkdown(sitemapReport([group]))
+    for (const page of manyPages) expect(md).toContain(page.url)
+    expect(md).not.toMatch(/more page/i)
+  })
+
+  const bigFix: PrioritizedFix = {
+    kind: 'cross-cutting',
+    id: 'technical-seo',
+    title: 'Technical SEO',
+    recommendation: 'Add a meta description.',
+    affectedPages: manyPages.map((p) => p.url),
+    affectsHomepage: false,
+    prevalencePct: 100,
+    avgGrade: 'F',
+    summary: 'Technical SEO (avg F) — 14 pages: Add a meta description.',
+  }
+
+  it('spells out every page of each prioritized fix in text output', () => {
+    const text = formatSitemapText(sitemapReport([], [bigFix]))
+    for (const page of manyPages) expect(text).toContain(page.url)
+    expect(text).not.toMatch(/more page/i)
+  })
+
+  it('spells out every page of each prioritized fix in markdown output', () => {
+    const md = formatSitemapMarkdown(sitemapReport([], [bigFix]))
+    for (const page of manyPages) expect(md).toContain(page.url)
+    expect(md).not.toMatch(/more page/i)
+  })
+})
diff --git a/test/finding-codes.test.ts b/test/finding-codes.test.ts
new file mode 100644
index 0000000..584dcf7
--- /dev/null
+++ b/test/finding-codes.test.ts
@@ -0,0 +1,86 @@
+import { describe, it, expect } from 'vitest'
+import { readFileSync } from 'node:fs'
+
+// Maps each analyzer source file to its factor id. Every finding code must be
+// namespaced `.[.]` (issue: agent-native finding codes).
+const ANALYZERS: Record<string, string> = {
+  'structured-data.ts': 'structured-data',
+  'ai-readable-content.ts': 'ai-readable-content',
+  'entity-consistency.ts': 'entity-consistency',
+  'content-depth.ts': 'content-depth',
+  'definition-blocks.ts': 'definition-blocks',
+  'faq-content.ts': 'faq-content',
+  'named-entities.ts': 'named-entities',
+  'citations.ts': 'citations',
+  'content-freshness.ts': 'content-freshness',
+  'geographic-signals.ts': 'geographic-signals',
+  'eeat-signals.ts': 'eeat-signals',
+  'ai-crawler-access.ts': 'ai-crawler-access',
+  'schema-completeness.ts': 'schema-completeness',
+  'schema-validity.ts': 'schema-validity',
+  'content-extractability.ts': 'content-extractability',
+  'technical-seo.ts': 'technical-seo',
+  'snippet-eligibility.ts': 'snippet-eligibility',
+  'agent-skill-exposure.ts': 'agent-skill-exposure',
+  'lighthouse.ts': 'lighthouse',
+}
+
+// kebab-case dot segments, at least `.`.
+const CODE_RE = /^[a-z0-9]+(?:-[a-z0-9]+)*(?:\.[a-z0-9]+(?:-[a-z0-9]+)*)+$/
+
+function readAnalyzer(file: string): string {
+  return readFileSync(new URL(`../src/analyzers/${file}`, import.meta.url), 'utf8')
+}
+
+function extractCodes(source: string): string[] {
+  const codes: string[] = []
+  const re = /\bcode:\s*['"]([^'"]+)['"]/g
+  let m: RegExpExecArray | null
+  while ((m = re.exec(source)) !== null) codes.push(m[1])
+  return codes
+}
+
+function countFindings(source: string): number {
+  return (source.match(/findings\.push\(/g) ?? []).length
+}
+
+describe('finding codes', () => {
+  const allCodes: string[] = []
+
+  for (const [file, factorId] of Object.entries(ANALYZERS)) {
+    describe(file, () => {
+      const source = readAnalyzer(file)
+      const codes = extractCodes(source)
+
+      it('codes at least every findings.push site', () => {
+        // The required `code` on AuditFinding makes the compiler the real
+        // completeness gate; here we sanity-check that codes are present and
+        // cover every push site. Some analyzers also build findings as inline
+        // array literals (e.g. lighthouse early returns), so allow >=.
+        expect(codes.length).toBeGreaterThan(0)
+        expect(codes.length).toBeGreaterThanOrEqual(countFindings(source))
+      })
+
+      it('every code follows the .[.] convention', () => {
+        for (const code of codes) expect(code, code).toMatch(CODE_RE)
+      })
+
+      it(`every code is namespaced under "${factorId}."`, () => {
+        for (const code of codes) expect(code.startsWith(`${factorId}.`), code).toBe(true)
+      })
+
+      it('codes are unique within the analyzer', () => {
+        expect(new Set(codes).size).toBe(codes.length)
+      })
+
+      allCodes.push(...codes)
+    })
+  }
+
+  it('codes are globally unique across all analyzers', () => {
+    const seen = new Map<string, number>()
+    for (const c of allCodes) seen.set(c, (seen.get(c) ?? 0) + 1)
+    const dupes = [...seen.entries()].filter(([, n]) => n > 1).map(([c]) => c)
+    expect(dupes, `duplicate codes: ${dupes.join(', ')}`).toEqual([])
+  })
+})
diff --git a/test/sitemap-cross-cutting.test.ts b/test/sitemap-cross-cutting.test.ts
index 40cc4ce..102cd1b 100644
--- a/test/sitemap-cross-cutting.test.ts
+++ b/test/sitemap-cross-cutting.test.ts
@@ -1,6 +1,6 @@
 import { describe, it, expect } from 'vitest'
 
-import { buildCrossCuttingIssues } from '../src/sitemap.js'
+import { buildCrossCuttingIssues, buildPrioritizedFixes } from '../src/sitemap.js'
 import type { AuditReport, ScoredFactor } from '../src/types.js'
 
 function factor(overrides: Partial<ScoredFactor> & { id: string; name: string }): ScoredFactor {
@@ -18,6 +18,7 @@ function factor(overrides: Partial<ScoredFactor> & { id: string; name: string })
 
 function report(url: string, factors: ScoredFactor[]): AuditReport {
   return {
+    schemaVersion: '1.1',
     url,
     finalUrl: url,
     auditedAt: '2026-04-18T00:00:00.000Z',
@@ -25,6 +26,7 @@ function report(url: string, factors: ScoredFactor[]): AuditReport {
     overallGrade: 'C',
     summary: '',
     factors,
+    criticalDefects: [],
     metadata: {
       fetchTimeMs: 0,
       pageTitle: '',
@@ -91,3 +93,37 @@ describe('buildCrossCuttingIssues', () => {
     expect(buildCrossCuttingIssues(pages)).toHaveLength(0)
   })
 })
+
+describe('buildPrioritizedFixes', () => {
+  it('unions every recommendation, so a homepage hit by a non-top recommendation still flips affectsHomepage', () => {
+    const metaRec = 'Expand the meta description to 150–160 characters.' // top: two non-homepage pages
+    const canonicalRec = 'Add ' // homepage only
+
+    const pages: AuditReport[] = [
+      report('https://example.com/a', [
+        factor({ id: 'technical-seo', name: 'Technical SEO', score: 50, recommendations: [metaRec] }),
+      ]),
+      report('https://example.com/b', [
+        factor({ id: 'technical-seo', name: 'Technical SEO', score: 55, recommendations: [metaRec] }),
+      ]),
+      report('https://example.com/', [
+        factor({ id: 'technical-seo', name: 'Technical SEO', score: 50, recommendations: [canonicalRec] }),
+      ]),
+    ]
+
+    const fixes = buildPrioritizedFixes(buildCrossCuttingIssues(pages), pages.length)
+    const tech = fixes.find((f) => f.id === 'technical-seo')
+    expect(tech).toBeDefined()
+
+    // The top recommendation (most pages) hits /a and /b, not the homepage; the homepage
+    // is only hit by the canonical recommendation. Reach must union both, so the homepage
+    // is included, flagged, and counted — not dropped because it wasn't the top sub-issue.
+    expect(tech?.affectsHomepage).toBe(true)
+    expect(tech?.affectedPages).toEqual([
+      'https://example.com/',
+      'https://example.com/a',
+      'https://example.com/b',
+    ])
+    expect(tech?.prevalencePct).toBe(100)
+  })
+})
diff --git a/test/static-audit.test.ts b/test/static-audit.test.ts
index e5278f6..cd1ba23 100644
--- a/test/static-audit.test.ts
+++ b/test/static-audit.test.ts
@@ -90,3 +90,50 @@ describe('runStaticAudit', () => {
     })
   })
 })
+
+describe('runStaticAudit critical defects (issue #42)', () => {
+  let dir: string
+
+  beforeAll(async () => {
+    dir = await mkdtemp(path.join(os.tmpdir(), 'aeo-defects-'))
+    // Homepage with two H1s (a split headline) — a single-page defect that the
+    // prevalence ranking would otherwise bury.
+    await writeFile(
+      path.join(dir, 'index.html'),
+      'Home' +
+        '' +
+        '<h1>Build</h1><h1>faster</h1><p>Some content for the analyzers.</p>',
+    )
+    // A clean page so the defect really is low-prevalence.
+    await writeFile(
+      path.join(dir, 'about.html'),
+      'About' +
+        '' +
+        '<h1>About</h1><p>Some content for the analyzers.</p>',
+    )
+  })
+
+  afterAll(async () => {
+    await rm(dir, { recursive: true, force: true })
+  })
+
+  it('surfaces the homepage H1 defect in criticalDefects and at the top of prioritizedFixes', async () => {
+    const result = await runStaticAudit(dir, { baseUrl: 'https://example.com' })
+    if (result.kind !== 'multi') throw new Error('expected multi')
+
+    const multipleH1 = result.report.criticalDefects.find((g) => g.id === 'multiple-h1')
+    expect(multipleH1).toBeDefined()
+    expect(multipleH1?.pages[0].url).toBe('https://example.com/')
+    expect(multipleH1?.pages[0].isHomepage).toBe(true)
+
+    // The defect leads the prioritized fixes despite affecting only 1 of 2 pages.
+    const topFix = result.report.prioritizedFixes[0]
+    expect(topFix.kind).toBe('critical-defect')
+    expect(topFix.id).toBe('multiple-h1')
+    expect(topFix.affectsHomepage).toBe(true)
+    expect(topFix.affectedPages).toContain('https://example.com/')
+
+    // The report carries a schema version so agent parsers can detect shape drift.
+    expect(result.report.schemaVersion).toBe('1.1')
+  })
+})