diff --git a/README.md b/README.md
index 3c19d90..cc5c6ce 100644
--- a/README.md
+++ b/README.md
@@ -76,6 +76,52 @@ npx @ainyc/aeo-audit https://example.com --include-geo
npx @ainyc/aeo-audit https://example.com --include-agent-skills
```
+### Platform Detection Mode
+
+Detect what platform, CMS, framework, or static site generator a website is built on. Useful for competitor research, lead qualification, and triage before an audit.
+
+```bash
+# Identify the stack (WordPress, Webflow, Shopify, Next.js, Vercel, etc.)
+npx @ainyc/aeo-audit https://example.com --detect-platform
+
+# JSON for programmatic use
+npx @ainyc/aeo-audit https://example.com --detect-platform --format json
+
+# Only show high-confidence matches
+npx @ainyc/aeo-audit https://example.com --detect-platform --min-confidence high
+```
+
+The detector inspects HTML, response headers, ``, script and link sources, and platform-specific globals to fingerprint:
+
+- **CMS:** WordPress, Drupal, Joomla, Ghost, HubSpot, Craft CMS, Sanity, Contentful, Notion
+- **Site builders:** Wix, Squarespace, Webflow, Framer, Carrd, Bubble
+- **E-commerce:** Shopify, WooCommerce, BigCommerce, Magento, PrestaShop
+- **Frameworks:** Next.js, Nuxt, Gatsby, Remix, Astro, SvelteKit, Angular, Vue, React, Ember, Qwik
+- **Static site generators:** Hugo, Jekyll, Eleventy, Hexo, Docusaurus, MkDocs
+- **Hosting / CDN:** Vercel, Netlify, Cloudflare, GitHub Pages, Fastly, AWS CloudFront
+
+Each detected platform is reported with a confidence bucket (`high`, `medium`, `low`), a numeric score, an optional version, and the list of signals that matched. When no CMS, site builder, or e-commerce platform is found, the report flags the site as `custom-built` (framework and hosting fingerprints are still surfaced for context). Exit code is `0` when at least one platform is detected, `1` otherwise.
+
+#### Batch detection
+
+Pass `--urls` to fingerprint many sites in a single run. Pages are fetched with bounded concurrency (5 in flight by default; tune with `--concurrency`).
+
+```bash
+# From a file (one URL per line; # comments and blank lines are skipped)
+npx @ainyc/aeo-audit --detect-platform --urls urls.txt
+
+# Inline comma-separated list
+npx @ainyc/aeo-audit --detect-platform --urls https://a.com,https://b.com,https://c.com
+
+# From stdin
+cat urls.txt | npx @ainyc/aeo-audit --detect-platform --urls -
+
+# JSON for downstream processing
+npx @ainyc/aeo-audit --detect-platform --urls urls.txt --format json
+```
+
+Per-URL fetch errors don't abort the batch — each entry is reported with `status: 'success'` or `status: 'error'`. Exit code is `0` when at least one URL succeeded, `1` otherwise.
+
### Sitemap Mode
Audit every page discovered from the site's sitemap with bounded concurrency (5 in flight):
@@ -107,6 +153,10 @@ When the sitemap has more URLs than `--limit`, the run audits the highest-priori
| `--sitemap [url]` | Audit all pages from the sitemap (auto-discovers `/sitemap.xml` or uses an explicit URL) |
| `--limit ` | Max pages to audit in sitemap mode (default 200, sorted by sitemap priority) |
| `--top-issues` | In sitemap mode, skip per-page output and show only cross-cutting issues |
+| `--detect-platform` | Identify the platform/CMS/framework powering the site instead of running an audit |
+| `--urls ` | In `--detect-platform` mode, run on multiple URLs. `` is a file path (one URL per line), a comma-separated list, or `-` for stdin |
+| `--concurrency ` | In `--detect-platform` batch mode, max in-flight fetches (default 5) |
+| `--min-confidence ` | In platform-detect mode, only report matches at or above this level: `low` (default), `medium`, `high` |
| `-h`, `--help` | Show the help message |
Exit code `0` for score >= 70, `1` for < 70 (CI-friendly). In sitemap mode the exit code is based on the aggregate score.
diff --git a/package.json b/package.json
index d9c16ac..7fbef65 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
{
"name": "@ainyc/aeo-audit",
- "version": "1.5.0",
+ "version": "1.6.0",
"description": "The most comprehensive open-source Answer Engine Optimization (AEO) audit tool. Scores websites across 13 ranking factors that determine AI citation.",
"type": "module",
"main": "./dist/index.js",
diff --git a/skills/aeo/SKILL.md b/skills/aeo/SKILL.md
index 686e1a9..78d7804 100644
--- a/skills/aeo/SKILL.md
+++ b/skills/aeo/SKILL.md
@@ -42,6 +42,7 @@ npx @ainyc/aeo-audit@1 "" [flags] --format json
- `schema`: validate JSON-LD and entity consistency
- `llms`: create or improve `llms.txt` and `llms-full.txt`
- `monitor`: compare changes over time or benchmark competitors
+- `detect-platform`: identify the CMS, site builder, framework, or hosting stack a site uses
If no mode is provided, default to `audit`.
@@ -55,10 +56,14 @@ If no mode is provided, default to `audit`.
- `schema https://example.com`
- `llms https://example.com`
- `monitor https://site-a.com --compare https://site-b.com`
+- `detect-platform https://example.com`
+- `detect-platform https://example.com --min-confidence high`
+- `detect-platform --urls competitors.txt`
+- `detect-platform --urls https://a.com,https://b.com`
## Mode Selection
-- If the first argument is one of `audit`, `fix`, `schema`, `llms`, or `monitor`, use that mode.
+- If the first argument is one of `audit`, `fix`, `schema`, `llms`, `monitor`, or `detect-platform`, use that mode.
- If no explicit mode is given, infer the intent from the request and default to `audit`.
## Audit
@@ -101,6 +106,35 @@ Returns:
- Aggregate score and grade
- Prioritized fixes ranked by site-wide impact
+### Detect Platform Mode
+
+Use `--detect-platform` when the user wants to know what stack a site is built on (e.g., "is this WordPress?", "what framework does competitor X use?", "is this site custom-built?"). This is much faster than a full audit because it skips analyzer scoring.
+
+```bash
+npx @ainyc/aeo-audit@1 "" --detect-platform --format json
+npx @ainyc/aeo-audit@1 "" --detect-platform --min-confidence high --format json
+```
+
+Flags:
+- `--detect-platform` — switch to detection mode instead of auditing
+- `--min-confidence ` — filter to `low` (default), `medium`, or `high` confidence
+- `--urls ` — run on multiple URLs at once (file path, comma-separated list, or `-` for stdin)
+- `--concurrency ` — max in-flight fetches in batch mode (default 5)
+
+The report groups detections by category (CMS, site builder, e-commerce, framework, SSG, hosting), each with a confidence bucket, a 0–100 score, an optional version, and the signals that matched. When the report's `isCustom` flag is true, no CMS/site-builder/e-commerce platform was identified — the site is likely custom-built. Exit code is `0` when at least one platform is detected, `1` otherwise.
+
+#### Batch detection
+
+When the user wants to fingerprint many sites at once (competitor lists, customer cohorts), pass `--urls`:
+
+```bash
+npx @ainyc/aeo-audit@1 --detect-platform --urls urls.txt --format json
+npx @ainyc/aeo-audit@1 --detect-platform --urls https://a.com,https://b.com --format json
+cat urls.txt | npx @ainyc/aeo-audit@1 --detect-platform --urls - --format json
+```
+
+The batch report contains a `results` array; each entry has `status: 'success'` or `'error'`, plus the same shape as a single-URL report on success. Per-URL fetch errors do not abort the run. Exit code is `0` when at least one URL succeeded, `1` otherwise.
+
## Fix
Use when the user wants code changes applied after the audit.
diff --git a/src/cli.ts b/src/cli.ts
index 214f115..8e6cddd 100644
--- a/src/cli.ts
+++ b/src/cli.ts
@@ -1,11 +1,33 @@
+import { readFile } from 'node:fs/promises'
import { runAeoAudit } from './index.js'
import { runSitemapAudit } from './sitemap.js'
+import { detectPlatform, detectPlatformBatch } from './detect-platform.js'
import { isAeoAuditError } from './errors.js'
-import { formatJson } from './formatters/json.js'
-import { formatSitemapJson } from './formatters/json.js'
-import { formatMarkdown, formatSitemapMarkdown } from './formatters/markdown.js'
-import { formatText, formatSitemapText } from './formatters/text.js'
-import type { AuditReport, SitemapAuditReport, SitemapAuditOptions } from './types.js'
+import {
+ formatBatchPlatformJson,
+ formatJson,
+ formatPlatformJson,
+ formatSitemapJson,
+} from './formatters/json.js'
+import {
+ formatBatchPlatformMarkdown,
+ formatMarkdown,
+ formatPlatformMarkdown,
+ formatSitemapMarkdown,
+} from './formatters/markdown.js'
+import {
+ formatBatchPlatformText,
+ formatPlatformText,
+ formatSitemapText,
+ formatText,
+} from './formatters/text.js'
+import type {
+ BatchPlatformDetectionReport,
+ PlatformConfidence,
+ PlatformDetectionReport,
+ SitemapAuditOptions,
+ SitemapAuditReport,
+} from './types.js'
const FORMATTERS = {
json: formatJson,
@@ -19,6 +41,18 @@ const SITEMAP_FORMATTERS = {
text: (report: SitemapAuditReport, topIssuesOnly: boolean) => formatSitemapText(report, topIssuesOnly),
}
+const PLATFORM_FORMATTERS = {
+ json: (report: PlatformDetectionReport) => formatPlatformJson(report),
+ markdown: (report: PlatformDetectionReport) => formatPlatformMarkdown(report),
+ text: (report: PlatformDetectionReport) => formatPlatformText(report),
+}
+
+const BATCH_PLATFORM_FORMATTERS = {
+ json: (report: BatchPlatformDetectionReport) => formatBatchPlatformJson(report),
+ markdown: (report: BatchPlatformDetectionReport) => formatBatchPlatformMarkdown(report),
+ text: (report: BatchPlatformDetectionReport) => formatBatchPlatformText(report),
+}
+
type FormatterName = keyof typeof FORMATTERS
interface ParsedArgs {
@@ -32,6 +66,10 @@ interface ParsedArgs {
sitemapUrl: string | null
limit: number | null
topIssues: boolean
+ detectPlatform: boolean
+ minConfidence: PlatformConfidence | null
+ urls: string | null
+ concurrency: number | null
}
function isFormatterName(value: string): value is FormatterName {
@@ -51,6 +89,10 @@ function parseArgs(argv: string[]): ParsedArgs {
sitemapUrl: null,
limit: null,
topIssues: false,
+ detectPlatform: false,
+ minConfidence: null,
+ urls: null,
+ concurrency: null,
}
for (let i = 0; i < args.length; i += 1) {
@@ -79,6 +121,23 @@ function parseArgs(argv: string[]): ParsedArgs {
i += 1
} else if (args[i] === '--top-issues') {
result.topIssues = true
+ } else if (args[i] === '--detect-platform') {
+ result.detectPlatform = true
+ } else if (args[i] === '--min-confidence' && args[i + 1]) {
+ const value = args[i + 1]
+ if (value === 'high' || value === 'medium' || value === 'low') {
+ result.minConfidence = value
+ }
+ i += 1
+ } else if (args[i] === '--urls' && args[i + 1]) {
+ result.urls = args[i + 1]
+ i += 1
+ } else if (args[i] === '--concurrency' && args[i + 1]) {
+ const num = parseInt(args[i + 1], 10)
+ if (Number.isFinite(num) && num > 0) {
+ result.concurrency = num
+ }
+ i += 1
} else if (args[i] === '--help' || args[i] === '-h') {
result.help = true
} else if (!args[i].startsWith('-')) {
@@ -89,6 +148,39 @@ function parseArgs(argv: string[]): ParsedArgs {
return result
}
+export function parseUrlList(text: string): string[] {
+ const urls: string[] = []
+ for (const rawLine of text.split(/\r?\n/)) {
+ const line = rawLine.trim()
+ if (line.length === 0 || line.startsWith('#')) continue
+ // Allow comma-separated values on a single line too.
+ for (const part of line.split(',')) {
+ const candidate = part.trim()
+ if (candidate.length > 0) urls.push(candidate)
+ }
+ }
+ return urls
+}
+
+async function readStdin(): Promise {
+ const chunks: Buffer[] = []
+ for await (const chunk of process.stdin) {
+ chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk))
+ }
+ return Buffer.concat(chunks).toString('utf-8')
+}
+
+async function resolveUrls(spec: string): Promise {
+ if (spec === '-') {
+ return parseUrlList(await readStdin())
+ }
+ if (spec.startsWith('http://') || spec.startsWith('https://')) {
+ return parseUrlList(spec)
+ }
+ const text = await readFile(spec, 'utf-8')
+ return parseUrlList(text)
+}
+
function printHelp() {
console.log(`
Usage: aeo-audit [options]
@@ -103,6 +195,14 @@ Options:
--limit Max pages to audit in sitemap mode (default 200, sorted by sitemap priority).
When the sitemap exceeds the limit, a notice is printed to stderr.
--top-issues In sitemap mode, skip per-page output and show only cross-cutting issues
+ --detect-platform Detect what platform/CMS/framework the site is built on (WordPress,
+ Webflow, Shopify, Next.js, etc.) instead of running a full audit.
+ --urls In --detect-platform mode, run on multiple URLs. can be a path
+ to a text file (one URL per line, # comments allowed), a comma-separated
+ list (e.g. https://a.com,https://b.com), or - to read from stdin.
+ --concurrency In --detect-platform batch mode, max in-flight fetches (default 5).
+ --min-confidence In platform-detect mode, only report platforms at or above this
+ confidence level: low (default), medium, high.
-h, --help Show this help message
Examples:
@@ -115,8 +215,16 @@ Examples:
aeo-audit https://example.com --sitemap https://example.com/sitemap.xml
aeo-audit https://example.com --sitemap --limit 10
aeo-audit https://example.com --sitemap --top-issues
+ aeo-audit https://example.com --detect-platform
+ aeo-audit https://example.com --detect-platform --format json
+ aeo-audit https://example.com --detect-platform --min-confidence medium
+ aeo-audit --detect-platform --urls urls.txt
+ aeo-audit --detect-platform --urls https://a.com,https://b.com --format json
+ cat urls.txt | aeo-audit --detect-platform --urls -
Exit code: 0 when score >= 70, 1 otherwise. In sitemap mode, the aggregate score is used.
+In --detect-platform mode, exit code is 0 if any platform is detected, 1 otherwise.
+In --detect-platform batch mode, exit code is 0 if at least one URL succeeded, 1 otherwise.
`)
}
@@ -128,17 +236,57 @@ export async function main(argv: string[] = process.argv): Promise {
return 0
}
- if (!args.url) {
- console.error('Error: URL is required. Run with --help for usage.')
+ if (!isFormatterName(args.format)) {
+ console.error(`Error: Unknown format "${args.format}". Use: text, json, markdown`)
return 1
}
- if (!isFormatterName(args.format)) {
- console.error(`Error: Unknown format "${args.format}". Use: text, json, markdown`)
+ if (args.urls && !args.detectPlatform) {
+ console.error('Error: --urls is only supported with --detect-platform.')
return 1
}
try {
+ if (args.detectPlatform) {
+ if (args.urls) {
+ if (args.url) {
+ console.error('Error: cannot combine a positional URL with --urls. Use one or the other.')
+ return 1
+ }
+
+ const urls = await resolveUrls(args.urls)
+ if (urls.length === 0) {
+ console.error('Error: no URLs found in --urls input.')
+ return 1
+ }
+
+ const batch = await detectPlatformBatch(urls, {
+ minConfidence: args.minConfidence ?? undefined,
+ concurrency: args.concurrency ?? undefined,
+ })
+ const batchFormatter = BATCH_PLATFORM_FORMATTERS[args.format]
+ console.log(batchFormatter(batch))
+ return batch.successful > 0 ? 0 : 1
+ }
+
+ if (!args.url) {
+ console.error('Error: URL is required (or pass --urls |). Run with --help for usage.')
+ return 1
+ }
+
+ const report = await detectPlatform(args.url, {
+ minConfidence: args.minConfidence ?? undefined,
+ })
+ const platformFormatter = PLATFORM_FORMATTERS[args.format]
+ console.log(platformFormatter(report))
+ return report.detected.length > 0 ? 0 : 1
+ }
+
+ if (!args.url) {
+ console.error('Error: URL is required. Run with --help for usage.')
+ return 1
+ }
+
if (args.sitemap) {
const options: SitemapAuditOptions = {
factors: args.factors,
diff --git a/src/detect-platform.ts b/src/detect-platform.ts
new file mode 100644
index 0000000..5227a6d
--- /dev/null
+++ b/src/detect-platform.ts
@@ -0,0 +1,811 @@
+import { load, type CheerioAPI } from 'cheerio'
+import { isAeoAuditError } from './errors.js'
+import { fetchPage, normalizeTargetUrl } from './fetch-page.js'
+import { mapWithConcurrency } from './sitemap.js'
+import type {
+ BatchDetectionEntry,
+ BatchPlatformDetectionReport,
+ DetectedPlatform,
+ PlatformCategory,
+ PlatformConfidence,
+ PlatformDetectionReport,
+} from './types.js'
+
+const HIGH_THRESHOLD = 70
+const MEDIUM_THRESHOLD = 40
+const LOW_THRESHOLD = 15
+
+interface DetectionInput {
+ $: CheerioAPI
+ html: string
+ headers: Record
+}
+
+interface SignalHit {
+ evidence: string
+ weight: number
+ version?: string
+}
+
+type SignalFn = (input: DetectionInput) => SignalHit | null
+
+interface PlatformDef {
+ id: string
+ name: string
+ category: PlatformCategory
+ signals: SignalFn[]
+}
+
+/* ── Signal helpers ───────────────────────────────────────────────────── */
+
+function getHeader(headers: Record, name: string): string | undefined {
+ const lower = name.toLowerCase()
+ for (const key of Object.keys(headers)) {
+ if (key.toLowerCase() === lower) return headers[key]
+ }
+ return undefined
+}
+
+function metaGenerator(pattern: RegExp, weight: number): SignalFn {
+ return ({ $ }) => {
+ const content = $('meta[name="generator" i]').attr('content')?.trim()
+ if (!content) return null
+ const match = content.match(pattern)
+ if (!match) return null
+ return {
+ evidence: ``,
+ weight,
+ version: match[1],
+ }
+ }
+}
+
+function scriptSrcContains(needle: string | RegExp, weight: number): SignalFn {
+ return ({ $ }) => {
+ let hit: string | undefined
+ $('script[src]').each((_, el) => {
+ if (hit) return
+ const src = $(el).attr('src') || ''
+ if (typeof needle === 'string' ? src.includes(needle) : needle.test(src)) {
+ hit = src
+ }
+ })
+ if (!hit) return null
+ return { evidence: `script src includes "${truncate(hit, 80)}"`, weight }
+ }
+}
+
+function linkHrefContains(needle: string | RegExp, weight: number): SignalFn {
+ return ({ $ }) => {
+ let hit: string | undefined
+ $('link[href]').each((_, el) => {
+ if (hit) return
+ const href = $(el).attr('href') || ''
+ if (typeof needle === 'string' ? href.includes(needle) : needle.test(href)) {
+ hit = href
+ }
+ })
+ if (!hit) return null
+ return { evidence: `link href includes "${truncate(hit, 80)}"`, weight }
+ }
+}
+
+function imgSrcContains(needle: string | RegExp, weight: number): SignalFn {
+ return ({ $ }) => {
+ let hit: string | undefined
+ $('img[src]').each((_, el) => {
+ if (hit) return
+ const src = $(el).attr('src') || ''
+ if (typeof needle === 'string' ? src.includes(needle) : needle.test(src)) {
+ hit = src
+ }
+ })
+ if (!hit) return null
+ return { evidence: `img src includes "${truncate(hit, 80)}"`, weight }
+ }
+}
+
+function htmlMatches(pattern: RegExp, weight: number, label?: string): SignalFn {
+ return ({ html }) => {
+ const match = html.match(pattern)
+ if (!match) return null
+ return {
+ evidence: label ?? `HTML contains ${pattern.source.slice(0, 60)}`,
+ weight,
+ version: match[1],
+ }
+ }
+}
+
+function headerMatches(name: string, pattern: RegExp, weight: number): SignalFn {
+ return ({ headers }) => {
+ const value = getHeader(headers, name)
+ if (!value) return null
+ const match = value.match(pattern)
+ if (!match) return null
+ return {
+ evidence: `${name} header: "${truncate(value, 80)}"`,
+ weight,
+ version: match[1],
+ }
+ }
+}
+
+function headerExists(name: string, weight: number): SignalFn {
+ return ({ headers }) => {
+ const value = getHeader(headers, name)
+ if (!value) return null
+ return { evidence: `${name} header present: "${truncate(value, 80)}"`, weight }
+ }
+}
+
+function bodyClassContains(token: string, weight: number): SignalFn {
+ return ({ $ }) => {
+ const cls = $('body').attr('class') || ''
+ if (!cls.split(/\s+/).some((c) => c.includes(token))) return null
+ return { evidence: `body class contains "${token}"`, weight }
+ }
+}
+
+function elementExists(selector: string, weight: number, label?: string): SignalFn {
+ return ({ $ }) => {
+ if ($(selector).length === 0) return null
+ return { evidence: label ?? `element ${selector} present`, weight }
+ }
+}
+
+function attributeExists(selector: string, attr: string, weight: number): SignalFn {
+ return ({ $ }) => {
+ let hit: string | undefined
+ $(selector).each((_, el) => {
+ if (hit) return
+ const v = $(el).attr(attr)
+ if (typeof v === 'string') hit = v
+ })
+ if (hit === undefined) return null
+ return { evidence: `${selector}[${attr}] present`, weight }
+ }
+}
+
+function truncate(s: string, n: number): string {
+ return s.length > n ? s.slice(0, n - 1) + '…' : s
+}
+
+/* ── Platform definitions ─────────────────────────────────────────────── */
+
+const PLATFORMS: PlatformDef[] = [
+ /* === CMS ============================================================ */
+ {
+ id: 'wordpress',
+ name: 'WordPress',
+ category: 'cms',
+ signals: [
+ metaGenerator(/WordPress(?:\s+(\d+\.\d+(?:\.\d+)?))?/i, 5),
+ scriptSrcContains('/wp-includes/', 5),
+ scriptSrcContains('/wp-content/', 4),
+ linkHrefContains('/wp-content/', 3),
+ linkHrefContains('/wp-json/', 4),
+ htmlMatches(/wp-block-[a-z]/, 2, 'wp-block-* class found'),
+ headerExists('x-pingback', 3),
+ headerMatches('link', /wp-json/i, 4),
+ ],
+ },
+ {
+ id: 'drupal',
+ name: 'Drupal',
+ category: 'cms',
+ signals: [
+ metaGenerator(/Drupal(?:\s+(\d+))?/i, 5),
+ headerMatches('x-generator', /drupal/i, 5),
+ headerExists('x-drupal-cache', 4),
+ headerExists('x-drupal-dynamic-cache', 4),
+ htmlMatches(/sites\/(?:default|all)\//i, 3, '/sites/default/ or /sites/all/ path'),
+ ],
+ },
+ {
+ id: 'joomla',
+ name: 'Joomla',
+ category: 'cms',
+ signals: [
+ metaGenerator(/Joomla(?:!\s*-?\s*(\d+\.\d+))?/i, 5),
+ htmlMatches(/\/media\/jui\//i, 3, '/media/jui/ path'),
+ htmlMatches(/\/components\/com_/i, 3, '/components/com_* path'),
+ ],
+ },
+ {
+ id: 'ghost',
+ name: 'Ghost',
+ category: 'cms',
+ signals: [
+ metaGenerator(/Ghost(?:\s+(\d+\.\d+))?/i, 5),
+ linkHrefContains('/ghost/', 3),
+ scriptSrcContains('ghost.io', 3),
+ ],
+ },
+ {
+ id: 'hubspot',
+ name: 'HubSpot CMS',
+ category: 'cms',
+ signals: [
+ metaGenerator(/HubSpot/i, 5),
+ // tracking/forms scripts: low weight — many non-CMS sites embed HubSpot CRM tooling.
+ scriptSrcContains('js.hs-scripts.com', 2),
+ scriptSrcContains('js.hsforms.net', 1),
+ scriptSrcContains('hs-banner.com', 1),
+ scriptSrcContains('hs-analytics.net', 1),
+ ],
+ },
+ {
+ id: 'craft-cms',
+ name: 'Craft CMS',
+ category: 'cms',
+ signals: [
+ headerMatches('x-powered-by', /Craft\s*CMS/i, 5),
+ scriptSrcContains('cdn.craft.cm', 3),
+ ],
+ },
+ {
+ id: 'sanity',
+ name: 'Sanity',
+ category: 'cms',
+ signals: [
+ imgSrcContains('cdn.sanity.io', 4),
+ htmlMatches(/cdn\.sanity\.io/i, 3, 'cdn.sanity.io reference'),
+ ],
+ },
+ {
+ id: 'contentful',
+ name: 'Contentful',
+ category: 'cms',
+ signals: [
+ imgSrcContains('images.ctfassets.net', 4),
+ imgSrcContains('assets.ctfassets.net', 4),
+ htmlMatches(/(?:images|assets|videos)\.ctfassets\.net/i, 3, 'ctfassets.net reference'),
+ ],
+ },
+ {
+ id: 'notion',
+ name: 'Notion (notion.site)',
+ category: 'cms',
+ signals: [
+ htmlMatches(/notion\.site/i, 3, 'notion.site reference'),
+ metaGenerator(/Notion/i, 5),
+ htmlMatches(/__NOTION/, 3, '__NOTION marker'),
+ ],
+ },
+
+ /* === Site builders =================================================== */
+ {
+ id: 'wix',
+ name: 'Wix',
+ category: 'site-builder',
+ signals: [
+ metaGenerator(/Wix(?:\.com)?/i, 5),
+ scriptSrcContains('static.wixstatic.com', 5),
+ htmlMatches(/wix-warmup-data/i, 4, 'wix-warmup-data marker'),
+ htmlMatches(/parastorage\.com/i, 3, 'parastorage.com (Wix CDN)'),
+ ],
+ },
+ {
+ id: 'squarespace',
+ name: 'Squarespace',
+ category: 'site-builder',
+ signals: [
+ metaGenerator(/Squarespace/i, 5),
+ scriptSrcContains('static1.squarespace.com', 5),
+ htmlMatches(/Static\.SQUARESPACE_CONTEXT/, 5, 'Static.SQUARESPACE_CONTEXT global'),
+ headerMatches('x-served-by', /squarespace/i, 4),
+ ],
+ },
+ {
+ id: 'webflow',
+ name: 'Webflow',
+ category: 'site-builder',
+ signals: [
+ metaGenerator(/Webflow/i, 5),
+ attributeExists('html', 'data-wf-page', 5),
+ attributeExists('html', 'data-wf-site', 5),
+ scriptSrcContains('webflow.js', 4),
+ scriptSrcContains('webflow.com', 3),
+ imgSrcContains('uploads-ssl.webflow.com', 4),
+ imgSrcContains('cdn.prod.website-files.com', 4),
+ ],
+ },
+ {
+ id: 'framer',
+ name: 'Framer',
+ category: 'site-builder',
+ signals: [
+ metaGenerator(/Framer/i, 5),
+ htmlMatches(/framerusercontent\.com/i, 4, 'framerusercontent.com reference'),
+ htmlMatches(/framer\.com/i, 2, 'framer.com reference'),
+ ],
+ },
+ {
+ id: 'carrd',
+ name: 'Carrd',
+ category: 'site-builder',
+ signals: [
+ metaGenerator(/Carrd/i, 5),
+ htmlMatches(/carrd\.co/i, 3, 'carrd.co reference'),
+ ],
+ },
+ {
+ id: 'bubble',
+ name: 'Bubble',
+ category: 'site-builder',
+ signals: [
+ metaGenerator(/Bubble/i, 4),
+ scriptSrcContains('bubble.io', 4),
+ htmlMatches(/bubble-app/i, 3, 'bubble-app marker'),
+ ],
+ },
+
+ /* === E-commerce ====================================================== */
+ {
+ id: 'shopify',
+ name: 'Shopify',
+ category: 'ecommerce',
+ signals: [
+ headerExists('x-shopid', 5),
+ headerExists('x-shopify-stage', 5),
+ scriptSrcContains('cdn.shopify.com', 5),
+ htmlMatches(/Shopify\.theme\b/, 5, 'Shopify.theme global'),
+ htmlMatches(/Shopify\.shop\b/, 4, 'Shopify.shop global'),
+ headerMatches('powered-by', /Shopify/i, 4),
+ ],
+ },
+ {
+ id: 'woocommerce',
+ name: 'WooCommerce',
+ category: 'ecommerce',
+ signals: [
+ bodyClassContains('woocommerce', 4),
+ htmlMatches(/wp-content\/plugins\/woocommerce/i, 5, 'WooCommerce plugin path'),
+ metaGenerator(/WooCommerce(?:\s+(\d+\.\d+))?/i, 5),
+ ],
+ },
+ {
+ id: 'bigcommerce',
+ name: 'BigCommerce',
+ category: 'ecommerce',
+ signals: [
+ scriptSrcContains('cdn11.bigcommerce.com', 5),
+ headerMatches('x-bc-apigw-client-id', /./, 5),
+ htmlMatches(/bigcommerce\.com/i, 3, 'bigcommerce.com reference'),
+ ],
+ },
+ {
+ id: 'magento',
+ name: 'Magento (Adobe Commerce)',
+ category: 'ecommerce',
+ signals: [
+ htmlMatches(/Mage\.Cookies/, 5, 'Mage.Cookies marker'),
+ htmlMatches(/\/skin\/frontend\//i, 4, '/skin/frontend/ path'),
+ htmlMatches(/\/static\/version\d+\//i, 4, '/static/versionN/ path'),
+ headerMatches('x-magento-cache-debug', /./, 5),
+ ],
+ },
+ {
+ id: 'prestashop',
+ name: 'PrestaShop',
+ category: 'ecommerce',
+ signals: [
+ metaGenerator(/PrestaShop/i, 5),
+ headerMatches('powered-by', /PrestaShop/i, 5),
+ ],
+ },
+
+ /* === JS frameworks =================================================== */
+ {
+ id: 'nextjs',
+ name: 'Next.js',
+ category: 'framework',
+ signals: [
+ elementExists('script#__NEXT_DATA__', 5, '',
+ )
+ const result = detectPlatformFromInput({ html })
+ const wp = result.detected.find((p) => p.id === 'wordpress')
+ expect(wp, 'expected WordPress detection').toBeDefined()
+ expect(wp!.confidence).toBe('high')
+ expect(wp!.version).toBe('6.4.1')
+ expect(wp!.evidence.length).toBeGreaterThanOrEqual(2)
+ })
+
+ it('detects Drupal from x-generator header', () => {
+ const result = detectPlatformFromInput({
+ html: wrap(),
+ headers: { 'x-generator': 'Drupal 10 (https://www.drupal.org)' },
+ })
+ const drupal = result.detected.find((p) => p.id === 'drupal')
+ expect(drupal).toBeDefined()
+ expect(drupal!.confidence).toBe('high')
+ })
+
+ it('detects Ghost from generator', () => {
+ const html = wrap('')
+ const result = detectPlatformFromInput({ html })
+ const ghost = result.detected.find((p) => p.id === 'ghost')
+ expect(ghost).toBeDefined()
+ expect(ghost!.version).toBe('5.78')
+ })
+
+ it('detects HubSpot CMS at high confidence when meta generator declares it', () => {
+ const html = wrap('')
+ const result = detectPlatformFromInput({ html })
+ const hs = result.detected.find((p) => p.id === 'hubspot')
+ expect(hs).toBeDefined()
+ expect(hs!.confidence).toBe('high')
+ })
+
+ it('does not flag HubSpot CMS at high confidence from a tracking pixel alone', () => {
+ // js.hs-scripts.com is the HubSpot CRM tracking pixel — countless non-CMS sites embed it.
+ const html = wrap('')
+ const result = detectPlatformFromInput({ html })
+ const hs = result.detected.find((p) => p.id === 'hubspot')
+ expect(hs).toBeDefined()
+ expect(hs!.confidence).not.toBe('high')
+ })
+ })
+
+ describe('Site builder detection', () => {
+ it('detects Wix from generator + static.wixstatic.com', () => {
+ const html = wrap(
+ '' +
+ '',
+ )
+ const result = detectPlatformFromInput({ html })
+ const wix = result.detected.find((p) => p.id === 'wix')
+ expect(wix).toBeDefined()
+ expect(wix!.confidence).toBe('high')
+ })
+
+ it('detects Squarespace from Static.SQUARESPACE_CONTEXT global', () => {
+ const html = wrap('', '')
+ const result = detectPlatformFromInput({ html })
+ const sq = result.detected.find((p) => p.id === 'squarespace')
+ expect(sq).toBeDefined()
+ expect(sq!.confidence).toBe('high')
+ })
+
+ it('detects Webflow from data-wf-page attribute on ', () => {
+ const html = wrap('', '', 'data-wf-page="abc123" data-wf-site="xyz"')
+ const result = detectPlatformFromInput({ html })
+ const wf = result.detected.find((p) => p.id === 'webflow')
+ expect(wf).toBeDefined()
+ expect(wf!.confidence).toBe('high')
+ })
+
+ it('detects Framer from generator + framerusercontent', () => {
+ const html = wrap(
+ '' +
+ '',
+ )
+ const result = detectPlatformFromInput({ html })
+ const framer = result.detected.find((p) => p.id === 'framer')
+ expect(framer).toBeDefined()
+ })
+ })
+
+ describe('E-commerce detection', () => {
+ it('detects Shopify from headers + theme global', () => {
+ const html = wrap('', '')
+ const result = detectPlatformFromInput({
+ html,
+ headers: { 'x-shopid': '12345' },
+ })
+ const shop = result.detected.find((p) => p.id === 'shopify')
+ expect(shop).toBeDefined()
+ expect(shop!.confidence).toBe('high')
+ })
+
+ it('detects WooCommerce as a layered signal on top of WordPress', () => {
+ const html = wrap(
+ '',
+ '',
+ '',
+ ).replace(
+ '',
+ '',
+ )
+ const result = detectPlatformFromInput({ html })
+ expect(result.detected.find((p) => p.id === 'wordpress')).toBeDefined()
+ expect(result.detected.find((p) => p.id === 'woocommerce')).toBeDefined()
+ })
+
+ it('detects BigCommerce from cdn11.bigcommerce.com', () => {
+ const html = wrap('')
+ const result = detectPlatformFromInput({ html })
+ expect(result.detected.find((p) => p.id === 'bigcommerce')).toBeDefined()
+ })
+ })
+
+ describe('JS framework detection', () => {
+ it('detects Next.js from __NEXT_DATA__ script', () => {
+ const html = wrap(
+ '',
+ '',
+ )
+ const result = detectPlatformFromInput({ html })
+ const next = result.detected.find((p) => p.id === 'nextjs')
+ expect(next).toBeDefined()
+ expect(next!.confidence).toBe('high')
+ })
+
+ it('detects Nuxt from window.__NUXT__ + /_nuxt/', () => {
+ const html = wrap(
+ '',
+ '',
+ )
+ const result = detectPlatformFromInput({ html })
+ expect(result.detected.find((p) => p.id === 'nuxt')).toBeDefined()
+ })
+
+ it('detects Gatsby from div#___gatsby', () => {
+ const html = wrap('', '')
+ const result = detectPlatformFromInput({ html })
+ expect(result.detected.find((p) => p.id === 'gatsby')).toBeDefined()
+ })
+
+ it('detects Astro from generator + data-astro-cid attribute', () => {
+ const html = wrap('', 'x
')
+ const result = detectPlatformFromInput({ html })
+ const astro = result.detected.find((p) => p.id === 'astro')
+ expect(astro).toBeDefined()
+ expect(astro!.version).toBe('4.5.2')
+ })
+
+ it('detects Angular from ng-version attribute and captures the version', () => {
+ const html = wrap('', '')
+ const result = detectPlatformFromInput({ html })
+ const ng = result.detected.find((p) => p.id === 'angular')
+ expect(ng).toBeDefined()
+ expect(ng!.confidence).toBe('high')
+ expect(ng!.version).toBe('17.1.2')
+ })
+
+ it('detects Remix from window.__remixContext', () => {
+ const html = wrap('', '')
+ const result = detectPlatformFromInput({ html })
+ expect(result.detected.find((p) => p.id === 'remix')).toBeDefined()
+ })
+ })
+
+ describe('Static site generator detection', () => {
+ it('detects Hugo from generator', () => {
+ const html = wrap('')
+ const result = detectPlatformFromInput({ html })
+ const hugo = result.detected.find((p) => p.id === 'hugo')
+ expect(hugo).toBeDefined()
+ expect(hugo!.version).toBe('0.121.1')
+ })
+
+ it('detects Jekyll from generator', () => {
+ const html = wrap('')
+ const result = detectPlatformFromInput({ html })
+ const j = result.detected.find((p) => p.id === 'jekyll')
+ expect(j).toBeDefined()
+ expect(j!.version).toBe('4.3.2')
+ })
+
+ it('detects Docusaurus from __docusaurus marker', () => {
+ const html = wrap('')
+ const result = detectPlatformFromInput({ html })
+ expect(result.detected.find((p) => p.id === 'docusaurus')).toBeDefined()
+ })
+ })
+
+ describe('Hosting detection', () => {
+ it('detects Vercel from x-vercel-id header', () => {
+ const result = detectPlatformFromInput({
+ html: wrap(),
+ headers: { 'x-vercel-id': 'sfo1::abc123', server: 'Vercel' },
+ })
+ const v = result.detected.find((p) => p.id === 'vercel')
+ expect(v).toBeDefined()
+ expect(v!.confidence).toBe('high')
+ })
+
+ it('detects Netlify from x-nf-request-id header', () => {
+ const result = detectPlatformFromInput({
+ html: wrap(),
+ headers: { 'x-nf-request-id': 'abc-123', server: 'Netlify' },
+ })
+ expect(result.detected.find((p) => p.id === 'netlify')).toBeDefined()
+ })
+
+ it('detects Cloudflare from cf-ray header', () => {
+ const result = detectPlatformFromInput({
+ html: wrap(),
+ headers: { server: 'cloudflare', 'cf-ray': '8a1b2c3d-LAX' },
+ })
+ expect(result.detected.find((p) => p.id === 'cloudflare')).toBeDefined()
+ })
+
+ it('detects GitHub Pages from server header', () => {
+ const result = detectPlatformFromInput({
+ html: wrap(),
+ headers: { server: 'GitHub.com', 'x-github-request-id': 'abc' },
+ })
+ expect(result.detected.find((p) => p.id === 'github-pages')).toBeDefined()
+ })
+ })
+
+ describe('Custom site behavior', () => {
+ it('flags isCustom=true when only framework + hosting are detected', () => {
+ const html = wrap(
+ '',
+ '',
+ )
+ const result = detectPlatformFromInput({
+ html,
+ headers: { 'x-vercel-id': 'sfo1::abc' },
+ })
+ expect(result.isCustom).toBe(true)
+ expect(result.detected.find((p) => p.id === 'nextjs')).toBeDefined()
+ expect(result.detected.find((p) => p.id === 'vercel')).toBeDefined()
+ })
+
+ it('flags isCustom=false when a CMS is detected', () => {
+ const html = wrap(
+ '' +
+ '',
+ )
+ const result = detectPlatformFromInput({ html })
+ expect(result.isCustom).toBe(false)
+ })
+
+ it('returns no detections and isCustom=true for a fully blank page', () => {
+ const result = detectPlatformFromInput({
+ html: 'xhi',
+ })
+ expect(result.detected).toHaveLength(0)
+ expect(result.isCustom).toBe(true)
+ expect(result.rawSignals.generator).toBeNull()
+ })
+
+ it('captures raw signals even when no platform matches', () => {
+ const html = wrap('')
+ const result = detectPlatformFromInput({
+ html,
+ headers: { 'x-powered-by': 'PHP/8.2', server: 'nginx' },
+ })
+ expect(result.rawSignals.generator).toBe('MysteryStack 1.0')
+ expect(result.rawSignals.xPoweredBy).toBe('PHP/8.2')
+ expect(result.rawSignals.server).toBe('nginx')
+ })
+ })
+
+ describe('Confidence + ranking', () => {
+ it('sorts detections by confidence score (highest first)', () => {
+ const html = wrap(
+ '' +
+ '',
+ '',
+ )
+ const result = detectPlatformFromInput({ html })
+ const wpIdx = result.detected.findIndex((p) => p.id === 'wordpress')
+ const reactIdx = result.detected.findIndex((p) => p.id === 'react')
+ expect(wpIdx).toBeGreaterThanOrEqual(0)
+ // WordPress should rank above generic React if both detected.
+ if (reactIdx >= 0) expect(wpIdx).toBeLessThan(reactIdx)
+ })
+
+ it('respects minConfidence=high — drops weaker matches', () => {
+ const html = wrap('', '') // weak React signal only
+ const all = detectPlatformFromInput({ html })
+ const high = detectPlatformFromInput({ html }, { minConfidence: 'high' })
+ // React alone would be at most ~medium; high filter should drop it.
+ expect(high.detected.length).toBeLessThanOrEqual(all.detected.length)
+ expect(high.detected.find((p) => p.id === 'react')).toBeUndefined()
+ })
+ })
+})