feat(traffic): scaffold Cloudflare Worker traffic source (foundation)#635

Draft
arberx wants to merge 1 commit into main from arberx/server-log-integrations

arberx commented May 27, 2026

Summary

First slice of the Cloudflare adapter — contracts, DB schema, and the push-receive integration package. The HTTP route + CLI + doctor checks land in a follow-up PR.

Design doc: plans/cloudflare-worker-traffic-source.md

Why Worker push instead of GraphQL Analytics or Logpush

  • GraphQL Analytics API is aggregate-only even on the AI Crawl Control dataset — no raw request rows.
  • Logpush gives raw rows but is Business plan or higher only.
  • Workers give universal raw-row access on every plan (including free).

This also unlocks the future "Cloudflare-as-proxy" story for hosts with no native logs (Shopify, Webflow, Ghost, etc.): once a customer has canonry's Worker on their zone, that zone is a fully ingestible traffic source regardless of where the site is actually hosted.

Why push-receive

This is the first push-receive traffic source — every existing adapter (cloud-run, vercel, wordpress) pulls. The principle in plans/server-side-ai-traffic-ingestion.md (no canonry-hosted endpoint in the hot path) is preserved because canonry is single-tenant per deployment: the Worker only ever talks to the operator's own canonry instance, never to a canonry-hosted SaaS relay.

What's in this PR

  • Zod schemas (packages/contracts/src/traffic.ts):
    • cloudflareWorkerSourceConfigSchema
    • trafficConnectCloudflareRequestSchema / trafficConnectCloudflareResponseSchema
    • cloudflareWorkerEventSchema / cloudflareWorkerIngestRequestSchema
  • New package @ainyc/canonry-integration-cloudflare-worker:
    • generateWorkerScript — produces the JS string with embedded source-id, bearer, HMAC secret, version, and bot keyword constants
    • generateWranglerToml — companion wrangler.toml for operators who prefer wrangler deploy
    • verifyRequestSignature — HMAC-SHA256 + ±300s timestamp window, constant-time comparison
    • normalizeCloudflareWorkerEvent — Worker event → provider-neutral NormalizedTrafficRequest
    • DEFAULT_BOT_LIST — versioned edge-side keyword set
  • Schema (packages/db/src/schema.ts + migration v67):
    • traffic_sources.ingest_token_hash (sha256 of the per-source bearer)
    • traffic_sources.last_worker_version (drives the future cloudflare.worker.version-stale doctor check)
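As a rough illustration of the checks listed for verifyRequestSignature (HMAC-SHA256, a ±300s timestamp window, constant-time comparison), here is a minimal sketch using Node's built-in crypto module. It is not the package's implementation. The names `signSketch` and `verifySketch`, the `${timestamp}.${body}` signing format, and the Unix-seconds timestamp are assumptions made for the example; only the hex signature encoding and the 300-second default come from the PR text.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the signed payload is `${timestamp}.${body}`. The PR does not
// state the real format; it only implies both values are covered.
function signSketch(secret: string, timestamp: string, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

interface VerifyInput {
  secret: string;
  timestamp: string; // assumed: Unix seconds as a decimal string
  body: string;
  signatureHex: string;
  nowSeconds?: number; // injectable clock, for tests
  windowSeconds?: number; // defaults to 300
}

function verifySketch(input: VerifyInput): boolean {
  const windowSeconds = input.windowSeconds ?? 300;
  const now = input.nowSeconds ?? Math.floor(Date.now() / 1000);

  // Reject empty and non-numeric timestamps before doing any crypto.
  if (!/^\d+$/.test(input.timestamp)) return false;

  // Reject timestamps too far in the past or the future.
  if (Math.abs(now - Number(input.timestamp)) > windowSeconds) return false;

  // A SHA-256 digest is 32 bytes, so a valid signature is exactly 64 hex chars.
  if (!/^[0-9a-f]{64}$/i.test(input.signatureHex)) return false;

  const expected = Buffer.from(signSketch(input.secret, input.timestamp, input.body), "hex");
  const received = Buffer.from(input.signatureHex, "hex");

  // Both buffers are 32 bytes here, so timingSafeEqual cannot throw on length.
  return timingSafeEqual(expected, received);
}
```

Validating the hex shape up front keeps the constant-time comparison from ever seeing buffers of different lengths, which is the case where `timingSafeEqual` throws.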

Two-tier filtering

Edge-side filter is generic and stable — broad UA/referer keywords + Cloudflare bot signals — so the Worker only needs redeploys when the category of signal changes, not when individual bot names are added to canonry's list. The strict bot/operator classification stays server-side in packages/integration-traffic.
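To make the edge-side tier concrete, the following sketch shows what a broad "forward or drop" decision could look like. It is illustrative only: the keyword list, referer suffix, threshold value, and the `EdgeSignals` shape are all hypothetical and are not the contents of DEFAULT_BOT_LIST or the generated Worker. It also assumes that a lower bot score means more bot-like traffic.

```typescript
// Hypothetical signal shape; field names are not taken from the PR.
interface EdgeSignals {
  userAgent: string | null;
  referer: string | null;
  botScore: number | null; // assumed: lower means more likely automated
}

// Placeholder values for illustration only.
const UA_KEYWORDS = ["bot", "crawler", "spider"];
const REFERER_SUFFIXES = ["example-ai.test"];
const SCORE_THRESHOLD = 30;

// Extract a lowercase hostname from a referer, or null if it is absent or unparsable.
function refererHost(referer: string | null): string | null {
  if (!referer) return null;
  try {
    return new URL(referer).hostname.toLowerCase();
  } catch {
    return null;
  }
}

// Broad pre-filter: forward anything that might be interesting and leave
// the strict bot/operator classification to the server side.
function shouldForward(signals: EdgeSignals): boolean {
  const ua = (signals.userAgent ?? "").toLowerCase();
  if (UA_KEYWORDS.some((keyword) => ua.includes(keyword))) return true;

  const host = refererHost(signals.referer);
  if (host !== null && REFERER_SUFFIXES.some((s) => host === s || host.endsWith(`.${s}`))) {
    return true;
  }

  return signals.botScore !== null && signals.botScore < SCORE_THRESHOLD;
}
```

Because the edge check matches categories of signal rather than individual bot names, a filter of this shape would over-forward by design and rely on the server-side classifier to discard false positives.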

Secrets

  • Bearer token: hashed in DB (ingest_token_hash), cleartext in ~/.canonry/config.yaml, embedded in the Worker script at generation time
  • HMAC secret: never in DB, cleartext in ~/.canonry/config.yaml, embedded in the Worker script
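The bearer-token handling described above (only a SHA-256 hash stored in ingest_token_hash) can be sketched with Node's crypto module as follows. The function names `hashToken` and `tokenMatches` are hypothetical, and the hex encoding of the stored hash is an assumption; the PR only says the column holds the sha256 of the per-source bearer.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hash a cleartext bearer token; the hex result is what would be persisted.
// Assumption: the stored value is hex-encoded.
function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Compare a presented token against the stored hash without ever
// needing the cleartext token on the database side.
function tokenMatches(presentedToken: string, storedHashHex: string): boolean {
  const presented = Buffer.from(hashToken(presentedToken), "hex");
  const stored = Buffer.from(storedHashHex, "hex");
  // timingSafeEqual throws on unequal lengths, so guard first.
  return presented.length === stored.length && timingSafeEqual(presented, stored);
}
```

With this arrangement a database leak exposes only hashes, while the cleartext token lives in the operator's config file and the deployed Worker script.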

Test plan

Written test-first (TDD: red → green for every unit). All workspace tests pass (3381/3381).

  • Contract schemas — 26 cases (happy path, every nullable field individually missing, schemaVersion literal mismatch, events-array boundaries 0/1/100/101)
  • Signature verifier — 12 cases (valid sig, mutated body, mutated timestamp, wrong secret, expired window, future window, non-numeric timestamp, empty timestamp, malformed hex, wrong byte length, empty body, default window)
  • Normalizer — 15 cases (full event, missing host, missing queryString, IP preserved for verification, cf=null, cf properties individually null, missing path/observedAt/eventId all return null, eventId namespacing, providerResource type, request size/latency intentionally null)
  • Worker script generator — 13 cases (every embedded constant lands, all bot keywords baked in, all referer suffixes baked in, bot list version reachable, waitUntil used, all 4 documented headers emitted, HMAC-SHA256 via SubtleCrypto, POST method, JS parses, custom score threshold, default bot list shape)
  • DB round-trip — 3 cases (cloudflare source persists both columns, pull adapters leave them NULL, v67 is the latest migration)
  • DB ↔ DTO coverage — new columns marked internal with rationale
  • Workspace typecheck clean
  • Workspace lint clean (only pre-existing apps/web warnings)
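The normalizer cases in the test plan (required fields returning null, optional fields tolerated, eventId namespacing, size and latency left null) suggest behavior along the lines of the sketch below. Every type and field name here is hypothetical, as is the `cloudflare-worker:<sourceId>:<eventId>` namespacing format; the real cloudflareWorkerEventSchema and NormalizedTrafficRequest shapes are not shown in this PR description.

```typescript
// Hypothetical input shape; the real per-event schema is defined in packages/contracts.
interface WorkerEventSketch {
  eventId?: string;
  observedAt?: string;
  path?: string;
  host?: string | null;
  queryString?: string | null;
  userAgent?: string | null;
}

// Hypothetical provider-neutral output shape.
interface NormalizedSketch {
  id: string;
  observedAt: string;
  host: string | null;
  path: string;
  queryString: string | null;
  userAgent: string | null;
  requestSizeBytes: null; // intentionally null in this sketch
  latencyMs: null; // intentionally null in this sketch
}

function normalizeSketch(sourceId: string, event: WorkerEventSketch): NormalizedSketch | null {
  // Without an id, a timestamp, and a path the row cannot be stored or deduplicated.
  if (!event.eventId || !event.observedAt || !event.path) return null;

  return {
    // Assumed namespacing: prefix with provider and source so ids from
    // different sources cannot collide.
    id: `cloudflare-worker:${sourceId}:${event.eventId}`,
    observedAt: event.observedAt,
    host: event.host ?? null,
    path: event.path,
    queryString: event.queryString ?? null,
    userAgent: event.userAgent ?? null,
    requestSizeBytes: null,
    latencyMs: null,
  };
}
```

Returning null rather than throwing lets a caller drop individual bad events from a batch without rejecting the rest.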

Out of scope (next PR)

  • API routes: POST /traffic/connect/cloudflare, POST /traffic/cloudflare/ingest, POST /traffic/cloudflare/rotate/:sourceId
  • CLI: canonry traffic connect cloudflare, traffic rotate cloudflare, traffic verify cloudflare
  • Doctor checks: cloudflare.worker.last-seen, cloudflare.worker.version-stale, cloudflare.worker.signature-failures
  • Dashboard connect-modal entry
  • Phase 2: auto-deploy via Cloudflare API token, Cloudflare-as-proxy zone provisioning, Logpush sibling adapter

🤖 Generated with Claude Code

First slice of the Cloudflare adapter — contracts, schema, and the
push-receive integration package. The HTTP route + CLI + doctor checks
land in a follow-up PR.

Why a Worker instead of GraphQL Analytics or Logpush: Cloudflare's
GraphQL API is aggregate-only and Logpush is Business+ only, so the
Worker is the universal raw-row access path. Also unblocks future
"Cloudflare-as-proxy" support for hosts with no native logs.

This is the first push-receive traffic source — every existing adapter
pulls. Safe because canonry is single-tenant per deployment; the Worker
only ever talks to the operator's own canonry instance.

Includes:
- Zod schemas for the source config, connect request/response, ingest
  payload, and the per-event shape
- integration-cloudflare-worker package with HMAC-SHA256 signature
  verifier, event → NormalizedTrafficRequest normalizer, and Worker
  script generator (broad edge-side bot/referer filter; strict
  classification stays server-side in integration-traffic)
- traffic_sources columns ingest_token_hash + last_worker_version
  (migration v67)
- plans/cloudflare-worker-traffic-source.md design doc
- Tests written first (TDD): 43 tests in the new package, 26 contract
  schema cases, 3 DB column round-trip tests

Full workspace test passes (3381/3381).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>