
feat(traffic): add Cloud Run puller foundation#386

Merged
arberx merged 6 commits into main from ax/cloud-run-traffic-puller May 7, 2026

@arberx (Member) commented May 1, 2026

Summary

Starts the server-side traffic ingestion stack with the provider-neutral model and Cloud Run / Cloud Logging adapter foundation.

This PR intentionally does not add public API, CLI, MCP, or dashboard traffic surfaces yet. The new code is a reusable integration package and shared contract layer so the next stacked PR can add persistence and public reads/writes in one complete API/CLI slice.

What changed

  • Added traffic source/evidence contract constants and NormalizedTrafficRequest in @ainyc/canonry-contracts.
  • Added @ainyc/canonry-integration-cloud-run with:
    • Cloud Logging filter construction for cloud_run_revision;
    • optional service, location, timestamp, request URL, and user-agent narrowing;
    • entries.list pull support with page tokens;
    • normalization from LogEntry.httpRequest into Canonry request evidence.
  • Added @ainyc/canonry-integration-traffic with local, provider-neutral crawler/referrer classification and hourly rollups over normalized request events.
  • Added scripts/test-cloud-run-traffic-pull.ts plus a fixture so we can test pull -> normalize -> ingest -> analyze before wiring Canonry DB/API/CLI surfaces.
  • Added a provider source model review documenting raw-event vs aggregate-bucket adapters across Cloud Run, WordPress, Cloudflare, and Vercel.
  • Updated the server-side ingestion plan to point at the source model review.
  • Bumped root and package versions to 3.3.0.
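To illustrate the pull -> normalize -> analyze flow described in the list above, here is a minimal sketch of normalizing Cloud Logging entries and rolling them up per hour. The HttpRequest fields (requestMethod, requestUrl, status, userAgent, referer) follow Cloud Logging's LogEntry.httpRequest JSON shape; everything else (NormalizedRequest, normalizeEntry, classifyCrawlerSketch, hourlyRollup, and the crawler token list) is hypothetical and is not the actual NormalizedTrafficRequest contract or the classifier in @ainyc/canonry-integration-traffic.

```typescript
// Subset of Cloud Logging's LogEntry / HttpRequest JSON shape.
interface HttpRequest {
  requestMethod?: string;
  requestUrl?: string;
  status?: number;
  userAgent?: string;
  referer?: string;
}

interface LogEntry {
  timestamp?: string;
  httpRequest?: HttpRequest;
}

// Hypothetical normalized shape; the real NormalizedTrafficRequest
// in @ainyc/canonry-contracts may use different fields.
interface NormalizedRequest {
  observedAt: Date;
  method: string;
  path: string;
  status: number | null;
  userAgent: string;
  referer: string | null;
  utmSource: string | null;
}

function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null; // relative or malformed URL
  }
}

// Returns null for entries that carry no usable HTTP request data.
function normalizeEntry(entry: LogEntry): NormalizedRequest | null {
  const req = entry.httpRequest;
  if (!req?.requestUrl || !entry.timestamp) return null;
  const url = parseUrl(req.requestUrl);
  if (!url) return null;
  return {
    observedAt: new Date(entry.timestamp),
    method: req.requestMethod ?? "GET",
    path: url.pathname,
    status: req.status ?? null,
    userAgent: req.userAgent ?? "",
    referer: req.referer ?? null,
    utmSource: url.searchParams.get("utm_source"),
  };
}

// Illustrative crawler tokens only, not the package's rule set.
const CRAWLER_PATTERNS: Array<[string, RegExp]> = [
  ["GPTBot", /GPTBot/i],
  ["OAI-SearchBot", /OAI-SearchBot/i],
  ["ClaudeBot", /ClaudeBot/i],
  ["PerplexityBot", /PerplexityBot/i],
];

function classifyCrawlerSketch(userAgent: string): string | null {
  for (const [name, pattern] of CRAWLER_PATTERNS) {
    if (pattern.test(userAgent)) return name;
  }
  return null;
}

const HOUR_MS = 3_600_000;

// Counts events per (UTC hour start, crawler label) bucket.
function hourlyRollup(events: NormalizedRequest[]): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const event of events) {
    const hourStart = new Date(Math.floor(event.observedAt.getTime() / HOUR_MS) * HOUR_MS);
    const label = classifyCrawlerSketch(event.userAgent) ?? "unclassified";
    const key = `${hourStart.toISOString()}|${label}`;
    buckets.set(key, (buckets.get(key) ?? 0) + 1);
  }
  return buckets;
}
```

Keeping utm_source on the normalized event (rather than discarding the query string) matters because some AI clicks carry no Referer at all.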

Local probe

Fixture mode:

pnpm tsx scripts/test-cloud-run-traffic-pull.ts \
  --fixture scripts/fixtures/cloud-run-traffic-sample.json

Real Cloud Run logs:

pnpm tsx scripts/test-cloud-run-traffic-pull.ts \
  --gcp-project <gcp-project-id> \
  --service <cloud-run-service> \
  --location <region> \
  --since 6h \
  --url-contains ainyc.ai \
  --use-gcloud \
  --out .tmp/cloud-run-traffic-report.json

Use --narrow-bots only when testing crawler detection specifically; it lowers Cloud Logging volume but misses human AI referrals.
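To make the --narrow-bots trade-off concrete, here is a minimal sketch of assembling a Cloud Logging filter for cloud_run_revision from the optional narrowing inputs. The clause syntax (resource.labels.service_name, resource.labels.location, the `:` "has" operator, the `=~` regex operator) is standard Logging query language, but CloudRunFilterOptions, quoteLiteral, and buildCloudRunFilter are hypothetical names, and the clauses the real adapter emits may differ.

```typescript
// Hypothetical option names; the real adapter may expose different inputs.
interface CloudRunFilterOptions {
  service?: string;
  location?: string;
  since?: Date;
  urlContains?: string;
  userAgentPattern?: string; // set only when narrowing to bots
}

// Escape backslashes and double quotes for a Logging query string literal.
function quoteLiteral(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

function buildCloudRunFilter(opts: CloudRunFilterOptions): string {
  const clauses = ['resource.type="cloud_run_revision"'];
  if (opts.service) clauses.push(`resource.labels.service_name=${quoteLiteral(opts.service)}`);
  if (opts.location) clauses.push(`resource.labels.location=${quoteLiteral(opts.location)}`);
  if (opts.since) clauses.push(`timestamp>=${quoteLiteral(opts.since.toISOString())}`);
  // ":" is the "has" operator: a substring match on the request URL.
  if (opts.urlContains) clauses.push(`httpRequest.requestUrl:${quoteLiteral(opts.urlContains)}`);
  // "=~" is a regular-expression match on the user agent.
  if (opts.userAgentPattern) {
    clauses.push(`httpRequest.userAgent=~${quoteLiteral(opts.userAgentPattern)}`);
  }
  return clauses.join(" AND ");
}
```

A human who clicks through from ChatGPT arrives with an ordinary browser user agent, so a clause like `httpRequest.userAgent=~"GPTBot|ClaudeBot"` excludes that request server-side before it is ever pulled. That is why bot narrowing lowers volume but loses the referral signal.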

Validation

  • pnpm run typecheck
  • pnpm run lint
  • pnpm -r --no-bail run test
  • pnpm --filter @ainyc/canonry-integration-cloud-run test -- --runInBand
  • pnpm --filter @ainyc/canonry-integration-cloud-run typecheck
  • pnpm exec eslint scripts/test-cloud-run-traffic-pull.ts packages/integration-cloud-run/src packages/integration-cloud-run/test
  • pnpm exec tsc --noEmit --target ES2022 --lib ES2022,DOM,DOM.Iterable --module NodeNext --moduleResolution NodeNext --types node --skipLibCheck scripts/test-cloud-run-traffic-pull.ts
  • pnpm tsx scripts/test-cloud-run-traffic-pull.ts --gcp-project openclaw-nyc --service openclaw-nyc --location us-east1 --since 6h --url-contains ainyc.ai --use-gcloud --page-size 1000 --max-pages 3 --out .tmp/ainyc-cloud-run-traffic-report.json
  • pnpm tsx scripts/test-cloud-run-traffic-pull.ts --gcp-project openclaw-nyc --service openclaw-nyc --location us-east1 --since 6h --url-contains ainyc.ai --narrow-bots --use-gcloud --page-size 1000 --max-pages 3 --out .tmp/ainyc-cloud-run-traffic-bots-report.json
  • commit hook reran pnpm -r run lint
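The --page-size and --max-pages flags in the live probes above imply a capped pagination loop over entries.list. Here is a minimal sketch, assuming an injected page fetcher so the same loop can replay fixture pages. The request fields (resourceNames, filter, orderBy, pageSize, pageToken) and response fields (entries, nextPageToken) are those of the Cloud Logging v2 entries.list method. PageFetcher, PullResult, and pullAllEntries are hypothetical names. A live fetcher would POST the same request body to the entries:list REST endpoint.

```typescript
// Request/response fields as used by the Cloud Logging v2 entries.list method.
interface ListEntriesRequest {
  resourceNames: string[];
  filter: string;
  orderBy?: string;
  pageSize?: number;
  pageToken?: string;
}

interface ListEntriesResponse<T> {
  entries?: T[];
  nextPageToken?: string;
}

// Hypothetical injection point: a live implementation calls the API,
// a test implementation replays fixture pages.
type PageFetcher<T> = (req: ListEntriesRequest) => Promise<ListEntriesResponse<T>>;

interface PullResult<T> {
  entries: T[];
  pages: number;
  truncated: boolean; // true when maxPages stopped the loop before the last page
}

async function pullAllEntries<T>(
  fetchPage: PageFetcher<T>,
  base: Omit<ListEntriesRequest, "pageToken">,
  maxPages: number,
): Promise<PullResult<T>> {
  const entries: T[] = [];
  let pageToken: string | undefined;
  let pages = 0;
  do {
    const res = await fetchPage({ ...base, pageToken });
    entries.push(...(res.entries ?? []));
    // An empty or missing nextPageToken means there are no more results.
    pageToken = res.nextPageToken || undefined;
    pages += 1;
  } while (pageToken !== undefined && pages < maxPages);
  return { entries, pages, truncated: pageToken !== undefined };
}
```

Reporting `truncated` separately keeps a capped run (for example `--max-pages 3`) from being mistaken for a complete window.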

@arberx force-pushed the ax/review-ai-traffic-plan branch from 277dd17 to c14bf91 May 1, 2026 02:02
Base automatically changed from ax/review-ai-traffic-plan to main May 1, 2026 02:04
@arberx force-pushed the ax/cloud-run-traffic-puller branch 3 times, most recently from e91e632 to 45f073c May 7, 2026 19:11
@arberx marked this pull request as ready for review May 7, 2026 21:03
arberx added 6 commits May 7, 2026 17:04
Required by scripts/check-docs.sh — every package under packages/
must have AGENTS.md and CLAUDE.md.

Local probe that pulls Cloud Run request logs and GA4 AI-referral rows
over the same window and surfaces the gap: per-AI-surface comparison
(CR referer hits vs GA sessions), path-level join with crawled/clicked
verdicts, and crawler-only summary. Reuses the existing Cloud Run
puller, traffic classifier, and GA4 client; supports live (canonry
project lookup or manual service-account JSON) and offline fixture
modes for the replay loop before persistence/API/CLI surfaces land.
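The path-level join with crawled/clicked verdicts is described only at a high level here. A minimal sketch, assuming per-path counts have already been derived from the Cloud Run events and the GA4 rows, could be a full outer join on path. joinByPath, PathRow, verdictFor, and the verdict strings are hypothetical; the probe's actual verdict rules may differ.

```typescript
// Hypothetical verdict labels for illustration.
type Verdict = "crawled+clicked" | "crawled-only" | "clicked-only" | "no-activity";

interface PathRow {
  path: string;
  crawlerHits: number; // Cloud Run requests from AI crawler user agents
  aiClicks: number;    // Cloud Run requests classified as AI referrals
  gaSessions: number;  // GA4 AI-referral sessions for the same path
  verdict: Verdict;
}

function verdictFor(crawlerHits: number, aiClicks: number, gaSessions: number): Verdict {
  const crawled = crawlerHits > 0;
  const clicked = aiClicks > 0 || gaSessions > 0;
  if (crawled && clicked) return "crawled+clicked";
  if (crawled) return "crawled-only";
  if (clicked) return "clicked-only";
  return "no-activity";
}

// Full outer join on path across the three per-path count maps.
function joinByPath(
  crawlerHits: Map<string, number>,
  aiClicks: Map<string, number>,
  gaSessions: Map<string, number>,
): PathRow[] {
  const paths = new Set<string>(
    Array.from(crawlerHits.keys()).concat(Array.from(aiClicks.keys()), Array.from(gaSessions.keys())),
  );
  return Array.from(paths)
    .sort()
    .map((path) => {
      const crawler = crawlerHits.get(path) ?? 0;
      const clicks = aiClicks.get(path) ?? 0;
      const sessions = gaSessions.get(path) ?? 0;
      return {
        path,
        crawlerHits: crawler,
        aiClicks: clicks,
        gaSessions: sessions,
        verdict: verdictFor(crawler, clicks, sessions),
      };
    });
}
```

An outer join (rather than joining on GA paths only) keeps crawler-only paths visible, which is where server-side logs add information GA4 cannot provide.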
The hand-rolled per-event matching only checked the Referer header,
so UTM-only AI clicks (e.g. links from the ChatGPT app, which strips the
referer but tags ?utm_source=chatgpt.com) showed up in the probe totals but
were missed by the source comparison and path join. On a 24h ainyc.ai
pull, both ChatGPT clicks were UTM-only, so the comparison reported
0 Cloud Run hits vs 1 GA session, a gap in the wrong direction.

Switch to classifyAiReferral / classifyCrawler from
@ainyc/canonry-integration-traffic, which already handles both referer
and UTM evidence; the path join now agrees with the probe totals. Source
comparison is now grouped by product (ChatGPT, Copilot, …) instead
of by rule so the chatgpt.com / chat.openai.com double-row goes away,
and the row exposes a referer/utm breakdown so it's obvious which
evidence channel produced the click.
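The fix above (check the Referer first, fall back to utm_source, then group by product with a referer/utm breakdown) can be sketched roughly as follows. This is not the real classifyAiReferral: AI_SOURCES, classifyAiReferralSketch, and groupByProduct are hypothetical, and the host table is illustrative only.

```typescript
// Illustrative host/utm_source table only; not the rule set shipped in
// @ainyc/canonry-integration-traffic. A Map avoids prototype-key surprises.
const AI_SOURCES = new Map<string, string>([
  ["chatgpt.com", "ChatGPT"],
  ["chat.openai.com", "ChatGPT"],
  ["copilot.microsoft.com", "Copilot"],
  ["perplexity.ai", "Perplexity"],
  ["www.perplexity.ai", "Perplexity"],
]);

type Evidence = "referer" | "utm";

interface AiReferral {
  product: string;
  evidence: Evidence;
}

function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null;
  }
}

// Hypothetical classifier: the Referer host wins; utm_source is the fallback
// for clients (such as the ChatGPT app) that strip the Referer header.
function classifyAiReferralSketch(referer: string | null, requestUrl: string): AiReferral | null {
  const refHost = referer ? parseUrl(referer)?.hostname.toLowerCase() : undefined;
  const fromReferer = refHost ? AI_SOURCES.get(refHost) : undefined;
  if (fromReferer) return { product: fromReferer, evidence: "referer" };

  const utm = parseUrl(requestUrl)?.searchParams.get("utm_source")?.toLowerCase();
  const fromUtm = utm ? AI_SOURCES.get(utm) : undefined;
  return fromUtm ? { product: fromUtm, evidence: "utm" } : null;
}

interface ProductRow {
  referer: number;
  utm: number;
  total: number;
}

// One row per product, so chatgpt.com and chat.openai.com collapse into "ChatGPT".
function groupByProduct(hits: AiReferral[]): Map<string, ProductRow> {
  const rows = new Map<string, ProductRow>();
  for (const hit of hits) {
    const row = rows.get(hit.product) ?? { referer: 0, utm: 0, total: 0 };
    row[hit.evidence] += 1;
    row.total += 1;
    rows.set(hit.product, row);
  }
  return rows;
}
```

Because each click is attributed to exactly one evidence channel, the referer and utm columns always sum to the product total. Keying rows on product rather than rule is what removes the chatgpt.com / chat.openai.com double row.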
@arberx force-pushed the ax/cloud-run-traffic-puller branch from 2e7adcd to 94f9f63 May 7, 2026 21:05
@arberx merged commit 5d87d0c into main May 7, 2026
2 checks passed
@arberx deleted the ax/cloud-run-traffic-puller branch May 7, 2026 21:13