feat(traffic): persistence + sync API for Cloud Run sources (Phase 2 slice 1) by arberx · Pull Request #429 · AINYC/canonry

arberx · 2026-05-08T00:13:18Z

Summary

First slice of Phase 2 server-side traffic ingestion: schema, config storage, sync API, CLI commands.
Adds traffic_sources + 3 hourly/sample tables (migration v49) with composite PKs so repeat syncs upsert via hits + ?. Adds cloudRun: connection block to ~/.canonry/config.yaml, getCloudLoggingAccessToken (SA → JWT → logging.read token), and RunKinds['traffic-sync'] so sync runs flow through the existing run/UI invalidation pipelines.
Ships POST /api/v1/projects/:name/traffic/connect/cloud-run and POST /api/v1/projects/:name/traffic/sources/:id/sync plus canonry traffic connect cloud-run / canonry traffic sync CLI; the route's Cloud Run pull and access-token resolver are injectable so tests run without network. v1 covers service-account auth only — OAuth-mode, read endpoints (/traffic/status / sources / crawlers / referrals / timeline), MCP toolkit, doctor checks, intelligence correlations, and UI come in follow-up slices.
Smoke-checked against ainyc.ai over a 24h window via gcloud token: 2263 normalized events, 90 crawler hits across 8 bots (anthropic-claudebot, ccbot, openai-{chatgpt-user,gptbot,searchbot}, bytespider, perplexity-bot, meta-externalagent), 3 AI referrals (2 ChatGPT UTM, 1 Copilot referer) — same shape as the Phase 1 probe. Versions bumped to 4.12.0.
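The composite-PK upsert mentioned above can be modelled in memory as follows. This is a minimal sketch, not the PR's code: the bucket fields, the key layout, and the `ON CONFLICT` phrasing in the comment are assumptions, since the real column names are not shown here.

```typescript
// Hypothetical bucket shape; the actual columns of the hourly tables are not shown in this PR.
interface HourlyBucket {
  sourceId: string;
  hourStart: string; // ISO timestamp truncated to the hour
  botId: string;
  hits: number;
}

// Stand-in for the table's composite primary key.
function bucketKey(b: HourlyBucket): string {
  return `${b.sourceId}|${b.hourStart}|${b.botId}`;
}

// Roughly what an "INSERT ... ON CONFLICT (<pk>) DO UPDATE SET hits = hits + ?"
// statement would do (assumed SQL shape): add to an existing row, else insert.
function upsertBuckets(table: Map<string, HourlyBucket>, incoming: HourlyBucket[]): void {
  for (const bucket of incoming) {
    const key = bucketKey(bucket);
    const existing = table.get(key);
    if (existing) {
      existing.hits += bucket.hits;
    } else {
      table.set(key, { ...bucket });
    }
  }
}

const table = new Map<string, HourlyBucket>();
const bucket: HourlyBucket = {
  sourceId: "src-1",
  hourStart: "2026-05-07T12:00:00.000Z",
  botId: "openai-gptbot",
  hits: 3,
};
upsertBuckets(table, [bucket]);
upsertBuckets(table, [{ ...bucket, hits: 2 }]);
console.log(table.size, table.get(bucketKey(bucket))?.hits); // 1 5
```

Because the second write lands on the same key, the row count stays at one and the hit counts add up. The same additive behavior means overlapping sync windows would count an event twice.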

Test plan

`pnpm typecheck` — all 22 workspace packages pass
`pnpm lint` — clean across the workspace
`pnpm test` — 2117 / 2117 pass (204 test files), including 6 new DB schema tests, 8 new API route tests, 6 new CLI command tests, 5 new config-helper tests
`bash scripts/check-docs.sh` — green
Live smoke against ainyc.ai (`scripts/smoke-traffic-sync.ts --use-gcloud`) — totals match the Phase 1 probe over the same window

…slice 1)

First slice of Phase 2 server-side traffic ingestion. Lands the schema, config storage, API endpoints, ApiClient methods, and CLI commands needed to connect a Cloud Run service-account credential and run a manual sync that writes hourly crawler / AI-referral buckets and a bounded sample tail.

What ships:

- Four new tables (traffic_sources, crawler_events_hourly, ai_referral_events_hourly, raw_event_samples) + migration v49. Composite PKs on the rollup tables let repeat syncs upsert via `hits + ?`.
- `cloudRun:` connection block in `~/.canonry/config.yaml` with helpers in `packages/canonry/src/cloud-run-config.ts` (mirrors the GA4 pattern).
- `getCloudLoggingAccessToken` / `refreshCloudLoggingAccessToken` in `@ainyc/canonry-integration-cloud-run/src/auth.ts` so the route can mint a Cloud Logging-scoped token from a service-account key.
- `POST /api/v1/projects/:name/traffic/connect/cloud-run` and `POST /api/v1/projects/:name/traffic/sources/:id/sync` in `packages/api-routes/src/traffic.ts`. Sync resolves credentials, calls `listCloudRunTrafficEvents`, runs `buildTrafficProbeReport`, and upserts the hourly buckets + samples in a single transaction. Both the pull function and the access-token resolver are injectable so tests run without network.
- `RunKinds['traffic-sync']` so sync runs land in the existing `runs` table and webhook/UI invalidation pipelines.
- ApiClient + CLI: `canonry traffic connect cloud-run` and `canonry traffic sync`.
- Smoke script `scripts/smoke-traffic-sync.ts` that exercises the auth + pull + classifier path against real Cloud Logging (SA key or gcloud token).

v1 covers service-account auth only. OAuth-mode sync, `traffic status` / `traffic sources` / read endpoints, MCP toolkit, doctor checks, intelligence correlations, and UI come in follow-up slices.

Smoke-checked against ainyc.ai over a 24h window (gcloud token): 2263 normalized events, 90 crawler hits across 8 bots (anthropic-claudebot, ccbot, openai-{chatgpt-user,gptbot,searchbot}, bytespider, perplexity-bot, meta-externalagent), 3 AI referrals (2 ChatGPT UTM, 1 Copilot referer); same shape as the Phase 1 probe over the same window.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
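The injectable pull function and access-token resolver described in this commit can be pictured with a sketch like the one below. Every name here (`SyncDeps`, `runSync`, `resolveAccessToken`, `pullEvents`, the event fields) is hypothetical; only the idea of passing both dependencies in, so tests can substitute network-free fakes, comes from the description.

```typescript
// Illustrative types and names only; the PR does not show the real signatures.
interface TrafficEvent {
  observedAt: string;
  userAgent: string;
}

interface SyncDeps {
  // Resolves a Cloud Logging access token (service-account flow in production).
  resolveAccessToken: () => Promise<string>;
  // Pulls normalized events for a time window (Cloud Logging in production).
  pullEvents: (token: string, startTime: string, endTime: string) => Promise<TrafficEvent[]>;
}

interface SyncSummary {
  eventCount: number;
}

// The handler only talks to its injected dependencies, never to the network directly.
async function runSync(deps: SyncDeps, startTime: string, endTime: string): Promise<SyncSummary> {
  const token = await deps.resolveAccessToken();
  const events = await deps.pullEvents(token, startTime, endTime);
  return { eventCount: events.length };
}

// Test doubles: no network, and they record what the handler passed in.
const seenTokens: string[] = [];
const fakeDeps: SyncDeps = {
  resolveAccessToken: async () => "fake-token",
  pullEvents: async (token) => {
    seenTokens.push(token);
    return [{ observedAt: "2026-05-07T12:05:00.000Z", userAgent: "GPTBot" }];
  },
};

const pending = runSync(fakeDeps, "2026-05-07T12:00:00.000Z", "2026-05-07T13:00:00.000Z");
pending.then((summary) => console.log(summary.eventCount, seenTokens));
```

In a production wiring, the same `runSync` would presumably receive the real token resolver and Cloud Logging pull in place of the fakes.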

…e-counting

Without this clamp, every sync recomputed the window as `now - sinceMinutes` to `now` and ignored `lastSyncedAt` / `lastCursor`. Combined with the `hits + bucket.hits` upsert on the composite-PK rollup tables, two back-to-back syncs of an event silently inflated the hourly bucket, a footgun once a cron schedule lands.

Now `windowStart = max(now - sinceMinutes, lastSyncedAt)`, capped at `windowEnd` to defend against a future-dated `lastSyncedAt`. First sync behavior (`lastSyncedAt` is null) is unchanged.

The test that previously asserted the buggy `hits = 2` after two identical syncs is replaced with one that asserts `hits = 1` plus `windows[1].startTime >= windows[0].endTime`. The harness mock now mirrors Cloud Logging's window filter so the assertion reflects production behavior, and the big sync test's events are made relative to `Date.now()` so they fall inside the requested window regardless of when the suite runs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
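The clamp rule stated in this commit, `windowStart = max(now - sinceMinutes, lastSyncedAt)` capped at `windowEnd`, can be sketched as below. The helper name `computeSyncWindow` and the `SyncWindow` type are hypothetical, and the example assumes `lastSyncedAt` is set to the previous window's end, which the commit message does not state explicitly.

```typescript
// Hypothetical helper; names are illustrative, not the PR's actual code.
interface SyncWindow {
  startTime: Date;
  endTime: Date;
}

function computeSyncWindow(now: Date, sinceMinutes: number, lastSyncedAt: Date | null): SyncWindow {
  const windowEnd = now.getTime();
  const lookback = windowEnd - sinceMinutes * 60_000;
  // First sync (lastSyncedAt is null): plain lookback, unchanged behavior.
  const lowerBound = lastSyncedAt === null ? lookback : Math.max(lookback, lastSyncedAt.getTime());
  // Cap at windowEnd so a future-dated lastSyncedAt cannot produce start > end.
  const windowStart = Math.min(lowerBound, windowEnd);
  return { startTime: new Date(windowStart), endTime: new Date(windowEnd) };
}

const t0 = new Date("2026-05-08T00:00:00.000Z");
const first = computeSyncWindow(t0, 60, null);

// Second sync five minutes later, assuming lastSyncedAt = end of the first window.
const t1 = new Date(t0.getTime() + 5 * 60_000);
const second = computeSyncWindow(t1, 60, first.endTime);

// Clock-skew case: lastSyncedAt one hour in the future collapses to an empty window.
const skewed = computeSyncWindow(t1, 60, new Date(t1.getTime() + 3_600_000));

console.log(first.startTime.toISOString());  // 2026-05-07T23:00:00.000Z
console.log(second.startTime.toISOString()); // 2026-05-08T00:00:00.000Z
console.log(skewed.startTime.getTime() === skewed.endTime.getTime()); // true
```

The second window starts where the first one ended, so an event already counted in the first sync is not pulled again and the additive upsert cannot double-count it.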

The "clamps windowStart to lastSyncedAt" test in traffic.test.ts shipped in PR #429 with a baseTime that snapped to the top of an hour and added 5 minutes. When CI ran more than ~5 minutes into the hour, that snap pushed observedAt outside the default 60-min sync window and the first sync returned 0 events instead of 1, failing the assertion.

The hour-snap was only relevant for the OTHER test (which asserts two events accumulate into one hourly bucket). The clamp test doesn't need hour-aligned timestamps, only a time inside the default sync window.

Replaces the snap with a fixed 10-min-ago offset so the test is deterministic regardless of when CI runs. The other tests that legitimately need hour-aligned timestamps already pair their snap with sinceMinutes:120, which gives enough slack to absorb minute-of-hour variance.
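The arithmetic behind this flake can be checked with a small sketch. It assumes the old fixture snapped to the top of the previous hour before adding 5 minutes; that is one reading consistent with the failure described, not something the commit message spells out. The helper names are hypothetical.

```typescript
const MINUTE = 60_000;
const HOUR = 60 * MINUTE;

// Is observedAt inside the sync window [now - sinceMinutes, now]?
function inWindow(observedAt: number, now: number, sinceMinutes = 60): boolean {
  return observedAt >= now - sinceMinutes * MINUTE && observedAt <= now;
}

// Assumed shape of the old fixture: top of the previous hour plus 5 minutes.
function hourSnapped(now: number): number {
  return Math.floor((now - HOUR) / HOUR) * HOUR + 5 * MINUTE;
}

// New fixture: a fixed offset that is always inside a 60-minute window.
function tenMinutesAgo(now: number): number {
  return now - 10 * MINUTE;
}

const hourTop = Date.UTC(2026, 4, 8, 0, 0, 0);
const early = hourTop + 3 * MINUTE; // suite runs 3 minutes into the hour
const late = hourTop + 20 * MINUTE; // suite runs 20 minutes into the hour

console.log(inWindow(hourSnapped(early), early)); // true
console.log(inWindow(hourSnapped(late), late)); // false
console.log(inWindow(tenMinutesAgo(early), early), inWindow(tenMinutesAgo(late), late)); // true true
console.log(inWindow(hourSnapped(late), late, 120)); // true
```

Under that assumption the snapped timestamp sits 55 minutes before the top of the current hour, so it only falls inside a 60-minute window during the first 5 minutes of the hour, while a 120-minute window always covers it.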

arberx and others added 2 commits May 7, 2026 20:36

arberx force-pushed the arberx/traffic-phase-2-cloud-run-sync branch from f1447d2 to 79969ec Compare May 8, 2026 00:36

arberx merged commit 924adec into main May 8, 2026
2 checks passed

arberx deleted the arberx/traffic-phase-2-cloud-run-sync branch May 8, 2026 00:49

arberx merged 2 commits into main from arberx/traffic-phase-2-cloud-run-sync
