
feat(traffic): persistence + sync API for Cloud Run sources (Phase 2 slice 1)#429

Merged
arberx merged 2 commits into main from arberx/traffic-phase-2-cloud-run-sync
May 8, 2026


arberx commented May 8, 2026

Summary

  • First slice of Phase 2 server-side traffic ingestion: schema, config storage, sync API, CLI commands.
  • Adds `traffic_sources` plus three hourly/sample tables (migration v49), with composite PKs so repeat syncs upsert via `hits + ?`. Adds a `cloudRun:` connection block to `~/.canonry/config.yaml`, `getCloudLoggingAccessToken` (SA → JWT → logging.read token), and `RunKinds['traffic-sync']` so sync runs flow through the existing run/UI invalidation pipelines.
  • Ships `POST /api/v1/projects/:name/traffic/connect/cloud-run` and `POST /api/v1/projects/:name/traffic/sources/:id/sync`, plus the `canonry traffic connect cloud-run` and `canonry traffic sync` CLI commands. The route's Cloud Run pull and access-token resolver are injectable so tests run without network. v1 covers service-account auth only; OAuth mode, read endpoints (/traffic/status / sources / crawlers / referrals / timeline), MCP toolkit, doctor checks, intelligence correlations, and UI come in follow-up slices.
  • Smoke-checked against ainyc.ai over a 24h window via gcloud token: 2263 normalized events, 90 crawler hits across 8 bots (anthropic-claudebot, ccbot, openai-{chatgpt-user,gptbot,searchbot}, bytespider, perplexity-bot, meta-externalagent), 3 AI referrals (2 ChatGPT UTM, 1 Copilot referer) — same shape as the Phase 1 probe. Versions bumped to 4.12.0.
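
The `hits + ?` upsert on a composite key can be pictured with an in-memory stand-in. This is a minimal sketch, not the migration or route code: the `BucketKey` fields and the `upsertBucket` helper are hypothetical names, and a `Map` keyed by the joined key parts takes the place of a table with a composite primary key.

```typescript
// Hypothetical composite key for an hourly rollup row (field names are assumptions).
interface BucketKey {
  sourceId: string;
  hourStart: string; // ISO timestamp truncated to the hour
  bot: string;
}

// In-memory stand-in for a table with a composite primary key.
const buckets = new Map<string, number>();

// Joins the key parts the way a composite PK identifies exactly one row.
function keyOf(k: BucketKey): string {
  return [k.sourceId, k.hourStart, k.bot].join("|");
}

// Insert the row if absent, otherwise add to its existing hit count
// (the in-memory analogue of `hits = hits + ?` on conflict).
function upsertBucket(k: BucketKey, hits: number): number {
  const next = (buckets.get(keyOf(k)) ?? 0) + hits;
  buckets.set(keyOf(k), next);
  return next;
}

const key: BucketKey = {
  sourceId: "src-1",
  hourStart: "2026-05-07T20:00:00.000Z",
  bot: "openai-gptbot",
};
upsertBucket(key, 3); // first sync writes 3 hits
upsertBucket(key, 2); // a later sync adds 2 more to the same row
```

Because the upsert is additive, replaying the same events through it raises the count again, so each sync has to cover events the previous one did not.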

Test plan

  • `pnpm typecheck` — all 22 workspace packages pass
  • `pnpm lint` — clean across the workspace
  • `pnpm test` — 2117 / 2117 pass (204 test files), including 6 new DB schema tests, 8 new API route tests, 6 new CLI command tests, 5 new config-helper tests
  • `bash scripts/check-docs.sh` — green
  • Live smoke against ainyc.ai (`scripts/smoke-traffic-sync.ts --use-gcloud`) — totals match the Phase 1 probe over the same window

arberx and others added 2 commits May 7, 2026 20:36
…slice 1)

First slice of Phase 2 server-side traffic ingestion. Lands the schema, config
storage, API endpoints, ApiClient methods, and CLI commands needed to connect a
Cloud Run service-account credential and run a manual sync that writes hourly
crawler / AI-referral buckets and a bounded sample tail.

What ships:
- Four new tables (traffic_sources, crawler_events_hourly,
  ai_referral_events_hourly, raw_event_samples) + migration v49. Composite PKs
  on the rollup tables let repeat syncs upsert via `hits + ?`.
- `cloudRun:` connection block in `~/.canonry/config.yaml` with helpers in
  `packages/canonry/src/cloud-run-config.ts` (mirrors the GA4 pattern).
- `getCloudLoggingAccessToken` / `refreshCloudLoggingAccessToken` in
  `@ainyc/canonry-integration-cloud-run/src/auth.ts` so the route can mint a
  Cloud Logging-scoped token from a service-account key.
- `POST /api/v1/projects/:name/traffic/connect/cloud-run` and
  `POST /api/v1/projects/:name/traffic/sources/:id/sync` in
  `packages/api-routes/src/traffic.ts`. Sync resolves credentials, calls
  `listCloudRunTrafficEvents`, runs `buildTrafficProbeReport`, and upserts the
  hourly buckets + samples in a single transaction. Both the pull function and
  the access-token resolver are injectable so tests run without network.
- `RunKinds['traffic-sync']` so sync runs land in the existing `runs` table
  and webhook/UI invalidation pipelines.
- ApiClient + CLI: `canonry traffic connect cloud-run` and
  `canonry traffic sync`.
- Smoke script `scripts/smoke-traffic-sync.ts` that exercises the auth + pull +
  classifier path against real Cloud Logging (SA key or gcloud token).
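
The injectable pull function and access-token resolver described above could be wired roughly as follows. This is a sketch under assumptions: `SyncDeps`, `runSync`, and the event shape are hypothetical and are not the signatures used in `packages/api-routes/src/traffic.ts`.

```typescript
// Hypothetical event and dependency shapes, for illustration only.
interface TrafficEvent {
  observedAt: string;
  userAgent: string;
}

interface SyncDeps {
  // Resolves a bearer token; production would mint one from a service-account key.
  resolveAccessToken: () => Promise<string>;
  // Pulls events for a window; production would query Cloud Logging.
  pullEvents: (token: string, startTime: Date, endTime: Date) => Promise<TrafficEvent[]>;
}

// The sync logic only talks to the injected functions, never to the network directly.
async function runSync(deps: SyncDeps, startTime: Date, endTime: Date): Promise<number> {
  const token = await deps.resolveAccessToken();
  const events = await deps.pullEvents(token, startTime, endTime);
  return events.length;
}

// Test doubles: no credentials and no HTTP calls.
const fakeDeps: SyncDeps = {
  resolveAccessToken: async () => "test-token",
  pullEvents: async (token) =>
    token === "test-token"
      ? [{ observedAt: "2026-05-07T20:05:00.000Z", userAgent: "GPTBot" }]
      : [],
};
```

A test can then call `runSync(fakeDeps, start, end)` and assert on the result without a service-account key or network access.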

v1 covers service-account auth only. OAuth-mode sync, `traffic status` /
`traffic sources` / read endpoints, MCP toolkit, doctor checks, intelligence
correlations, and UI come in follow-up slices.

Smoke-checked against ainyc.ai over a 24h window (gcloud token): 2263
normalized events, 90 crawler hits across 8 bots (anthropic-claudebot,
ccbot, openai-{chatgpt-user,gptbot,searchbot}, bytespider, perplexity-bot,
meta-externalagent), 3 AI referrals (2 ChatGPT UTM, 1 Copilot referer) — same
shape as the Phase 1 probe over the same window.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e-counting

Without this clamp, every sync recomputed the window as `now - sinceMinutes`
to `now` and ignored `lastSyncedAt` / `lastCursor`. Combined with the
`hits + bucket.hits` upsert on the composite-PK rollup tables, two
back-to-back syncs of an event silently inflated the hourly bucket — a
footgun once a cron schedule lands.

Now `windowStart = max(now - sinceMinutes, lastSyncedAt)`, capped at
`windowEnd` to defend against a future-dated lastSyncedAt. First sync
behavior (lastSyncedAt is null) is unchanged.

The test that previously asserted the buggy `hits = 2` after two identical
syncs is replaced with one that asserts `hits = 1` plus
`windows[1].startTime >= windows[0].endTime`. The harness mock now mirrors
Cloud Logging's window filter so the assertion reflects production
behavior, and the big sync test's events are made relative to `Date.now()`
so they fall inside the requested window regardless of when the suite
runs.
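
The shape of that assertion can be reproduced with a small simulation. This is a sketch, not the harness in `traffic.test.ts`: `mockPull`, `sync`, and `SourceState` are hypothetical, and the start-inclusive, end-exclusive filter is an assumption about how the window is applied.

```typescript
interface LogEvent {
  observedAtMs: number;
}

// Mock pull that filters by the requested window (start inclusive, end exclusive: an assumption).
function mockPull(events: LogEvent[], startMs: number, endMs: number): LogEvent[] {
  return events.filter((e) => e.observedAtMs >= startMs && e.observedAtMs < endMs);
}

// Minimal per-source state: one bucket counter plus the lastSyncedAt watermark.
interface SourceState {
  hits: number;
  lastSyncedAtMs: number | null;
}

function sync(state: SourceState, events: LogEvent[], nowMs: number, sinceMinutes: number) {
  const lookback = nowMs - sinceMinutes * 60 * 1000;
  // windowStart = max(now - sinceMinutes, lastSyncedAt), capped at windowEnd.
  const start = Math.min(Math.max(lookback, state.lastSyncedAtMs ?? lookback), nowMs);
  state.hits += mockPull(events, start, nowMs).length; // additive upsert
  state.lastSyncedAtMs = nowMs;
  return { startTime: start, endTime: nowMs };
}

const t0 = Date.UTC(2026, 4, 8, 0, 0, 0);
const events: LogEvent[] = [{ observedAtMs: t0 - 10 * 60 * 1000 }];
const state: SourceState = { hits: 0, lastSyncedAtMs: null };

// Two back-to-back syncs over the same event, one second apart.
const windows = [sync(state, events, t0, 60), sync(state, events, t0 + 1000, 60)];
```

After both calls `state.hits` is 1 and `windows[1].startTime >= windows[0].endTime` holds, because the second window starts at the first sync's watermark and no longer covers the event.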

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
arberx force-pushed the arberx/traffic-phase-2-cloud-run-sync branch from f1447d2 to 79969ec May 8, 2026 00:36
arberx merged commit 924adec into main May 8, 2026
2 checks passed
arberx deleted the arberx/traffic-phase-2-cloud-run-sync branch May 8, 2026 00:49
arberx added a commit that referenced this pull request May 8, 2026
The "clamps windowStart to lastSyncedAt" test in traffic.test.ts shipped
in PR #429 with a baseTime that snapped to the top of an hour and added
5 minutes. When CI ran more than ~5 minutes into the hour, that snap
pushed observedAt outside the default 60-min sync window and the first
sync returned 0 events instead of 1, failing the assertion.

The hour-snap was only relevant for the OTHER test (which asserts two
events accumulate into one hourly bucket). The clamp test doesn't need
hour-aligned timestamps — only a time inside the default sync window.

Replaces the snap with a fixed 10-min-ago offset so the test is
deterministic regardless of when CI runs. The other tests that legitimately
need hour-aligned timestamps already pair their snap with sinceMinutes:120
which gives enough slack to absorb minute-of-hour variance.
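
The difference between a fixed offset and an hour-aligned timestamp can be illustrated with a few helpers. This is a sketch with hypothetical names (`inWindow`, `fixedOffsetTime`, `topOfHour`); it does not reproduce the original test's exact snap arithmetic, only the general point that an hour-aligned value sits at a run-time-dependent distance from `now`.

```typescript
const MINUTE_MS = 60 * 1000;

// True when a timestamp falls inside [now - sinceMinutes, now].
function inWindow(observedAtMs: number, nowMs: number, sinceMinutes: number): boolean {
  return observedAtMs >= nowMs - sinceMinutes * MINUTE_MS && observedAtMs <= nowMs;
}

// Fixed offset: the distance from `now` is always exactly 10 minutes.
function fixedOffsetTime(nowMs: number): number {
  return nowMs - 10 * MINUTE_MS;
}

// Hour-aligned (UTC): the distance from `now` depends on the minute-of-hour at run time.
function topOfHour(nowMs: number): number {
  return nowMs - (nowMs % (60 * MINUTE_MS));
}

// How far behind `now` the hour-aligned timestamp sits, in minutes.
function minutesBehind(nowMs: number): number {
  return (nowMs - topOfHour(nowMs)) / MINUTE_MS;
}
```

`fixedOffsetTime(now)` is inside a 60-minute window for any `now`, whereas `minutesBehind(now)` ranges from 0 to just under 60 depending on when the suite runs, so any further offset applied to an hour-aligned value has to be checked against the window size.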