feat(traffic): persistence + sync API for Cloud Run sources (Phase 2 slice 1)#429
Merged
First slice of Phase 2 server-side traffic ingestion. Lands the schema, config
storage, API endpoints, ApiClient methods, and CLI commands needed to connect a
Cloud Run service-account credential and run a manual sync that writes hourly
crawler / AI-referral buckets and a bounded sample tail.
What ships:
- Four new tables (traffic_sources, crawler_events_hourly,
ai_referral_events_hourly, raw_event_samples) + migration v49. Composite PKs
on the rollup tables let repeat syncs upsert via `hits + ?`.
- `cloudRun:` connection block in `~/.canonry/config.yaml` with helpers in
`packages/canonry/src/cloud-run-config.ts` (mirrors the GA4 pattern).
- `getCloudLoggingAccessToken` / `refreshCloudLoggingAccessToken` in
`@ainyc/canonry-integration-cloud-run/src/auth.ts` so the route can mint a
Cloud Logging-scoped token from a service-account key.
- `POST /api/v1/projects/:name/traffic/connect/cloud-run` and
`POST /api/v1/projects/:name/traffic/sources/:id/sync` in
`packages/api-routes/src/traffic.ts`. Sync resolves credentials, calls
`listCloudRunTrafficEvents`, runs `buildTrafficProbeReport`, and upserts the
hourly buckets + samples in a single transaction. Both the pull function and
the access-token resolver are injectable so tests run without network.
- `RunKinds['traffic-sync']` so sync runs land in the existing `runs` table
and webhook/UI invalidation pipelines.
- ApiClient + CLI: `canonry traffic connect cloud-run` and
`canonry traffic sync`.
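Here is a minimal in-memory sketch of the sync path described above. An injected token resolver and pull function supply events, which are grouped into hourly buckets keyed like a composite PK and then upserted additively, the way a SQL `ON CONFLICT ... DO UPDATE SET hits = hits + ?` upsert would behave. The exact SQL is an assumption; the PR only says `hits + ?`. `CrawlerEvent`, `SyncDeps`, `HourlyStore`, `bucketKey`, and `syncSource` are hypothetical names. The real route writes to SQL tables inside a single transaction, and its key columns may differ from the (source, hour, crawler) triple assumed here.

```typescript
// Hypothetical event shape; the real classifier output is richer.
interface CrawlerEvent {
  observedAt: Date;
  crawler: string; // classifier label, e.g. "ccbot"
}

// Injectable dependencies, so tests can swap in fakes and skip the network.
interface SyncDeps {
  resolveAccessToken: () => Promise<string>;
  pullEvents: (token: string, start: Date, end: Date) => Promise<CrawlerEvent[]>;
}

const HOUR_MS = 60 * 60 * 1000;

// Stand-in for an assumed composite primary key (source, hour start, crawler).
function bucketKey(sourceId: string, hourStartMs: number, crawler: string): string {
  return `${sourceId}|${hourStartMs}|${crawler}`;
}

// In-memory stand-in for an hourly rollup table: upsert() adds to any existing
// row instead of overwriting it, mirroring `hits = hits + ?`.
class HourlyStore {
  readonly hits = new Map<string, number>();

  upsert(key: string, delta: number): void {
    this.hits.set(key, (this.hits.get(key) ?? 0) + delta);
  }
}

async function syncSource(
  sourceId: string,
  range: { start: Date; end: Date },
  deps: SyncDeps,
  store: HourlyStore,
): Promise<number> {
  const token = await deps.resolveAccessToken();
  const events = await deps.pullEvents(token, range.start, range.end);

  // Count events per hourly bucket first, then issue one upsert per bucket.
  const batch = new Map<string, number>();
  for (const ev of events) {
    const hourStartMs = Math.floor(ev.observedAt.getTime() / HOUR_MS) * HOUR_MS;
    const key = bucketKey(sourceId, hourStartMs, ev.crawler);
    batch.set(key, (batch.get(key) ?? 0) + 1);
  }
  for (const [key, count] of batch) {
    store.upsert(key, count);
  }
  return events.length;
}
```

Because the upsert is additive, running the same sync twice over the same window doubles every bucket. The later window-clamp commit addresses exactly this double counting.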
- Smoke script `scripts/smoke-traffic-sync.ts` that exercises the auth + pull +
classifier path against real Cloud Logging (SA key or gcloud token).
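The SA-key auth path that the smoke script exercises (`getCloudLoggingAccessToken`: SA → JWT → `logging.read` token) corresponds to Google's documented OAuth 2.0 JWT bearer flow for service accounts. The sketch below illustrates that flow with `node:crypto`. It is not the contents of `auth.ts`, and `ServiceAccountKey`, `buildJwtAssertion`, and `tokenRequestBody` are hypothetical names. The token endpoint, grant type, scope URL, and one-hour assertion cap follow Google's OAuth documentation. The network POST itself is left out.

```typescript
import { createSign } from "node:crypto";

// Read-only Cloud Logging scope and Google's OAuth 2.0 token endpoint.
const LOGGING_READ_SCOPE = "https://www.googleapis.com/auth/logging.read";
const TOKEN_URI = "https://oauth2.googleapis.com/token";

// Subset of the fields in a downloaded service-account JSON key.
interface ServiceAccountKey {
  client_email: string;
  private_key: string; // PEM-encoded RSA private key
}

function base64url(input: string | Buffer): string {
  return Buffer.from(input).toString("base64url");
}

// Builds an RS256-signed JWT assertion for the JWT bearer grant.
function buildJwtAssertion(key: ServiceAccountKey, nowSeconds: number): string {
  const header = { alg: "RS256", typ: "JWT" };
  const claims = {
    iss: key.client_email,
    scope: LOGGING_READ_SCOPE,
    aud: TOKEN_URI,
    iat: nowSeconds,
    exp: nowSeconds + 3600, // Google rejects assertions valid for more than 1 hour
  };
  const signingInput = `${base64url(JSON.stringify(header))}.${base64url(JSON.stringify(claims))}`;
  const signer = createSign("RSA-SHA256");
  signer.update(signingInput);
  signer.end();
  const signature = signer.sign(key.private_key);
  return `${signingInput}.${base64url(signature)}`;
}

// Form body to POST (application/x-www-form-urlencoded) to TOKEN_URI;
// a successful response carries the short-lived access_token.
function tokenRequestBody(assertion: string): URLSearchParams {
  return new URLSearchParams({
    grant_type: "urn:ietf:params:oauth:grant-type:jwt-bearer",
    assertion,
  });
}
```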
v1 covers service-account auth only. OAuth-mode sync, `traffic status` /
`traffic sources` / read endpoints, MCP toolkit, doctor checks, intelligence
correlations, and UI come in follow-up slices.
Smoke-checked against ainyc.ai over a 24h window (gcloud token): 2263
normalized events, 90 crawler hits across 8 bots (anthropic-claudebot,
ccbot, openai-{chatgpt-user,gptbot,searchbot}, bytespider, perplexity-bot,
meta-externalagent), 3 AI referrals (2 ChatGPT UTM, 1 Copilot referer) — same
shape as the Phase 1 probe over the same window.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e-counting

Without this clamp, every sync recomputed the window as `now - sinceMinutes` to `now` and ignored `lastSyncedAt` / `lastCursor`. Combined with the `hits + bucket.hits` upsert on the composite-PK rollup tables, two back-to-back syncs of the same event silently inflated the hourly bucket, a footgun once a cron schedule lands.

Now `windowStart = max(now - sinceMinutes, lastSyncedAt)`, capped at `windowEnd` to defend against a future-dated `lastSyncedAt`. First-sync behavior (`lastSyncedAt` is null) is unchanged.

The test that previously asserted the buggy `hits = 2` after two identical syncs is replaced with one that asserts `hits = 1` plus `windows[1].startTime >= windows[0].endTime`. The harness mock now mirrors Cloud Logging's window filter, so the assertion reflects production behavior. The big sync test's events are now relative to `Date.now()`, so they fall inside the requested window regardless of when the suite runs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
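The clamp above fits in a few lines. This is a minimal sketch that assumes `lastSyncedAt` holds the previous window's end. `computeSyncWindow` and `SyncWindow` are hypothetical names, not the route's actual helpers, and `lastCursor` handling is omitted.

```typescript
interface SyncWindow {
  start: Date;
  end: Date;
}

// windowStart = max(now - sinceMinutes, lastSyncedAt), capped at windowEnd.
function computeSyncWindow(now: Date, sinceMinutes: number, lastSyncedAt: Date | null): SyncWindow {
  const endMs = now.getTime();
  let startMs = endMs - sinceMinutes * 60_000;
  if (lastSyncedAt !== null) {
    // Never re-request time that an earlier sync already covered.
    startMs = Math.max(startMs, lastSyncedAt.getTime());
  }
  // Defend against a future-dated lastSyncedAt: never start after the end.
  startMs = Math.min(startMs, endMs);
  return { start: new Date(startMs), end: new Date(endMs) };
}
```

Back-to-back syncs then request windows that do not overlap, apart from the shared boundary instant. The `windows[1].startTime >= windows[0].endTime` assertion checks exactly this.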
f1447d2 to 79969ec
arberx added a commit that referenced this pull request on May 8, 2026
The "clamps windowStart to lastSyncedAt" test in `traffic.test.ts` shipped in PR #429 with a `baseTime` that snapped to the top of an hour and added 5 minutes. When CI ran more than ~5 minutes into the hour, that snap pushed `observedAt` outside the default 60-min sync window. The first sync then returned 0 events instead of 1, failing the assertion.

The hour-snap was only relevant for the OTHER test, which asserts that two events accumulate into one hourly bucket. The clamp test doesn't need hour-aligned timestamps, only a time inside the default sync window. This commit replaces the snap with a fixed 10-minutes-ago offset, so the test is deterministic regardless of when CI runs.

The other tests that legitimately need hour-aligned timestamps already pair their snap with `sinceMinutes: 120`. That leaves enough slack to absorb minute-of-hour variance.
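The fixed-offset reasoning above can be checked with a small sketch. It simulates the suite starting at every minute of an hour and confirms that an `observedAt` ten minutes before "now" always lands inside the default 60-minute window. `tenMinutesAgo`, `insideWindow`, and `fixedOffsetAlwaysInside` are hypothetical helpers, not code from `traffic.test.ts`, and the window is assumed to be inclusive on both ends.

```typescript
const MINUTE_MS = 60_000;
// Default sync window length stated in the commit message.
const DEFAULT_SINCE_MINUTES = 60;

// Hypothetical fixture helper: a fixed offset from "now" instead of an hour snap.
function tenMinutesAgo(nowMs: number): number {
  return nowMs - 10 * MINUTE_MS;
}

// Assumes an inclusive window [now - sinceMinutes, now].
function insideWindow(observedAtMs: number, nowMs: number, sinceMinutes: number): boolean {
  return observedAtMs >= nowMs - sinceMinutes * MINUTE_MS && observedAtMs <= nowMs;
}

// Simulates the suite running at each minute of the given hour and reports
// whether the fixed-offset fixture stayed inside the default window every time.
function fixedOffsetAlwaysInside(hourStartMs: number): boolean {
  for (let minute = 0; minute < 60; minute++) {
    const nowMs = hourStartMs + minute * MINUTE_MS;
    if (!insideWindow(tenMinutesAgo(nowMs), nowMs, DEFAULT_SINCE_MINUTES)) {
      return false;
    }
  }
  return true;
}
```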
Summary
- `traffic_sources` + 3 hourly/sample tables (migration v49) with composite PKs so repeat syncs upsert via `hits + ?`. Adds a `cloudRun:` connection block to `~/.canonry/config.yaml`, `getCloudLoggingAccessToken` (SA → JWT → `logging.read` token), and `RunKinds['traffic-sync']` so sync runs flow through the existing run/UI invalidation pipelines.
- `POST /api/v1/projects/:name/traffic/connect/cloud-run` and `POST /api/v1/projects/:name/traffic/sources/:id/sync`, plus the `canonry traffic connect cloud-run` / `canonry traffic sync` CLI. The route's Cloud Run pull and access-token resolver are injectable, so tests run without network.
- v1 covers service-account auth only. OAuth-mode, read endpoints (`/traffic/{status,sources,crawlers,referrals,timeline}`), MCP toolkit, doctor checks, intelligence correlations, and UI come in follow-up slices.

Test plan