feat(api,cli,mcp): ranked + classified cited-source rankings (#675) #676

Open
arberx wants to merge 1 commit into main from arberx/plan-issue-675


@arberx arberx commented Jun 2, 2026

Reworks GET /projects/:name/analytics/sources from a top-5-per-category breakdown into the full ranked, per-provider, classified cited-domain surface an operator acts on — exposed via a new canonry sources command and the canonry_analytics_sources MCP tool.

  • Surface-class taxonomy (own / direct-competitor / ota-aggregator / editorial-media / other) computed deterministically from already-stored project + competitor domains (zero LLM calls); major travel OTAs (Tripadvisor, Booking, Expedia, …) added to the category rules.
  • SourceBreakdownDto gains ranked + byProvider lists with an explicit long-tail rollup (truncated counts always reconcile to totals) and a surface-class roll-up over the full scope; ?limit=N caps each list (positive-int validated), probe runs stay excluded, and the route now returns a typed SourceBreakdownDto (loose response removed, SDK regenerated).
  • Tests assert the calculation invariants exactly (slot/domain sums reconcile, per-provider totals sum to overall, limit truncation, probe exclusion, empty / infra-only edge cases); MCP classification flipped deferred → included and version bumped to 4.70.0.
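
For reviewers, a minimal sketch of what "truncated counts always reconcile to totals" could mean in practice. The `DomainCount` and `RankedList` shapes, the field names, and `rankWithLongTail` are hypothetical illustrations, not the actual `SourceBreakdownDto` fields from this PR.

```typescript
// Hypothetical shapes: the real DTO fields are not shown in this PR text.
interface DomainCount {
  domain: string;
  citations: number;
}

interface RankedList {
  entries: DomainCount[]; // top `limit` domains, highest count first
  longTail: { domains: number; citations: number }; // everything cut by the limit
  totals: { domains: number; citations: number }; // full scope, before truncation
}

function rankWithLongTail(counts: DomainCount[], limit?: number): RankedList {
  // Assumed validation rule: reject anything that is not a positive integer.
  if (limit !== undefined && (!Number.isInteger(limit) || limit <= 0)) {
    throw new RangeError("limit must be a positive integer");
  }
  // Sort by citation count descending, then by domain for a deterministic order.
  const sorted = [...counts].sort(
    (a, b) => b.citations - a.citations || a.domain.localeCompare(b.domain),
  );
  const entries = limit === undefined ? sorted : sorted.slice(0, limit);
  const sum = (xs: DomainCount[]): number =>
    xs.reduce((n, x) => n + x.citations, 0);
  const totals = { domains: sorted.length, citations: sum(sorted) };
  // The long tail is defined as the remainder, so entries + longTail == totals.
  const longTail = {
    domains: totals.domains - entries.length,
    citations: totals.citations - sum(entries),
  };
  return { entries, longTail, totals };
}
```

Deriving the long tail as the remainder of the totals, rather than counting it separately, makes the reconciliation hold by construction for any limit.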

Reworks GET /projects/:name/analytics/sources from a top-5-per-category
breakdown into the full ranked, per-provider, classified cited-domain
surface an operator acts on.

- Contracts: new SurfaceClass taxonomy (own / direct-competitor /
  ota-aggregator / editorial-media / other) computed DETERMINISTICALLY from
  already-stored project + competitor domains — zero LLM calls.
  classifySurfaceFromCategory() is the core; classifySurface(uri) wraps it.
  SourceBreakdownDto gains ranked + byProvider RankedSourceList views with an
  explicit long-tail rollup (truncated counts always reconcile to totals) and a
  surface-class roll-up over the full scope. Major travel OTAs (Tripadvisor,
  Booking, Expedia, …) added to the category rules.
- API: ?limit=N caps each ranked / per-provider list (positive-int validated);
  probe runs stay excluded. Route now returns a typed SourceBreakdownDto (loose
  response removed) — SDK regenerated.
- CLI: new `canonry sources <project>` (--rank, --by-provider, --limit,
  --window, --format json|jsonl).
- MCP: canonry_analytics_sources (monitoring tier); classification flipped
  deferred → included.
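
A rough sketch of how a deterministic, zero-LLM surface classification might look. Everything here is an assumption for illustration: `classifyHost`, `classifyUri`, the `ClassifyContext` shape, the rule precedence, and the suffix-matching rule are hypothetical and are not the signatures of `classifySurfaceFromCategory()` or `classifySurface(uri)` in this PR.

```typescript
type SurfaceClass =
  | "own"
  | "direct-competitor"
  | "ota-aggregator"
  | "editorial-media"
  | "other";

// Hypothetical context: domains already stored for the project, plus rule lists.
interface ClassifyContext {
  ownDomains: string[];
  competitorDomains: string[];
  otaDomains: string[];
  editorialDomains: string[];
}

// Extract a lowercase hostname without a leading "www."; null if unparseable.
function hostOf(uri: string): string | null {
  try {
    const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(uri) ? uri : `https://${uri}`;
    return new URL(withScheme).hostname.toLowerCase().replace(/^www\./, "");
  } catch {
    return null;
  }
}

// A host matches a domain if it is that domain or one of its subdomains.
function matches(host: string, domains: string[]): boolean {
  return domains.some((d) => host === d || host.endsWith(`.${d}`));
}

// Assumed precedence: own, then competitor, then OTA, then editorial.
function classifyHost(host: string, ctx: ClassifyContext): SurfaceClass {
  if (matches(host, ctx.ownDomains)) return "own";
  if (matches(host, ctx.competitorDomains)) return "direct-competitor";
  if (matches(host, ctx.otaDomains)) return "ota-aggregator";
  if (matches(host, ctx.editorialDomains)) return "editorial-media";
  return "other";
}

function classifyUri(uri: string, ctx: ClassifyContext): SurfaceClass {
  const host = hostOf(uri);
  return host === null ? "other" : classifyHost(host, ctx);
}
```

Because the result depends only on the URI and the stored domain lists, the same input always yields the same class, which is what lets tests assert exact values.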

Tests assert the calculation invariants exactly (slot/domain sums reconcile,
per-provider totals sum to overall, surface-class roll-up spans the full scope,
limit truncation, probe exclusion, empty / infra-only edge cases).
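
As an illustration of two of these invariants (probe exclusion and per-provider totals summing to the overall count), a check could be sketched as below. The `Citation` row shape, the `probe` flag, and both function names are assumptions made for this example, not code from the PR's test suite.

```typescript
// Hypothetical row shape: one cited domain observed in one provider's answer.
interface Citation {
  provider: string;
  domain: string;
  probe: boolean; // assumed marker for rows that came from a probe run
}

interface ProviderCounts {
  overall: number;
  byProvider: Map<string, number>;
}

// Count citations per provider, skipping probe rows entirely.
function countByProvider(citations: Citation[]): ProviderCounts {
  const byProvider = new Map<string, number>();
  let overall = 0;
  for (const c of citations) {
    if (c.probe) continue;
    byProvider.set(c.provider, (byProvider.get(c.provider) ?? 0) + 1);
    overall += 1;
  }
  return { overall, byProvider };
}

// Invariant: the per-provider counts must add up to the overall count.
function providerTotalsReconcile(result: ProviderCounts): boolean {
  let sum = 0;
  result.byProvider.forEach((n) => {
    sum += n;
  });
  return sum === result.overall;
}
```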

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
