Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
89 changes: 89 additions & 0 deletions plans/content-brief-synthesis.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
# Content Briefs: Winnability Gate and LLM Brief Synthesis

## Context

canonry already surfaces content opportunities deterministically. `@ainyc/canonry-intelligence` builds three row sets from a single `OrchestratorInput`:

- `buildContentTargetRows` (`packages/intelligence/src/content-targets.ts:107`): per-query targets with `demandSource`, a `ContentAction`, action confidence, and the winning competitor.
- `buildContentGapRows`: queries only competitors are cited for.
- `buildContentSourceRows`: the cited-source breakdown.

A narrow LLM layer already exists on top. `POST /projects/:name/content/recommendations/:targetRef/analyze` (`packages/api-routes/src/content.ts`) calls an injected `ExplainContentRecommendationFn`, implemented by `createRecommendationExplainer` (`packages/canonry/src/agent/recommendation-explainer.ts`). It uses pi-ai `complete()` at the `analyze` capability tier, is multi-provider, and caches by `(projectId, targetRef, promptVersion)` in `recommendation_explanations` (`packages/db/src/schema.ts:830`). Its system prompt explains, in 3 to 5 bullets and under 600 characters, why a single target matters.

Two gaps separate that output from a content brief an operator can act on:

1. **No winnability signal.** A target whose cited surface is owned by aggregators or editorial media (a head term the site cannot realistically win with its own content) is presented the same as a differentiated query the site can own. Operators make that call by hand today. The signal already exists but is siloed: discovery classifies every recurring cited domain as `direct-competitor | ota-aggregator | editorial-media | other | unknown` (prompt at `packages/canonry/src/discovery-run.ts:281`; OTAs and booking or review platforms are explicitly `ota-aggregator`, listicles and blogs `editorial-media`). content-targets only consumes `competitorDomains`, so it knows "a competitor is cited" but not "the surface is an aggregator we should not chase."
2. **Explain, not brief.** The LLM layer explains why a target matters. It does not synthesize the brief: the angle, the why-winnable rationale, and the schema or markup hookup.

Net: the judgment "defend the ownable queries, do not chase the ceded head terms" plus the brief itself happen manually, outside the tool.

## Goal

Two surgical extensions that keep determinism deciding **what** and the LLM deciding only **how to write**:

1. A deterministic `surfaceClass` (ownable vs ceded) on every content target, reusing the discovery classifier. No new LLM calls.
2. A `brief` mode on the existing content explainer that synthesizes a structured brief, reusing the same provider plumbing, capability tier, and prompt-version cache, gated to ownable targets.

## Non-goals

- No new provider or LLM plumbing. Reuse the `recommendation-explainer.ts` pattern.
- The LLM never invents targets. It only renders briefs for targets the deterministic layer already surfaced and gated.
- No content is published. The output is a brief; a human acts on it.

## Plan

### Step 1: Make the domain classification queryable per domain

The classifier result is written into the discovery session `competitor_map` (`packages/api-routes/src/discovery/orchestrate.ts`), keyed to a session, not to a `(project, domain)` lookup. content-targets runs on every report and sweep and cannot run a discovery probe, so it needs a cheap per-domain lookup. Two options:

- **1a (no migration):** read the latest `completed` discovery session for the project and index its `competitor_map` by domain. Simplest, but the gate only works once discovery has run, and it reflects the last session's view.
- **1b (small migration, preferred):** add a `domain_classifications` table keyed by `(projectId, domain)` carrying the latest `competitorType` plus provenance (`sessionId`, `classifiedAt`). Upsert it when a discovery session completes in `orchestrate.ts`. Decoupled from session retention and cheap to join.

Recommend 1b. Either way, treat missing or stale classifications as `unknown` (see the Step 2 fail-open rule).

### Step 2: surfaceClass winnability gate on content targets (deterministic, the moat)

- `packages/intelligence/src/content-targets.ts`: in `buildContentTargetRows`, look up the class of the domains actually cited for each query (from Step 1) and derive:
- cited surface dominated by `ota-aggregator` or `editorial-media` gives `surfaceClass: 'ceded'`.
- cited surface that is `direct-competitor`, the own domain, `other`, `unknown`, or has no citation gives `surfaceClass: 'ownable'` (fail open: when in doubt, it is worth a brief).
- "Dominated" means the combined aggregator and editorial share of the cited domains for that query crosses a documented threshold. Start conservative, for example a majority, and unit-test the boundary.
- Add `surfaceClass: 'ownable' | 'ceded'` (and optionally a numeric `winnability`) to `ContentTargetRowDto` (`packages/contracts/src/content.ts`).
- `GET /projects/:name/content/targets` gains an optional `surfaceClass` filter; default ordering surfaces `ownable` first.
- This is a pure data join over existing inputs. No LLM, no new external calls.

### Step 3: brief mode on the content explainer (LLM synthesis, reuse plumbing)

- `packages/canonry/src/agent/recommendation-explainer.ts`: add a brief template and a structured-output path beside the existing explainer. Keep the same `complete()` call, `analyze` tier, provider and api-key resolution, and cache-key shape. Add a separate `RECOMMENDATION_BRIEF_PROMPT_VERSION` so the two modes cache independently.
- The brief prompt returns structure, not prose: `targetQuery`, `surfaceClass`, `angle`, `whyWinnable` (must cite the gap signal and surfaceClass verbatim from context), `schemaHookup` (the schema.org type or markup to add or extend), and the controllable-surface rationale.
- `packages/api-routes/src/content.ts`: add `mode: 'explain' | 'brief'` to `recommendationExplainRequestSchema` and branch in the route, or add a sibling `POST .../:targetRef/brief`.
- Enforce the gate server-side: reject `brief` for a `ceded` target with a clear 4xx, so a brief is never generated for a head term we should not chase.
- Persist the structured brief plus provider, model, and cost, mirroring `recommendation_explanations`. Either add a `mode` discriminator to that table or add `recommendation_briefs`.

### Step 4: CLI and MCP surface

There is no `cnry content` command group today; targets are API, web, report, and MCP only. Add:

- `cnry content targets <project> [--ownable] [--format json]`: deterministic targets with surfaceClass.
- `cnry content brief <project> [--target <ref>] [--all-ownable] [--format json]`: generate or fetch briefs for ownable targets.
- `cnry content map <project>`: convenience command that prints ranked ownable targets, each with its brief. The operator-facing one-shot.
- Register an MCP tool (`packages/canonry/src/mcp/tool-registry.ts`) so agents can call it headless.
- The report content section (`packages/api-routes/src/report.ts`) can graduate from the templated recommendation string to the brief when one exists.

### Step 5: Tests and docs

- Unit tests for the surfaceClass rule: ownable vs ceded across aggregator-dominated, editorial-dominated, competitor-cited, own-cited, and no-citation queries, plus the threshold boundary.
- Contract test for the gated brief route: ceded returns 4xx, ownable returns a cached structured brief.
- Explainer test: the brief prompt-version cache is isolated from explain mode.
- Docs: a content section under `docs/` and the CLI reference, noting the determinism-decides-what, LLM-decides-how split.

## Risks and open questions

- **Classifier coverage.** The gate is only as good as the classifications available. A project that never ran discovery has none, so every target stays `ownable`. That is acceptable: the gate adds signal where it exists and never hides a target. Document that running discovery improves the gate.
- **Freshness.** A persisted classification can lag a domain's real role. A provenance timestamp plus a refresh on each discovery completion bounds this; treat stale as `unknown` and fail open.
- **Threshold tuning.** "Dominated" is a judgment knob. Start conservative, unit-test the boundary, document the default, and consider exposing it in config later.
- **Brief grounding.** Brief quality depends on the context fields fed in. Reuse the explainer's verbatim-signal discipline: cite the gap, do not invent facts.
- **Cost.** One `analyze`-tier call per ownable target on refresh. The existing prompt-version cache already bounds repeat cost, and the gate further reduces calls by excluding ceded targets.

## Why this shape

Determinism decides which queries are worth writing for, from real citation evidence and the existing classifier, so the LLM cannot drift into generic suggestions. The LLM only renders a brief for a target the deterministic layer already surfaced and gated. The work reuses two systems that already exist (the discovery classifier and the content explainer), so it is an extension rather than a new subsystem, which keeps the blast radius small.