22 changes: 16 additions & 6 deletions docs/roadmap/README.md
@@ -30,12 +30,22 @@ Build trusted, scalable AI capabilities that help people discover gospel content

### Content Discovery

| ID | Feature | Owner | Priority | Start | Days | Status |
| ------------------------------------------------------------------------- | ------------------------------------- | ----- | -------- | ------ | ---- | ----------- |
| [feat-009](content-discovery/feat-009-pgvector-embedding-indexing.md) | pgvector Setup and Embedding Indexing | nisal | P0 | Apr 7 | 14 | not-started |
| [feat-010](content-discovery/feat-010-semantic-search-api.md) | Semantic Search API | nisal | P0 | Apr 14 | 21 | not-started |
| [feat-011](content-discovery/feat-011-search-ui-web.md) | Search UI — Web | urim | P0 | Apr 14 | 21 | not-started |
| [feat-012](content-discovery/feat-012-search-ui-mobile.md) | Search UI — Mobile | urim | P0 | Apr 14 | 21 | not-started |
| [feat-037](content-discovery/feat-037-video-content-vectorization.md) | Video Content Vectorization for Recs | nisal | P1 | Apr 21 | 42 | not-started |
| [feat-038](content-discovery/feat-038-video-vectorization-data-audit.md) | Vectorization — Data Audit | nisal | P1 | Apr 21 | 3 | not-started |
| [feat-039](content-discovery/feat-039-chapter-based-scene-boundaries.md) | Vectorization — Scene Boundaries | nisal | P1 | Apr 24 | 7 | not-started |
| [feat-040](content-discovery/feat-040-multimodal-scene-descriptions.md) | Vectorization — Scene Descriptions | nisal | P1 | May 1 | 10 | not-started |
| [feat-041](content-discovery/feat-041-scene-embeddings-table.md) | Vectorization — Embeddings Table | nisal | P1 | May 11 | 7 | not-started |
| [feat-042](content-discovery/feat-042-backfill-worker.md) | Vectorization — English Backfill | nisal | P1 | May 18 | 10 | not-started |
| [feat-043](content-discovery/feat-043-visual-shot-detection-fusion.md) | Vectorization — Visual Shot Fusion | nisal | P2 | May 28 | 10 | not-started |
| [feat-044](content-discovery/feat-044-recommendation-query-api.md) | Vectorization — Recommendation API | nisal | P1 | May 28 | 7 | not-started |
| [feat-045](content-discovery/feat-045-pipeline-integration.md) | Vectorization — Pipeline Integration | nisal | P1 | Jun 4 | 7 | not-started |
| [feat-046](content-discovery/feat-046-recommendations-demo-experience.md) | Vectorization — Recommendations Demo | nisal | P1 | Jun 4 | 7 | not-started |

### Topic Experiences

@@ -10,6 +10,7 @@ depends_on:
- "feat-002"
blocks:
- "feat-010"
- "feat-037"
tags:
- "cms"
- "pgvector"
215 changes: 215 additions & 0 deletions docs/roadmap/content-discovery/feat-037-video-content-vectorization.md
@@ -0,0 +1,215 @@
---
id: "feat-037"
title: "Video Content Vectorization for Recommendations"
owner: "nisal"
priority: "P1"
status: "not-started"
start_date: "2026-04-21"
duration: 42
depends_on:
- "feat-009"
- "feat-031"
blocks:
- "feat-038"
tags:
- "cms"
- "pgvector"
- "ai-pipeline"
- "search"
- "manager"
---

## Problem

Current recommendations are metadata-driven — "you watched Film X, here it is in 1,500 other languages." Transcript embeddings (feat-009/010) capture what was said, but miss what was shown. Visual scene embeddings enable cross-film recommendations based on visual setting, actions, emotional tone, and mood.

**Phase 1 (this feature)**: All English-language videos. Prove recommendation quality at ~$100-$300 estimated cost. Phase 2 (full 50K+ catalog) is a separate funding decision.

## Entry Points — Read These First

1. `apps/manager/src/services/chapters.ts` — existing scene-like segmentation: `Chapter { title, startSeconds, endSeconds, summary }`. This is the baseline for R1a.
2. `apps/manager/src/services/embeddings.ts` — existing text embedding pipeline using `text-embedding-3-small` (1536 dims). Scene descriptions will be embedded through the same model.
3. `apps/manager/src/workflows/videoEnrichment.ts` — enrichment workflow with parallel steps. R6 adds scene vectorization as a new branch.
4. `apps/manager/src/services/storage.ts` — S3 artifact storage pattern (`{assetId}/{type}.json`).
5. `apps/cms/src/api/video/content-types/video/schema.json` — Video content type with `coreId`, `label` enum, `variants` relation.
6. `apps/cms/src/api/video-variant/content-types/video-variant/schema.json` — VideoVariant with `language` and `muxVideo` relations.
7. `apps/cms/src/api/mux-video/content-types/mux-video/schema.json` — MuxVideo with `assetId` and `playbackId` for frame extraction.
8. `docs/brainstorms/2026-04-02-video-content-vectorization-requirements.md` — full requirements doc with storage schema, cost model, and rollout strategy.

## Grep These

- `chapters` in `apps/manager/src/` — existing chapter/scene segmentation
- `getOpenrouter` in `apps/manager/src/` — AI model client (text-only; needs multimodal extension)
- `text-embedding-3-small` in `apps/manager/src/` — embedding model
- `strapi.db.connection.raw` in `apps/cms/src/` — raw SQL patterns for pgvector
- `muxAssetId` in `apps/manager/src/` — Mux asset references for frame extraction
- `playbackId` in `apps/cms/src/` — Mux playback IDs for thumbnail URLs
- `label` in `apps/cms/src/api/video/` — video type enum (featureFilm, shortFilm, etc.)

## What To Build

### R0. Data Audit (first task)

Query CMS to determine English video landscape:

```sql
-- Video count by label type
SELECT label, COUNT(*) FROM videos GROUP BY label;

-- Duration distribution for English videos
SELECT v.label,
       COUNT(*) AS count,
       AVG(vv.duration) AS avg_duration,
       MAX(vv.duration) AS max_duration
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 = 'en'
GROUP BY v.label;

-- Chapter metadata coverage
SELECT COUNT(DISTINCT ej.mux_asset_id)
FROM enrichment_jobs ej
WHERE ej.step_statuses->>'chapters' = 'completed';
```

### R1. Scene Segmentation

**R1a — Transcript-based (extend chapters.ts)**:

- For each English video, use existing chapter output as scene boundaries
- Short clips (single chapter) → treat as one scene
- Store chapter boundaries as scene candidates
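The R1a mapping above can be sketched as a small pure function. This is an illustrative sketch, not the existing `chapters.ts` code: the `Chapter` shape is taken from the entry-points list, while `SceneCandidate` and `toSceneCandidates` are hypothetical names.

```typescript
// Chapter shape from apps/manager/src/services/chapters.ts (per the entry
// points above); SceneCandidate and toSceneCandidates are illustrative names.
type Chapter = {
  title: string
  startSeconds: number
  endSeconds: number
  summary: string
}

type SceneCandidate = {
  sceneIndex: number
  startSeconds: number
  endSeconds: number | null
  chapterTitle: string | null
}

function toSceneCandidates(chapters: Chapter[]): SceneCandidate[] {
  // Short clips with a single chapter (or none) become one whole-video scene.
  if (chapters.length <= 1) {
    const only = chapters[0]
    return [
      {
        sceneIndex: 0,
        startSeconds: only?.startSeconds ?? 0,
        endSeconds: only?.endSeconds ?? null,
        chapterTitle: only?.title ?? null,
      },
    ]
  }
  // Otherwise, each chapter boundary becomes a scene candidate.
  return chapters.map((c, i) => ({
    sceneIndex: i,
    startSeconds: c.startSeconds,
    endSeconds: c.endSeconds,
    chapterTitle: c.title,
  }))
}
```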

**R1b — Visual fusion (feature films only)**:

- Extract frames at chapter boundaries using Mux thumbnail API: `https://image.mux.com/{PLAYBACK_ID}/thumbnail.jpg?time={SECONDS}`
- Feed frame sequences + transcript to multimodal LLM to refine/merge chapter boundaries into narrative scenes
- Research: evaluate PySceneDetect for shot boundary detection to augment

### R2. Scene Content Description

New service: `apps/manager/src/services/sceneDescription.ts`

```typescript
type SceneDescription = {
  sceneIndex: number
  startSeconds: number
  endSeconds: number | null
  description: string // LLM-generated rich description
  chapterTitle: string | null
  frameCount: number
}

export async function describeScene(
  playbackId: string,
  startSeconds: number,
  endSeconds: number | null,
  transcript: string,
  chapterTitle: string | null,
): Promise<SceneDescription>
```

- Extract 3 representative frames via Mux thumbnail API at scene start, midpoint, and end
- Send frames + transcript chunk to multimodal LLM (Gemini 2.5 Flash via OpenRouter or direct API)
- Prompt: describe visual setting, objects, actions, characters, emotional tone, mood
- **Requires new multimodal client** — existing OpenRouter client is text-only
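Frame selection and URL construction can be sketched against the Mux thumbnail URL shape quoted in R1b. The `frameTimes`/`frameUrls` helpers and the one-second inset are assumptions, not existing code:

```typescript
// Sketch only: picks 3 representative timestamps (start, midpoint, end) and
// builds thumbnail URLs using the https://image.mux.com/{PLAYBACK_ID}/
// thumbnail.jpg?time={SECONDS} shape from R1b. Helper names are hypothetical.
function frameTimes(startSeconds: number, endSeconds: number): number[] {
  // Inset the first/last frames slightly so they are not boundary frames
  // (e.g. a black cut at the end of a scene).
  const inset = Math.min(1, (endSeconds - startSeconds) / 10)
  return [
    startSeconds + inset,
    (startSeconds + endSeconds) / 2,
    endSeconds - inset,
  ]
}

function frameUrls(
  playbackId: string,
  startSeconds: number,
  endSeconds: number,
): string[] {
  return frameTimes(startSeconds, endSeconds).map(
    (t) => `https://image.mux.com/${playbackId}/thumbnail.jpg?time=${t.toFixed(1)}`,
  )
}
```

Whether arbitrary `time=` values are supported at scale is exactly the Mux API question flagged in Constraints, so treat this as a planning sketch.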

### R3. Scene Embedding + Storage

Create `scene_embeddings` table via bootstrap SQL (same pattern as feat-009):

```sql
CREATE TABLE IF NOT EXISTS scene_embeddings (
  id SERIAL PRIMARY KEY,
  video_id INTEGER NOT NULL,
  core_id TEXT,
  mux_asset_id TEXT NOT NULL,
  playback_id TEXT NOT NULL,
  scene_index INTEGER NOT NULL,
  start_seconds FLOAT NOT NULL,
  end_seconds FLOAT,
  description TEXT NOT NULL,
  chapter_title TEXT,
  frame_count INTEGER,
  embedding vector(1536) NOT NULL,
  model TEXT NOT NULL DEFAULT 'text-embedding-3-small',
  language TEXT NOT NULL DEFAULT 'en',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(video_id, scene_index)
);

CREATE INDEX IF NOT EXISTS scene_embeddings_hnsw
  ON scene_embeddings USING hnsw (embedding vector_cosine_ops);
CREATE INDEX IF NOT EXISTS scene_embeddings_video_id
  ON scene_embeddings(video_id);
CREATE INDEX IF NOT EXISTS scene_embeddings_language
  ON scene_embeddings(language);
```

Indexing service: `apps/cms/src/api/scene-embedding/services/indexer.ts`

```typescript
export async function indexSceneEmbeddings(
  videoId: number,
  scenes: SceneDescription[],
  embeddings: number[][],
  meta: {
    coreId: string
    muxAssetId: string
    playbackId: string
    language: string
  },
): Promise<{ scenesIndexed: number }>
```

### R4. Cross-film Recommendation Query

```sql
SELECT se.video_id, se.scene_index, se.description, se.start_seconds,
       1 - (se.embedding <=> $1) AS similarity
FROM scene_embeddings se
WHERE se.video_id != $2
  AND se.language = 'en'
ORDER BY se.embedding <=> $1
LIMIT 10;
```

Expose as CMS service or API endpoint for web/mobile consumption.
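The deduplication rule from Constraints and Verification (never surface the same video twice) can be applied as a post-processing step over the query's scene hits. This is a minimal sketch; `SceneHit` and `dedupeByVideo` are illustrative names:

```typescript
// Collapse scene-level hits so each recommended video appears at most once,
// keeping its best-matching scene. Sketch only; names are hypothetical.
type SceneHit = {
  videoId: number
  sceneIndex: number
  similarity: number
}

function dedupeByVideo(hits: SceneHit[]): SceneHit[] {
  const best = new Map<number, SceneHit>()
  for (const hit of hits) {
    const current = best.get(hit.videoId)
    if (!current || hit.similarity > current.similarity) {
      best.set(hit.videoId, hit)
    }
  }
  // Return in descending-similarity order for the final ranked list.
  return [...best.values()].sort((a, b) => b.similarity - a.similarity)
}
```

Because the table stores one row set per `video_id` (not per variant), this also guarantees a language variant of the input video can never reappear in results.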

### R5. Backfill Worker

Dedicated Railway service (or separate entry point in manager) for one-time English catalog processing:

- Queue-based: iterate English videos, process each through R1 → R2 → R3
- Resumable: track processed video IDs, skip on restart
- Cost controls: configurable batch size, rate limits, cost tracking per video, auto-pause at threshold
- Dry-run mode: estimate cost without LLM calls
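The resumability and cost-cap behavior above can be sketched as a single loop. This is a shape sketch under stated assumptions: `runBackfill`, `BackfillState`, and the injected `processVideo` callback are all hypothetical, and real persistence of processed IDs would live in a table or KV store rather than memory:

```typescript
// Sketch of the resumable, cost-capped backfill loop. All names are
// illustrative; processVideo stands in for the R1 -> R2 -> R3 pipeline.
type BackfillState = {
  processed: Set<number> // video IDs completed on this or a previous run
  cumulativeCostUsd: number
  paused: boolean
}

async function runBackfill(
  videoIds: number[],
  state: BackfillState,
  processVideo: (videoId: number) => Promise<{ costUsd: number }>,
  costCapUsd: number,
): Promise<BackfillState> {
  for (const videoId of videoIds) {
    // Resumable: skip anything already processed before a restart.
    if (state.processed.has(videoId)) continue
    // Cost control: auto-pause once the configured cap is reached.
    if (state.cumulativeCostUsd >= costCapUsd) {
      state.paused = true
      break
    }
    const { costUsd } = await processVideo(videoId)
    state.processed.add(videoId)
    state.cumulativeCostUsd += costUsd
  }
  return state
}
```

A dry-run mode would swap `processVideo` for an estimator that returns projected cost without making LLM calls.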

### R6. Pipeline Integration

Add scene vectorization to `videoEnrichment.ts` as an independent branch:

- Runs after transcription completes (needs transcript)
- Also needs `muxAssetId`/`playbackId` (for frames) — a different input shape from the other parallel steps
- Triggers R1a → R2 → R3 for the new video

## Constraints

- **English only** — filter by language in all queries and processing. `language` column enables future expansion.
- **Separate table from `video_embeddings`** — different columns, different query patterns. Do not extend feat-009's table.
- **Do NOT use a Strapi content type** for scene embeddings — pgvector columns don't work with Strapi ORM. Use raw SQL (same pattern as feat-009).
- **Embed once per Video, not per VideoVariant** — language variants share visual content. Dedup by `video_id`.
- **Cost cap** — backfill worker must auto-pause if cumulative cost exceeds configurable threshold.
- **Mux thumbnail API** for frame extraction — do not download full videos. Confirm API supports arbitrary timestamps during planning.

## Verification

1. **Data audit complete**: know English video count by label, duration distribution, chapter coverage
2. **Scene segmentation**: sample 10 feature films, verify scene boundaries align with narrative scenes (not just shot cuts)
3. **Scene descriptions**: sample 20 scenes, verify descriptions capture visual content, not just transcript paraphrasing
4. **Embeddings indexed**: `SELECT COUNT(*) FROM scene_embeddings WHERE language = 'en'` matches expected scene count
5. **Recommendation quality**: for 50 seed videos, top-10 similar scenes include at least 3 relevant cross-film results for 80% of seeds
6. **Deduplication**: recommendations never surface the same video (different variant) as the input
7. **Cost tracking**: backfill worker logs cumulative cost, stays within budget
8. **Pipeline integration**: upload a new English video → scene embeddings appear in `scene_embeddings` table automatically
@@ -0,0 +1,83 @@
---
id: "feat-038"
title: "Video Vectorization — Data Audit"
owner: "nisal"
priority: "P1"
status: "not-started"
start_date: "2026-04-21"
duration: 3
depends_on:
- "feat-037"
blocks:
- "feat-039"
- "feat-042"
tags:
- "cms"
- "pgvector"
---

## Problem

Before building the scene vectorization pipeline, we need to know the shape of the English video catalog: how many videos by type, duration distribution, and existing chapter coverage. This gates all downstream sizing, cost estimates, and architecture decisions.

## Entry Points — Read These First

1. `apps/cms/src/api/video/content-types/video/schema.json` — Video schema with `label` enum
2. `apps/cms/src/api/video-variant/content-types/video-variant/schema.json` — VideoVariant with language relation
3. `apps/cms/src/api/enrichment-job/content-types/enrichment-job/schema.json` — tracks chapter completion status
4. `docs/brainstorms/2026-04-02-video-content-vectorization-requirements.md` — R0 requirements

## Grep These

- `label` in `apps/cms/src/api/video/` — video type enum values
- `bcp47` in `apps/cms/src/` — language code field for filtering English

## What To Build

Run diagnostic queries against the CMS database:

```sql
-- English video count by label
SELECT v.label, COUNT(*) AS count
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 = 'en'
GROUP BY v.label ORDER BY count DESC;

-- Duration distribution for English videos
SELECT v.label,
       COUNT(*) AS count,
       ROUND(AVG(vv.duration)) AS avg_duration_sec,
       MAX(vv.duration) AS max_duration_sec
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
JOIN languages l ON vv.language_id = l.id
WHERE l.bcp47 = 'en'
GROUP BY v.label;

-- Chapter metadata coverage
SELECT COUNT(DISTINCT ej.mux_asset_id)
FROM enrichment_jobs ej
WHERE ej.step_statuses->>'chapters' = 'completed';

-- Confirm Video → VideoVariant dedup model
SELECT v.id, COUNT(vv.id) AS variant_count
FROM videos v
JOIN video_variants vv ON vv.video_id = v.id
GROUP BY v.id ORDER BY variant_count DESC LIMIT 10;
```

Deliverable: update the brainstorm doc cost model with actual numbers. Confirm or revise the ~$100-$300 Phase 1 estimate.
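Revising the estimate is simple arithmetic once the audit numbers land. A back-of-envelope helper, where every rate is a placeholder assumption to be replaced by the audit's actual per-call costs:

```typescript
// Back-of-envelope Phase 1 cost estimator. All inputs are assumptions until
// the audit fills in real counts and per-scene rates.
function estimatePhase1CostUsd(
  videoCount: number,
  avgScenesPerVideo: number,
  costPerSceneDescriptionUsd: number, // multimodal LLM call, assumed rate
  costPerSceneEmbeddingUsd: number, // text-embedding-3-small call, assumed rate
): number {
  const scenes = videoCount * avgScenesPerVideo
  return scenes * (costPerSceneDescriptionUsd + costPerSceneEmbeddingUsd)
}
```

For example, 1,000 English videos averaging 10 scenes each at a combined $0.0201 per scene would land at roughly $201, inside the ~$100-$300 band; the audit decides whether those inputs hold.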

## Constraints

- Read-only queries — do not modify production data
- Use `strapi.db.connection.raw()` pattern or direct DB access

## Verification

- Know exact English video count by label type
- Know duration distribution (what % are short clips vs feature films)
- Know chapter coverage (what % already have scene-like metadata)
- Cost model in brainstorm doc updated with real numbers