Skip to content

perf(cms): per-page video sync + timestamp fix + keyword scope fix#564

Open
Kneesal wants to merge 5 commits intomainfrom
perf/sync-videos-per-page-upsert
Open

perf(cms): per-page video sync + timestamp fix + keyword scope fix#564
Kneesal wants to merge 5 commits intomainfrom
perf/sync-videos-per-page-upsert

Conversation

@Kneesal
Copy link
Copy Markdown
Member

@Kneesal Kneesal commented Mar 30, 2026

Summary

Three critical fixes for core-sync, building on PRs #558, #560, #562:

1. Watermark timestamp format bug (affects ALL incremental syncs)

getLastSyncTime returned a JS Date object from knex/pg. When passed to the Core API's updatedAt: { gte: since } filter, it was stringified as "Mon Mar 30 2026 01:20:21 GMT+0000" which the gateway can't parse — silently returning ALL records instead of filtering. This is why incremental appeared to refetch everything.

Fix: Convert to ISO string in getLastSyncTime().

2. Per-page upsert for video sync (same pattern as #560 for variants)

Videos phase was stuck at processed=0/1056 for 20+ minutes on production due to one-at-a-time upsertByCoreId calls for origins/editions. Locally: 49s. Production: 20+ minutes.

Fix: Per-page bulk upsert, prefetch, bulk origins/editions/bible books — same patterns from sync-video-variants.ts.

3. Keyword link deletion scope (P0 from code review)

During incremental sync, keyword link deletion used ALL video row IDs from the DB, not just the synced batch. This would strip keyword links from every non-updated video.

Fix: Scope deletion to only videos in videoKeywordLinks.

Performance data (production, PR #562 deploy)

Phase Old (20+ min) New
Languages 12s 12s (already bulk)
Countries 8s 8s (always full)
Keywords 3s 3s (always full)
Videos 20+ min stuck ~2 min
Video variants 0 progress for 60+ min 1,500/min (~2.3h for 207K)

Test plan

  • Deploy and trigger full sync — videos phase should complete in minutes, not 20+
  • After full sync completes, trigger again — incremental should fetch 0 records for languages/videos/variants (timestamp fix)
  • Verify keyword links preserved for non-synced videos during incremental

Files changed

  • apps/cms/src/api/core-sync/services/strapi-helpers.ts — ISO timestamp fix
  • apps/cms/src/api/core-sync/services/sync-videos.ts — per-page upsert + keyword scope fix
  • docs/solutions/cms/core-sync-production-vs-local-performance-gap.md — compound doc

🤖 Generated with Claude Code

Kneesal and others added 4 commits March 30, 2026 01:10
Same optimization patterns from sync-video-variants.ts applied to
sync-videos.ts:

- Per-page upsert for videos, images, subtitles, study questions,
  and bible citations (no more collect-all-then-upsert)
- Bulk origins/editions via bulkUpsertByCoreId instead of one-at-a-time
  Strapi document service calls
- Bulk bible books upsert
- Prefetch next page while upserting current
- Incremental videoDocMap accumulated per-page instead of full table scan
- Timing logs for performance monitoring

Keywords and parent-child linking remain post-loop (need all videos).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Key finding: upsertByCoreId (Strapi document service) is ~100x slower
on Railway than local due to network latency between app and DB
containers. bulkUpsertByCoreId (raw knex SQL) is fast everywhere.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ering

knex/pg returns timestamptz as a JS Date object. When passed as-is to
the Core API's updatedAt filter, it gets toString()'d to
"Mon Mar 30 2026 01:20:21 GMT+0000" which the gateway can't parse,
causing it to return ALL records instead of filtering by date.

Fix: convert to ISO string in getLastSyncTime() so the filter receives
"2026-03-30T01:20:21.482Z" which the gateway parses correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dead code

Review found P0: keyword link deletion used ALL video row IDs from the
DB, not just the videos in the current sync batch. During incremental
sync this would strip keyword links from every non-updated video.

Fix: build syncedVideoRowIds from videoKeywordLinks.keys() (only the
videos being synced) instead of videoIdMap.values() (all videos in DB).

Also removed unused seenStudyQuestionIds and seenCitationIds sets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@railway-app
Copy link
Copy Markdown

railway-app bot commented Mar 30, 2026

🚅 Deployed to the forge-pr-564 environment in forge

Service Status Web Updated (UTC)
@forge/cms ✅ Success (View Logs) Mar 30, 2026 at 1:48 am
2 services not affected by this PR
  • @forge/web
  • @forge/manager

@railway-app railway-app bot temporarily deployed to forge / forge-pr-564 March 30, 2026 01:39 Destroyed
Attach .catch() to the prefetched page promise so that if the current
page's processing throws, the pending next-page fetch doesn't create
an unhandled promise rejection that could crash Node.js.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant