feat(pipeline): add --journals-only mode + journal-sync workflow #18
Merged
Adds a lightweight path to refresh the journal layer without re-running the full (slow) conference sync:

- New `--journals-only` flag fetches only journal papers (OpenAlex by source id, +S2 supplement) and skips arXiv/ACL/conference sources. Existing conference + journal + arXiv papers are reloaded and preserved; Railway sync is an upsert, so nothing is dropped.
- Extract the journal fetch into a reusable `_fetch_journal_papers()` helper shared by the conferences-only and journals-only paths.
- New `journal-sync.yml` workflow (manual `workflow_dispatch`) runs the mode, reports per-venue counts, rebuilds the journal recommender, commits, and deploys Pages.

Verified: pipeline imports clean; `--journals-only` flag parses; suite 147 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
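The mode switch and the "reloaded and preserved" step described in this commit message can be sketched roughly as follows. This is a minimal, hypothetical sketch: only the two flag names come from the PR. The source labels, the `sources_for` and `accumulate` helpers, the `id` key, and the choice to make the two flags mutually exclusive are all assumptions for illustration, not the pipeline's actual code.

```python
import argparse

# Hypothetical source labels: the PR names these sources but not how the code labels them.
CONFERENCE_SOURCES = ["openreview", "pmlr", "cvf", "acl", "s2_conferences"]
JOURNAL_SOURCES = ["openalex_journals", "s2_journal_supplement"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Paper pipeline (sketch)")
    # Assumption: the two partial modes cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--conferences-only", action="store_true")
    mode.add_argument("--journals-only", action="store_true")
    return parser


def sources_for(args: argparse.Namespace) -> list[str]:
    """Return the sources a partial run fetches (hypothetical helper)."""
    if args.journals_only:
        # Journals only: no arXiv, ACL or conference sources.
        return list(JOURNAL_SOURCES)
    if args.conferences_only:
        # Both partial modes run the shared journal fetch.
        return CONFERENCE_SOURCES + JOURNAL_SOURCES
    # The default (non-partial) run is not described in the PR, so it is left out here.
    raise ValueError("this sketch only covers --journals-only and --conferences-only")


def accumulate(existing: dict[str, dict], fetched: list[dict]) -> dict[str, dict]:
    """Merge freshly fetched papers into the reloaded set (hypothetical helper).

    Papers keyed by an assumed 'id' field. Entries that were not re-fetched
    (e.g. conference and arXiv papers during a journals-only run) are kept;
    re-fetched entries replace their old versions.
    """
    merged = dict(existing)  # copy, so the reloaded input is not mutated
    for paper in fetched:
        merged[paper["id"]] = paper
    return merged
```

Merging into a copy keyed by paper id means a journals-only run can only add or replace journal entries; conference and arXiv entries are never overwritten because the narrower fetch never produces them.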
Adds a fast path to refresh the journal layer without re-running the full conference sync.
Why
The journal fetch lived only inside `--conferences-only`, which also re-fetches the OpenReview/PMLR/CVF/ACL/S2 conferences (~30 min–2.6 h). To populate journals on demand, we need a journals-only path.
What
- `--journals-only` flag: fetches only journal papers (OpenAlex by source id, +S2 supplement) and skips arXiv/ACL/conference sources. The accumulate step still reloads existing conference + journal + arXiv papers, and Railway sync is an upsert (`ON CONFLICT DO UPDATE`), so existing rows are preserved, never dropped.
- `_fetch_journal_papers()` helper: extracted from the conferences-only block and shared by both paths (DRY).
- `journal-sync.yml` workflow: manual `workflow_dispatch`; runs the mode, reports per-venue counts, rebuilds the journal recommender JSON, commits `site/data/`, and deploys Pages.
Safety
`conferences_db.json`, `papers_db.json`) aren't emptied.
Verification
`--journals-only` parses; full suite 147 passed.
`not journals_only`.
After merge, I'll trigger this workflow to populate the ~25 journals (incl. the 6 added in #16) now, instead of waiting for the monthly conference sync.
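Triggering the journal sync on demand is meant to be safe to repeat because Railway sync is an upsert (`ON CONFLICT DO UPDATE`). The sketch below illustrates that pattern with SQLite (3.24+), which accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` form as PostgreSQL, together with a per-venue journal count like the one the workflow reports. The `papers` table, its columns, and the `upsert_papers` / `journal_counts` helpers are hypothetical; the PR does not show the real Railway schema or sync code.

```python
import sqlite3

# Hypothetical schema; the real Railway table, columns and key are not in the PR.
# kind is one of 'conference', 'journal' or 'arxiv'.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    paper_id TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    venue    TEXT NOT NULL,
    kind     TEXT NOT NULL
)
"""

# Insert new rows; on a primary-key clash, overwrite the stored row with the
# incoming values (excluded.* refers to the row that failed to insert).
UPSERT = """
INSERT INTO papers (paper_id, title, venue, kind)
VALUES (:paper_id, :title, :venue, :kind)
ON CONFLICT (paper_id) DO UPDATE SET
    title = excluded.title,
    venue = excluded.venue,
    kind  = excluded.kind
"""


def upsert_papers(conn: sqlite3.Connection, papers: list[dict]) -> None:
    """Upsert a batch; rows not present in `papers` are left untouched."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, papers)


def journal_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Count journal rows per venue (sketch of the workflow's summary step)."""
    rows = conn.execute(
        "SELECT venue, COUNT(*) FROM papers WHERE kind = 'journal' "
        "GROUP BY venue ORDER BY venue"
    )
    return dict(rows.fetchall())
```

Because the statement only ever touches rows whose keys appear in the batch, a journals-only batch cannot delete conference or arXiv rows, and re-fetched journal papers are updated in place rather than duplicated.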
🤖 Generated with Claude Code