
feat(pipeline): add --journals-only mode + journal-sync workflow #18

Merged
kishormorol merged 1 commit into main from feat/journals-only-sync on Jun 6, 2026

@kishormorol

Owner

Adds a fast path to refresh the journal layer without re-running the full conference sync.

Why

The journal fetch lived only inside --conferences-only, which also re-fetches the OpenReview/PMLR/CVF/ACL/S2 conference sources (~30 min–2.6 h). To populate journals on demand, we need a journals-only path.

What

  • --journals-only flag: fetches only journal papers (OpenAlex by source id, plus an S2 supplement) and skips arXiv/ACL/conference sources. The accumulate step still reloads the existing conference, journal, and arXiv papers, and the Railway sync is an upsert (ON CONFLICT DO UPDATE), so existing rows are preserved, never dropped.
  • _fetch_journal_papers() helper: extracted from the --conferences-only block and shared by both paths (DRY).
  • journal-sync.yml workflow: manually triggered (workflow_dispatch); runs the mode, reports per-venue counts, rebuilds the journal recommender JSON, commits site/data/, and deploys Pages.
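For reviewers, a minimal sketch of the general shape a shared journal-fetch helper could take. The PR does not include the code, so everything here is hypothetical: the name `fetch_journal_papers_sketch`, the `openalex_source_id`/`name` config keys, and the DOI-then-title dedup rule are assumptions. The OpenAlex and S2 fetchers are injected as callables so the sketch runs without network access.

```python
from typing import Callable, Iterable

# Hypothetical fetcher type: takes a lookup key, returns a list of paper dicts.
Fetcher = Callable[[str], list]


def dedup_key(paper: dict) -> str:
    """Prefer a lowercased DOI; fall back to a whitespace-normalised title."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = " ".join((paper.get("title") or "").lower().split())
    return "title:" + title


def fetch_journal_papers_sketch(
    journals: Iterable[dict],
    fetch_openalex: Fetcher,
    fetch_s2: Fetcher,
) -> list:
    """Hypothetical shape of a journal fetch shared by both CLI modes."""
    papers = []
    seen = set()
    for journal in journals:
        # Primary: OpenAlex works for this journal's source id.
        primary = fetch_openalex(journal["openalex_source_id"])
        # Supplement: Semantic Scholar results, kept only if not already seen.
        supplement = fetch_s2(journal["name"])
        for paper in [*primary, *supplement]:
            key = dedup_key(paper)
            if key in seen:
                continue
            seen.add(key)
            papers.append({**paper, "venue": journal["name"]})
    return papers


# Usage with stub fetchers (no network): the S2 copy of paper A is dropped.
journals = [{"name": "Example Journal", "openalex_source_id": "S0000"}]
papers = fetch_journal_papers_sketch(
    journals,
    fetch_openalex=lambda source_id: [{"doi": "10.1/A", "title": "Paper A"}],
    fetch_s2=lambda name: [
        {"doi": "10.1/a", "title": "Paper A"},
        {"doi": None, "title": "Paper B"},
    ],
)
print([p["title"] for p in papers])  # ['Paper A', 'Paper B']
```

Injecting the fetchers is what lets one helper serve both the --conferences-only and --journals-only paths, and keeps it testable offline.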

Safety

  • --journals-only loads the existing conference/arXiv DBs before writing, so the site JSON outputs (conferences_db.json, papers_db.json) aren't emptied.
  • Because the Railway sync is an upsert, a journal sync only adds or updates journal rows.
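To illustrate why the upsert cannot drop rows, here is a small sketch of ON CONFLICT DO UPDATE semantics. It uses SQLite (bundled with Python; UPSERT needs SQLite 3.24+) because it accepts the same `ON CONFLICT (id) DO UPDATE SET ... = excluded....` form as PostgreSQL. The Railway schema is not shown in the PR, so the `papers` table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical schema; the real Railway table is not shown in the PR.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id    TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    venue TEXT NOT NULL,
    kind  TEXT NOT NULL
)
"""

# Insert new ids; on an id collision, overwrite that row's fields in place.
UPSERT = """
INSERT INTO papers (id, title, venue, kind)
VALUES (:id, :title, :venue, :kind)
ON CONFLICT (id) DO UPDATE SET
    title = excluded.title,
    venue = excluded.venue,
    kind = excluded.kind
"""


def upsert_papers(conn: sqlite3.Connection, papers: list) -> None:
    with conn:  # commit on success, roll back on error
        conn.executemany(UPSERT, papers)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Initial state: one conference row and one journal row.
upsert_papers(conn, [
    {"id": "c1", "title": "Conf paper", "venue": "ConfX", "kind": "conference"},
    {"id": "j1", "title": "Old title", "venue": "JournalY", "kind": "journal"},
])

# A journals-only batch: updates j1, adds j2, never mentions c1.
upsert_papers(conn, [
    {"id": "j1", "title": "New title", "venue": "JournalY", "kind": "journal"},
    {"id": "j2", "title": "Another", "venue": "JournalY", "kind": "journal"},
])

rows = conn.execute("SELECT id, title FROM papers ORDER BY id").fetchall()
print(rows)  # [('c1', 'Conf paper'), ('j1', 'New title'), ('j2', 'Another')]
```

Rows absent from a batch are simply not touched; only a separate DELETE could remove them, which is why a journals-only batch leaves conference rows intact.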

Verification

  • Pipeline imports clean; --journals-only parses; full suite 147 passed.
  • All fetch blocks correctly gated behind not journals_only.
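A minimal sketch of the "gate every fetch block behind `not journals_only`" pattern, written as a step planner so it can be checked without network calls. The flag names come from the PR; the step names and the pre-existing conditions (arXiv in the default mode, conference sources in --conferences-only) are assumptions for illustration, and the real pipeline's conditions may differ.

```python
import argparse

# Hypothetical step names, mirroring the sources listed in the PR description.
CONFERENCE_SOURCES = ["openreview", "pmlr", "cvf", "acl", "s2_conferences"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Paper pipeline (sketch)")
    parser.add_argument("--conferences-only", action="store_true",
                        help="conference sources plus journals")
    parser.add_argument("--journals-only", action="store_true",
                        help="journal papers only (OpenAlex + S2 supplement)")
    return parser


def plan_fetches(args: argparse.Namespace) -> list:
    """Return the ordered steps a run would perform (no actual fetching)."""
    journals_only = args.journals_only
    steps = []
    # Assumed pre-existing conditions, each additionally gated on
    # `not journals_only` so the journals-only mode skips them.
    if not journals_only and not args.conferences_only:
        steps.append("arxiv")
    if not journals_only and args.conferences_only:
        steps.extend(CONFERENCE_SOURCES)
    # The shared journal fetch runs in either explicit mode.
    if journals_only or args.conferences_only:
        steps.append("journals")
    # Accumulate always runs, reloading existing DBs so outputs aren't emptied.
    steps.append("accumulate")
    return steps


parser = build_parser()
print(plan_fetches(parser.parse_args(["--journals-only"])))
# ['journals', 'accumulate']
```

Because the gate is applied to every non-journal block, --journals-only wins even if both flags are passed; whether the real CLI instead makes the flags mutually exclusive is not stated in the PR.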

After merge I'll trigger this workflow to populate the ~25 journals (incl. the 6 added in #16) now, instead of waiting for the monthly conference sync.

🤖 Generated with Claude Code

Adds a lightweight path to refresh the journal layer without re-running the
full (slow) conference sync:

- New `--journals-only` flag fetches only journal papers (OpenAlex by source
  id, +S2 supplement) and skips arXiv/ACL/conference sources. Existing
  conference + journal + arXiv papers are reloaded and preserved; Railway
  sync is an upsert, so nothing is dropped.
- Extract the journal fetch into a reusable `_fetch_journal_papers()` helper
  shared by the conferences-only and journals-only paths.
- New `journal-sync.yml` workflow (manual `workflow_dispatch`) runs the mode,
  reports per-venue counts, rebuilds the journal recommender, commits, and
  deploys Pages.

Verified: pipeline imports clean; `--journals-only` flag parses; suite 147 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
kishormorol merged commit f2f9467 into main on Jun 6, 2026
1 check passed