feat: site cap (1000/section), HF source splits, live hero count by kishormorol · Pull Request #19 · kishormorol/ResearchScope

kishormorol · 2026-06-06T23:05:13Z

Three related site/data-exposure changes.

1. Website shows 3,000 papers (1,000 each)

The browse pages (Papers / Conferences / Journals) query Railway via _queryPapers, which previously reported json.total as-is, so pagination spanned the full corpus (74k+ journals). It now clamps the reported count to SECTION_CAP = 1000 and trims rows past the cap, so each section paginates through at most 1,000 papers (3,000 total). The static-JSON fallback is clamped too. The full corpus stays available via the API and the HF dataset.
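The clamp-and-trim behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual _queryPapers code: only SECTION_CAP = 1000 comes from the PR text, while the helper name clampPage, its offset parameter, and the { total, rows } return shape are assumptions made for the example.

```javascript
// SECTION_CAP is taken from the PR description; everything else here
// (function name, parameters, return shape) is hypothetical.
const SECTION_CAP = 1000;

function clampPage(total, rows, offset) {
  // Never report more than the cap, so the pager stops at SECTION_CAP.
  const cappedTotal = Math.min(total, SECTION_CAP);
  // How many rows of this page still fall under the cap.
  const remaining = Math.max(0, SECTION_CAP - offset);
  // Trim any rows that would land past the cap.
  return { total: cappedTotal, rows: rows.slice(0, remaining) };
}
```

Under these assumptions, a page of 50 rows requested at offset 980 would come back with 20 rows and a reported total of 1,000, even if the backend holds far more.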

2. HF dataset: separate arXiv / conference / journal

The papers config now also publishes per-source splits:

load_dataset("kishormorol/researchscope-papers", "papers", split="journal")  # or conference / arxiv

Uploaded as data/papers_{arxiv,conference,journal}.jsonl. The combined train split is kept for the existing downloaders. Card updated with per-source counts + usage.

3. Fix stale "83,000+" paper count on the homepage

The hero claimed "83,000+ papers", which was long outdated (the corpus is now 100K+; Railway holds ~165K). The hero count is now dynamic: loadStats() injects it from stats.json, rounded down to a clean "N,000+", so it won't go stale again. The remaining hardcoded "83K+" copy across the README, index feature card, and sign-in/register pages is replaced with "100,000+".
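A rough sketch of the "N,000+" rounding, for illustration only. The helper name formatHeroCount is hypothetical, and falling back to the static "100,000+" string when the total is missing or below 1,000 is an assumption of this sketch rather than something the PR states about loadStats().

```javascript
// Hypothetical helper; the real loadStats() in app.js may be structured differently.
function formatHeroCount(total, fallback = "100,000+") {
  // Assumption: keep the static copy if the stat is unusable.
  if (!Number.isFinite(total) || total < 1000) return fallback;
  // Round down to the nearest thousand, e.g. 165432 -> 165000.
  const rounded = Math.floor(total / 1000) * 1000;
  // Add thousands separators and the trailing "+".
  return rounded.toLocaleString("en-US") + "+";
}
```

Rounding down rather than to nearest keeps the "+" suffix truthful: the displayed number never exceeds the real total.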

Verification

node --check passes on railway-api.js and app.js.
HF _bucket unit-checked; card front-matter validates as YAML with splits [train, arxiv, conference, journal].
Full suite: 147 passed.
No 83K/83,000 references remain.

HF per-source splits + the live hero count populate on the next pipeline run that pushes data (any sync with HF_TOKEN).

🤖 Generated with Claude Code

Website:
- Cap each browse section (arXiv / conference / journal) to 1,000 papers (3,000 total). _queryPapers now clamps the reported count to SECTION_CAP and trims rows past the cap, so pagination stops at 1,000 per section. The full corpus stays available via the API and the HF dataset; the static JSON was already capped at 1,000.

HF dataset:
- Split the `papers` config by source into `arxiv`, `conference`, and `journal` splits (uploaded as papers_<source>.jsonl), alongside the existing combined `train` split (kept for backward compatibility). Users can now `load_dataset(repo, "papers", split="journal")`.
- Card updated: per-source counts in stats, file table, and usage examples.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The homepage hero claimed "83,000+ papers", long stale (the corpus is now 100K+; the Railway API holds ~165K).

Fixes:
- Hero count is now dynamic: loadStats() injects the live total from stats.json, rounded down to a clean "N,000+", so it never goes stale again (static fallback "100,000+" before JS loads).
- Replace remaining hardcoded "83K+"/"83,000+" copy with "100,000+" across README, the index feature card, and the sign-in/register pages.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

kishormorol and others added 2 commits June 6, 2026 19:04

kishormorol changed the title ~~feat: cap site to 1000/section + split HF dataset by source~~ feat: site cap (1000/section), HF source splits, live hero count Jun 6, 2026

kishormorol merged commit 5aa32f3 into main Jun 6, 2026
1 check passed

kishormorol merged 2 commits into main from feat/site-cap-and-hf-source-splits
