diff --git a/skills/huggingface-papers/SKILL.md b/skills/huggingface-papers/SKILL.md index e65fc90..5958f65 100644 --- a/skills/huggingface-papers/SKILL.md +++ b/skills/huggingface-papers/SKILL.md @@ -1,6 +1,6 @@ --- name: huggingface-papers -description: Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata such as authors, linked models/datasets/spaces, Github repo and project page. Use when the user shares a Hugging Face paper page URL, an arXiv URL or ID, or asks to summarize, explain, or analyze an AI research paper. +description: Look up, read, search, and list Hugging Face paper pages using the `hf papers` CLI or the papers REST API. Fetch paper content as markdown, get structured metadata (authors, linked models/datasets/spaces, GitHub repo, project page), search papers by keyword, and browse the daily papers feed. Use when the user shares a Hugging Face paper page URL, an arXiv URL or ID, or asks to summarize, explain, analyze, search, or list AI research papers. --- # Hugging Face Paper Pages @@ -13,7 +13,7 @@ Hugging Face Paper pages (hf.co/papers) is a platform built on top of arXiv (arx Whenever someone mentions a HF paper or arXiv abstract/PDF URL in a model card, dataset card or README of a Space repository, the paper will be automatically indexed. Note that not all papers indexed on Hugging Face are also submitted to daily papers. The latter is more a manner of promoting a research paper. Papers can only be submitted to daily papers up until 14 days after their publication date on arXiv. -The Hugging Face team has built an easy-to-use API to interact with paper pages. Content of the papers can be fetched as markdown, or structured metadata can be returned such as author names, linked models/datasets/spaces, linked Github repo and project page. +The Hugging Face team has built an easy-to-use API and CLI to interact with paper pages. 
Content of the papers can be fetched as markdown, or structured metadata can be returned such as author names, linked models/datasets/spaces, linked GitHub repo and project page. ## When to Use @@ -22,6 +22,7 @@ The Hugging Face team has built an easy-to-use API to interact with paper pages. - User shares an arXiv URL (e.g. `https://arxiv.org/abs/2602.08025` or `https://arxiv.org/pdf/2602.08025`) - User mentions an arXiv ID (e.g. `2602.08025`) - User asks you to summarize, explain, or analyze an AI research paper +- User asks to search, list, or browse AI research papers ## Parsing the paper ID @@ -36,142 +37,165 @@ It's recommended to parse the paper ID (arXiv ID) from whatever the user provide | `2602.08025v1` | `2602.08025v1` | | `2602.08025` | `2602.08025` | -This allows you to provide the paper ID into any of the hub API endpoints mentioned below. +This lets you pass the paper ID to any of the CLI commands or API endpoints mentioned below. -### Fetch the paper page as markdown +## `hf papers` CLI (preferred) + +The `hf` CLI (part of the `huggingface_hub` package) provides a convenient way to interact with papers directly from the terminal. Prefer the CLI over raw API calls when possible. -The content of a paper can be fetched as markdown like so: +### Read a paper as markdown ```bash -curl -s "https://huggingface.co/papers/{PAPER_ID}.md" +hf papers read {PAPER_ID} ``` -This should return the Hugging Face paper page as markdown. This relies on the HTML version of the paper at https://arxiv.org/html/{PAPER_ID}. +This prints the full paper content as markdown to stdout. It relies on the HTML version of the paper at https://arxiv.org/html/{PAPER_ID}. There are 2 exceptions: - Not all arXiv papers have an HTML version. If the HTML version of the paper does not exist, then the content falls back to the HTML of the Hugging Face paper page. -- If it results in a 404, it means the paper is not yet indexed on hf.co/papers. 
See [Error handling](#error-handling) for info. +- If the paper is not indexed on hf.co/papers, the command exits with an error: `Paper '{PAPER_ID}' not found on the Hub.` -Alternatively, you can request markdown from the normal paper page URL, like so: +### Get structured metadata (JSON) ```bash -curl -s -H "Accept: text/markdown" "https://huggingface.co/papers/{PAPER_ID}" +hf papers info {PAPER_ID} ``` -### Paper Pages API Endpoints - -All endpoints use the base URL `https://huggingface.co`. +This prints structured JSON metadata that can include: -#### Get structured metadata +- authors (names and Hugging Face usernames, in case they have claimed the paper) +- media URLs (uploaded when submitting the paper to Daily Papers) +- summary (abstract) and AI-generated summary +- project page and GitHub repository +- organization and engagement metadata (number of upvotes) -Fetch the paper metadata as JSON using the Hugging Face REST API: +### Search papers ```bash -curl -s "https://huggingface.co/api/papers/{PAPER_ID}" +hf papers search "vision language" +hf papers search "attention mechanism" --limit 10 +hf papers search "diffusion" --format json ``` -This returns structured metadata that can include: +This performs hybrid semantic and full-text search over paper titles, authors, and content. 
-- authors (names and Hugging Face usernames, in case they have claimed the paper) -- media URLs (uploaded when submitting the paper to Daily Papers) -- summary (abstract) and AI-generated summary -- project page and GitHub repository -- organization and engagement metadata (number of upvotes) +Options: +- `--limit` (default 20): number of results +- `--format` (`table` or `json`): output format +- `--quiet`: only print paper IDs -To find models linked to the paper, use: +### List daily papers ```bash -curl https://huggingface.co/api/models?filter=arxiv:{PAPER_ID} +hf papers ls +hf papers ls --sort trending +hf papers ls --date 2025-01-23 +hf papers ls --date today +hf papers ls --week 2025-W09 +hf papers ls --month 2025-02 +hf papers ls --submitter akhaliq +hf papers ls --format json ``` -To find datasets linked to the paper, use: +Options: +- `--date`: date in ISO format (`YYYY-MM-DD`) or `today` +- `--week`: ISO week, e.g. `2025-W09` +- `--month`: month in ISO format, e.g. `2025-02` +- `--submitter`: filter by Hub username of the submitter +- `--sort`: `publishedAt` (default) or `trending` +- `--limit` (default 50): number of results +- `--format` (`table` or `json`): output format +- `--quiet`: only print paper IDs + +## Paper Pages REST API + +The REST API can be used as an alternative to the CLI, or for endpoints not yet covered by the CLI. All endpoints use the base URL `https://huggingface.co`. 
+ +### Fetch the paper page as markdown ```bash -curl https://huggingface.co/api/datasets?filter=arxiv:{PAPER_ID} +curl -s "https://huggingface.co/papers/{PAPER_ID}.md" ``` -To find spaces linked to the paper, use: +Alternatively, you can request markdown from the normal paper page URL: ```bash -curl https://huggingface.co/api/spaces?filter=arxiv:{PAPER_ID} +curl -s -H "Accept: text/markdown" "https://huggingface.co/papers/{PAPER_ID}" ``` -#### Claim paper authorship +### Get structured metadata + +```bash +curl -s "https://huggingface.co/api/papers/{PAPER_ID}" +``` -Claim authorship of a paper for a Hugging Face user: +### Find linked models, datasets, and spaces ```bash -curl "https://huggingface.co/api/settings/papers/claim" \ - --request POST \ - --header "Content-Type: application/json" \ - --header "Authorization: Bearer $HF_TOKEN" \ - --data '{ - "paperId": "{PAPER_ID}", - "claimAuthorId": "{AUTHOR_ENTRY_ID}", - "targetUserId": "{USER_ID}" - }' +curl -s "https://huggingface.co/api/models?filter=arxiv:{PAPER_ID}" +curl -s "https://huggingface.co/api/datasets?filter=arxiv:{PAPER_ID}" +curl -s "https://huggingface.co/api/spaces?filter=arxiv:{PAPER_ID}" ``` -- Endpoint: `POST /api/settings/papers/claim` -- Body: - - `paperId` (string, required): arXiv paper identifier being claimed - - `claimAuthorId` (string): author entry on the paper being claimed, 24-char hex ID - - `targetUserId` (string): HF user who should receive the claim, 24-char hex ID -- Response: paper authorship claim result, including the claimed paper ID +### Search papers -#### Get daily papers +```bash +curl -s "https://huggingface.co/api/papers/search?q=vision+language&limit=20" +``` -Fetch the Daily Papers feed: +- Endpoint: `GET /api/papers/search` +- Query parameters: + - `q` (string): search query, max length 250 + - `limit` (integer): number of results, between 1 and 120 + +### Get daily papers ```bash -curl -s -H "Authorization: Bearer $HF_TOKEN" \ - 
"https://huggingface.co/api/daily_papers?p=0&limit=20&date=2017-07-21&sort=publishedAt" +curl -s "https://huggingface.co/api/daily_papers?limit=20&date=2025-01-23&sort=publishedAt" ``` - Endpoint: `GET /api/daily_papers` - Query parameters: - `p` (integer): page number - `limit` (integer): number of results, between 1 and 100 - - `date` (string): RFC 3339 full-date, for example `2017-07-21` - - `week` (string): ISO week, for example `2024-W03` - - `month` (string): month value, for example `2024-01` + - `date` (string): RFC 3339 full-date, for example `2025-01-23` + - `week` (string): ISO week, for example `2025-W09` + - `month` (string): month value, for example `2025-02` - `submitter` (string): filter by submitter - `sort` (enum): `publishedAt` or `trending` -- Response: list of daily papers - -#### List papers -List arXiv papers sorted by published date: +### List papers ```bash -curl -s -H "Authorization: Bearer $HF_TOKEN" \ - "https://huggingface.co/api/papers?cursor={CURSOR}&limit=20" +curl -s "https://huggingface.co/api/papers?cursor={CURSOR}&limit=20" ``` - Endpoint: `GET /api/papers` - Query parameters: - `cursor` (string): pagination cursor - `limit` (integer): number of results, between 1 and 100 -- Response: list of papers -#### Search papers - -Perform hybrid semantic and full-text search on papers: +### Claim paper authorship ```bash -curl -s -H "Authorization: Bearer $HF_TOKEN" \ - "https://huggingface.co/api/papers/search?q=vision+language&limit=20" +curl "https://huggingface.co/api/settings/papers/claim" \ + --request POST \ + --header "Content-Type: application/json" \ + --header "Authorization: Bearer $HF_TOKEN" \ + --data '{ + "paperId": "{PAPER_ID}", + "claimAuthorId": "{AUTHOR_ENTRY_ID}", + "targetUserId": "{USER_ID}" + }' ``` -This searches over the paper title, authors, and content. 
- -- Endpoint: `GET /api/papers/search` -- Query parameters: - - `q` (string): search query, max length 250 - - `limit` (integer): number of results, between 1 and 120 -- Response: matching papers +- Endpoint: `POST /api/settings/papers/claim` +- Body: + - `paperId` (string, required): arXiv paper identifier being claimed + - `claimAuthorId` (string): author entry on the paper being claimed, 24-char hex ID + - `targetUserId` (string): HF user who should receive the claim, 24-char hex ID -#### Index a paper +### Index a paper Insert a paper from arXiv by ID. If the paper is already indexed, only its authors can re-index it: @@ -189,9 +213,8 @@ curl "https://huggingface.co/api/papers/index" \ - Body: - `arxivId` (string, required): arXiv ID to index, for example `2301.00001` - Pattern: `^\d{4}\.\d{4,5}$` -- Response: empty JSON object on success -#### Update paper links +### Update paper links Update the project page, GitHub repository, or submitting organization for a paper. The requester must be the paper author, the Daily Papers submitter, or a papers admin: @@ -214,11 +237,11 @@ curl "https://huggingface.co/api/papers/{PAPER_OBJECT_ID}/links" \ - `githubRepo` (string, nullable): GitHub repository URL - `organizationId` (string, nullable): organization ID, 24-char hex ID - `projectPage` (string, nullable): project page URL -- Response: empty JSON object on success ## Error Handling -- **404 on `https://huggingface.co/papers/{PAPER_ID}` or `md` endpoint**: the paper is not indexed on Hugging Face paper pages yet. +- **CLI errors**: `hf papers info` and `hf papers read` print `Paper '{PAPER_ID}' not found on the Hub.` when the paper does not exist. +- **404 on `https://huggingface.co/papers/{PAPER_ID}` or `.md` endpoint**: the paper is not indexed on Hugging Face paper pages yet. - **404 on `/api/papers/{PAPER_ID}`**: the paper may not be indexed on Hugging Face paper pages yet. 
- **Paper ID not found**: verify the extracted arXiv ID, including any version suffix @@ -233,7 +256,8 @@ If the Hugging Face paper page does not contain enough detail for the user's que ## Notes -- No authentication is required for public paper pages. -- Write endpoints such as claim authorship, index paper, and update paper links require `Authorization: Bearer $HF_TOKEN`. -- Prefer the `.md` endpoint for reliable machine-readable output. -- Prefer `/api/papers/{PAPER_ID}` when you need structured JSON fields instead of page markdown. \ No newline at end of file +- Prefer the `hf papers` CLI for reading, searching, and listing papers. +- No authentication is required for public paper pages (CLI or REST API). +- Write endpoints such as claim authorship, index paper, and update paper links require `Authorization: Bearer $HF_TOKEN` (REST API only, not yet available in the CLI). +- Prefer `hf papers read {PAPER_ID}` or the `.md` endpoint for reliable machine-readable output. +- Prefer `hf papers info {PAPER_ID}` or `/api/papers/{PAPER_ID}` when you need structured JSON fields instead of page markdown. \ No newline at end of file
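The ID extraction described under "Parsing the paper ID" could be sketched in Bash roughly as follows. This is only an illustration under stated assumptions: the function name `parse_paper_id` and its regular expression are hypothetical and are not part of the skill or the `hf` CLI. The pattern follows the `^\d{4}\.\d{4,5}$` form given for the index endpoint, loosened to match anywhere in the input and to keep an optional version suffix such as `v1`.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the `hf` CLI): extract an arXiv ID,
# optionally with a version suffix, from a Hugging Face paper URL,
# an arXiv URL, or a bare ID.
parse_paper_id() {
  local input="$1"
  # Assumed pattern: 4 digits, a dot, 4-5 digits, optional "vN" suffix.
  local re='([0-9]{4}\.[0-9]{4,5}(v[0-9]+)?)'
  if [[ "$input" =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    printf 'No arXiv ID found in: %s\n' "$input" >&2
    return 1
  fi
}

parse_paper_id "https://huggingface.co/papers/2602.08025"   # prints 2602.08025
parse_paper_id "https://arxiv.org/pdf/2602.08025"           # prints 2602.08025
parse_paper_id "2602.08025v1"                               # prints 2602.08025v1
```

The extracted value can then be substituted for `{PAPER_ID}` in the commands above. The pattern is deliberately loose and does not handle old-style arXiv identifiers, so treat a non-zero exit status as a cue to ask the user for the ID.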