Update Hugging Face Papers skill #102
Open: NielsRogge wants to merge 1 commit into huggingface:main from NielsRogge:improve_papers_skill
+104
−80
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| --- | ||
| name: huggingface-papers | ||
| description: Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata such as authors, linked models/datasets/spaces, Github repo and project page. Use when the user shares a Hugging Face paper page URL, an arXiv URL or ID, or asks to summarize, explain, or analyze an AI research paper. | ||
| description: Look up, read, search, and list Hugging Face paper pages using the `hf papers` CLI or the papers REST API. Fetch paper content as markdown, get structured metadata (authors, linked models/datasets/spaces, Github repo, project page), search papers by keyword, and browse the daily papers feed. Use when the user shares a Hugging Face paper page URL, an arXiv URL or ID, or asks to summarize, explain, analyze, search, or list AI research papers. | ||
| --- | ||
|
|
||
| # Hugging Face Paper Pages | ||
|
|
@@ -13,7 +13,7 @@ Hugging Face Paper pages (hf.co/papers) is a platform built on top of arXiv (arx | |
|
|
||
| Whenever someone mentions a HF paper or arXiv abstract/PDF URL in a model card, dataset card or README of a Space repository, the paper will be automatically indexed. Note that not all papers indexed on Hugging Face are also submitted to daily papers. The latter is more a manner of promoting a research paper. Papers can only be submitted to daily papers up until 14 days after their publication date on arXiv. | ||
|
|
||
| The Hugging Face team has built an easy-to-use API to interact with paper pages. Content of the papers can be fetched as markdown, or structured metadata can be returned such as author names, linked models/datasets/spaces, linked Github repo and project page. | ||
| The Hugging Face team has built an easy-to-use API and CLI to interact with paper pages. Content of the papers can be fetched as markdown, or structured metadata can be returned such as author names, linked models/datasets/spaces, linked Github repo and project page. | ||
|
|
||
| ## When to Use | ||
|
Collaborator
The agent decides whether to load the skill based only on the frontmatter description, so having this "When to Use" section can't influence triggering, no? |
||
|
|
||
|
|
@@ -22,6 +22,7 @@ The Hugging Face team has built an easy-to-use API to interact with paper pages. | |
| - User shares an arXiv URL (e.g. `https://arxiv.org/abs/2602.08025` or `https://arxiv.org/pdf/2602.08025`) | ||
| - User mentions an arXiv ID (e.g. `2602.08025`) | ||
| - User asks you to summarize, explain, or analyze an AI research paper | ||
| - User asks to search, list, or browse AI research papers | ||
|
|
||
| ## Parsing the paper ID | ||
|
|
||
|
|
@@ -36,142 +37,165 @@ It's recommended to parse the paper ID (arXiv ID) from whatever the user provide | |
| | `2602.08025v1` | `2602.08025v1` | | ||
| | `2602.08025` | `2602.08025` | | ||
|
|
||
| This allows you to provide the paper ID into any of the hub API endpoints mentioned below. | ||
| This allows you to provide the paper ID into any of the CLI commands or API endpoints mentioned below. | ||
|
|
||
| ### Fetch the paper page as markdown | ||
| ## `hf papers` CLI (preferred) | ||
|
|
||
| The `hf` CLI (part of the `huggingface_hub` package) provides a convenient way to interact with papers directly from the terminal. Prefer the CLI over raw API calls when possible. | ||
|
|
||
| The content of a paper can be fetched as markdown like so: | ||
| ### Read a paper as markdown | ||
|
|
||
| ```bash | ||
| curl -s "https://huggingface.co/papers/{PAPER_ID}.md" | ||
| hf papers read {PAPER_ID} | ||
| ``` | ||
|
|
||
| This should return the Hugging Face paper page as markdown. This relies on the HTML version of the paper at https://arxiv.org/html/{PAPER_ID}. | ||
| This prints the full paper content as markdown to stdout. It relies on the HTML version of the paper at https://arxiv.org/html/{PAPER_ID}. | ||
|
|
||
| There are 2 exceptions: | ||
| - Not all arXiv papers have an HTML version. If the HTML version of the paper does not exist, then the content falls back to the HTML of the Hugging Face paper page. | ||
| - If it results in a 404, it means the paper is not yet indexed on hf.co/papers. See [Error handling](#error-handling) for info. | ||
| - If the paper is not indexed on hf.co/papers, the command exits with an error: `Paper '{PAPER_ID}' not found on the Hub.` | ||
|
|
||
| Alternatively, you can request markdown from the normal paper page URL, like so: | ||
| ### Get structured metadata (JSON) | ||
|
|
||
| ```bash | ||
| curl -s -H "Accept: text/markdown" "https://huggingface.co/papers/{PAPER_ID}" | ||
| hf papers info {PAPER_ID} | ||
| ``` | ||
|
|
||
| ### Paper Pages API Endpoints | ||
|
|
||
| All endpoints use the base URL `https://huggingface.co`. | ||
| This prints structured JSON metadata that can include: | ||
|
|
||
| #### Get structured metadata | ||
| - authors (names and Hugging Face usernames, in case they have claimed the paper) | ||
| - media URLs (uploaded when submitting the paper to Daily Papers) | ||
| - summary (abstract) and AI-generated summary | ||
| - project page and GitHub repository | ||
| - organization and engagement metadata (number of upvotes) | ||
|
|
||
| Fetch the paper metadata as JSON using the Hugging Face REST API: | ||
| ### Search papers | ||
|
|
||
| ```bash | ||
| curl -s "https://huggingface.co/api/papers/{PAPER_ID}" | ||
| hf papers search "vision language" | ||
| hf papers search "attention mechanism" --limit 10 | ||
| hf papers search "diffusion" --format json | ||
| ``` | ||
|
|
||
| This returns structured metadata that can include: | ||
| This performs hybrid semantic and full-text search over paper titles, authors, and content. | ||
|
|
||
| - authors (names and Hugging Face usernames, in case they have claimed the paper) | ||
| - media URLs (uploaded when submitting the paper to Daily Papers) | ||
| - summary (abstract) and AI-generated summary | ||
| - project page and GitHub repository | ||
| - organization and engagement metadata (number of upvotes) | ||
| Options: | ||
| - `--limit` (default 20): number of results | ||
| - `--format` (`table` or `json`): output format | ||
| - `--quiet`: only print paper IDs | ||
|
|
||
| To find models linked to the paper, use: | ||
| ### List daily papers | ||
|
|
||
| ```bash | ||
| curl https://huggingface.co/api/models?filter=arxiv:{PAPER_ID} | ||
| hf papers ls | ||
| hf papers ls --sort trending | ||
| hf papers ls --date 2025-01-23 | ||
| hf papers ls --date today | ||
| hf papers ls --week 2025-W09 | ||
| hf papers ls --month 2025-02 | ||
| hf papers ls --submitter akhaliq | ||
| hf papers ls --format json | ||
| ``` | ||
|
|
||
| To find datasets linked to the paper, use: | ||
| Options: | ||
| - `--date`: date in ISO format (`YYYY-MM-DD`) or `today` | ||
| - `--week`: ISO week, e.g. `2025-W09` | ||
| - `--month`: month in ISO format, e.g. `2025-02` | ||
| - `--submitter`: filter by Hub username of the submitter | ||
| - `--sort`: `publishedAt` (default) or `trending` | ||
| - `--limit` (default 50): number of results | ||
| - `--format` (`table` or `json`): output format | ||
| - `--quiet`: only print paper IDs | ||
|
|
||
| ## Paper Pages REST API | ||
|
|
||
| The REST API can be used as an alternative to the CLI, or for endpoints not yet covered by the CLI. All endpoints use the base URL `https://huggingface.co`. | ||
|
|
||
| ### Fetch the paper page as markdown | ||
|
|
||
| ```bash | ||
| curl https://huggingface.co/api/datasets?filter=arxiv:{PAPER_ID} | ||
| curl -s "https://huggingface.co/papers/{PAPER_ID}.md" | ||
| ``` | ||
|
|
||
| To find spaces linked to the paper, use: | ||
| Alternatively, you can request markdown from the normal paper page URL: | ||
|
|
||
| ```bash | ||
| curl https://huggingface.co/api/spaces?filter=arxiv:{PAPER_ID} | ||
| curl -s -H "Accept: text/markdown" "https://huggingface.co/papers/{PAPER_ID}" | ||
| ``` | ||
|
|
||
| #### Claim paper authorship | ||
| ### Get structured metadata | ||
|
|
||
| ```bash | ||
| curl -s "https://huggingface.co/api/papers/{PAPER_ID}" | ||
| ``` | ||
|
|
||
| Claim authorship of a paper for a Hugging Face user: | ||
| ### Find linked models, datasets, and spaces | ||
|
|
||
| ```bash | ||
| curl "https://huggingface.co/api/settings/papers/claim" \ | ||
| --request POST \ | ||
| --header "Content-Type: application/json" \ | ||
| --header "Authorization: Bearer $HF_TOKEN" \ | ||
| --data '{ | ||
| "paperId": "{PAPER_ID}", | ||
| "claimAuthorId": "{AUTHOR_ENTRY_ID}", | ||
| "targetUserId": "{USER_ID}" | ||
| }' | ||
| curl https://huggingface.co/api/models?filter=arxiv:{PAPER_ID} | ||
| curl https://huggingface.co/api/datasets?filter=arxiv:{PAPER_ID} | ||
| curl https://huggingface.co/api/spaces?filter=arxiv:{PAPER_ID} | ||
| ``` | ||
|
|
||
| - Endpoint: `POST /api/settings/papers/claim` | ||
| - Body: | ||
| - `paperId` (string, required): arXiv paper identifier being claimed | ||
| - `claimAuthorId` (string): author entry on the paper being claimed, 24-char hex ID | ||
| - `targetUserId` (string): HF user who should receive the claim, 24-char hex ID | ||
| - Response: paper authorship claim result, including the claimed paper ID | ||
| ### Search papers | ||
|
|
||
| #### Get daily papers | ||
| ```bash | ||
| curl -s "https://huggingface.co/api/papers/search?q=vision+language&limit=20" | ||
| ``` | ||
|
|
||
| Fetch the Daily Papers feed: | ||
| - Endpoint: `GET /api/papers/search` | ||
| - Query parameters: | ||
| - `q` (string): search query, max length 250 | ||
| - `limit` (integer): number of results, between 1 and 120 | ||
|
|
||
| ### Get daily papers | ||
|
|
||
| ```bash | ||
| curl -s -H "Authorization: Bearer $HF_TOKEN" \ | ||
| "https://huggingface.co/api/daily_papers?p=0&limit=20&date=2017-07-21&sort=publishedAt" | ||
| curl -s "https://huggingface.co/api/daily_papers?limit=20&date=2025-01-23&sort=publishedAt" | ||
| ``` | ||
|
|
||
| - Endpoint: `GET /api/daily_papers` | ||
| - Query parameters: | ||
| - `p` (integer): page number | ||
| - `limit` (integer): number of results, between 1 and 100 | ||
| - `date` (string): RFC 3339 full-date, for example `2017-07-21` | ||
| - `week` (string): ISO week, for example `2024-W03` | ||
| - `month` (string): month value, for example `2024-01` | ||
| - `date` (string): RFC 3339 full-date, for example `2025-01-23` | ||
| - `week` (string): ISO week, for example `2025-W09` | ||
| - `month` (string): month value, for example `2025-02` | ||
| - `submitter` (string): filter by submitter | ||
| - `sort` (enum): `publishedAt` or `trending` | ||
| - Response: list of daily papers | ||
|
|
||
| #### List papers | ||
|
|
||
| List arXiv papers sorted by published date: | ||
| ### List papers | ||
|
|
||
| ```bash | ||
| curl -s -H "Authorization: Bearer $HF_TOKEN" \ | ||
| "https://huggingface.co/api/papers?cursor={CURSOR}&limit=20" | ||
| curl -s "https://huggingface.co/api/papers?cursor={CURSOR}&limit=20" | ||
| ``` | ||
|
|
||
| - Endpoint: `GET /api/papers` | ||
| - Query parameters: | ||
| - `cursor` (string): pagination cursor | ||
| - `limit` (integer): number of results, between 1 and 100 | ||
| - Response: list of papers | ||
|
|
||
| #### Search papers | ||
|
|
||
| Perform hybrid semantic and full-text search on papers: | ||
| ### Claim paper authorship | ||
|
|
||
| ```bash | ||
| curl -s -H "Authorization: Bearer $HF_TOKEN" \ | ||
| "https://huggingface.co/api/papers/search?q=vision+language&limit=20" | ||
| curl "https://huggingface.co/api/settings/papers/claim" \ | ||
| --request POST \ | ||
| --header "Content-Type: application/json" \ | ||
| --header "Authorization: Bearer $HF_TOKEN" \ | ||
| --data '{ | ||
| "paperId": "{PAPER_ID}", | ||
| "claimAuthorId": "{AUTHOR_ENTRY_ID}", | ||
| "targetUserId": "{USER_ID}" | ||
| }' | ||
| ``` | ||
|
|
||
| This searches over the paper title, authors, and content. | ||
|
|
||
| - Endpoint: `GET /api/papers/search` | ||
| - Query parameters: | ||
| - `q` (string): search query, max length 250 | ||
| - `limit` (integer): number of results, between 1 and 120 | ||
| - Response: matching papers | ||
| - Endpoint: `POST /api/settings/papers/claim` | ||
| - Body: | ||
| - `paperId` (string, required): arXiv paper identifier being claimed | ||
| - `claimAuthorId` (string): author entry on the paper being claimed, 24-char hex ID | ||
| - `targetUserId` (string): HF user who should receive the claim, 24-char hex ID | ||
|
|
||
| #### Index a paper | ||
| ### Index a paper | ||
|
|
||
| Insert a paper from arXiv by ID. If the paper is already indexed, only its authors can re-index it: | ||
|
|
||
|
|
@@ -189,9 +213,8 @@ curl "https://huggingface.co/api/papers/index" \ | |
| - Body: | ||
| - `arxivId` (string, required): arXiv ID to index, for example `2301.00001` | ||
| - Pattern: `^\d{4}\.\d{4,5}$` | ||
| - Response: empty JSON object on success | ||
|
|
||
| #### Update paper links | ||
| ### Update paper links | ||
|
|
||
| Update the project page, GitHub repository, or submitting organization for a paper. The requester must be the paper author, the Daily Papers submitter, or a papers admin: | ||
|
|
||
|
|
@@ -214,11 +237,11 @@ curl "https://huggingface.co/api/papers/{PAPER_OBJECT_ID}/links" \ | |
| - `githubRepo` (string, nullable): GitHub repository URL | ||
| - `organizationId` (string, nullable): organization ID, 24-char hex ID | ||
| - `projectPage` (string, nullable): project page URL | ||
| - Response: empty JSON object on success | ||
|
|
||
| ## Error Handling | ||
|
|
||
| - **404 on `https://huggingface.co/papers/{PAPER_ID}` or `md` endpoint**: the paper is not indexed on Hugging Face paper pages yet. | ||
| - **CLI errors**: `hf papers info` and `hf papers read` print `Paper '{PAPER_ID}' not found on the Hub.` when the paper does not exist. | ||
| - **404 on `https://huggingface.co/papers/{PAPER_ID}` or `.md` endpoint**: the paper is not indexed on Hugging Face paper pages yet. | ||
| - **404 on `/api/papers/{PAPER_ID}`**: the paper may not be indexed on Hugging Face paper pages yet. | ||
| - **Paper ID not found**: verify the extracted arXiv ID, including any version suffix | ||
|
|
||
|
|
@@ -233,7 +256,8 @@ If the Hugging Face paper page does not contain enough detail for the user's que | |
|
|
||
| ## Notes | ||
|
|
||
| - No authentication is required for public paper pages. | ||
| - Write endpoints such as claim authorship, index paper, and update paper links require `Authorization: Bearer $HF_TOKEN`. | ||
| - Prefer the `.md` endpoint for reliable machine-readable output. | ||
| - Prefer `/api/papers/{PAPER_ID}` when you need structured JSON fields instead of page markdown. | ||
| - Prefer the `hf papers` CLI for reading, searching, and listing papers. | ||
| - No authentication is required for public paper pages (CLI or REST API). | ||
| - Write endpoints such as claim authorship, index paper, and update paper links require `Authorization: Bearer $HF_TOKEN` (REST API only, not yet available in the CLI). | ||
| - Prefer `hf papers read {PAPER_ID}` or the `.md` endpoint for reliable machine-readable output. | ||
| - Prefer `hf papers info {PAPER_ID}` or `/api/papers/{PAPER_ID}` when you need structured JSON fields instead of page markdown. | ||
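
The "Parsing the paper ID" step in the diff above could be sketched as a small shell helper. This is only a minimal sketch and not part of the PR: the function names `parse_paper_id` and `paper_md_url` are hypothetical, and the regular expression is an assumption modeled on the `^\d{4}\.\d{4,5}$` pattern listed for the index endpoint, extended with an optional version suffix such as `v1`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): pull an arXiv-style ID out of
# whatever the user provided (HF paper URL, arXiv abs/pdf URL, or a bare ID).
# Assumption: IDs look like 2602.08025, optionally followed by a version (v1).
parse_paper_id() {
  paper_id=$(printf '%s\n' "$1" | grep -oE '[0-9]{4}\.[0-9]{4,5}(v[0-9]+)?' | head -n 1)
  # Return a non-zero status when no ID-like token was found.
  [ -n "$paper_id" ] && printf '%s\n' "$paper_id"
}

# Hypothetical helper: build the markdown endpoint URL used in the skill.
paper_md_url() {
  parsed=$(parse_paper_id "$1") || return 1
  printf 'https://huggingface.co/papers/%s.md\n' "$parsed"
}
```

Under these assumptions, the parsed ID can be passed to any of the commands in the skill, for example `hf papers read "$(parse_paper_id "https://arxiv.org/abs/2602.08025")"`. The version suffix is kept as-is, matching the parsing table in the diff.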
I just want to add AI/ML so that the agent is more encouraged to use it without mention of HF papers.