fix: add DOI to prefix_mapping in _get_not_found_ids #115

Closed
lior-airis wants to merge 1 commit into danielnsilva:master from lior-airis:fix/doi-not-found-false-positive

lior-airis commented Feb 24, 2026

Summary

  • _get_not_found_ids was missing DOI from its prefix_mapping dict, causing false "IDs not found" warnings for every DOI-prefixed lookup (e.g., DOI:10.1145/792550.792552)
  • The bare DOI value was added to found_ids without the DOI: prefix, so it never matched the prefixed input ID
  • Added 'DOI': 'DOI' to the mapping
  • Changed matching logic to always add bare external ID values alongside prefixed forms, so both DOI:10.1145/... and 10.1145/... inputs match correctly (preserving backward compatibility with bare DOI inputs used in existing tests)
  • Added two unit tests (sync + async)
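
The matching change described in the bullets above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the library's actual code: the function names, the dict-shaped paper records, and every mapping entry other than 'DOI': 'DOI' are assumptions made for the example, and IDs are compared as exact strings.

```python
from typing import Dict, Iterable, List, Set

# Illustrative mapping from externalIds keys to input-ID prefixes.
# Only the 'DOI' entry is taken from this PR; the rest are assumptions.
PREFIX_MAPPING: Dict[str, str] = {
    "ArXiv": "ARXIV",
    "MAG": "MAG",
    "ACL": "ACL",
    "PubMed": "PMID",
    "PubMedCentral": "PMCID",
    "DOI": "DOI",  # the entry this PR adds
}


def collect_found_ids(papers: Iterable[dict]) -> Set[str]:
    """Gather every ID form under which a returned paper may have been requested."""
    found: Set[str] = set()
    for paper in papers:
        if paper.get("paperId"):
            found.add(paper["paperId"])
        for source, value in (paper.get("externalIds") or {}).items():
            if value is None:
                continue
            value = str(value)
            # Always record the bare value so unprefixed inputs still match.
            found.add(value)
            prefix = PREFIX_MAPPING.get(source)
            if prefix:
                # Also record the prefixed form, e.g. 'DOI:10.1145/...'.
                found.add(f"{prefix}:{value}")
    return found


def get_not_found_ids(requested_ids: Iterable[str], papers: Iterable[dict]) -> List[str]:
    """Return the requested IDs that match none of the returned papers."""
    found = collect_found_ids(papers)
    return [pid for pid in requested_ids if pid not in found]
```

Under these assumptions, a paper whose externalIds contain the DOI 10.1145/792550.792552 matches both the DOI:-prefixed and the bare input, while an ID with no returned paper is still reported as not found.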

Reproduction

from semanticscholar import SemanticScholar
sch = SemanticScholar()
# This returns the paper correctly, but also logs a false warning:
# WARNING:semanticscholar:IDs not found: ['DOI:10.1145/792550.792552']
papers = sch.get_papers(['DOI:10.1145/792550.792552'])

The paper data is returned, but the spurious warning is confusing, and with return_not_found=True the DOI-prefixed IDs are wrongly included in the not-found list.
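
The false positive can also be illustrated offline, without calling the API. The sketch below reconstructs the pre-fix behaviour from the description in this PR only; it is not the library's code. The function name, the mapping contents, and the paperId value are hypothetical.

```python
from typing import Dict, List

# Assumed pre-fix mapping: there is no "DOI" entry.
OLD_PREFIX_MAPPING: Dict[str, str] = {"ArXiv": "ARXIV", "PubMed": "PMID"}


def old_not_found_ids(requested_ids: List[str], papers: List[dict]) -> List[str]:
    """Pre-fix matching as described in the PR: unmapped sources are stored bare."""
    found_ids = set()
    for paper in papers:
        found_ids.add(paper["paperId"])
        for source, value in paper.get("externalIds", {}).items():
            prefix = OLD_PREFIX_MAPPING.get(source)
            # Mapped sources are stored with a prefix; DOI is unmapped, so only
            # the bare value is stored and 'DOI:...' inputs can never match.
            found_ids.add(f"{prefix}:{value}" if prefix else str(value))
    return [pid for pid in requested_ids if pid not in found_ids]


# The paper was returned, yet the prefixed input is reported as not found.
returned = [{"paperId": "hypothetical-id",
             "externalIds": {"DOI": "10.1145/792550.792552"}}]
false_positives = old_not_found_ids(["DOI:10.1145/792550.792552"], returned)
print(false_positives)  # ['DOI:10.1145/792550.792552']
```

The same call with the bare DOI as input returns an empty list, which is why the existing tests with unprefixed DOIs did not catch the problem.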

Test plan

  • Added test_get_papers_doi_prefix_not_false_positive (sync)
  • Added test_get_papers_doi_prefix_not_false_positive_async (async)
  • Existing tests pass (bare DOI inputs like 10.2139/ssrn.2250500 still match correctly)

The `_get_not_found_ids` method was missing DOI from its
`prefix_mapping` dict. When a paper was looked up using a
DOI-prefixed ID (e.g., `DOI:10.1145/792550.792552`), the method
would add only the bare DOI value to `found_ids` (without the
`DOI:` prefix), causing the input ID to never match. This resulted
in a false "IDs not found" warning for every DOI-prefixed lookup,
even when the paper was successfully returned.

Also changed the matching logic to always add bare external ID
values alongside prefixed forms, so both `DOI:10.1145/...` and
`10.1145/...` inputs match correctly. This is consistent with
how the Semantic Scholar API accepts both forms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lior-airis force-pushed the fix/doi-not-found-false-positive branch from bfe2822 to ef052b3 on February 25, 2026 19:26
lior-airis (Author) commented:

Closing in favor of #116, which fixes the Python 3.14 nest_asyncio incompatibility first. I will rebase the DOI prefix fix on top of that and reopen it as a separate PR.

lior-airis closed this on Mar 2, 2026