UMLS / MedDRA term validation for hallucination detection #3

@sarvanithin

Overview

The current SNOMED bundle has ~275 concepts. The UMLS (Unified Medical Language System) has 4M+ concepts across 200+ vocabularies, including MedDRA, ICD-10, SNOMED-CT, RxNorm, and LOINC. Validating terms against UMLS would greatly widen coverage, so hallucination detection could separate real medical terms that are missing from the bundle from fabricated ones.

API

UMLS REST API: https://uts-ws.nlm.nih.gov/rest/
Requires a free UMLS API key (register at https://uts.nlm.nih.gov/uts/signup-login).

GET /rest/search/current?string=myocardial+infarction&apiKey=YOUR_KEY
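
A minimal sketch of how this request could be built and its response interpreted, using only the standard library. The helper names `build_search_url` and `has_match` are hypothetical. The response shape is also an assumption (a `result.results` list whose entries carry a `ui` field, with either an empty list or a single `NONE` placeholder entry when nothing matches) and should be checked against a live response before relying on it.

```python
from urllib.parse import urlencode

UMLS_BASE_URL = "https://uts-ws.nlm.nih.gov/rest"


def build_search_url(term: str, api_key: str, version: str = "current") -> str:
    """Build the search URL for a term (hypothetical helper)."""
    # urlencode escapes spaces as "+", matching the example request above.
    query = urlencode({"string": term, "apiKey": api_key})
    return f"{UMLS_BASE_URL}/search/{version}?{query}"


def has_match(payload: dict) -> bool:
    """Return True if a decoded search response lists at least one concept.

    Assumption: matches live under payload["result"]["results"], and an
    empty search is reported either as an empty list or as a single
    placeholder entry whose "ui" is "NONE".
    """
    results = payload.get("result", {}).get("results", [])
    return any(entry.get("ui") not in (None, "NONE") for entry in results)
```

Keeping URL construction and response parsing as pure functions means both can be unit-tested without any HTTP traffic; only the thin layer that performs the GET needs mocking.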

What to build

  • medguard/knowledge/umls.py: UMLSClient
    • concept_exists(term) → bool (same interface as SNOMEDClient)
    • Cache results in ~/.medguard/cache/umls/
  • Drop-in replacement for SNOMEDClient in HallucinationDetector
  • Add umls_api_key_env: str = "UMLS_API_KEY" to HallucinationConfig
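
One possible shape for the client, sketched under several assumptions: the HTTP call is injected as a `fetch` callable (a hypothetical design choice, made so the sketch runs without network access), the cache is one small JSON file per normalized term, and the response layout (`result.results` entries with a `ui` field, `NONE` meaning no match) has not been verified against the live API. The real SNOMEDClient interface may differ from what is shown here.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Optional

# Assumption: a fetcher takes a term and returns the decoded JSON search response.
Fetcher = Callable[[str], dict]


class UMLSClient:
    """Sketch of a cached UMLS lookup; not the project's actual implementation."""

    def __init__(self, fetch: Fetcher, cache_dir: Optional[Path] = None) -> None:
        self._fetch = fetch
        # Default location taken from the issue text.
        self._cache_dir = cache_dir or Path.home() / ".medguard" / "cache" / "umls"
        self._cache_dir.mkdir(parents=True, exist_ok=True)

    def _cache_path(self, term: str) -> Path:
        # Hash the normalized term so any string maps to a safe file name.
        key = hashlib.sha256(term.strip().lower().encode("utf-8")).hexdigest()
        return self._cache_dir / f"{key}.json"

    def concept_exists(self, term: str) -> bool:
        path = self._cache_path(term)
        if path.exists():
            # Cache hit: reuse the stored answer without calling the API.
            return bool(json.loads(path.read_text(encoding="utf-8"))["exists"])
        payload = self._fetch(term)
        results = payload.get("result", {}).get("results", [])
        exists = any(r.get("ui") not in (None, "NONE") for r in results)
        path.write_text(json.dumps({"term": term, "exists": exists}), encoding="utf-8")
        return exists
```

If `fetch` raises, nothing is written to the cache, so a transient network error is not remembered as a negative result. The sketch has no cache expiry; whether entries should age out is left open.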

Files to modify

  • medguard/knowledge/umls.py — new file
  • medguard/guardrails/hallucination.py — use UMLS if configured
  • medguard/config.py — add umls_api_key_env to HallucinationConfig
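
The "use UMLS if configured" decision could look roughly like the sketch below. The `HallucinationConfig` shown is a stand-in dataclass holding only the proposed field (the real class in medguard/config.py may be built differently), and `resolve_umls_api_key` and `choose_backend` are hypothetical helper names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class HallucinationConfig:
    # Stand-in with only the field proposed in this issue.
    umls_api_key_env: str = "UMLS_API_KEY"


def resolve_umls_api_key(
    config: HallucinationConfig, environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the API key named by the config, or None if unset or blank."""
    env = os.environ if environ is None else environ
    key = env.get(config.umls_api_key_env, "").strip()
    return key or None


def choose_backend(
    config: HallucinationConfig, environ: Optional[Mapping[str, str]] = None
) -> str:
    """Return "umls" when a key is available, otherwise "snomed"."""
    return "umls" if resolve_umls_api_key(config, environ) else "snomed"
```

Storing the name of the environment variable rather than the key itself keeps the secret out of config files, and passing `environ` explicitly makes the fallback easy to test.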

Acceptance criteria

  • UMLSClient.concept_exists("myocardial infarction") returns True
  • UMLSClient.concept_exists("xyzfakdisease123") returns False
  • Falls back to the bundled SNOMED data if UMLS_API_KEY is not set
  • Tests use respx mocks, so no live API calls are made
