Overview
The current RegexPHIEngine is English-only. Clinical AI systems in Europe and Latin America handle patient data in Spanish, French, German, and Portuguese. PHI patterns differ by locale (e.g., Spanish DNI vs US SSN, European date formats DD/MM/YYYY).
What to build
medguard/guardrails/phi_i18n.py — locale-specific PHI patterns
- Spanish: DNI \d{8}[A-Z], NIE [XYZ]\d{7}[A-Z], date \d{2}/\d{2}/\d{4}
- French: INSEE (SS) number, RPPS doctor ID
- German: Krankenversichertennummer (KVNR)
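A minimal sketch of what the pattern table in phi_i18n.py could look like. The Spanish regexes are the ones listed above with word boundaries added. The French INSEE and German KVNR regexes are assumptions based on the usual layout of those identifiers (15 digits in groups for INSEE, one letter plus nine digits for KVNR), and RPPS is left out. `LOCALE_PATTERNS` and `find_phi` are hypothetical names, not existing medguard APIs.

```python
import re

# Hypothetical layout: locale code -> {label: compiled pattern}.
LOCALE_PATTERNS = {
    "es": {
        "DNI": re.compile(r"\b\d{8}[A-Z]\b"),
        "NIE": re.compile(r"\b[XYZ]\d{7}[A-Z]\b"),
        "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
    },
    "fr": {
        # Assumed shape: sex, year, month, department (incl. 2A/2B),
        # commune, order number, 2-digit key; spaces between groups optional.
        "INSEE": re.compile(
            r"\b[12]\s?\d{2}\s?\d{2}\s?(?:\d{2}|2[AB])\s?\d{3}\s?\d{3}\s?\d{2}\b"
        ),
    },
    "de": {
        # Assumed shape: one uppercase letter followed by nine digits.
        "KVNR": re.compile(r"\b[A-Z]\d{9}\b"),
    },
}


def find_phi(text, locale):
    """Return (label, matched_text) pairs for every pattern hit in the locale."""
    hits = []
    for label, pattern in LOCALE_PATTERNS.get(locale, {}).items():
        hits.extend((label, m.group()) for m in pattern.finditer(text))
    return hits
```

These patterns only check shape. Validating the DNI control letter or the INSEE key would cut false positives but is outside what this issue lists.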
LocalePHIEngine(locale="es") — wraps RegexPHIEngine with locale patterns
- Add locale: str = "en" to PHIConfig
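One way the wrapper and the config field could fit together, as a sketch only. The real RegexPHIEngine interface is not shown in this issue, so the class below is a stand-in with an assumed `redact` method. `from_config`, the pattern table, and the choice to raise on unsupported locales are likewise illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class PHIConfig:
    # New field proposed by this issue; existing fields are omitted here.
    locale: str = "en"


class RegexPHIEngine:
    """Stand-in for the existing engine (assumed interface)."""

    def __init__(self, patterns):
        self.patterns = dict(patterns)

    def redact(self, text):
        # Replace each match with a bracketed label such as [DNI].
        for label, pattern in self.patterns.items():
            text = pattern.sub(f"[{label}]", text)
        return text


# Assumed excerpt of the locale table from phi_i18n.py.
LOCALE_PATTERNS = {
    "en": {"SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")},
    "es": {
        "DNI": re.compile(r"\b\d{8}[A-Z]\b"),
        "NIE": re.compile(r"\b[XYZ]\d{7}[A-Z]\b"),
    },
}


class LocalePHIEngine:
    """Wraps RegexPHIEngine, choosing its patterns by locale."""

    def __init__(self, locale="en"):
        if locale not in LOCALE_PATTERNS:
            raise ValueError(f"unsupported locale: {locale!r}")
        self.locale = locale
        self._engine = RegexPHIEngine(LOCALE_PATTERNS[locale])

    @classmethod
    def from_config(cls, config):
        return cls(locale=config.locale)

    def redact(self, text):
        return self._engine.redact(text)
```

With this shape, `LocalePHIEngine.from_config(PHIConfig(locale="es"))` selects the Spanish patterns, and the default `PHIConfig()` keeps the English behaviour. Falling back to English for unknown locales would be an alternative to raising, at the cost of silently missing locale-specific identifiers.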
Files to create/modify
medguard/guardrails/phi_i18n.py — locale pattern definitions
medguard/guardrails/phi.py — LocalePHIEngine class
medguard/config.py — add locale to PHIConfig
tests/test_phi.py — parametrized tests for each locale
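The per-locale tests could start from a case table like the one below. This sketch uses the standard library's `unittest` with `subTest` so it runs without extra dependencies; the same `CASES` list could be passed to `pytest.mark.parametrize` instead. The patterns are repeated inline to keep the example self-contained, the French regex is an assumption, and `detect` is a hypothetical helper rather than a medguard function.

```python
import re
import unittest

# Assumed patterns; in the real test module these would come from phi_i18n.py.
PATTERNS = {
    "es": re.compile(r"\b(?:\d{8}[A-Z]|[XYZ]\d{7}[A-Z])\b"),
    "fr": re.compile(
        r"\b[12]\s?\d{2}\s?\d{2}\s?(?:\d{2}|2[AB])\s?\d{3}\s?\d{3}\s?\d{2}\b"
    ),
}

# (locale, input text, expected first match or None)
CASES = [
    ("es", "DNI 12345678Z", "12345678Z"),
    ("es", "NIE X1234567L", "X1234567L"),
    ("es", "sin identificador", None),
    ("fr", "NIR 1 84 12 76 451 099 52", "1 84 12 76 451 099 52"),
]


def detect(locale, text):
    """Return the first identifier found for the locale, or None."""
    match = PATTERNS[locale].search(text)
    return match.group() if match else None


class LocalePatternTest(unittest.TestCase):
    def test_cases(self):
        # Each case is reported separately if it fails.
        for locale, text, expected in CASES:
            with self.subTest(locale=locale, text=text):
                self.assertEqual(detect(locale, text), expected)
```

Including at least one negative case per locale helps catch patterns that are too permissive.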
Acceptance criteria
- Spanish DNI 12345678Z detected and redacted
- French INSEE number 1 84 12 76 451 099 52 detected
- PHIConfig(locale="es") selects Spanish patterns
Resources