PIIgent combines two types of recognizers:
Pattern-Based Recognizers — High precision on structured German identifiers. Each recognizer targets one entity type with format validation:
| Recognizer | Entity Type | Validation |
|---|---|---|
| DeKvnrRecognizer | DE_KVNR (health insurance ID) | Luhn checksum |
| DeLanrRecognizer | DE_LANR (physician ID) | KBV checksum |
| DeBsnrRecognizer | DE_BSNR (facility ID) | KV code validation |
| DeTelematikIdRecognizer | DE_TELEMATIK_ID | Format validation |
| DePersonalIdRecognizer | DE_PERSONAL_ID | Weighted checksum |
| DeTaxIdRecognizer | DE_TAX_ID | ISO 7064 |
| DeSocialSecurityRecognizer | DE_SOCIAL_SECURITY | Checksum algorithm |
| DePostalCodeRecognizer | DE_POSTAL_CODE | Range validation |
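
To illustrate what checksum-based validation looks like, here is a minimal sketch of a KVNR check. It assumes the commonly described layout (one capital letter, eight digits, one check digit) and a Luhn variant in which the letter is replaced by its two-digit alphabet position and the weights alternate 1, 2. The function names are hypothetical, and the actual DeKvnrRecognizer may differ.

```python
import re

# Assumed layout: one capital letter followed by nine digits.
KVNR_PATTERN = re.compile(r"[A-Z][0-9]{9}")


def kvnr_check_digit(prefix: str) -> int:
    """Compute the check digit for a letter plus eight digits (assumed scheme)."""
    # Map the letter to its two-digit alphabet position (A -> "01", Z -> "26").
    letter_value = f"{ord(prefix[0]) - ord('A') + 1:02d}"
    digits = [int(ch) for ch in letter_value + prefix[1:]]
    total = 0
    for index, digit in enumerate(digits):
        # Weights alternate 1, 2, 1, 2, ... starting with 1.
        product = digit * (1 if index % 2 == 0 else 2)
        # Add the digit sum of the product (e.g. 16 -> 1 + 6).
        total += product // 10 + product % 10
    return total % 10


def is_valid_kvnr(candidate: str) -> bool:
    """Check the format first, then compare the computed and stated check digit."""
    if not KVNR_PATTERN.fullmatch(candidate):
        return False
    return kvnr_check_digit(candidate[:9]) == int(candidate[9])
```

Running the format check before the checksum keeps the arithmetic from ever seeing malformed input, which is the usual reason pattern-based recognizers reach high precision.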
LLM-Based Recognizer — Contextual detection of entities that lack fixed patterns:
| Recognizer | Entity Types | Notes |
|---|---|---|
| OllamaNERecognizer | PERSON, LOCATION, DATE_TIME, ORGANIZATION, etc. | Via Ollama (Ministral) |
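
How OllamaNERecognizer talks to the model is not shown here. The sketch below only illustrates one plausible post-processing step: turning an LLM's JSON answer into character spans. The prompt wording, the JSON shape (`text` and `type` keys), and the function names are all assumptions, and the model call is injected as a plain callable so that no Ollama API is implied.

```python
import json
from typing import Callable, Dict, List

# Entity types named in the table above; the real list may be longer.
ALLOWED_TYPES = {"PERSON", "LOCATION", "DATE_TIME", "ORGANIZATION"}


def parse_llm_entities(text: str, raw_response: str) -> List[Dict[str, object]]:
    """Map entity strings reported by the model back to offsets in `text`."""
    try:
        items = json.loads(raw_response)
    except json.JSONDecodeError:
        return []  # Malformed output yields no entities instead of an exception.
    if not isinstance(items, list):
        return []
    results: List[Dict[str, object]] = []
    cursor: Dict[str, int] = {}  # Search position per surface form, for repeats.
    for item in items:
        if not isinstance(item, dict):
            continue
        surface, label = item.get("text"), item.get("type")
        if not isinstance(surface, str) or not surface:
            continue
        if not isinstance(label, str) or label not in ALLOWED_TYPES:
            continue
        start = text.find(surface, cursor.get(surface, 0))
        if start == -1:
            continue  # The model reported text that is not in the input.
        end = start + len(surface)
        cursor[surface] = end
        results.append({"start": start, "end": end, "entity_type": label})
    return results


def detect(text: str, generate: Callable[[str], str]) -> List[Dict[str, object]]:
    """Build a (hypothetical) prompt, call the injected model, parse the reply."""
    prompt = (
        "List the PII entities in the text as a JSON array of objects "
        "with 'text' and 'type' keys.\n\n" + text
    )
    return parse_llm_entities(text, generate(prompt))
```

Dropping entities whose text cannot be found in the input is a cheap guard against hallucinated spans.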
The system includes agents that generate natural language explanations for failures:
- RationaleAgent — Explains why an entity was tagged or missed
- ErrorTaxonomyAgent — Classifies errors into systematic categories
- FixProposalAgent — Proposes specific changes based on error patterns
- VerificationAgent — Tests whether proposed fixes improve performance
Prompt optimization follows a genetic approach: prompts are treated as genotypes that evolve through mutation, crossover, and fitness selection:
```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PromptGenotype:
    instruction_block: str
    entity_definitions: Dict[str, str]
    examples: List["PromptExample"]  # forward reference; PromptExample is defined elsewhere
    constraints: List[str]
    # Lineage tracking
    parent_ids: List[str]
    generation: int
    mutation_history: List[str]
    fitness_scores: Dict[str, float]
```

Mutation operators: ADD_EXAMPLE, SWAP_EXAMPLE, ADD_CONSTRAINT, EMPHASIZE_ENTITY, ADD_PATTERN
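
The evolutionary loop described above could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `MiniGenotype` is a hypothetical, trimmed-down stand-in for PromptGenotype (it adds an `id` field that the original does not show), and the crossover and selection strategies are chosen here only for simplicity.

```python
import random
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class MiniGenotype:
    """Hypothetical, reduced stand-in for PromptGenotype."""
    id: str
    instruction_block: str
    constraints: List[str]
    parent_ids: List[str] = field(default_factory=list)
    generation: int = 0
    mutation_history: List[str] = field(default_factory=list)


def add_constraint(parent: MiniGenotype, constraint: str, child_id: str) -> MiniGenotype:
    """ADD_CONSTRAINT: copy the parent and append one constraint."""
    return replace(
        parent,
        id=child_id,
        constraints=parent.constraints + [constraint],
        parent_ids=[parent.id],
        generation=parent.generation + 1,
        mutation_history=parent.mutation_history + ["ADD_CONSTRAINT"],
    )


def crossover(a: MiniGenotype, b: MiniGenotype, child_id: str,
              rng: random.Random) -> MiniGenotype:
    """Merge both parents' constraints; take the instruction block from one at random."""
    merged = list(dict.fromkeys(a.constraints + b.constraints))  # dedupe, keep order
    donor = rng.choice([a, b])
    return MiniGenotype(
        id=child_id,
        instruction_block=donor.instruction_block,
        constraints=merged,
        parent_ids=[a.id, b.id],
        generation=max(a.generation, b.generation) + 1,
        mutation_history=["CROSSOVER"],
    )


def select(population: List[MiniGenotype], fitness: Dict[str, float],
           k: int) -> List[MiniGenotype]:
    """Truncation selection: keep the k genotypes with the highest fitness."""
    return sorted(population, key=lambda g: fitness[g.id], reverse=True)[:k]
```

Because every operator returns a new object and records `parent_ids` and `mutation_history`, the lineage of any surviving prompt can be traced back through the generations.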
Structured difficulty progression:
```python
from dataclasses import dataclass


@dataclass
class DifficultyDimensions:
    entity_count: int       # 1-5: Entities per document
    entity_variety: int     # 1-5: Different entity types
    format_variation: int   # 1-5: Non-standard formats
    overlap_frequency: int  # 1-5: Overlapping entities
    context_ambiguity: int  # 1-5: Ambiguous contexts
```

Failure-weighted sampling oversamples the regions of this difficulty space where the system fails.
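
One way failure-weighted sampling might work is sketched below. The bucket key (a pair of difficulty levels), the `floor` parameter, and the function names are assumptions made for illustration. The floor keeps regions with no observed failures from dropping out of the sample entirely.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical bucket key: (format_variation, context_ambiguity), each 1-5.
Bucket = Tuple[int, int]


def sampling_weights(failure_rates: Dict[Bucket, float],
                     floor: float = 0.05) -> Dict[Bucket, float]:
    """Turn per-bucket failure rates into normalized sampling probabilities."""
    raw = {bucket: max(rate, floor) for bucket, rate in failure_rates.items()}
    total = sum(raw.values())
    return {bucket: value / total for bucket, value in raw.items()}


def sample_buckets(failure_rates: Dict[Bucket, float], n: int,
                   rng: random.Random) -> List[Bucket]:
    """Draw n buckets, favoring those with higher failure rates."""
    weights = sampling_weights(failure_rates)
    buckets = list(weights)
    return rng.choices(buckets, weights=[weights[b] for b in buckets], k=n)
```

Each sampled bucket would then be handed to the data generator as the difficulty setting for the next batch of documents.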
PIIgent uses a grammar-aware synthetic engine (SynPII) for adversarial testing and validation.
SynPII provides an AdversarialGenerator that uses specific Adversarial Scenarios (generative recipes) to "stress-test" the detection flow. This allows the PIIgent agents to request targeted data for identified failure modes:
- OverlapScenario: Places entities in close proximity to trigger resolution errors (e.g., a postal code, or PLZ, inside a LOCATION).
- FormatScenario: Randomizes separators and casing to test pattern robustness.
- ContextScenario: Wraps entities in ambiguous phrases to test LLM contextual reasoning.
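
As a rough illustration of the FormatScenario idea, the sketch below regroups an identifier with a random separator and random casing. The separator set, the grouping rule, and the function name are assumptions; SynPII's actual recipes are not shown in this document.

```python
import random

# Hypothetical separator choices, including "no separator".
SEPARATORS = ["", " ", "-", "/", "."]


def vary_format(value: str, group_size: int, rng: random.Random) -> str:
    """Strip existing separators, regroup the characters, and randomize casing."""
    core = "".join(ch for ch in value if ch.isalnum())
    groups = [core[i:i + group_size] for i in range(0, len(core), group_size)]
    text = rng.choice(SEPARATORS).join(groups)
    # Flip a coin between all-uppercase and all-lowercase output.
    return text.upper() if rng.random() < 0.5 else text.lower()
```

The alphanumeric content is unchanged, so a recognizer that normalizes its input before validating should still accept every variant.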
To prevent the Prompt Evolution system from "cheating" by memorizing synthetic patterns, the LeakageChecker monitors prompts for:
- Template Artifacts: Detecting internal placeholder formats (e.g., {{PLACEHOLDER}}).
- Synthetic Regularity: Identifying overly consistent naming patterns (e.g., Person_1).
- Memorization Gap: Comparing performance on synthetic vs. novel/human-audited samples.
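
The three checks above can be sketched with two regular expressions and a score difference. The patterns and function names are assumptions chosen to match the examples given (`{{PLACEHOLDER}}`, `Person_1`); the real LeakageChecker may use different rules.

```python
import re
from typing import Dict, List

# Assumed placeholder syntax: an identifier wrapped in double braces.
TEMPLATE_ARTIFACT = re.compile(r"\{\{\s*[A-Za-z_][A-Za-z0-9_]*\s*\}\}")
# Assumed synthetic naming scheme: capitalized word, underscore, number.
SYNTHETIC_NAME = re.compile(r"\b[A-Z][a-z]+_\d+\b")


def find_leakage(prompt: str) -> Dict[str, List[str]]:
    """Return the suspicious substrings found in a prompt, grouped by check."""
    return {
        "template_artifacts": TEMPLATE_ARTIFACT.findall(prompt),
        "synthetic_regularity": SYNTHETIC_NAME.findall(prompt),
    }


def memorization_gap(synthetic_f1: float, novel_f1: float) -> float:
    """Positive values mean the prompt does better on synthetic than on novel data."""
    return synthetic_f1 - novel_f1
```

A prompt that triggers either pattern, or shows a large positive gap, would be a candidate for rejection during fitness selection.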
Beyond aggregate F1, PIIgent tracks:
| Category | Metrics |
|---|---|
| Span Matching | Exact P/R/F1, Partial P/R/F1 (50% overlap threshold) |
| Type Matching | Type-only P/R/F1 (ignore boundary errors) |
| Boundary Analysis | Off-by-one errors, partial span rate |
| Calibration | Expected Calibration Error (ECE), overconfidence rate |
| Per-Entity-Type | Confusion matrix, entity-specific F2 |
| Multi-Recognizer | Agreement rate, disagreement analysis |
| Generalization | Gap between synthetic validation and novel test sets |
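
The difference between exact and partial span matching can be made concrete with a short sketch. It assumes that "50% overlap" means an intersection-over-union of at least 0.5 between spans of the same type, and it uses greedy one-to-one matching. Both are assumptions, as are the names `Span` and `precision_recall_f1`.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def overlap_ratio(a: Span, b: Span) -> float:
    """Intersection over union of two character spans."""
    intersection = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union else 0.0


def precision_recall_f1(gold: List[Span], predicted: List[Span],
                        threshold: float = 1.0) -> Tuple[float, float, float]:
    """Score predictions; threshold=1.0 is exact matching, 0.5 is partial."""
    matched_gold = set()
    true_positives = 0
    for pred in predicted:
        for index, ref in enumerate(gold):
            # Each gold span may be matched once, and labels must agree.
            if index in matched_gold or ref.label != pred.label:
                continue
            if overlap_ratio(ref, pred) >= threshold:
                matched_gold.add(index)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

Comparing the two thresholds on the same predictions shows how much of the exact-match error comes from boundary mistakes rather than missed entities.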