Entity duplication in the de-anonymization process #1

@lamerentertainment

Thanks again for this astonishing release. I use it heavily nearly every day! My issue:

During anonymization, a multi-word PII entity such as “Hans Müller” is replaced with multiple identical placeholders (e.g., two instances of 1_person). In the sidebar view, these placeholders are grouped as a single entity and shown uniformly as 1_person (Hans Müller, “Will replace [1_person] → Hans Müller”). During de-anonymization, however, each occurrence of the duplicated placeholder is restored independently to the full name Hans Müller. As a result, the restored text mentions the entity twice ("Hans Müller Hans Müller") even though the original contained it only once.
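
To make the report easier to reproduce, here is a minimal Python sketch of the behaviour described above. It is not the project's code: the function names, the bracketed placeholder format in the text, and the mapping structure are all hypothetical. It only assumes that every word of an entity receives the same placeholder and that restoration replaces each placeholder occurrence on its own.

```python
def anonymize_per_word(text, entity, placeholder):
    """Replace each word of `entity` with the same placeholder (hypothetical model of the bug)."""
    for word in entity.split():
        text = text.replace(word, f"[{placeholder}]")
    return text


def deanonymize(text, mapping):
    """Restore every placeholder occurrence independently to the full entity."""
    for placeholder, entity in mapping.items():
        text = text.replace(f"[{placeholder}]", entity)
    return text


original = "Hans Müller called."
anonymized = anonymize_per_word(original, "Hans Müller", "1_person")
restored = deanonymize(anonymized, {"1_person": "Hans Müller"})

print(anonymized)  # [1_person] [1_person] called.
print(restored)    # Hans Müller Hans Müller called.
```

The round trip does not return the original text, which matches the duplication seen after de-anonymization.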

See the pull request for a possible solution: append suffixes to the placeholders so that each one marks an individual word of the entity.
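
The suffix idea could look roughly like the following Python sketch. It is an illustration only and may differ from what the pull request actually does: the numeric suffix format (`1_person_1`, `1_person_2`), the function names, and the per-word mapping are assumptions made for this example.

```python
def anonymize_with_suffixes(text, entity, placeholder):
    """Give each word of `entity` its own suffixed placeholder (hypothetical format)."""
    mapping = {}
    for index, word in enumerate(entity.split(), start=1):
        key = f"{placeholder}_{index}"  # e.g. 1_person_1, 1_person_2
        mapping[key] = word
        text = text.replace(word, f"[{key}]")
    return text, mapping


def deanonymize_with_suffixes(text, mapping):
    """Restore each suffixed placeholder to the single word it stands for."""
    for key, word in mapping.items():
        text = text.replace(f"[{key}]", word)
    return text


original = "Hans Müller called. Müller left."
anonymized, mapping = anonymize_with_suffixes(original, "Hans Müller", "1_person")
restored = deanonymize_with_suffixes(anonymized, mapping)

print(anonymized)  # [1_person_1] [1_person_2] called. [1_person_2] left.
print(restored)    # Hans Müller called. Müller left.
```

Because each placeholder maps back to exactly one word, the round trip reproduces the original text, and partial mentions such as the surname alone are preserved as well.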
