Arbitrary HTML present after sanitization because of unicode normalization

Impact

With keep_typographic_whitespace=False (the default), the sanitizer normalizes Unicode to the NFKC form as its final step. Some Unicode characters normalize to chevrons (the angle brackets < and >), so specially crafted input can pass through sanitization as harmless text and only turn into HTML markup afterwards.
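The normalization behaviour behind this can be reproduced with the Python standard library alone. The following is a minimal sketch, not the sanitizer's own code: the `payload` string is a made-up illustration, and the fullwidth characters U+FF1C and U+FF1E are just one pair of code points that NFKC maps to ASCII angle brackets.

```python
import unicodedata

# Hypothetical payload: the "brackets" are FULLWIDTH LESS-THAN SIGN (U+FF1C)
# and FULLWIDTH GREATER-THAN SIGN (U+FF1E), not ASCII < and >.
payload = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"

# Before normalization the string contains no ASCII angle brackets,
# so an HTML parser treats it as plain text.
assert "<" not in payload and ">" not in payload

# NFKC applies compatibility mappings, which turn the fullwidth
# forms into their ASCII equivalents.
normalized = unicodedata.normalize("NFKC", payload)
print(normalized)  # <script>alert(1)</script>
```

Because the conversion happens after the HTML has already been cleaned, nothing inspects the newly created tags.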

Patches

The problem has been fixed in version 2.4.2.

Workarounds

Set keep_typographic_whitespace=True explicitly, or normalize the input to NFKC yourself before passing it to the sanitizer.
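The second workaround might look like the sketch below. The helper name `sanitize_prenormalized` and its `sanitize` parameter are hypothetical, introduced only for illustration; in practice the callable would be whatever sanitizing function the application already uses (for html-sanitizer, presumably the `sanitize` method of a `Sanitizer` instance).

```python
import unicodedata
from typing import Callable


def sanitize_prenormalized(html: str, sanitize: Callable[[str], str]) -> str:
    """Normalize to NFKC first, then hand the result to the sanitizer.

    `sanitize` is a placeholder for the application's sanitizing callable;
    it is an assumption of this sketch, not part of any library API.
    """
    # After this step, any character that NFKC maps to "<" or ">" is
    # already an ASCII angle bracket, so the sanitizer sees real markup.
    normalized = unicodedata.normalize("NFKC", html)
    return sanitize(normalized)
```

Normalizing text that is already in NFKC form leaves it unchanged, so a later normalization pass inside the sanitizer should not introduce new angle brackets from the characters in the original input.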

References

matthiask published to matthiask/html-sanitizer May 5, 2024

Published to the GitHub Advisory Database May 6, 2024

Reviewed May 6, 2024

Last updated May 6, 2024

