Arbitrary HTML present after sanitization because of unicode normalization

High severity GitHub Reviewed Published May 5, 2024 in matthiask/html-sanitizer • Updated May 6, 2024

Package

html-sanitizer (pip)

Affected versions

< 2.4.2

Patched versions

2.4.2

Description

Impact

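With keep_typographic_whitespace=False (the default), the sanitizer normalizes its output to the Unicode NFKC form as the final step, after the HTML has already been sanitized. Some Unicode characters, such as the fullwidth less-than and greater-than signs (U+FF1C and U+FF1E), normalize to the ASCII chevrons < and >. Specially crafted input can therefore pass through sanitization as harmless text and only become HTML tags afterwards, which lets arbitrary HTML escape sanitization.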
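
The normalization behavior behind this issue can be reproduced with Python's standard library alone. The following is a minimal sketch, not the library's own code: the payload string is a made-up illustration, and the fullwidth signs U+FF1C and U+FF1E are just one pair of characters that NFKC maps to chevrons.

```python
import unicodedata

# Fullwidth less-than / greater-than signs; neither is an ASCII chevron,
# so a sanitizer that looks for "<" and ">" sees plain text here.
payload = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"
assert "<" not in payload and ">" not in payload

# NFKC applies compatibility mappings, which fold the fullwidth signs to ASCII.
normalized = unicodedata.normalize("NFKC", payload)
print(normalized)  # <script>alert(1)</script>
```

Because the text contains no ASCII chevrons before normalization, any check that runs before the NFKC step has nothing to strip.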

Patches

The problem has been fixed in 2.4.2.

Workarounds

Set keep_typographic_whitespace=True explicitly, or normalize the input to NFKC yourself before passing it to the sanitizer.
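
The second workaround could look roughly like the sketch below. The helper name sanitize_prenormalized and its sanitize parameter are hypothetical and not part of html-sanitizer; html.escape is used only as a stand-in so that the example runs without the package installed.

```python
import html
import unicodedata
from typing import Callable


def sanitize_prenormalized(markup: str, sanitize: Callable[[str], str]) -> str:
    """Normalize to NFKC first, then hand the result to the sanitizer."""
    # After this step, characters that fold to "<" or ">" are already ASCII,
    # so the sanitizer sees them as real chevrons.
    normalized = unicodedata.normalize("NFKC", markup)
    return sanitize(normalized)


# Stand-in sanitizer for demonstration only: html.escape escapes every chevron.
result = sanitize_prenormalized("\uff1cb\uff1ebold\uff1c/b\uff1e", html.escape)
print(result)  # &lt;b&gt;bold&lt;/b&gt;
```

With html-sanitizer itself, the callable would presumably be the sanitize method of a Sanitizer instance. The first workaround would then amount to passing {"keep_typographic_whitespace": True} in the settings dictionary given to Sanitizer, assuming the installed version accepts settings that way. Upgrading to 2.4.2 remains the fix named by the advisory.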

References

@matthiask matthiask published to matthiask/html-sanitizer May 5, 2024
Published to the GitHub Advisory Database May 6, 2024
Reviewed May 6, 2024
Last updated May 6, 2024

Severity

High

EPSS score

Exploit Prediction Scoring System (EPSS)

This score estimates the probability of this vulnerability being exploited within the next 30 days. Data provided by FIRST.
(11th percentile)

Weaknesses

No CWEs

CVE ID

CVE-2024-34078

GHSA ID

GHSA-wvhx-q427-fgh3
