fix: cache poisoning when fonts collide on basename with different To…#596
RayVR wants to merge 2 commits into
…Unicode streams Signed-off-by: Raymond Roberts <mail@rayvroberts.com>
Hi @RayVR, thanks for this, really clean fix. The root-cause analysis in #595 was spot-on, and the subset-prefix detector matches PDF 32000-1 §9.6.4 exactly (… A few items before merge:
Please address
1. Legacy tests at …
2. Silent semantic change at …
3. …
The cost of option (b) is at most one re-parse per such font.
Minors
Out of scope for this PR, intentionally deferred
Per spec §9.6.5 (Type 3, …
Both are real and have the same shape as your fix; please mention them in this PR's description under a "Follow-ups (not in this PR)" section so the scope boundary is explicit. After this lands, I'd suggest #595 stays open until #597 and #598 are also resolved, or we close #595 and let #597/#598 carry the residual. Thanks again: careful spec work and a real regression test, exactly the contribution shape I love to see.
…l-safe for missing BaseFont

Fonts without a BaseFont key now default to is_subset=true so they are never served from the global cross-document cache. The two combined-hash call sites now destructure the (hash, is_subset) tuple so only the hash participates in the page-level font-set fingerprint. Legacy tests updated to match the new return type. Doc comment clarified, CHANGELOG entry added.
Description
Resolves issue #595.
Type of Change
Related Issues
Fixes #595
Changes Made
font_identity_hash_cheap — Return type changed from u64 to (u64, bool). The bool indicates whether the font has a subset prefix (AAAAAA+). Subset fonts have document-specific ToUnicode mappings that are unsafe to share across documents.
load_fonts — The global cross-document font cache (Layer 6) is now skipped for subset fonts, both for lookup and insertion. The per-document cache (Layer 5) still works for subset fonts.
Tests — Updated 4 existing tests to destructure the new return type. Added 1 new test test_font_identity_hash_detects_subset_prefix that verifies AAAAAA+ArialUnicodeMS is detected as subset and ArialUnicodeMS is not.
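The subset check and the new return shape can be sketched as follows. This is a minimal illustration, not the PR's actual code: `has_subset_prefix` and `identity_hash_sketch` are hypothetical names, the real `font_identity_hash_cheap` presumably hashes more than the BaseFont name, and the tag rule (six uppercase ASCII letters followed by `+`) is the one from PDF 32000-1 §9.6.4. The `None => true` arm mirrors the follow-up commit's nil-safe default for fonts without a BaseFont key.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: true when `base_font` starts with a subset tag,
/// i.e. exactly six ASCII uppercase letters followed by '+'.
fn has_subset_prefix(base_font: &str) -> bool {
    let bytes = base_font.as_bytes();
    bytes.len() >= 7
        && bytes[6] == b'+'
        && bytes[..6].iter().all(|b| b.is_ascii_uppercase())
}

/// Hypothetical stand-in for the changed function: returns (hash, is_subset).
/// Assumption: only the BaseFont name is hashed here, to keep the sketch short.
fn identity_hash_sketch(base_font: Option<&str>) -> (u64, bool) {
    let mut hasher = DefaultHasher::new();
    base_font.hash(&mut hasher);
    let is_subset = match base_font {
        Some(name) => has_subset_prefix(name),
        // A missing BaseFont is treated as a subset, so the font is never
        // shared across documents.
        None => true,
    };
    (hasher.finish(), is_subset)
}
```

Callers that only need the fingerprint can destructure the tuple, for example `let (hash, _is_subset) = identity_hash_sketch(Some("ArialUnicodeMS"));`, so the flag never feeds into a combined hash.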
Root cause: Two different PDFs with the same subset-prefixed font name (e.g., AAAAAA+ArialUnicodeMS) but different glyph subsets would share a cached FontInfo across documents. The first document's ToUnicode CMap would be used for the second document, mapping characters to wrong codepoints (e.g., ° → 日).
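To make the failure mode concrete, here is a rough sketch of the cache gating described above. The types `FontInfoSketch` and `FontCachesSketch` and the method `load_font` are hypothetical simplifications, not the project's real `FontInfo` or `load_fonts`; the only behavior taken from this PR is that subset fonts skip the global cache for both lookup and insertion while the per-document cache still applies.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed font; only the ToUnicode map matters here.
#[derive(Clone, Debug, PartialEq)]
struct FontInfoSketch {
    to_unicode: HashMap<u8, char>,
}

/// Hypothetical holder for the cross-document cache (the "Layer 6" role).
#[derive(Default)]
struct FontCachesSketch {
    global: HashMap<u64, FontInfoSketch>,
}

impl FontCachesSketch {
    /// Look up a font, consulting the per-document map first (the "Layer 5"
    /// role) and the global map only when the font is not a subset.
    fn load_font(
        &mut self,
        per_document: &mut HashMap<u64, FontInfoSketch>,
        hash: u64,
        is_subset: bool,
        parse: impl FnOnce() -> FontInfoSketch,
    ) -> FontInfoSketch {
        if let Some(hit) = per_document.get(&hash) {
            return hit.clone();
        }
        // Subset fonts never read from the global cache.
        if !is_subset {
            if let Some(hit) = self.global.get(&hash) {
                per_document.insert(hash, hit.clone());
                return hit.clone();
            }
        }
        let font = parse();
        per_document.insert(hash, font.clone());
        // Subset fonts never write to the global cache either.
        if !is_subset {
            self.global.insert(hash, font.clone());
        }
        font
    }
}
```

In this sketch, two documents that both present hash `42` with `is_subset = true` each run their own `parse` closure and keep their own ToUnicode map, so the second document can no longer inherit the first document's mapping.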
Testing
cargo test --all-features
cargo clippy -- -D warnings
cargo fmt
Python Bindings (if applicable)
ruff format
ruff check
Documentation
Checklist
feat:, fix:, docs:)
Screenshots (if applicable)
Additional Notes