fix(writer): emit annotation /AP appearance streams as indirect objects#713
Conversation
yfedoseev
left a comment
There was a problem hiding this comment.
Review — spec-aligned, correct on both writer paths
Thanks @norbusan. Checked this against the PDF spec (docs/spec/pdf.md) and the writer internals — the fix is sound.
Spec grounding
- The rule this fixes: "All streams shall be indirect objects" (
pdf.md:1884). An inline/N << … >> stream … endstreamnested directly in an annotation's/APdict is malformed; readers (MuPDF/PyMuPDF/Acrobat) can reject or drop the appearance. ✅ /AP/N(required),/R,/D(optional), each a stream or appearance subdictionary of named states (Table 168,pdf.md:26327; subdictionary example at:26341) — the fix handles exactly these shapes.
Implementation
hoist_appearance_streams(src/writer/object_serializer.rs:422) walks/AP → {N,R,D}, replaces each directObject::Streamwith anObject::Referenceto a freshly-allocated indirect object (:431, viastd::mem::replaceso BBox/Resources/Type/Subtype are preserved), and recurses one level into appearance-state subdictionaries. No-op when there's nothing to hoist, so it's safe to call unconditionally.- Verified the only inline-AP construction site is the watermark builder (
src/writer/watermark.rs:368); both writer paths (builder + editor save-page) route through the hoist, and/Size/xref accounting is computed after hoisting, so new object ids and xref entries line up.
Minor, non-blocking notes (latent, not current defects):
/ASis required when/APcontains state subdictionaries (pdf.md:26084). Today's only caller uses a single-stream/N, so this is latent — but if a future caller builds a subdictionary/Nwithout setting/AS, the annotation is still under-specified. Worth a comment or follow-up if subdictionary APs become a real path.- The hoist recurses one level into subdictionaries — correct for the valid
/APshapes; just noting the depth assumption.
Not a code issue: the only red check is Set up Ruby 3.2 on windows-latest (the setup-ruby action failing to provision Ruby), not your change. A rebase onto main (which now has ruby/setup-ruby 1.312.0) should clear it.
LGTM on correctness.
Watermark annotations (and any annotation whose appearance is built as a
nested Object::Stream under /AP) were serialized with the stream inline
inside the annotation dictionary:
/AP <</N <</BBox [...]/Length n ...>> stream ... endstream>> ...
A PDF stream must be an indirect object (ISO 32000-1 §7.3.8); an inline
stream as a direct dict value is invalid. Compliant viewers (e.g. MuPDF)
reject the annotation with "invalid key in dict", so the watermark never
renders -- the file looked fine to byte-substring assertions but showed
nothing on the page.
Add a shared `hoist_appearance_streams` helper that lifts nested /N, /D
and /R streams (including named-state sub-dictionaries) into freshly
allocated indirect objects, rewriting the slot to a reference. Apply it
on both serialization paths: the document-builder writer (PdfWriter) and
the existing-page editor (DocumentEditor::save_page).
Verified with MuPDF: the watermark annotation now parses and renders on
both paths. Existing watermark tests only asserted byte presence; add
unit tests for the helper plus a builder regression test asserting the
appearance is an indirect reference, never an inline stream.
Signed-off-by: Norbert Preining <norbert@preining.info>
c25d34d to
a7306d1
Compare
|
Thanks @yfedoseev I have a follow-up PR that was the reason I found this issue, the PR exposes WatermarkAnnotation and rotate to the Python API, since this is what I need. The current watermark is too restrictive (font decided, no rotation, no grey leveling, etc). |
|
Thanks for the rebase — the Windows Ruby job is no longer failing and the rerun is green so far. I've also done a line-by-line check of the change against the spec (ISO 32000-1) and it holds up:
One latent note, non-blocking: Will approve and merge once the remaining checks finish. On the follow-up exposing |
Actually, I don't add a new API, just expose the existing one in Python. It seems not to be exposed there. I would need help how to do this for all the others, though. |
yfedoseev
left a comment
There was a problem hiding this comment.
Approve. Spec-correct, both writer paths covered, well-tested.
hoist_appearance_streamslifts nested/N,/R,/Dappearance streams (including named-state subdictionaries) into freshly-allocated indirect objects and rewrites the slots to references; wired into both the PdfWriter builder path and DocumentEditor::save_page.- pdf.md-aligned, textbook: §7.3.8.1 'All streams shall be indirect objects' (pdf.md:1884) — both halves satisfied (stream indirect, dict preserved). Table 168 /N/R/D shape matched; §8.10 form-XObject constraint respected.
- No dangling refs / double-writes; object-numbering and /Size derivation include hoisted ids on both paths. Unit + builder regression tests assert /N 0 R and never inline /N <<.
Non-blocking: /AS (required alongside state subdicts, Table 164) stays the annotation builder's responsibility — latent only, since the current watermark caller uses single-stream /N.
|
Thanks for the approval. Is there anything else I am supposed to do? |
…nce-indirect-stream # Conflicts: # CHANGELOG.md
Description
Watermark annotations (and any annotation whose appearance is built as a nested Object::Stream under /AP) were serialized with the stream inline inside the annotation dictionary:
A PDF stream must be an indirect object (ISO 32000-1 §7.3.8); an inline stream as a direct dict value is invalid. Compliant viewers (e.g. MuPDF) reject the annotation with "invalid key in dict", so the watermark never renders -- the file looked fine to byte-substring assertions but showed nothing on the page.
Add a shared
hoist_appearance_streamshelper that lifts nested /N, /D and /R streams (including named-state sub-dictionaries) into freshly allocated indirect objects, rewriting the slot to a reference. Apply it on both serialization paths: the document-builder writer (PdfWriter) and the existing-page editor (DocumentEditor::save_page).Verified with MuPDF: the watermark annotation now parses and renders on both paths. Existing watermark tests only asserted byte presence; add unit tests for the helper plus a builder regression test asserting the appearance is an indirect reference, never an inline stream.
Type of Change
Testing
cargo test --all-featurescargo clippy -- -D warningscargo fmtPython Bindings (if applicable)
ruff formatruff checkDocumentation
Checklist
feat:,fix:,docs:)