fix(writer): emit annotation /AP appearance streams as indirect objects #713

Merged
yfedoseev merged 3 commits into yfedoseev:main from norbusan:fix/watermark-appearance-indirect-stream on Jun 11, 2026
Conversation

@norbusan norbusan commented Jun 9, 2026

Contributor

Description

Watermark annotations (and any annotation whose appearance is built as a nested Object::Stream under /AP) were serialized with the stream inline inside the annotation dictionary:

/AP <</N <</BBox [...]/Length n ...>> stream ... endstream>> ...

A PDF stream must be an indirect object (ISO 32000-1 §7.3.8); an inline stream as a direct dict value is invalid. Compliant viewers (e.g. MuPDF) reject the annotation with "invalid key in dict", so the watermark never renders -- the file looked fine to byte-substring assertions but showed nothing on the page.

Add a shared hoist_appearance_streams helper that lifts nested /N, /D and /R streams (including named-state sub-dictionaries) into freshly allocated indirect objects, rewriting the slot to a reference. Apply it on both serialization paths: the document-builder writer (PdfWriter) and the existing-page editor (DocumentEditor::save_page).
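For readers unfamiliar with the technique, here is a minimal sketch of what such a hoist could look like. It is not the code from this PR: the `Object` enum, the `hoist_slot` function, the `next_id` counter and the returned `Vec` are simplified stand-ins assumed for illustration, and the crate's real types and signature may differ.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a PDF object model (hypothetical, not the crate's real types).
#[derive(Debug, Clone, PartialEq)]
pub enum Object {
    Name(String),
    Dictionary(BTreeMap<String, Object>),
    Stream { dict: BTreeMap<String, Object>, data: Vec<u8> },
    Reference(u32, u16),
}

/// If `slot` holds a direct stream, move it out and leave a reference behind.
fn hoist_slot(slot: &mut Object, next_id: &mut u32, out: &mut Vec<(u32, Object)>) {
    if matches!(slot, Object::Stream { .. }) {
        let id = *next_id;
        *next_id += 1;
        // mem::replace moves the whole stream out, so its dictionary stays intact.
        let stream = std::mem::replace(slot, Object::Reference(id, 0));
        out.push((id, stream));
    }
}

/// Hoist /N, /R and /D streams under /AP, descending one level into
/// named-state subdictionaries. Returns the (id, stream) pairs that the
/// caller must then write as indirect objects.
pub fn hoist_appearance_streams(
    annot: &mut BTreeMap<String, Object>,
    next_id: &mut u32,
) -> Vec<(u32, Object)> {
    let mut out = Vec::new();
    if let Some(Object::Dictionary(ap)) = annot.get_mut("AP") {
        for key in ["N", "R", "D"] {
            match ap.get_mut(key) {
                // Appearance-state subdictionary: hoist each state's stream.
                Some(Object::Dictionary(states)) => {
                    for state in states.values_mut() {
                        hoist_slot(state, next_id, &mut out);
                    }
                }
                // Single stream (or an existing reference, which is left alone).
                Some(slot) => hoist_slot(slot, next_id, &mut out),
                None => {}
            }
        }
    }
    out
}
```

In this sketch a second call on the same dictionary finds only references and returns an empty list, which is why calling it unconditionally is harmless.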

Verified with MuPDF: the watermark annotation now parses and renders on both paths. Existing watermark tests only asserted byte presence; add unit tests for the helper plus a builder regression test asserting the appearance is an indirect reference, never an inline stream.
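To illustrate the difference between asserting byte presence and asserting structure, the following is a rough sketch of a check that looks at what follows the `/N` key in a serialized annotation dictionary. The `NormalAppearance` enum and `classify_normal_appearance` function are hypothetical names, not part of this PR, and the scan is deliberately naive (it is not a PDF parser).

```rust
/// What follows the `/N` key in a serialized annotation dictionary (illustrative only).
#[derive(Debug, PartialEq)]
pub enum NormalAppearance {
    IndirectRef(u32, u16),
    InlineDict,
    Missing,
}

/// Naive textual scan: finds a standalone `/N` key and classifies its value.
pub fn classify_normal_appearance(serialized: &str) -> NormalAppearance {
    for (pos, _) in serialized.match_indices("/N") {
        let rest = &serialized[pos + 2..];
        // The name must end here; `/Name` or `/N1` are different keys.
        if !rest.starts_with(|c: char| c.is_ascii_whitespace() || c == '<') {
            continue;
        }
        let rest = rest.trim_start();
        if rest.starts_with("<<") {
            // A dictionary directly after /N: the inline (invalid) stream shape,
            // or a state subdictionary, which this sketch does not distinguish.
            return NormalAppearance::InlineDict;
        }
        // Otherwise expect `<id> <generation> R`.
        let mut tokens = rest.split_ascii_whitespace();
        let id = tokens.next().and_then(|t| t.parse::<u32>().ok());
        let generation = tokens.next().and_then(|t| t.parse::<u16>().ok());
        let is_ref = tokens.next().map_or(false, |t| t.starts_with('R'));
        if let (Some(id), Some(generation), true) = (id, generation, is_ref) {
            return NormalAppearance::IndirectRef(id, generation);
        }
    }
    NormalAppearance::Missing
}
```

A byte-substring assertion such as "the output contains `/AP`" passes for both shapes; a check along these lines fails for the inline one.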

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Tests
  • CI/CD changes

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • All new and existing tests pass locally
  • I have run cargo test --all-features
  • I have run cargo clippy -- -D warnings
  • I have run cargo fmt

Python Bindings (if applicable)

  • Python bindings updated (if needed)
  • Python tests pass
  • Python code formatted with ruff format
  • Python code linted with ruff check

Documentation

  • I have updated the documentation (README, docs/, code comments)
  • I have added/updated examples (if applicable)
  • I have updated CHANGELOG.md

Checklist

  • My code follows the project's coding guidelines (see CONTRIBUTING.md)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have checked my code and corrected any misspellings
  • The PR title follows conventional commits format (e.g., feat:, fix:, docs:)

@norbusan norbusan requested a review from yfedoseev as a code owner June 9, 2026 07:03

@yfedoseev yfedoseev left a comment

Owner

Review — spec-aligned, correct on both writer paths

Thanks @norbusan. Checked this against the PDF spec (docs/spec/pdf.md) and the writer internals — the fix is sound.

Spec grounding

  • The rule this fixes: "All streams shall be indirect objects" (pdf.md:1884). An inline /N << … >> stream … endstream nested directly in an annotation's /AP dict is malformed; readers (MuPDF/PyMuPDF/Acrobat) can reject or drop the appearance. ✅
  • /AP /N (required), /R, /D (optional), each a stream or appearance subdictionary of named states (Table 168, pdf.md:26327; subdictionary example at :26341) — the fix handles exactly these shapes.

Implementation

  • hoist_appearance_streams (src/writer/object_serializer.rs:422) walks /AP → {N,R,D}, replaces each direct Object::Stream with an Object::Reference to a freshly-allocated indirect object (:431, via std::mem::replace so BBox/Resources/Type/Subtype are preserved), and recurses one level into appearance-state subdictionaries. No-op when there's nothing to hoist, so it's safe to call unconditionally.
  • Verified the only inline-AP construction site is the watermark builder (src/writer/watermark.rs:368); both writer paths (builder + editor save-page) route through the hoist, and /Size/xref accounting is computed after hoisting, so new object ids and xref entries line up.

Minor, non-blocking notes (latent, not current defects):

  1. /AS is required when /AP contains state subdictionaries (pdf.md:26084). Today's only caller uses a single-stream /N, so this is latent — but if a future caller builds a subdictionary /N without setting /AS, the annotation is still under-specified. Worth a comment or follow-up if subdictionary APs become a real path.
  2. The hoist recurses one level into subdictionaries — correct for the valid /AP shapes; just noting the depth assumption.

Not a code issue: the only red check is Set up Ruby 3.2 on windows-latest (the setup-ruby action failing to provision Ruby), not your change. A rebase onto main (which now has ruby/setup-ruby 1.312.0) should clear it.

LGTM on correctness.

Signed-off-by: Norbert Preining <norbert@preining.info>
@norbusan norbusan force-pushed the fix/watermark-appearance-indirect-stream branch from c25d34d to a7306d1 Compare June 10, 2026 05:57
@norbusan

Contributor Author

Thanks @yfedoseev
I have rebased onto main and hope CI will now pass.

I have a follow-up PR, which is how I found this issue. It exposes WatermarkAnnotation and rotate to the Python API, since that is what I need. The current watermark is too restrictive (fixed font, no rotation, no grey level control, etc.).

@yfedoseev

Owner

Thanks for the rebase — the Windows Ruby job is no longer failing and the rerun is green so far. I've also done a line-by-line check of the change against the spec (ISO 32000-1) and it holds up:

  • The hoist satisfies both halves of §7.3.8 — "All streams shall be indirect objects and the stream dictionary shall be a direct object": nested streams become references to freshly-allocated indirect objects, and each stream's dict moves out intact (mem::replace), so /Type /XObject, /Subtype /Form, /BBox, /Resources are all preserved.
  • The shapes handled match Table 168 exactly: /N, /R, /D, each either a single stream or a one-level appearance-state subdictionary — which is the only nesting the spec permits there, so the recursion depth is right.
  • Bookkeeping checks out on both paths: ids come from the writer's counter (which feeds /Size), hoisted streams get their own xref entries, and on the editor path they're encrypted under their own object id — which matters since encryption keys are derived per id/gen.
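The ordering point in the bookkeeping bullet can be sketched as follows. In ISO 32000-1 the trailer /Size is one greater than the highest object number in the file, so any id handed out during hoisting has to come from the same counter and be allocated before /Size is read. `IdAllocator` and its methods are hypothetical names for illustration, not the writer's actual internals.

```rust
/// Hypothetical object-id allocator standing in for the writer's counter.
pub struct IdAllocator {
    next: u32,
}

impl IdAllocator {
    /// Object number 0 is reserved for the head of the xref free list,
    /// so the first allocated id is 1.
    pub fn new() -> Self {
        Self { next: 1 }
    }

    /// Hand out the next unused object number.
    pub fn alloc(&mut self) -> u32 {
        let id = self.next;
        self.next += 1;
        id
    }

    /// Trailer /Size: one greater than the highest object number in use.
    /// Must be read only after every object, including hoisted streams,
    /// has been allocated.
    pub fn trailer_size(&self) -> u32 {
        self.next
    }
}
```

With a page, an annotation and one hoisted appearance stream allocated in that order, the ids are 1, 2 and 3 and `trailer_size()` yields 4. Reading the size before the hoist would give 3 and leave the hoisted stream outside the cross-reference table's declared range.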

One latent note, non-blocking: /AS is required when /AP contains state subdictionaries (Table 164). The helper hoists those correctly but adding /AS stays the annotation builder's job — fine today since the watermark uses a single-stream /N, just worth keeping in mind if subdictionary appearances become a real path.
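If subdictionary appearances do become a real path, a builder-side guard could look roughly like the sketch below. It flags an annotation whose /AP holds a state subdictionary under /N, /R or /D while /AS is absent. The `Object` enum and `missing_appearance_state` are illustrative assumptions, not existing crate APIs.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a PDF object model (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Object {
    Name(String),
    Dictionary(BTreeMap<String, Object>),
    Reference(u32, u16),
}

/// Returns true when /AP contains an appearance-state subdictionary
/// but the annotation has no /AS name selecting one of the states.
pub fn missing_appearance_state(annot: &BTreeMap<String, Object>) -> bool {
    let Some(Object::Dictionary(ap)) = annot.get("AP") else {
        return false; // no appearance dictionary, nothing to check
    };
    // After hoisting, a single appearance is a Reference; a Dictionary
    // in one of these slots therefore means named states.
    let has_states = ["N", "R", "D"]
        .iter()
        .any(|key| matches!(ap.get(*key), Some(Object::Dictionary(_))));
    let has_as = matches!(annot.get("AS"), Some(Object::Name(_)));
    has_states && !has_as
}
```

The check assumes it runs after hoisting, when a direct dictionary in an /N, /R or /D slot can only be a state subdictionary.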

Will approve and merge once the remaining checks finish.

On the follow-up exposing WatermarkAnnotation/rotation to Python: sounds great, and agreed the current watermark API is too restrictive. One heads-up before you open it: new public API in this repo ships across all language bindings (Python, Node, WASM, C FFI, C#, Go, Ruby), not Python-only. If you want to keep the PR reviewable, Rust core + Python first is fine and we can coordinate the remaining bindings, but the end state needs to cover all of them.

@norbusan

Contributor Author

new public API in this repo ships across all language bindings (Python, Node, WASM, C FFI, C#, Go, Ruby), not Python-only

Actually, I don't add a new API, just expose the existing one in Python.

It seems not to be exposed there.

I would need help with doing this for all the other bindings, though.

@yfedoseev yfedoseev left a comment

Owner

Approve. Spec-correct, both writer paths covered, well-tested.

  • hoist_appearance_streams lifts nested /N,/R,/D appearance streams (including named-state subdictionaries) into freshly-allocated indirect objects and rewrites the slots to references; wired into both the PdfWriter builder path and DocumentEditor::save_page.
  • pdf.md-aligned, textbook: §7.3.8.1 'All streams shall be indirect objects' (pdf.md:1884) — both halves satisfied (stream indirect, dict preserved). Table 168 /N/R/D shape matched; §8.10 form-XObject constraint respected.
  • No dangling refs / double-writes; object-numbering and /Size derivation include hoisted ids on both paths. Unit + builder regression tests assert that /N is an indirect reference (/N n 0 R) and never an inline /N <<.

Non-blocking: /AS (required alongside state subdicts, Table 164) stays the annotation builder's responsibility — latent only, since the current watermark caller uses single-stream /N.

@norbusan

Contributor Author

Thanks for the approval. Is there anything else I am supposed to do?

@yfedoseev yfedoseev merged commit c1b35fe into yfedoseev:main Jun 11, 2026
185 checks passed
@norbusan norbusan deleted the fix/watermark-appearance-indirect-stream branch June 11, 2026 08:12
@yfedoseev yfedoseev mentioned this pull request Jun 12, 2026