Fix parsing of invalid PDFs containing non-existing indirect stream length values
gettalong committed Sep 18, 2024
1 parent 9ed078f commit 70cf15c
Showing 3 changed files with 10 additions and 1 deletion.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,8 @@
* [HexaPDF::DigitalSignature::Signing::DefaultHandler] to update the document's
version to 2.0 when using PAdES
* Parsing of invalid `)` character in PDF objects and content streams
+* Handling of files that contain stream length values that are indirect objects
+  that do not exist


## 0.47.0 - 2024-09-07
2 changes: 1 addition & 1 deletion lib/hexapdf/parser.rb
@@ -184,7 +184,7 @@ def parse_indirect_object(offset = nil)
length = if object[:Length].kind_of?(Integer)
object[:Length]
elsif object[:Length].kind_of?(Reference)
-@document.deref(object[:Length]).value
+@document.deref(object[:Length])&.value || 0
else
0
end
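
The one-line change above relies on Ruby's safe navigation operator: if `deref` yields `nil` for a reference that points at no object, `&.value` short-circuits to `nil` and `|| 0` supplies a zero length instead of raising `NoMethodError`. A minimal sketch of that behaviour follows. `StubDocument`, `StubObject`, and `stream_length` are hypothetical stand-ins written for this illustration, and symbols stand in for references; none of this is HexaPDF code.

```ruby
# Hypothetical stand-ins for illustration only, not HexaPDF classes.
StubObject = Struct.new(:value)

class StubDocument
  def initialize(objects)
    @objects = objects
  end

  # Assumption: an unresolvable reference yields nil, as the fix implies.
  def deref(ref)
    @objects[ref]
  end
end

# Mirrors the shape of the length lookup in the diff.
def stream_length(document, length_entry)
  case length_entry
  when Integer
    length_entry
  when Symbol
    # Safe navigation: a nil result from deref falls back to 0.
    document.deref(length_entry)&.value || 0
  else
    0
  end
end

doc = StubDocument.new({ present: StubObject.new(5) })
stream_length(doc, 7)         # direct integer length
stream_length(doc, :present)  # reference that resolves
stream_length(doc, :missing)  # reference that does not resolve
```

Without `&.`, the last call would raise `NoMethodError` on `nil`, which is the failure this commit removes.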
7 changes: 7 additions & 0 deletions test/hexapdf/test_parser.rb
@@ -152,6 +152,13 @@ def create_parser(str)
assert_equal('12', collector(stream.fiber))
end

+it "recovers from a non-existing indirect reference to a stream length value" do
+create_parser("1 0 obj<</Length 2 0 R>> stream\n12(ab\nendstream endobj")
+obj, _, _, stream = @parser.parse_indirect_object
+assert_equal(5, obj[:Length])
+assert_equal('12(ab', collector(stream.fiber))
+end

it "works even if the keyword endobj is missing or mangled" do
create_parser("1 0 obj<</Length 4>>5")
object, * = @parser.parse_indirect_object
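
The new test expects `obj[:Length]` to end up as 5 even though the length falls back to 0, which suggests the parser corrects the length from the actual stream data. One plausible way to do that is to scan forward for the `endstream` keyword. The sketch below shows only that idea; `recover_stream` is a hypothetical helper and is not taken from the HexaPDF source.

```ruby
# Hypothetical helper: find the stream data by locating the keywords
# instead of trusting the declared /Length value.
def recover_stream(raw)
  start_match = raw.match(/stream\r?\n/)
  return nil unless start_match

  data_start = start_match.end(0)
  data_end = raw.index("endstream", data_start)
  return nil unless data_end

  # Drop the end-of-line marker that precedes the endstream keyword.
  data = raw[data_start...data_end].sub(/\r?\n\z/, "")
  [data.bytesize, data]
end

input = "1 0 obj<</Length 2 0 R>> stream\n12(ab\nendstream endobj"
length, data = recover_stream(input)
```

For the input used in the test, this yields a length of 5 and the data `12(ab`, matching the two assertions.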