Description
I was testing this example from the documentation, but print(text_body) printed an empty string. I then changed the visitor function to this:
def visitor_body(text, cm, tm, font_dict, font_size):
    #y = cm[5]
    y = tm[5]
    print(cm, tm)
    if 50 < y < 720:
        parts.append(text)
and ran it again. This produced the expected outcome: print(text_body) printed the text of the 4th page of the PDF without header and footer. All printed cm matrices looked like this: [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]. That is the identity matrix, so cm[5] is always 0.0 and the check 50 < y < 720 never passes, which explains the empty string. I didn't look under the hood of pypdf itself to check where this might be used "incorrectly", but I think the second example on that page has the same problem.
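For illustration, here is a minimal sketch of a visitor that does not depend on which of the two matrices carries the offset. It assumes the usual PDF convention that a matrix [a, b, c, d, e, f] is an affine transform and that the text position in user space is the text matrix multiplied by the current transformation matrix. The helper name combine is my own and not part of pypdf, and I have not checked how pypdf fills cm and tm internally.

```python
def combine(tm, cm):
    """Multiply two PDF matrices [a, b, c, d, e, f]: tm is applied first, then cm.

    Hypothetical helper for this sketch, not a pypdf function.
    """
    a1, b1, c1, d1, e1, f1 = tm
    a2, b2, c2, d2, e2, f2 = cm
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


parts = []


def visitor_body(text, cm, tm, font_dict, font_size):
    # Use the combined matrix so the y offset is found whether it is
    # stored in tm (as in my PDF) or in cm (as the documentation assumes).
    y = combine(tm, cm)[5]
    if 50 < y < 720:
        parts.append(text)
```

With an identity cm, as in my PDF, this reduces to y = tm[5]. It would be passed to the page the same way as in the documentation example, i.e. page.extract_text(visitor_text=visitor_body).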
See the attached SVG files, which I created from the code of the second example. The first one uses the cm matrix in visitor_svg_text and puts all of the text in the top left corner. The second one uses the tm matrix, and while it won't win any beauty contests, it looks a lot better.
Environment
Python 3.10.0, pypdf 5.7.0, Windows 10
Code + PDF
See the links to the examples in the documentation above.
Traceback
N/A