Releases: pymupdf/PyMuPDF
Minor bug fixes, improved Quad recovering for text extractions
Fixes and improved font subsetting
Some hot fixes
Interesting new features and several fixes
Fixes:
Implemented enhancement requests:
- #855, which allows font subsetting using the fontTools package.
- #870, which allows the convert_to_pdf method to be used for PDF documents as well.
- #843, Document.tobytes() (formerly Document.write()) now also supports linearized output. Plus several extensions / improvements around supporting Python file objects.
- Added new methods to quickly determine whether a PDF has annotations or links.
- Extended the Document.scrub() method with a new parameter that also allows removing page thumbnails.
- Added methods to directly inquire and set values in PDF objects, without the need to manipulate PDF object sources in an unwieldy way: see methods Document.xref_set_key() / Document.xref_get_key().
Continued the process of changing the naming convention for class methods and attributes to "snake_case". As announced before, this is a tedious, error-prone process that requires special care to maintain strong backward compatibility for existing scripts.
In future versions, probably in sync with MuPDF v1.19.0, we will remove the definitions of the old names, but a method for re-activating the old aliases will remain available.
Bug Fixes and some new features
The recent introduction of "Discussions" by GitHub has been very motivating for our users.
Based on their feedback, several enhancements have been implemented.
Here is a selection:
- Most Python functions now have typing / annotation support.
- For PDF table-of-contents items, colors are now supported (reading and writing)
- PDF page label support for reading and writing
- Support personalized tagging of new annotations, fields and links for easier selection of relevant objects.
There are also a number of fixes; please consult the documentation.
Minor fixes, improved font metrics handling
Font metrics handling has been improved: text box writing now observes the relevant font properties when determining line heights.
In the course of this work, a new option has been introduced that returns text bboxes (glyphs, spans, text search quads, etc.) which wrap the text more tightly, as opposed to always returning line-height bboxes.
Fixes:
Better Optional Content support
Introducing PDF Optional Content
New features for text searching and more
This resolves
and removes the hit_max parameter from text searching. In addition, hyphenated words around line breaks are still found.
The use of the clip parameter in text searches and text extractions now only includes characters whose bboxes are fully contained in the clip rectangle.