Releases: pymupdf/PyMuPDF

Improvements for drawings extraction and bug fixes

20 Nov 07:38

Improvements:

  • Page.get_drawings() now includes area orientation for rectangles
  • Page pixmap creation has a new parameter "dpi"
  • New check for monochrome / unicolor pixmaps and number of colors

Fixes:
#1388, #1375, #1364, #1342, #1355, #1397, #1408.

Important improvements for OCR support

24 Oct 10:58

OCR of document pages is significantly improved over v1.19.0.
Text extraction now also offers an integrated sort.
Fixes: #1328

First version to support MuPDF v1.19.*

17 Oct 10:46

Introduces major new features like PDF journalling and OCR support by directly invoking Tesseract-OCR.
In addition, it is now possible to detect whether objects are covered (hidden) by other objects.

As part of the new version, the following issues have been resolved:
#1313, #1311, #1290, #1286, #1287, #1284.

Hotfix

16 Sep 22:04

Fixes #1266

Implement various fixes

16 Sep 16:07

Performance improvement for drawings extraction

24 Aug 10:30
Improved test scripts.

`show_pdf_page` and `insert_image` are now tested with rotated insertions.

Layout Preserving Text Extraction

08 Aug 06:31

The fitz module now supports text extraction via a new subcommand "gettext". It offers several modes, one of which preserves the original layout.

Also fixed #1187, #1184, #1154, #1152 and #1146.

Support of Small Capitals, assigning subset font name tags

10 Jul 22:47

Apart from some minor fixes, this release introduces support for small caps in TextWriter based text output.

In addition, method Document.subset_fonts() now prefixes the names of subsetted fonts with the tag of six uppercase letters prescribed by the PDF standard.

List of fixed issues:
#1088, #1081, #1078, #1085.

Fixes and minor improvements

02 Jun 11:01

The following have been fixed, added, or changed:

  • #1043
  • #1053
  • undocumented occasional error calculating the enveloping rectangle for paths in Page.get_drawings()
  • undocumented occasional loop in TextWriter.fill_textbox()
  • added method Font.char_lengths(), which returns a tuple of all character widths for a given string and serves as a more detailed counterpart of Font.text_length()
  • greatly improved performance of Font.text_length()
  • added various ways to delete multiple PDF pages, among them are slices and the Python del statement
  • changed method Document.del_toc_item(): the item's title text will no longer be removed - instead the item is shown grayed-out to indicate its deletion.

Rewritten method `Page.insert_image`

05 May 12:43

Method Page.insert_image has been rewritten for improved performance in standard cases. It also introduces an option to reuse images that already exist in the file, which provides a further performance boost.
Other changes:

  • implemented or fixed #1042, #1041, #1037
  • minor improvements in PDF EmbeddedFiles handling, to better support applications that build PDF collections.