Commit 33e34f0

version 1.19.2
1 parent 4506a7e commit 33e34f0

File tree

14 files changed

+500
-327
lines changed

README.md

Lines changed: 9 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -1,8 +1,8 @@
1-
# PyMuPDF 1.19.1
1+
# PyMuPDF 1.19.2
22

33
![logo](https://github.com/pymupdf/PyMuPDF/blob/master/demo/pymupdf.jpg)
44

5-
Release date: October 23, 2021
5+
Release date: November 20, 2021
66

77
On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016: [![Downloads](https://static.pepy.tech/personalized-badge/pymupdf?period=total&units=international_system&left_color=black&right_color=orange&left_text=Downloads)](https://pepy.tech/project/pymupdf)
88

@@ -11,7 +11,7 @@ On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016: [![Downloads]
1111

1212
# Introduction
1313

14-
PyMuPDF (current version 1.19.1) is a Python binding with support for [MuPDF](https://mupdf.com/) (current version 1.19.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.
14+
PyMuPDF (current version 1.19.2) is a Python binding with support for [MuPDF](https://mupdf.com/) (current version 1.19.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.
1515

1616
MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.
1717

@@ -27,9 +27,9 @@ For all supported document types (i.e. **_including images_**) you can
2727
* search for text
2828
* extract text and images
2929
* convert to other formats: PDF, (X)HTML, XML, JSON, text
30-
* perform Optical Character Recognition if Tesseract is installed
30+
* do OCR (Optical Character Recognition) if Tesseract is installed
3131

32-
> To some degree, PyMuPDF can therefore be used as an [image converter](https://github.com/pymupdf/PyMuPDF/wiki/How-to-Convert-Images): it can read a range of input formats and can produce **Portable Network Graphics (PNG)**, **Portable Anymaps** (**PNM**, etc.), **Portable Arbitrary Maps (PAM)**, **Adobe Postscript** and **Adobe Photoshop** documents, making the use of other graphics packages obsolete in these cases. But interfacing with e.g. PIL/Pillow for image input and output is easy as well.
32+
> To some degree, PyMuPDF can also be used as an [image converter](https://github.com/pymupdf/PyMuPDF/wiki/How-to-Convert-Images): it can read a range of input formats and can produce **Portable Network Graphics (PNG)**, **Portable Anymaps** (**PNM**, etc.), **Portable Arbitrary Maps (PAM)**, **Adobe Postscript** and **Adobe Photoshop** documents, making the use of other graphics packages obsolete in these cases. But interfacing with e.g. PIL/Pillow for image input and output is easy as well.
3333
3434
For **PDF documents,** there exists a plethora of additional features: they can be created, joined or split up. Pages can be inserted, deleted, re-arranged or modified in many ways (including annotations and form fields).
3535

@@ -52,12 +52,12 @@ For **PDF documents,** there exists a plethora of additional features: they can
5252
- **_layout-preserving text extraction_** (all documents)
5353

5454

55-
Have a look at the basic [demos](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo), the [examples](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples) (which contain complete, working programs), and the **recipes** section of our [Wiki](https://github.com/pymupdf/PyMuPDF/wiki) sidebar, which contains more than a dozen of guides in How-To-style.
55+
Have a look at the basic [demos](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo), the [examples](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples) (which contain complete, working programs), and [notebooks](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/jupyter-notebooks).
5656

5757

5858
# Documentation
5959

60-
Our documentation, written using Sphinx, is available in various formats from the following sources. It currently is a combination of reference guide and user manual. For a **quick start** look at the [tutorial](https://pymupdf.readthedocs.io/en/latest/tutorial.html) and the [recipes](https://pymupdf.readthedocs.io/en/latest/faq.html) chapters.
60+
Documentation is written using Sphinx and is available in various formats from the following sources. It currently is a combination of reference guide and user manual. For a **quick start** look at the [tutorial](https://pymupdf.readthedocs.io/en/latest/tutorial.html) and the [recipes](https://pymupdf.readthedocs.io/en/latest/faq.html) chapters.
6161

6262
* You can view it online at [Read the Docs](https://readthedocs.org/projects/pymupdf/). This site also provides download options for PDF.
6363
* The search function on Read the Docs does not work for me currently. If you want a working searchable local version, please download a zipped HTML from [here](https://github.com/pymupdf/PyMuPDF-optional-material/tree/master/doc/pymupdf.zip).
@@ -68,7 +68,7 @@ The latest changelog can be viewed [here](https://pymupdf.readthedocs.io/en/late
6868

6969
# Installation
7070

71-
PyMuPDF requires **Python 3.6 or later**.
71+
PyMuPDF **requires Python 3.6 or later**.
7272

7373
Python wheels exist for **Windows** (32bit and 64bit), **Linux** (64bit, Intel and ARM) and **Mac OSX** (64bit, Intel only), so it can be installed from [PyPI](https://pypi.org/search/?q=pymupdf) in the usual way:
7474

@@ -77,7 +77,7 @@ python -m pip install --upgrade pip
7777
python -m pip install --upgrade pymupdf
7878
```
7979

80-
There are **no mandatory** external dependencies. However, a some **optional features** become available if additional packages are installed:
80+
There are **no mandatory** external dependencies. However, some **optional features** become available if additional packages are installed:
8181

8282
* [Pillow](https://pypi.org/project/Pillow/) for using pillow image output directly from PyMuPDF
8383
* [fontTools](https://pypi.org/project/fonttools/) for creating font subsets

changes.rst

Lines changed: 31 additions & 3 deletions
@@ -3,15 +3,43 @@ Change Log
33

44
------
55

6+
**Changes in Version 1.19.2**
7+
8+
This patch version implements minor improvements for :meth:`Page.get_drawings` and also some important fixes.
9+
10+
* **Fixed** `#1388 <https://github.com/pymupdf/PyMuPDF/discussions/1388>`_. Fixed intermittent memory corruption when inserting or updating annotations.
11+
12+
* **Fixed** `#1375 <https://github.com/pymupdf/PyMuPDF/discussions/1375>`_. Inconsistencies between line numbers as returned by the "words" and the "dict" options of :meth:`Page.get_text` have been corrected.
13+
14+
* **Fixed** `#1364 <https://github.com/pymupdf/PyMuPDF/issues/1364>`_. The check for being a ``"rawdict"`` span in :meth:`recover_span_quad` now works correctly.
15+
16+
* **Fixed** `#1342 <https://github.com/pymupdf/PyMuPDF/issues/1342>`_. Corrected the check for rectangle infiniteness in :meth:`Page.show_pdf_page`.
17+
18+
* **Changed** :meth:`Page.get_drawings`, :meth:`Page.get_cdrawings` to return an indicator on the area orientation covered by a rectangle. This implements `#1355 <https://github.com/pymupdf/PyMuPDF/issues/1355>`_. Also, the recognition rate for rectangles and quads has been significantly improved.
19+
20+
* **Changed** all text search and extraction methods to set the new ``flags`` option ``TEXT_MEDIABOX_CLIP`` to ON by default. That bit causes the automatic suppression of all characters that are completely outside a page's mediabox (insofar as that notion is supported for a document type). This eliminates the need for using ``clip=page.rect`` or similar to omit text outside the visible area.
21+
22+
* **Added** parameter ``"dpi"`` to :meth:`Page.get_pixmap` and :meth:`Annot.get_pixmap`. When given, parameter ``"matrix"`` is ignored, and a :ref:`Pixmap` with the desired dots per inch is created.
23+
24+
* **Added** attributes :attr:`Pixmap.is_monochrome` and :attr:`Pixmap.is_unicolor` allowing fast checks of pixmap properties. Addresses `#1397 <https://github.com/pymupdf/PyMuPDF/discussions/1397>`_.
25+
26+
* **Added** method :meth:`Pixmap.color_count` to determine the unique colors in the pixmap.
27+
28+
* **Added** boolean parameter ``"compress"`` to PDF document method :meth:`Document.update_stream`. Addresses / enables solution for `#1408 <https://github.com/pymupdf/PyMuPDF/discussions/1408>`_.
29+
30+
------
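
Reviewer note on ``TEXT_MEDIABOX_CLIP``: the changelog entry above says characters that lie completely outside the mediabox are suppressed. The sketch below illustrates that rule in plain Python. It is not the library's implementation; the ``Rect`` type, the ``completely_outside`` helper and the sample character boxes are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    # Hypothetical stand-in for a rectangle: (x0, y0) top-left, (x1, y1) bottom-right.
    x0: float
    y0: float
    x1: float
    y1: float


def completely_outside(box: Rect, mediabox: Rect) -> bool:
    """Return True if box shares no interior area with mediabox."""
    return (
        box.x1 <= mediabox.x0
        or box.x0 >= mediabox.x1
        or box.y1 <= mediabox.y0
        or box.y0 >= mediabox.y1
    )


# Assumed example data: a Letter-sized mediabox and three character boxes.
mediabox = Rect(0, 0, 612, 792)
chars = [
    ("A", Rect(100, 100, 110, 112)),  # fully inside
    ("B", Rect(605, 100, 620, 112)),  # partially outside: kept
    ("C", Rect(700, 100, 710, 112)),  # completely outside: dropped
]

visible = "".join(c for c, box in chars if not completely_outside(box, mediabox))
print(visible)
```

Under this reading, a character that only partially overlaps the page edge is still kept; only fully invisible characters are dropped.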
31+
632
**Changes in Version 1.19.1**
733

8-
* **Fixed** `#1328 <https://github.com/pymupdf/PyMuPDF/issues/1328>`_. "words" text extraction again returns correct coordinates.
34+
This is the first patch version to support MuPDF v1.19.0. Apart from one bug fix, it includes important improvements for OCR support and the option to **sort extracted text** to the standard reading order "from top-left to bottom-right".
35+
36+
* **Fixed** `#1328 <https://github.com/pymupdf/PyMuPDF/issues/1328>`_. "words" text extraction again returns correct ``(x0, y0)`` coordinates.
937

10-
* **Changed** :meth:`Page.get_textpage_ocr` -- support specifying the desired OCR quality via parameter ``dpi``, support choice between full page OCR versus only OCRing displayed images.
38+
* **Changed** :meth:`Page.get_textpage_ocr`: it now supports parameter ``dpi`` to control OCR quality. It is also possible to choose whether the **full page** should be OCRed or **only the images displayed** by the page.
1139

1240
* **Changed** :meth:`Page.get_drawings` and :meth:`Page.get_cdrawings` to automatically convert colors to RGB color tuples. Implements `#1332 <https://github.com/pymupdf/PyMuPDF/discussions/1332>`_. Similar change was applied to :meth:`Page.get_texttrace`.
1341

14-
* **Changed** :meth:`Page.get_text` to support a new parameter ``sort``. If set to ``True`` the output is conveniently sorted.
42+
* **Changed** :meth:`Page.get_text` to support a parameter ``sort``. If set to ``True`` the output is conveniently sorted.
1543

1644

1745
------
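
Reviewer note on the ``"dpi"`` parameter added in version 1.19.2: PDF coordinates are measured in points of 1/72 inch, so a requested resolution corresponds to a zoom factor of ``dpi / 72``. The sketch below shows that arithmetic in plain Python. The helper names ``zoom_for_dpi`` and ``pixel_size`` are hypothetical, and the rounding PyMuPDF applies internally is an assumption here, so treat the resulting sizes as approximate.

```python
def zoom_for_dpi(dpi: float) -> float:
    """Zoom factor relative to the PDF default of 72 points per inch."""
    return dpi / 72.0


def pixel_size(width_pt: float, height_pt: float, dpi: float) -> tuple:
    """Approximate pixel dimensions of a page rendered at the given dpi.

    Hypothetical helper: the exact rounding used by the library may differ.
    """
    return round(width_pt * dpi / 72.0), round(height_pt * dpi / 72.0)


# A US Letter page is 612 x 792 points.
letter_at_150 = pixel_size(612, 792, 150)
print(zoom_for_dpi(144), letter_at_150)
```

With PyMuPDF 1.19.2 installed, the changelog suggests the same result can be requested directly via ``page.get_pixmap(dpi=150)``, in which case ``matrix`` is ignored.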

fitz/__main__.py

Lines changed: 3 additions & 3 deletions
@@ -555,7 +555,7 @@ def page_simple(page, textout, GRID, fontsize, noformfeed, skip_empty, flags):
555555
if not skip_empty:
556556
textout.write(eop) # write formfeed
557557
return
558-
textout.write(text.encode("utf8"))
558+
textout.write(text.encode("utf8", errors="surrogatepass"))
559559
textout.write(eop)
560560
return
561561

@@ -569,7 +569,7 @@ def page_blocksort(page, textout, GRID, fontsize, noformfeed, skip_empty, flags)
569569
return
570570
blocks.sort(key=lambda b: (b[3], b[0]))
571571
for b in blocks:
572-
textout.write(b[4].encode("utf8"))
572+
textout.write(b[4].encode("utf8", errors="surrogatepass"))
573573
textout.write(eop)
574574
return
575575

@@ -793,7 +793,7 @@ def make_textline(left, slot, minslot, lchars):
793793
textout.write(b"\n")
794794
rowpos += rowheight
795795
text = make_textline(left, slot, minslots[k], lines[k])
796-
textout.write((text + "\n").encode("utf8"))
796+
textout.write((text + "\n").encode("utf8", errors="surrogatepass"))
797797
rowpos = k + rowheight
798798

799799
textout.write(eop) # write formfeed
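
Reviewer note on ``errors="surrogatepass"``: the three hunks above change how text is encoded before it is written. The sketch below shows the difference using only the standard library. The sample string is an assumption; the commit does not state which input triggered the change, but a lone surrogate code point is the case this error handler addresses.

```python
# Assumed sample: a string containing a lone (unpaired) surrogate code point.
text = "ok \ud800 end"

# The default error handler ("strict") refuses to encode lone surrogates.
try:
    text.encode("utf8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "surrogatepass" encodes the surrogate as a three-byte sequence instead.
encoded = text.encode("utf8", errors="surrogatepass")

# Decoding with the same handler restores the original string.
roundtrip = encoded.decode("utf8", errors="surrogatepass")

print(strict_failed, encoded, roundtrip == text)
```

The trade-off is that the output is no longer strictly valid UTF-8, so consumers of the file need to decode it with the same error handler or tolerate the extra bytes.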
