You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016: [](https://pepy.tech/project/pymupdf)
8
8
@@ -11,7 +11,7 @@ On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016: [![Downloads]
11
11
12
12
# Introduction
13
13
14
-
PyMuPDF (current version 1.19.1) is a Python binding with support for [MuPDF](https://mupdf.com/) (current version 1.19.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.
14
+
PyMuPDF (current version 1.19.2) is a Python binding with support for [MuPDF](https://mupdf.com/) (current version 1.19.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.
15
15
16
16
MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.
17
17
@@ -27,9 +27,9 @@ For all supported document types (i.e. **_including images_**) you can
27
27
* search for text
28
28
* extract text and images
29
29
* convert to other formats: PDF, (X)HTML, XML, JSON, text
30
-
*perform Optical Character Recognition if Tesseract is installed
30
+
*do OCR (Optical Character Recognition) if Tesseract is installed
31
31
32
-
> To some degree, PyMuPDF can therefore be used as an [image converter](https://github.com/pymupdf/PyMuPDF/wiki/How-to-Convert-Images): it can read a range of input formats and can produce **Portable Network Graphics (PNG)**, **Portable Anymaps** (**PNM**, etc.), **Portable Arbitrary Maps (PAM)**, **Adobe Postscript** and **Adobe Photoshop** documents, making the use of other graphics packages obselete in these cases. But interfacing with e.g. PIL/Pillow for image input and output is easy as well.
32
+
> To some degree, PyMuPDF can also be used as an [image converter](https://github.com/pymupdf/PyMuPDF/wiki/How-to-Convert-Images): it can read a range of input formats and can produce **Portable Network Graphics (PNG)**, **Portable Anymaps** (**PNM**, etc.), **Portable Arbitrary Maps (PAM)**, **Adobe Postscript** and **Adobe Photoshop** documents, making the use of other graphics packages obselete in these cases. But interfacing with e.g. PIL/Pillow for image input and output is easy as well.
33
33
34
34
For **PDF documents,** there exists a plethora of additional features: they can be created, joined or split up. Pages can be inserted, deleted, re-arranged or modified in many ways (including annotations and form fields).
35
35
@@ -52,12 +52,12 @@ For **PDF documents,** there exists a plethora of additional features: they can
52
52
-**_layout-preserving text extraction_** (all documents)
53
53
54
54
55
-
Have a look at the basic [demos](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo), the [examples](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples) (which contain complete, working programs), and the **recipes** section of our [Wiki](https://github.com/pymupdf/PyMuPDF/wiki) sidebar, which contains more than a dozen of guides in How-To-style.
55
+
Have a look at the basic [demos](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo), the [examples](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/examples) (which contain complete, working programs), and [notebooks](https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/jupyter-notebooks).
56
56
57
57
58
58
# Documentation
59
59
60
-
Our documentation, written using Sphinx, is available in various formats from the following sources. It currently is a combination of reference guide and user manual. For a **quick start** look at the [tutorial](https://pymupdf.readthedocs.io/en/latest/tutorial.html) and the [recipes](https://pymupdf.readthedocs.io/en/latest/faq.html) chapters.
60
+
Documentation is written using Sphinx and is available in various formats from the following sources. It currently is a combination of reference guide and user manual. For a **quick start** look at the [tutorial](https://pymupdf.readthedocs.io/en/latest/tutorial.html) and the [recipes](https://pymupdf.readthedocs.io/en/latest/faq.html) chapters.
61
61
62
62
* You can view it online at [Read the Docs](https://readthedocs.org/projects/pymupdf/). This site also provides download options for PDF.
63
63
* The search function on Read the Docs does not work for me currently. If you want a working searchable local version, please download a zipped HTML for [here](https://github.com/pymupdf/PyMuPDF-optional-material/tree/master/doc/pymupdf.zip).
@@ -68,7 +68,7 @@ The latest changelog can be viewed [here](https://pymupdf.readthedocs.io/en/late
68
68
69
69
# Installation
70
70
71
-
PyMuPDF requires **Python 3.6 or later**.
71
+
PyMuPDF **requires Python 3.6 or later**.
72
72
73
73
Python wheels exist for **Windows** (32bit and 64bit), **Linux** (64bit, Intel and ARM) and **Mac OSX** (64bit, Intel only), so it can be installed from [PyPI](https://pypi.org/search/?q=pymupdf) in the usual way:
Copy file name to clipboardExpand all lines: changes.rst
+31-3Lines changed: 31 additions & 3 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -3,15 +3,43 @@ Change Log
3
3
4
4
------
5
5
6
+
**Changes in Version 1.19.2**
7
+
8
+
This patch version implements minor improvements for :meth:`Page.get_drawings` and also some important fixes.
9
+
10
+
* **Fixed** `#1388 <https://github.com/pymupdf/PyMuPDF/discussions/1388>`_. Fixed intermittent memory corruption when insert or updating annotations.
11
+
12
+
* **Fixed** `#1375 <https://github.com/pymupdf/PyMuPDF/discussions/1375>`_. Inconsistencies between line numbers as returned by the "words" and the "dict" options of :meth:`Page.get_text` have been corrected.
13
+
14
+
* **Fixed** `#1364 <https://github.com/pymupdf/PyMuPDF/issues/1342>`_. The check for being a ``"rawdict"`` span in :meth:`recover_span_quad` now works correctly.
15
+
16
+
* **Fixed** `#1342 <https://github.com/pymupdf/PyMuPDF/issues/1364>`_. Corrected the check for rectangle infiniteness in :meth:`Page.show_pdf_page`.
17
+
18
+
* **Changed** :meth:`Page.get_drawings`, :meth:`Page.get_cdrawings` to return an indicator on the area orientation covered by a rectangle. This implements `#1355 <https://github.com/pymupdf/PyMuPDF/issues/1355>`_. Also, the recognition rate for rectangles and quads has been significantly improved.
19
+
20
+
* **Changed** all text search and extraction methods to set the new ``flags`` option ``TEXT_MEDIABOX_CLIP`` to ON by default. That bit causes the automatic suppression of all characters that are completely outside a page's mediabox (in as far as that notion is supported for a document type). This eliminates the need for using ``clip=page.rect`` or similar for omitting text outside the visible area.
21
+
22
+
* **Added** parameter ``"dpi"`` to :meth:`Page.get_pixmap` and :meth:`Annot.get_pixmap`. When given, parameter ``"matrix"`` is ignored, and a :ref:`Pixmap` with the desired dots per inch is created.
23
+
24
+
* **Added** attributes :attr:`Pixmap.is_monochrome` and :attr:`Pixmap.is_unicolor` allowing fast checks of pixmap properties. Addresses `#1397 <https://github.com/pymupdf/PyMuPDF/discussions/1397>`_.
25
+
26
+
* **Added** method :meth:`Pixmap.color_count` to determine the unique colors in the pixmap.
27
+
28
+
* **Added** boolean parameter ``"compress"`` to PDF document method :meth:`Document.update_stream`. Addresses / enables solution for `#1408 <https://github.com/pymupdf/PyMuPDF/discussions/1408>`_.
29
+
30
+
------
31
+
6
32
**Changes in Version 1.19.1**
7
33
8
-
* **Fixed** `#1328 <https://github.com/pymupdf/PyMuPDF/issues/1328>`_. "words" text extraction again returns correct coordinates.
34
+
This is the first patch version to support MuPDF v1.19.0. Apart from one bug fix, it includes important improvements for OCR support and the option to **sort extracted text** to the standard reading order "from top-left to bottom-right".
35
+
36
+
* **Fixed** `#1328 <https://github.com/pymupdf/PyMuPDF/issues/1328>`_. "words" text extraction again returns correct ``(x0, y0)`` coordinates.
9
37
10
-
* **Changed** :meth:`Page.get_textpage_ocr` -- support specifying the desired OCR quality via parameter ``dpi``, support choice between full page OCR versus only OCRing displayed images.
38
+
* **Changed** :meth:`Page.get_textpage_ocr`: it now supports parameter ``dpi`` to control OCR quality. It is also possible to choose whether the **full page** should be OCRed or **only the images displayed** by the page.
11
39
12
40
* **Changed** :meth:`Page.get_drawings` and :meth:`Page.get_cdrawings` to automatically convert colors to RGB color tuples. Implements `#1332 <https://github.com/pymupdf/PyMuPDF/discussions/1332>`_. Similar change was applied to :meth:`Page.get_texttrace`.
13
41
14
-
* **Changed** :meth:`Page.get_text` to support a new parameter ``sort``. If set to ``True`` the output is conveniently sorted.
42
+
* **Changed** :meth:`Page.get_text` to support a parameter ``sort``. If set to ``True`` the output is conveniently sorted.
0 commit comments