Commit 52b80b5

committed
updates 0 for v1.19.3
1 parent b184bf7 commit 52b80b5

31 files changed

+608
-423
lines changed

README.md

Lines changed: 10 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,16 +1,17 @@
1-
# PyMuPDF 1.19.2
1+
# PyMuPDF 1.19.3
22

33
![logo](https://github.com/pymupdf/PyMuPDF/blob/master/demo/pymupdf.jpg)
44

5-
Release date: November 20, 2021
5+
Release date: December 15, 2021
6+
7+
On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016: [![Downloads](https://static.pepy.tech/personalized-badge/pymupdf?period=total&units=international_system&left_color=black&right_color=orange&left_text=Downloads)](https://pepy.tech/project/pymupdf)
68

7-
On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016: [![](https://pepy.tech/badge/pymupdf)](https://pepy.tech/project/pymupdf)
89
# Author
910
[Jorj X. McKie](mailto:[email protected]), based on original code by [Ruikai Liu](mailto:[email protected]).
1011

1112
# Introduction
1213

13-
PyMuPDF (current version 1.19.2) is a Python binding with support for [MuPDF](https://mupdf.com/) (current version 1.19.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.
14+
PyMuPDF (current version 1.19.3) is a Python binding with support for [MuPDF](https://mupdf.com/) (current version 1.19.*), a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.
1415

1516
MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.
1617

@@ -59,7 +60,11 @@ Have a look at the basic [demos](https://github.com/pymupdf/PyMuPDF-Utilities/tr
5960
Documentation is written using Sphinx and is available in various formats from the following sources. It currently is a combination of reference guide and user manual. For a **quick start** look at the [tutorial](https://pymupdf.readthedocs.io/en/latest/tutorial.html) and the [recipes](https://pymupdf.readthedocs.io/en/latest/faq.html) chapters.
6061

6162
* You can view it online at [Read the Docs](https://readthedocs.org/projects/pymupdf/). This site also provides download options for PDF.
63+
<<<<<<< Updated upstream
6264
* The search function on Read the Docs does not work for me currently. If you want a working searchable local version, please download a zipped HTML for [here](https://github.com/pymupdf/PyMuPDF-optional-material/tree/master/doc/pymupdf.zip).
65+
=======
66+
* The search function on Read the Docs does not work for me currently. If you want a working searchable local version, please download a zipped HTML from [here](https://github.com/pymupdf/PyMuPDF-optional-material/tree/master/doc/pymupdf.zip).
67+
>>>>>>> Stashed changes
6368
* Find a Windows help file [here](https://github.com/pymupdf/PyMuPDF-optional-material/tree/master/doc/PyMuPDF.chm).
6469

6570
The latest changelog can be viewed [here](https://pymupdf.readthedocs.io/en/latest/changes.html).
@@ -76,7 +81,7 @@ python -m pip install --upgrade pip
7681
python -m pip install --upgrade pymupdf
7782
```
7883

79-
There are **no mandatory** external dependencies. However, some **optional features** become available only if additional packages are installed:
84+
There are **no mandatory** external dependencies. However, some **optional features** become available if additional packages are installed:
8085

8186
* [Pillow](https://pypi.org/project/Pillow/) for using pillow image output directly from PyMuPDF
8287
* [fontTools](https://pypi.org/project/fonttools/) for creating font subsets

changes.rst

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,23 @@ Change Log
33

44
------
55

6+
**Changes in Version 1.19.3**
7+
8+
This patch version implements minor improvements for :ref:`Pixmap` and also some important fixes.
9+
10+
* **Fixed** `#1351 <https://github.com/pymupdf/PyMuPDF/discussions/1351>`_. Reverted code that introduced the memory growth in v1.18.15.
11+
* **Fixed** `#1417 <https://github.com/pymupdf/PyMuPDF/discussions/1417>`_. Developed circumvention for growth of open file handles using :meth:`Document.insert_pdf`.
12+
* **Fixed** `#1418 <https://github.com/pymupdf/PyMuPDF/discussions/1418>`_. Developed circumvention for memory growth using :meth:`Document.insert_pdf`.
13+
* **Fixed** `#1430 <https://github.com/pymupdf/PyMuPDF/discussions/1430>`_. Developed circumvention for mass pixmap generations of document pages.
14+
* **Fixed** `#1433 <https://github.com/pymupdf/PyMuPDF/discussions/1433>`_. Solves a bbox error for some Type 3 fonts in PyMuPDF text processing.
15+
* **Added** :meth:`Pixmap.color_topusage` to determine the share of the most frequently used color. Solves `#1397 <https://github.com/pymupdf/PyMuPDF/discussions/1397>`_.
16+
* **Added** :meth:`Pixmap.warp` which makes a new pixmap from a given arbitrary convex quad inside the pixmap.
17+
* **Added** :meth:`Rect.torect` and :meth:`IRect.torect` which compute a matrix that transforms to a given other rectangle.
18+
* **Changed** :meth:`Pixmap.color_count` to also return the count of each color.
19+
* **Changed** :meth:`Page.get_texttrace` to also return correct span and character bboxes if ``span["dir"] != (1, 0)``.
20+
21+
------
22+
623
**Changes in Version 1.19.2**
724

825
This patch version implements minor improvements for :meth:`Page.get_drawings` and also some important fixes.
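
The two :ref:`Pixmap` color entries in the 1.19.3 list above can be pictured with a small pure-Python sketch. It does not call PyMuPDF: `count_colors` and `top_color_share` are hypothetical helper names, the pixel list is made-up sample data, and the return shapes are assumptions chosen for illustration rather than the library's documented signatures.

```python
from collections import Counter


def count_colors(pixels):
    """Map each distinct color to the number of pixels that use it."""
    return Counter(pixels)


def top_color_share(pixels):
    """Return (share, color) for the most frequently used color.

    'share' is the fraction of all pixels that have that color.
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    color, hits = count_colors(pixels).most_common(1)[0]
    return hits / len(pixels), color


# Hypothetical 8-pixel image: six white pixels and two black ones.
pixels = [(255, 255, 255)] * 6 + [(0, 0, 0)] * 2
share, color = top_color_share(pixels)
print(count_colors(pixels))
print(share, color)
```

A high share for one color is a quick hint that a page image is mostly background, which is the kind of question the new method appears intended to answer.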

docs/changes.rst

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,23 @@ Change Log
33

44
------
55

6+
**Changes in Version 1.19.3**
7+
8+
This patch version implements minor improvements for :ref:`Pixmap` and also some important fixes.
9+
10+
* **Fixed** `#1351 <https://github.com/pymupdf/PyMuPDF/discussions/1351>`_. Reverted code that introduced the memory growth in v1.18.15.
11+
* **Fixed** `#1417 <https://github.com/pymupdf/PyMuPDF/discussions/1417>`_. Developed circumvention for growth of open file handles using :meth:`Document.insert_pdf`.
12+
* **Fixed** `#1418 <https://github.com/pymupdf/PyMuPDF/discussions/1418>`_. Developed circumvention for memory growth using :meth:`Document.insert_pdf`.
13+
* **Fixed** `#1430 <https://github.com/pymupdf/PyMuPDF/discussions/1430>`_. Developed circumvention for mass pixmap generations of document pages.
14+
* **Fixed** `#1433 <https://github.com/pymupdf/PyMuPDF/discussions/1433>`_. Solves a bbox error for some Type 3 fonts in PyMuPDF text processing.
15+
* **Added** :meth:`Pixmap.color_topusage` to determine the share of the most frequently used color. Solves `#1397 <https://github.com/pymupdf/PyMuPDF/discussions/1397>`_.
16+
* **Added** :meth:`Pixmap.warp` which makes a new pixmap from a given arbitrary convex quad inside the pixmap.
17+
* **Added** :meth:`Rect.torect` and :meth:`IRect.torect` which compute a matrix that transforms to a given other rectangle.
18+
* **Changed** :meth:`Pixmap.color_count` to also return the count of each color.
19+
* **Changed** :meth:`Page.get_texttrace` to also return correct span and character bboxes if ``span["dir"] != (1, 0)``.
20+
21+
------
22+
623
**Changes in Version 1.19.2**
724

825
This patch version implements minor improvements for :meth:`Page.get_drawings` and also some important fixes.
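
The ``torect`` entry above describes a matrix that maps one rectangle onto another. A minimal sketch of that idea in plain Python follows, assuming axis-aligned rectangles given as `(x0, y0, x1, y1)` and a six-value matrix `(a, b, c, d, e, f)` applied as `x' = a*x + c*y + e`, `y' = b*x + d*y + f`. The helper names `rect_to_rect_matrix` and `apply_matrix` are hypothetical and this is not the library's implementation.

```python
def rect_to_rect_matrix(src, dst):
    """Return (a, b, c, d, e, f) that maps rectangle src onto rectangle dst.

    Only scaling and translation are needed for axis-aligned rectangles.
    """
    sx0, sy0, sx1, sy1 = src
    dx0, dy0, dx1, dy1 = dst
    src_width, src_height = sx1 - sx0, sy1 - sy0
    if src_width == 0 or src_height == 0:
        raise ValueError("source rectangle must have non-zero width and height")
    a = (dx1 - dx0) / src_width   # horizontal scale
    d = (dy1 - dy0) / src_height  # vertical scale
    e = dx0 - a * sx0             # horizontal shift
    f = dy0 - d * sy0             # vertical shift
    return (a, 0.0, 0.0, d, e, f)


def apply_matrix(matrix, point):
    """Transform a point (x, y) with the six-value matrix."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


src = (0, 0, 100, 50)
dst = (10, 20, 210, 120)
m = rect_to_rect_matrix(src, dst)
print(m)
print(apply_matrix(m, (0, 0)), apply_matrix(m, (100, 50)))
```

Mapping the two opposite corners of the source onto the corresponding corners of the target is enough to check such a matrix, because every other point of the rectangle follows by linearity.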

docs/conf.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@
4343
# built documents.
4444
#
4545
# The full version, including alpha/beta/rc tags.
46-
release = "1.19.2"
46+
release = "1.19.3"
4747

4848
# The short X.Y version
4949
version = release

docs/document.rst

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@ For details on **embedded files** refer to Appendix 3.
4949
:meth:`Document.find_bookmark` retrieve page location after layouting document
5050
:meth:`Document.fullcopy_page` PDF only: duplicate a page
5151
:meth:`Document.get_layer` PDF only: lists of OCGs in ON, OFF, RBGroups
52+
:meth:`Document.get_layers` PDF only: list of optional content configurations
5253
:meth:`Document.get_oc` PDF only: get OCG /OCMD xref of image / form xobject
5354
:meth:`Document.get_ocgs` PDF only: info on all optional content groups
5455
:meth:`Document.get_ocmd` PDF only: retrieve definition of an :data:`OCMD`
@@ -76,7 +77,6 @@ For details on **embedded files** refer to Appendix 3.
7677
:meth:`Document.journal_redo` PDF only: redo current operation
7778
:meth:`Document.journal_save` PDF only: save journal to a file
7879
:meth:`Document.journal_load` PDF only: load journal from a file
79-
:meth:`Document.layer_configs` PDF only: list of optional content configurations
8080
:meth:`Document.layer_ui_configs` PDF only: list of optional content intents
8181
:meth:`Document.layout` re-paginate the document (if supported)
8282
:meth:`Document.load_page` read a page
@@ -226,13 +226,13 @@ For details on **embedded files** refer to Appendix 3.
226226
:arg int ocxref: the :data:`xref` number of an :data:`OCG` / :data:`OCMD`. If not zero, an invalid reference raises an exception. If zero, any OC reference is removed.
227227

228228

229-
.. method:: layer_configs()
229+
.. method:: get_layers()
230230

231231
*(New in v1.18.3)*
232232

233233
Show optional layer configurations. There always is a standard one, which is not included in the response.
234234

235-
>>> for item in doc.layer_configs: print(item)
235+
>>> for item in doc.get_layers(): print(item)
236236
{'number': 0, 'name': 'my-config', 'creator': ''}
237237
>>> # use 'number' as config identifier in add_ocg
238238

docs/faq.rst

Lines changed: 4 additions & 89 deletions
Original file line numberDiff line numberDiff line change
@@ -706,97 +706,12 @@ The text sequence extracted from a page modified in this way will look like this
706706
2. header line
707707
3. footer line
708708

709-
PyMuPDF has several means to re-establish some reading sequence or even to re-generate a layout close to the original.
709+
PyMuPDF has several means to re-establish some reading sequence or even to re-generate a layout close to the original:
710710

711-
As a starting point take the above mentioned `script <https://github.com/pymupdf/PyMuPDF/wiki/How-to-extract-text-from-a-rectangle>`_ and then use the full page rectangle.
712-
713-
On rare occasions, when the PDF creator has been "over-creative", extracted text does not even keep the correct reading sequence of **single letters**: instead of the two words "DELUXE PROPERTY" you might sometimes get an anagram, consisting of 8 words like "DEL", "XE" , "P", "OP", "RTY", "U", "R" and "E".
714-
715-
Such a PDF is also not searchable by all PDF viewers, but it is displayed correctly and looks harmless.
716-
717-
In those cases, the following function will help composing the original words of the page. The resulting list is also searchable and can be used to deliver rectangles for the found text locations::
718-
719-
from operator import itemgetter
720-
from itertools import groupby
721-
import fitz
722-
723-
def recover(words, rect):
724-
""" Word recovery.
725-
726-
Notes:
727-
Method 'get_textWords()' does not try to recover words, if their single
728-
letters do not appear in correct lexical order. This function steps in
729-
here and creates a new list of recovered words.
730-
Args:
731-
words: list of words as created by 'get_textWords()'
732-
rect: rectangle to consider (usually the full page)
733-
Returns:
734-
List of recovered words. Same format as 'get_text_words', but left out
735-
block, line and word number - a list of items of the following format:
736-
[x0, y0, x1, y1, "word"]
737-
"""
738-
# build my sublist of words contained in given rectangle
739-
mywords = [w for w in words if fitz.Rect(w[:4]) in rect]
740-
741-
# sort the words by lower line, then by word start coordinate
742-
mywords.sort(key=itemgetter(3, 0)) # sort by y1, x0 of word rectangle
743-
744-
# build word groups on same line
745-
grouped_lines = groupby(mywords, key=itemgetter(3))
746-
747-
words_out = [] # we will return this
748-
749-
# iterate through the grouped lines
750-
# for each line coordinate ("_"), the list of words is given
751-
for _, words_in_line in grouped_lines:
752-
for i, w in enumerate(words_in_line):
753-
if i == 0: # store first word
754-
x0, y0, x1, y1, word = w[:5]
755-
continue
756-
757-
r = fitz.Rect(w[:4]) # word rect
758-
759-
# Compute word distance threshold as 20% of width of 1 letter.
760-
# So we should be safe joining text pieces into one word if they
761-
# have a distance shorter than that.
762-
threshold = r.width / len(w[4]) / 5
763-
if r.x0 <= x1 + threshold: # join with previous word
764-
word += w[4] # add string
765-
x1 = r.x1 # new end-of-word coordinate
766-
y0 = max(y0, r.y0) # extend word rect upper bound
767-
continue
768-
769-
# now have a new word, output previous one
770-
words_out.append([x0, y0, x1, y1, word])
771-
772-
# store the new word
773-
x0, y0, x1, y1, word = w[:5]
774-
775-
# output word waiting for completion
776-
words_out.append([x0, y0, x1, y1, word])
777-
778-
return words_out
779-
780-
def search_for(text, words):
781-
""" Search for text in items of list of words
782-
783-
Notes:
784-
Can be adjusted / extended in obvious ways, e.g. using regular
785-
expressions, or being case insensitive, or only looking for complete
786-
words, etc.
787-
Args:
788-
text: string to be searched for
789-
words: list of items in format delivered by 'get_text_words()'.
790-
Returns:
791-
List of rectangles, one for each found locations.
792-
"""
793-
rect_list = []
794-
for w in words:
795-
if text in w[4]:
796-
rect_list.append(fitz.Rect(w[:4]))
797-
798-
return rect_list
711+
1. Use ``sort`` parameter of :meth:`Page.get_text`. It will sort the output from top-left to bottom-right (ignored for XHTML, HTML and XML output).
712+
2. Use the ``fitz`` module in CLI: ``python -m fitz gettext ...``, which produces a text file where text has been re-arranged in layout-preserving mode. Many options are available to control the output.
799713

714+
You can also use the above mentioned `script <https://github.com/pymupdf/PyMuPDF/wiki/How-to-extract-text-from-a-rectangle>`_ with your modifications.
800715

801716
----------
802717
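
The new FAQ text says that sorting orders output from top-left to bottom-right. A minimal pure-Python sketch of that ordering follows. It assumes text blocks given as `(x0, y0, x1, y1, text)` tuples, with y growing downwards, sorted by top edge and then by left edge. The helper `sort_reading_order` and the sample coordinates are hypothetical, and the exact sort key used inside the library is not stated in this diff.

```python
def sort_reading_order(blocks):
    """Order blocks top-to-bottom, then left-to-right.

    Each block is (x0, y0, x1, y1, text); y grows downwards on the page.
    """
    return sorted(blocks, key=lambda block: (block[1], block[0]))


# Hypothetical page where header and footer were written after the body,
# so they come last in the file even though the header sits at the top.
blocks = [
    (50, 100, 550, 700, "body text"),
    (50, 20, 550, 40, "header line"),
    (50, 780, 550, 800, "footer line"),
]

for block in sort_reading_order(blocks):
    print(block[4])
```

Sorting by position fixes cases like the header and footer example, but it cannot rebuild multi-column layouts on its own, which is where the layout-preserving CLI output mentioned in the FAQ becomes relevant.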
