docs/annot.rst
Lines changed: 1 addition & 1 deletion
@@ -279,7 +279,7 @@ There is a parent-child relationship between an annotation and its page. If the
279
279
:arg sequence stroke: see above.
280
280
:arg sequence fill: see above.
281
281
282
-
*Changed in v1.18.5:* To completely remove a color specification, use an empty sequence like ``[]``.
282
+
*Changed in v1.18.5:* To completely remove a color specification, use an empty sequence like ``[]``. If you specify ``None``, an existing specification will not be changed.
@@ -298,7 +298,7 @@ To show the effect of *TEXT_INHIBIT_SPACES* have a look at this example::
298
298
299
299
Performance
300
300
~~~~~~~~~~~~
301
-
The text extraction methods differ significantly: in terms of information they supply, and in terms of resource requirements and runtimes. Generally, more information of course means that more processing is required and a higher data volume is generated.
301
+
The text extraction methods differ significantly: in terms of information they supply, and in terms of resource requirements and runtimes. Generally, more information of course means, that more processing is required and a higher data volume is generated.
302
302
303
303
.. note:: Images in particular have a **very significant** impact. Make sure to exclude them (via the *flags* parameter) whenever you do not need them. Processing the 2'700 total pages mentioned below with default flags settings required 160 seconds across all extraction methods. When all images were excluded, less than 50% of that time (77 seconds) was needed.
304
304
@@ -319,6 +319,6 @@ DICT 3.93 **binary** images, **span** level text, layout and font details
319
319
RAWDICT 4.50 **binary** images, **char** level text, layout and font details 1.68
As mentioned: when excluding all images (last column), the relative speeds are changing drastically: except RAWDICT and XML, the other methods are almost equally fast, and RAWDICT requires 40% less execution time than the **now slowest XML**.
322
+
As mentioned: when excluding image extraction (last column), the relative speeds are changing drastically: except RAWDICT and XML, the other methods are almost equally fast, and RAWDICT requires 40% less execution time than the **now slowest XML**.
323
323
324
324
Look at chapter **Appendix 1** for more performance information.
docs/changes.rst
Lines changed: 17 additions & 0 deletions
@@ -1,6 +1,23 @@
1
1
Change Logs
2
2
===============
3
3
4
+
Changes in Version 1.18.15
5
+
---------------------------
6
+
* **Fixed** issue `#1088 <https://github.com/pymupdf/PyMuPDF/issues/1088>`_. Removing an annotation's fill color should now work again both ways, using the ``fill_color=[]`` argument in :meth:`Annot.update` as well as ``fill=[]`` in :meth:`Annot.set_colors`.
7
+
8
+
* **Fixed** issue `#1081 <https://github.com/pymupdf/PyMuPDF/issues/1081>`_. :meth:`Document.subset_fonts`: fixed an error which created wrong character widths for some fonts.
9
+
10
+
* **Fixed** issue `#1078 <https://github.com/pymupdf/PyMuPDF/issues/1078>`_. :meth:`Page.get_text` and other methods related to text extraction: changed the default value of the :ref:`TextPage` ``flags`` parameter. All whitespace and ligatures are now preserved.
11
+
12
+
* **Fixed** issue `#1085 <https://github.com/pymupdf/PyMuPDF/issues/1085>`_. The old *snake_cased* alias of ``fitz.getTextlength`` is now defined correctly.
13
+
14
+
* **Changed** :meth:`Document.subset_fonts` will now correctly prefix font subsets with an appropriate six letter uppercase tag, complying with the PDF specification.
15
+
16
+
* **Added** new method :meth:`Widget.button_states` which returns the possible values that a button-type field can have when being set to "on" or "off".
17
+
18
+
* **Added** support of text with **Small Capital** letters to the :ref:`Font` and :ref:`TextWriter` classes. This is reflected by an additional bool parameter ``small_caps`` in several of their methods.
19
+
20
+
4
21
Changes in Version 1.18.14
5
22
---------------------------
6
23
* **Finished** implementing new, "snake_cased" names for methods and properties that were "camelCased" and awkward in many aspects. At the end of this documentation, there is a section :ref:`Deprecated` with more background and a mapping of old to new names.
docs/document.rst
Lines changed: 3 additions & 2 deletions
@@ -1046,7 +1046,7 @@ For details on **embedded files** refer to Appendix 3.
1046
1046
:arg bool attached_files: Search for 'FileAttachment' annotations and remove the file content.
1047
1047
:arg bool clean_pages: Remove any comments from page painting sources. If this option is set to *False*, then this is also done for *hidden_text* and *redactions*.
1048
1048
:arg bool embedded_files: Remove embedded files.
1049
-
:arg bool hidden_text: Remove OCR-ed text and invisible text.
1049
+
:arg bool hidden_text: Remove OCR-ed text and invisible text [#f7]_.
.. [#f2] However, you **can** use :meth:`Document.get_toc` and :meth:`Page.get_links` (which are available for all document types) and copy this information over to the output PDF. See demo `pdf-converter.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo/pdf-converter.py>`_.
1754
1754
1755
-
.. [#f3] For applicable (EPUB) document types, loading a page via its absolute number may result in layouting a large part of the document, before the page can be accessed. To avoid this performance impact, prefer chapter-based access. Use convenience methods / attributes :meth:`Document.next_location`, :meth:`Document.prev_location` and :attr:`Document.last_location` for maintaining a high level of coding efficiency.
1755
+
.. [#f3] For applicable (EPUB) document types, loading a page via its absolute number may result in layouting a large part of the document, before the page can be accessed. To avoid this performance impact, prefer chapter-based access. Use convenience methods and attributes :meth:`Document.next_location`, :meth:`Document.prev_location` and :attr:`Document.last_location` for maintaining a high level of coding efficiency.
1756
1756
1757
1757
.. [#f4] These parameters cause separate handling of stream categories: use it together with ``expand`` to restrict decompression to streams other than images / fontfiles.
1758
1758
1759
1759
.. [#f5] Examples for "Form XObjects" are created by :meth:`Page.show_pdf_page`.
1760
1760
1761
1761
.. [#f6] For a *False* the **complete document** must be scanned. Both methods **do not load pages,** but only scan object definitions. This makes them at least 10 times faster than application-level loops (where total response time roughly equals the time for loading all pages). For the :ref:`AdobeManual` (1'310 pages) and the Pandas documentation (over 3'070 pages) -- both have no annotations -- the method needs about 11 ms for the answer *False*. So response times will probably become significant only well beyond this order of magnitude.
1762
1762
1763
+
.. [#f7] This only works under certain conditions. For example, if there is normal text covered by some image on top of it, then this is undetectable and the respective text is **not** removed. Similar is true for white text on white background, and so on.
docs/faq.rst
Lines changed: 49 additions & 12 deletions
@@ -77,9 +77,9 @@ In the above we construct *clip* by specifying two diagonally opposite points: t
77
77
78
78
----------
79
79
80
-
How to Fit a Clip to a GUI Window
80
+
How to Zoom a Clip to a GUI Window
81
81
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
82
-
This is similar to the previous section. This time, we want to **compute the zoom factor** for a clip such, that its image best fits a given GUI window. This means, that either the clip image's width or height (or both) will equal the window dimension.
82
+
Please also read the previous section. This time we want to **compute the zoom factor** for a clip such that its image best fits a given GUI window. This means that the image's width or height (or both) will equal the window dimension.
83
83
84
84
::
85
85
@@ -89,21 +89,21 @@ This is similar to the previous section. This time, we want to **compute the zoo
89
89
# compare width/height ratios of image and window
90
90
91
91
if clip.width / clip.height < WIDTH / HEIGHT:
92
-
# clip is narrower
93
-
zoom = HEIGHT / clip.height # hence fit window height
94
-
else:
95
-
zoom = WIDTH / clip.width # else fit window width
92
+
# clip is narrower: zoom to window height
93
+
zoom = HEIGHT / clip.height
94
+
else: # else zoom to window width
95
+
zoom = WIDTH / clip.width
96
96
mat = fitz.Matrix(zoom, zoom)
97
97
pix = page.get_pixmap(matrix=mat, clip=clip)
98
98
99
-
Now assume you **have the zoom factor** and need to compute the fitting clip.
99
+
Now assume you **have** the zoom factor and need to compute the fitting clip.
100
100
101
-
In this case we again have ``zoom = HEIGHT/clip.height = WIDTH/clip.width``, so we must set ``clip.height = HEIGHT/zoom`` and, similarly ``clip.width = WIDTH/zoom``. Now you only need to choose a top-left point ``tl`` of the clip on the page to compute the right pixmap::
101
+
In this case we have ``zoom = HEIGHT/clip.height = WIDTH/clip.width``, so we must set ``clip.height = HEIGHT/zoom`` and ``clip.width = WIDTH/zoom``. Choose the top-left point ``tl`` of the clip on the page to compute the right pixmap::
102
102
103
103
width = WIDTH / zoom
104
104
height = HEIGHT / zoom
105
105
clip = fitz.Rect(tl, tl.x + width, tl.y + height)
106
-
# make sure we still are inside the page
106
+
# ensure we still are inside the page
107
107
clip &= page.rect
108
108
mat = fitz.Matrix(zoom, zoom)
109
109
pix = page.get_pixmap(matrix=mat, clip=clip)
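
The two computations of this section can be tried without a document at hand. The following is a minimal sketch in plain Python: the helper names ``fit_zoom`` and ``clip_size_for_zoom`` are hypothetical and not part of PyMuPDF, and plain numbers stand in for the clip and window dimensions.

```python
def fit_zoom(clip_width, clip_height, win_width, win_height):
    """Return the zoom factor that makes a clip best fit a window.

    Hypothetical helper for illustration; not a PyMuPDF function.
    """
    # compare width/height ratios of clip and window
    if clip_width / clip_height < win_width / win_height:
        # clip is narrower: zoom to window height
        return win_height / clip_height
    # else zoom to window width
    return win_width / clip_width


def clip_size_for_zoom(zoom, win_width, win_height):
    """Return (width, height) of the clip that fills the window at this zoom."""
    return win_width / zoom, win_height / zoom


# a narrow 100 x 200 clip shown in an 800 x 600 window
zoom = fit_zoom(100, 200, 800, 600)
print(zoom)
print(clip_size_for_zoom(zoom, 800, 600))
```

In a real script, the resulting zoom would be passed to ``fitz.Matrix(zoom, zoom)`` as shown above.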
@@ -410,7 +410,7 @@ The general scheme is just the following two lines::
410
410
.. index::
411
411
pair: copy;examples
412
412
413
-
How to Use Pixmaps: Gluing Images
413
+
How to Use Pixmaps: Glueing Images
414
414
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
415
415
416
416
This shows how pixmaps can be used for purely graphical, non-document purposes. The script reads an image file and creates a new image which consists of 3 * 4 tiles of the original::
@@ -956,6 +956,7 @@ All of the above is provided by three basic :ref:`Page`, resp. :ref:`Shape` meth
956
956
* :meth:`Page.insert_font` -- install a font for the page for later reference. The result is reflected in the output of :meth:`Document.get_page_fonts`. The font can be:
957
957
958
958
- provided as a file,
959
+
- via :ref:`Font` (then use :attr:`Font.buffer`)
959
960
- already present somewhere in **this or another** PDF, or
960
961
- be a **built-in** font.
961
962
@@ -1353,9 +1354,45 @@ Extracting Drawings
1353
1354
1354
1355
The drawing commands issued by a page can be extracted. Interestingly, this is possible for **all supported document types** -- not just PDF: so you can use it for XPS, EPUB and others as well.
1355
1356
1356
-
A new page method, :meth:`Page.get_drawings()` accesses draw commands and converts them into a list of Python dictionaries. Each dictionary -- called a "path" -- represents a separate drawing -- it may be simple like a single line, or a complex combination of lines and curves representing one of the shapes of the previous section.
1357
+
Page method :meth:`Page.get_drawings()` accesses draw commands and converts them into a list of Python dictionaries. Each dictionary -- called a "path" -- represents a separate drawing. It may be as simple as a single line, or a complex combination of lines and curves representing one of the shapes of the previous section.
1358
+
1359
+
The *path* dictionary has been designed such that it can easily be used by the :ref:`Shape` class and its methods. Here is an example for a page with one path that draws a red-bordered yellow circle inside rectangle ``Rect(100, 100, 200, 200)``::
1360
+
1361
+
>>> pprint(page.get_drawings())
1362
+
[{'closePath': True,
1363
+
'color': [1.0, 0.0, 0.0],
1364
+
'dashes': '[] 0',
1365
+
'even_odd': False,
1366
+
'fill': [1.0, 1.0, 0.0],
1367
+
'items': [('c',
1368
+
Point(100.0, 150.0),
1369
+
Point(100.0, 177.614013671875),
1370
+
Point(122.38600158691406, 200.0),
1371
+
Point(150.0, 200.0)),
1372
+
('c',
1373
+
Point(150.0, 200.0),
1374
+
Point(177.61399841308594, 200.0),
1375
+
Point(200.0, 177.614013671875),
1376
+
Point(200.0, 150.0)),
1377
+
('c',
1378
+
Point(200.0, 150.0),
1379
+
Point(200.0, 122.385986328125),
1380
+
Point(177.61399841308594, 100.0),
1381
+
Point(150.0, 100.0)),
1382
+
('c',
1383
+
Point(150.0, 100.0),
1384
+
Point(122.38600158691406, 100.0),
1385
+
Point(100.0, 122.385986328125),
1386
+
Point(100.0, 150.0))],
1387
+
'lineCap': (0, 0, 0),
1388
+
'lineJoin': 0,
1389
+
'opacity': 1.0,
1390
+
'rect': Rect(100.0, 100.0, 200.0, 200.0),
1391
+
'width': 1.0}]
1392
+
>>>
1393
+
1394
+
.. note:: You need (at least) 4 Bézier curves (of 3rd order) to draw a circle with acceptable precision. See this `Wikipedia article <https://en.wikipedia.org/wiki/B%C3%A9zier_curve>`_ for some background.
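
The control points in the extracted path follow from a well-known approximation of a quarter circle by one cubic Bézier curve. The sketch below recomputes the first ``'c'`` item in plain Python. It assumes a circle inscribed in a square rectangle, and the helper name ``first_quarter`` is hypothetical, not part of PyMuPDF.

```python
import math

# Standard constant for approximating a quarter circle with one cubic Bezier curve.
KAPPA = 4 * (math.sqrt(2) - 1) / 3  # about 0.5523


def first_quarter(x0, y0, x1, y1):
    """Return (start, control1, control2, end) for the first quarter arc of
    the circle inscribed in the square (x0, y0, x1, y1).

    Hypothetical helper for illustration; assumes x1 - x0 == y1 - y0.
    """
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2  # circle center
    r = (x1 - x0) / 2                      # radius
    k = KAPPA * r                          # offset of the control points
    start = (cx - r, cy)
    control1 = (cx - r, cy + k)
    control2 = (cx - k, cy + r)
    end = (cx, cy + r)
    return start, control1, control2, end


for point in first_quarter(100, 100, 200, 200):
    print(point)
```

For ``Rect(100, 100, 200, 200)`` the two control points come out close to (100, 177.614) and (122.386, 200), which agrees with the extracted values up to floating point rounding.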
1357
1395
1358
-
The *path* dictionary has been designed such that it can easily be used by the :ref:`Shape` class and its methods.
1359
1396
1360
1397
The following is a code snippet which extracts the drawings of a page and re-draws them on a new page::
docs/installation.rst
Lines changed: 2 additions & 2 deletions
@@ -56,7 +56,7 @@ Now perform a *python setup.py install*.
56
56
Option 2: Install from Binaries
57
57
--------------------------------
58
58
You can install PyMuPDF from Python wheels. Wheels are *self-contained*, i.e. you will **not need any other software** nor download / install MuPDF to run PyMuPDF scripts.
59
-
This installation option is available for all MS Windows and the most **popular 64-bit** Mac OSX and Linux platforms for Python versions 3.6 through 3.9. Since version 1.18.13, Linux ARM 64-bit architectures are also supported.
59
+
This installation option is available for all MS Windows and the most **popular 64-bit** Mac OSX and Linux platforms for Python versions 3.6 through 3.9. Since version 1.18.13, Linux ARM 64-bit architectures are also supported, and since version 1.18.15, Mac OSX universal architectures, too.
60
60
Windows binaries are provided for Python 64-bit **and** 32-bit versions.
61
61
62
62
**Overview of wheel names (PyMuPDF version is x.xx.xx):**
@@ -66,5 +66,5 @@ Windows binaries are provided for Python 64-bit **and** 32-bit versions.
66
66
67
67
Older versions can be found in the releases directory of our home page https://github.com/pymupdf/PyMuPDF/releases.
68
68
69
-
If you unexpectedly run into problems installing the wheel for your system, please make sure you have updated your PIP to the current version.
69
+
Please **always** make sure you have updated your PIP to the current version and always invoke pip as a module within the right Python version: ``python -m pip install ...``.
.. note:: In PDF, links are a subtype of annotations technically and **do not support fill colors**. However, to keep a consistent API, we do allow specifying a ``fill=`` parameter like with all annotations, which will be ignored with a warning.
44
46
45
47
*(Changed in version 1.16.9)* Allow colors to be directly set. These parameters are used if *colors* is not a dictionary.
46
48
47
49
:arg dict colors: a dictionary containing color specifications. For accepted dictionary keys and values see below. The most practical way should be to first make a copy of the *colors* property and then modify this dictionary as required.
48
50
:arg sequence stroke: see above.
49
-
:arg sequence fill: see above.
50
51
51
52
52
53
.. attribute:: colors
53
54
54
-
Meaningful for PDF only: A dictionary of two lists of floats in range *0 <= float <= 1* specifying the *stroke* and the interior (*fill*) colors. If not a PDF, *None* is returned. The stroke color is used for borders and everything that is actively painted or written ("stroked"). The lengths of these lists implicitly determine the colorspaces used: 1 = GRAY, 3 = RGB, 4 = CMYK. So *[1.0, 0.0, 0.0]* stands for RGB color red. Both lists can be *[]* if no color is specified. The value of each float *f* is mapped to the integer value *i* in range 0 to 255 via the computation *f = i / 255*.
55
+
Meaningful for PDF only: A dictionary of two tuples of floats in range ``0 <= float <= 1`` specifying the *stroke* and the interior (*fill*) colors. If not a PDF, *None* is returned. As mentioned above, the fill color is always ``None`` for links. The stroke color is used for the border of the link rectangle. The length of the tuple implicitly determines the colorspace: 1 = GRAY, 3 = RGB, 4 = CMYK. So ``(1.0, 0.0, 0.0)`` stands for RGB color red. The value of each float *f* is mapped to the integer value *i* in range 0 to 255 via the computation *f = i / 255*.
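
The relationship between tuple length, colorspace and the float-to-integer mapping can be sketched in plain Python. The helper names below are hypothetical and not part of PyMuPDF. Rounding to the nearest integer is an assumption made for this illustration, as the text only states *f = i / 255*.

```python
# tuple length -> colorspace, as described above
COLORSPACES = {1: "GRAY", 3: "RGB", 4: "CMYK"}


def colorspace_name(color):
    """Return the colorspace implied by the number of color components."""
    return COLORSPACES[len(color)]


def to_ints(color):
    """Map floats in range 0..1 to integers in range 0..255.

    Rounding is an assumption of this sketch.
    """
    return tuple(round(f * 255) for f in color)


def to_floats(ints):
    """Map integers in range 0..255 back to floats via f = i / 255."""
    return tuple(i / 255 for i in ints)


red = (1.0, 0.0, 0.0)
print(colorspace_name(red), to_ints(red))
```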