Commit d7f55b3

update documentation to v1.18.15
1 parent 3ce5119 commit d7f55b3

17 files changed (157 lines added, 63 lines removed)

docs/annot.rst

Lines changed: 1 addition & 1 deletion
@@ -279,7 +279,7 @@ There is a parent-child relationship between an annotation and its page. If the
 :arg sequence stroke: see above.
 :arg sequence fill: see above.
 
-*Changed in v1.18.5:* To completely remove a color specification, use an empty sequence like ``[]``.
+*Changed in v1.18.5:* To completely remove a color specification, use an empty sequence like ``[]``. If you specify ``None``, an existing specification will not be changed.
 
 
 .. method:: delete_responses()
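The three-way rule above (``None`` keeps the current color, ``[]`` removes it, anything else replaces it) can be modelled in a few lines of plain Python. This is a sketch of the documented behavior only: ``merge_color`` is a hypothetical helper, not PyMuPDF code. In PyMuPDF itself, the 1.18.15 change log names ``fill=[]`` in ``Annot.set_colors`` and ``fill_color=[]`` in ``Annot.update`` as the ways to remove a fill color.

```python
from typing import Optional, Sequence, Tuple

Color = Tuple[float, ...]


def merge_color(current: Color, requested: Optional[Sequence[float]]) -> Color:
    """Hypothetical model of the documented color-update rule.

    None  -> keep the existing specification unchanged
    []    -> remove the color specification completely
    other -> replace it with the given color components
    """
    if requested is None:
        return current  # None: leave the existing color alone
    # An empty sequence yields (), i.e. "no color specified".
    return tuple(float(c) for c in requested)


if __name__ == "__main__":
    red = (1.0, 0.0, 0.0)
    print(merge_color(red, None))       # still red
    print(merge_color(red, []))         # color removed
    print(merge_color(red, [0, 0, 1]))  # now blue
```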

docs/app2.rst

Lines changed: 3 additions & 3 deletions
@@ -264,7 +264,7 @@ Text Extraction Flags Defaults
 =================== ==== ==== ===== === ==== ======= ===== ====== ======
 Indicator           text html xhtml xml dict rawdict words blocks search
 =================== ==== ==== ===== === ==== ======= ===== ====== ======
-preserve ligatures  1    1    1     1   1    1       1     1      0
+preserve ligatures  1    1    1     1   1    1       1     1      1
 preserve whitespace 1    1    1     1   1    1       1     1      1
 preserve images     n/a  1    1     n/a 1    1       n/a   0      0
 inhibit spaces      0    0    0     0   0    0       0     0      0
@@ -298,7 +298,7 @@ To show the effect of *TEXT_INHIBIT_SPACES* have a look at this example::
 
 Performance
 ~~~~~~~~~~~~
-The text extraction methods differ significantly: in terms of information they supply, and in terms of resource requirements and runtimes. Generally, more information of course means that more processing is required and a higher data volume is generated.
+The text extraction methods differ significantly: in terms of information they supply, and in terms of resource requirements and runtimes. Generally, more information of course means, that more processing is required and a higher data volume is generated.
 
 .. note:: Images in particular have a **very significant** impact. Make sure to exclude them (via the *flags* parameter) whenever you do not need them. Processing the 2'700 total pages mentioned below with default flags settings required 160 seconds across all extraction methods. When all images were excluded, less than 50% of that time (77 seconds) was needed.
 
@@ -319,6 +319,6 @@ DICT 3.93 **binary** images, **span** level text, layout and font details
 RAWDICT 4.50 **binary** images, **char** level text, layout and font details 1.68
 ======= ====== ===================================================================== ==========
 
-As mentioned: when excluding all images (last column), the relative speeds are changing drastically: except RAWDICT and XML, the other methods are almost equally fast, and RAWDICT requires 40% less execution time than the **now slowest XML**.
+As mentioned: when excluding image extraction (last column), the relative speeds change drastically: except RAWDICT and XML, the other methods are almost equally fast, and RAWDICT requires 40% less execution time than the **now slowest XML**.
 
 Look at chapter **Appendix 1** for more performance information.
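The flags defaults table and the performance note both come down to bit arithmetic on the ``flags`` value. A minimal sketch, assuming the values 1, 2, 4 and 8 for PyMuPDF's ``TEXT_PRESERVE_LIGATURES``, ``TEXT_PRESERVE_WHITESPACE``, ``TEXT_PRESERVE_IMAGES`` and ``TEXT_INHIBIT_SPACES`` (they are redefined locally so the snippet runs without PyMuPDF; verify them against your installed version). It builds the "dict" and "search" defaults from the table and clears the image bit, which is what excluding images via ``flags`` amounts to.

```python
# Flag values assumed to match PyMuPDF's text extraction constants
# (fitz.TEXT_PRESERVE_LIGATURES etc.); verify against your installed version.
TEXT_PRESERVE_LIGATURES = 1
TEXT_PRESERVE_WHITESPACE = 2
TEXT_PRESERVE_IMAGES = 4
TEXT_INHIBIT_SPACES = 8

# Defaults as listed in the table: "dict" preserves ligatures, whitespace and images ...
DICT_DEFAULT = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE | TEXT_PRESERVE_IMAGES
# ... while "search" (after this commit) preserves ligatures and whitespace only.
SEARCH_DEFAULT = TEXT_PRESERVE_LIGATURES | TEXT_PRESERVE_WHITESPACE

_NAMES = {
    TEXT_PRESERVE_LIGATURES: "preserve ligatures",
    TEXT_PRESERVE_WHITESPACE: "preserve whitespace",
    TEXT_PRESERVE_IMAGES: "preserve images",
    TEXT_INHIBIT_SPACES: "inhibit spaces",
}


def without_images(flags: int) -> int:
    """Clear the image bit so that image extraction is skipped."""
    return flags & ~TEXT_PRESERVE_IMAGES


def describe(flags: int) -> list:
    """List the table indicators that are switched on in `flags`."""
    return [name for bit, name in sorted(_NAMES.items()) if flags & bit]


if __name__ == "__main__":
    print(describe(DICT_DEFAULT))
    print(describe(without_images(DICT_DEFAULT)))
```

With PyMuPDF installed, the resulting value would be passed on as, e.g., ``page.get_text("dict", flags=without_images(DICT_DEFAULT))``.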

docs/changes.rst

Lines changed: 17 additions & 0 deletions
@@ -1,6 +1,23 @@
 Change Logs
 ===============
 
+Changes in Version 1.18.15
+---------------------------
+* **Fixed** issue `#1088 <https://github.com/pymupdf/PyMuPDF/issues/1088>`_. Removing an annotation's fill color works again in both ways: with the ``fill_color=[]`` argument of :meth:`Annot.update` as well as with ``fill=[]`` in :meth:`Annot.set_colors`.
+
+* **Fixed** issue `#1081 <https://github.com/pymupdf/PyMuPDF/issues/1081>`_. :meth:`Document.subset_fonts`: fixed an error that produced wrong character widths for some fonts.
+
+* **Fixed** issue `#1078 <https://github.com/pymupdf/PyMuPDF/issues/1078>`_. :meth:`Page.get_text` and other text extraction methods: changed the default value of the :ref:`TextPage` ``flags`` parameter. All whitespace and ligatures are now preserved.
+
+* **Fixed** issue `#1085 <https://github.com/pymupdf/PyMuPDF/issues/1085>`_. The *snake_cased* alias of the old ``fitz.getTextlength`` is now defined correctly.
+
+* **Changed** :meth:`Document.subset_fonts`: font subsets are now correctly prefixed with an appropriate six-letter uppercase tag, complying with the PDF specification.
+
+* **Added** new method :meth:`Widget.button_states`, which returns the possible values that a button-type field can have when set to "on" or "off".
+
+* **Added** support for text with **Small Capital** letters to the :ref:`Font` and :ref:`TextWriter` classes. This is reflected by an additional bool parameter ``small_caps`` in several of their methods.
+
+
 Changes in Version 1.18.14
 ---------------------------
 * **Finished** implementing new, "snake_cased" names for methods and properties, that were "camelCased" and awkward in many aspects. At the end of this documentation, there is section :ref:`Deprecated` with more background and a mapping of old to new names.

docs/conf.py

Lines changed: 2 additions & 1 deletion
@@ -20,6 +20,7 @@
 extensions = [
     "extensions.searchrepair",
     "extensions.fulltoc",
+    "rinoh.frontend.sphinx",
 ]
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ["_templates"]
@@ -42,7 +43,7 @@
 # built documents.
 #
 # The full version, including alpha/beta/rc tags.
-release = "1.18.14"
+release = "1.18.15"
 
 # The short X.Y version
 version = release

docs/document.rst

Lines changed: 3 additions & 2 deletions
@@ -1046,7 +1046,7 @@ For details on **embedded files** refer to Appendix 3.
 :arg bool attached_files: Search for 'FileAttachment' annotations and remove the file content.
 :arg bool clean_pages: Remove any comments from page painting sources. If this option is set to *False*, then this is also done for *hidden_text* and *redactions*.
 :arg bool embedded_files: Remove embedded files.
-:arg bool hidden_text: Remove OCR-ed text and invisible text.
+:arg bool hidden_text: Remove OCR-ed text and invisible text [#f7]_.
 :arg bool javascript: Remove JavaScript sources.
 :arg bool metadata: Remove PDF standard metadata.
 :arg bool redactions: Apply redaction annotations.
@@ -1752,11 +1752,12 @@ Other Examples
 
 .. [#f2] However, you **can** use :meth:`Document.get_toc` and :meth:`Page.get_links` (which are available for all document types) and copy this information over to the output PDF. See demo `pdf-converter.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo/pdf-converter.py>`_.
 
-.. [#f3] For applicable (EPUB) document types, loading a page via its absolute number may result in layouting a large part of the document, before the page can be accessed. To avoid this performance impact, prefer chapter-based access. Use convenience methods / attributes :meth:`Document.next_location`, :meth:`Document.prev_location` and :attr:`Document.last_location` for maintaining a high level of coding efficiency.
+.. [#f3] For applicable (EPUB) document types, loading a page via its absolute number may result in laying out a large part of the document before the page can be accessed. To avoid this performance impact, prefer chapter-based access. Use convenience methods and attributes :meth:`Document.next_location`, :meth:`Document.prev_location` and :attr:`Document.last_location` for maintaining a high level of coding efficiency.
 
 .. [#f4] These parameters cause separate handling of stream categories: use them together with ``expand`` to restrict decompression to streams other than images / fontfiles.
 
 .. [#f5] Examples for "Form XObjects" are created by :meth:`Page.show_pdf_page`.
 
 .. [#f6] For a *False* result, the **complete document** must be scanned. Both methods **do not load pages,** but only scan object definitions. This makes them at least 10 times faster than application-level loops (where total response time roughly equals the time for loading all pages). For the :ref:`AdobeManual` (1'310 pages) and the Pandas documentation (over 3'070 pages) -- both have no annotations -- the method needs about 11 ms for the answer *False*. So response times will probably become significant only well beyond this order of magnitude.
 
+.. [#f7] This only works under certain conditions. For example, normal text covered by an image on top of it is undetectable and is **not** removed. The same is true for white text on a white background, and so on.

docs/faq.rst

Lines changed: 49 additions & 12 deletions
@@ -77,9 +77,9 @@ In the above we construct *clip* by specifying two diagonally opposite points: t
 
 ----------
 
-How to Fit a Clip to a GUI Window
+How to Zoom a Clip to a GUI Window
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-This is similar to the previous section. This time, we want to **compute the zoom factor** for a clip such, that its image best fits a given GUI window. This means, that either the clip image's width or height (or both) will equal the window dimension.
+Please also read the previous section. This time we want to **compute the zoom factor** for a clip such that its image best fits a given GUI window. This means that the image's width or height (or both) will equal the window dimension.
 
 ::
 
@@ -89,21 +89,21 @@ This is similar to the previous section. This time, we want to **compute the zoo
     # compare width/height ratios of image and window
 
     if clip.width / clip.height < WIDTH / HEIGHT:
-        # clip is narrower
-        zoom = HEIGHT / clip.height # hence fit window height
-    else:
-        zoom = WIDTH / clip.width # else fit window width
+        # clip is narrower: zoom to window height
+        zoom = HEIGHT / clip.height
+    else:  # else zoom to window width
+        zoom = WIDTH / clip.width
     mat = fitz.Matrix(zoom, zoom)
     pix = page.get_pixmap(matrix=mat, clip=clip)
 
-Now assume you **have the zoom factor** and need to compute the fitting clip.
+Now assume you **have** the zoom factor and need to compute the fitting clip.
 
-In this case we again have ``zoom = HEIGHT/clip.height = WIDTH/clip.width``, so we must set ``clip.height = HEIGHT/zoom`` and, similarly ``clip.width = WIDTH/zoom``. Now you only need to choose a top-left point ``tl`` of the clip on the page to compute the right pixmap::
+In this case we have ``zoom = HEIGHT/clip.height = WIDTH/clip.width``, so we must set ``clip.height = HEIGHT/zoom`` and ``clip.width = WIDTH/zoom``. Choose the top-left point ``tl`` of the clip on the page to compute the right pixmap::
 
     width = WIDTH / zoom
     height = HEIGHT / zoom
     clip = fitz.Rect(tl, tl.x + width, tl.y + height)
-    # make sure we still are inside the page
+    # ensure we are still inside the page
     clip &= page.rect
     mat = fitz.Matrix(zoom, zoom)
     pix = page.get_pixmap(matrix=mat, clip=clip)
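Both computations in this section can be checked without a PDF at hand. The following sketch re-implements the two formulas with a minimal hypothetical ``Box`` class standing in for ``fitz.Rect`` (only ``width``, ``height`` and intersection with the page are modelled); ``zoom_to_window`` and ``clip_for_zoom`` are illustrative names, not PyMuPDF API.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for fitz.Rect with just what the formulas need."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def zoom_to_window(clip: Box, win_w: float, win_h: float) -> float:
    """Zoom factor so that the clip's image fits the window exactly on one side."""
    if clip.width / clip.height < win_w / win_h:
        return win_h / clip.height  # clip is narrower: fit the window height
    return win_w / clip.width       # otherwise fit the window width


def clip_for_zoom(tl_x: float, tl_y: float, zoom: float,
                  win_w: float, win_h: float, page: Box) -> Box:
    """Clip starting at (tl_x, tl_y) whose image at `zoom` fills the window."""
    clip = Box(tl_x, tl_y, tl_x + win_w / zoom, tl_y + win_h / zoom)
    # Intersect with the page, mirroring `clip &= page.rect`
    # (the result may be empty if the top-left point lies outside the page).
    return Box(max(clip.x0, page.x0), max(clip.y0, page.y0),
               min(clip.x1, page.x1), min(clip.y1, page.y1))


if __name__ == "__main__":
    print(zoom_to_window(Box(0, 0, 100, 50), 800, 600))
    print(clip_for_zoom(100, 100, 2.0, 800, 600, Box(0, 0, 595, 842)))
```

For example, a 100 x 50 clip and an 800 x 600 window give a zoom of 8: the clip is wider than the window's aspect ratio, so its width (100 * 8 = 800) is what reaches the window border first.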
@@ -410,7 +410,7 @@ The general scheme is just the following two lines::
 .. index::
 pair: copy;examples
 
-How to Use Pixmaps: Gluing Images
+How to Use Pixmaps: Glueing Images
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This shows how pixmaps can be used for purely graphical, non-document purposes. The script reads an image file and creates a new image which consists of 3 * 4 tiles of the original::
@@ -956,6 +956,7 @@ All of the above is provided by three basic :ref:`Page`, resp. :ref:`Shape` meth
 * :meth:`Page.insert_font` -- install a font for the page for later reference. The result is reflected in the output of :meth:`Document.get_page_fonts`. The font can be:
 
 - provided as a file,
+- provided via :ref:`Font` (then use :attr:`Font.buffer`),
 - already present somewhere in **this or another** PDF, or
 - be a **built-in** font.
 
@@ -1353,9 +1354,45 @@ Extracting Drawings
 
 The drawing commands issued by a page can be extracted. Interestingly, this is possible for **all supported document types** -- not just PDF: so you can use it for XPS, EPUB and others as well.
 
-A new page method, :meth:`Page.get_drawings()` accesses draw commands and converts them into a list of Python dictionaries. Each dictionary -- called a "path" -- represents a separate drawing -- it may be simple like a single line, or a complex combination of lines and curves representing one of the shapes of the previous section.
+Page method :meth:`Page.get_drawings()` accesses draw commands and converts them into a list of Python dictionaries. Each dictionary, called a "path", represents a separate drawing: it may be as simple as a single line, or a complex combination of lines and curves representing one of the shapes of the previous section.
+
+The *path* dictionary has been designed such that it can easily be used by the :ref:`Shape` class and its methods. Here is an example for a page with one path that draws a red-bordered yellow circle inside rectangle ``Rect(100, 100, 200, 200)``::
+
+    >>> pprint(page.get_drawings())
+    [{'closePath': True,
+      'color': [1.0, 0.0, 0.0],
+      'dashes': '[] 0',
+      'even_odd': False,
+      'fill': [1.0, 1.0, 0.0],
+      'items': [('c',
+                 Point(100.0, 150.0),
+                 Point(100.0, 177.614013671875),
+                 Point(122.38600158691406, 200.0),
+                 Point(150.0, 200.0)),
+                ('c',
+                 Point(150.0, 200.0),
+                 Point(177.61399841308594, 200.0),
+                 Point(200.0, 177.614013671875),
+                 Point(200.0, 150.0)),
+                ('c',
+                 Point(200.0, 150.0),
+                 Point(200.0, 122.385986328125),
+                 Point(177.61399841308594, 100.0),
+                 Point(150.0, 100.0)),
+                ('c',
+                 Point(150.0, 100.0),
+                 Point(122.38600158691406, 100.0),
+                 Point(100.0, 122.385986328125),
+                 Point(100.0, 150.0))],
+      'lineCap': (0, 0, 0),
+      'lineJoin': 0,
+      'opacity': 1.0,
+      'rect': Rect(100.0, 100.0, 200.0, 200.0),
+      'width': 1.0}]
+    >>>
+
+.. note:: You need (at least) 4 Bézier curves (of 3rd order) to draw a circle with acceptable precision. See this `Wikipedia article <https://en.wikipedia.org/wiki/B%C3%A9zier_curve>`_ for some background.
 
-The *path* dictionary has been designed such that it can easily be used by the :ref:`Shape` class and its methods.
 
 The following is a code snippet which extracts the drawings of a page and re-draws them on a new page::
 
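The note about four Bézier curves can be made concrete. This sketch produces ``items`` entries of the same ``('c', p1, p2, p3, p4)`` shape as the ``get_drawings()`` listing for the circle in ``Rect(100, 100, 200, 200)``, using plain ``(x, y)`` tuples instead of ``fitz.Point``. It assumes the commonly used control-point constant ``4 * (sqrt(2) - 1) / 3``; whether MuPDF uses exactly this value is not stated here, and the listed control points do differ slightly in the fourth decimal (177.614013671875 versus about 177.6142). ``circle_items``, ``bezier_point`` and ``max_radial_error`` are hypothetical helper names.

```python
import math

# Commonly used constant for approximating a quarter circle by one cubic Bézier curve.
KAPPA = 4 * (math.sqrt(2) - 1) / 3  # about 0.5523


def circle_items(x0, y0, x1, y1):
    """Four cubic curves ('c', p1, p2, p3, p4) for the circle inscribed in a square.

    The order follows the listing: left -> bottom -> right -> top -> left,
    with y growing downwards as in PDF page coordinates used by PyMuPDF.
    """
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    k = (x1 - x0) / 2 * KAPPA  # control-point offset = radius * KAPPA
    left, bottom, right, top = (x0, cy), (cx, y1), (x1, cy), (cx, y0)
    return [
        ("c", left, (x0, cy + k), (cx - k, y1), bottom),
        ("c", bottom, (cx + k, y1), (x1, cy + k), right),
        ("c", right, (x1, cy - k), (cx + k, y0), top),
        ("c", top, (cx - k, y0), (x0, cy - k), left),
    ]


def bezier_point(p1, p2, p3, p4, t):
    """Evaluate a cubic Bézier curve at parameter t in [0, 1]."""
    s = 1 - t
    return tuple(
        s**3 * a + 3 * s**2 * t * b + 3 * s * t**2 * c + t**3 * d
        for a, b, c, d in zip(p1, p2, p3, p4)
    )


def max_radial_error(items, center, radius, samples=100):
    """Largest sampled deviation of the curves from the exact circle."""
    worst = 0.0
    for _, p1, p2, p3, p4 in items:
        for i in range(samples + 1):
            x, y = bezier_point(p1, p2, p3, p4, i / samples)
            dist = math.hypot(x - center[0], y - center[1])
            worst = max(worst, abs(dist - radius))
    return worst


if __name__ == "__main__":
    items = circle_items(100, 100, 200, 200)
    print(items[0])
    print(max_radial_error(items, (150, 150), 50))
```

For a radius of 50 the sampled deviation stays well below 0.05 units, which illustrates why four curves give acceptable precision.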
docs/images/img-smallcaps.jpg

4.87 KB

docs/installation.rst

Lines changed: 2 additions & 2 deletions
@@ -56,7 +56,7 @@ Now perform a *python setup.py install*.
 Option 2: Install from Binaries
 --------------------------------
 You can install PyMuPDF from Python wheels. Wheels are *self-contained*, i.e. you will **not need any other software** and need not download or install MuPDF to run PyMuPDF scripts.
-This installation option is available for all MS Windows and the most **popular 64-bit** Mac OSX and Linux platforms for Python versions 3.6 through 3.9. Since version 1.18.13, Linux ARM 64-bit architectures are also supported.
+This installation option is available for all MS Windows and the most **popular 64-bit** Mac OSX and Linux platforms for Python versions 3.6 through 3.9. Since version 1.18.13, Linux ARM 64-bit architectures are also supported, and since version 1.18.15, Mac OSX universal architectures are supported as well.
 Windows binaries are provided for Python 64-bit **and** 32-bit versions.
 
 **Overview of wheel names (PyMuPDF version is x.xx.xx):**
@@ -66,5 +66,5 @@ Windows binaries are provided for Python 64-bit **and** 32-bit versions.
 
 Older versions can be found in the releases directory of our home page https://github.com/pymupdf/PyMuPDF/releases.
 
-If you unexpectedly run into problems installing the wheel for your system, please make sure you have updated your PIP to the current version.
+Please **always** make sure your pip is updated to the current version, and always invoke pip as a module of the intended Python version: ``python -m pip install ...``.
 
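The advice to invoke pip as a module matters because a bare ``pip`` on the PATH may belong to a different Python than the one running your scripts. A small sketch of the same idea from within Python: ``sys.executable`` is the interpreter currently running, so ``[sys.executable, "-m", "pip", ...]`` always targets it. ``pip_command`` is a hypothetical helper name; the main block only runs ``--version`` and installs nothing.

```python
import subprocess
import sys


def pip_command(*args: str) -> list:
    """Build a pip invocation bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Same as `python -m pip --version` for this exact Python;
    # the output names the interpreter that pip belongs to.
    subprocess.run(pip_command("--version"), check=False)
```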
docs/link.rst

Lines changed: 10 additions & 9 deletions
@@ -10,12 +10,12 @@ There is a parent-child relationship between a link and its page. If the page ob
 ========================= ============================================
 **Attribute**             **Short Description**
 ========================= ============================================
-:meth:`Link.setBorder`    modify border properties
-:meth:`Link.setColors`    modify color properties
+:meth:`Link.set_border`   modify border properties
+:meth:`Link.set_colors`   modify color properties
 :attr:`Link.border`       border characteristics
 :attr:`Link.colors`       border line color
-:attr:`Link.dest`         points to link destination details
-:attr:`Link.isExternal`   external link destination?
+:attr:`Link.dest`         points to destination details
+:attr:`Link.is_external`  external destination?
 :attr:`Link.next`         points to next link
 :attr:`Link.rect`         clickable area in untransformed coordinates.
 :attr:`Link.uri`          link destination
@@ -26,7 +26,7 @@ There is a parent-child relationship between a link and its page. If the page ob
 
 .. class:: Link
 
-.. method:: setBorder(border=None, width=0, style=None, dashes=None)
+.. method:: set_border(border=None, width=0, style=None, dashes=None)
 
 PDF only: Change border width and dashing properties.
 
@@ -38,20 +38,21 @@ There is a parent-child relationship between a link and its page. If the page ob
 :arg str style: see above.
 :arg sequence dashes: see above.
 
-.. method:: setColors(colors=None, stroke=None, fill=None)
+.. method:: set_colors(colors=None, stroke=None)
 
-Changes the "stroke" and "fill" colors.
+Changes the "stroke" color.
+
+.. note:: Technically, links in PDF are a subtype of annotations and **do not support fill colors**. However, to keep the API consistent, a ``fill=`` parameter may be specified as with all annotations; it is ignored with a warning.
 
 *(Changed in version 1.16.9)* Allow colors to be directly set. These parameters are used if *colors* is not a dictionary.
 
 :arg dict colors: a dictionary containing color specifications. For accepted dictionary keys and values see below. The most practical way should be to first make a copy of the *colors* property and then modify this dictionary as required.
 :arg sequence stroke: see above.
-:arg sequence fill: see above.
 
 
 .. attribute:: colors
 
-Meaningful for PDF only: A dictionary of two lists of floats in range *0 <= float <= 1* specifying the *stroke* and the interior (*fill*) colors. If not a PDF, *None* is returned. The stroke color is used for borders and everything that is actively painted or written ("stroked"). The lengths of these lists implicitely determine the colorspaces used: 1 = GRAY, 3 = RGB, 4 = CMYK. So *[1.0, 0.0, 0.0]* stands for RGB color red. Both lists can be *[]* if no color is specified. The value of each float *f* is mapped to the integer value *i* in range 0 to 255 via the computation *f = i / 255*.
+Meaningful for PDF only: A dictionary of two tuples of floats in range ``0 <= float <= 1`` specifying the *stroke* and the interior (*fill*) colors. If not a PDF, *None* is returned. As mentioned above, the fill color is always ``None`` for links. The stroke color is used for the border of the link rectangle. The length of the tuple implicitly determines the colorspace: 1 = GRAY, 3 = RGB, 4 = CMYK. So ``(1.0, 0.0, 0.0)`` stands for RGB color red. The value of each float *f* is mapped to the integer value *i* in range 0 to 255 via the computation *f = i / 255*.
 
 :rtype: dict
 
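The ``colors`` description implies two small computations: the colorspace follows from the tuple length, and floats relate to integers 0 to 255 via *f = i / 255*. A sketch of both in plain Python; ``colorspace``, ``to_ints`` and ``to_floats`` are hypothetical helpers, not PyMuPDF methods, and rounding to the nearest integer in ``to_ints`` is an assumption of this sketch.

```python
# Colorspace implied by the number of color components, as described for Link.colors.
COLORSPACES = {1: "GRAY", 3: "RGB", 4: "CMYK"}


def colorspace(color):
    """Return the colorspace name for a color tuple such as (1.0, 0.0, 0.0)."""
    try:
        return COLORSPACES[len(color)]
    except KeyError:
        raise ValueError(f"unsupported number of components: {len(color)}") from None


def to_ints(color):
    """Map each float f in [0, 1] to an integer i in 0..255 (inverse of f = i / 255)."""
    return tuple(round(f * 255) for f in color)


def to_floats(ints):
    """Map integers 0..255 back to floats via f = i / 255."""
    return tuple(i / 255 for i in ints)


if __name__ == "__main__":
    # For a link, only the stroke color is meaningful; fill is always None.
    colors = {"stroke": (1.0, 0.0, 0.0), "fill": None}
    print(colorspace(colors["stroke"]), to_ints(colors["stroke"]))
```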