pymupdf
diff --git a/docs/annot.rst b/docs/annot.rst
Lines changed: 1 addition & 1 deletion
diff --git a/docs/app2.rst b/docs/app2.rst
Lines changed: 3 additions & 3 deletions
diff --git a/docs/changes.rst b/docs/changes.rst
Lines changed: 17 additions & 0 deletions
diff --git a/docs/conf.py b/docs/conf.py
Lines changed: 2 additions & 1 deletion
diff --git a/docs/document.rst b/docs/document.rst
Lines changed: 3 additions & 2 deletions
diff --git a/docs/faq.rst b/docs/faq.rst
Lines changed: 49 additions & 12 deletions
diff --git a/docs/images/img-smallcaps.jpg b/docs/images/img-smallcaps.jpg
4.87 KB
diff --git a/docs/installation.rst b/docs/installation.rst
Lines changed: 2 additions & 2 deletions
diff --git a/docs/link.rst b/docs/link.rst
Lines changed: 10 additions & 9 deletions
@@ -279,7 +279,7 @@ There is a parent-child relationship between an annotation and its page. If the
       :arg sequence stroke: see above.
       :arg sequence fill: see above.
 
-      *Changed in v1.18.5:* To completely remove a color specification, use an empty sequence like ``[]``. 
+      *Changed in v1.18.5:* To completely remove a color specification, use an empty sequence like ``[]``. If you specify ``None``, an existing specification will not be changed.
 
 
    .. method:: delete_responses()
 
@@ -264,7 +264,7 @@ Text Extraction Flags Defaults
 =================== ==== ==== ===== === ==== ======= ===== ====== ======
 Indicator           text html xhtml xml dict rawdict words blocks search
 =================== ==== ==== ===== === ==== ======= ===== ====== ======
-preserve ligatures  1    1    1     1   1    1       1     1       0
+preserve ligatures  1    1    1     1   1    1       1     1       1
 preserve whitespace 1    1    1     1   1    1       1     1       1
 preserve images     n/a  1    1     n/a 1    1       n/a   0       0
 inhibit spaces      0    0    0     0   0    0       0     0       0
@@ -298,7 +298,7 @@ To show the effect of *TEXT_INHIBIT_SPACES* have a look at this example::
 
 Performance
 ~~~~~~~~~~~~
-The text extraction methods differ significantly: in terms of information they supply, and in terms of resource requirements and runtimes. Generally, more information of course means that more processing is required and a higher data volume is generated.
+The text extraction methods differ significantly: in terms of information they supply, and in terms of resource requirements and runtimes. Generally, more information of course means that more processing is required and a higher data volume is generated.
 
 .. note:: Images in particular have a **very significant** impact. Make sure to exclude them (via the *flags* parameter) whenever you do not need them. Processing the 2'700 total pages mentioned below with default flag settings required 160 seconds across all extraction methods. When all images were excluded, less than 50% of that time (77 seconds) was needed.
 
@@ -319,6 +319,6 @@ DICT     3.93  **binary** images, **span** level text, layout and font details
 RAWDICT  4.50  **binary** images, **char** level text, layout and font details        1.68
 ======= ====== ===================================================================== ==========
 
-As mentioned: when excluding all images (last column), the relative speeds are changing drastically: except RAWDICT and XML, the other methods are almost equally fast, and RAWDICT requires 40% less execution time than the **now slowest XML**.
+As mentioned: when excluding image extraction (last column), the relative speeds change drastically: except for RAWDICT and XML, the methods are almost equally fast, and RAWDICT requires 40% less execution time than the **now slowest XML**.
 
 Look at chapter **Appendix 1** for more performance information.
@@ -1,6 +1,23 @@
 Change Logs
 ===============
 
+Changes in Version 1.18.15
+---------------------------
+* **Fixed** issue `#1088 <https://github.com/pymupdf/PyMuPDF/issues/1088>`_. Removing an annotation's fill color should now work again both ways, using the ``fill_color=[]`` argument in :meth:`Annot.update` as well as ``fill=[]`` in :meth:`Annot.set_colors`.
+
+* **Fixed** issue `#1081 <https://github.com/pymupdf/PyMuPDF/issues/1081>`_. :meth:`Document.subset_fonts`: fixed an error which created wrong character widths for some fonts.
+
+* **Fixed** issue `#1078 <https://github.com/pymupdf/PyMuPDF/issues/1078>`_. :meth:`Page.get_text` and other methods related to text extraction: changed the default value of the :ref:`TextPage` ``flags`` parameter. All whitespace and ligatures are now preserved.
+
+* **Fixed** issue `#1085 <https://github.com/pymupdf/PyMuPDF/issues/1085>`_. The old *snake_cased* alias of ``fitz.getTextlength`` is now defined correctly.
+
+* **Changed** :meth:`Document.subset_fonts` will now correctly prefix font subsets with an appropriate six letter uppercase tag, complying with the PDF specification.
+
+* **Added** new method :meth:`Widget.button_states` which returns the possible values that a button-type field can have when being set to "on" or "off".
+
+* **Added** support for text with **Small Capital** letters to the :ref:`Font` and :ref:`TextWriter` classes. This is reflected by an additional bool parameter ``small_caps`` in several of their methods.
+
+
 Changes in Version 1.18.14
 ---------------------------
 * **Finished** implementing new, "snake_cased" names for methods and properties that were "camelCased" and awkward in many aspects. At the end of this documentation, there is a section :ref:`Deprecated` with more background and a mapping of old to new names.
 
@@ -20,6 +20,7 @@
 extensions = [
     "extensions.searchrepair",
     "extensions.fulltoc",
+    "rinoh.frontend.sphinx",
 ]
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ["_templates"]
@@ -42,7 +43,7 @@
 # built documents.
 #
 # The full version, including alpha/beta/rc tags.
-release = "1.18.14"
+release = "1.18.15"
 
 # The short X.Y version
 version = release
 
@@ -1046,7 +1046,7 @@ For details on **embedded files** refer to Appendix 3.
       :arg bool attached_files: Search for 'FileAttachment' annotations and remove the file content.
       :arg bool clean_pages: Remove any comments from page painting sources. If this option is set to *False*, then this is also done for *hidden_text* and *redactions*.
       :arg bool embedded_files: Remove embedded files.
-      :arg bool hidden_text: Remove OCR-ed text and invisible text.
+      :arg bool hidden_text: Remove OCR-ed text and invisible text [#f7]_.
       :arg bool javascript: Remove JavaScript sources.
       :arg bool metadata: Remove PDF standard metadata.
       :arg bool redactions: Apply redaction annotations.
@@ -1752,11 +1752,12 @@ Other Examples
 
 .. [#f2] However, you **can** use :meth:`Document.get_toc` and :meth:`Page.get_links` (which are available for all document types) and copy this information over to the output PDF. See demo `pdf-converter.py <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/demo/pdf-converter.py>`_.
 
-.. [#f3] For applicable (EPUB) document types, loading a page via its absolute number may result in layouting a large part of the document, before the page can be accessed. To avoid this performance impact, prefer chapter-based access. Use convenience methods / attributes :meth:`Document.next_location`, :meth:`Document.prev_location` and :attr:`Document.last_location` for maintaining a high level of coding efficiency.
+.. [#f3] For applicable (EPUB) document types, loading a page via its absolute number may require laying out a large part of the document before the page can be accessed. To avoid this performance impact, prefer chapter-based access. Use the convenience methods and attributes :meth:`Document.next_location`, :meth:`Document.prev_location` and :attr:`Document.last_location` to maintain a high level of coding efficiency.
 
 .. [#f4] These parameters cause separate handling of stream categories: use it together with ``expand`` to restrict decompression to streams other than images / fontfiles.
 
 .. [#f5] Examples for "Form XObjects" are created by :meth:`Page.show_pdf_page`.
 
 .. [#f6] For a *False* the **complete document** must be scanned. Both methods **do not load pages,** but only scan object definitions. This makes them at least 10 times faster than application-level loops (where total response time roughly equals the time for loading all pages). For the :ref:`AdobeManual` (1'310 pages) and the Pandas documentation (over 3'070 pages) -- both have no annotations -- the method needs about 11 ms for the answer *False*. So response times will probably become significant only well beyond this order of magnitude.
 
+.. [#f7] This only works under certain conditions. For example, if normal text is covered by an image on top of it, then this is undetectable and the respective text is **not** removed. The same is true for white text on a white background, and so on.
@@ -77,9 +77,9 @@ In the above we construct *clip* by specifying two diagonally opposite points: t
 
 ----------
 
-How to Fit a Clip to a GUI Window
+How to Zoom a Clip to a GUI Window
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-This is similar to the previous section. This time, we want to **compute the zoom factor** for a clip such, that its image best fits a given GUI window. This means, that either the clip image's width or height (or both) will equal the window dimension.
+Please also read the previous section. This time we want to **compute the zoom factor** for a clip such that its image best fits a given GUI window. This means that the image's width or height (or both) will equal the window dimension.
 
 ::
 
@@ -89,21 +89,21 @@ This is similar to the previous section. This time, we want to **compute the zoo
     # compare width/height ratios of image and window
 
     if clip.width / clip.height < WIDTH / HEIGHT:
-        # clip is narrower
-        zoom = HEIGHT / clip.height  # hence fit window height
-    else:
-        zoom = WIDTH / clip.width  # else fit window width
+        # clip is narrower: zoom to window height
+        zoom = HEIGHT / clip.height
+    else:  # else zoom to window width
+        zoom = WIDTH / clip.width
     mat = fitz.Matrix(zoom, zoom)
     pix = page.get_pixmap(matrix=mat, clip=clip)
 
-Now assume you **have the zoom factor** and need to compute the fitting clip.
+Now assume you **have** the zoom factor and need to compute the fitting clip.
 
-In this case we again have ``zoom = HEIGHT/clip.height = WIDTH/clip.width``, so we must set ``clip.height = HEIGHT/zoom`` and, similarly ``clip.width = WIDTH/zoom``. Now you only need to choose a top-left point ``tl`` of the clip on the page to compute the right pixmap::
+In this case we have ``zoom = HEIGHT/clip.height = WIDTH/clip.width``, so we must set ``clip.height = HEIGHT/zoom`` and ``clip.width = WIDTH/zoom``. Choose the top-left point ``tl`` of the clip on the page to compute the right pixmap::
 
     width = WIDTH / zoom
     height = HEIGHT / zoom
     clip = fitz.Rect(tl, tl.x + width, tl.y + height)  
-    # make sure we still are inside the page
+    # ensure we still are inside the page
     clip &= page.rect
     mat = fitz.Matrix(zoom, zoom)
     pix = page.get_pixmap(matrix=mat, clip=clip)
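
Both computations above are plain arithmetic and can be tried without opening a document. The following is a minimal sketch only: the function names ``zoom_for_window`` and ``clip_size_for_zoom`` are made up for this illustration and are not part of PyMuPDF, and the numbers are arbitrary.

```python
def zoom_for_window(clip_w, clip_h, win_w, win_h):
    """Return the zoom factor that makes a clip best fit a window.

    Hypothetical helper: compares the width/height ratios of clip and
    window, exactly like the snippet in this section.
    """
    if clip_w / clip_h < win_w / win_h:
        # clip is narrower than the window: zoom to window height
        return win_h / clip_h
    # otherwise zoom to window width
    return win_w / clip_w


def clip_size_for_zoom(zoom, win_w, win_h):
    """Return (width, height) of the clip that fills the window at 'zoom'."""
    return win_w / zoom, win_h / zoom


# a narrow clip shown in a landscape window is limited by the height
narrow_zoom = zoom_for_window(200, 400, 800, 600)

# a wide clip shown in the same window is limited by the width
wide_zoom = zoom_for_window(400, 100, 800, 600)

# the reverse direction: a given zoom factor determines the clip size
clip_w, clip_h = clip_size_for_zoom(2.0, 800, 600)
```

In a real script the resulting zoom would go into ``fitz.Matrix(zoom, zoom)``, and the clip size would be combined with a top-left point as shown above.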
@@ -410,7 +410,7 @@ The general scheme is just the following two lines::
 .. index::
    pair: copy;examples
 
-How to Use Pixmaps: Gluing Images
+How to Use Pixmaps: Glueing Images
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This shows how pixmaps can be used for purely graphical, non-document purposes. The script reads an image file and creates a new image which consist of 3 * 4 tiles of the original::
@@ -956,6 +956,7 @@ All of the above is provided by three basic :ref:`Page`, resp. :ref:`Shape` meth
 * :meth:`Page.insert_font` -- install a font for the page for later reference. The result is reflected in the output of :meth:`Document.get_page_fonts`. The font can be:
 
     - provided as a file,
+    - provided via :ref:`Font` (then use :attr:`Font.buffer`),
     - already present somewhere in **this or another** PDF, or
     - a **built-in** font.
 
@@ -1353,9 +1354,45 @@ Extracting Drawings
 
 The drawing commands issued by a page can be extracted. Interestingly, this is possible for **all supported document types** -- not just PDF: so you can use it for XPS, EPUB and others as well.
 
-A new page method, :meth:`Page.get_drawings()` accesses draw commands and converts them into a list of Python dictionaries. Each dictionary -- called a "path" -- represents a separate drawing -- it may be simple like a single line, or a complex combination of lines and curves representing one of the shapes of the previous section.
+The page method :meth:`Page.get_drawings()` accesses draw commands and converts them into a list of Python dictionaries. Each dictionary -- called a "path" -- represents a separate drawing -- it may be simple like a single line, or a complex combination of lines and curves representing one of the shapes of the previous section.
+
+The *path* dictionary has been designed such that it can easily be used by the :ref:`Shape` class and its methods. Here is an example for a page with one path that draws a red-bordered yellow circle inside rectangle ``Rect(100, 100, 200, 200)``::
+
+    >>> pprint(page.get_drawings())
+    [{'closePath': True,
+      'color': [1.0, 0.0, 0.0],
+      'dashes': '[] 0',
+      'even_odd': False,
+      'fill': [1.0, 1.0, 0.0],
+      'items': [('c',
+                 Point(100.0, 150.0),
+                 Point(100.0, 177.614013671875),
+                 Point(122.38600158691406, 200.0),
+                 Point(150.0, 200.0)),
+                ('c',
+                 Point(150.0, 200.0),
+                 Point(177.61399841308594, 200.0),
+                 Point(200.0, 177.614013671875),
+                 Point(200.0, 150.0)),
+                ('c',
+                 Point(200.0, 150.0),
+                 Point(200.0, 122.385986328125),
+                 Point(177.61399841308594, 100.0),
+                 Point(150.0, 100.0)),
+                ('c',
+                 Point(150.0, 100.0),
+                 Point(122.38600158691406, 100.0),
+                 Point(100.0, 122.385986328125),
+                 Point(100.0, 150.0))],
+      'lineCap': (0, 0, 0),
+      'lineJoin': 0,
+      'opacity': 1.0,
+      'rect': Rect(100.0, 100.0, 200.0, 200.0),
+      'width': 1.0}]
+    >>> 
+
+.. note:: You need (at least) 4 Bézier curves (of 3rd order) to draw a circle with acceptable precision. See this `Wikipedia article <https://en.wikipedia.org/wiki/B%C3%A9zier_curve>`_ for some background.
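
The control points of the first curve in the listing above can be reproduced by hand. The sketch below assumes the common quarter-circle approximation, which the listing itself does not state: the two inner control points sit at a distance of ``kappa * radius`` from the arc's end points, with ``kappa = 4/3 * (sqrt(2) - 1)``, about 0.5523. The helper names ``quarter_arc`` and ``bezier_point`` are hypothetical and not part of PyMuPDF.

```python
import math

# assumed constant of the usual quarter-circle approximation
KAPPA = 4.0 / 3.0 * (math.sqrt(2.0) - 1.0)  # about 0.5523


def quarter_arc(cx, cy, r):
    """Control points of one cubic Bezier curve covering a quarter circle.

    The arc runs from the leftmost point (cx - r, cy) to (cx, cy + r),
    like the first 'c' item in the listing.
    """
    p0 = (cx - r, cy)
    p1 = (cx - r, cy + KAPPA * r)
    p2 = (cx - KAPPA * r, cy + r)
    p3 = (cx, cy + r)
    return p0, p1, p2, p3


def bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t (0 <= t <= 1)."""
    s = 1.0 - t
    x = s**3 * p0[0] + 3 * s**2 * t * p1[0] + 3 * s * t**2 * p2[0] + t**3 * p3[0]
    y = s**3 * p0[1] + 3 * s**2 * t * p1[1] + 3 * s * t**2 * p2[1] + t**3 * p3[1]
    return x, y


# circle inscribed in Rect(100, 100, 200, 200): center (150, 150), radius 50
arc = quarter_arc(150.0, 150.0, 50.0)

# distance of the curve's middle point from the circle center
mid = bezier_point(*arc, 0.5)
radius_at_mid = math.hypot(mid[0] - 150.0, mid[1] - 150.0)

# largest deviation from the true radius, sampled along the curve
max_dev = max(
    abs(math.hypot(x - 150.0, y - 150.0) - 50.0)
    for x, y in (bezier_point(*arc, i / 100.0) for i in range(101))
)
```

Under this assumption the computed control points agree with the listing to about three decimal places, and the sampled curve stays very close to the true circle, which illustrates why four such curves are enough.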
 
-The *path* dictionary has been designed such that it can easily be used by the :ref:`Shape` class and its methods.
 
 The following is a code snippet which extracts the drawings of a page and re-draws them on a new page::
 
 
@@ -56,7 +56,7 @@ Now perform a *python setup.py install*.
 Option 2: Install from Binaries
 --------------------------------
 You can install PyMuPDF from Python wheels. Wheels are *self-contained*, i.e. you will **not need any other software** nor download / install MuPDF to run PyMuPDF scripts.
-This installation option is available for all MS Windows and the most **popular 64-bit** Mac OSX and Linux platforms for Python versions 3.6 through 3.9. Since version 1.18.13, Linux ARM 64-bit architectures are also supported.
+This installation option is available for all MS Windows and the most **popular 64-bit** Mac OSX and Linux platforms for Python versions 3.6 through 3.9. Since version 1.18.13, Linux ARM 64-bit architectures are also supported, and since version 1.18.15, Mac OSX universal architectures, too.
 Windows binaries are provided for Python 64-bit **and** 32-bit versions.
 
 **Overview of wheel names (PyMuPDF version is x.xx.xx):**
@@ -66,5 +66,5 @@ Windows binaries are provided for Python 64-bit **and** 32-bit versions.
 
 Older versions can be found in the releases directory of our home page https://github.com/pymupdf/PyMuPDF/releases.
 
-If you unexpectedly run into problems installing the wheel for your system, please make sure you have updated your PIP to the current version.
+Please **always** make sure that pip is updated to the current version, and always invoke it as a module of the intended Python version: ``python -m pip install ...``.
 
@@ -10,12 +10,12 @@ There is a parent-child relationship between a link and its page. If the page ob
 ========================= ============================================
 **Attribute**             **Short Description**
 ========================= ============================================
-:meth:`Link.setBorder`    modify border properties
-:meth:`Link.setColors`    modify color properties
+:meth:`Link.set_border`   modify border properties
+:meth:`Link.set_colors`   modify color properties
 :attr:`Link.border`       border characteristics
 :attr:`Link.colors`       border line color
-:attr:`Link.dest`         points to link destination details
-:attr:`Link.isExternal`   external link destination?
+:attr:`Link.dest`         points to destination details
+:attr:`Link.is_external`  external destination?
 :attr:`Link.next`         points to next link
 :attr:`Link.rect`         clickable area in untransformed coordinates.
 :attr:`Link.uri`          link destination
@@ -26,7 +26,7 @@ There is a parent-child relationship between a link and its page. If the page ob
 
 .. class:: Link
 
-   .. method:: setBorder(border=None, width=0, style=None, dashes=None)
+   .. method:: set_border(border=None, width=0, style=None, dashes=None)
 
       PDF only: Change border width and dashing properties.
 
@@ -38,20 +38,21 @@ There is a parent-child relationship between a link and its page. If the page ob
       :arg str style: see above.
       :arg sequence dashes: see above.
 
-   .. method:: setColors(colors=None, stroke=None, fill=None)
+   .. method:: set_colors(colors=None, stroke=None)
 
-      Changes the "stroke" and "fill" colors.
+      Changes the "stroke" color.
+      
+      .. note:: In PDF, links are technically a subtype of annotations and **do not support fill colors**. However, to keep the API consistent, we do allow specifying a ``fill=`` parameter like with all annotations; it will be ignored with a warning.
 
       *(Changed in version 1.16.9)* Allow colors to be directly set. These parameters are used if *colors* is not a dictionary.
 
       :arg dict colors: a dictionary containing color specifications. For accepted dictionary keys and values see below. The most practical way should be to first make a copy of the *colors* property and then modify this dictionary as required.
       :arg sequence stroke: see above.
-      :arg sequence fill: see above.
 
 
    .. attribute:: colors
 
-      Meaningful for PDF only: A dictionary of two lists of floats in range *0 <= float <= 1* specifying the *stroke* and the interior (*fill*) colors. If not a PDF, *None* is returned. The stroke color is used for borders and everything that is actively painted or written ("stroked"). The lengths of these lists implicitely determine the colorspaces used: 1 = GRAY, 3 = RGB, 4 = CMYK. So *[1.0, 0.0, 0.0]* stands for RGB color red. Both lists can be *[]* if no color is specified. The value of each float *f* is mapped to the integer value *i* in range 0 to 255 via the computation *f = i / 255*.
+      Meaningful for PDF only: A dictionary of two tuples of floats in range ``0 <= float <= 1`` specifying the *stroke* and the interior (*fill*) colors. If not a PDF, *None* is returned. As mentioned above, the fill color is always ``None`` for links. The stroke color is used for the border of the link rectangle. The length of the tuple implicitly determines the colorspace: 1 = GRAY, 3 = RGB, 4 = CMYK. So ``(1.0, 0.0, 0.0)`` stands for RGB color red. The value of each float *f* is mapped to the integer value *i* in range 0 to 255 via the computation *f = i / 255*.
 
       :rtype: dict
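
The mapping between tuple length, colorspace and integer component values described above can be illustrated in a few lines of plain Python. This is a sketch only: ``describe_color`` is a hypothetical helper, not a PyMuPDF function, and rounding to the nearest integer is an assumption made here to invert *f = i / 255*.

```python
# colorspace implied by the number of components, as described above
COLORSPACES = {1: "GRAY", 3: "RGB", 4: "CMYK"}


def describe_color(color):
    """Return (colorspace name, integer components 0..255) for a color.

    'color' is a sequence of floats in range 0 <= f <= 1, for example
    the 'stroke' entry of a colors dictionary. An empty or missing
    color yields ("none", ()).
    """
    if not color:
        return "none", ()
    name = COLORSPACES[len(color)]
    # invert f = i / 255 (rounding is an assumption of this sketch)
    ints = tuple(round(f * 255) for f in color)
    return name, ints


red = describe_color((1.0, 0.0, 0.0))
black_cmyk = describe_color((0.0, 0.0, 0.0, 1.0))
unset = describe_color(None)
```

For a link, only the stroke entry would carry such a tuple, because the fill color is always ``None``.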