
Commit 913ea5e

committed
upload v1.18.9
1 parent cfc0019 commit 913ea5e

8 files changed: +73 -28 lines changed

docs/changes.rst

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ Changes in Version 1.18.9
 * **Changed** :meth:`Document.subset_fonts`: Text is **not rewritten** any more and should therefore **retain all its original properties** -- like being hidden or being controlled by Optional Content mechanisms.
 * **Changed** :ref:`TextWriter` output to also accept text in right to left mode (Arabic, Hebrew): :meth:`TextWriter.fill_textbox`, :meth:`TextWriter.append`. These methods now accept a new boolean parameter `right_to_left`, which is *False* by default. Implements `#897 <https://github.com/pymupdf/PyMuPDF/issues/897>`_.
 * **Changed** :meth:`TextWriter.fill_textbox` to return all lines of text that did not fit in the given rectangle. Also changed the default of the ``warn`` parameter to no longer print a warning message in overflow situations.
+* **Added** a utility function :meth:`recover_quad`, which computes the quadrilateral of a span. This function can be used when quadrilaterals are needed for text extracted with the "dict" or "rawdict" options of :meth:`Page.get_text`.

 Changes in Version 1.18.8
 -------------------------

docs/faq.rst

Lines changed: 3 additions & 22 deletions
@@ -818,30 +818,11 @@ How to Mark Non-horizontal Text
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The previous section already shows an example for marking non-horizontal text detected by text **searching**.

-But text **extraction** with the "dict" option of :meth:`Page.get_text` may also return text with a non-zero angle to the x-axis. This is reflected by the value of the ``"dir"`` key of the line dictionary: it is the tuple ``(cosine, sine)`` of that angle.
+But text **extraction** with the "dict" / "rawdict" options of :meth:`Page.get_text` may also return text with a non-zero angle to the x-axis. This is reflected by the value of the ``"dir"`` key of the line dictionary: it is the tuple ``(cosine, sine)`` of that angle. If this value **does not equal** ``(1, 0)``, then the extracted text / characters are rotated by some angle != 0.

-Currently, all bboxes returned by the method's are rectangles only -- no quads. So we can mark the span text (correctly) only, if the **angle is 0, 90, 180 or 270 degrees.**
-
-In this case we can convert the span bbox into the right quad by choosing the right sequence of its corners::
-
-    r = fitz.Rect(span["bbox"])
-
-    if line["dir"] == (1, 0):  # rotation 0
-        q = fitz.Quad(r)
-
-    elif line["dir"] == (0, -1):  # rotation 90
-        q = fitz.Quad(r.bl, r.tl, r.br, r.tr)
-
-    elif line["dir"] == (-1, 0):  # rotation 180
-        q = fitz.Quad(r.br, r.bl, r.tr, r.tl)
-
-    elif line["dir"] == (0, 1):  # rotation 270
-        q = fitz.Quad(r.tr, r.br, r.tl, r.bl)
-
-    else:
-        q = fitz.Quad(r)
-        print("warning: unsupported text flow")
+All bboxes returned by the method are rectangles only -- no quads. To mark the span text correctly (or to fit a quad around it), its quadrilateral must be recovered from the data in the line and the span. Do this with the following utility function::

+    q = fitz.recover_quad(line["dir"], span)
     annot = page.addHighlightAnnot(q)

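As a minimal sketch of how the ``"dir"`` value described in this hunk can be read, the helper below converts a ``(cosine, sine)`` tuple into a rotation in degrees. The name ``dir_to_rotation`` is hypothetical and not part of PyMuPDF. The sign convention is an assumption taken from the removed lines, where ``(0, -1)`` is labelled rotation 90, and the result is rounded to whole degrees for readability.

```python
import math


def dir_to_rotation(line_dir):
    """Return the text rotation in whole degrees (0..359) for a (cos, sin) tuple."""
    cos, sin = line_dir
    # Assumption: the page's y-axis points downward, so the sine is negated
    # to obtain the angle as it appears on the page.
    angle = math.degrees(math.atan2(-sin, cos))
    return round(angle) % 360


for d in [(1, 0), (0, -1), (-1, 0), (0, 1)]:
    print(d, dir_to_rotation(d))
```

Any value other than 0 indicates rotated text, which is the case where a plain bbox no longer carries enough information for marking.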

docs/functions.rst

Lines changed: 15 additions & 1 deletion
@@ -48,6 +48,7 @@ Yet others are handy, general-purpose utilities.
 :meth:`PaperRect` return rectangle for a known paper format
 :meth:`sRGB_to_pdf` return PDF RGB color tuple from a sRGB integer
 :meth:`sRGB_to_rgb` return (R, G, B) color tuple from a sRGB integer
+:meth:`recover_quad` return the quad for a text span ("dict" / "rawdict")
 :meth:`glyph_name_to_unicode` return unicode from a glyph name
 :meth:`unicode_to_glyph_name` return glyph name from a unicode
 :meth:`make_table` split rectangle in sub-rectangles
@@ -160,12 +161,25 @@ Yet others are handy, general-purpose utilities.

       *New in v1.17.4*

-      Convenience function returning a color (red, green, blue) for a given *sRGB* color integer .
+      Convenience function returning a color (red, green, blue) for a given *sRGB* color integer.

       :arg int srgb: an integer of format RRGGBB, where each color component is an integer in range(256).

       :returns: a tuple (red, green, blue) with integer items in the interval *0 <= item <= 255* representing the same color.

+-----
+
+   .. method:: recover_quad(line_dir, span)
+
+      *New in v1.18.9*
+
+      Convenience function returning the quadrilateral enveloping the text of a text span, as returned by :meth:`Page.get_text` using the "dict" or "rawdict" options.
+
+      :arg tuple line_dir: the value ``line["dir"]`` of the span's line.
+      :arg dict span: the span sub-dictionary.
+
+      :returns: the quadrilateral of the span's text.
+
 -----

    .. method:: make_table(rect, cols=1, rows=1)

docs/page.rst

Lines changed: 2 additions & 2 deletions
@@ -303,9 +303,9 @@ In a nutshell, this is what you can do with PyMuPDF:
 >>> page.addHighlightAnnot(quads)

 .. note::
-   Obviously, text marker annotations need to know what is the top and the bottom, the left and the right side of the tetragon to be marked. If the arguments are quads, this information is given by the sequence of the quad points. In contrast, a rectangle delivers much less information -- this is illustrated by the fact, that 4! = 24 different quads can be constructed with the four corners of each reactangle.
+   Obviously, text marker annotations need to know which are the top, bottom, left, and right sides of the tetragon to be marked. If the arguments are quads, this information is given by the sequence of the quad points. In contrast, a rectangle delivers much less information -- this is illustrated by the fact that 4! = 24 different quads can be constructed with the four corners of each rectangle.

-   Therefore, we **strongly recommend** to use the ``quads`` option for text searches, to ensure correct text markers. For more details on text marking see section "How to Mark Non-horizontal Text" of :ref:`FAQ`.
+   Therefore, we **strongly recommend** using the ``quads`` option for text searches, to ensure correct text markers. A similar consideration applies to **marking text spans** extracted with the "dict" / "rawdict" options of :meth:`Page.get_text`. For more details on text marking see section "How to Mark Non-horizontal Text" of :ref:`FAQ`.

 :arg rect_like,quad_like,list,tuple quads: *(Changed in v1.14.20)* the location(s) -- rectangle(s) or quad(s) -- to be marked. A list or tuple must consist of :data:`rect_like` or :data:`quad_like` items (or even a mixture of either). Every item must be finite, convex and not empty (as applicable). *(Changed in v1.16.14)* **Set this parameter to** *None* if you want to use the following arguments.
 :arg point_like start: *(New in v1.16.14)* start text marking at this point. Defaults to the top-left point of *clip*.
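The "4! = 24" remark in the note can be checked with a short, library-free sketch. The corner names and the example coordinates below are assumptions chosen purely for illustration; each ordering of the four corners stands for one candidate (ul, ur, ll, lr) sequence.

```python
from itertools import permutations

# Corners of a hypothetical rectangle (x0, y0, x1, y1) = (0, 0, 2, 1)
corners = {"tl": (0, 0), "tr": (2, 0), "bl": (0, 1), "br": (2, 1)}

# Every ordering of the four corner names is one candidate quad
orderings = list(permutations(corners))
print(len(orderings))  # 24
```

Only one of these orderings matches the reading direction of the text, which is why the corner sequence of a quad carries information that a rectangle lacks.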

docs/version.rst

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Covered Version
 --------------------

-This documentation covers PyMuPDF v1.18.9 features as of **2021-02-25 12:22:20**.
+This documentation covers PyMuPDF v1.18.9 features as of **2021-02-26 13:46:32**.

 .. note:: The major and minor versions of **PyMuPDF** and **MuPDF** will always be the same. Only the third qualifier (patch level) may deviate from that of MuPDF.

fitz/helper-python.i

Lines changed: 48 additions & 0 deletions
@@ -1359,6 +1359,54 @@ def make_table(rect: rect_like =(0, 0, 1, 1), cols: int =1, rows: int =1) -> lis
     return rects


+def recover_quad(line_dir, span):
+    """Recover the quadrilateral of a text span.
+
+    Args:
+        line_dir: the value 'line["dir"]' of the span's line, which is
+            a tuple (cos, sin) of the text angle with the x-axis.
+        span: the span dictionary
+    Returns:
+        The quadrilateral enveloping the span's text.
+    """
+    if type(line_dir) is not tuple or len(line_dir) != 2:
+        raise ValueError("bad line dir argument")
+    if type(span) is not dict:
+        raise ValueError("bad span argument")
+    cos, sin = line_dir
+    bbox = Rect(span["bbox"])
+
+    if TOOLS.set_small_glyph_heights():  # ==> just fontsize
+        d = 1
+    else:
+        d = span["ascender"] - span["descender"]
+
+    height = d * span["size"]
+    hs = height * sin
+    hc = height * cos
+    if hc >= 0 and hs <= 0:  # Quadrant 1
+        ul = bbox.bl - (0, hc)
+        ur = bbox.tr + (hs, 0)
+        ll = bbox.bl - (hs, 0)
+        lr = bbox.tr + (0, hc)
+    elif hc <= 0 and hs <= 0:  # Quadrant 2
+        ul = bbox.br + (hs, 0)
+        ur = bbox.tl - (0, hc)
+        ll = bbox.br + (0, hc)
+        lr = bbox.tl - (hs, 0)
+    elif hc <= 0 and hs >= 0:  # Quadrant 3
+        ul = bbox.tr - (0, hc)
+        ur = bbox.bl + (hs, 0)
+        ll = bbox.tr - (hs, 0)
+        lr = bbox.bl + (0, hc)
+    else:  # Quadrant 4
+        ul = bbox.tl + (hs, 0)
+        ur = bbox.br - (0, hc)
+        ll = bbox.tl + (0, hc)
+        lr = bbox.br - (hs, 0)
+    return Quad(ul, ur, ll, lr)
+
+
 def repair_mono_font(page: "Page", font: "Font") -> None:
     """Repair character spacing for mono fonts.
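To make the corner arithmetic of the added function easier to follow, here is a plain-tuple sketch of the same four quadrant cases. It is an illustration only: ``recover_quad_sketch`` is a hypothetical name, it takes the bbox and the glyph height as explicit arguments instead of reading them from a span dictionary, and it does not use PyMuPDF's ``Rect``, ``Quad`` or ``TOOLS`` objects.

```python
def recover_quad_sketch(line_dir, bbox, height):
    """Return (ul, ur, ll, lr) corner tuples for a span.

    line_dir: (cos, sin) of the text angle.
    bbox: (x0, y0, x1, y1) of the span.
    height: glyph height, assumed to be (ascender - descender) * fontsize.
    """
    cos, sin = line_dir
    x0, y0, x1, y1 = bbox
    tl, tr, bl, br = (x0, y0), (x1, y0), (x0, y1), (x1, y1)
    hs, hc = height * sin, height * cos

    def add(p, d):
        return (p[0] + d[0], p[1] + d[1])

    def sub(p, d):
        return (p[0] - d[0], p[1] - d[1])

    if hc >= 0 and hs <= 0:  # Quadrant 1
        return sub(bl, (0, hc)), add(tr, (hs, 0)), sub(bl, (hs, 0)), add(tr, (0, hc))
    if hc <= 0 and hs <= 0:  # Quadrant 2
        return add(br, (hs, 0)), sub(tl, (0, hc)), add(br, (0, hc)), sub(tl, (hs, 0))
    if hc <= 0 and hs >= 0:  # Quadrant 3
        return sub(tr, (0, hc)), add(bl, (hs, 0)), sub(tr, (hs, 0)), add(bl, (0, hc))
    # Quadrant 4
    return add(tl, (hs, 0)), sub(br, (0, hc)), add(tl, (0, hc)), sub(br, (hs, 0))


# Horizontal text: the quad is just the bbox corners in reading order
print(recover_quad_sketch((1, 0), (10, 20, 110, 32), 12))
# Text with dir (0, -1): corners come out as bl, tl, br, tr of the bbox
print(recover_quad_sketch((0, -1), (20, 10, 32, 110), 12))
```

For the ``(0, -1)`` case the result has the same corner order as the ``Quad(r.bl, r.tl, r.br, r.tr)`` line that this commit removes from the FAQ, which suggests the general formula reduces to the old special cases at right angles.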

fitz/utils.py

Lines changed: 1 addition & 0 deletions
@@ -1087,6 +1087,7 @@ def setToC(
     old_xrefs = doc._delToC()  # del old outlines, get their xref numbers

     # prepare table of xrefs for new bookmarks
+    old_xrefs = []
     xref = [0] + old_xrefs
     xref[0] = doc._getOLRootNumber()  # entry zero is outline root xref number
     if toclen > len(old_xrefs):  # too few old xrefs?

fitz/version.i

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 %pythoncode %{
 VersionFitz = "1.18.0"
 VersionBind = "1.18.9"
-VersionDate = "2021-02-25 12:22:20"
-version = (VersionBind, VersionFitz, "20210225122220")
+VersionDate = "2021-02-26 13:46:32"
+version = (VersionBind, VersionFitz, "20210226134632")
 %}
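In this hunk the third element of ``version`` appears to be ``VersionDate`` with its separators removed. The sketch below shows one way such a stamp could be derived using only the standard library; it is an assumption about the relationship between the two values, not a description of how the project generates them.

```python
from datetime import datetime

version_date = "2021-02-26 13:46:32"  # VersionDate as set in this commit

# Parse the timestamp and re-emit it without separators
stamp = datetime.strptime(version_date, "%Y-%m-%d %H:%M:%S").strftime("%Y%m%d%H%M%S")
print(stamp)  # 20210226134632
```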
