docs/changes.rst
Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ Changes in Version 1.18.9
 * **Changed** :meth:`Document.subset_fonts`: Text is **not rewritten** any more and should therefore **retain all its original properties** -- like being hidden or being controlled by Optional Content mechanisms.
 * **Changed** :ref:`TextWriter` output to also accept text in right to left mode (Arabic, Hebrew): :meth:`TextWriter.fill_textbox`, :meth:`TextWriter.append`. These methods now accept a new boolean parameter `right_to_left`, which is *False* by default. Implements `#897 <https://github.com/pymupdf/PyMuPDF/issues/897>`_.
 * **Changed** :meth:`TextWriter.fill_textbox` to return all lines of text that did not fit in the given rectangle. Also changed the default of the ``warn`` parameter to no longer print a warning message in overflow situations.
+* **Added** a utility function :meth:`recover_quad`, which computes the quadrilateral of a span. This function can be used when quadrilaterals are needed for text extracted with the "dict" or "rawdict" options of :meth:`Page.get_text`.
docs/faq.rst
Lines changed: 3 additions & 22 deletions
@@ -818,30 +818,11 @@ How to Mark Non-horizontal Text
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The previous section already shows an example for marking non-horizontal text detected by text **searching**.

-But text **extraction** with the "dict" option of :meth:`Page.get_text` may also return text with a non-zero angle to the x-axis. This is reflected by the value of the ``"dir"`` key of the line dictionary: it is the tuple ``(cosine, sine)`` of that angle.
+But text **extraction** with the "dict" / "rawdict" options of :meth:`Page.get_text` may also return text with a non-zero angle to the x-axis. This is reflected by the value of the ``"dir"`` key of the line dictionary: it is the tuple ``(cosine, sine)`` of that angle. If this value **does not equal** ``(1, 0)``, then the extracted text / characters are rotated by some angle != 0.
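As a rough illustration of how the ``"dir"`` tuple can be interpreted, the sketch below converts ``(cosine, sine)`` into a rotation in degrees using only the standard library. The helper names ``dir_to_rotation`` and ``is_rotated`` are hypothetical and not part of PyMuPDF. The sign convention (``(0, -1)`` meaning 90 degrees) is an assumption taken from the comments in the earlier version of this FAQ example.

```python
import math


def dir_to_rotation(line_dir):
    """Return the rotation in degrees (0 <= angle < 360) for a (cosine, sine) tuple.

    Assumption: the sine is negated because the page's y-axis points
    downwards, so that (0, -1) maps to 90 degrees.
    """
    cosine, sine = line_dir
    return math.degrees(math.atan2(-sine, cosine)) % 360


def is_rotated(line_dir, eps=1e-6):
    """Return True if the direction differs from (1, 0) by more than eps."""
    cosine, sine = line_dir
    return abs(cosine - 1) > eps or abs(sine) > eps
```

Comparing with a small tolerance instead of testing ``line["dir"] == (1, 0)`` directly may be more robust if the tuple contains floats that are only approximately 1 and 0.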

-Currently, all bboxes returned by the method are rectangles only -- no quads. So we can mark the span text correctly only if the **angle is 0, 90, 180 or 270 degrees.**
-
-In this case we can convert the span bbox into the right quad by choosing the right sequence of its corners::
-
-    r = fitz.Rect(span["bbox"])
-
-    if line["dir"] == (1, 0):  # rotation 0
-        q = fitz.Quad(r)
-
-    elif line["dir"] == (0, -1):  # rotation 90
-        q = fitz.Quad(r.bl, r.tl, r.br, r.tr)
-
-    elif line["dir"] == (-1, 0):  # rotation 180
-        q = fitz.Quad(r.br, r.bl, r.tr, r.tl)
-
-    elif line["dir"] == (0, 1):  # rotation 270
-        q = fitz.Quad(r.tr, r.br, r.tl, r.bl)
-
-    else:
-        q = fitz.Quad(r)
-        print("warning: unsupported text flow")
+All bboxes returned by the method are rectangles only -- no quads. In order to mark the span text correctly (or to fit a quad around it), its quadrilateral must be recovered from the data in the line and the span. Do this with the following utility function::
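A minimal sketch of how span quads might be collected from a "dict" / "rawdict" result. The helper ``collect_span_quads`` is hypothetical. It takes the recovering function as a parameter, so it runs without PyMuPDF installed. In real use one would presumably pass ``fitz.recover_quad``, assuming the utility is exposed under that name.

```python
def collect_span_quads(text_dict, recover_quad):
    """Return a list of quads, one per text span of a "dict" / "rawdict" result.

    text_dict: the dictionary returned by Page.get_text("dict") or "rawdict".
    recover_quad: a callable taking (line["dir"], span), e.g. fitz.recover_quad.
    """
    quads = []
    for block in text_dict["blocks"]:
        # Image blocks carry no "lines" key, so they contribute nothing.
        for line in block.get("lines", []):
            for span in line["spans"]:
                quads.append(recover_quad(line["dir"], span))
    return quads


# Intended use with PyMuPDF (not executed here; "page" is an existing Page object):
#   import fitz
#   quads = collect_span_quads(page.get_text("dict"), fitz.recover_quad)
#   page.addHighlightAnnot(quads)
```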
docs/functions.rst
Lines changed: 15 additions & 1 deletion
@@ -48,6 +48,7 @@ Yet others are handy, general-purpose utilities.
 :meth:`PaperRect`                    return rectangle for a known paper format
 :meth:`sRGB_to_pdf`                  return PDF RGB color tuple from a sRGB integer
 :meth:`sRGB_to_rgb`                  return (R, G, B) color tuple from a sRGB integer
+:meth:`recover_quad`                 return the quad for a text span ("dict" / "rawdict")
 :meth:`glyph_name_to_unicode`        return unicode from a glyph name
 :meth:`unicode_to_glyph_name`        return glyph name from a unicode
 :meth:`make_table`                   split rectangle in sub-rectangles
@@ -160,12 +161,25 @@ Yet others are handy, general-purpose utilities.

    *New in v1.17.4*

-   Convenience function returning a color (red, green, blue) for a given *sRGB* color integer.
+   Convenience function returning a color (red, green, blue) for a given *sRGB* color integer.

    :arg int srgb: an integer of format RRGGBB, where each color component is an integer in range(256).

    :returns: a tuple (red, green, blue) with integer items in interval *0 <= item <= 255* representing the same color.
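The conversion described above can be sketched in plain Python. This is an illustrative equivalent only, not the library's implementation, and the name ``srgb_int_to_rgb`` is hypothetical.

```python
def srgb_int_to_rgb(srgb):
    """Split an integer of the form 0xRRGGBB into a (red, green, blue) tuple."""
    red = (srgb >> 16) & 0xFF   # top byte
    green = (srgb >> 8) & 0xFF  # middle byte
    blue = srgb & 0xFF          # low byte
    return red, green, blue
```

Each item of the result lies in the interval 0 <= item <= 255, because every component is masked to a single byte.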

+-----
+
+.. method:: recover_quad(line_dir, span)
+
+   *New in v1.18.9*
+
+   Convenience function returning the quadrilateral enveloping the text of a text span, as returned by :meth:`Page.get_text` using the "dict" or "rawdict" options.
+
+   :arg tuple line_dir: the value ``line["dir"]`` of the span's line.
docs/page.rst
Lines changed: 2 additions & 2 deletions
@@ -303,9 +303,9 @@ In a nutshell, this is what you can do with PyMuPDF:
 >>> page.addHighlightAnnot(quads)

 .. note::
-   Obviously, text marker annotations need to know what is the top and the bottom, the left and the right side of the tetragon to be marked. If the arguments are quads, this information is given by the sequence of the quad points. In contrast, a rectangle delivers much less information -- this is illustrated by the fact that 4! = 24 different quads can be constructed with the four corners of each rectangle.
+   Obviously, text marker annotations need to know what is the top, the bottom, the left, and the right side of the tetragon to be marked. If the arguments are quads, this information is given by the sequence of the quad points. In contrast, a rectangle delivers much less information -- this is illustrated by the fact that 4! = 24 different quads can be constructed with the four corners of each rectangle.

-   Therefore, we **strongly recommend** using the ``quads`` option for text searches, to ensure correct text markers. For more details on text marking see section "How to Mark Non-horizontal Text" of :ref:`FAQ`.
+   Therefore, we **strongly recommend** using the ``quads`` option for text searches, to ensure correct text markers. A similar consideration applies to **marking text spans** extracted with the "dict" / "rawdict" options of :meth:`Page.get_text`. For more details on text marking see section "How to Mark Non-horizontal Text" of :ref:`FAQ`.
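The count of 4! = 24 can be checked with a small standard-library sketch. The example rectangle and the variable names are made up for illustration. The four corner orderings listed for 0, 90, 180 and 270 degrees are taken from the earlier FAQ example on non-horizontal text.

```python
from itertools import permutations

# Corners of an example rectangle (x0, y0, x1, y1) = (0, 0, 4, 2),
# named like the corner properties of a rectangle: tl, tr, bl, br.
corners = {"tl": (0, 0), "tr": (4, 0), "bl": (0, 2), "br": (4, 2)}

# Every ordering of the four corner names defines a different quad.
orderings = list(permutations(corners))
print(len(orderings))  # 24

# Orderings used in the earlier FAQ example for the four right-angle text flows.
text_flow_orderings = {
    0: ("tl", "tr", "bl", "br"),
    90: ("bl", "tl", "br", "tr"),
    180: ("br", "bl", "tr", "tl"),
    270: ("tr", "br", "tl", "bl"),
}
```

Only one of the 24 orderings matches a given text flow, which is why a bare rectangle cannot tell a text marker which side is the top.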

 :arg rect_like,quad_like,list,tuple quads: *(Changed in v1.14.20)* the location(s) -- rectangle(s) or quad(s) -- to be marked. A list or tuple must consist of :data:`rect_like` or :data:`quad_like` items (or even a mixture of either). Every item must be finite, convex and not empty (as applicable). *(Changed in v1.16.14)* **Set this parameter to** *None* if you want to use the following arguments.
 :arg point_like start: *(New in v1.16.14)* start text marking at this point. Defaults to the top-left point of *clip*.
-This documentation covers PyMuPDF v1.18.9 features as of **2021-02-25 12:22:20**.
+This documentation covers PyMuPDF v1.18.9 features as of **2021-02-26 13:46:32**.

 .. note:: The major and minor versions of **PyMuPDF** and **MuPDF** will always be the same. Only the third qualifier (patch level) may deviate from that of MuPDF.