docs/annot.rst
4 additions & 4 deletions
@@ -195,7 +195,7 @@ There is a parent-child relationship between an annotation and its page. If the
 Three overlapping 'Circle' annotations with each opacity set to 0.5:

-.. image:: images/img-opacity.jpg
+.. image:: images/img-opacity.*

 .. attribute:: blendmode
@@ -322,7 +322,7 @@ There is a parent-child relationship between an annotation and its page. If the
 * 'Line', 'Polyline', 'Polygon' annotations: use it to give applicable line end symbols a fill color other than that of the annotation *(changed in v1.16.16)*.

 :arg bool cross_out: *(new in v1.17.2)* add two diagonal lines to the annotation rectangle. 'Redact' annotations only. If not desired, *False* must be specified even if the annotation was created with *False*.
-:arg int rotate: new rotation value. Default (-1) means no change. Supports 'FreeText' and several other annotation types (see :meth:`Annot.setRotation`), [#f1]_. Only choose 0, 90, 180, or 270 degrees for 'FreeText'. Otherwise any integer is acceptable.
+:arg int rotate: new rotation value. Default (-1) means no change. Supports 'FreeText' and several other annotation types (see :meth:`Annot.set_rotation`), [#f1]_. Only choose 0, 90, 180, or 270 degrees for 'FreeText'. Otherwise any integer is acceptable.

 :rtype: bool
@@ -515,7 +515,7 @@ Annotation Icons in MuPDF
 -------------------------
 This is a list of icons referencable by name for annotation types 'Text' and 'FileAttachment'. You can use them via the *icon* parameter when adding an annotation, or use the as argument in :meth:`Annot.setName`. It is left to your discretion which item to choose when -- no mechanism will keep you from using e.g. the "Speaker" icon for a 'FileAttachment'.

-.. image:: images/mupdf-icons.jpg
+.. image:: images/mupdf-icons.*


 Example
@@ -547,7 +547,7 @@ This is how the circle annotation looks like before and after the change (pop-up
docs/app1.rst
6 additions & 6 deletions
@@ -12,7 +12,7 @@ Following are three sections that deal with different aspects of performance:
 In each section, the same fixed set of PDF files is being processed by a set of tools. The set of tools varies -- for reasons we will explain in the section.

-.. |fsizes| image:: images/img-filesizes.png
+.. |fsizes| image:: images/img-filesizes.*

 Here is the list of files we are using. Each file name is accompanied by further information: **size** in bytes, number of **pages**, number of bookmarks (**toc** entries), number of **links**, **text** size as a percentage of file size, **KB** per page, PDF **version** and remarks. **text %** and **KB index** are indicators for whether a file is text or graphics oriented.
 |fsizes|
@@ -72,8 +72,8 @@ This is how each of the tools was used:
 These are our run time findings (in **seconds**, please note the European number convention: meaning of decimal point and comma is reversed):
@@ -115,7 +115,7 @@ All tools have been used with their most basic, fanciless functionality -- no la
 For demonstration purposes, we have included a version of *GetText(doc, output = "json")*, that also re-arranges the output according to occurrence on the page.
docs/app2.rst
16 additions & 16 deletions
@@ -33,18 +33,18 @@ A **span** consists of adjacent characters with identical font properties: name,
 Plain Text
 ~~~~~~~~~~

-Function :meth:`TextPage.extractText` (or *Page.getText("text")*) extracts a page's plain **text in original order** as specified by the creator of the document (which may not equal a natural reading order).
+Function :meth:`TextPage.extractText` (or *Page.get_text("text")*) extracts a page's plain **text in original order** as specified by the creator of the document (which may not equal a natural reading order).

 An example output::

->>> print(page.getText("text"))
+>>> print(page.get_text("text"))
 Some text on first page.


 BLOCKS
 ~~~~~~~~~~

-Function :meth:`TextPage.extractBLOCKS` (or *Page.getText("blocks")*) extracts a page's text blocks as a list of items like::
+Function :meth:`TextPage.extractBLOCKS` (or *Page.get_text("blocks")*) extracts a page's text blocks as a list of items like::

 (x0, y0, x1, y1, "lines in block", block_type, block_no)
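The block tuples shown in this hunk carry enough position data to re-order text without any further library calls. The following is a minimal sketch, not PyMuPDF's own method: it assumes the tuple layout quoted above, uses made-up coordinates and texts, and the helper name `reading_order` is hypothetical.

```python
# Sample tuples in the (x0, y0, x1, y1, text, block_type, block_no) layout
# quoted in the hunk above. Coordinates and texts are invented for illustration.
blocks = [
    (300.0, 72.0, 540.0, 90.0, "right column, top\n", 0, 0),
    (50.0, 200.0, 290.0, 220.0, "left column, lower\n", 0, 1),
    (50.0, 72.0, 290.0, 90.0, "left column, top\n", 0, 2),
]


def reading_order(blocks):
    """Return blocks sorted top-to-bottom (y0), then left-to-right (x0)."""
    return sorted(blocks, key=lambda b: (b[1], b[0]))


# Keep only the text of each block, in the new order.
ordered_text = [b[4].strip() for b in reading_order(blocks)]
print(ordered_text)
```

Sorting on `(y0, x0)` is the simplest possible heuristic; multi-column pages would likely need a column-aware key instead.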
@@ -54,15 +54,15 @@ This is a high-speed method with enough information to re-arrange the page's tex
-:meth:`TextPage.extractHTML` (or *Page.getText("html")* output fully reflects the structure of the page's *TextPage* -- much like DICT / JSON below. This includes images, font information and text positions. If wrapped in HTML header and trailer code, it can readily be displayed by an internet browser. Our above example::
+:meth:`TextPage.extractHTML` (or *Page.get_text("html")* output fully reflects the structure of the page's *TextPage* -- much like DICT / JSON below. This includes images, font information and text positions. If wrapped in HTML header and trailer code, it can readily be displayed by an internet browser. Our above example::

->>> for line in page.getText("html").splitlines():
+>>> for line in page.get_text("html").splitlines():
@@ -153,7 +153,7 @@ To address the font issue, you can use a simple utility script to scan through t
 DICT (or JSON)
 ~~~~~~~~~~~~~~~~

-:meth:`TextPage.extractDICT` (or *Page.getText("dict")*) output fully reflects the structure of a *TextPage* and provides image content and position details (*bbox* -- boundary boxes in pixel units) for every block and line. This information can be used to present text in another reading order if required (e.g. from top-left to bottom-right). Images are stored as *bytes* (*bytearray* in Python 2) for DICT output and base64 encoded strings for JSON output.
+:meth:`TextPage.extractDICT` (or *Page.get_text("dict")*) output fully reflects the structure of a *TextPage* and provides image content and position details (*bbox* -- boundary boxes in pixel units) for every block and line. This information can be used to present text in another reading order if required (e.g. from top-left to bottom-right). Images are stored as *bytes* (*bytearray* in Python 2) for DICT output and base64 encoded strings for JSON output.

 For a visuallization of the dictionary structure have a look at :ref:`textpagedict`.
@@ -183,7 +183,7 @@ Here is how this looks like::
 RAWDICT
 ~~~~~~~~~~~~~~~~
-:meth:`TextPage.extractRAWDICT` (or *Page.getText("rawdict")*) is an **information superset of DICT** and takes the detail level one step deeper. It looks exactly like the above, except that the *"text"* items (*string*) are replaced by *"chars"* items (*list*). Each *"chars"* entry is a character *dict*. For example, here is what you would see in place of item *"text": "Text in black color."* above::
+:meth:`TextPage.extractRAWDICT` (or *Page.get_text("rawdict")*) is an **information superset of DICT** and takes the detail level one step deeper. It looks exactly like the above, except that the *"text"* items (*string*) are replaced by *"chars"* items (*list*). Each *"chars"* entry is a character *dict*. For example, here is what you would see in place of item *"text": "Text in black color."* above::

 "chars": [{
 "origin": [50.0, 100.0],
@@ -216,9 +216,9 @@ RAWDICT
 XML
 ~~~

-The :meth:`TextPage.extractXML` (or *Page.getText("xml")*) version extracts text (no images) with the detail level of RAWDICT::
+The :meth:`TextPage.extractXML` (or *Page.get_text("xml")*) version extracts text (no images) with the detail level of RAWDICT::

->>> for line in page.getText("xml").splitlines():
+>>> for line in page.get_text("xml").splitlines():
 print(line)

 <page id="page0" width="300" height="350">
@@ -249,7 +249,7 @@ The :meth:`TextPage.extractXML` (or *Page.getText("xml")*) version extracts text
 XHTML
 ~~~~~
-:meth:`TextPage.extractXHTML` (or *Page.getText("xhtml")*) is a variation of TEXT but in HTML format, containing the bare text and images ("semantic" output)::
+:meth:`TextPage.extractXHTML` (or *Page.get_text("xhtml")*) is a variation of TEXT but in HTML format, containing the bare text and images ("semantic" output)::

 <div id="page0">
 <p>Some text on first page.</p>
@@ -259,7 +259,7 @@ XHTML
 Text Extraction Flags Defaults
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*(New in version 1.16.2)* Method :meth:`Page.getText` supports a keyword parameter *flags* *(int)* to control the amount and the quality of extracted data. The following table shows the defaults settings (flags parameter omitted or None) for each extraction variant. If you specify flags with a value other than *None*, be aware that you must set **all desired** options. A description of the respective bit settings can be found in :ref:`TextPreserve`.
+*(New in version 1.16.2)* Method :meth:`Page.get_text` supports a keyword parameter *flags* *(int)* to control the amount and the quality of extracted data. The following table shows the defaults settings (flags parameter omitted or None) for each extraction variant. If you specify flags with a value other than *None*, be aware that you must set **all desired** options. A description of the respective bit settings can be found in :ref:`TextPreserve`.
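The requirement to set **all desired** options follows from *flags* being a bit field: an explicit value replaces the defaults instead of adding to them. The sketch below illustrates this with a stand-in `IntFlag` class. The class name, member names and numeric values are assumptions for illustration only; the actual bit settings are the ones described in :ref:`TextPreserve`.

```python
from enum import IntFlag


class TextFlags(IntFlag):
    """Stand-in bit flags; names and values are illustrative assumptions."""
    PRESERVE_LIGATURES = 1
    PRESERVE_WHITESPACE = 2
    PRESERVE_IMAGES = 4


# Combine every option you want with bitwise OR; anything left out is off.
flags = TextFlags.PRESERVE_LIGATURES | TextFlags.PRESERVE_WHITESPACE

# Images were not requested, so this bit is not set in the combined value.
wants_images = bool(flags & TextFlags.PRESERVE_IMAGES)
print(int(flags), wants_images)
```

An integer built this way is what one would pass as the *flags* keyword of *Page.get_text*.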
docs/app3.rst
1 addition & 1 deletion
@@ -29,4 +29,4 @@ PyMuPDF Support
 ------------------
 We continue to support the full old API with respect to embedded files -- with only minor, cosmetic changes.

-There even also is a new function, which delivers a list of all names under which embedded data are resgistered in a PDF, :meth:`Document.embeddedFileNames`.
+There even also is a new function, which delivers a list of all names under which embedded data are resgistered in a PDF, :meth:`Document.embfile_names`.
docs/app4.rst
1 addition & 1 deletion
@@ -113,7 +113,7 @@ Python on the other hand implements the OO-model in a very clean way. The interf
 When you use one of PyMuPDF's objects or methods, this will result in excution of some code in *fitz.py*, which in turn will call some C code compiled with *fitz_wrap.c*.

-Because SWIG goes a long way to keep the Python and the C level in sync, everything works fine, if a certain set of rules is being strictly followed. For example: **never access** a :ref:`Page` object, after you have closed (or deleted or set to *None*) the owning :ref:`Document`. Or, less obvious: **never access** a page or any of its children (links or annotations) after you have executed one of the document methods *select()*, *deletePage()*, *insert_page()* ... and more.
+Because SWIG goes a long way to keep the Python and the C level in sync, everything works fine, if a certain set of rules is being strictly followed. For example: **never access** a :ref:`Page` object, after you have closed (or deleted or set to *None*) the owning :ref:`Document`. Or, less obvious: **never access** a page or any of its children (links or annotations) after you have executed one of the document methods *select()*, *delete_page()*, *insert_page()* ... and more.

 But just no longer accessing invalidated objects is actually not enough: They should rather be actively deleted entirely, to also free C-level resources (meaning allocated memory).
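The method renames visible across these hunks (*getText* to *get_text*, *setRotation* to *set_rotation*, *deletePage* to *delete_page*, *embeddedFileNames* to *embfile_names*) could be applied to calling code with a small text substitution. This is a rough sketch under stated assumptions: the mapping lists only the four renames that appear in this diff, `modernize` is a hypothetical helper, and a plain regex cannot tell a PyMuPDF call from an unrelated identifier with the same name.

```python
import re

# Only the renames that appear in this diff; not a complete list.
RENAMES = {
    "getText": "get_text",
    "setRotation": "set_rotation",
    "deletePage": "delete_page",
    "embeddedFileNames": "embfile_names",
}

# Match whole identifiers only, so longer names containing these are untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def modernize(source):
    """Replace each old method name in `source` with its new spelling."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


old = 'text = page.getText("text"); doc.deletePage(0)'
new = modernize(old)
print(new)
```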