Commit 8ec2407

Documentation: Updates for more file support.
Also adds a little more info to rag.rst.
1 parent f26b673 commit 8ec2407

7 files changed, +174 -1 lines changed

docs/about-feature-matrix.rst

Lines changed: 44 additions & 0 deletions
@@ -39,6 +39,22 @@
    :width: 0
    :height: 0

+.. image:: images/icons/icon-docx.svg
+   :width: 0
+   :height: 0
+
+.. image:: images/icons/icon-pptx.svg
+   :width: 0
+   :height: 0
+
+.. image:: images/icons/icon-xlsx.svg
+   :width: 0
+   :height: 0
+
+.. image:: images/icons/icon-hangul.svg
+   :width: 0
+   :height: 0
+
 .. raw:: html


@@ -145,6 +161,26 @@
     background-size: 40px 40px;
 }

+#feature-matrix .icon.docx {
+    background: url("_images/icon-docx.svg") 0 0 transparent no-repeat;
+    background-size: 40px 40px;
+}
+
+#feature-matrix .icon.pptx {
+    background: url("_images/icon-pptx.svg") 0 0 transparent no-repeat;
+    background-size: 40px 40px;
+}
+
+#feature-matrix .icon.xlsx {
+    background: url("_images/icon-xlsx.svg") 0 0 transparent no-repeat;
+    background-size: 40px 40px;
+}
+
+#feature-matrix .icon.hangul {
+    background: url("_images/icon-hangul.svg") 0 0 transparent no-repeat;
+    background-size: 40px 40px;
+}
+
 </style>


@@ -172,6 +208,12 @@
 <span class="icon svg"><cite>SVG</cite></span>
 <span class="icon txt"><cite>TXT</cite></span>
 <span class="icon image"><cite id="transFM3">Image</cite></span>
+<hr/>
+<span class="icon docx"><cite>DOCX</cite></span>
+<span class="icon xlsx"><cite>XLSX</cite></span>
+<span class="icon pptx"><cite>PPTX</cite></span>
+<span class="icon hangul"><cite>HWPX</cite></span>
+<span class=""><cite>See <a href="#note">note</a></cite></span>
 </td>
 <td>
 <span class="icon pdf"><cite>PDF</cite></span>
@@ -579,3 +621,5 @@


 <br/>
+
+<div id="note"></div>

docs/about.rst

Lines changed: 38 additions & 0 deletions
@@ -21,6 +21,44 @@ The following table illustrates how |PyMuPDF| compares with other typical soluti
 .. include:: about-feature-matrix.rst


+----
+
+.. image:: images/icons/icon-docx.svg
+   :width: 40
+   :height: 40
+
+.. image:: images/icons/icon-xlsx.svg
+   :width: 40
+   :height: 40
+
+.. image:: images/icons/icon-pptx.svg
+   :width: 40
+   :height: 40
+
+
+.. image:: images/icons/icon-hangul.svg
+   :width: 40
+   :height: 40
+
+
+
+.. note::
+
+   A note about **Office** document types (DOCX, XLSX, PPTX) and **Hangul** documents (HWPX). These documents can be loaded into |PyMuPDF| and you will receive a :ref:`Document <Document>` object.
+
+   There are some caveats:
+
+
+   - We convert the input to **HTML** to lay out the content.
+   - Because of this, the original page separation is lost.
+
+   When saving the result, a faithful representation of the original layout cannot be expected.
+
+   These input files are therefore mainly useful for text extraction.
+
+
+----
+
 .. _About_Performance:

 Performance
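
The note added to ``docs/about.rst`` treats Office and Hangul files as text sources rather than as layouts to preserve. A minimal sketch of that usage follows, assuming a PyMuPDF build that can open these formats. The names ``REFLOWED_SUFFIXES``, ``is_reflowed`` and ``extract_text`` are hypothetical helpers for illustration and are not part of PyMuPDF.

```python
from pathlib import Path

# Suffixes taken from the note's list of Office and Hangul types.
# Assumption: these are the formats laid out via HTML on load.
REFLOWED_SUFFIXES = {".docx", ".xlsx", ".pptx", ".hwpx"}


def is_reflowed(path):
    """Return True if the file type loses its original page separation."""
    return Path(path).suffix.lower() in REFLOWED_SUFFIXES


def extract_text(path):
    """Open a document with PyMuPDF and return its plain text."""
    # Imported lazily so the helper above works without PyMuPDF installed.
    # Older releases expose the same module under the name `fitz`.
    import pymupdf

    with pymupdf.open(path) as doc:
        # Pages here come from the HTML layout, not from the source file.
        return "\n".join(page.get_text() for page in doc)
```

Calling ``extract_text("report.docx")`` (a placeholder file name) would return the text of every laid-out page. Page counts for these formats reflect the HTML layout, so they should not be expected to match the originating application.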

docs/images/icons/icon-docx.svg

Lines changed: 19 additions & 0 deletions

docs/images/icons/icon-hangul.svg

Lines changed: 35 additions & 0 deletions

docs/images/icons/icon-pptx.svg

Lines changed: 19 additions & 0 deletions

docs/images/icons/icon-xlsx.svg

Lines changed: 18 additions & 0 deletions

docs/rag.rst

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Integrating |PyMuPDF| into your :title:`Large Language Model (LLM)` framework an

 There are a few well known :title:`LLM` solutions which have their own interfaces with |PyMuPDF| - it is a fast growing area, so please let us know if you discover any more!

-If you need to export to :title:`Markdown`:
+If you need to export to :title:`Markdown` or obtain a :title:`LlamaIndex` Document from a file:

 .. raw:: html
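
The changed sentence in ``docs/rag.rst`` mentions exporting to Markdown and obtaining a LlamaIndex Document. The sketch below shows one way this might look, assuming the separate ``pymupdf4llm`` package is what the page goes on to link (this hunk does not say so), and that ``llama_index`` is installed for the second helper. The wrapper names ``to_markdown_text`` and ``to_llama_documents`` are hypothetical.

```python
def to_markdown_text(path):
    """Return the document at `path` as a Markdown string."""
    # Assumption: the pymupdf4llm package is installed.
    import pymupdf4llm

    return pymupdf4llm.to_markdown(path)


def to_llama_documents(path):
    """Return a list of LlamaIndex Document objects for the file at `path`."""
    # Assumption: both pymupdf4llm and llama_index are installed.
    import pymupdf4llm

    reader = pymupdf4llm.LlamaMarkdownReader()
    return reader.load_data(path)
```

Both helpers import lazily, so defining them has no dependency cost until they are called.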
