speechify-to-pdf

Transfer your Speechify highlights directly into your PDF as real, standard-compliant PDF annotations, compatible with Citavi, Zotero, Adobe Acrobat, Okular, and any other reader that supports standard PDF highlight annotations.

If this tool saves you time, consider buying me a coffee ☕
➡ Donate via PayPal

What it does

Speechify lets you read and highlight PDFs, but your highlights stay locked inside Speechify. This tool extracts them from a Speechify page saved in your browser and writes them into your local PDF as proper annotations. Your highlights, your PDF, your reader.

Installation

Via pip (recommended)

pip install speechify-to-pdf

This installs the speechify-to-pdf command into the active Python environment.

Manual

pip install pymupdf
# then download speechify_to_pdf.py and run it directly

Requires Python 3.10 or newer.

Quick Start

1. Save the Speechify page in your browser

Open the document in Speechify (app.speechify.com)
In your browser: File → Save Page As (or Ctrl+S on Windows/Linux, Cmd+S on macOS)
Choose format: "Webpage, Complete" (not HTML only)

The result will look like:

Book.pdf _ Speechify.html
Book.pdf _ Speechify_files/   ← folder must be next to the HTML file

Note: The sidebar with highlights must be visible when you save. If it is collapsed, expand it (icon in the top left) and save again.
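Before running the tool, you can sanity-check that the browser saved a complete page. The sketch below is a hypothetical pre-flight helper (check_saved_page is not part of speechify-to-pdf). It assumes the browser's usual naming, where the assets folder is the HTML filename without .html plus _files, as in the listing above.

```python
from pathlib import Path


def check_saved_page(html_path: str) -> list[str]:
    """Return a list of problems with a saved Speechify page (empty list = looks OK).

    Hypothetical helper: assumes the browser names the assets folder
    "<HTML name without .html>_files", as in "Book.pdf _ Speechify_files/".
    """
    html = Path(html_path)
    problems = []
    if not html.is_file():
        problems.append(f"HTML file not found: {html}")
    elif html.suffix.lower() not in (".html", ".htm"):
        problems.append(f"Not an HTML file: {html.name}")
    # "Webpage, Complete" creates a sibling folder holding the page's assets;
    # "HTML only" does not, which is the mistake this check catches.
    assets = html.with_name(html.stem + "_files")
    if not assets.is_dir():
        problems.append(f"Missing folder next to the HTML file: {assets.name}/")
    return problems
```

For example, check_saved_page("Book.pdf _ Speechify.html") returns an empty list when both the HTML file and the Book.pdf _ Speechify_files/ folder are present. It cannot tell whether the sidebar was expanded; --list (below) is the better check for that.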

2. Run the tool

speechify-to-pdf "Book.pdf _ Speechify.html" "Book.pdf"

Installed manually? Replace speechify-to-pdf with python3 speechify_to_pdf.py in any command below.

This creates Book_highlights.pdf in the same folder as the original PDF.
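If a script needs to know where the result will land, the default name can be reproduced with pathlib. A minimal sketch, assuming the rule is exactly "PDF name without .pdf, plus _highlights.pdf, in the same folder" as described above; default_output_path is a hypothetical name, not part of the tool.

```python
from pathlib import Path


def default_output_path(pdf_path: str) -> Path:
    """Book.pdf -> Book_highlights.pdf, in the same folder as the original PDF."""
    pdf = Path(pdf_path)
    return pdf.with_name(f"{pdf.stem}_highlights.pdf")
```

Passing -o overrides this default, so a wrapper script only needs the rule when it does not pass -o itself.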

Custom output path:

speechify-to-pdf "Book.pdf _ Speechify.html" "Book.pdf" -o "Book_annotated.pdf"

Print all highlights with details:

speechify-to-pdf "Book.pdf _ Speechify.html" "Book.pdf" -v

Auto-detect the PDF (works when the HTML filename contains the PDF filename, as with Book.pdf _ Speechify.html and Book.pdf):

speechify-to-pdf "Book.pdf _ Speechify.html"

Inspect highlights without a PDF (list mode):

speechify-to-pdf "Book.pdf _ Speechify.html" --list

Prints a color breakdown and count of all highlights found in the HTML file — no PDF needed. Useful to verify the export before processing.

Inspect highlights with full text (list + verbose):

speechify-to-pdf "Book.pdf _ Speechify.html" --list -v

Same as --list, but also prints each highlight's page, color, and text excerpt (up to 70 characters) so you can quickly scan the content before annotating.
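Conceptually, list mode is a count per color plus, with -v, one shortened line per highlight. The sketch below illustrates that idea with collections.Counter. The Highlight record, list_report, and the output layout are assumptions for illustration; they do not reproduce the tool's actual data structures or its exact --list output.

```python
from collections import Counter
from typing import NamedTuple


class Highlight(NamedTuple):  # hypothetical record; the tool's internal structure may differ
    page: int
    color: str
    text: str


def excerpt(text: str, limit: int = 70) -> str:
    """Collapse whitespace and cut to `limit` characters, marking the cut with '...'."""
    text = " ".join(text.split())
    return text if len(text) <= limit else text[:limit].rstrip() + "..."


def list_report(highlights: list[Highlight], verbose: bool = False) -> list[str]:
    """Build the lines of a --list style summary (layout is illustrative only)."""
    lines = [f"{len(highlights)} highlights found"]
    # Color breakdown, most frequent color first.
    for color, count in Counter(h.color for h in highlights).most_common():
        lines.append(f"  {color}: {count}")
    if verbose:
        for h in highlights:
            lines.append(f"  p.{h.page} [{h.color}]: {excerpt(h.text)}")
    return lines
```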

Preview without writing any file (dry run):

speechify-to-pdf "Book.pdf _ Speechify.html" "Book.pdf" --dry-run

Suppress progress output (useful for scripts/batch processing):

speechify-to-pdf "Book.pdf _ Speechify.html" "Book.pdf" -q
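Batch processing is still on the roadmap, but -q already makes the tool easy to drive from a script. The sketch below calls the installed command once per saved page in a folder, using only the positional arguments and -q documented above. build_command and convert_folder are hypothetical names, and treating a non-zero exit code as failure is an assumption, since the README does not document exit codes.

```python
import subprocess
from pathlib import Path


def build_command(html: Path, pdf: Path, quiet: bool = True) -> list[str]:
    """Assemble a speechify-to-pdf call using only the flags documented above."""
    cmd = ["speechify-to-pdf", str(html), str(pdf)]
    if quiet:
        cmd.append("-q")
    return cmd


def convert_folder(folder: str) -> dict[str, int]:
    """Run the tool for every 'X _ Speechify.html' that has a matching PDF next to it.

    Returns {html filename: exit code}. Assumption: non-zero means failure.
    """
    results = {}
    for html in sorted(Path(folder).glob("* _ Speechify.html")):
        name = html.name.removesuffix(" _ Speechify.html")
        pdf = html.with_name(name if name.lower().endswith(".pdf") else name + ".pdf")
        if not pdf.is_file():
            continue  # no matching PDF next to the HTML file; skip it
        proc = subprocess.run(build_command(html, pdf))
        results[html.name] = proc.returncode
    return results
```

Calling convert_folder(".") in a folder of saved pages writes one *_highlights.pdf per matched PDF and returns the exit codes so failures can be reported afterwards.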

Open a password-protected PDF:

speechify-to-pdf "Book.pdf _ Speechify.html" "Book.pdf" --password "mysecret"

Fix page offset (e.g. PDF has a 20-page preface Speechify doesn't count):

speechify-to-pdf "Book.pdf _ Speechify.html" "Book.pdf" --page-offset 20

Fix page offset for journal articles (the article's page numbers start above 1, e.g. pages 300–320, so page 300 is PDF page 1):

speechify-to-pdf "Article _ Speechify.html" "Article.pdf" --page-offset -299
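Both examples follow the same arithmetic: --page-offset N shifts every lookup by N pages, so the tool searches PDF page (Speechify page + N). The helpers below are hypothetical and only make that arithmetic explicit, including how to derive N for an article whose first page is numbered above 1.

```python
def pdf_page(speechify_page: int, page_offset: int = 0) -> int:
    """1-based PDF page to search, given the page Speechify reports and --page-offset N."""
    return speechify_page + page_offset


def offset_for(first_printed_page: int) -> int:
    """Offset for a document whose first PDF page carries the number `first_printed_page`."""
    return 1 - first_printed_page
```

With a 20-page preface, Speechify's page 1 is PDF page 1 + 20 = 21. For an article numbered 300–320, offset_for(300) gives -299, so page 300 maps to PDF page 1 and page 320 to PDF page 21.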

Transfer only specific highlight colors:

speechify-to-pdf "Book.pdf _ Speechify.html" "Book.pdf" --colors yellow
speechify-to-pdf "Book.pdf _ Speechify.html" "Book.pdf" --colors yellow,pink
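The --colors value is a comma-separated list. The sketch below shows one way to parse and validate such a value against the six colors listed under "What gets transferred?". parse_colors and keep are hypothetical; whether the real tool ignores case and surrounding spaces, or how it reports unknown names, is an assumption here.

```python
# The six Speechify colors listed in the "What gets transferred?" table.
KNOWN_COLORS = ("yellow", "pink", "blue", "green", "orange", "purple")


def parse_colors(value: str) -> set[str]:
    """Turn a --colors value like 'yellow,pink' into a set, rejecting unknown names.

    Assumption: matching is case-insensitive and ignores surrounding spaces.
    """
    colors = {c.strip().lower() for c in value.split(",") if c.strip()}
    unknown = colors - set(KNOWN_COLORS)
    if unknown:
        raise ValueError(f"unknown color(s): {', '.join(sorted(unknown))}")
    return colors


def keep(highlight_color: str, wanted: set[str]) -> bool:
    """True if a highlight of this color should be transferred."""
    return highlight_color.lower() in wanted
```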

Check the version:

speechify-to-pdf --version

Example Output

Running the tool on a typical document:

$ speechify-to-pdf "Algorithms.pdf _ Speechify.html" "Algorithms.pdf"
HTML:  Algorithms.pdf _ Speechify.html
       17 highlights found
PDF:   Algorithms.pdf  (412 pages)
  Locating: 17/17

  Annotating: 17/17

Result: 16/17 highlights transferred.
Not found (1):
  p.203: This is a very long highlight that starts with the opening words of...

Saved: Algorithms_highlights.pdf

With --verbose, each highlight is shown as it is placed:

$ speechify-to-pdf "Algorithms.pdf _ Speechify.html" "Algorithms.pdf" -v
HTML:  Algorithms.pdf _ Speechify.html
       17 highlights found
PDF:   Algorithms.pdf  (412 pages)
  Locating: 17/17
  ✓ p.12–13 [yellow]: A sorting algorithm is a method for reorganizing a...
  ✓ p.45 [pink] (…): The time complexity of this approach is bounded by...
  ~ p.98 [blue] (end not found, start line only): An invariant must hold at every...
  ✗ p.203 [yellow] NO RECTS: This is a very long highlight that starts with...

Result: 16/17 highlights transferred.
Saved: Algorithms_highlights.pdf

Dry run (preview without writing):

$ speechify-to-pdf "Algorithms.pdf _ Speechify.html" "Algorithms.pdf" --dry-run
HTML:  Algorithms.pdf _ Speechify.html
       17 highlights found
PDF:   Algorithms.pdf  (412 pages)
  Locating: 17/17

  Annotating: 17/17

Result: 16/17 highlights would be transferred.

Dry run — no file written. Would save to: Algorithms_highlights.pdf

What gets transferred?

Speechify element	PDF annotation
Yellow highlight	Yellow highlight
Pink highlight	Pink highlight
Blue highlight	Blue highlight
Green highlight	Green highlight
Orange highlight	Orange highlight
Purple highlight	Purple highlight
Note on a highlight	Comment on the annotation
Page number	Correct PDF page (±2 pages tolerance)
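The "±2 pages tolerance" means a highlight is searched on the indicated page and up to two pages on either side. The sketch below generates such a search window. The nearest-first order and the name candidate_pages are assumptions; the README does not say in which order the tool tries the neighboring pages.

```python
def candidate_pages(page: int, page_count: int, tolerance: int = 2) -> list[int]:
    """Pages to try for a highlight: the indicated page first, then +/-1, +/-2, ...

    Pages are 1-based; candidates outside 1..page_count are dropped.
    """
    order = [page]
    for d in range(1, tolerance + 1):
        order += [page - d, page + d]
    return [p for p in order if 1 <= p <= page_count]
```

For the 412-page example above, candidate_pages(10, 412) yields pages 10, 9, 11, 8, 12, while near the edges the window shrinks: candidate_pages(1, 412) yields 1, 2, 3.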

Limitations

Truncated text: In the sidebar, Speechify shows only the first ~80 characters of a long highlight. The tool first tries to recover the full text from the page source (the aria-label attribute); when that succeeds, the entire highlight is annotated. When only the truncated text (~80 characters) is available, the tool marks the highlight from its start position and estimates where it ends.
Image pages / scanned PDFs: On image-only pages without an embedded text layer, the tool cannot locate any text, because it does not perform OCR.
Page offset: The tool searches the indicated page and up to 2 pages on either side. For larger offsets (e.g. books whose long prefaces Speechify does not count), use --page-offset N to shift all lookups by N pages. When many highlights are not found, the tool infers the shift from the highlights it did locate and prints a suggested --page-offset value.
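Locating a truncated highlight amounts to prefix matching: drop the trailing ellipsis, normalize whitespace on both sides, and search the page text for what remains. The sketch below illustrates that idea on plain strings (it does not use PDF coordinates). find_truncated and normalize are hypothetical and are not claimed to be how speechify-to-pdf matches text. Words hyphenated across a line break in the PDF would still defeat this simple version.

```python
import re

# A trailing "..." or "…" marks sidebar text that Speechify cut off.
ELLIPSIS = re.compile(r"(\.\.\.|…)\s*$")


def normalize(text: str) -> str:
    """Collapse runs of whitespace (PDF text extraction often inserts line breaks)."""
    return " ".join(text.split())


def find_truncated(snippet: str, page_text: str) -> int:
    """Return the index in the normalized page text where a truncated highlight starts, or -1."""
    prefix = normalize(ELLIPSIS.sub("", snippet))
    return normalize(page_text).find(prefix)
```

For example, the snippet "A sorting algorithm is a method for reorganizing a..." is found at index 7 of the page text "Intro.\nA sorting algorithm is a method\nfor reorganizing a list.", even though the PDF text breaks the sentence across lines.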

Troubleshooting

"No highlights found" → Most likely the sidebar was collapsed when you saved. Expand it, reload the page, and save again.

Many "NOT FOUND" → The HTML and PDF may come from different versions of the book, or the PDF contains scanned pages without a text layer. If the PDF has unnumbered front matter (cover, preface, TOC) that Speechify does not count, add --page-offset N; the tool detects a consistent shift from the highlights it did locate and prints the suggested value.

UnicodeDecodeError when reading the HTML file → This should not happen: the tool tries UTF-8 first, then cp1252 (for Windows smart quotes and em-dashes), and finally latin-1, which accepts every byte. If you still see this error, the file is probably corrupted; re-save the page with File → Save Page As → Webpage, Complete in your browser.
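The fallback chain described above can be sketched as follows. UTF-8 has to come first because cp1252 and latin-1 decode most byte sequences without complaint and would silently turn valid UTF-8 into garbled text. cp1252 leaves a few bytes (such as 0x81) undefined, so it can still fail; latin-1 maps all 256 byte values and therefore always succeeds. read_html is a hypothetical name, and this is an illustration of the technique rather than the tool's own code.

```python
from pathlib import Path

# Order described above: strict UTF-8, then Windows cp1252, then latin-1 as catch-all.
ENCODINGS = ("utf-8", "cp1252", "latin-1")


def read_html(path: str) -> tuple[str, str]:
    """Read a saved page, returning (text, encoding used)."""
    data = Path(path).read_bytes()
    for enc in ENCODINGS[:-1]:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    # latin-1 maps every byte 0x00-0xFF to a character, so this cannot fail.
    return data.decode(ENCODINGS[-1]), ENCODINGS[-1]
```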

Highlights appear on the wrong pages (shifted up or down) → The PDF page numbering does not match Speechify's. Use --page-offset N:

Positive N: PDF has front matter (preface, TOC) that Speechify does not count. E.g. --page-offset 20.
Negative N: the document's page numbers start above 1 (e.g. a journal article numbered 300–320). E.g. --page-offset -299. When many highlights are missed, the tool prints a suggested offset automatically.

"PDF is password-protected" → Pass the password with --password "yourpassword". Alternatively, remove the encryption once with qpdf --password=yourpassword --decrypt input.pdf output.pdf and run the tool on output.pdf. qpdf also needs the password to open a file that requires one.

ModuleNotFoundError: No module named 'fitz' → Run pip install pymupdf.

Roadmap

GUI (tkinter drag-and-drop) for non-technical users
Standalone executable (PyInstaller / .exe / .app)
Support for newer Speechify export formats
Batch processing of multiple files

Contributing

Pull requests and issue reports are welcome!
Please open an issue before starting work on larger changes.
See CONTRIBUTING.md for details.

Related projects

kindle-highlights-to-pdf — Same idea for Kindle users: transfer My Clippings.txt highlights into your PDF.

Support the project

This tool is free and open-source. If it saves you time, a small donation helps keep it maintained and improved:

☕ Donate via PayPal

License

MIT
