fix(pdf-extraction): improve handling of unreadable PDF content#224

Open
Luis-manzur wants to merge 5 commits into main from 219-bad-processing-of-pdfs-in-texas

Conversation

@Luis-manzur
Contributor

This pull request improves PDF text extraction by adding a check that the extracted text is actually readable rather than binary or corrupt data. It also updates the interface and tests to reflect the new behavior.
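
To illustrate the kind of readability check described above, here is a minimal sketch. It is not taken from this PR's diff: the function name `looks_readable`, the `min_ratio` and `min_chars` thresholds, and the character-ratio heuristic are all assumptions chosen for illustration.

```python
import string

# Hypothetical helper, NOT the implementation in this PR.
# Heuristic: text is "readable" when most of its characters are letters,
# digits, whitespace, or common punctuation.
_PUNCTUATION = set(string.punctuation)


def looks_readable(text: str, min_ratio: float = 0.85, min_chars: int = 20) -> bool:
    """Return True when the extracted text looks like ordinary prose.

    min_ratio and min_chars are illustrative thresholds, not values
    used by the project.
    """
    stripped = text.strip()
    if len(stripped) < min_chars:
        # Too short to judge; treat as unreadable.
        return False
    acceptable = sum(
        1
        for ch in stripped
        if ch.isalnum() or ch.isspace() or ch in _PUNCTUATION
    )
    return acceptable / len(stripped) >= min_ratio


# Usage: discard the extraction when it fails the check, so a caller
# could fall back to another method such as OCR (an assumption here).
extracted = "\x00\x01\ufffd\x02" * 10
content = extracted if looks_readable(extracted) else ""
```

A ratio-based check like this tolerates the occasional stray byte while rejecting output that is mostly control characters or replacement characters. The thresholds would need tuning against real documents.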

Issue - #219

@Luis-manzur Luis-manzur requested review from flooie and grossir October 30, 2025 15:54
@Luis-manzur Luis-manzur linked an issue Oct 30, 2025 that may be closed by this pull request

Labels

None yet

Projects

Status: PRs to Review

Development

Successfully merging this pull request may close these issues.

Bad Processing of PDFs in Texas

3 participants