fix(pdf-extraction): improve handling of unreadable PDF content by Luis-manzur · Pull Request #224 · freelawproject/doctor

Luis-manzur · 2025-10-30T15:54:23Z

This pull request improves the extraction of text from PDF files by adding a check to ensure that the extracted text is actually readable and not just binary or corrupt data. It also updates the interface and tests to reflect this new behavior.

Issue - #219
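The description above does not show the readability check itself. The Python sketch below shows one way such a check could work, using a character-class heuristic. The function name `is_text_readable`, both thresholds, and the handling of pdfminer-style `(cid:NN)` placeholders are assumptions for illustration and are not taken from doctor/tasks.py.

```python
import re
import unicodedata

# Hypothetical defaults; real thresholds would need tuning against sample PDFs.
MIN_CHARS = 20
MIN_READABLE_RATIO = 0.8

# Assumed pdfminer-style placeholder for a glyph with no Unicode mapping, e.g. "(cid:42)".
CID_PATTERN = re.compile(r"\(cid:\d+\)")

# Unicode general-category prefixes treated as readable:
# Letters, Marks, Numbers, Punctuation, Symbols.
READABLE_CATEGORIES = frozenset("LMNPS")


def _is_readable_char(c: str) -> bool:
    """True for letters, marks, digits, punctuation and symbols (but not U+FFFD)."""
    if c == "\ufffd":  # replacement character left behind by failed decoding
        return False
    return unicodedata.category(c)[0] in READABLE_CATEGORIES


def is_text_readable(
    text: str,
    min_chars: int = MIN_CHARS,
    min_ratio: float = MIN_READABLE_RATIO,
) -> bool:
    """Hypothetical heuristic: does `text` look like real prose, not binary noise?"""
    # Count each unmapped-glyph placeholder as one unreadable character.
    normalized = CID_PATTERN.sub("\ufffd", text)
    chars = [c for c in normalized if not c.isspace()]
    if len(chars) < min_chars:
        # Too little text to judge; treat as unreadable (e.g. an image-only page).
        return False
    readable = sum(1 for c in chars if _is_readable_char(c))
    return readable / len(chars) >= min_ratio


if __name__ == "__main__":
    print(is_text_readable("The court finds that the appellant's motion is denied."))  # True
    print(is_text_readable("(cid:3)(cid:17)" * 15))  # False
```

Under this heuristic, text that is very short or made up mostly of control characters, U+FFFD, or unmapped-glyph placeholders is rejected. A caller could treat a `False` result as a signal to fall back to OCR. One limitation is that mojibake built from valid accented Latin letters (for example `ÃƒÂ©`) still counts as readable, so a production check would likely need additional signals.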


fix(pdf-extraction): improve handling of unreadable PDF content

b03baea

Luis-manzur requested review from flooie and grossir October 30, 2025 15:54

Luis-manzur assigned flooie Oct 30, 2025

Luis-manzur linked an issue Oct 30, 2025 that may be closed by this pull request

Bad Processing of PDFs in Texas #219 (Open)

[pre-commit.ci] auto fixes from pre-commit.com hooks

a510611

for more information, see https://pre-commit.ci

Luis-manzur added this to Sprint (Case Law) Oct 30, 2025

Luis-manzur moved this to PRs to Review in Sprint (Case Law) Oct 30, 2025

mlissner reviewed Oct 30, 2025

Review comment on doctor/tasks.py (outdated, resolved)

Luis-manzur and others added 3 commits October 30, 2025 15:28

fix(pdf-extraction): enhance unreadable content detection and adjust OCR flag to prev version

c22f21f

Merge remote-tracking branch 'origin/219-bad-processing-of-pdfs-in-texas' into 219-bad-processing-of-pdfs-in-texas

0d21923

Conflicts: doctor/tasks.py

[pre-commit.ci] auto fixes from pre-commit.com hooks

d0f468d

for more information, see https://pre-commit.ci

Luis-manzur wants to merge 5 commits into main from 219-bad-processing-of-pdfs-in-texas


3 participants
