AcroPDF style forms unable to read. #510

Open
KrishnaGadia opened this issue Nov 19, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@KrishnaGadia

When I run Camelot in lattice mode on a form PDF generated with AcroPDF, the extracted cells contain no text. Only the table structure is detected.

Steps to reproduce the bug

  1. pip install camelot-py
  2. camelot -o cam -f csv -p3 lattice 26_Form_MGT-7-XXXX.pdf

Expected behavior

The extracted tables should contain the text shown inside each cell, not blanks.

Code

import camelot

pdf_path = "26_Form_MGT-7-XXX.pdf"
pages = "3"  # page 3 only

# Extract tables
tables = camelot.read_pdf(
    pdf_path,
    pages=pages,
    flavor="lattice",  # 'lattice' for tables with ruling lines, 'stream' for whitespace-separated tables
)

for table in tables:
    for row in table.cells:
        for cell in row:
            print(cell.text)  # prints an empty string for every cell

# The number of rows and columns is correct.
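
To put a number on the symptom, a small helper along these lines could report what fraction of the extracted cells are blank. This is only a sketch: `empty_cell_ratio` is a hypothetical name, and it works on plain lists of strings so it does not depend on Camelot itself.

```python
def empty_cell_ratio(rows):
    """Return the fraction of cells whose text is empty or whitespace only.

    `rows` is a list of rows, each row a list of cell texts.
    """
    cells = [str(text).strip() for row in rows for text in row]
    if not cells:
        return 0.0
    empty = sum(1 for text in cells if not text)
    return empty / len(cells)


# Possible usage with the loop above (not run here):
#   rows = [[cell.text for cell in row] for row in table.cells]
#   print(empty_cell_ratio(rows))
```

A ratio of 1.0 on a page that visibly contains filled-in fields would match the behaviour described in this report.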

PDF

In the attachment
26_Form_MGT-7-21122016_signed.pdf

Screenshots

NA

Environment

  • OS: macOS
  • Python version: 3.9
  • Numpy version:
  • OpenCV version:
  • Ghostscript version:
  • Camelot version:

Additional context

The text is visible with PyMuPDF (imported as fitz), under the page's widgets and words.
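
For reference, a minimal sketch of how the form text can be listed with PyMuPDF. It assumes PyMuPDF is installed, that the page is 1-indexed as in the Camelot call, and that the field text sits in form widgets; `widget_records` and `read_form_text` are hypothetical helper names, not part of either library.

```python
def widget_records(page):
    """Collect (field_name, field_value, (x0, y0, x1, y1)) for each form widget on a page."""
    records = []
    for widget in page.widgets():
        rect = widget.rect
        records.append(
            (widget.field_name, widget.field_value, (rect.x0, rect.y0, rect.x1, rect.y1))
        )
    return records


def read_form_text(pdf_path, page_number):
    """Return (widget records, plain words) for a 1-indexed page of a PDF."""
    import fitz  # PyMuPDF, imported lazily so widget_records stays usable without it

    doc = fitz.open(pdf_path)
    try:
        page = doc[page_number - 1]
        # Each "words" entry is (x0, y0, x1, y1, word, block_no, line_no, word_no).
        words = [entry[4] for entry in page.get_text("words")]
        return widget_records(page), words
    finally:
        doc.close()


# Possible usage (not run here):
#   records, words = read_form_text("26_Form_MGT-7-XXX.pdf", 3)
```

If the records carry the values that Camelot leaves blank, that would suggest the text is stored in form fields rather than in the page's regular text layer.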

I did honestly try to figure it out myself, but got lost in the text extraction part.
I was able to view the form text via the fitz package. It was not directly visible, though; I had to look into the spans to get that information.
I really liked how you present the data as a pandas DataFrame and support export to CSV, JSON, and other formats.
I am usually not active on GitHub. Kindly reach out to [email protected]
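
Since the table grid itself is detected correctly, one possible workaround is to place field values from another library into the detected cells by position. The sketch below is an assumption-heavy illustration, not Camelot's behaviour: `fill_cells_from_widgets` is a hypothetical helper, cell boxes are assumed to be `(x1, y1, x2, y2)` with a bottom-left origin, widget boxes are assumed to use a top-left origin, and page rotation or crop boxes are ignored.

```python
def find_cell(cell_boxes, x, y):
    """Return (row, col) of the first cell box containing the point, or None."""
    for r, row in enumerate(cell_boxes):
        for c, (x1, y1, x2, y2) in enumerate(row):
            if x1 <= x <= x2 and y1 <= y <= y2:
                return r, c
    return None


def fill_cells_from_widgets(cell_boxes, widgets, page_height):
    """Build a grid of strings by dropping each widget value into the cell under its centre.

    cell_boxes: rows of (x1, y1, x2, y2) boxes, origin at the bottom-left (assumed).
    widgets: (value, (x0, y0, x1, y1)) pairs, origin at the top-left (assumed).
    """
    grid = [["" for _ in row] for row in cell_boxes]
    for value, (wx0, wy0, wx1, wy1) in widgets:
        if not value:
            continue
        centre_x = (wx0 + wx1) / 2.0
        # Flip the y axis to move from top-left to bottom-left coordinates.
        centre_y = page_height - (wy0 + wy1) / 2.0
        hit = find_cell(cell_boxes, centre_x, centre_y)
        if hit is not None:
            r, c = hit
            grid[r][c] = (grid[r][c] + " " + str(value)).strip()
    return grid


# Synthetic example: a 2x2 grid on a page 100 units tall.
boxes = [
    [(0, 50, 50, 100), (50, 50, 100, 100)],
    [(0, 0, 50, 50), (50, 0, 100, 50)],
]
fields = [("A", (10, 10, 40, 40)), ("B", (60, 60, 90, 90))]
print(fill_cells_from_widgets(boxes, fields, page_height=100))
# → [['A', ''], ['', 'B']]
```

With real data, the boxes would have to come from the detected table cells and the values from the form widgets; both coordinate conventions should be checked against the actual PDF before relying on the result.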

@KrishnaGadia KrishnaGadia added the bug Something isn't working label Nov 19, 2024