The extracted tables should contain the text inside their cells, not blanks.
Code
import camelot

pdf_path = "26_Form_MGT-7-XXX.pdf"
pages = "3-3"  # page 3 only

# Extract tables
tables = camelot.read_pdf(
    pdf_path,
    pages=pages,
    flavor="lattice",  # 'lattice' relies on ruled grid lines; 'stream' infers columns from whitespace
)

# The number of rows and columns detected is correct
for table in tables:
    for row in table.cells:
        for cell in row:
            print(cell.text)  # prints empty
The text is visible with PyMuPDF (the fitz library), under the page's widgets and words.
I did honestly try to figure it out myself, but got lost in the text extraction part.
I was able to view the form text via the fitz package, but it was not directly visible; I had to look into the spans to get that information.
I really liked how you present the data as a pandas DataFrame and enable export to CSV, JSON and other formats.
I am usually not active on GitHub. Kindly reach out to [email protected]
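A possible workaround, sketched under assumptions: since the cell grid is detected correctly and the form text is readable as widgets, the widget values could be assigned to cells by position. The helper names below (`to_pdf_space`, `find_cell`, `fill_cells_from_fields`) are hypothetical and are not part of camelot or PyMuPDF. The sketch assumes the field boxes use a top-left origin (as PyMuPDF rectangles do) and the cell boxes use a bottom-left origin (as PDF user space does), so the field boxes are flipped vertically before matching.

```python
from typing import Iterable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def to_pdf_space(box: Box, page_height: float) -> Box:
    """Flip a top-left-origin box into bottom-left-origin PDF space."""
    x0, y0, x1, y1 = box
    return (x0, page_height - y1, x1, page_height - y0)


def find_cell(cell_boxes: List[List[Box]], x: float, y: float) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first cell containing the point, or None."""
    for r, row in enumerate(cell_boxes):
        for c, (bx0, by0, bx1, by1) in enumerate(row):
            if bx0 <= x <= bx1 and by0 <= y <= by1:
                return (r, c)
    return None


def fill_cells_from_fields(
    cell_boxes: List[List[Box]],
    fields: Iterable[Tuple[object, Box]],
    page_height: float,
) -> List[List[str]]:
    """Place each field value in the cell that contains the field's centre."""
    grid = [["" for _ in row] for row in cell_boxes]
    for value, box in fields:
        if value is None or value == "":
            continue  # skip empty form fields
        x0, y0, x1, y1 = to_pdf_space(box, page_height)
        hit = find_cell(cell_boxes, (x0 + x1) / 2, (y0 + y1) / 2)
        if hit is not None:
            r, c = hit
            grid[r][c] = (grid[r][c] + " " + str(value)).strip()
    return grid


# Made-up sample data: a 2x2 grid on a page 100 units high.
cells = [
    [(0, 50, 50, 100), (50, 50, 100, 100)],  # top row (PDF space)
    [(0, 0, 50, 50), (50, 0, 100, 50)],      # bottom row
]
fields = [("Acme Ltd", (5, 5, 45, 20)), ("2016", (55, 60, 95, 75))]
print(fill_cells_from_fields(cells, fields, page_height=100))
```

With real inputs, the fields would presumably come from PyMuPDF (`page.widgets()`, reading each widget's `field_value` and `rect`, with `page.rect.height` as the page height) and the cell boxes from each camelot cell's `x1`, `y1`, `x2`, `y2`. Whether the two coordinate systems line up exactly for this PDF is an assumption that would need checking.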
When using the lattice mode on a PDF generated using AcroPDF, I am unable to see the text. Only the table structure is visible.
Steps to reproduce the bug
Expected behavior
The extracted tables should contain the text inside their cells, not blanks.
Code
PDF
In the attachment
26_Form_MGT-7-21122016_signed.pdf
Screenshots
NA
Environment
Additional context