While parsing tabular data, a new line is started every time this condition is met:
if ( textYPosition > previousTextYPosition )
This is too sensitive when a row of a table contains two different font sizes.
The difference in font size doesn't have to be large.
A one-point difference is enough for the existing function getNumberOfNewLinesFromPreviousTextPosition()
to start a new line, which splits the row and garbles the text output.
I've modified this function to use a simple threshold when checking for a new line:
if ( textYPosition - previousTextYPosition > newLineHeightThreshold )
and now it works perfectly.
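To make the difference between the two conditions concrete, here is a minimal, self-contained sketch. The class NewLineCheck and its method names are hypothetical and are not part of PDFLayoutTextStripper; only the two comparisons are taken from the report above, and the sample Y values are made up for illustration.

```java
// Hypothetical helper for illustration only; not part of PDFLayoutTextStripper.
final class NewLineCheck {
    private NewLineCheck() {}

    // Strict comparison from the report: any increase in Y starts a new line.
    static boolean isNewLineStrict(double textYPosition, double previousTextYPosition) {
        return textYPosition > previousTextYPosition;
    }

    // Thresholded comparison from the report: only a jump larger than
    // newLineHeightThreshold starts a new line.
    static boolean isNewLineWithThreshold(double textYPosition,
                                          double previousTextYPosition,
                                          double newLineHeightThreshold) {
        return textYPosition - previousTextYPosition > newLineHeightThreshold;
    }
}
```

With an assumed threshold of 2.0, a one-unit shift such as NewLineCheck.isNewLineWithThreshold(101.0, 100.0, 2.0) no longer counts as a new line, while a real row change such as NewLineCheck.isNewLineWithThreshold(112.0, 100.0, 2.0) still does.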
BTW: great job Jonathan with this little class :)
Thanks a lot for reporting this, and I'm glad you like the class.
I will see what I can do. The difficulty is finding a good threshold. Maybe I could add an optional parameter to the PDFLayoutTextStripper constructor to let the user define it.
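One way such an optional parameter could look is sketched below. This is only an assumption about the shape of the API: LineBreakPolicy is a hypothetical stand-in class, not the actual PDFLayoutTextStripper, and the default of 0.0 is chosen here so that the no-argument constructor keeps the current strict behaviour.

```java
// Hypothetical stand-in for illustration; not the real PDFLayoutTextStripper.
final class LineBreakPolicy {
    private final double newLineHeightThreshold;

    // Default constructor: a threshold of 0.0 behaves like the strict
    // check (textYPosition > previousTextYPosition).
    LineBreakPolicy() {
        this(0.0);
    }

    // Optional parameter: the caller picks the threshold for their documents.
    LineBreakPolicy(double newLineHeightThreshold) {
        // Reject negative values and NaN.
        if (!(newLineHeightThreshold >= 0.0)) {
            throw new IllegalArgumentException("threshold must be >= 0");
        }
        this.newLineHeightThreshold = newLineHeightThreshold;
    }

    boolean startsNewLine(double textYPosition, double previousTextYPosition) {
        return textYPosition - previousTextYPosition > newLineHeightThreshold;
    }
}
```

Existing callers would keep using the no-argument constructor unchanged, while users with mixed font sizes could pass a value suited to their documents, for example new LineBreakPolicy(2.0).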