Tail characters getting stripped off #15

rohitgadia · 2017-08-28T07:33:06Z

I am working with a host of PDF reports. Your class lets me preserve the layout, but sometimes the tail characters of a line get stripped off, whereas the parent class, PDFTextStripper, works fine.

Does this have anything to do with this.setCurrentPageWidth(pageRectangle.getWidth());?

By the way, great work with the class. It made extracting tables so easy.

JonathanLink · 2017-08-28T21:13:20Z

Hi! Thank you, great to hear that the class helps you! Can you send me (here or through my email) a PDF file which doesn't work?

rohitgadia · 2017-08-29T05:57:33Z

` Independent Auditors Report
To members of Silverlake Axis Ltd.

          rePort  on  the finAnciAl   stAtements                                                                                                   
                                                                                                                                                   
          We have audited  the  accompanying  fnancial  statements  of  Silverlake Axis  Ltd.  and its  subsidiaries  (collectively,  the  Group), 
          50  to  159,  which  comprise  the  statements  of  fnancial position  of  the  Group  and  the  Company  as  at 30  June  2016,  the  co
          of  changes  in equity,  consolidated  income  statement, consolidated statement  of  comprehensive  income  and  consolidated  statement
          fows of  the  Group  for  the  year  then ended,  and a  summary  of signifcant  accounting  policies and  other  explanatory  informatio
                                                                                                                                                   
          Management’s   Responsibility for  the  Financial  Statements                                                                            
          Management  is  responsible  for  the  preparation  of  fnancial statements  that give  a  true  and  fair view  in accordance  with  Int
          Reporting  Standards,  and for  devising  and  maintaining  a  system  of internal  accounting  controls suffcient  to  provide  a  **reaso**
          that  assets  are safeguarded  against  loss  from  unauthorised  use  or  disposition; and transactions  are  properly  authorised  and 
          recorded  as  necessary  to  permit  the  preparation  of  true  and fair consolidated  income  statement and  statements  of  fnancial  
          maintain  accountability  of  assets.                                                                                                    
                                                                                                                                                   
          Auditors’  Responsibility                                                                                                                
          Our  responsibility  is  to express an  opinion  on  these  fnancial  statements  based  on  our  audit.  We  conducted our  audit in  ac
          International  Standards  on  Auditing.  Those  standards require  that we  comply  with  ethical  requirements  and  plan  and  perform 
          obtain  reasonable  assurance  about whether  the  consolidated  fnancial  statements  are free  from  material  misstatement.           
                                                                                                                                                   
          An  audit  involves  performing  procedures  to  obtain  audit evidence about  the  amounts  and  disclosures  in  the consolidated  **fnan**
          The  procedures  selected  depend  on  the auditor’s judgement, including  the  assessment  of  risks  of  material  misstatement of  the
          fnancial  statements,  whether  due to  fraud or  error.  In making  those  risk  assessments,  the auditor  considers  internal  control
          the  entity’s  preparation  of  the consolidated  fnancial  statements  that  give  a  true and  fair  view in order  to  design  audit  
          appropriate  in the circumstances,  but  not  for the  purpose  of expressing an opinion  on  the  effectiveness of  the entity’s  intern
          audit  also  includes  evaluating  the  appropriateness  of  accounting policies  used  and the  reasonableness  of accounting  estimates
          management, as well  as  evaluating  the  overall presentation  of  the  consolidated  fnancial  statements.                             
                                                                                                                                                   
          We believe  that  the audit evidence  we  have obtained  is  suffcient and appropriate to  provide  a  basis  for our  audit opinion.    
                                                                                                                                                   
          Opinion                                                                                                                                  
          In  our  opinion,  the  consolidated  fnancial  statements  of the  Group  and  the  statement of  fnancial position  of  the  Company  a
          up  in  accordance  with the  International  Financial  Reporting  Standards  so  as to  give  a  true  and fair  view  of  the  **fnancial**
          and  of  the Company  as  at  30 June 2016  and  the results,  changes  in  equity  and  cash  **fows**  of  the  Group  for the  year ended 
                                                                                                                                                   
          other  mAtters                                                                                                                           
                                                                                                                                                   
          This  report  is  made solely to  the  members  of the  Company, as  a body,  and for  no  other  purpose.  We do  not  assume  **responsib**
          person  for  the content  of  this report.                                                                                               
                                                                                                                                                   
                                                                                                                                                   
                                                                                                                                                   
          eRNSt & YouNG                                                                                                                            
          AF:  0039                                                                                                                                
          Chartered  Accountants                                                                                                                   
                                                                                                                                                   
          Kuala  Lumpur,  Malaysia                                                                                                                 
          28  September  2016  `

This is what the extracted text looks like. If you look closely, a few characters are missing from the words I have highlighted, and I have also highlighted where the tail characters are getting stripped.

I have attached the file as well; the extract above is from page 51. Thanks.

AR 2016.pdf

JonathanLink · 2017-08-31T18:56:26Z

Thanks, I am going to look into that this weekend.

rohitgadia · 2017-09-01T07:00:15Z

Thanks a lot. I was wondering if you could explain why certain characters were getting stripped.

JonathanLink · 2017-09-02T23:50:16Z

You were right, it has to do with this.setCurrentPageWidth(pageRectangle.getWidth());
I'll make an update, but meanwhile you can change that line to: this.setCurrentPageWidth(pageRectangle.getWidth() * 1.2);
I also noticed that the space between the columns was sometimes not big enough (for instance on page 6). I'll try to fix that too.
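
To illustrate why widening the page width can bring the tail characters back, here is a minimal, self-contained sketch. It is not the class's actual code: `LineBufferSketch`, `layOut`, and `POINTS_PER_COLUMN` are hypothetical names, and the model assumes only that each output line is a fixed-size buffer whose length is derived from the page width, so that any character landing past the end of the buffer is dropped.

```java
public class LineBufferSketch {
    // Assumption for illustration: points of page width covered by one output column.
    static final double POINTS_PER_COLUMN = 4.0;

    /**
     * Lays text into a fixed-size line buffer sized from the page width.
     * Characters that would land past the last column are silently dropped.
     */
    static String layOut(String text, int startColumn, double pageWidthPt, double widthFactor) {
        int columns = (int) Math.round(pageWidthPt * widthFactor / POINTS_PER_COLUMN);
        char[] line = new char[columns];
        java.util.Arrays.fill(line, ' ');
        for (int i = 0; i < text.length(); i++) {
            int column = startColumn + i;
            if (column >= columns) {
                break; // no room left in the buffer: the tail is lost here
            }
            line[column] = text.charAt(i);
        }
        // Drop the padding spaces at the end of the line.
        return new String(line).replaceAll("\\s+$", "");
    }

    public static void main(String[] args) {
        String text = "provide a reasonable assurance";
        // 100 pt wide page, factor 1.0: 25 columns, so the last 5 characters are cut.
        System.out.println(layOut(text, 0, 100.0, 1.0)); // provide a reasonable assu
        // Same page, factor 1.2: 30 columns, so the whole text fits.
        System.out.println(layOut(text, 0, 100.0, 1.2)); // provide a reasonable assurance
    }
}
```

Under this model the 1.2 factor simply buys extra columns at the right edge. It is a margin rather than a guarantee, so a line that overflows by more than 20% would presumably still be cut.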

zanonmark · 2024-02-06T12:11:11Z

@JonathanLink:
Thanks for this class, very useful.

About your last commit (88bfd8c): I see it's still in the 'dev' branch and hasn't been merged to 'master'. Is there any reason for that?

Also, do we have any way to set the page width externally (e.g. by calling pdflayouttextstripper.setPageWidth() or something like that)?
That would be very useful for deciding case by case how to behave.

Thanks,
MZ

JonathanLink added a commit that referenced this issue Jan 4, 2018

Attempt to fix issues #18 and #15

88bfd8c
