Hi,
I have this code, with the attached PDF to test.
public void doStrip() {
    String string = null;
    try {
        PDFParser pdfParser = new PDFParser(new RandomAccessFile(new File("D:/escaner/errorsPDFBOX/AN20-0149-0602201842.pdf"), "r"));
        pdfParser.parse();
        PDDocument pdDocument = new PDDocument(pdfParser.getDocument());
        PDFTextStripper pdfTextStripper = new PDFLayoutTextStripper();
        string = pdfTextStripper.getText(pdDocument);
        BufferedWriter writer = Files.newBufferedWriter(FileSystems.getDefault().getPath("D:/escaner","fichero.txt"), Charset.forName("UTF-8"));
        writer.write(string);
        writer.flush();
        writer.close();
    } catch (InvalidPasswordException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
AN20-0149-0602201842.pdf
I get this exception:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.charAt(String.java:658)
at com.sagedillepasa.gestion.TextLine.isSpaceCharacterAtIndex(PDFLayoutTextStripper.java:269)
at com.sagedillepasa.gestion.TextLine.getNextValidIndex(PDFLayoutTextStripper.java:283)
at com.sagedillepasa.gestion.TextLine.computeIndexForCharacter(PDFLayoutTextStripper.java:263)
at com.sagedillepasa.gestion.TextLine.writeCharacterAtIndex(PDFLayoutTextStripper.java:229)
at com.sagedillepasa.gestion.PDFLayoutTextStripper.writeLine(PDFLayoutTextStripper.java:127)
at com.sagedillepasa.gestion.PDFLayoutTextStripper.writeTextPositionList(PDFLayoutTextStripper.java:157)
at com.sagedillepasa.gestion.PDFLayoutTextStripper.iterateThroughTextList(PDFLayoutTextStripper.java:152)
at com.sagedillepasa.gestion.PDFLayoutTextStripper.writePage(PDFLayoutTextStripper.java:96)
at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392)
at com.sagedillepasa.gestion.PDFLayoutTextStripper.processPage(PDFLayoutTextStripper.java:80)
at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:227)
at com.sagedillepasa.gestion.test.doStrip(test.java:44)
at com.sagedillepasa.gestion.test.main(test.java:61)
I'm encountering the same issue.
The exception seems to happen because index is 0 here, so isSpaceCharacterAtIndex is called with -1.
Changing the condition to !isCharacterPartOfPreviousWord && index > 0 && this.isSpaceCharacterAtIndex(index - 1) seems to fix the issue.
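For anyone who wants to see the effect of the extra index > 0 check in isolation, here is a minimal, self-contained sketch. It is not the actual PDFLayoutTextStripper source: the class GuardedIndexSketch, the static method shapes, and the body of isSpaceCharacterAtIndex are assumptions made for illustration. Only the method names and the charAt(index - 1) call come from the stack trace and the condition quoted above.

```java
// Hypothetical stand-in for the relevant part of TextLine; not the library's code.
public class GuardedIndexSketch {

    // Assumed body: the only detail that matters here is the charAt call,
    // which throws StringIndexOutOfBoundsException for a negative index.
    static boolean isSpaceCharacterAtIndex(String line, int index) {
        return line.charAt(index) == ' ';
    }

    // Condition without the guard: with index == 0 it looks at position -1.
    static int unguardedNextIndex(String line, int index, boolean isCharacterPartOfPreviousWord) {
        int nextValidIndex = index;
        if (!isCharacterPartOfPreviousWord && isSpaceCharacterAtIndex(line, index - 1)) {
            nextValidIndex = nextValidIndex + 1;
        }
        return nextValidIndex;
    }

    // Condition with the proposed guard: index > 0 short-circuits before charAt(-1).
    static int guardedNextIndex(String line, int index, boolean isCharacterPartOfPreviousWord) {
        int nextValidIndex = index;
        if (!isCharacterPartOfPreviousWord && index > 0 && isSpaceCharacterAtIndex(line, index - 1)) {
            nextValidIndex = nextValidIndex + 1;
        }
        return nextValidIndex;
    }

    public static void main(String[] args) {
        String line = "ab";
        // The guarded version returns normally for index 0.
        System.out.println("guarded: " + guardedNextIndex(line, 0, false));
        try {
            unguardedNextIndex(line, 0, false);
        } catch (StringIndexOutOfBoundsException e) {
            // Same exception type as in the report above.
            System.out.println("unguarded threw: " + e.getClass().getSimpleName());
        }
    }
}
```

Because && short-circuits, the guard only skips the lookup when there is no previous position to inspect. Whether leaving the index unchanged in that case gives the right layout for every PDF is something I have not verified.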