Native Java binding to pdf_oxide via JNI (jni-rs 0.22). Same Rust core as the Python / Go / JS / C# / WASM bindings, sub-millisecond text extraction, 100% pass rate on 3,830 real-world PDFs. JDK 11 LTS floor, free Kotlin interop via the same JAR.
<dependency>
    <groupId>fyi.oxide</groupId>
    <artifactId>pdf-oxide</artifactId>
    <version>0.3.53</version>
</dependency>

// Kotlin DSL
implementation("fyi.oxide:pdf-oxide:0.3.53")

// Groovy
implementation 'fyi.oxide:pdf-oxide:0.3.53'

The JAR embeds native libraries for Linux x86_64, Linux aarch64, macOS x86_64, macOS aarch64, and Windows x86_64. The right one is extracted to a UUID-suffixed temp file on first call via NativeLoader (snappy-java pattern, multi-classloader safe).
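The platform selection and UUID-suffixed naming described above can be sketched roughly as follows. This is a minimal illustration, not pdf_oxide's actual NativeLoader: the class name, the classifier strings, and the temp file name pattern are all assumptions made for the example. Only the five supported targets and the pdf_oxide_jni library name come from this README.

```java
import java.util.Locale;
import java.util.UUID;

/** Hypothetical sketch; not pdf_oxide's actual NativeLoader. */
final class NativePlatformSketch {

    /** Maps os.name / os.arch values to one of the five bundled targets. */
    static String classifier(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        String osPart;
        if (os.contains("linux")) {
            osPart = "linux";
        } else if (os.contains("mac") || os.contains("darwin")) {
            // Checked before "win" because "darwin" contains "win".
            osPart = "macos";
        } else if (os.contains("win")) {
            osPart = "windows";
        } else {
            throw new UnsupportedOperationException("unsupported OS: " + osName);
        }

        String archPart;
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            archPart = "x86_64";
        } else if (arch.equals("aarch64") || arch.equals("arm64")) {
            archPart = "aarch64";
        } else {
            throw new UnsupportedOperationException("unsupported arch: " + osArch);
        }

        // The README lists no Windows aarch64 build, so reject that combination.
        if (osPart.equals("windows") && archPart.equals("aarch64")) {
            throw new UnsupportedOperationException("no bundled library for windows aarch64");
        }
        return osPart + "-" + archPart;
    }

    /** Builds a unique temp file name (assumed pattern) so concurrent loaders never collide. */
    static String tempFileName(String classifier) {
        String ext = classifier.startsWith("windows") ? ".dll"
                : classifier.startsWith("macos") ? ".dylib" : ".so";
        return "pdf_oxide_jni-" + UUID.randomUUID() + ext;
    }

    public static void main(String[] args) {
        String c = classifier(System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println(c + " -> " + tempFileName(c));
    }
}
```

The random suffix is what makes the pattern safe when several classloaders in one JVM each load their own copy: every loader extracts to a distinct file instead of racing on a shared path.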
import java.nio.file.Files;
import java.nio.file.Path;

import fyi.oxide.pdf.AutoExtractor;
import fyi.oxide.pdf.MarkdownConverter;
import fyi.oxide.pdf.Pdf;
import fyi.oxide.pdf.PdfDocument;

// Open + extract text
try (PdfDocument doc = PdfDocument.open(Path.of("report.pdf"))) {
    System.out.println("pages: " + doc.pageCount());
    System.out.println(doc.extractText(0)); // page index 0
}

// Convert to Markdown
try (PdfDocument doc = PdfDocument.open(Path.of("report.pdf"))) {
    String md = MarkdownConverter.toMarkdown(doc);
    Files.writeString(Path.of("report.md"), md);
}

// Smart text routing: picks text-layer or OCR per page automatically
try (PdfDocument doc = PdfDocument.open(Path.of("mixed.pdf"))) {
    AutoExtractor extractor = AutoExtractor.balanced(doc);
    String text = extractor.extractText();
}

// Markdown → PDF
try (Pdf pdf = Pdf.fromMarkdown("# Hello\n\nWorld")) {
    pdf.saveTo(Path.of("out.pdf"));
}

All v0.3.52 features available in Java:
- PdfDocument: open, authenticate, extractText (page or auto), render PNG, formFields, search, producer/creator, toMarkdown/toHtml convenience
- PdfPage: words, lines, chars, images, tables, annotations, text(BBox region)
- DocumentEditor: setFormField, addRedaction, applyRedactionsDestructive (v0.3.50 #231), scrubMetadata, save
- Pdf: fromMarkdown, fromHtml, fromImages, split-by-bookmarks (v0.3.50 #482)
- MarkdownConverter: toMarkdown/toHtml × {whole-doc, per-page}
- AutoExtractor (v0.3.51 #517): classifyPageKind, classifyDocumentKinds, extractText, extractAutoPage with simplified AutoResult, plus extractPageJson / extractDocumentJson escape hatch for the full v0.3.51 rich shape (typed reasons + per-region bboxes + confidence)
- PdfSigner (v0.3.50 #235): fromPkcs12, sign with PAdES B-B / B-T / B-LT (TSA over RFC 3161 HTTP), verify, classifyLevel
- PdfValidator: PDF/A and PDF/UA verdict
- PdfPolicy (v0.3.50 #230): crypto-governance set-once policy

PdfException extends RuntimeException (unchecked, per Effective Java Item 71) + 8 typed subclasses (PdfParseException, PdfEncryptedException, PdfPermissionException, PdfIoException, PdfOcrUnavailableException, PdfSignatureException, PdfInvalidStateException, PdfUnsupportedException) + a PdfErrorKind enum for switch-on-enum dispatch.
try (PdfDocument doc = PdfDocument.open(Path.of("encrypted.pdf"))) {
    // ...
} catch (PdfEncryptedException e) {
    // Use PdfDocument.openWithPassword(path, password) instead
} catch (PdfException e) {
    // Arrow-form switch needs Java 14+; on JDK 11 use classic case/break labels.
    switch (e.kind()) {
        case PARSE -> log.warn("malformed PDF");
        case IO -> log.warn("io error");
        default -> log.error("pdf error", e);
    }
}

PdfDocument, Pdf, and DocumentEditor are AutoCloseable with idempotent close:
- Calling close() twice is safe (no double-free). AtomicLong-shared state coordinates concurrent close so callers can call close() safely from any thread.
- PdfDocument additionally registers a Cleaner backstop that frees the native handle if you forget close(). Pdf and DocumentEditor do not: always wrap them in try-with-resources or call close() explicitly, or the native handle leaks for the lifetime of the JVM.
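
One way the idempotent close and the Cleaner backstop described above might fit together is sketched below. This is an illustration under stated assumptions, not pdf_oxide's implementation: HandleSketch, State, and the counter standing in for the native free call are all hypothetical. The sketch relies only on java.lang.ref.Cleaner (Java 9+) and AtomicLong.getAndSet.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a native-backed resource; not a pdf_oxide class. */
final class HandleSketch implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /**
     * State shared by the owner and the Cleaner. It is a static nested class
     * so it never references the owner, which would prevent cleanup.
     */
    private static final class State implements Runnable {
        // 0 means "already released"; any other value stands in for a native pointer.
        final AtomicLong handle;
        // Counts how often the simulated native free actually ran.
        final AtomicInteger frees = new AtomicInteger();

        State(long nativePointer) {
            this.handle = new AtomicLong(nativePointer);
        }

        @Override
        public void run() {
            // getAndSet is atomic: exactly one caller ever observes the non-zero value.
            long h = handle.getAndSet(0L);
            if (h != 0L) {
                frees.incrementAndGet(); // a real binding would call its native free here
            }
        }
    }

    private final State state;
    private final Cleaner.Cleanable cleanable;

    HandleSketch(long nativePointer) {
        this.state = new State(nativePointer);
        // Backstop: the action runs if the owner becomes unreachable without close().
        this.cleanable = CLEANER.register(this, state);
    }

    @Override
    public void close() {
        // Cleanable.clean() invokes the action at most once, from any thread.
        cleanable.clean();
    }

    boolean isClosed() {
        return state.handle.get() == 0L;
    }

    int freeCount() {
        return state.frees.get();
    }

    public static void main(String[] args) {
        HandleSketch h = new HandleSketch(42L);
        h.close();
        h.close(); // second call is a no-op
        System.out.println("closed=" + h.isClosed() + " frees=" + h.freeCount());
    }
}
```

The Cleaner is only a safety net: it runs at some unspecified time after the owner becomes unreachable, so explicit close() or try-with-resources remains the reliable path.
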
try (PdfDocument doc = PdfDocument.open(file)) {
    // ... handle freed at end of try-with-resources
}

| Property | Default | Purpose |
|---|---|---|
| fyi.oxide.pdf.lib.path | unset | Path to a pre-extracted native library (skip JAR extraction) |
| fyi.oxide.pdf.use.systemlib | false | Use System.loadLibrary("pdf_oxide_jni") from java.library.path |
| fyi.oxide.pdf.tempdir | java.io.tmpdir | Override the temp directory for native extraction (useful for read-only /tmp deployments) |

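
As a rough illustration of using these properties from code, the sketch below sets them programmatically. The helper class and the example paths are hypothetical; only the property names come from the table. It assumes the properties are read when the native library is first loaded, so they would need to be set before the first pdf_oxide call.

```java
/** Hypothetical helper; only the property names come from the table above. */
final class NativeConfigSketch {

    /** Redirects native extraction away from the default java.io.tmpdir. */
    static void useTempDir(String dir) {
        System.setProperty("fyi.oxide.pdf.tempdir", dir);
    }

    /** Points at an already-extracted library, skipping JAR extraction. */
    static void usePreExtractedLibrary(String path) {
        System.setProperty("fyi.oxide.pdf.lib.path", path);
    }

    public static void main(String[] args) {
        // Assumption: call this before any pdf_oxide class triggers native loading.
        useTempDir("/var/lib/myapp/native");
        System.out.println(System.getProperty("fyi.oxide.pdf.tempdir"));
    }
}
```

The same effect is available without code changes by passing a standard JVM flag such as -Dfyi.oxide.pdf.tempdir=/var/lib/myapp/native on the command line.
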
The JAR works directly from Kotlin — no extra adapter artifact needed. All value types use record-shaped accessors (bbox.x(), bbox.y()) which become Kotlin properties (bbox.x, bbox.y).
import java.nio.file.Path
import fyi.oxide.pdf.PdfDocument

PdfDocument.open(Path.of("report.pdf")).use { doc ->
    // pageCount() is called as a method, matching the Java example
    println("pages: ${doc.pageCount()}")
    println(doc.extractText(0))
}

A future companion artifact will add Kotlin extension functions for idiomatic flow / coroutine APIs.
For FIPS-validated deployments, build pdf_oxide_jni with --no-default-features --features fips,signatures (excludes MD5/RC4 legacy-crypto). See FIPS guide.
MIT OR Apache-2.0 — same as the rest of pdf_oxide. Free for commercial use, no attribution required (though appreciated).