Move Hadoop file I/O helpers to Java module #15030
Greptile Summary
This PR is one reviewable layer in the unshim stack that moves Hadoop file I/O helpers from
Confidence Score: 4/5
Safe to merge as a stack layer; all existing callers continue on the unchanged non-PerfIO path until the next layer wires in the factory. The migration is structurally clean and introduces no behaviour change for current callers. Two minor concerns:
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Caller as Caller (GpuParquetScan etc.)
participant HFileIO as HadoopFileIO (sql-plugin-fileio)
participant PerfFact as PerfIOHadoopInputFileFactory (sql-plugin-fileio)
participant RIF as RapidsInputFiles (sql-plugin-fileio)
participant Bridge as PerfIOS3Reader (sql-plugin)
participant PerfIO as PerfIO$ (Scala, sql-plugin)
participant HIF as HadoopInputFile (sql-plugin-fileio)
participant S3IF as S3InputFile (sql-plugin-fileio)
Caller->>HFileIO: newInputFile(path)
alt "inputFileFactory == null (existing callers)"
HFileIO->>HIF: create(path, conf)
HIF-->>Caller: HadoopInputFile
else "inputFileFactory != null (next stack layer)"
HFileIO->>PerfFact: create(path, conf)
PerfFact->>RIF: isS3PerfEnabled()
RIF->>Bridge: isEnabled()
Bridge->>Bridge: SparkEnv.get().conf()
Bridge-->>RIF: true/false
alt S3 scheme AND PerfIO enabled
PerfFact->>S3IF: create(path, conf)
S3IF-->>Caller: S3InputFile
Caller->>S3IF: readVectored(output, ranges)
S3IF->>RIF: readS3Vectored(...)
RIF->>Bridge: readVectored(...)
Bridge->>PerfIO: readToHostMemory(...)
PerfIO-->>Bridge: Option[result]
Bridge-->>S3IF: true/false
else fallback
PerfFact->>HIF: create(path, conf)
HIF-->>Caller: HadoopInputFile
end
end
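The dispatch in the diagram above can be sketched in a few lines of Java. This is a minimal illustration only, not the code in this PR: the class `InputFileDispatchSketch`, its nested interfaces, the helper names, and the set of URI schemes treated as S3 are all hypothetical stand-ins, and the real `HadoopFileIO`, `PerfIOHadoopInputFileFactory`, and `RapidsInputFiles` signatures may differ.

```java
import java.net.URI;

// Hypothetical sketch of the dispatch shown in the sequence diagram.
// None of these types are the real sql-plugin-fileio classes.
final class InputFileDispatchSketch {
    interface InputFile { String kind(); }
    interface InputFileFactory { InputFile create(URI path); }

    // Stand-ins for HadoopInputFile.create(...) and S3InputFile.create(...).
    static InputFile hadoopInputFile() { return () -> "HadoopInputFile"; }
    static InputFile s3InputFile() { return () -> "S3InputFile"; }

    // Assumption: which schemes count as S3 is a guess made for illustration.
    static boolean isS3Scheme(URI path) {
        String scheme = path.getScheme();
        return scheme != null
            && (scheme.equals("s3") || scheme.equals("s3a") || scheme.equals("s3n"));
    }

    // Stand-in for the PerfIO-aware factory: S3InputFile only when the path
    // is on S3 AND PerfIO is enabled, otherwise fall back to HadoopInputFile.
    static InputFileFactory perfIoFactory(boolean perfIoEnabled) {
        return path -> (isS3Scheme(path) && perfIoEnabled)
            ? s3InputFile()
            : hadoopInputFile();
    }

    // Stand-in for newInputFile(path): a null factory keeps existing callers
    // on the unchanged HadoopInputFile path.
    static InputFile newInputFile(InputFileFactory factory, URI path) {
        return factory == null ? hadoopInputFile() : factory.create(path);
    }
}
```

Under these assumptions, a caller that never sets a factory always receives the plain Hadoop input file, which matches the "no behaviour change for current callers" claim in the summary.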
Reviews (1): Last reviewed commit: "Move Hadoop file I/O helpers to Java mod..."
    throws IOException {
  if (!RapidsInputFiles.readS3Vectored(hadoopConf, fileUri, output, copyRanges)) {
    throw new IllegalArgumentException("expected to use PerfIO to read");
  }
IllegalArgumentException escapes from a method declared throws IOException
Both readVectored and readTail throw IllegalArgumentException when the PerfIO bridge returns false. Because IllegalArgumentException is an unchecked exception, it propagates straight through callers that only have a catch (IOException e) guard, so the failure bypasses every layer that handles I/O errors. The Scala original used require() with the same semantics, but in Java the mismatch between the declared checked signature and the actual unchecked throw is more surprising. Wrapping in an IOException (throw new IOException("expected to use PerfIO to read")) would align intent with the declared signature and ensure the failure is caught by any standard I/O error handler.
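To make the concern concrete, here is a small self-contained sketch. It is not the code under review: `PerfIOReadSketch`, `bridgeRead`, `readUnchecked`, `readChecked`, and `callWithIoGuard` are hypothetical names, and `bridgeRead` merely stands in for `RapidsInputFiles.readS3Vectored` returning false.

```java
import java.io.IOException;

// Hypothetical stand-in for the reviewed reader; not the actual S3InputFile code.
final class PerfIOReadSketch {
    // Stand-in for the bridge call: returns false when PerfIO declines the read.
    static boolean bridgeRead(boolean perfIoAvailable) {
        return perfIoAvailable;
    }

    // Shape of the current code: an unchecked throw from a method
    // that is declared `throws IOException`.
    static void readUnchecked(boolean perfIoAvailable) throws IOException {
        if (!bridgeRead(perfIoAvailable)) {
            throw new IllegalArgumentException("expected to use PerfIO to read");
        }
    }

    // Shape of the suggested change: the failure matches the declared signature.
    static void readChecked(boolean perfIoAvailable) throws IOException {
        if (!bridgeRead(perfIoAvailable)) {
            throw new IOException("expected to use PerfIO to read");
        }
    }

    // A caller that only guards against I/O failures, with the bridge
    // forced to decline the read.
    static String callWithIoGuard(boolean useChecked) {
        try {
            if (useChecked) {
                readChecked(false);
            } else {
                readUnchecked(false);
            }
            return "ok";
        } catch (IOException e) {
            return "handled: " + e.getMessage();
        }
    }
}
```

With `useChecked` set to true the guard handles the failure; with it set to false the `IllegalArgumentException` escapes `callWithIoGuard` entirely, which is the behaviour this comment flags.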
Related to #14834.
Description
This PR is one reviewable layer in the unshim stack introduced by #15025. It moves Hadoop file I/O helper code into the Java-friendly file I/O module. This includes the S3 input-file plumbing pilot and keeps the file I/O runtime path separate from the broader Scala SQL plugin compilation surface.
Stack context
Testing and validation notes
Checklists
Documentation
Testing
(Covered by the validation notes in the PR description.)
Performance