Google Cloud SDK Shading for GCS PerfIO #15130
Signed-off-by: Rahul Prabhu <raprabhu@nvidia.com>
Greptile Summary
This PR does two things: expands the Maven shade configuration in both aggregator POMs to relocate the full Google client library stack (…
Confidence Score: 5/5
Safe to merge. The shade configuration change is well-scoped and correctly documented, and the GCS PerfIO auto-enable logic cleanly delegates to executor-side state with the existing SparkEnv null guard preserved. All three files contain straightforward, well-commented changes. The com.google relocation broadens an existing pattern to cover the full GCS SDK stack; the protobuf exclusion correctly preserves cross-boundary interop with Spark and ORC. The Guava usages present in the plugin (Objects.hashCode, @VisibleForTesting, ThreadFactoryBuilder) are purely internal and do not cross the shade boundary, so no ClassCastException risk was found. The isGCSPerfEnabled change is a clean delegation with no missing null-safety. The only nit is an unused local variable that can be removed without changing behavior. No files require special attention.
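To make the relocation and the protobuf exclusion concrete, here is a minimal Java model of how class names would be rewritten, assuming simple prefix relocation of the `com.google`, `io.grpc`, and `io.opencensus` packages named in this PR plus an exclusion for `com.google.protobuf`. It is a simplified sketch, not the maven-shade-plugin implementation, and `SHADED_PREFIX` is a hypothetical placeholder; the actual shaded pattern in the aggregator POMs is not quoted in this PR text.

```java
import java.util.List;

/**
 * Simplified, hypothetical model of the shade relocation rules discussed in this PR.
 * SHADED_PREFIX is an assumed placeholder, not the value used in the aggregator POMs.
 */
final class RelocationSketch {
    static final String SHADED_PREFIX = "example.shaded."; // assumption: real prefix not shown

    // Package prefixes relocated by this PR, and the protobuf exclusion from the later commit.
    static final List<String> PATTERNS = List.of("com.google", "io.grpc", "io.opencensus");
    static final List<String> EXCLUDES = List.of("com.google.protobuf");

    /** True if className lives in pkg or one of its sub-packages. */
    static boolean inPackage(String className, String pkg) {
        return className.startsWith(pkg + ".");
    }

    /** Returns the class name as it would appear inside the shaded jar under this model. */
    static String relocate(String className) {
        for (String excluded : EXCLUDES) {
            if (inPackage(className, excluded)) {
                return className; // kept as-is so the type matches what Spark/ORC load
            }
        }
        for (String pattern : PATTERNS) {
            if (inPackage(className, pattern)) {
                return SHADED_PREFIX + className; // moved out of Dataproc's namespace
            }
        }
        return className; // not covered by any relocation rule
    }

    public static void main(String[] args) {
        String[] samples = {
            "com.google.cloud.storage.Storage",
            "com.google.flatbuffers.FlatBufferBuilder", // covered by the broad com.google rule
            "io.grpc.ManagedChannel",
            "com.google.protobuf.Message",
            "org.apache.spark.SparkEnv"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + relocate(s));
        }
    }
}
```

Under this model, relocated GCS, gRPC, and OpenCensus classes cannot collide with Dataproc's copies, while `com.google.protobuf.Message` keeps its original name, which is presumably what lets protobuf objects created by Spark or ORC flow into plugin code unchanged.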
Sequence Diagram
```mermaid
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant HFI as HadoopFileIO
    participant RIF as RapidsInputFiles
    participant PIO as PerfIO (executor state)
    participant GCS as GCSInputFile
    HFI->>RIF: isGCSPerfEnabled()
    RIF->>RIF: SparkEnv.get() null check
    alt SparkEnv is null
        RIF-->>HFI: false (pre-init path)
    else SparkEnv live
        RIF->>PIO: isGCSPerfEnabled()
        PIO-->>RIF: true/false (executor-resolved state)
        RIF-->>HFI: result
    end
    alt GCS PerfIO enabled
        HFI->>GCS: GCSInputFile.create(path, conf)
    else fallback
        HFI->>HFI: HadoopInputFile.create(path, conf)
    end
```
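The control flow in the sequence diagram can be sketched in plain Java as follows. `sparkEnvLookup` and `perfIoState` are hypothetical stand-ins for `SparkEnv.get()` and `PerfIO.isGCSPerfEnabled()`, and the returned strings only label which reader (`GCSInputFile` or `HadoopInputFile`) would be created; the real plugin classes and their signatures are not reproduced here.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the gating in the diagram. The suppliers stand in for
 * SparkEnv.get() and PerfIO.isGCSPerfEnabled(); they are not the real plugin classes.
 */
final class GcsPerfGateSketch {
    private final Supplier<Object> sparkEnvLookup; // stand-in for SparkEnv.get()
    private final BooleanSupplier perfIoState;     // stand-in for PerfIO.isGCSPerfEnabled()

    GcsPerfGateSketch(Supplier<Object> sparkEnvLookup, BooleanSupplier perfIoState) {
        this.sparkEnvLookup = sparkEnvLookup;
        this.perfIoState = perfIoState;
    }

    /** Mirrors RapidsInputFiles.isGCSPerfEnabled(): false until SparkEnv exists. */
    boolean isGCSPerfEnabled() {
        if (sparkEnvLookup.get() == null) {
            return false; // pre-init path: PerfIO state is never consulted
        }
        return perfIoState.getAsBoolean(); // executor-resolved PerfIO state
    }

    /** Mirrors the HadoopFileIO branch: choose the GCS reader or fall back. */
    String chooseInputFile(String path) {
        return isGCSPerfEnabled()
                ? "GCSInputFile:" + path
                : "HadoopInputFile:" + path;
    }

    public static void main(String[] args) {
        GcsPerfGateSketch preInit = new GcsPerfGateSketch(() -> null, () -> true);
        GcsPerfGateSketch live = new GcsPerfGateSketch(Object::new, () -> true);
        System.out.println(preInit.chooseInputFile("gs://bucket/a.parquet")); // HadoopInputFile:gs://bucket/a.parquet
        System.out.println(live.chooseInputFile("gs://bucket/a.parquet"));    // GCSInputFile:gs://bucket/a.parquet
    }
}
```

Checking for a live `SparkEnv` first means callers that run before Spark has initialized fall back to `HadoopInputFile`; in this sketch the PerfIO supplier is not invoked at all on that path.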
Reviews (2): Last reviewed commit: "Exclude protobuf from relocation"
Pull request overview
This PR updates the shaded “aggregator” build to better isolate Google/GCS client libraries from Dataproc-provided dependencies, and adjusts GCS PerfIO enablement so it follows executor-side PerfIO initialization state rather than requiring an explicit SparkConf flag.
Changes:
- Expand aggregator shading relocations to relocate `com.google`, `io.grpc`, and `io.opencensus` (Scala 2.12 + 2.13).
- Keep existing FlatBuffers relocation behavior implicitly via the broader `com.google` relocation.
- Update `RapidsInputFiles.isGCSPerfEnabled()` to delegate to `PerfIO.isGCSPerfEnabled()`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sql-plugin/src/main/java/com/nvidia/spark/rapids/fileio/RapidsInputFiles.java | Switches GCS PerfIO gating from a SparkConf boolean to PerfIO’s resolved executor-side state. |
| aggregator/pom.xml | Updates shading relocations to isolate bundled Google client stack from Dataproc classpath. |
| scala2.13/aggregator/pom.xml | Same shading relocation changes for the Scala 2.13 aggregator build. |
Description
Relocate the bundled Google client stack in the aggregator shade configuration to isolate the GCS SDK from Dataproc-provided Google/GCS connector libraries.
Enable GCS PerfIO automatically by using the resolved executor-side PerfIO state instead of requiring spark.rapids.perfio.gcs.enabled=true to be explicitly set in SparkConf.
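The behavioral difference described above can be illustrated with a small before/after sketch. A `Map` stands in for `SparkConf` and a `BooleanSupplier` stands in for the executor-resolved `PerfIO.isGCSPerfEnabled()` state; how PerfIO arrives at that state is not described here and is treated as opaque. Treating an unset key as `false` in the old gate is an assumption based on the description saying the key had to be set explicitly.

```java
import java.util.Map;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical before/after contrast for the gating change. Map stands in for SparkConf,
 * BooleanSupplier for PerfIO's executor-resolved state; neither is the real plugin API.
 */
final class GcsGateComparison {
    static final String GCS_KEY = "spark.rapids.perfio.gcs.enabled";

    /** Before: GCS PerfIO only when the user explicitly set the key to true (unset means false). */
    static boolean oldGate(Map<String, String> sparkConf) {
        return Boolean.parseBoolean(sparkConf.getOrDefault(GCS_KEY, "false"));
    }

    /** After: follow whatever executor-side PerfIO initialization resolved. */
    static boolean newGate(BooleanSupplier perfIoResolvedState) {
        return perfIoResolvedState.getAsBoolean();
    }

    public static void main(String[] args) {
        Map<String, String> emptyConf = Map.of();        // user never set the key
        BooleanSupplier executorEnabledIt = () -> true;  // assume PerfIO init enabled GCS
        System.out.println("old gate: " + oldGate(emptyConf));
        System.out.println("new gate: " + newGate(executorEnabledIt));
    }
}
```

With the old gate, a job that never set `spark.rapids.perfio.gcs.enabled` stayed on the Hadoop path; with the new gate, the same job follows whatever PerfIO resolved on the executor.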
Checklists
Documentation
Testing
(Please provide the names of the existing tests in the PR description.)
Performance