Skip to content

Fuse array higher-order functions in Project#15061

Open
thirtiseven wants to merge 11 commits into
NVIDIA:mainfrom
thirtiseven:issue-14711-fuse-array-hof
Open

Fuse array higher-order functions in Project#15061
thirtiseven wants to merge 11 commits into
NVIDIA:mainfrom
thirtiseven:issue-14711-fuse-array-hof

Conversation

@thirtiseven

@thirtiseven thirtiseven commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Fixes #14711.

Description

This PR fuses compatible array higher-order functions that appear as top-level Project outputs, so multiple expressions over the same array can share the explode and intermediate projection work.

The fused path currently covers transform, filter, exists, and supported aggregate expressions. It groups HOFs by the same array argument and lambda arity, evaluates the shared exploded element batch once, then evaluates each lambda against the shared batch and reconstructs each output independently. The implementation keeps conservative guards for deterministic expressions, side-effect checks, and Project output ordering.

Performance

  • HOF number: 8~12 per case
  • rows: 2000000
  • array length: 96
  • partitions: 16
  • warm runs averaged: 3
case CPU avg warm e2e ms GPU avg warm e2e ms GPU project op before ms GPU project op after ms project op gain
many_unary_transforms 43513.73 758.30 4534.35 3423.26 24.50%
heterogeneous_elementwise 20699.53 560.03 4063.81 3396.65 16.42%
aggregate_heavy_mix 26127.60 582.01 6003.95 4595.99 23.45%
indexed_transforms 44172.19 652.53 4339.08 3208.59 26.05%

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven

Copy link
Copy Markdown
Collaborator Author

@greptile review

@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces GpuArrayHofFusion, a Project-level optimization that detects compatible array higher-order functions (transform, filter, exists, aggregate) over the same array argument and lambda arity, explodes the array once, and shares the resulting element batch across all HOFs in the group instead of exploding independently for each one.

  • Core fusion logic (GpuArrayHofFusion) groups HOFs by canShareExplode (same argument + same lambda arity), using determinism/side-effect guards and a barrier flush when a non-reorderable expression appears; resource management uses the established closeOnExcept/withResource patterns throughout.
  • Refactoring extracts makeExplodedElementBatch into GpuArrayTransformBase.companion and adds transformElementResults/aggregateElementResults helpers so the fused path can reuse the post-lambda reconstruction logic without duplicating it.
  • Tests add two Python integration tests for GPU/CPU equality and a Scala unit suite for group-detection logic; performance numbers show 16–26% project-op speedup across representative workloads.

Confidence Score: 5/5

Safe to merge; the fusion is transparent to correctness (determinism and side-effect guards are conservative), resource lifecycle follows established ARM patterns, and GPU/CPU equality is verified by integration tests.

The fusion logic is well-guarded: canFuse checks determinism and side-effect freedom, canShareExplode enforces same argument and same lambda arity, and flushGroups correctly barriers around non-reorderable expressions. Resource management uses the project's standard closeOnExcept/withResource overloads (all validated against Arm.scala). The makeTransformLambdaBatch index remapping correctly preserves each transform's intermediate-column ordering relative to the union set. The only missing piece is a runtime escape hatch if an unforeseen edge case surfaces in production.

sql-plugin/src/main/scala/com/nvidia/spark/rapids/higherOrderFunctions.scala — the new GpuArrayHofFusion object is the entire change surface; give particular attention to the unionIntermediate column-index remapping in makeTransformLambdaBatch and the shared-arg lifetime in evaluateFusedGroup.

Important Files Changed

Filename Overview
sql-plugin/src/main/scala/com/nvidia/spark/rapids/higherOrderFunctions.scala Adds GpuArrayHofFusion object and refactors GpuArrayTransformBase/GpuArrayAggregate to expose element-result helpers; fusion logic, resource management, and grouping semantics appear correct but the optimization ships without an internal config toggle
sql-plugin/src/main/scala/com/nvidia/spark/rapids/basicPhysicalOperators.scala Minimal two-line change: routes GpuProjectExec.project through GpuArrayHofFusion before the existing fallback; cachedNullVectors clear/try/finally semantics are preserved
tests/src/test/scala/com/nvidia/spark/rapids/GpuArrayHofFusionSuite.scala Unit tests for group-detection logic; covers barrier splitting and arity/argument mismatch but uses synthetic 1-arg aggregate lambdas that don't reflect real 2-arg SQL aggregate merger shape
integration_tests/src/main/python/higher_order_functions_test.py Adds two integration tests verifying GPU/CPU equality for mixed-HOF projects with aggregate and indexed lambdas; uses assert_gpu_and_cpu_are_equal_collect correctly
integration_tests/src/main/python/array_test.py Adds a heterogeneous elementwise HOF mixed-project integration test with outer-column references; GPU/CPU comparison is properly asserted

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[GpuProjectExec.project] --> B{GpuArrayHofFusion.project}
    B --> C{findFusedGroups}
    C --> D[Scan boundExprs extrachof + canFuse]
    D --> E{canShareExplode?}
    E -->|yes| F[Add to existing group]
    E -->|no| G[Start new group]
    D --> H{canReorderExpression?}
    H -->|no| I[flushGroups - barrier]
    C --> J{Any group >= 2 HOFs?}
    J -->|no| K[Return None - fallback]
    J -->|yes| L[projectWithFusedGroups]
    L --> M[evaluateFusedGroup per group]
    M --> N[eval arg once]
    N --> O[makeExplodedElementBatch with unionIntermediate]
    O --> P[shared exploded batch]
    P --> Q[foreach transform - makeTransformLambdaBatch]
    Q --> R[transform.function.columnarEval]
    R --> S[consumeElementResults]
    S --> T[fill outputColumns at HOF index]
    L --> U[non-HOF exprs: eval normally]
    T --> V[ColumnarBatch output]
    U --> V
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[GpuProjectExec.project] --> B{GpuArrayHofFusion.project}
    B --> C{findFusedGroups}
    C --> D[Scan boundExprs extrachof + canFuse]
    D --> E{canShareExplode?}
    E -->|yes| F[Add to existing group]
    E -->|no| G[Start new group]
    D --> H{canReorderExpression?}
    H -->|no| I[flushGroups - barrier]
    C --> J{Any group >= 2 HOFs?}
    J -->|no| K[Return None - fallback]
    J -->|yes| L[projectWithFusedGroups]
    L --> M[evaluateFusedGroup per group]
    M --> N[eval arg once]
    N --> O[makeExplodedElementBatch with unionIntermediate]
    O --> P[shared exploded batch]
    P --> Q[foreach transform - makeTransformLambdaBatch]
    Q --> R[transform.function.columnarEval]
    R --> S[consumeElementResults]
    S --> T[fill outputColumns at HOF index]
    L --> U[non-HOF exprs: eval normally]
    T --> V[ColumnarBatch output]
    U --> V
Loading

Reviews (6): Last reviewed commit: "address greptile comments" | Re-trigger Greptile

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a fusion path in GPU Project evaluation to share the explode + intermediate projection work across multiple compatible array higher-order functions (HOFs) that read the same array input, improving performance while preserving ordering and correctness via conservative guards.

Changes:

  • Introduces GpuArrayTransformFusion to detect fuseable top-level Project outputs (transform/filter/exists and supported array_aggregate) and evaluate them using a shared exploded element batch.
  • Refactors element-wise and aggregate HOF implementations to expose reusable “consume element results” helpers for the fused execution path.
  • Adds integration tests covering mixed Projects that include multiple array HOFs (including aggregate).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
sql-plugin/src/main/scala/com/nvidia/spark/rapids/higherOrderFunctions.scala Adds fusion planner/executor logic for array HOFs in Projects and refactors HOF result consumption to enable sharing the explode step.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/basicPhysicalOperators.scala Hooks the fusion attempt into GpuProjectExec.project, falling back to the existing per-expression evaluation when no fusion applies.
integration_tests/src/main/python/higher_order_functions_test.py Adds a mixed Project integration test combining transform/filter/exists with aggregate.
integration_tests/src/main/python/array_test.py Adds a heterogeneous mixed Project test combining multiple element-wise array HOF outputs.

Comment thread integration_tests/src/main/python/higher_order_functions_test.py
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven marked this pull request as ready for review June 24, 2026 07:17
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven self-assigned this Jun 26, 2026
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] Fuse multiple array higher-order functions reading the same column to share the explode step

3 participants