Add skipped-path coverage for skewed BHJ private optimizer [databricks] #15153

Draft
wjxiz1992 wants to merge 1 commit into NVIDIA:main from wjxiz1992:fix/15136-db-fallback-test

@wjxiz1992

Collaborator

JaCoCo sql-plugin line coverage: not measurable locally (Python private-optimizer IT/docs only; full run requires private optimizer Databricks runtime)

Contributes to #15136.

Summary

This follows up on the Databricks-wide skip for the skewed BHJ private-optimizer marker test by adding explicit skipped-path coverage:

  • keep the existing positive marker test for Apache/streamed-side AQE shapes
  • add assert_rule_skipped for runtime-specific no-op paths where the rule should not produce its marker
  • add a Databricks-only skewed BHJ test that verifies the "coalesced and skewed" marker is absent from the ON physical plan while OFF-CPU and ON-GPU results still match
  • update the private optimizer README so future rule tests do not weaken positive marker checks when a runtime-specific skip is expected
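
As a rough illustration of the skipped-path check described in the list above, the sketch below shows one way such a helper could be shaped. The signature, the `run_query` callable, and the `RunQuery` alias are assumptions made for this example; the actual `assert_rule_skipped` in `private_optimizer_common.py` may look quite different.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical shape: run_query(enabled) returns the physical plan text and
# the collected rows for the query with the rule turned OFF or ON.
RunQuery = Callable[[bool], Tuple[str, Sequence[tuple]]]


def assert_rule_skipped(run_query: RunQuery, marker: str) -> None:
    """Assert the rule was a no-op: no marker in the ON plan, same results."""
    _off_plan, off_rows = run_query(False)
    on_plan, on_rows = run_query(True)

    # Skipped path: the rule must not leave its marker in the ON plan.
    assert marker not in on_plan, (
        f"expected rule to be skipped, but found marker {marker!r} in ON plan"
    )
    # Results must still agree; compare order-insensitively.
    assert sorted(on_rows) == sorted(off_rows), "ON and OFF results differ"
```

Keeping the marker-absence check and the result comparison in one helper means a skipped-path test cannot pass on plan shape alone while the results silently diverge.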

Rationale

The skewed BHJ marker test proves the rule fires by requiring the "coalesced and skewed" marker in the ON physical plan. That is the wrong expectation for Databricks executor-broadcast AQE shapes, where the materialized shuffle can appear on the BHJ build side and the streamed-side skew marker is not stable.

The new Databricks-only test records the intended fallback behavior separately instead of making the positive marker test Databricks-aware.
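
The separation argued for above can be sketched as two independent tests gated by runtime, rather than one test with a branch inside it. This is only an illustration: it uses the standard library's `unittest` as a stand-in for the pytest-based integration suite, and `fake_on_plan`, `make_suite`, `run`, and the plan strings are invented for the example rather than taken from the PR.

```python
import unittest

MARKER = "coalesced and skewed"


def fake_on_plan(is_databricks: bool) -> str:
    """Stand-in for the ON physical plan text (strings invented for illustration)."""
    if is_databricks:
        return "BroadcastHashJoin\n  AQEShuffleRead coalesced"
    return "BroadcastHashJoin\n  AQEShuffleRead coalesced and skewed"


def make_suite(is_databricks: bool) -> unittest.TestSuite:
    class SkewedBhjTests(unittest.TestCase):
        # Positive test stays strict; it is skipped, not weakened, on Databricks.
        @unittest.skipIf(is_databricks, "skew marker is not stable on Databricks")
        def test_marker_present(self):
            self.assertIn(MARKER, fake_on_plan(is_databricks))

        # Fallback behavior is recorded in its own Databricks-only test.
        @unittest.skipUnless(is_databricks, "Databricks-only fallback check")
        def test_marker_absent(self):
            self.assertNotIn(MARKER, fake_on_plan(is_databricks))

    return unittest.defaultTestLoader.loadTestsFromTestCase(SkewedBhjTests)


def run(is_databricks: bool) -> unittest.TestResult:
    result = unittest.TestResult()
    make_suite(is_databricks).run(result)
    return result
```

With this layout, each runtime executes exactly one of the two tests and skips the other, so the positive assertion never has to accept a missing marker.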

Testing

  • git diff --check
  • python3 -m py_compile integration_tests/src/main/python/private_optimizer_common.py integration_tests/src/main/python/private_optimizer_skewed_bhj_join_test.py

Not run locally: full private_optimizer PySpark IT requires a RAPIDS/private optimizer runtime. A direct local pytest collection attempt initialized Spark and failed before collection because the local Spark classpath did not include com.nvidia.spark.SQLPlugin.

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992 changed the title from "[databricks] Add skipped-path coverage for skewed BHJ private optimizer" to "Add skipped-path coverage for skewed BHJ private optimizer [databricks]" on Jun 26, 2026
@wjxiz1992 requested a review from amahussein on June 26, 2026 09:34