
Add regression test for assert_equal row-count mismatch#15109

Open
WilliamK112 wants to merge 1 commit into NVIDIA:main from WilliamK112:spark-11305-regression

Conversation

WilliamK112 commented Jun 18, 2026

Fixes #11305.

Description

This PR adds regression coverage for assert_equal when tests compare scalar row counts instead of collected rows.

The new tests exercise row counts produced through the integration test Spark session helpers:

  • matching CPU/GPU row counts still pass through assert_equal
  • mismatched row counts preserve the original AssertionError instead of masking it with a scalar diff-rendering TypeError
  • scalar row-count mismatches still emit a unified diff containing the CPU and GPU counts

There is no production-code change.

Validation:

  • git diff --check
  • python -m py_compile integration_tests/src/main/python/asserts_regression_test.py
  • TESTS="asserts_regression_test.py" TEST_PARALLEL=1 ./integration_tests/run_pyspark_from_build.sh was attempted locally, but this checkout cannot execute the integration harness because SPARK_HOME is not set.

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

ttnghia commented Jun 18, 2026
Collaborator

Please rebase your branch first.

amahussein left a comment
Collaborator

This PR targets the wrong branch. The target should be the main branch.

WilliamK112 changed the base branch from branch-25.08 to main June 23, 2026 20:29
greptile-apps Bot commented Jun 24, 2026
Contributor

Greptile Summary

This PR adds a new Python regression test file that covers the assert_equal row-count mismatch handling from issue #11305, specifically testing the scalar integer diff output path.

  • test_assert_equal_row_count_match verifies that equal row counts produced via with_cpu_session and with_gpu_session pass without error.
  • test_assert_equal_row_count_mismatch_raises_assertion_error confirms that mismatched counts raise AssertionError with the expected message and trigger the difflib.unified_diff stdout output (checked via capsys).
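The core check of the mismatch test can be sketched without Spark. Below, `buggy_assert_equal` is a hypothetical reproduction of the failure reported in #11305 (diff rendering that iterates over both inputs), and `check_row_count_mismatch` is a hypothetical helper that captures stdout with `contextlib.redirect_stdout`, standing in for pytest's `capsys`, and reports which exception escaped.

```python
import contextlib
import difflib
import io
import sys


def buggy_assert_equal(cpu, gpu):
    """Hypothetical reproduction of the reported bug, not real project code."""
    try:
        if cpu != gpu:
            raise AssertionError(f"{cpu!r} != {gpu!r}")
    except AssertionError:
        # Iterating over an int raises TypeError here, which replaces
        # (masks) the AssertionError that was being handled.
        sys.stdout.writelines(difflib.unified_diff(
            [f"{x}\n" for x in cpu],
            [f"{x}\n" for x in gpu]))
        raise


def check_row_count_mismatch(assert_fn, cpu_count, gpu_count):
    """Run assert_fn with stdout captured, similar to capsys in pytest.

    Returns (name of the exception that escaped, or None; captured stdout).
    """
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            assert_fn(cpu_count, gpu_count)
    except Exception as exc:
        return type(exc).__name__, captured.getvalue()
    return None, captured.getvalue()
```

Against the buggy version, `check_row_count_mismatch(buggy_assert_equal, 10, 11)` returns `('TypeError', '')`: a `TypeError` escapes and no diff is printed, so a test that expects an `AssertionError` plus diff output would catch the regression.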

Confidence Score: 5/5

Test-only addition that exercises existing assert_equal infrastructure with real Spark sessions; no production code is changed and no new failure modes are introduced.

The single new file adds two focused regression tests for the assert_equal integer comparison path. Both tests drive real CPU and GPU Spark sessions, exercise the known-broken code path from the linked issue, and validate the error message text and diff stdout output. No production code, configuration, or shim logic is touched.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| integration_tests/src/main/python/asserts_regression_test.py | New regression test file for assert_equal row-count mismatch; uses real Spark sessions (with_cpu_session/with_gpu_session) and correctly exercises the int-type branch and diff output in assert_equal. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Test
    participant CPU as with_cpu_session
    participant GPU as with_gpu_session
    participant AE as assert_equal

    Test->>CPU: spark.range(N).count()
    CPU-->>Test: cpu_count (int)
    Test->>GPU: spark.range(N).count()
    GPU-->>Test: gpu_count (int)
    Test->>AE: assert_equal(cpu_count, gpu_count)
    alt counts match
        AE-->>Test: passes silently
    else counts differ
        AE->>AE: _assert_equal raises AssertionError
        AE->>AE: write unified_diff to sys.stdout
        AE-->>Test: re-raises AssertionError
    end
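The first four steps of the diagram, where each session produces a scalar count, can be mimicked with stubs. `fake_cpu_session`, `fake_gpu_session`, `row_counts`, and the `_Fake*` classes are hypothetical stand-ins for with_cpu_session/with_gpu_session and a real SparkSession; they only illustrate that `count()` yields a plain `int`, which is the value that then reaches assert_equal.

```python
class _FakeDataFrame:
    """Hypothetical stand-in for the DataFrame returned by spark.range(n)."""

    def __init__(self, n):
        self._n = n

    def count(self):
        # Like DataFrame.count(), return a plain Python int, not rows.
        return self._n


class _FakeSpark:
    """Hypothetical stand-in for a SparkSession; `skew` simulates a GPU bug."""

    def __init__(self, skew=0):
        self._skew = skew

    def range(self, n):
        return _FakeDataFrame(n + self._skew)


def fake_cpu_session(func):
    """Hypothetical: call func with a CPU-like session and return its result."""
    return func(_FakeSpark())


def fake_gpu_session(func, skew=0):
    """Hypothetical: call func with a GPU-like session and return its result."""
    return func(_FakeSpark(skew))


def row_counts(n, gpu_skew=0):
    """Collect (cpu_count, gpu_count) as in the diagram's first four steps."""
    cpu_count = fake_cpu_session(lambda spark: spark.range(n).count())
    gpu_count = fake_gpu_session(lambda spark: spark.range(n).count(), skew=gpu_skew)
    return cpu_count, gpu_count
```

`row_counts(5)` returns `(5, 5)` (the "counts match" branch) and `row_counts(5, gpu_skew=1)` returns `(5, 6)` (the "counts differ" branch). Both elements are `int`, which is why a diff renderer that iterates over its inputs fails on this path.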

Reviews (2): Last reviewed commit: "Add row-count assert_equal regression te..."

Comment thread integration_tests/src/main/python/asserts_regression_test.py Outdated
Signed-off-by: WilliamK112 <164879897+WilliamK112@users.noreply.github.com>
WilliamK112 force-pushed the spark-11305-regression branch from 0fedab2 to fe9cd74 on June 25, 2026 10:32

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] row count only tests can fail with 'int' object is not iterable

4 participants