@Aminsed Aminsed commented Dec 7, 2025

Description

closes #3711

BlockAdjacentDifference::_TempStorage previously allocated two separate arrays (first_items and last_items), each of size BLOCK_THREADS. However, an analysis of all member functions shows that each function uses only one of the two:

| Function | Uses first_items | Uses last_items |
|---|---|---|
| SubtractLeft | No | Yes |
| SubtractLeft (with predecessor) | No | Yes |
| SubtractLeftPartialTile | No | Yes |
| SubtractLeftPartialTile (with predecessor) | No | Yes |
| SubtractRight | Yes | No |
| SubtractRight (with successor) | Yes | No |
| SubtractRightPartialTile | Yes | No |

The arrays are never used together. This PR unifies them into a single items array, reducing shared memory usage by 50%.
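
As a rough illustration of the layout change described above, the following host-only sketch compares the two storage shapes. The struct names `TempStorageBefore` and `TempStorageAfter` are hypothetical stand-ins rather than the actual CUB definitions, and the sketch assumes nothing else lives in the storage struct.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the old layout: one array per operation family.
// Not copied from the CUB source.
template <typename T, int BlockThreads>
struct TempStorageBefore
{
  T first_items[BlockThreads]; // touched only by the SubtractRight* functions
  T last_items[BlockThreads];  // touched only by the SubtractLeft* functions
};

// Hypothetical stand-in for the unified layout: both families share one array,
// which is safe as long as no single function needs both at the same time.
template <typename T, int BlockThreads>
struct TempStorageAfter
{
  T items[BlockThreads];
};

// The unified layout is exactly half the size for a 4-byte element type.
static_assert(sizeof(TempStorageBefore<std::int32_t, 256>)
                == 2 * sizeof(TempStorageAfter<std::int32_t, 256>),
              "unified storage should be half the size");
```

The `static_assert` only checks sizes; it says nothing about whether a caller may invoke a SubtractLeft* and a SubtractRight* operation back to back on the same storage without a synchronization in between, which remains the caller's responsibility.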

Impact

For a 256-thread block with 4-byte elements:

  • Before: 256 × 2 × 4 = 2,048 bytes
  • After: 256 × 1 × 4 = 1,024 bytes

This reduction in shared memory can improve occupancy and performance for kernels using BlockAdjacentDifference.
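
The arithmetic above can be generalized with a small sketch. The helper names `temp_storage_bytes` and `max_blocks_by_smem`, and the 48 KiB budget used in the usage note, are assumptions for illustration only; real occupancy also depends on registers, thread counts, and any other shared memory the kernel uses.

```cpp
#include <cstddef>

// Bytes needed for `arrays` arrays of `block_threads` items, `elem_bytes` each.
// Hypothetical helper, not part of CUB.
constexpr std::size_t temp_storage_bytes(std::size_t block_threads,
                                         std::size_t arrays,
                                         std::size_t elem_bytes)
{
  return block_threads * arrays * elem_bytes;
}

// Upper bound on resident blocks imposed by shared memory alone, assuming the
// block uses no shared memory besides this temporary storage.
constexpr std::size_t max_blocks_by_smem(std::size_t smem_budget_bytes,
                                         std::size_t bytes_per_block)
{
  return smem_budget_bytes / bytes_per_block;
}

// The figures from the description: 256 threads, 4-byte elements.
static_assert(temp_storage_bytes(256, 2, 4) == 2048, "before: two arrays");
static_assert(temp_storage_bytes(256, 1, 4) == 1024, "after: one array");
```

For example, under a hypothetical 48 KiB shared memory budget, `max_blocks_by_smem(48 * 1024, 2048)` gives 24 and `max_blocks_by_smem(48 * 1024, 1024)` gives 48, so the shared memory ceiling doubles even though other limits may still dominate in practice.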

@Aminsed Aminsed requested a review from a team as a code owner December 7, 2025 02:58
@Aminsed Aminsed requested a review from elstehle December 7, 2025 02:58
@github-project-automation github-project-automation bot moved this to Todo in CCCL Dec 7, 2025
copy-pr-bot bot commented Dec 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Dec 7, 2025
@gonidelis gonidelis self-requested a review December 11, 2025 16:56
Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

[BUG]: BlockAdjacentDifference requests 2x the shared memory actually needed
