
Fix ptxas compilation on sm103 for triton kernels #2539

Merged
tdophung merged 3 commits into NVIDIA:main from tdophung:triton_sm103
Dec 22, 2025

Conversation

@tdophung

@tdophung tdophung commented Dec 22, 2025

Collaborator

Compilation failures happen because the system does not know where to find ptxas on *B300 systems. To resolve this, the user just needs to point Triton to the right location. This change does that automatically in QA cycles and CI (if CI were to run on GB110).

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Add the TRITON_PTXAS_PATH env var to the JAX unittest scripts
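For reference, the line this PR adds (quoted in the Greptile summary) can be sketched in context as below. The executable check and warning after the export are an illustrative addition, not part of the merged change.

```shell
#!/bin/bash
# Point Triton at the CUDA toolkit's ptxas so that kernels compiled for
# sm103 (GB300) do not fail because ptxas cannot be located.
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas

# Illustrative sanity check (not in the PR): warn early, without failing,
# if nothing executable exists at that path on this machine.
if [ ! -x "$TRITON_PTXAS_PATH" ]; then
    echo "warning: $TRITON_PTXAS_PATH not found or not executable" >&2
fi
```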

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

…tion errors

Signed-off-by: tdophung <tdophung@nvidia.com>
@greptile-apps

greptile-apps Bot commented Dec 22, 2025

Contributor

Greptile Summary

Sets TRITON_PTXAS_PATH environment variable in all JAX test scripts to fix Triton kernel compilation failures on GB300 (sm103) systems where the ptxas assembler location cannot be automatically detected.

  • Adds identical export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas line to 5 JAX test scripts (L0, L1, L2 levels)
  • Completes fix started in commit e855f9d by covering remaining distributed test scripts
  • Hardcoded path assumes standard CUDA installation location

Confidence Score: 4/5

  • Safe to merge with minor portability consideration
  • The fix correctly addresses the compilation issue by setting the required environment variable. All changes are identical, minimal, and follow the pattern from the previous commit. Score is 4 instead of 5 due to the hardcoded path assumption, which may not work on systems with non-standard CUDA installations, though this is likely acceptable for the target QA/CI environment.
  • No files require special attention - all changes are identical and straightforward

Important Files Changed

Filename | Overview
--- | ---
qa/L0_jax_distributed_unittest/test.sh | Added TRITON_PTXAS_PATH export to fix Triton kernel compilation on GB300 systems
qa/L0_jax_unittest/test.sh | Added TRITON_PTXAS_PATH export to fix Triton kernel compilation on GB300 systems
qa/L1_jax_distributed_unittest/test.sh | Added TRITON_PTXAS_PATH export to fix Triton kernel compilation on GB300 systems
qa/L2_jax_distributed_unittest/test.sh | Added TRITON_PTXAS_PATH export to fix Triton kernel compilation on GB300 systems
qa/L2_jax_unittest/test.sh | Added TRITON_PTXAS_PATH export to fix Triton kernel compilation on GB300 systems

Sequence Diagram

sequenceDiagram
    participant TestScript as JAX Test Script
    participant Shell as Shell Environment
    participant Triton as Triton Compiler
    participant PTXAS as CUDA ptxas
    
    TestScript->>Shell: export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
    Note over Shell: Environment variable set
    TestScript->>TestScript: Run JAX tests with Triton kernels
    TestScript->>Triton: Compile Triton kernel for sm103/GB300
    Triton->>Shell: Look for ptxas location
    Shell->>Triton: Return TRITON_PTXAS_PATH value
    Triton->>PTXAS: Execute /usr/local/cuda/bin/ptxas
    PTXAS->>Triton: Compilation successful
    Triton->>TestScript: Kernel ready
    TestScript->>TestScript: Continue test execution

@greptile-apps greptile-apps Bot left a comment

Contributor


Additional Comments (1)

  1. qa/L0_jax_unittest/test.sh, line 4 (link)

    style: check that other JAX test scripts running triton kernels also set this environment variable - qa/L0_jax_distributed_unittest/test.sh, qa/L1_jax_distributed_unittest/test.sh, and qa/L2_jax_distributed_unittest/test.sh may also need this fix if they run on GB300 systems

2 files reviewed, 1 comment


…ded to multi gpus

Signed-off-by: tdophung <tdophung@nvidia.com>
@tdophung

Collaborator Author

/te_ci jax

@greptile-apps

greptile-apps Bot commented Dec 22, 2025

Contributor

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@jberchtold-nvidia

jberchtold-nvidia commented Dec 22, 2025

Collaborator

@tdophung SR: can you also add this to L1_jax_distributed_unittest? We don't currently run Triton tests in L1 since L1 only has a multi-GPU config, but we will likely start when we add custom partitioning support to these triton wrappers.

@KshitijLakhani

Collaborator

> @tdophung SR: can you also add this to L1_jax_distributed_unittest? We don't currently run Triton tests in L1 since L1 only has a multi-GPU config, but we will likely start when we add custom partitioning support to these triton wrappers.

@tdophung when you do this, please add the time your additional tests take to the PR description, measured with NVTE_JAX_TEST_TIMING=1, so that we can track how much time we are adding to our tests.
We'd also like this reported for any tests we add to TE JAX in the future.
Thanks!

@tdophung

Collaborator Author

I added the flag to the L1 files. I will also report the test time increase once the Triton tests are enabled for L1.

Signed-off-by: tdophung <tdophung@nvidia.com>

@greptile-apps greptile-apps Bot left a comment

Contributor


Additional Comments (1)

  1. qa/L0_jax_unittest/test.sh, line 4 (link)

    style: hardcoded path assumes CUDA is installed at /usr/local/cuda
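A minimal sketch of one way to address this comment, assuming the environment exposes the toolkit location through CUDA_HOME (a common convention, not something these scripts are confirmed to rely on). The helper name set_triton_ptxas_path is hypothetical; the merged PR keeps the hardcoded path.

```shell
# Hypothetical helper (not in the PR); the name is illustrative only.
set_triton_ptxas_path() {
    # Respect a value that is already set (user override or CI config).
    if [ -z "${TRITON_PTXAS_PATH:-}" ]; then
        # Otherwise derive it from CUDA_HOME, defaulting to the standard
        # /usr/local/cuda location used by the merged change.
        export TRITON_PTXAS_PATH="${CUDA_HOME:-/usr/local/cuda}/bin/ptxas"
    fi
}

set_triton_ptxas_path
```

With CUDA_HOME unset and no prior override, this yields /usr/local/cuda/bin/ptxas, the same value as the merged line; an explicitly set TRITON_PTXAS_PATH is left untouched.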

5 files reviewed, 1 comment


@tdophung tdophung merged commit 97a09c2 into NVIDIA:main Dec 22, 2025
13 checks passed