Fix ptxas compilation on sm103 for triton kernels #2539
…tion errors Signed-off-by: tdophung <tdophung@nvidia.com>
Greptile SummarySets
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant TestScript as JAX Test Script
participant Shell as Shell Environment
participant Triton as Triton Compiler
participant PTXAS as CUDA ptxas
TestScript->>Shell: export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
Note over Shell: Environment variable set
TestScript->>TestScript: Run JAX tests with Triton kernels
TestScript->>Triton: Compile Triton kernel for sm103/GB300
Triton->>Shell: Look for ptxas location
Shell->>Triton: Return TRITON_PTXAS_PATH value
Triton->>PTXAS: Execute /usr/local/cuda/bin/ptxas
PTXAS->>Triton: Compilation successful
Triton->>TestScript: Kernel ready
TestScript->>TestScript: Continue test execution
Additional Comments (1)
qa/L0_jax_unittest/test.sh, line 4
style: check that other JAX test scripts running triton kernels also set this environment variable. qa/L0_jax_distributed_unittest/test.sh, qa/L1_jax_distributed_unittest/test.sh, and qa/L2_jax_distributed_unittest/test.sh may also need this fix if they run on GB300 systems.
2 files reviewed, 1 comment
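One way to act on the review comment above is to scan the candidate scripts for the variable. The following is a minimal sketch, not part of the PR: the helper name `scripts_missing_ptxas_path` is hypothetical, and the script paths are simply the ones named in the comment, assumed to be relative to the repository root.

```shell
#!/bin/sh
# Hypothetical helper: print every given script that exists
# but never mentions TRITON_PTXAS_PATH.
scripts_missing_ptxas_path() {
    for script in "$@"; do
        if [ -f "$script" ] && ! grep -q 'TRITON_PTXAS_PATH' "$script"; then
            printf '%s\n' "$script"
        fi
    done
}

# Paths taken from the review comment; run from the repository root.
scripts_missing_ptxas_path \
    qa/L0_jax_unittest/test.sh \
    qa/L0_jax_distributed_unittest/test.sh \
    qa/L1_jax_distributed_unittest/test.sh \
    qa/L2_jax_distributed_unittest/test.sh
```

Scripts that do not exist are skipped silently, so an empty result only means that no existing script in the list lacks the variable.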
…ded to multi gpus Signed-off-by: tdophung <tdophung@nvidia.com>
/te_ci jax
@tdophung SR: can you also add this to
@tdophung when you do this please add the time it takes for your additional tests in the PR description using
I added the flag to the L1 files. I will also note the test time increase in the PR description once it is enabled for L1.
Signed-off-by: tdophung <tdophung@nvidia.com>
Additional Comments (1)
qa/L0_jax_unittest/test.sh, line 4
style: hardcoded path assumes CUDA is installed at /usr/local/cuda
5 files reviewed, 1 comment
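The hardcoded-path concern could be addressed with a fallback chain rather than a single fixed location. The sketch below is illustrative only and is not what the PR does: `resolve_ptxas` is a hypothetical helper, and treating `CUDA_HOME` as the toolkit root is an assumed convention that may not hold on every system.

```shell
#!/bin/sh
# Hypothetical helper: print the first usable ptxas path, or return 1.
resolve_ptxas() {
    # 1. Respect a value the caller already set, if it is executable.
    if [ -n "${TRITON_PTXAS_PATH:-}" ] && [ -x "$TRITON_PTXAS_PATH" ]; then
        printf '%s\n' "$TRITON_PTXAS_PATH"
        return 0
    fi
    # 2. Assumed convention: CUDA_HOME points at the toolkit root.
    if [ -n "${CUDA_HOME:-}" ] && [ -x "$CUDA_HOME/bin/ptxas" ]; then
        printf '%s\n' "$CUDA_HOME/bin/ptxas"
        return 0
    fi
    # 3. Whatever ptxas is found on PATH.
    if command -v ptxas >/dev/null 2>&1; then
        command -v ptxas
        return 0
    fi
    # 4. The default location used in this PR.
    if [ -x /usr/local/cuda/bin/ptxas ]; then
        printf '%s\n' /usr/local/cuda/bin/ptxas
        return 0
    fi
    return 1
}

if ptxas_path=$(resolve_ptxas); then
    export TRITON_PTXAS_PATH="$ptxas_path"
else
    echo "warning: no ptxas found; TRITON_PTXAS_PATH left unchanged" >&2
fi
```

The order puts an explicit user setting first so that the script never overrides a deliberate choice, and keeps `/usr/local/cuda` only as the last resort.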
Compilation of Triton kernels fails on *B300 systems (sm103) because the system does not know where ptxas is. To resolve this, the user just needs to point Triton to the right ptxas by setting TRITON_PTXAS_PATH. This change does so automatically in QA cycles and in CI (if CI were to run on GB110).
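For reference, the manual workaround described above amounts to one export before the tests run. This is a minimal sketch based on the path shown in the sequence diagram; the sanity check after the export is an illustrative addition and is not claimed to be in the PR.

```shell
#!/bin/sh
# Path taken from the sequence diagram in this PR; adjust if CUDA lives elsewhere.
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas

# Optional sanity check (added for illustration only).
if [ -x "$TRITON_PTXAS_PATH" ]; then
    echo "Triton will use ptxas at $TRITON_PTXAS_PATH"
else
    echo "warning: $TRITON_PTXAS_PATH is not an executable file" >&2
fi
```

The export has to come before the test commands in the same script (or be set in the calling environment), since the variable only affects processes started afterwards.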
Fixes # (issue)
Type of change
Changes
Add the TRITON_PTXAS_PATH env var to the JAX unittest scripts
Checklist: