@prosenjitdhole
Collaborator

This is an initial version of GH action for gemm-sweep analysis

Note: The YML uses requirements.txt, but I have added this file in this review as it will cause conflict.

TODO:

  • Create rccl run action (rccl-related changes are still on the way: Intergrate rccl changes into tracelense single config #41)

  • Find out a way to dump comparing html

    • Need to figure out a way to compare the baseline run
    • Create a single config tracelens run workflow
  • Need a runner machine and update the yml accordingly

@prosenjitdhole prosenjitdhole marked this pull request as ready for review December 15, 2025 14:48
channels:
  description: 'Comma-separated NCCL channel values'
  required: true
  default: '28,42,56,70'
Collaborator

Reduce it to 28 and 56 only.

gemm-sweep:
  name: Run GEMM Sweep Profiling
  runs-on: [self-hosted, gpu, rocm]
  timeout-minutes: 180
Collaborator

timeout-minutes seems high.

Collaborator Author

Please suggest an appropriate limit.

docker exec ${{ env.CONTAINER_NAME }} bash -c "
bash scripts/gemm_analysis/run_train_various_channels.sh \
--output-dir ${{ steps.setup.outputs.sweep_dir }} \
--channels ${{ github.event.inputs.channels || '28,42,56,70' }} \
Collaborator

Reduce the number of channels to 28 and 56.

- name: Extract top GEMM kernels
  run: |
    # Parse channels and threads into space-separated format
    CHANNELS=$(echo "${{ github.event.inputs.channels || '28,42,56,70' }}" | tr ',' ' ')
Collaborator

Reduce the channels.
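
For reference, a minimal shell sketch of what the parsing in this step would do with the reduced set the reviewer asks for. `INPUT_CHANNELS` and `CHANNELS_CSV` are hypothetical names: the first stands in for `github.event.inputs.channels`, and the `28,56` fallback reflects the review suggestion, not necessarily what was merged.

```shell
#!/usr/bin/env bash
# INPUT_CHANNELS is a hypothetical stand-in for github.event.inputs.channels;
# it is empty when the workflow is not started with an explicit input.
INPUT_CHANNELS="${INPUT_CHANNELS:-}"

# Fall back to the reduced default (the reviewer's suggestion) when no input is given.
CHANNELS_CSV="${INPUT_CHANNELS:-28,56}"

# Convert the comma-separated list into a space-separated one, as the workflow step does.
CHANNELS=$(echo "$CHANNELS_CSV" | tr ',' ' ')

# Iterate over the channel values; word splitting on $CHANNELS is intentional here.
for ch in $CHANNELS; do
  echo "channel: $ch"
done
```

Keeping the default in a single variable like this would also avoid repeating the literal list in the input definition, the sweep step, and the extraction step, which is where the three review comments above landed.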

@prosenjitdhole prosenjitdhole merged commit f6f573a into main Dec 15, 2025
5 checks passed
@prosenjitdhole prosenjitdhole deleted the prosenj_aorta_gh_action branch December 15, 2025 17:10
3 participants