Initial workflow file for gemm-sweep-analysis #42
channels:
  description: 'Comma-separated NCCL channel values'
  required: true
  default: '28,42,56,70'
Reduce it to 28 and 56 only.
gemm-sweep:
  name: Run GEMM Sweep Profiling
  runs-on: [self-hosted, gpu, rocm]
  timeout-minutes: 180
timeout-minutes seems high.
Please suggest an appropriate limit.
docker exec ${{ env.CONTAINER_NAME }} bash -c "
  bash scripts/gemm_analysis/run_train_various_channels.sh \
    --output-dir ${{ steps.setup.outputs.sweep_dir }} \
    --channels ${{ github.event.inputs.channels || '28,42,56,70' }} \
Reduce the number of channels to 28 and 56.
- name: Extract top GEMM kernels
  run: |
    # Parse channels and threads into space-separated format
    CHANNELS=$(echo "${{ github.event.inputs.channels || '28,42,56,70' }}" | tr ',' ' ')
Reduce the channels.
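For reference, the comma-to-space conversion in the step above can be tried outside the workflow. This is only a minimal sketch: `parse_channels` is a hypothetical helper name that does not appear in the PR, and the `28,56` value is the reduced set requested in the review comments, not a value the workflow currently uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): turn a comma-separated
# channel list such as "28,56" into a space-separated one, "28 56",
# using the same tr call as the workflow step.
parse_channels() {
  echo "$1" | tr ',' ' '
}

# Assumed reduced default from the review comments.
CHANNELS=$(parse_channels "28,56")

# Iterate the way a sweep script might; word splitting on spaces is intended here.
for ch in $CHANNELS; do
  echo "channel=$ch"
done
```

Keeping the default in one place (for example, a single workflow-level variable) would avoid having to change `'28,42,56,70'` in three separate spots.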
This is an initial version of the GH action for gemm-sweep analysis.
Note: The YML uses requirements.txt, but I have added this file in this review as it will cause a conflict.
TODO:
- Create the rccl run action (rccl-related changes are still on the way: Intergrate rccl changes into tracelense single config #41)
- Find a way to dump the comparison HTML
- Need a runner machine; update the YML accordingly