forked from NVIDIA/TransformerEngine
[CICD] support Metax MACA workflow #48
Merged
Commits (160 total; this view shows changes from 15 commits)
545db96
chore: test CI workflow trigger for project structure exploration
5e1ad3f
add 3rdparty
6e91eec
add 3rdparty for running
qqjxzxq e5b5e17
add workflow_dispatch
qqjxzxq 2915197
build cuda config and struct
qqjxzxq 0197046
[CICD] add metax config and tests
zhoujiamei 84a2cc2
[CICD] change all test cuda & ascend
zhoujiamei b5d2427
fix: adapt torchrun to dynamic GPU count
zhoujiamei 11718bd
fix: force docker to use local image with --pull never
zhoujiamei 5b4c3e2
fix:add image-pull-policy: never
qqjxzxq a270eaf
change tag as localhost
qqjxzxq 73eb570
tag as localhost:5000
zhoujiamei 79ad70a
fix: use local registry image and dynamic gpu count for C500
zhoujiamei af64379
delete gpu host
zhoujiamei b8da525
delete gpu all
zhoujiamei b5d0826
Modify comments
99fd037
add functional test config
ad05fe5
add all tests common for functional tests
27cda22
set path for mcc
0231710
fix utils for metax
c88f447
back to unit test
c0c75d1
add L0 unit test
a77c183
fix L0 unit test
14b445c
L0 pytorch wheel for functional test; gitignore
BrianPei 19eacbb
Check if tests should run
1cac161
Merge branch 'main' of github.com:BrianPei/TransformerEngine-FL
ca47247
force run all tests for debugging
efafa63
set inputs.setup_commands from all_tests_metax.yml
67ef5dd
add setup_commands for all_tests_common
3c9394f
force detect changes
4dc2722
detect .github
502aa7f
fix conda path
746f39f
export conda path
f72d678
add log to debug
888867b
speed up download
74c441d
change timeout minutes
315cbf2
more time needed
cb5c50a
no-build-isolation
82c66c1
export path
f68b94d
export path change
c2b78df
add cmake install and skip cuda
7aac806
debug qa L0 unit_tests on metax
BrianPei 7407cad
fix unit_tests excute script
BrianPei f011fb7
just run L0_pytorch_debug_unittest; add debug step
BrianPei 59df094
fix metax container config
BrianPei 816a044
install git when setpu
94967e7
delete skip cuda
6048655
delete original te & install transformer_engine_metax
bccb413
add home/muxiuser
9969192
--no-deps
1cfbe16
symbolic link for import transformer_engine_torch
4656c90
fix import error
ec2fe35
fix nvdlfw_inspect install
10b7a9c
delete uninstall nvdlfw-inspect in test.sh
1338cf8
test lint
80f4263
test wheel
91a3300
test unittest
47b0db0
check torch.cuda.is_available()
448de78
add dev path
a90bf0b
add dev path2
8188747
map home/muxiuser
f0f7bf4
map home/muxiuser2
de7ed03
--device=/dev/dri:/dev/dri
3e89fa0
run for test
79f508c
try build te & check version
BrianPei 95aab66
Platform-specific build environment variables
BrianPei 490fc7e
run for test
8ceba83
Merge branch 'main' of github.com:BrianPei/TransformerEngine-FL
5fcb8ed
add more docker config
681e16f
delete net host
4fe67b9
delete #
2ccdd53
delete hostname
b2dafd7
add ln -sf
bfd7a32
attach to data
93d84a2
rewrite
0d660a8
run in own container
e95812c
docker run in workflow
4f813c5
docker run in workflow
9e3e890
no result
2acab35
back to origin
7ebc2f8
unit_tests_common,check platform
BrianPei d687515
skip install .whl if it exist
45b49e6
skip install .whl if it exist
eba5af3
skip install .whl
c98f79e
again
1f7583b
again.
7b6858e
cuda platform add source conda
BrianPei b527546
try to install te
7bd355e
try to install te
8e51ac5
service ssh restart
dd0a408
ln -s
db0f785
turn back
6c47459
turn to lint
571b56b
test
df71628
fix cuda platform logic
BrianPei d0d59ae
fix platform env miss
BrianPei c5a062b
fix unit_tests script
BrianPei 410ed94
fix script IndentationError
BrianPei c131403
remove active python step
BrianPei a9f5699
fix excute tests env
BrianPei 13444b6
remove cuda container mount
BrianPei 22bc235
fix cuda container_volumes
BrianPei 900c37f
network host
5fae5cd
add TE_FL_PREFER: vendor; install nvdlfw-inspect
BrianPei 6e22262
a new build for workflow
e0fa383
a new build for workflow
5a635f3
delete sth for root
65e1e84
creat a container for deteting
c8faeb6
no checkout source code
e0e4b6a
git clone
5515026
git install
3e47700
check
23f0603
reset docker option
afdf0d2
check
f22cc18
check
5e00a83
check name
2ff2ac8
check name
d52168a
check name
07f13b9
setuptools and sed git clone nvdlfw-inspect
7bde6d9
http1.1
3778a95
git clone
c9992f8
activate conda
9188ad5
use actions/checkout@v4
a3fa59c
add TE_LIB_PATH
BrianPei d74417d
fix TE_LIB_PATH
BrianPei 0dfeff5
change TE_LIB_PATH
BrianPei 1e77251
add more L0 cases
BrianPei f353b3a
change place
96882f5
setputool
6f2a1ee
turn back
d675bb3
ignore L0_pytorch_whell cases
BrianPei ca4281f
delete /opt/maca:/opt/maca
213c9d7
delete /opt/maca:/opt/maca and host
4f9882e
delete MACA_VISIBLE_DEVICES=all
6a8c4b7
Excluding certain test cases
dd398e3
check pytorch_unittest
BrianPei 9be90cb
ignore some L0 unit cases on metax can't pass now
BrianPei 8646d69
Merge branch 'cuda_dev'
ac77020
Disabled the keep container alive on failure
5e18bce
fix Chinese
d6ed58c
Revert changes to utils.py
b67a6cd
ignore logic on metax
3fe8604
ignore some unittest on metax
0732040
ignore unittest logic on metax
df6799a
delete the volumes about home/muxiuser
aa45e58
change workspace path as
a39f49e
fixed format
12ef96c
remove chinese commet & failure debug step
BrianPei cfdf56a
add coverage report step
BrianPei a93796f
fix license action
BrianPei 0f31313
set cuda env; generate coverage report
BrianPei c56bbf2
fix coverage collection
BrianPei 338414d
install curl for upload coverage; active python
BrianPei e7318f5
check FlagCICD connection before upload coverage
BrianPei 5b02a7a
add pytest
qqjxzxq cb613b2
fix network timeout
qqjxzxq 26557d6
only ignore some tests about Core Unit Tests
qqjxzxq 6cdb75a
ignore some tests
qqjxzxq ad27a07
pass cuda L0
qqjxzxq f601ad9
pass L0 test
qqjxzxq
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| # Huawei Ascend NPU configuration | ||
| image: ascend-infer:ubuntu18.04 | ||
| labels: | ||
| - npu | ||
| - ascend | ||
| docker_options: | | ||
| --device /dev/davinci0 | ||
| --device /dev/davinci1 | ||
| --device /dev/davinci2 | ||
| --device /dev/davinci3 | ||
| --device /dev/davinci_manager | ||
| --device /dev/devmm_svm | ||
| --device /dev/hisi_hdc | ||
| --volume /usr/local/Ascend/driver:/usr/local/Ascend/driver | ||
| --volume /usr/local/Ascend/add-ons:/usr/local/Ascend/add-ons | ||
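
A note on the device list above: the four `/dev/davinciN` entries are hard-coded, so the file has to change whenever a runner exposes a different number of NPUs. One possible way to derive the flags at run time is sketched below. The function name `davinci_device_flags` and the idea of generating `docker_options` dynamically are assumptions for illustration, not part of this PR.

```shell
# Hypothetical helper: print one `--device` flag per numbered davinci node.
# The directory argument exists so the function can be tried without NPUs.
davinci_device_flags() {
  local dev_dir="${1:-/dev}" dev
  for dev in "$dev_dir"/davinci[0-9]*; do
    # An unmatched glob stays literal, so skip paths that do not exist.
    [ -e "$dev" ] || continue
    printf -- '--device %s\n' "$dev"
  done
}

davinci_device_flags /dev
```

Control nodes such as `/dev/davinci_manager` do not match the `davinci[0-9]*` pattern, so they would still need to be listed explicitly.
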
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,60 @@ | ||
| # CUDA Hardware Configuration for TransformerEngine-FL | ||
| # Refactored for BAAI DGX A100 Nodes | ||
| # This file defines environment variables, volumes, and test filters for TE tests. | ||
| | ||
| hardware_name: cuda | ||
| display_name: "NVIDIA CUDA (A100)" | ||
| | ||
| ci_image: harbor.baai.ac.cn/flagscale/cuda12.8.1-torch2.7.1-python3.10-te2.9:20260209 | ||
| | ||
| # Runner labels for self-hosted A100 node | ||
| runner_labels: | ||
| - self-hosted | ||
| - Linux | ||
| - X64 | ||
| - nvidia | ||
| - gpu-8 | ||
| | ||
| # Container volumes | ||
| container_volumes: | ||
| - .:/opt/transformerengine | ||
| - ./ci_logs:/logs | ||
| - /home/flagscale_cicd/data:/opt/data | ||
| | ||
| # Container options | ||
| container_options: >- | ||
| --privileged | ||
| --gpus all | ||
| --shm-size=500g | ||
| --ipc=host | ||
| --ulimit memlock=-1 | ||
| --ulimit stack=67108864 | ||
| --user root | ||
| | ||
| # Device types | ||
| device_types: | ||
| - a100 | ||
| | ||
| # Environment variables | ||
| env_vars: | ||
| NVTE_FRAMEWORK: pytorch | ||
| TE_WITH_NCCL: 1 | ||
| NVTE_PROJECT_BUILDING: 1 | ||
| TE_FL_SKIP_CUDA: 0 | ||
| | ||
| # Test matrix configuration | ||
| test_matrix: | ||
| l0_pytorch: | ||
| path: "qa/L0_pytorch_unittest/test.sh" | ||
| ignored_tests: | ||
| - test_sanity_layernorm_mlp | ||
| - test_sanity_gpt | ||
| - test_sanity_bert | ||
| - test_sanity_T5 | ||
| - test_sanity_amp_and_nvfuser | ||
| - test_sanity_drop_path | ||
| - test_layernorm_mlp_accuracy | ||
| - test_grouped_linear_accuracy | ||
| - test_gpt_accuracy | ||
| - test_basic_linear | ||
| - test_layer_norm |
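
The `ignored_tests` entries in this file are test names rather than file paths, so they cannot be handed to pytest's `--ignore` option, which takes paths. How `qa/L0_pytorch_unittest/test.sh` consumes the list is not visible in this diff. One way it could be done, assuming pytest's `-k` keyword filter is used, is sketched below; `build_k_expr` is a hypothetical name.

```shell
# Hypothetical helper: turn a list of test names into a pytest -k expression
# that deselects all of them, e.g. "not (a or b)".
build_k_expr() {
  local expr="" name
  for name in "$@"; do
    if [ -z "$expr" ]; then
      expr="$name"
    else
      expr="$expr or $name"
    fi
  done
  # Print nothing for an empty list so the caller can omit -k entirely.
  if [ -n "$expr" ]; then
    printf 'not (%s)\n' "$expr"
  fi
}

# prints: not (test_sanity_gpt or test_sanity_bert or test_sanity_T5)
build_k_expr test_sanity_gpt test_sanity_bert test_sanity_T5
```

The result could be passed as `pytest -k "$(build_k_expr ...)"`. Keep in mind that `-k` matches substrings, so a short name such as `test_layer_norm` would also deselect any test whose name contains it. Note also that this list is not under `test_matrix.unit`, which is the key the common workflow in this PR reads with `.test_matrix.unit.ignored_tests // []`, so that lookup would appear to fall back to an empty list for this file.
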
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| # Metax Hardware Configuration for Megatron-LM-FL | ||
| # This file defines CI/CD settings for Metax-based testing | ||
| # Test configurations are defined in tests/test_utils/config/platforms/cuda.yaml | ||
| | ||
| hardware_name: metax | ||
| display_name: 'Metax Tests' | ||
| | ||
| # Docker image for this hardware | ||
| # ci_image: cr.metax-tech.com/public-ai-release/maca/megatron-lm:0.12.0-maca.ai3.3.0.11-torch2.6-py312-ubuntu22.04-amd64 | ||
| ci_image: localhost:5000/megatron-lm-with-te:v1 | ||
| Review comment (Collaborator): The current image does not integrate TE-FL; we should update it later. | ||
| Review comment (Author): This image should already have TE-FL integrated. | ||
| | ||
| # Runner labels for this hardware | ||
| runner_labels: | ||
| - self-hosted | ||
| - Linux | ||
| - X64 | ||
| - metax | ||
| # - gpu-8 | ||
| - dev | ||
| | ||
| # Container volumes (hardware-specific paths) | ||
| container_volumes: | ||
| # - /home/flagscale_cicd/flask/static:/workspace/report | ||
| # - /home/flagscale_cicd/flask/config:/workspace/config | ||
| # - /home/flagscale_cicd/docker/docker_build/docker_data:/home/gitlab-runner/data | ||
| # - /home/flagscale_cicd/docker/docker_build/docker_tokenizers:/home/gitlab-runner/tokenizers | ||
| # - /home/flagscale_cicd/docker/docker_build/docker_data/Megatron-LM/datasets:/opt/data/datasets | ||
| # - /home/flagscale_cicd/docker/docker_build/docker_tokenizers/Megatron-LM/tokenizers:/opt/data/tokenizers | ||
| # --- New: paths dedicated to Transformer Engine development --- | ||
| Review note: qqjxzxq marked this conversation as resolved (outdated). | ||
| - /home/muxiuser/jinglong/TransformerEngine-FL:/workspace/TransformerEngine-FL # development repository | ||
| - /home/muxiuser/jinglong:/opt/te_packages # holds the built TE packages that the tests install | ||
| - /usr/local/maca:/usr/local/maca:ro # [Key] mount the host's MACA driver libraries read-only so the operators can run | ||
| | ||
| # Container options (hardware-specific settings) | ||
| container_options: '--privileged --shm-size=500g --hostname megatron_cicd --user root --ulimit nofile=65535:65535 ' | ||
| | ||
| # Device types to run tests on | ||
| device_types: | ||
| - C500 | ||
| | ||
| # Test matrix configuration | ||
| test_matrix: | ||
| unit: | ||
| devices: | ||
| - C500 | ||
| # Ignored test files for unit tests | ||
| # These files will be skipped when running pytest | ||
| ignored_tests: | ||
| - tests/unit_tests/data/test_preprocess_data.py | ||
| - tests/unit_tests/dist_checkpointing/test_global_metadata_reuse.py | ||
| - tests/unit_tests/dist_checkpointing/test_optimizer.py | ||
| - tests/unit_tests/dist_checkpointing/test_nonpersistent.py | ||
| - tests/unit_tests/dist_checkpointing/test_safe_globals.py | ||
| - tests/unit_tests/dist_checkpointing/models/test_moe_experts.py | ||
| - tests/unit_tests/distributed/test_grad_sync_with_expert_parallel.py | ||
| - tests/unit_tests/distributed/test_mcore_fully_sharded_data_parallel.py | ||
| - tests/unit_tests/export/trtllm/test_distributed_fp8.py | ||
| - tests/unit_tests/export/trtllm/test_single_device_fp8.py | ||
| - tests/unit_tests/transformer/moe/test_a2a_token_dispatcher.py | ||
| - tests/unit_tests/test_inference.py | ||
| - tests/unit_tests/test_rl_utils.py | ||
| - tests/unit_tests/models/test_gpt_model.py | ||
| - tests/unit_tests/models/test_mamba_model.py | ||
| - tests/unit_tests/post_training/test_modelopt_module_spec.py | ||
| - tests/unit_tests/transformer/moe/test_aux_loss.py | ||
| - tests/unit_tests/transformer/moe/test_moe_layer_discrepancy.py | ||
| - tests/unit_tests/transformer/moe/test_routers.py | ||
| - tests/unit_tests/transformer/test_attention.py | ||
| - tests/unit_tests/transformer/test_attention_packed_seq.py | ||
| - tests/unit_tests/transformer/test_cuda_graphs.py | ||
| - tests/unit_tests/transformer/test_full_cuda_graph.py | ||
| - tests/unit_tests/transformer/test_multi_latent_attention.py | ||
| - tests/unit_tests/transformer/test_multi_token_prediction.py | ||
| - tests/unit_tests/transformer/test_retro_attention.py | ||
| - tests/unit_tests/transformer/test_transformer_block.py | ||
| - tests/unit_tests/transformer/test_transformer_block_custom_pgs.py | ||
| - tests/unit_tests/dist_checkpointing/test_local.py | ||
| | ||
| # functional: | ||
| # train: | ||
| # - device: C500 | ||
| # task: train | ||
| # model: gpt | ||
| # case: all | ||
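
The `ignored_tests` entries in this file are paths, which map naturally onto pytest's `--ignore=<path>` option. The workflow that consumes them (`unit_tests_common.yml`) is not shown in this view, so the following is only a sketch of how a newline-separated list could be turned into flags while dropping blank and repeated entries. `build_ignore_args` is a hypothetical name.

```shell
# Hypothetical helper: read one path per line on stdin and print one
# --ignore flag per unique, non-empty path, preserving first-seen order.
build_ignore_args() {
  # NF skips blank lines; seen[] keeps only the first copy of each path.
  awk 'NF && !seen[$0]++ { printf "--ignore=%s\n", $0 }'
}

printf '%s\n' \
  tests/unit_tests/test_inference.py \
  tests/unit_tests/test_rl_utils.py \
  tests/unit_tests/test_inference.py | build_ignore_args
```

Deduplicating at this point keeps the generated command line short even when the config list grows by copy and paste.
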
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| # Configuration Template | ||
| # This file describes the structure for hardware-specific configurations. | ||
| # | ||
| # Fields: | ||
| # - image: Docker image to use for the runner | ||
| # - labels: List of labels for the runner | ||
| # - docker_options: Additional Docker options for mounting devices, volumes, etc. | ||
| # | ||
| # Example: | ||
| # image: <docker_image> | ||
| # labels: | ||
| # - <label1> | ||
| # - <label2> | ||
| # docker_options: | | ||
| # --option1 value1 | ||
| # --option2 value2 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| name: ascend_tests | ||
| | ||
| on: | ||
| push: | ||
| branches: ["main"] | ||
| pull_request: | ||
| branches: ["main"] | ||
| workflow_dispatch: | ||
| | ||
| concurrency: | ||
| group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-${{ github.actor }} | ||
| cancel-in-progress: true | ||
| | ||
| jobs: | ||
| run_tests: | ||
| # Package manager and environment settings are read from .github/configs/ascend.yml | ||
| uses: ./.github/workflows/all_tests_common.yml | ||
| with: | ||
| platform: ascend | ||
| | ||
| all_tests: | ||
| needs: run_tests | ||
| runs-on: ubuntu-latest | ||
| if: always() | ||
| steps: | ||
| - name: Verify workflow status | ||
| run: | | ||
| if [ "${{ needs.run_tests.result }}" != "success" ]; then | ||
| echo "❌ Tests workflow failed" | ||
| exit 1 | ||
| fi | ||
| echo "✅ All tests passed!" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,136 @@ | ||
| name: Common All Tests | ||
| | ||
| on: | ||
| workflow_call: | ||
| inputs: | ||
| platform: | ||
| required: true | ||
| type: string | ||
| description: Platform name (e.g., cuda, ascend, metax) | ||
| | ||
| jobs: | ||
| checkout_and_config: | ||
| defaults: | ||
| run: | ||
| shell: bash | ||
| runs-on: ubuntu-latest | ||
| outputs: | ||
| ci_image: ${{ steps.config.outputs.ci_image }} | ||
| runs_on: ${{ steps.config.outputs.runs_on }} | ||
| container_volumes: ${{ steps.config.outputs.container_volumes }} | ||
| container_options: ${{ steps.config.outputs.container_options }} | ||
| device_types: ${{ steps.config.outputs.device_types }} | ||
| train_test_matrix: ${{ steps.config.outputs.train_test_matrix }} | ||
| ignored_tests: ${{ steps.config.outputs.ignored_tests }} | ||
| steps: | ||
| - name: Checkout source code | ||
| uses: actions/checkout@v4 | ||
| | ||
| - name: Load platform configuration | ||
| id: config | ||
| run: | | ||
| set -euo pipefail | ||
| | ||
| PLATFORM="${{ inputs.platform }}" | ||
| CONFIG_FILE=".github/configs/${PLATFORM}.yml" | ||
| | ||
| # Install mikefarah/yq (v4) for YAML parsing | ||
| sudo wget -qO /usr/local/bin/yq https://github.com/mikefarah/yq/releases/download/v4.45.1/yq_linux_amd64 | ||
| sudo chmod +x /usr/local/bin/yq | ||
| /usr/local/bin/yq --version | ||
| echo "Loading configuration from $CONFIG_FILE" | ||
| | ||
| # Read CI image | ||
| CI_IMAGE=$(yq '.ci_image' "$CONFIG_FILE") | ||
| echo "ci_image=$CI_IMAGE" >> $GITHUB_OUTPUT | ||
| | ||
| # Read runner labels and format as JSON array | ||
| RUNS_ON=$(yq '.runner_labels | tojson(0)' "$CONFIG_FILE") | ||
| echo "runs_on=$RUNS_ON" >> $GITHUB_OUTPUT | ||
| | ||
| # Read container volumes and format as JSON array | ||
| VOLUMES=$(yq '.container_volumes | tojson(0)' "$CONFIG_FILE") | ||
| echo "container_volumes=$VOLUMES" >> $GITHUB_OUTPUT | ||
| | ||
| # Read container options | ||
| OPTIONS=$(yq '.container_options' "$CONFIG_FILE") | ||
| echo "container_options=$OPTIONS" >> $GITHUB_OUTPUT | ||
| | ||
| # Read device types | ||
| DEVICE_TYPES=$(yq '.device_types | tojson(0)' "$CONFIG_FILE") | ||
| echo "device_types=$DEVICE_TYPES" >> $GITHUB_OUTPUT | ||
| | ||
| # Read test matrix for training | ||
| TRAIN_MATRIX=$(yq '.test_matrix.functional.train | tojson(0)' "$CONFIG_FILE") | ||
| echo "train_test_matrix=$TRAIN_MATRIX" >> $GITHUB_OUTPUT | ||
| | ||
| # Read ignored tests list from test_matrix.unit (default to empty array if not defined) | ||
| IGNORED_TESTS=$(yq '.test_matrix.unit.ignored_tests // [] | tojson(0)' "$CONFIG_FILE") | ||
| echo "ignored_tests=$IGNORED_TESTS" >> $GITHUB_OUTPUT | ||
| | ||
| unit_tests: | ||
| needs: checkout_and_config | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| device: ${{ fromJson(needs.checkout_and_config.outputs.device_types) }} | ||
| uses: ./.github/workflows/unit_tests_common.yml | ||
| name: unit_tests | ||
| with: | ||
| platform: ${{ inputs.platform }} | ||
| device: ${{ matrix.device }} | ||
| image: ${{ needs.checkout_and_config.outputs.ci_image }} | ||
| runs_on: ${{ needs.checkout_and_config.outputs.runs_on }} | ||
| container_volumes: ${{ needs.checkout_and_config.outputs.container_volumes }} | ||
| container_options: ${{ needs.checkout_and_config.outputs.container_options }} | ||
| ignored_tests: ${{ needs.checkout_and_config.outputs.ignored_tests }} | ||
| | ||
| # arguments.py not compatible with megatron-core-fl | ||
| # functional_tests_train: | ||
| # needs: | ||
| # - checkout_and_config | ||
| # - unit_tests | ||
| # if: fromJson(needs.checkout_and_config.outputs.train_test_matrix)[0] != null | ||
| # uses: ./.github/workflows/functional_tests_train.yml | ||
| # with: | ||
| # platform: ${{ inputs.platform }} | ||
| # test_matrix: ${{ needs.checkout_and_config.outputs.train_test_matrix }} | ||
| # image: ${{ needs.checkout_and_config.outputs.ci_image }} | ||
| # runs_on: ${{ needs.checkout_and_config.outputs.runs_on }} | ||
| # container_volumes: ${{ needs.checkout_and_config.outputs.container_volumes }} | ||
| # container_options: ${{ needs.checkout_and_config.outputs.container_options }} | ||
| | ||
| | ||
| all_tests_complete: | ||
| defaults: | ||
| run: | ||
| shell: bash | ||
| needs: | ||
| - checkout_and_config | ||
| - unit_tests | ||
| # - functional_tests_train | ||
| runs-on: ubuntu-latest | ||
| if: always() | ||
| steps: | ||
| - name: Verify all tests passed | ||
| run: | | ||
| # Check all test jobs (skip if not run) | ||
| failed=false | ||
| | ||
| if [ "${{ needs.unit_tests.result }}" != "success" ]; then | ||
| echo "❌ Unit tests failed" | ||
| failed=true | ||
| fi | ||
| | ||
| # # Only check functional tests if they ran | ||
| # if [ "${{ needs.functional_tests_train.result }}" != "success" ] && \ | ||
| # [ "${{ needs.functional_tests_train.result }}" != "skipped" ]; then | ||
| # echo "❌ Training functional tests failed" | ||
| # failed=true | ||
| # fi | ||
| | ||
| if [ "$failed" = "true" ]; then | ||
| exit 1 | ||
| fi | ||
| | ||
| echo "✅ All tests completed successfully!" |
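
The `Load platform configuration` step appends `name=value` lines to `$GITHUB_OUTPUT` with the variable unquoted, and that single-line form would not fit a value containing a newline. A minimal sketch of a more defensive writer follows, using the delimiter syntax that GitHub Actions documents for multi-line outputs. `set_output` and the delimiter string are hypothetical, and the sketch assumes the delimiter never occurs in the value.

```shell
# Hypothetical helper: write one step output using the
# name<<DELIMITER ... DELIMITER form, which is valid for both
# single-line and multi-line values.
set_output() {
  local name="$1" value="$2"
  local delim="ghadelimiter_$$"   # assumed not to appear inside the value
  {
    printf '%s<<%s\n' "$name" "$delim"
    printf '%s\n' "$value"
    printf '%s\n' "$delim"
  } >> "$GITHUB_OUTPUT"
}

# Outside of Actions, fall back to a temporary file so the demo still runs.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"
set_output container_options "--privileged --shm-size=500g"
cat "$GITHUB_OUTPUT"
```

With a helper like this, each `echo "x=$X" >> $GITHUB_OUTPUT` line in the step could become `set_output x "$X"`, which also keeps the quoting in one place.
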
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| name: cuda_tests | ||
| | ||
| on: | ||
| push: | ||
| branches: ["main"] | ||
| pull_request: | ||
| branches: ["main"] | ||
| workflow_dispatch: | ||
| | ||
| concurrency: | ||
| group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-${{ github.actor }} | ||
| cancel-in-progress: true | ||
| | ||
| jobs: | ||
| run_tests: | ||
| # Package manager and environment settings are read from .github/configs/cuda.yml | ||
| uses: ./.github/workflows/all_tests_common.yml | ||
| with: | ||
| platform: cuda | ||
| | ||
| all_tests: | ||
| needs: run_tests | ||
| runs-on: ubuntu-latest | ||
| if: always() | ||
| steps: | ||
| - name: Verify workflow status | ||
| run: | | ||
| if [ "${{ needs.run_tests.result }}" != "success" ]; then | ||
| echo "❌ Tests workflow failed" | ||
| exit 1 | ||
| fi | ||
| echo "✅ All tests passed!" |
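
Both wrapper workflows (`ascend_tests` and `cuda_tests`) end with an `all_tests` job that runs under `if: always()` and fails unless `needs.run_tests.result` is `success`. The same check can be written as a small function, which makes it easier to decide per job whether `skipped` should count as a pass, as the commented-out functional-test check in the common workflow does. `check_result` and its optional second argument are hypothetical.

```shell
# Hypothetical helper: succeed only for an acceptable job result.
# GitHub Actions reports one of: success, failure, cancelled, skipped.
check_result() {
  local result="$1" allow_skipped="${2:-no}"
  case "$result" in
    success) return 0 ;;
    # A skipped job passes only when the caller opts in.
    skipped) [ "$allow_skipped" = "yes" ] ;;
    *) return 1 ;;
  esac
}

# In a workflow the argument would come from the needs context,
# for example the value of needs.run_tests.result.
if check_result "failure"; then
  echo "Tests workflow passed"
else
  echo "Tests workflow failed"
fi
```

Treating `cancelled` as a failure matches the current behaviour of the wrappers, which reject every result other than `success`.
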