Commit 80674bc

Authored by Eliasj42 (Elias Joseph) and saienduri
moved llama benchmark, sglang benchmark, sglang integration, and sdxl to ossci cluster (#971)

Signed-off-by: Elias Joseph <[email protected]>
Co-authored-by: Elias Joseph <[email protected]>
Co-authored-by: saienduri <[email protected]>

1 parent 84a2a3a commit 80674bc

6 files changed, +11 -14 lines changed

.github/workflows/ci-llama-large-tests.yaml (+1 -1)

@@ -28,7 +28,7 @@ jobs:
       matrix:
         version: [3.11]
       fail-fast: false
-    runs-on: llama-mi300x-1
+    runs-on: linux-mi300-1gpu-ossci
     defaults:
       run:
         shell: bash

.github/workflows/ci-sdxl.yaml (+1 -1)

@@ -37,7 +37,7 @@ env:
 jobs:
   install-and-test:
     name: Install and test
-    runs-on: mi300x-3
+    runs-on: linux-mi300-1gpu-ossci
 
     steps:
       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

.github/workflows/ci-sglang-benchmark.yml (+6 -8)

@@ -40,7 +40,7 @@ jobs:
       matrix:
         version: [3.11]
       fail-fast: false
-    runs-on: mi300x-3
+    runs-on: linux-mi300-1gpu-ossci
     defaults:
       run:
         shell: bash
@@ -82,7 +82,9 @@ jobs:
 
       - name: Login to huggingface
         continue-on-error: true
-        run: huggingface-cli login --token ${{ secrets.HF_TOKEN }}
+        run: |
+          pip install -U "huggingface_hub[cli]"
+          huggingface-cli login --token ${{ secrets.HF_TOKEN }}
 
       - name: Run Shortfin Benchmark Tests
         run: |
@@ -101,7 +103,7 @@ jobs:
       matrix:
         version: [3.11]
       fail-fast: false
-    runs-on: mi300x-3
+    runs-on: linux-mi300-1gpu-ossci
     defaults:
       run:
         shell: bash
@@ -187,15 +189,11 @@ jobs:
     needs: benchmark_sglang
     name: "Docker Cleanup"
     if: always()
-    runs-on: mi300x-3
+    runs-on: linux-mi300-1gpu-ossci
     steps:
       - name: Stop sglang-server
         run: docker stop sglang-server || true # Stop container if it's running
 
-      # Deleting image after run due to large disk space requirement (83 GB)
-      - name: Cleanup SGLang Image
-        run: docker image rm lmsysorg/sglang:v0.3.5.post1-rocm620
-
   merge_and_upload_reports:
     name: "Merge and upload benchmark reports"
     needs: [benchmark_shortfin, benchmark_sglang]

.github/workflows/ci-sglang-integration-tests.yml (+1 -2)

@@ -29,7 +29,7 @@ jobs:
       matrix:
         version: [3.11]
       fail-fast: false
-    runs-on: mi300x-3
+    runs-on: linux-mi300-1gpu-ossci
     defaults:
       run:
         shell: bash
@@ -69,7 +69,6 @@ jobs:
           pip install sentence_transformers
 
           pip freeze
-
       - name: Run Integration Tests
         run: |
           source ${VENV_DIR}/bin/activate

app_tests/benchmark_tests/llm/sglang_benchmarks/shortfin_benchmark_test.py (+1 -1)

@@ -60,7 +60,7 @@ def test_shortfin_benchmark(
     request,
 ):
     # TODO: Remove when multi-device is fixed
-    os.environ["ROCR_VISIBLE_DEVICES"] = "1"
+    os.environ["ROCR_VISIBLE_DEVICES"] = "0"
 
     process, port = server
 
app_tests/integration_tests/llm/sglang/conftest.py (+1 -1)

@@ -54,7 +54,7 @@ def model_artifacts(request, tmp_path_factory):
 
 @pytest.fixture(scope="module")
 def start_server(request, model_artifacts):
-    os.environ["ROCR_VISIBLE_DEVICES"] = "1"
+    os.environ["ROCR_VISIBLE_DEVICES"] = "0"
     device_settings = request.param["device_settings"]
 
     server_config = ServerConfig(
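
Review note: both Python test files in this commit hard-code the value of `ROCR_VISIBLE_DEVICES`, so it had to be edited by hand when the runner changed (from "1" to "0" here, presumably because the new runner label suggests a single GPU). A minimal sketch of one alternative, not part of this commit: read the index from an override variable and fall back to a default. The helper name `pin_visible_device` and the variable `TEST_GPU_INDEX` are hypothetical; only `ROCR_VISIBLE_DEVICES` comes from the diff.

```python
import os


def pin_visible_device(default_index: str = "0",
                       override_var: str = "TEST_GPU_INDEX") -> str:
    """Set ROCR_VISIBLE_DEVICES and return the index that was chosen.

    `override_var` is a hypothetical environment variable (an assumption,
    not something the repository defines). When it is set, its value wins;
    otherwise `default_index` is used.
    """
    index = os.environ.get(override_var, default_index)
    os.environ["ROCR_VISIBLE_DEVICES"] = index
    return index
```

In a fixture such as `start_server`, a call to `pin_visible_device()` could stand in for the direct assignment, and a workflow could export `TEST_GPU_INDEX` per runner instead of patching the tests.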
