Add Sglang Inference Examples by ybalbert001 · Pull Request #1128 · awslabs/awsome-distributed-ai

ybalbert001 · 2026-06-11T00:56:36Z

Purpose

Provide more references for LLM inference on AWS: add a set of self-contained SGLang serving examples for Amazon EKS / SageMaker HyperPod EKS, covering both prefill/decode (PD) disaggregation and unified serving across H200 and B300 hardware.

Changes

1. Kimi2.6 on SGLang (node-level 1P1D): examples/inference/sglang/kimi2.6-h200-1p1d/. Prefill + decode StatefulSets across 2× ml.p5en.48xlarge (16× H200), KV cache transferred over NIXL (LIBFABRIC/EFA), fronted by the SGLang router. Image pinned to SGLang v0.5.12.post1. README documents the known NIXL 1.2.0 KV-transfer issue.
2. DeepSeek V4 Pro on B300: examples/inference/sglang/dsv4pro-b300-single-node/. Unified (non-PD) baseline on a single B300 (8-GPU) node.
3. Qwen3.5-27B on B300 (intra-node PD): examples/inference/sglang/qwen3.5-27b-b300-intra-pd/. 6 prefill + 2 decode in one pod on a single B300 node, NIXL, with an SGLang router sidecar.
4. Shared SGLang helpers: model pre-staging to local NVMe (download-model.sh + DaemonSet), a DCGM exporter DaemonSet, and reusable AMP monitoring (setup-amp-monitoring.sh + prometheus-agent-amp.yaml): in-cluster Prometheus agent → Amazon Managed Prometheus, read by Amazon Managed Grafana.
5. Infra fix: repoint 7 CloudFormation 1-click deploy templates at the renamed awsome-distributed-ai S3 bucket.

Test Plan

Environment:

AWS Service: SageMaker HyperPod EKS (1.33+)
Instance type: ml.p5en.48xlarge (H200), ml.p6-b300.48xlarge (B300)
Number of nodes: 2 (Kimi 1P1D); 1 (DeepSeek V4 Pro, Qwen3.5-27B)

Test Results

⚠️ Not yet measured. The per-model READMEs currently carry placeholder (TBD) latency/throughput tables. Benchmark numbers (sglang.bench_serving sweep: burst RPS, P50 TTFT/TPOT, tok/s/GPU) need to be filled in before these are quoted downstream. The Kimi 1P1D example is also marked [draft-1] pending a working NIXL 1.2.0 KV-transfer path.
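A minimal sketch of that sweep, to make the missing tables concrete. It assumes the SGLang router is reachable at ROUTER_HOST:ROUTER_PORT (hypothetical variables; 30000 is only a placeholder default) and that the pinned v0.5.12.post1 `sglang.bench_serving` accepts these flags, so confirm with `python3 -m sglang.bench_serving --help` first. Sequence lengths, prompt count, and the rate list are illustrative, not the PR's methodology; a rate of `inf` submits every prompt at once, which is one way to read "burst RPS".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical defaults: point these at the SGLang router Service.
ROUTER_HOST="${ROUTER_HOST:-127.0.0.1}"
ROUTER_PORT="${ROUTER_PORT:-30000}"
# Illustrative sweep: fixed request rates plus "inf" (all requests at once).
RATES=(1 4 16 inf)

# Print the command for one sweep point (default), or run it with DRY_RUN=0.
bench_point() {
  local rate="$1"
  local cmd=(python3 -m sglang.bench_serving
    --backend sglang
    --host "${ROUTER_HOST}" --port "${ROUTER_PORT}"
    --dataset-name random
    --random-input-len 1024 --random-output-len 256
    --num-prompts 512
    --request-rate "${rate}")
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

for rate in "${RATES[@]}"; do
  bench_point "${rate}"
done
```

Keeping the prompt count fixed across rates means each row of the table differs only in offered load.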

Directory Structure

examples/inference/
└── sglang/
    ├── README.md                                          # engine overview + shared helpers
    ├── download-model.sh / *-daemonset.yaml
    ├── dcgm-exporter-daemonset.yaml
    ├── setup-amp-monitoring.sh / prometheus-agent-amp.yaml
    ├── kimi2.6-h200-1p1d/                                 # Dockerfile, build-image.sh, README, kimi-pd-deploy.yaml
    ├── dsv4pro-b300-single-node/                          # README, dsv4pro-deploy.yaml
    └── qwen3.5-27b-b300-intra-pd/                         # README, qwen-pd-deploy.yaml

Checklist

I have read the contributing guidelines (https://github.com/awslabs/awsome-distributed-training/blob/main/CONTRIBUTING.md).
I am working against the latest main branch.
I have searched existing open and recently merged PRs to confirm this is not a duplicate.
The contribution is self-contained with documentation and scripts.
External dependencies are pinned to a specific version or tag (SGLang v0.5.12.post1; no latest).
A README is included or updated with prerequisites, instructions, and known issues.
New test cases follow the expected directory structure (#directory-structure).

- Switch base image from the rolling `lmsysorg/sglang:dev-cu13` nightly to the pinned `lmsysorg/sglang:v0.5.12.post1-cu130` release. The nightly shipped NIXL 1.2.0, whose LIBFABRIC GPU HMEM path made prefill->decode KV cache transfer unreliable; the release pins NIXL 1.1.0, which transfers correctly.
- Fix build-image.sh: dockerfilename was `dockerfile` (lowercase), which fails to match `Dockerfile` on case-sensitive Linux, so the image never rebuilt against the edited Dockerfile.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Move the Prometheus-agent → Amazon Managed Prometheus (AMP) setup out of the kimi2.6-h200-1p1d deploy YAML and into shared, reusable files at the sglang/ level so any sample can opt in:

- prometheus-agent-amp.yaml: in-cluster Prometheus agent that scrapes sglang-metrics / dcgm-metrics pods and remote-writes to AMP via SigV4.
- setup-amp-monitoring.sh: idempotent one-shot (create/reuse AMP workspace, enable OIDC, create the ingest IAM role + ServiceAccount, render and apply the agent manifest).
- README: rewrite the "GPU metrics" section into a full Monitoring section (AMP agent, DCGM exporter, Amazon Managed Grafana data source). Document the nvidia.com/gpu.present node-label prerequisite: HyperPod nodes don't carry it by default, so the DCGM DaemonSet stays at DESIRED 0 until labeled.
- kimi2.6-h200-1p1d: drop the inlined Prometheus-agent block (-193 lines) and point the README at the shared manifests instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
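For that node-label prerequisite, one hedged approach is to label the GPU nodes by instance type. This sketch assumes the DCGM DaemonSet selects on the value `nvidia.com/gpu.present=true` (check the nodeSelector in dcgm-exporter-daemonset.yaml) and uses the HyperPod `ml.`-prefixed instance types from the Test Plan; `kubectl label nodes -l <selector>` labels every matching node, and `--overwrite` makes re-runs harmless.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: the DaemonSet's nodeSelector is nvidia.com/gpu.present: "true".
GPU_LABEL="nvidia.com/gpu.present=true"

# Print (default) or run (DRY_RUN=0) the label command for one instance type.
label_gpu_nodes() {
  local instance_type="$1"
  local cmd=(kubectl label nodes
    -l "node.kubernetes.io/instance-type=${instance_type}"
    "${GPU_LABEL}" --overwrite)
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# HyperPod instance-group types from the Test Plan.
for t in ml.p5en.48xlarge ml.p6-b300.48xlarge; do
  label_gpu_nodes "${t}"
done
```

Once the label lands, the DaemonSet's DESIRED count should rise to the number of labeled nodes without redeploying it.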

…issue

Add Indicative results (placeholder), Prerequisites (Cluster/Software), and a Quick start, matching the dsv3-uccl-nixl README structure. Correct the Dockerfile base image (v0.5.12.post1-cu130, not dev-cu13) and drop the stale download-model repo_id edit step. Note that the SGLang nightly's NIXL 1.2.0 breaks prefill->decode KV-cache transfer over EFA, which is why the image is pinned to v0.5.12.post1 (NIXL 1.1.0).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

KeitaW

Review Batch 1/5 — Structure & Repository Hygiene

Themed batch covering structure, reuse, and image hygiene. Two inline comments accompany this batch (the :latest tags); the rest are cross-cutting and live here in the body.

Observability duplicates an existing repo asset — reuse it instead of shipping a parallel stack

Files: prometheus-agent-amp.yaml, setup-amp-monitoring.sh, dcgm-exporter-daemonset.yaml, READMEs

This PR adds ~400 lines of bespoke monitoring (in-cluster Prometheus agent → AMP via SigV4, an IAM/workspace setup script, a DCGM DaemonSet). The repo already solves this at 4.validation_and_observability/4.prometheus-grafana/eks-managed-observability/ via an ADOT collector (adot-collector-prometheus.yaml, deploy-obs.sh/cleanup-obs.sh, DCGM configs, dashboards). Per the checklist's "Reuse existing assets," I'd drop the bespoke files and have the READMEs link to that as a prerequisite.

Two even-lower-maintenance managed paths, which I'd recommend over both:

Plain EKS: the AWS managed collector / agentless scraper (aws amp create-scraper) — AWS runs the scraper outside the cluster (no in-cluster agent to deploy/patch/HA). It's the only option in the EKS console's "Turn on Prometheus metrics" wizard, and covers everything this agent does for SGLang serving.
HyperPod EKS: the SageMaker HyperPod observability add-on — managed DCGM/metrics + AMP + Grafana dashboards (add one scrape job for SGLang's :30000/metrics).

Adopting any managed path also retires four findings on prometheus-agent-amp.yaml (the server-mode bug, the :latest pin, the DCGM root context, and the monitoring share of the privileged concern).
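A sketch of the agentless path, assuming AWS CLI v2 and placeholder ARNs and subnets (all hypothetical). It writes a minimal Prometheus-style scrape config that keeps only pods declaring container port 30000 (the SGLang metrics port mentioned above) and base64-encodes it for `--scrape-configuration configurationBlob=`. Per the AMP managed-collector documentation, the scraper may also need in-cluster RBAC and an EKS access entry, which this sketch does not set up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: replace with your cluster, subnets, and AMP workspace.
CLUSTER_ARN="${CLUSTER_ARN:-arn:aws:eks:us-west-2:111122223333:cluster/my-cluster}"
SUBNETS="${SUBNETS:-subnet-aaaa,subnet-bbbb}"
WORKSPACE_ARN="${WORKSPACE_ARN:-arn:aws:aps:us-west-2:111122223333:workspace/ws-example}"

# Minimal scrape config: discover pods, keep only those declaring port 30000.
SCRAPE_CONFIG=$(cat <<'EOF'
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: sglang
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "30000"
        action: keep
EOF
)

# Single-line base64 (portable across GNU and BSD base64).
BLOB=$(printf '%s\n' "${SCRAPE_CONFIG}" | base64 | tr -d '\n')

cmd=(aws amp create-scraper
  --source "eksConfiguration={clusterArn='${CLUSTER_ARN}',subnetIds=[${SUBNETS}]}"
  --scrape-configuration "configurationBlob=${BLOB}"
  --destination "ampConfiguration={workspaceArn='${WORKSPACE_ARN}'}")

# Print by default; set DRY_RUN=0 to create the scraper.
if [[ "${DRY_RUN:-1}" == "1" ]]; then
  echo "${cmd[*]}"
else
  "${cmd[@]}"
fi
```

Because AWS runs the scraper, there is no agent Deployment, WAL, or image tag left in the cluster to maintain.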

Use one shared SGLang image across all three models instead of three divergent ones

Files: dsv4pro-b300-single-node/dsv4pro-deploy.yaml, qwen3.5-27b-b300-intra-pd/qwen-pd-deploy.yaml, kimi2.6-h200-1p1d/{Dockerfile,build-image.sh,kimi-pd-deploy.yaml}

The three examples ship three different engine images for the same SGLang runtime: Qwen → lmsysorg/sglang:v0.5.12.post1-cu130 (pinned, no EFA layer); Kimi → a custom ECR build (pinned base + EFA installer); DeepSeek → lmsysorg/sglang:deepseek-v4-b300 (opaque, unversioned tag). The model is a runtime arg (--model-path), so one image serves all three.

deepseek-v4-b300 is worth flagging on its own: it's not a version. It was pushed on 2026-04-29, a month before the v0.5.12.post1-cu130 release the PR pins to, and it's larger (18.1 GB vs 13.0 GB). So the DeepSeek example runs an older, unpinned build with an unknown NIXL version, quietly contradicting the PR's "pinned for NIXL 1.1.0" rationale.

Suggestion: promote the Kimi Dockerfile (pinned base + EFA installer) to the shared examples/inference/sglang/ level, build once, and have all three manifests reference it — selecting the model via --model-path. Bonus: gives Qwen the EFA/LIBFABRIC path it currently lacks. If deepseek-v4-b300 carries a needed patch, please state what.
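Once there is one image, a small helper can keep the three manifests pointing at it. This is a sketch assuming the engine containers reference either `lmsysorg/sglang:<tag>` or the `<YOUR_ECR_IMAGE>` placeholder quoted in this review; the router's `lmsysorg/sgl-model-gateway` image is deliberately left alone. The shared URI is whatever the promoted build prints.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Rewrite engine image references in one manifest to a shared image URI.
# Matches "image: lmsysorg/sglang:<tag>" and "image: <YOUR_ECR_IMAGE>" only;
# other images (e.g. lmsysorg/sgl-model-gateway) are not touched.
# Assumes the URI contains no '#', '&' or '\' (true for ECR URIs).
pin_engine_image() {
  local manifest="$1" image="$2"
  sed -E -i.bak \
    -e "s#^([[:space:]-]*image:[[:space:]]*)(lmsysorg/sglang:[^[:space:]]+|<YOUR_ECR_IMAGE>)#\1${image}#" \
    "${manifest}"
  rm -f "${manifest}.bak"   # -i.bak is portable across GNU and BSD sed
}

# Example (paths from the PR's directory structure):
#   for f in dsv4pro-b300-single-node/dsv4pro-deploy.yaml \
#            qwen3.5-27b-b300-intra-pd/qwen-pd-deploy.yaml \
#            kimi2.6-h200-1p1d/kimi-pd-deploy.yaml; do
#     pin_engine_image "$f" "<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>"
#   done
```

The model itself stays a per-manifest `--model-path` argument, so only the image line changes.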

`build-image.sh` is missing its shebang and the MIT-0 license header

File: kimi2.6-h200-1p1d/build-image.sh

It starts straight at algorithm_name=sgl-dev-cu13 — no #!/usr/bin/env bash, no copyright header (and no set -euo pipefail, see the Deployment batch). The sibling download-model.sh / setup-amp-monitoring.sh get this right. Suggested top-of-file:

    #!/usr/bin/env bash
    # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
    # SPDX-License-Identifier: MIT-0
    set -euo pipefail

KeitaW · 2026-06-11T13:24:56Z

+            --enable-metrics
+      # ---- PD router (shares pod network → reaches engines on 127.0.0.1) ---
+      - name: router
+        image: lmsysorg/sgl-model-gateway:latest


:latest image tag on the router sidecar

lmsysorg/sgl-model-gateway:latest (and prom/prometheus:latest, and the DeepSeek router) pulls a moving tag. The repo convention is fixed tags everywhere so deployments are reproducible and air-gapped clusters don't break — I'd pin to a released tag (ideally a digest). The SGLang engine image is already pinned correctly; only the sidecars float.
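A quick guard against floating tags, sketched as a scan over the manifests (it assumes images appear on `image:` lines, as in the diff hunks quoted here). It flags an explicit `:latest` and any reference with no tag at all, since an untagged image also resolves to latest; digest-pinned references (`@sha256:`) pass.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "file:line: ref" for every image reference that floats.
find_floating_images() {
  local file lineno rest ref last
  # "|| true" keeps pipefail from failing the function when nothing matches.
  { grep -rnHE --include='*.yaml' --include='*.yml' \
      '^[[:space:]-]*image:[[:space:]]*[^[:space:]#]+' "$@" || true; } |
  while IFS=: read -r file lineno rest; do
    ref=$(sed -E 's/^[[:space:]-]*image:[[:space:]]*([^[:space:]#]+).*/\1/' <<<"${rest}")
    ref="${ref#\"}"; ref="${ref%\"}"   # strip optional double quotes
    last="${ref##*/}"                  # final path component, e.g. "sglang:v0.5.12"
    if [[ "${ref}" == *@sha256:* ]]; then
      continue                         # digest-pinned: reproducible
    elif [[ "${last}" != *:* || "${last}" == *:latest ]]; then
      echo "${file}:${lineno}: ${ref}"
    fi
  done
}

# Usage: find_floating_images examples/inference/sglang/
```

Looking only at the last path component keeps a registry port such as `host:5000/repo` from being mistaken for a tag.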

KeitaW · 2026-06-11T13:24:56Z

+      serviceAccountName: amp-iamproxy-ingest-service-account
+      containers:
+      - name: prometheus
+        image: prom/prometheus:latest


prom/prometheus:latest

Same :latest concern — pin Prometheus to a released tag so the agent is reproducible. (This pin is also the root cause of the agent-runs-in-server-mode bug flagged in the Deployment batch: :latest is now 3.x, where the --enable-feature=agent flag was removed.)

KeitaW

Review Batch 2/5 — Deployment Pipeline & K8s Operational Correctness

Operational-correctness batch. Three inline comments accompany it (the Prometheus server-mode bug, the Qwen UCX note, the Kimi imagePullPolicy); the cross-cutting items are below.

`nodeSelector` pins the `ml.`-prefixed instance type — only matches HyperPod, not plain EKS

Files: kimi-pd-deploy.yaml (91, 200), dsv4pro-deploy.yaml (48), qwen-pd-deploy.yaml (51), download-model-daemonset.yaml (31, via ${INSTANCE_TYPE})

Every serving manifest pins node.kubernetes.io/instance-type: ml.p5en.48xlarge / ml.p6-b300.48xlarge. That ml. prefix is the HyperPod instance-group form; a plain EKS managed nodegroup labels the same key with the bare EC2 type (p6-b300.48xlarge). I verified the bare form on a real B300 EKS node and a HyperPod EKS system node — the pod sits Pending on plain EKS until changed. Since the READMEs advertise "EKS / HyperPod EKS," it should run on both. A nodeSelector can't express OR; nodeAffinity with In can (same key, two values):

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                - p6-b300.48xlarge      # plain EKS managed nodegroup
                - ml.p6-b300.48xlarge   # SageMaker HyperPod instance group

(Use the p5en pair for Kimi.) I confirmed this scheduled+served on a plain-EKS B300 node. No single cross-environment label is reliable (sagemaker.amazonaws.com/instance-type is HyperPod-only; nvidia.com/gpu.product needs the GPU Operator, which HyperPod doesn't run by default).

A DaemonSet is the wrong primitive for model-weight download

Files: download-model-daemonset.yaml, download-model.sh

A DaemonSet can't model a one-shot "download once and stop": its pods must use restartPolicy: Always, so a download that exits 0 is restarted in a loop, and on node scale-out/replacement it re-downloads, with no completion signal beyond a manual kubectl delete daemonset.

It's also novel here: on this base branch every model/dataset download is a Job or script (examples/training/{verl,nemo-rl,optimum-neuron}); the only DaemonSets are the EFA/health exporters. The sibling examples/inference/vllm/dsv3-uccl-nixl (which the Kimi README was modeled on) uses no downloader at all: the serving pod downloads on first start to HF_HOME on local NVMe, with FSx for Lustre documented as the alternative (~680 GB loads in ~5 min from local NVMe; +1–2 min on FSx for the first node).

Recommend: single-node → download-on-startup or an initContainer; 2-node Kimi → initContainer/run-once Job, or FSx (download once, mount RO on both). Local-NVMe staging itself is fine; it's the DaemonSet wrapper that's wrong. "Use a Job" is not the fix (a Job can't run one-pod-per-node).
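For the initContainer route, the download step has to be idempotent so a pod restart, or a second pod on the same node, does not re-fetch ~680 GB. A minimal sketch, assuming the init image ships `huggingface-cli` (swap in whatever downloader it actually has) and using a hypothetical marker file `.download-complete` that is written only after the fetch succeeds, so an interrupted download is simply retried. The model id and target directory are caller-supplied, not taken from the PR's download-model.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fetch one model into a directory on local NVMe.
fetch_model() {
  local model_id="$1" model_dir="$2"
  huggingface-cli download "${model_id}" --local-dir "${model_dir}"
}

# Download once per node: skip when the completion marker already exists.
stage_model() {
  local model_id="$1" model_dir="$2"
  local marker="${model_dir}/.download-complete"   # hypothetical marker name
  if [[ -f "${marker}" ]]; then
    echo "already staged: ${model_dir}"
    return 0
  fi
  mkdir -p "${model_dir}"
  fetch_model "${model_id}" "${model_dir}" || return 1
  touch "${marker}"                                 # only reached on success
  echo "staged: ${model_dir}"
}

# initContainer entrypoint (MODEL_ID / MODEL_DIR are hypothetical env names):
#   stage_model "${MODEL_ID}" "${MODEL_DIR}"
```

An initContainer that exits 0 runs once per pod start, which gives the "download once and stop" semantics the DaemonSet cannot.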

`build-image.sh` lacks `set -euo pipefail` and quotes on some expansions

File: kimi2.6-h200-1p1d/build-image.sh

Without set -euo pipefail, a failed docker build/aws ecr doesn't stop the script — it continues to docker push a stale/missing image. A couple of expansions are unquoted (--region $region, -t ${algorithm_name}). Add strict mode (see the header suggestion in the Structure batch) and quote the expansions.
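Putting the header and the quoting fixes together, a sketch of what build-image.sh could look like. Only `algorithm_name=sgl-dev-cu13` comes from the review; the region default, the tag, and the DRY_RUN switch are assumptions so the script can be read and run without pushing anything. `-f Dockerfile` uses the capitalized filename from the case-sensitivity fix in the commit log.

```shell
#!/usr/bin/env bash
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0
set -euo pipefail

algorithm_name="sgl-dev-cu13"                 # from the existing script
region="${AWS_REGION:-us-west-2}"             # assumption: default region
tag="${IMAGE_TAG:-v0.5.12.post1-cu130-efa}"   # hypothetical tag; never "latest"
dry_run="${DRY_RUN:-1}"                       # 1 = print docker commands only

account="${AWS_ACCOUNT_ID:-}"
if [[ -z "${account}" ]]; then
  if [[ "${dry_run}" == "1" ]]; then
    account="000000000000"                    # placeholder for dry runs
  else
    account="$(aws sts get-caller-identity --query Account --output text)"
  fi
fi

registry="${account}.dkr.ecr.${region}.amazonaws.com"
image="${registry}/${algorithm_name}:${tag}"

# Print (dry run) or execute a command; "$@" keeps each argument intact.
run() {
  if [[ "${dry_run}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

if [[ "${dry_run}" != "1" ]]; then
  # With pipefail, a failure in get-login-password fails the whole pipeline,
  # and set -e stops the script before any build or push.
  aws ecr get-login-password --region "${region}" \
    | docker login --username AWS --password-stdin "${registry}"
fi

# Capital-D Dockerfile: a lowercase name fails on case-sensitive filesystems.
run docker build -f Dockerfile -t "${image}" .
run docker push "${image}"
echo "${image}"
```

Under `set -e`, a failed `docker build` now stops the script, so `docker push` can never ship a stale image.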

No `livenessProbe` on the serving deployments

Files: all four serving manifests

They define a readinessProbe but no livenessProbe. A wedged engine (CUDA hang, NIXL stuck in KVPoll.WaitingForInput) stops passing readiness and drops out of rotation, but nothing restarts the pod — it sits Running forever. Add a livenessProbe hitting /health on the foreground engine port.
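A hedged sketch of the probe as a strategic-merge patch, so it can be tried against a running StatefulSet before the manifests are edited. It assumes the engine container is named `sglang` and serves `/health` on port 30000 (adjust to each manifest's foreground engine port). Because weight loading takes minutes (the reviewer's Qwen deploy needed ~6 min to reach Ready), the sketch pairs liveness with a startupProbe so liveness does not kill a pod that is still loading. The thresholds are illustrative: up to 20 min to start, then a restart after ~2.5 min of failed checks.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Strategic-merge patch adding startup + liveness probes to one container.
# Containers merge by "name", so only the named container changes.
probe_patch() {
  local container="$1" port="$2"
  cat <<EOF
{"spec":{"template":{"spec":{"containers":[{
  "name":"${container}",
  "startupProbe":{"httpGet":{"path":"/health","port":${port}},
                  "periodSeconds":10,"failureThreshold":120},
  "livenessProbe":{"httpGet":{"path":"/health","port":${port}},
                   "periodSeconds":30,"timeoutSeconds":10,"failureThreshold":5}
}]}}}}
EOF
}

# Usage (StatefulSet name is hypothetical):
#   kubectl patch statefulset <name> --type strategic \
#     -p "$(probe_patch sglang 30000)"
```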

KeitaW · 2026-06-11T13:25:00Z

+          - '--config.file=/etc/prometheus/prometheus.yml'
+          - '--storage.tsdb.path=/prometheus'
+          - '--storage.tsdb.retention.time=2h'
+          - '--enable-feature=agent'


Runs in server mode, not agent mode as written

prom/prometheus:latest is now 3.12.0, and Prometheus removed --enable-feature=agent in 3.0 (it became --agent). I ran it to confirm: on 3.12 the manifest logs WARN "Unknown option for --enable-feature" option=agent, then Starting Prometheus Server mode=server. So the flag is silently ignored and Prometheus starts as a full server with a local TSDB (written to the emptyDir), not the lightweight agent the file's header describes. It doesn't crash (remote_write still works in server mode), so it looks healthy while doing the wrong thing. Fix the flag and pin a real version (see the Structure batch):

Suggested change

-          - '--enable-feature=agent'
+          - '--agent'

The ConfigMap (scrape_configs + remote_write, no alerting/rules) is agent-mode-compatible. Adjacent notes: replicas: 1 is a SPOF (an HA pair needs cluster and __replica__ external labels so AMP can deduplicate; otherwise AMP ingests and bills both copies, i.e. 2×), and the WAL on emptyDir loses buffered samples on eviction. Both are moot if you adopt a managed path.
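Because the flag name depends on the Prometheus major version, a small helper makes the intent explicit when pinning, plus a check that catches this exact silent failure. This sketch assumes the version string is the image tag (e.g. `v3.12.0`) and relies on the `mode=` field of the startup log quoted above; that agent mode reports `mode=agent` is an assumption to confirm on the pinned release.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Agent-mode flag for a Prometheus version such as "v3.12.0" or "2.53.1".
# 2.x (from 2.32) used a feature flag; 3.0 replaced it with --agent.
agent_flag() {
  local version="${1#v}"
  local major="${version%%.*}"
  if (( major >= 3 )); then
    echo "--agent"
  else
    echo "--enable-feature=agent"
  fi
}

# Fail unless a Prometheus startup log (read from stdin) reports agent mode.
# grep without -q reads all input, so the writer never gets SIGPIPE under pipefail.
assert_agent_mode() {
  if grep 'mode=agent' >/dev/null; then
    return 0
  fi
  echo "Prometheus is not running in agent mode" >&2
  return 1
}

# Usage (Deployment name is hypothetical):
#   kubectl logs deploy/<prometheus-agent> | assert_agent_mode
```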

KeitaW · 2026-06-11T13:25:00Z

+          capabilities:
+            add: ["SYS_NICE"]
+        env:
+        - name: NIXL_LOG_LEVEL


Benign "use LIBFABRIC" warning — worth one README line (nit)

Not a correctness issue. No SGLANG_DISAGGREGATION_NIXL_BACKEND is set, so NIXL uses UCX, which is correct here (single-node intra-node PD; the KV hop never crosses EFA). When I ran this on a B300 node it served correctly, but the engine logs "16 Amazon EFA(s) were detected, but the UCX backend was configured ... recommended to use the LIBFABRIC backend instead." A one-line README note ("intra-node PD uses UCX; the EFA-detected warning is expected/benign") saves the next person from chasing it. No manifest change needed.

KeitaW · 2026-06-11T13:25:00Z

+      containers:
+      - name: sglang
+        image: <YOUR_ECR_IMAGE>   # the URI build-image.sh printed, e.g. <account>.dkr.ecr.<region>.amazonaws.com/sgl-dev-cu13:<tag>
+        imagePullPolicy: Always


imagePullPolicy: Always forces a pull on every start

Both the prefill and decode StatefulSets use Always (also line 211), which re-pulls a multi-GB CUDA image on every pod start and breaks air-gapped clusters. The image is a pinned ECR tag, so IfNotPresent is right.

Suggested change

-        imagePullPolicy: Always
+        imagePullPolicy: IfNotPresent

KeitaW

Review Batch 3/5 — Infrastructure, NCCL & Container Security

Container-security and NCCL batch. One inline comment (DCGM root rationale); the rest below.

`privileged: true` on all SGLang serving containers

Files: dsv4pro-deploy.yaml (54), qwen-pd-deploy.yaml (58), both Kimi StatefulSets

Every engine container runs fully privileged — all capabilities, host device access, isolation off. The workload needs some elevation (EFA/RDMA pinned memory, GPUDirect via gdrdrv, SYS_NICE), but privileged: true is a much bigger hammer. I'd scope to the specific capabilities (IPC_LOCK, SYS_NICE) plus the device mounts already declared, and drop privileged. If full privilege is genuinely required for the NIXL/EFA path, a one-line comment stating why makes it a deliberate, reviewable choice.
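A sketch of the scoped-down securityContext as a strategic-merge patch, assuming the engine container is named `sglang` and that the EFA/GPU device mounts the manifests already declare stay in place. Whether IPC_LOCK plus SYS_NICE is enough for the NIXL/EFA path still has to be validated on a real node; if it is not, that finding is the rationale worth recording next to `privileged: true`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Strategic-merge patch: disable privileged mode, add only the named capabilities.
security_patch() {
  local container="$1"; shift
  local caps="" cap
  for cap in "$@"; do
    caps+="${caps:+,}\"${cap}\""   # comma-separate after the first entry
  done
  cat <<EOF
{"spec":{"template":{"spec":{"containers":[{
  "name":"${container}",
  "securityContext":{"privileged":false,"capabilities":{"add":[${caps}]}}
}]}}}}
EOF
}

# Usage (StatefulSet/Deployment name is hypothetical):
#   kubectl patch statefulset <name> --type strategic \
#     -p "$(security_patch sglang IPC_LOCK SYS_NICE)"
```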

`NCCL_SOCKET_IFNAME` is unset (minor, multi-node Kimi only)

File: kimi2.6-h200-1p1d/kimi-pd-deploy.yaml

Moot for single-node (TP on NVLink) and the cross-node KV hop uses NIXL not NCCL — low priority. But for the 2-node Kimi setup, if NCCL ever initializes a cross-node control channel, the default interface pick can land on the wrong NIC. Cheap guard: set NCCL_SOCKET_IFNAME=^lo (exclusion form, never positive selection like eth0). See the EFA cheatsheet.
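Applied to the two Kimi StatefulSets, the guard could look like this sketch. The StatefulSet names are hypothetical; the container name `sglang` matches the Kimi diff hunk quoted in Batch 2. `kubectl set env` edits the pod template and therefore rolls the pods; adding the same env entry to kimi-pd-deploy.yaml is the durable fix.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical StatefulSet names for the Kimi prefill and decode tiers.
STATEFULSETS=(kimi-prefill kimi-decode)
# Exclusion form: any interface except loopback (never a positive eth0 pick).
IFNAME_VALUE='^lo'

for sts in "${STATEFULSETS[@]}"; do
  cmd=(kubectl set env "statefulset/${sts}" -c sglang "NCCL_SOCKET_IFNAME=${IFNAME_VALUE}")
  # Print by default; set DRY_RUN=0 to apply.
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
done
```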

KeitaW · 2026-06-11T13:25:04Z

+          hostPort: 9400
+        securityContext:
+          runAsNonRoot: false
+          runAsUser: 0


DCGM runs as root — add a one-line rationale

runAsNonRoot: false + runAsUser: 0 is the standard DCGM requirement (host GPU/driver access), so this is likely fine — but the checklist asks for a rationale comment so a future reader doesn't "fix" it. A short # DCGM requires root for host GPU access above the securityContext does it.

KeitaW

Review Batch 4/5 — Documentation Consistency

Documentation batch (no inline comments — both items are PR-level).

PR description lists a CloudFormation change that isn't in the diff

The PR body's change #5 says "repoint 7 CloudFormation 1-click deploy templates at the renamed awsome-distributed-ai S3 bucket," but the diff touches only files under examples/inference/sglang/ (15 files) — no CloudFormation templates. Either that work was dropped from this branch or belongs to a different PR; I'd reconcile the description with the diff so reviewers and the changelog aren't misled.
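One quick way to reconcile the description with the branch is to list the changed paths that fall outside the sample directory. A sketch, assuming the base branch has been fetched as `origin/worktree-repo-reorg`; the helper reads paths on stdin so it can be fed from `git diff --name-only`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print changed paths (one per line on stdin) that fall outside the given prefix.
paths_outside() {
  local prefix="$1" path
  while IFS= read -r path; do
    [[ -z "${path}" || "${path}" == "${prefix}"* ]] || echo "${path}"
  done
}

# Usage:
#   git diff --name-only origin/worktree-repo-reorg...HEAD \
#     | paths_outside examples/inference/sglang/
# Empty output means the branch touches nothing outside the sample directory,
# i.e. no CloudFormation templates.
```

The three-dot range compares against the merge base, which matches what the PR's Files tab shows.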

Base branch is `worktree-repo-reorg`, not `main`

This PR targets worktree-repo-reorg. If that's intentional stacking onto an in-flight reorg, all good — just worth confirming the merge target is what you want.

KeitaW

Review Batch 5/5 — Evaluation, Positives & Sources

Final batch — evaluation framing, what's great, and the source list.

Placeholder benchmark tables — framing is correct, just don't quote them yet

The per-model READMEs ship TBD latency/throughput tables but explicitly label them "Not yet measured" and the Kimi example "[draft-1]." That honest framing is exactly right — this is "validates the deployment runs," not "reproduces a published result," so no methodology obligation is triggered. Just make sure the TBD numbers aren't quoted downstream until filled. My e2e run confirms the Qwen example serves correctly end-to-end.

Things That Look Great

Directory placement is correct — examples/inference/sglang/<model>/ follows the RFC #1056 "library is the demo subject" rule, with shared helpers one level up.
Excellent shared-helper reuse — download-model.sh, the DCGM daemonset, and the AMP monitoring are factored out and reused across all three examples; extracting the inlined Prometheus block into shared manifests was the right call.
Exemplary engine-image pinning with rationale — lmsysorg/sglang:v0.5.12.post1-cu130 is pinned and the README explains why (NIXL 1.1.0 vs the nightly's 1.2.0).
download-model.sh / setup-amp-monitoring.sh have the shebang, MIT-0 header, set -euo pipefail, and a properly whitelisted envsubst.
Honest results framing — "Not yet measured" / "[draft-1]" instead of fabricated numbers.
Thorough READMEs — prerequisites, quick start, monitoring, and a genuinely useful "known issues" section (the DCGM nvidia.com/gpu.present node-label caveat is a real gotcha).
It works — I deployed qwen3.5-27b-b300-intra-pd (adapted for plain EKS) on a real B300 node; it came up 2/2 Ready in ~6 min and served a correct completion through the PD router. The intra-node prefill/decode + NIXL topology is sound.

Sources

Kubernetes: node affinity, probes, imagePullPolicy, SecurityContext, DaemonSet, Job, initContainers.
AWS: AMP ingest, managed scraper, HyperPod observability add-on, ADOT on EKS, FSx for Lustre CSI.
Prometheus: agent mode feature flag, 3.0 migration. Verified live: prom/prometheus:latest == 3.12.0 → mode=server on --enable-feature=agent (2026-06-11).
Repo precedent: eks-managed-observability, dsv3-uccl-nixl, download-Job precedents under examples/training/{verl,nemo-rl,optimum-neuron}.

Several findings here are evidence-backed by a live B300 deploy and a prom/prometheus:latest flag test, plus dedicated research on weight-staging and observability patterns.

KeitaW

Few comments

Li and others added 4 commits June 9, 2026 08:06

Add SGLang inference examples for HyperPod EKS [draft-1]

d656d07

KeitaW self-requested a review June 11, 2026 01:51

KeitaW reviewed Jun 11, 2026


KeitaW requested changes Jun 11, 2026

ybalbert001 wants to merge 4 commits into awslabs:worktree-repo-reorg from ybalbert001:worktree-repo-reorg
