test(mlx): add MLX backend E2E tests + macos-latest CI workflow by key4ng · Pull Request #1398 · lightseekorg/smg

key4ng · 2026-04-27T03:08:11Z

Description

Problem

#1099 added the MLX gRPC servicer (grpc_servicer/smg_grpc_servicer/mlx/). That PR was deliberately split — servicer-only — to keep the review focused. This PR follows up with the E2E test suite and CI workflow.

Solution

Five Apple-Silicon-only E2E tests exercising the SMG router → gRPC → MLX servicer pipeline, plus a macos-latest GitHub Actions workflow that runs them on PR.

Changes

Tests (`e2e_test/chat_completions/test_mlx_backend.py`)

Test	What it covers
`test_basic_chat`	Non-streaming chat completion through the full pipeline
`test_streaming_chat`	Streaming yields incremental delta chunks
`test_tool_calling`	Qwen3 native `<tool_call>` tags + SMG's `qwen` ToolParser
`test_reasoning_thinking_mode`	Qwen3 thinking mode populates `reasoning_content`
`test_max_tokens_finish_reason`	`max_tokens` reached → `finish_reason="length"`

Module-level skip on non-Apple-Silicon (sys.platform != "darwin" or platform.machine() != "arm64").

Test infra plumbing (`e2e_test/infra/`)

constants.py — Runtime.MLX, is_mlx() helper, RUNTIME_LABELS["mlx"], MLX added to LOCAL_RUNTIMES.
__init__.py — re-exports is_mlx.
model_specs.py — entry for mlx-community/Qwen3-0.6B-4bit (~400 MB; smallest model that combines native tool calling + thinking mode).
worker.py — _build_mlx_cmd(). gRPC-only (rejects HTTP — the MLX servicer doesn't expose an HTTP API).

CI (`.github/workflows/pr-test-mlx.yml`)

Runs on macos-latest (Apple Silicon) GitHub-hosted runners.

Triggers only on MLX-related path changes
Caches Rust target (Swatinem/rust-cache) and HF model bundle (actions/cache)
Uses maturin build --profile ci for SMG Python bindings (faster than release, fast enough for correctness tests)
Estimated: first run ~12-15 min, cached runs ~4-5 min

Test Plan

# Local on Apple Silicon
E2E_RUNTIME=mlx pytest e2e_test/chat_completions/test_mlx_backend.py -v

5/5 passes locally against the MLX servicer already on main.

Checklist

cargo +nightly fmt passes (no Rust changes)
cargo clippy --all-targets --all-features -- -D warnings passes (no Rust changes)
E2E test verified locally
(Optional) Documentation updated
(Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

New Features
- Added MLX as a supported inference runtime and included an MLX-only model in the test matrix.
Tests
- Added macOS Apple Silicon end-to-end tests for the MLX gRPC backend covering chat, streaming, function/tool calling, reasoning/thinking mode, and token-limit behavior.
Chores
- Added a dedicated CI workflow to run the MLX E2E tests on macOS runners with artifact capture on failure.

gemini-code-assist · 2026-04-27T03:08:16Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

coderabbitai · 2026-04-27T03:08:24Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Introduces MLX (a local inference runtime) support to the end-to-end testing infrastructure by adding a dedicated CI workflow for macOS, extending runtime constants and predicates, updating model specifications with a Qwen3 model, wiring MLX gRPC command generation in the worker, and implementing comprehensive E2E tests validating chat, streaming, tool calling, reasoning, and token truncation behavior.

Changes

MLX E2E Addition

Layer / File(s)	Summary
Runtime constants & exports `e2e_test/infra/constants.py`, `e2e_test/infra/__init__.py`	Add `Runtime.MLX`, include it in `LOCAL_RUNTIMES`, add `is_mlx()` predicate, expand `RUNTIME_LABELS`, and re-export `is_mlx` via `e2e_test.infra`.
Model specs `e2e_test/infra/model_specs.py`	Register `mlx-community/Qwen3-0.6B-4bit` model spec with resolved model path, `tp=1`, and features: `chat`, `streaming`, `function_calling`, `reasoning`, `thinking`.
Worker & Command Building `e2e_test/infra/worker.py`	Add `engine=="mlx"` branch and `_build_mlx_cmd` enforcing GRPC-only mode, building `smg_grpc_servicer.mlx.server` invocation and applying optional `spec["mlx_args"]`.
MLX E2E Test Suite `e2e_test/mlx/test_mlx_backend.py`	Add Apple Silicon–gated tests with `WEATHER_TOOL` and `TestMlxBackend` covering non-streaming chat, streaming, tool calling (JSON args), thinking-mode, and max-token truncation checks.
CI Workflow `.github/workflows/pr-test-mlx.yml`	New GitHub Actions workflow to run MLX E2E on `macos-latest`: sets up Python 3.12 and Rust, installs build tools, builds/install SMG bindings, generates Python types, runs import checks and targeted pytest, and uploads `e2e-logs/` on failure/cancel with concurrency control.

Sequence Diagram

sequenceDiagram
    participant Test as Test Suite
    participant Worker as Worker
    participant Builder as Command Builder
    participant gRPC as gRPC Server
    participant MLX as MLX Engine

    Test->>Worker: initialize(engine="mlx", model_path, grpc)
    Worker->>Builder: _build_cmd("mlx", mode=GRPC)
    Builder->>Builder: _build_mlx_cmd(model_path, host, port)
    Builder-->>Worker: command ready
    Worker->>gRPC: spawn server process
    gRPC->>MLX: load model(Qwen3-0.6B-4bit)
    MLX-->>gRPC: model loaded
    gRPC-->>Worker: server listening
    Test->>gRPC: send chat request
    gRPC->>MLX: process request
    MLX-->>gRPC: return response
    gRPC-->>Test: return chat/stream response
    Test->>Test: validate output (tokens, finish_reason)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

lightseekorg/smg#1099: Overlapping MLX support changes across CI, runtime enums, model specs, worker wiring, and E2E tests.
feat: add Anthropic Messages API e2e tests #358: Adds a new runtime and matching E2E infra changes (similar pattern to this runtime addition).
lightseekorg/smg#411: Modifies MODEL_SPECS entries—related to model registration changes.

Suggested reviewers

slin1237
XinyueZhang369
CatherineSue

Poem

🐰 A tiny rabbit hops to see,
MLX tests on macOS tree,
gRPC whispers, models stride,
Tools and thinking verified,
Logs saved, CI runs with glee.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding MLX backend E2E tests and a macos-latest CI workflow, which directly corresponds to the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch keyang/mlx-e2e-tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2026-04-27T03:12:02Z

        elif self.engine == "trtllm":
            return self._build_trtllm_cmd(model_path, tp_size, spec)
+        elif self.engine == "mlx":
+            return self._build_mlx_cmd(model_path, spec)


🟡 Nit: _build_env (line 328) unconditionally sets CUDA_VISIBLE_DEVICES for every engine, including MLX which runs on Apple Metal — not CUDA. It's harmless on macOS (the var is simply ignored), but if you want to keep things tidy you could skip it for MLX:

if self.engine != "mlx": env["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, self.gpu_ids))

Not blocking — just a readability thing for the next person who reads _build_env.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pr-test-mlx.yml:
- Around line 6-16: The workflow's paths filter in pr-test-mlx.yml is missing
the e2e_test/infra/__init__.py file which allows changes there to skip the MLX
CI; update the "paths" array in that workflow to include
e2e_test/infra/__init__.py (or better, add the broader pattern
e2e_test/infra/**) so changes to the package initializer don't bypass the job.

In `@e2e_test/chat_completions/test_mlx_backend.py`:
- Around line 9-10: Update the runner wording in the comment that currently
reads "macos-14" to "macos-latest" so it matches the workflow configuration;
locate the string "macos-14" in test_mlx_backend.py (the comment referencing CI
runner) and replace it with "macos-latest" and optionally mention the workflow
file ".github/workflows/pr-test-mlx.yml" for clarity.

In `@e2e_test/infra/worker.py`:
- Around line 181-183: Update any docstrings and inline comments that enumerate
supported engine values to include the new "mlx" option: specifically update the
Worker.engine comment/annotation and the start_workers function argument docs to
mention "mlx" alongside existing engines (e.g., "local", "remote", etc.); search
for other user-facing comments or README sections that list engine options and
add "mlx" there so the documentation stays consistent with the new branch that
calls _build_mlx_cmd.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ec056812-4d40-44e7-9b43-d1f26cde1499

📥 Commits

Reviewing files that changed from the base of the PR and between e4d4452 and 987e121.

📒 Files selected for processing (6)

.github/workflows/pr-test-mlx.yml
e2e_test/chat_completions/test_mlx_backend.py
e2e_test/infra/__init__.py
e2e_test/infra/constants.py
e2e_test/infra/model_specs.py
e2e_test/infra/worker.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 987e121b39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-27T03:14:04Z

+      - "e2e_test/infra/model_specs.py"
+      - "e2e_test/infra/worker.py"
+      - "e2e_test/infra/constants.py"


Add infra init to MLX workflow path filters

The new workflow watches several MLX-related E2E infra files but omits e2e_test/infra/__init__.py, even though this commit changes that module and e2e_test/conftest.py imports from infra at startup. A future PR that only updates infra/__init__.py can break MLX test startup/import resolution without triggering this workflow, so regressions can merge untested.

Useful? React with 👍 / 👎.

coderabbitai

♻️ Duplicate comments (1)

e2e_test/chat_completions/test_mlx_backend.py (1)
9-10: ⚠️ Potential issue | 🟡 Minor

Align CI runner wording with workflow configuration.

Line 9 still says macos-14, while this PR’s workflow uses macos-latest. Please keep them consistent.
Suggested patch
-CI runs on macos-14 (Apple Silicon) with the smallest Qwen3 model (~400 MB).
+CI runs on macos-latest (Apple Silicon) with the smallest Qwen3 model (~400 MB).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e_test/chat_completions/test_mlx_backend.py` around lines 9 - 10, Update
the CI runner wording in the test file to match the workflow: replace the string
"macos-14" with "macos-latest" in e2e_test/chat_completions/test_mlx_backend.py
(the comment at the top referencing the CI runner) so the file text aligns with
.github/workflows/pr-test-mlx.yml.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@e2e_test/chat_completions/test_mlx_backend.py`:
- Around line 9-10: Update the CI runner wording in the test file to match the
workflow: replace the string "macos-14" with "macos-latest" in
e2e_test/chat_completions/test_mlx_backend.py (the comment at the top
referencing the CI runner) so the file text aligns with
.github/workflows/pr-test-mlx.yml.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 84db8469-5a0c-490c-9df3-e7dd9eec889c

📥 Commits

Reviewing files that changed from the base of the PR and between 987e121 and 54db5f4.

📒 Files selected for processing (1)

e2e_test/chat_completions/test_mlx_backend.py

CatherineSue · 2026-04-29T00:31:55Z

@@ -0,0 +1,165 @@
+"""MLX backend E2E tests (Apple Silicon only).


I'm not sure about the file location here. Our chat completions is pretty self contained to all backends at the moment. Introducing a new file here is kind orthogonal to the current structure.

Good call — moved in 1fca167. The file now lives at e2e_test/mlx/test_mlx_backend.py, mirroring the existing concern-scoped sibling dirs e2e_test/router/ and e2e_test/bindings_go/. Folding into the existing chat_completions/ engine parameterization would have required Apple-Silicon-specific infra surgery (different model, different runner, no GPU) that doesn't fit the @pytest.mark.engine("sglang", "vllm", "trtllm") shape, so a backend-scoped sibling dir is the right structural fix. The CI workflow path filters and the pytest invocation in .github/workflows/pr-test-mlx.yml were updated to the new path; module docstring carries the rationale for future readers.

@CatherineSue

Addresses @CatherineSue's review on PR #1398: e2e_test/chat_completions/ is organized by API surface (basic chat, function-calling, multimodal, structured-output, validation, …) and each file parameterizes over engines via @pytest.mark.engine("sglang", "vllm", "trtllm"). Dropping test_mlx_backend.py in there was orthogonal to that structure — the file is scoped to one backend that bundles several API behaviors, not one API behavior across engines. Move it to a sibling concern-scoped directory `e2e_test/mlx/`, mirroring the existing precedents `e2e_test/router/` and `e2e_test/bindings_go/` which are also top-level dirs scoped by concern rather than API surface. The MLX tests can't fold into the chat_completions/ engine parameterization without significant infra surgery anyway: they require Apple Silicon macOS, a different model (mlx-community/Qwen3-0.6B-4bit vs meta-llama/Llama-3.1-8B-Instruct), and a different runner. The CI is already separated via .github/workflows/pr-test-mlx.yml on macos-latest, so the test file's location should match its scope. Changes: - git mv e2e_test/chat_completions/test_mlx_backend.py e2e_test/mlx/test_mlx_backend.py - Add e2e_test/mlx/__init__.py - Update .github/workflows/pr-test-mlx.yml path filters and the pytest invocation to point at the new location - Module docstring updated with the rationale for living outside the API-surface dirs and the corrected `pytest e2e_test/mlx/...` command No code or test logic changes — only file location and path references. Signed-off-by: key4ng <rukeyang@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57a65e791c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-30T02:58:17Z

+        content = msg.content or ""
+        reasoning = getattr(msg, "reasoning_content", None) or ""
+        full = content + reasoning
+        assert "3" in full, (


Assert reasoning output in thinking-mode coverage

This test is intended to verify that MLX thinking mode populates reasoning_content, but the current assertion only checks whether '3' appears in content + reasoning. If the reasoning parser/regression causes reasoning_content to be empty (or never emitted), the test can still pass as long as the final answer includes 3, so CI would miss exactly the failure this case is supposed to catch.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

.github/workflows/pr-test-mlx.yml (1)

6-16: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Path filters still miss MLX infra initializer changes.

e2e_test/infra/__init__.py changes can still bypass this workflow. Please include it (or broaden to e2e_test/infra/**) in both trigger blocks.

Suggested trigger fix

   push:
     branches: [main]
     paths:
+      - "e2e_test/infra/__init__.py"
       - "crates/grpc_client/proto/mlx_engine.proto"
       - "crates/grpc_client/src/mlx_engine.rs"
       - "crates/grpc_client/python/**"
       - "grpc_servicer/smg_grpc_servicer/mlx/**"
       - "grpc_servicer/pyproject.toml"
       - "e2e_test/mlx/test_mlx_backend.py"
       - "e2e_test/infra/model_specs.py"
       - "e2e_test/infra/worker.py"
       - "e2e_test/infra/constants.py"
       - ".github/workflows/pr-test-mlx.yml"
   pull_request:
     branches: [main]
     types: [opened, synchronize, reopened]
     paths:
+      - "e2e_test/infra/__init__.py"
       - "crates/grpc_client/proto/mlx_engine.proto"
       - "crates/grpc_client/src/mlx_engine.rs"
       - "crates/grpc_client/python/**"
       - "grpc_servicer/smg_grpc_servicer/mlx/**"
       - "grpc_servicer/pyproject.toml"
       - "e2e_test/mlx/test_mlx_backend.py"
       - "e2e_test/infra/model_specs.py"
       - "e2e_test/infra/worker.py"
       - "e2e_test/infra/constants.py"
       - ".github/workflows/pr-test-mlx.yml"

Also applies to: 20-30

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.github/workflows/pr-test-mlx.yml around lines 6 - 16, The workflow's path
filters miss changes to e2e_test/infra/__init__.py so update the workflow
triggers in .github/workflows/pr-test-mlx.yml: in both trigger blocks (push and
pull_request) include either the specific file "e2e_test/infra/__init__.py" or
broaden the rule to "e2e_test/infra/**" so modifications to the infra
initializer cannot bypass the MLX test workflow.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pr-test-mlx.yml:
- Around line 66-119: Add a cache step for the Hugging Face model cache
directory before the "Run MLX E2E tests" step: create a step named like "Cache
Hugging Face models" using actions/cache@v4 that caches ~/.cache/huggingface
(and subdirs like hub and transformers), with a stable key (e.g. "${{ runner.os
}}-hf-${{ hashFiles('**/requirements*.txt') }}-${{ github.ref }}" ) and sensible
restore-keys; ensure the job sets HF_HOME or exports the same cache path so the
MLX test run (the "Run MLX E2E tests" step) and earlier verification steps
("Verify imports") use the cached directory. This will reduce network fetches
and CI timeout variance when the MLX E2E test downloads HF models.

In `@e2e_test/mlx/test_mlx_backend.py`:
- Around line 131-155: The test test_reasoning_thinking_mode is too lax because
it accepts the digit "3" from either msg.content or msg.reasoning_content;
instead explicitly assert that msg.reasoning_content is present and contains the
reasoning/steps and the expected answer. Update the test to (1) read reasoning =
getattr(msg, "reasoning_content", None) and assert reasoning is not None and not
empty, and (2) assert "3" (or the expected final answer) appears in reasoning
(in addition to any existing check on content/full if you want), so failures
surface if reasoning extraction breaks.

---

Duplicate comments:
In @.github/workflows/pr-test-mlx.yml:
- Around line 6-16: The workflow's path filters miss changes to
e2e_test/infra/__init__.py so update the workflow triggers in
.github/workflows/pr-test-mlx.yml: in both trigger blocks (push and
pull_request) include either the specific file "e2e_test/infra/__init__.py" or
broaden the rule to "e2e_test/infra/**" so modifications to the infra
initializer cannot bypass the MLX test workflow.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9a96857c-82e0-4a99-a0c2-c6c0c3cc8bfd

📥 Commits

Reviewing files that changed from the base of the PR and between 18fc0ca and 57a65e7.

📒 Files selected for processing (3)

.github/workflows/pr-test-mlx.yml
e2e_test/mlx/__init__.py
e2e_test/mlx/test_mlx_backend.py

@CatherineSue

Addresses @CatherineSue's review on PR #1398: e2e_test/chat_completions/ is organized by API surface (basic chat, function-calling, multimodal, structured-output, validation, …) and each file parameterizes over engines via @pytest.mark.engine("sglang", "vllm", "trtllm"). Dropping test_mlx_backend.py in there was orthogonal to that structure — the file is scoped to one backend that bundles several API behaviors, not one API behavior across engines. Move it to a sibling concern-scoped directory `e2e_test/mlx/`, mirroring the existing precedents `e2e_test/router/` and `e2e_test/bindings_go/` which are also top-level dirs scoped by concern rather than API surface. The MLX tests can't fold into the chat_completions/ engine parameterization without significant infra surgery anyway: they require Apple Silicon macOS, a different model (mlx-community/Qwen3-0.6B-4bit vs meta-llama/Llama-3.1-8B-Instruct), and a different runner. The CI is already separated via .github/workflows/pr-test-mlx.yml on macos-latest, so the test file's location should match its scope. Changes: - git mv e2e_test/chat_completions/test_mlx_backend.py e2e_test/mlx/test_mlx_backend.py - Add e2e_test/mlx/__init__.py - Update .github/workflows/pr-test-mlx.yml path filters and the pytest invocation to point at the new location - Module docstring updated with the rationale for living outside the API-surface dirs and the corrected `pytest e2e_test/mlx/...` command No code or test logic changes — only file location and path references. Signed-off-by: key4ng <rukeyang@gmail.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

e2e_test/infra/constants.py (1)

55-70: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update runtime docs/comments to include mlx.

After adding MLX runtime support, these docs still enumerate only older values, which is misleading for contributors and fixture users.

Suggested doc fix

-ENV_RUNTIME = "E2E_RUNTIME"  # Runtime for gRPC tests: "sglang", "vllm", or "trtllm"
+ENV_RUNTIME = "E2E_RUNTIME"  # Runtime for gRPC tests: "sglang", "vllm", "trtllm", or "mlx"

 def get_runtime() -> str:
-    """Get the current test runtime (sglang or vllm).
+    """Get the current test runtime.
@@
-        Runtime name from E2E_RUNTIME environment variable, defaults to "sglang".
+        Runtime name from E2E_RUNTIME environment variable, defaults to "sglang".
     """

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e_test/infra/constants.py` around lines 55 - 70, The comment/docs around
runtime detection only list "sglang" and "vllm" but MLX was added; update the
strings and docstrings to include "mlx" everywhere: amend the inline comment on
ENV_RUNTIME (currently "sglang", "vllm", or "trtllm") to mention "mlx", and
update the get_runtime() docstring return description to list "sglang", "vllm",
and "mlx" (and any other supported values) so the constant ENV_RUNTIME and
function get_runtime() accurately reflect the new runtime option.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@e2e_test/infra/constants.py`:
- Around line 55-70: The comment/docs around runtime detection only list
"sglang" and "vllm" but MLX was added; update the strings and docstrings to
include "mlx" everywhere: amend the inline comment on ENV_RUNTIME (currently
"sglang", "vllm", or "trtllm") to mention "mlx", and update the get_runtime()
docstring return description to list "sglang", "vllm", and "mlx" (and any other
supported values) so the constant ENV_RUNTIME and function get_runtime()
accurately reflect the new runtime option.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 690018c3-8e54-418e-b50a-9060fe609ed0

📥 Commits

Reviewing files that changed from the base of the PR and between 57a65e7 and f7a6a29.

📒 Files selected for processing (7)

.github/workflows/pr-test-mlx.yml
e2e_test/infra/__init__.py
e2e_test/infra/constants.py
e2e_test/infra/model_specs.py
e2e_test/infra/worker.py
e2e_test/mlx/__init__.py
e2e_test/mlx/test_mlx_backend.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f7a6a2907b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-08T03:22:44Z

+      - "e2e_test/mlx/test_mlx_backend.py"
+      - "e2e_test/infra/model_specs.py"
+      - "e2e_test/infra/worker.py"
+      - "e2e_test/infra/constants.py"


Include MLX fixture files in path filters

The MLX test module is run through e2e_test/conftest.py and the setup_backend/api_client fixtures in e2e_test/fixtures/setup_backend.py, but this new workflow only watches the test file plus a few infra modules. A PR that changes those shared fixtures can break MLX startup/client wiring without triggering this macOS workflow; I checked the existing GPU workflow filters and they already treat e2e_test/conftest.py and e2e_test/fixtures/** as common E2E inputs, but this new workflow does not.

Useful? React with 👍 / 👎.

Adds the full E2E test surface for the MLX gRPC servicer that landed in #1099. Split off from #1099 to keep that review focused on the servicer itself. What's new ---------- - e2e_test/chat_completions/test_mlx_backend.py — 5 tests covering basic chat, streaming, tool calling (Qwen3 native <tool_call>), reasoning_content (Qwen3 thinking mode), and max_tokens finish reason. Module skips on non-Apple-Silicon hosts. - .github/workflows/pr-test-mlx.yml — runs the test suite on macos-latest (Apple Silicon) with HF model + Cargo target caching. Triggers only on MLX-related path changes. Plumbed through existing infra ------------------------------ - e2e_test/infra/constants.py: Runtime.MLX, is_mlx() helper, RUNTIME_LABELS entry, MLX added to LOCAL_RUNTIMES. - e2e_test/infra/__init__.py: re-exports is_mlx. - e2e_test/infra/model_specs.py: mlx-community/Qwen3-0.6B-4bit spec (~400 MB; smallest model that has both tool calling and thinking mode for combined coverage). - e2e_test/infra/worker.py: _build_mlx_cmd() — gRPC-only (rejects HTTP since the MLX servicer doesn't expose one). Test plan --------- Verified locally on Apple Silicon: 5/5 pass against the MLX servicer already on main. Signed-off-by: key4ng <rukeyang@gmail.com>

Signed-off-by: key4ng <rukeyang@gmail.com>

Qwen3-0.6B-4bit is ~400 MB and downloads in ~30s on the GitHub runner's network — not worth a separate cache entry against the shared 10 GB repo cache budget. The Rust cache is the dominant slow path; HF download is in the noise. Signed-off-by: key4ng <rukeyang@gmail.com>

@CatherineSue

Addresses @CatherineSue's review on PR #1398: e2e_test/chat_completions/ is organized by API surface (basic chat, function-calling, multimodal, structured-output, validation, …) and each file parameterizes over engines via @pytest.mark.engine("sglang", "vllm", "trtllm"). Dropping test_mlx_backend.py in there was orthogonal to that structure — the file is scoped to one backend that bundles several API behaviors, not one API behavior across engines. Move it to a sibling concern-scoped directory `e2e_test/mlx/`, mirroring the existing precedents `e2e_test/router/` and `e2e_test/bindings_go/` which are also top-level dirs scoped by concern rather than API surface. The MLX tests can't fold into the chat_completions/ engine parameterization without significant infra surgery anyway: they require Apple Silicon macOS, a different model (mlx-community/Qwen3-0.6B-4bit vs meta-llama/Llama-3.1-8B-Instruct), and a different runner. The CI is already separated via .github/workflows/pr-test-mlx.yml on macos-latest, so the test file's location should match its scope. Changes: - git mv e2e_test/chat_completions/test_mlx_backend.py e2e_test/mlx/test_mlx_backend.py - Add e2e_test/mlx/__init__.py - Update .github/workflows/pr-test-mlx.yml path filters and the pytest invocation to point at the new location - Module docstring updated with the rationale for living outside the API-surface dirs and the corrected `pytest e2e_test/mlx/...` command No code or test logic changes — only file location and path references. Signed-off-by: key4ng <rukeyang@gmail.com>

Compress the rationale from 22 lines to 9. Keeps the essential signal (why this lives in e2e_test/mlx/ rather than chat_completions/, the local run command, and the CI workflow pointer) and drops the prose. Signed-off-by: key4ng <rukeyang@gmail.com>

…infra/__init__.py path-filter gap * Wire `--reasoning-parser qwen3` into the MLX test gateway. Without it, the `<think>...</think>` block lands in content, so any assertion on reasoning_content was vacuously satisfied. * Make `test_reasoning_thinking_mode` actually test what its docstring claims: pass `enable_thinking=True` explicitly and assert `reasoning_content` is non-empty in addition to the existing answer-correctness check ("3" in content+reasoning). * Add `e2e_test/infra/__init__.py` to push/pull_request paths in pr-test-mlx.yml. This PR already modifies that file (adds is_mlx) and conftest.py imports from infra at startup, so future __init__.py-only edits could break MLX import resolution without triggering this workflow. Signed-off-by: key4ng <rukeyang@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d84ae03c62

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-14T05:02:36Z

+      - "crates/grpc_client/proto/mlx_engine.proto"
+      - "crates/grpc_client/src/mlx_engine.rs"
+      - "crates/grpc_client/python/**"
+      - "grpc_servicer/smg_grpc_servicer/mlx/**"


Include router-side MLX paths in workflow trigger

When a PR only changes the MLX branches in the router, e.g. model_gateway/src/routers/grpc/client.rs or proto_wrapper.rs, this workflow will not run because the pull_request.paths list here only covers the proto/client package, servicer, and E2E infra. I checked the regular PR GPU workflow and its reusable E2E matrix is for sglang, vllm, and trtllm, so those MLX router paths are not exercised there either; an MLX-specific routing regression can therefore merge without the new Apple Silicon E2E job.

Useful? React with 👍 / 👎.

key4ng requested review from CatherineSue, XinyueZhang369 and slin1237 as code owners April 27, 2026 03:08

github-actions Bot added ci CI/CD configuration changes tests Test changes labels Apr 27, 2026

claude Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread e2e_test/infra/model_specs.py

claude Bot approved these changes Apr 27, 2026

View reviewed changes

coderabbitai Bot requested changes Apr 27, 2026

View reviewed changes

Comment thread .github/workflows/pr-test-mlx.yml

Comment thread e2e_test/chat_completions/test_mlx_backend.py Outdated

Comment thread e2e_test/infra/worker.py

chatgpt-codex-connector Bot reviewed Apr 27, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

CatherineSue reviewed Apr 29, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 30, 2026

View reviewed changes

coderabbitai Bot requested changes Apr 30, 2026

View reviewed changes

Comment thread .github/workflows/pr-test-mlx.yml

Comment thread e2e_test/mlx/test_mlx_backend.py

key4ng force-pushed the keyang/mlx-e2e-tests branch from 57a65e7 to f7a6a29 Compare May 8, 2026 03:16

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 8, 2026

View reviewed changes

key4ng added 6 commits May 13, 2026 21:59

style(mlx): apply ruff format from pre-commit

7a7fbf4

Signed-off-by: key4ng <rukeyang@gmail.com>

key4ng force-pushed the keyang/mlx-e2e-tests branch from 7701a1b to d84ae03 Compare May 14, 2026 04:59

chatgpt-codex-connector Bot reviewed May 14, 2026

View reviewed changes

		@@ -0,0 +1,165 @@
		"""MLX backend E2E tests (Apple Silicon only).

Conversation

key4ng commented Apr 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Problem

Solution

Changes

Tests (e2e_test/chat_completions/test_mlx_backend.py)

Test infra plumbing (e2e_test/infra/)

CI (.github/workflows/pr-test-mlx.yml)

Test Plan

Summary by CodeRabbit

Uh oh!

gemini-code-assist Bot commented Apr 27, 2026

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

claude Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

key4ng May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

CatherineSue Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

key4ng Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

key4ng May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

key4ng commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

Tests (`e2e_test/chat_completions/test_mlx_backend.py`)

Test infra plumbing (`e2e_test/infra/`)

CI (`.github/workflows/pr-test-mlx.yml`)

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading