Skip to content

test(mlx): add MLX backend E2E tests + macos-latest CI workflow#1398

Open
key4ng wants to merge 6 commits into
mainfrom
keyang/mlx-e2e-tests
Open

test(mlx): add MLX backend E2E tests + macos-latest CI workflow#1398
key4ng wants to merge 6 commits into
mainfrom
keyang/mlx-e2e-tests

Conversation

@key4ng
Copy link
Copy Markdown
Collaborator

@key4ng key4ng commented Apr 27, 2026

Description

Problem

#1099 added the MLX gRPC servicer (grpc_servicer/smg_grpc_servicer/mlx/). That PR was deliberately split — servicer-only — to keep the review focused. This PR follows up with the E2E test suite and CI workflow.

Solution

Five Apple-Silicon-only E2E tests exercising the SMG router → gRPC → MLX servicer pipeline, plus a macos-latest GitHub Actions workflow that runs them on PR.

Changes

Tests (e2e_test/chat_completions/test_mlx_backend.py)

Test What it covers
test_basic_chat Non-streaming chat completion through the full pipeline
test_streaming_chat Streaming yields incremental delta chunks
test_tool_calling Qwen3 native <tool_call> tags + SMG's qwen ToolParser
test_reasoning_thinking_mode Qwen3 thinking mode populates reasoning_content
test_max_tokens_finish_reason max_tokens reached → finish_reason="length"

Module-level skip on non-Apple-Silicon (sys.platform != "darwin" or platform.machine() != "arm64").

Test infra plumbing (e2e_test/infra/)

  • constants.pyRuntime.MLX, is_mlx() helper, RUNTIME_LABELS["mlx"], MLX added to LOCAL_RUNTIMES.
  • __init__.py — re-exports is_mlx.
  • model_specs.py — entry for mlx-community/Qwen3-0.6B-4bit (~400 MB; smallest model that combines native tool calling + thinking mode).
  • worker.py_build_mlx_cmd(). gRPC-only (rejects HTTP — the MLX servicer doesn't expose an HTTP API).

CI (.github/workflows/pr-test-mlx.yml)

Runs on macos-latest (Apple Silicon) GitHub-hosted runners.

  • Triggers only on MLX-related path changes
  • Caches Rust target (Swatinem/rust-cache) and HF model bundle (actions/cache)
  • Uses maturin build --profile ci for SMG Python bindings (faster than release, fast enough for correctness tests)
  • Estimated: first run ~12-15 min, cached runs ~4-5 min

Test Plan

# Local on Apple Silicon
E2E_RUNTIME=mlx pytest e2e_test/chat_completions/test_mlx_backend.py -v

5/5 passes locally against the MLX servicer already on main.

Checklist
  • cargo +nightly fmt passes (no Rust changes)
  • cargo clippy --all-targets --all-features -- -D warnings passes (no Rust changes)
  • E2E test verified locally
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features

    • Added MLX as a supported inference runtime and included an MLX-only model in the test matrix.
  • Tests

    • Added macOS Apple Silicon end-to-end tests for the MLX gRPC backend covering chat, streaming, function/tool calling, reasoning/thinking mode, and token-limit behavior.
  • Chores

    • Added a dedicated CI workflow to run the MLX E2E tests on macOS runners with artifact capture on failure.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added ci CI/CD configuration changes tests Test changes labels Apr 27, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Introduces MLX (a local inference runtime) support to the end-to-end testing infrastructure by adding a dedicated CI workflow for macOS, extending runtime constants and predicates, updating model specifications with a Qwen3 model, wiring MLX gRPC command generation in the worker, and implementing comprehensive E2E tests validating chat, streaming, tool calling, reasoning, and token truncation behavior.

Changes

MLX E2E Addition

Layer / File(s) Summary
Runtime constants & exports
e2e_test/infra/constants.py, e2e_test/infra/__init__.py
Add Runtime.MLX, include it in LOCAL_RUNTIMES, add is_mlx() predicate, expand RUNTIME_LABELS, and re-export is_mlx via e2e_test.infra.
Model specs
e2e_test/infra/model_specs.py
Register mlx-community/Qwen3-0.6B-4bit model spec with resolved model path, tp=1, and features: chat, streaming, function_calling, reasoning, thinking.
Worker & Command Building
e2e_test/infra/worker.py
Add engine=="mlx" branch and _build_mlx_cmd enforcing GRPC-only mode, building smg_grpc_servicer.mlx.server invocation and applying optional spec["mlx_args"].
MLX E2E Test Suite
e2e_test/mlx/test_mlx_backend.py
Add Apple Silicon–gated tests with WEATHER_TOOL and TestMlxBackend covering non-streaming chat, streaming, tool calling (JSON args), thinking-mode, and max-token truncation checks.
CI Workflow
.github/workflows/pr-test-mlx.yml
New GitHub Actions workflow to run MLX E2E on macos-latest: sets up Python 3.12 and Rust, installs build tools, builds/install SMG bindings, generates Python types, runs import checks and targeted pytest, and uploads e2e-logs/ on failure/cancel with concurrency control.

Sequence Diagram

sequenceDiagram
    participant Test as Test Suite
    participant Worker as Worker
    participant Builder as Command Builder
    participant gRPC as gRPC Server
    participant MLX as MLX Engine

    Test->>Worker: initialize(engine="mlx", model_path, grpc)
    Worker->>Builder: _build_cmd("mlx", mode=GRPC)
    Builder->>Builder: _build_mlx_cmd(model_path, host, port)
    Builder-->>Worker: command ready
    Worker->>gRPC: spawn server process
    gRPC->>MLX: load model(Qwen3-0.6B-4bit)
    MLX-->>gRPC: model loaded
    gRPC-->>Worker: server listening
    Test->>gRPC: send chat request
    gRPC->>MLX: process request
    MLX-->>gRPC: return response
    gRPC-->>Test: return chat/stream response
    Test->>Test: validate output (tokens, finish_reason)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • slin1237
  • XinyueZhang369
  • CatherineSue

Poem

🐰 A tiny rabbit hops to see,
MLX tests on macOS tree,
gRPC whispers, models stride,
Tools and thinking verified,
Logs saved, CI runs with glee.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding MLX backend E2E tests and a macos-latest CI workflow, which directly corresponds to the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch keyang/mlx-e2e-tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread e2e_test/infra/worker.py
elif self.engine == "trtllm":
return self._build_trtllm_cmd(model_path, tp_size, spec)
elif self.engine == "mlx":
return self._build_mlx_cmd(model_path, spec)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Nit: _build_env (line 328) unconditionally sets CUDA_VISIBLE_DEVICES for every engine, including MLX which runs on Apple Metal — not CUDA. It's harmless on macOS (the var is simply ignored), but if you want to keep things tidy you could skip it for MLX:

if self.engine != "mlx":
    env["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, self.gpu_ids))

Not blocking — just a readability thing for the next person who reads _build_env.

Comment thread e2e_test/infra/model_specs.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pr-test-mlx.yml:
- Around line 6-16: The workflow's paths filter in pr-test-mlx.yml is missing
the e2e_test/infra/__init__.py file which allows changes there to skip the MLX
CI; update the "paths" array in that workflow to include
e2e_test/infra/__init__.py (or better, add the broader pattern
e2e_test/infra/**) so changes to the package initializer don't bypass the job.

In `@e2e_test/chat_completions/test_mlx_backend.py`:
- Around line 9-10: Update the runner wording in the comment that currently
reads "macos-14" to "macos-latest" so it matches the workflow configuration;
locate the string "macos-14" in test_mlx_backend.py (the comment referencing CI
runner) and replace it with "macos-latest" and optionally mention the workflow
file ".github/workflows/pr-test-mlx.yml" for clarity.

In `@e2e_test/infra/worker.py`:
- Around line 181-183: Update any docstrings and inline comments that enumerate
supported engine values to include the new "mlx" option: specifically update the
Worker.engine comment/annotation and the start_workers function argument docs to
mention "mlx" alongside existing engines (e.g., "local", "remote", etc.); search
for other user-facing comments or README sections that list engine options and
add "mlx" there so the documentation stays consistent with the new branch that
calls _build_mlx_cmd.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ec056812-4d40-44e7-9b43-d1f26cde1499

📥 Commits

Reviewing files that changed from the base of the PR and between e4d4452 and 987e121.

📒 Files selected for processing (6)
  • .github/workflows/pr-test-mlx.yml
  • e2e_test/chat_completions/test_mlx_backend.py
  • e2e_test/infra/__init__.py
  • e2e_test/infra/constants.py
  • e2e_test/infra/model_specs.py
  • e2e_test/infra/worker.py

Comment thread .github/workflows/pr-test-mlx.yml
Comment thread e2e_test/chat_completions/test_mlx_backend.py Outdated
Comment thread e2e_test/infra/worker.py
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 987e121b39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +13 to +15
- "e2e_test/infra/model_specs.py"
- "e2e_test/infra/worker.py"
- "e2e_test/infra/constants.py"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add infra init to MLX workflow path filters

The new workflow watches several MLX-related E2E infra files but omits e2e_test/infra/__init__.py, even though this commit changes that module and e2e_test/conftest.py imports from infra at startup. A future PR that only updates infra/__init__.py can break MLX test startup/import resolution without triggering this workflow, so regressions can merge untested.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
e2e_test/chat_completions/test_mlx_backend.py (1)

9-10: ⚠️ Potential issue | 🟡 Minor

Align CI runner wording with workflow configuration.

Line 9 still says macos-14, while this PR’s workflow uses macos-latest. Please keep them consistent.

Suggested patch
-CI runs on macos-14 (Apple Silicon) with the smallest Qwen3 model (~400 MB).
+CI runs on macos-latest (Apple Silicon) with the smallest Qwen3 model (~400 MB).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e_test/chat_completions/test_mlx_backend.py` around lines 9 - 10, Update
the CI runner wording in the test file to match the workflow: replace the string
"macos-14" with "macos-latest" in e2e_test/chat_completions/test_mlx_backend.py
(the comment at the top referencing the CI runner) so the file text aligns with
.github/workflows/pr-test-mlx.yml.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@e2e_test/chat_completions/test_mlx_backend.py`:
- Around line 9-10: Update the CI runner wording in the test file to match the
workflow: replace the string "macos-14" with "macos-latest" in
e2e_test/chat_completions/test_mlx_backend.py (the comment at the top
referencing the CI runner) so the file text aligns with
.github/workflows/pr-test-mlx.yml.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 84db8469-5a0c-490c-9df3-e7dd9eec889c

📥 Commits

Reviewing files that changed from the base of the PR and between 987e121 and 54db5f4.

📒 Files selected for processing (1)
  • e2e_test/chat_completions/test_mlx_backend.py

@@ -0,0 +1,165 @@
"""MLX backend E2E tests (Apple Silicon only).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the file location here. Our chat completions is pretty self contained to all backends at the moment. Introducing a new file here is kind orthogonal to the current structure.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — moved in 1fca167. The file now lives at e2e_test/mlx/test_mlx_backend.py, mirroring the existing concern-scoped sibling dirs e2e_test/router/ and e2e_test/bindings_go/. Folding into the existing chat_completions/ engine parameterization would have required Apple-Silicon-specific infra surgery (different model, different runner, no GPU) that doesn't fit the @pytest.mark.engine("sglang", "vllm", "trtllm") shape, so a backend-scoped sibling dir is the right structural fix. The CI workflow path filters and the pytest invocation in .github/workflows/pr-test-mlx.yml were updated to the new path; module docstring carries the rationale for future readers.

key4ng added a commit that referenced this pull request Apr 30, 2026
Addresses @CatherineSue's review on PR #1398: e2e_test/chat_completions/
is organized by API surface (basic chat, function-calling, multimodal,
structured-output, validation, …) and each file parameterizes over
engines via @pytest.mark.engine("sglang", "vllm", "trtllm"). Dropping
test_mlx_backend.py in there was orthogonal to that structure — the
file is scoped to one backend that bundles several API behaviors, not
one API behavior across engines.

Move it to a sibling concern-scoped directory `e2e_test/mlx/`, mirroring
the existing precedents `e2e_test/router/` and `e2e_test/bindings_go/`
which are also top-level dirs scoped by concern rather than API surface.

The MLX tests can't fold into the chat_completions/ engine
parameterization without significant infra surgery anyway: they require
Apple Silicon macOS, a different model (mlx-community/Qwen3-0.6B-4bit
vs meta-llama/Llama-3.1-8B-Instruct), and a different runner. The CI
is already separated via .github/workflows/pr-test-mlx.yml on
macos-latest, so the test file's location should match its scope.

Changes:
 - git mv e2e_test/chat_completions/test_mlx_backend.py
       e2e_test/mlx/test_mlx_backend.py
 - Add e2e_test/mlx/__init__.py
 - Update .github/workflows/pr-test-mlx.yml path filters and the
   pytest invocation to point at the new location
 - Module docstring updated with the rationale for living outside
   the API-surface dirs and the corrected `pytest e2e_test/mlx/...`
   command

No code or test logic changes — only file location and path references.

Signed-off-by: key4ng <rukeyang@gmail.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57a65e791c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread e2e_test/mlx/test_mlx_backend.py Outdated
Comment on lines +150 to +153
content = msg.content or ""
reasoning = getattr(msg, "reasoning_content", None) or ""
full = content + reasoning
assert "3" in full, (
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Assert reasoning output in thinking-mode coverage

This test is intended to verify that MLX thinking mode populates reasoning_content, but the current assertion only checks whether '3' appears in content + reasoning. If the reasoning parser/regression causes reasoning_content to be empty (or never emitted), the test can still pass as long as the final answer includes 3, so CI would miss exactly the failure this case is supposed to catch.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
.github/workflows/pr-test-mlx.yml (1)

6-16: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Path filters still miss MLX infra initializer changes.

e2e_test/infra/__init__.py changes can still bypass this workflow. Please include it (or broaden to e2e_test/infra/**) in both trigger blocks.

Suggested trigger fix
   push:
     branches: [main]
     paths:
+      - "e2e_test/infra/__init__.py"
       - "crates/grpc_client/proto/mlx_engine.proto"
       - "crates/grpc_client/src/mlx_engine.rs"
       - "crates/grpc_client/python/**"
       - "grpc_servicer/smg_grpc_servicer/mlx/**"
       - "grpc_servicer/pyproject.toml"
       - "e2e_test/mlx/test_mlx_backend.py"
       - "e2e_test/infra/model_specs.py"
       - "e2e_test/infra/worker.py"
       - "e2e_test/infra/constants.py"
       - ".github/workflows/pr-test-mlx.yml"
   pull_request:
     branches: [main]
     types: [opened, synchronize, reopened]
     paths:
+      - "e2e_test/infra/__init__.py"
       - "crates/grpc_client/proto/mlx_engine.proto"
       - "crates/grpc_client/src/mlx_engine.rs"
       - "crates/grpc_client/python/**"
       - "grpc_servicer/smg_grpc_servicer/mlx/**"
       - "grpc_servicer/pyproject.toml"
       - "e2e_test/mlx/test_mlx_backend.py"
       - "e2e_test/infra/model_specs.py"
       - "e2e_test/infra/worker.py"
       - "e2e_test/infra/constants.py"
       - ".github/workflows/pr-test-mlx.yml"

Also applies to: 20-30

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/pr-test-mlx.yml around lines 6 - 16, The workflow's path
filters miss changes to e2e_test/infra/__init__.py so update the workflow
triggers in .github/workflows/pr-test-mlx.yml: in both trigger blocks (push and
pull_request) include either the specific file "e2e_test/infra/__init__.py" or
broaden the rule to "e2e_test/infra/**" so modifications to the infra
initializer cannot bypass the MLX test workflow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pr-test-mlx.yml:
- Around line 66-119: Add a cache step for the Hugging Face model cache
directory before the "Run MLX E2E tests" step: create a step named like "Cache
Hugging Face models" using actions/cache@v4 that caches ~/.cache/huggingface
(and subdirs like hub and transformers), with a stable key (e.g. "${{ runner.os
}}-hf-${{ hashFiles('**/requirements*.txt') }}-${{ github.ref }}" ) and sensible
restore-keys; ensure the job sets HF_HOME or exports the same cache path so the
MLX test run (the "Run MLX E2E tests" step) and earlier verification steps
("Verify imports") use the cached directory. This will reduce network fetches
and CI timeout variance when the MLX E2E test downloads HF models.

In `@e2e_test/mlx/test_mlx_backend.py`:
- Around line 131-155: The test test_reasoning_thinking_mode is too lax because
it accepts the digit "3" from either msg.content or msg.reasoning_content;
instead explicitly assert that msg.reasoning_content is present and contains the
reasoning/steps and the expected answer. Update the test to (1) read reasoning =
getattr(msg, "reasoning_content", None) and assert reasoning is not None and not
empty, and (2) assert "3" (or the expected final answer) appears in reasoning
(in addition to any existing check on content/full if you want), so failures
surface if reasoning extraction breaks.

---

Duplicate comments:
In @.github/workflows/pr-test-mlx.yml:
- Around line 6-16: The workflow's path filters miss changes to
e2e_test/infra/__init__.py so update the workflow triggers in
.github/workflows/pr-test-mlx.yml: in both trigger blocks (push and
pull_request) include either the specific file "e2e_test/infra/__init__.py" or
broaden the rule to "e2e_test/infra/**" so modifications to the infra
initializer cannot bypass the MLX test workflow.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9a96857c-82e0-4a99-a0c2-c6c0c3cc8bfd

📥 Commits

Reviewing files that changed from the base of the PR and between 18fc0ca and 57a65e7.

📒 Files selected for processing (3)
  • .github/workflows/pr-test-mlx.yml
  • e2e_test/mlx/__init__.py
  • e2e_test/mlx/test_mlx_backend.py

Comment thread .github/workflows/pr-test-mlx.yml
Comment thread e2e_test/mlx/test_mlx_backend.py
key4ng added a commit that referenced this pull request May 8, 2026
Addresses @CatherineSue's review on PR #1398: e2e_test/chat_completions/
is organized by API surface (basic chat, function-calling, multimodal,
structured-output, validation, …) and each file parameterizes over
engines via @pytest.mark.engine("sglang", "vllm", "trtllm"). Dropping
test_mlx_backend.py in there was orthogonal to that structure — the
file is scoped to one backend that bundles several API behaviors, not
one API behavior across engines.

Move it to a sibling concern-scoped directory `e2e_test/mlx/`, mirroring
the existing precedents `e2e_test/router/` and `e2e_test/bindings_go/`
which are also top-level dirs scoped by concern rather than API surface.

The MLX tests can't fold into the chat_completions/ engine
parameterization without significant infra surgery anyway: they require
Apple Silicon macOS, a different model (mlx-community/Qwen3-0.6B-4bit
vs meta-llama/Llama-3.1-8B-Instruct), and a different runner. The CI
is already separated via .github/workflows/pr-test-mlx.yml on
macos-latest, so the test file's location should match its scope.

Changes:
 - git mv e2e_test/chat_completions/test_mlx_backend.py
       e2e_test/mlx/test_mlx_backend.py
 - Add e2e_test/mlx/__init__.py
 - Update .github/workflows/pr-test-mlx.yml path filters and the
   pytest invocation to point at the new location
 - Module docstring updated with the rationale for living outside
   the API-surface dirs and the corrected `pytest e2e_test/mlx/...`
   command

No code or test logic changes — only file location and path references.

Signed-off-by: key4ng <rukeyang@gmail.com>
@key4ng key4ng force-pushed the keyang/mlx-e2e-tests branch from 57a65e7 to f7a6a29 Compare May 8, 2026 03:16
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
e2e_test/infra/constants.py (1)

55-70: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update runtime docs/comments to include mlx.

After adding MLX runtime support, these docs still enumerate only older values, which is misleading for contributors and fixture users.

Suggested doc fix
-ENV_RUNTIME = "E2E_RUNTIME"  # Runtime for gRPC tests: "sglang", "vllm", or "trtllm"
+ENV_RUNTIME = "E2E_RUNTIME"  # Runtime for gRPC tests: "sglang", "vllm", "trtllm", or "mlx"

 def get_runtime() -> str:
-    """Get the current test runtime (sglang or vllm).
+    """Get the current test runtime.
@@
-        Runtime name from E2E_RUNTIME environment variable, defaults to "sglang".
+        Runtime name from E2E_RUNTIME environment variable, defaults to "sglang".
     """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e_test/infra/constants.py` around lines 55 - 70, The comment/docs around
runtime detection only list "sglang" and "vllm" but MLX was added; update the
strings and docstrings to include "mlx" everywhere: amend the inline comment on
ENV_RUNTIME (currently "sglang", "vllm", or "trtllm") to mention "mlx", and
update the get_runtime() docstring return description to list "sglang", "vllm",
and "mlx" (and any other supported values) so the constant ENV_RUNTIME and
function get_runtime() accurately reflect the new runtime option.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@e2e_test/infra/constants.py`:
- Around line 55-70: The comment/docs around runtime detection only list
"sglang" and "vllm" but MLX was added; update the strings and docstrings to
include "mlx" everywhere: amend the inline comment on ENV_RUNTIME (currently
"sglang", "vllm", or "trtllm") to mention "mlx", and update the get_runtime()
docstring return description to list "sglang", "vllm", and "mlx" (and any other
supported values) so the constant ENV_RUNTIME and function get_runtime()
accurately reflect the new runtime option.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 690018c3-8e54-418e-b50a-9060fe609ed0

📥 Commits

Reviewing files that changed from the base of the PR and between 57a65e7 and f7a6a29.

📒 Files selected for processing (7)
  • .github/workflows/pr-test-mlx.yml
  • e2e_test/infra/__init__.py
  • e2e_test/infra/constants.py
  • e2e_test/infra/model_specs.py
  • e2e_test/infra/worker.py
  • e2e_test/mlx/__init__.py
  • e2e_test/mlx/test_mlx_backend.py

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f7a6a2907b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +26 to +29
- "e2e_test/mlx/test_mlx_backend.py"
- "e2e_test/infra/model_specs.py"
- "e2e_test/infra/worker.py"
- "e2e_test/infra/constants.py"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include MLX fixture files in path filters

The MLX test module is run through e2e_test/conftest.py and the setup_backend/api_client fixtures in e2e_test/fixtures/setup_backend.py, but this new workflow only watches the test file plus a few infra modules. A PR that changes those shared fixtures can break MLX startup/client wiring without triggering this macOS workflow; I checked the existing GPU workflow filters and they already treat e2e_test/conftest.py and e2e_test/fixtures/** as common E2E inputs, but this new workflow does not.

Useful? React with 👍 / 👎.

key4ng added 6 commits May 13, 2026 21:59
Adds the full E2E test surface for the MLX gRPC servicer that landed
in #1099. Split off from #1099 to keep that review focused on the
servicer itself.

What's new
----------
- e2e_test/chat_completions/test_mlx_backend.py — 5 tests covering
  basic chat, streaming, tool calling (Qwen3 native <tool_call>),
  reasoning_content (Qwen3 thinking mode), and max_tokens finish
  reason. Module skips on non-Apple-Silicon hosts.
- .github/workflows/pr-test-mlx.yml — runs the test suite on
  macos-latest (Apple Silicon) with HF model + Cargo target caching.
  Triggers only on MLX-related path changes.

Plumbed through existing infra
------------------------------
- e2e_test/infra/constants.py: Runtime.MLX, is_mlx() helper,
  RUNTIME_LABELS entry, MLX added to LOCAL_RUNTIMES.
- e2e_test/infra/__init__.py: re-exports is_mlx.
- e2e_test/infra/model_specs.py: mlx-community/Qwen3-0.6B-4bit
  spec (~400 MB; smallest model that has both tool calling and
  thinking mode for combined coverage).
- e2e_test/infra/worker.py: _build_mlx_cmd() — gRPC-only
  (rejects HTTP since the MLX servicer doesn't expose one).

Test plan
---------
Verified locally on Apple Silicon: 5/5 pass against the MLX servicer
already on main.

Signed-off-by: key4ng <rukeyang@gmail.com>
Signed-off-by: key4ng <rukeyang@gmail.com>
Qwen3-0.6B-4bit is ~400 MB and downloads in ~30s on the GitHub
runner's network — not worth a separate cache entry against the
shared 10 GB repo cache budget. The Rust cache is the dominant
slow path; HF download is in the noise.

Signed-off-by: key4ng <rukeyang@gmail.com>
Addresses @CatherineSue's review on PR #1398: e2e_test/chat_completions/
is organized by API surface (basic chat, function-calling, multimodal,
structured-output, validation, …) and each file parameterizes over
engines via @pytest.mark.engine("sglang", "vllm", "trtllm"). Dropping
test_mlx_backend.py in there was orthogonal to that structure — the
file is scoped to one backend that bundles several API behaviors, not
one API behavior across engines.

Move it to a sibling concern-scoped directory `e2e_test/mlx/`, mirroring
the existing precedents `e2e_test/router/` and `e2e_test/bindings_go/`
which are also top-level dirs scoped by concern rather than API surface.

The MLX tests can't fold into the chat_completions/ engine
parameterization without significant infra surgery anyway: they require
Apple Silicon macOS, a different model (mlx-community/Qwen3-0.6B-4bit
vs meta-llama/Llama-3.1-8B-Instruct), and a different runner. The CI
is already separated via .github/workflows/pr-test-mlx.yml on
macos-latest, so the test file's location should match its scope.

Changes:
 - git mv e2e_test/chat_completions/test_mlx_backend.py
       e2e_test/mlx/test_mlx_backend.py
 - Add e2e_test/mlx/__init__.py
 - Update .github/workflows/pr-test-mlx.yml path filters and the
   pytest invocation to point at the new location
 - Module docstring updated with the rationale for living outside
   the API-surface dirs and the corrected `pytest e2e_test/mlx/...`
   command

No code or test logic changes — only file location and path references.

Signed-off-by: key4ng <rukeyang@gmail.com>
Compress the rationale from 22 lines to 9. Keeps the essential signal
(why this lives in e2e_test/mlx/ rather than chat_completions/, the
local run command, and the CI workflow pointer) and drops the prose.

Signed-off-by: key4ng <rukeyang@gmail.com>
…infra/__init__.py path-filter gap

* Wire `--reasoning-parser qwen3` into the MLX test gateway. Without
  it, the `<think>...</think>` block lands in content, so any assertion
  on reasoning_content was vacuously satisfied.
* Make `test_reasoning_thinking_mode` actually test what its docstring
  claims: pass `enable_thinking=True` explicitly and assert
  `reasoning_content` is non-empty in addition to the existing
  answer-correctness check ("3" in content+reasoning).
* Add `e2e_test/infra/__init__.py` to push/pull_request paths in
  pr-test-mlx.yml. This PR already modifies that file (adds is_mlx)
  and conftest.py imports from infra at startup, so future
  __init__.py-only edits could break MLX import resolution without
  triggering this workflow.

Signed-off-by: key4ng <rukeyang@gmail.com>
@key4ng key4ng force-pushed the keyang/mlx-e2e-tests branch from 7701a1b to d84ae03 Compare May 14, 2026 04:59
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d84ae03c62

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +22 to +25
- "crates/grpc_client/proto/mlx_engine.proto"
- "crates/grpc_client/src/mlx_engine.rs"
- "crates/grpc_client/python/**"
- "grpc_servicer/smg_grpc_servicer/mlx/**"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include router-side MLX paths in workflow trigger

When a PR only changes the MLX branches in the router, e.g. model_gateway/src/routers/grpc/client.rs or proto_wrapper.rs, this workflow will not run because the pull_request.paths list here only covers the proto/client package, servicer, and E2E infra. I checked the regular PR GPU workflow and its reusable E2E matrix is for sglang, vllm, and trtllm, so those MLX router paths are not exercised there either; an MLX-specific routing regression can therefore merge without the new Apple Silicon E2E job.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci CI/CD configuration changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants