test(mlx): add MLX backend E2E tests + macos-latest CI workflow #1398
key4ng wants to merge 6 commits into
📝 Walkthrough
Introduces MLX (a local inference runtime) support to the end-to-end testing infrastructure by adding a dedicated CI workflow for macOS, extending runtime constants and predicates, updating model specifications with a Qwen3 model, wiring MLX gRPC command generation in the worker, and implementing comprehensive E2E tests validating chat, streaming, tool calling, reasoning, and token truncation behavior.
Changes
MLX E2E Addition
Sequence Diagram
sequenceDiagram
participant Test as Test Suite
participant Worker as Worker
participant Builder as Command Builder
participant gRPC as gRPC Server
participant MLX as MLX Engine
Test->>Worker: initialize(engine="mlx", model_path, grpc)
Worker->>Builder: _build_cmd("mlx", mode=GRPC)
Builder->>Builder: _build_mlx_cmd(model_path, host, port)
Builder-->>Worker: command ready
Worker->>gRPC: spawn server process
gRPC->>MLX: load model(Qwen3-0.6B-4bit)
MLX-->>gRPC: model loaded
gRPC-->>Worker: server listening
Test->>gRPC: send chat request
gRPC->>MLX: process request
MLX-->>gRPC: return response
gRPC-->>Test: return chat/stream response
Test->>Test: validate output (tokens, finish_reason)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
elif self.engine == "trtllm":
    return self._build_trtllm_cmd(model_path, tp_size, spec)
elif self.engine == "mlx":
    return self._build_mlx_cmd(model_path, spec)
🟡 Nit: _build_env (line 328) unconditionally sets CUDA_VISIBLE_DEVICES for every engine, including MLX, which runs on Apple Metal rather than CUDA. It's harmless on macOS (the var is simply ignored), but if you want to keep things tidy you could skip it for MLX:
    if self.engine != "mlx":
        env["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, self.gpu_ids))
Not blocking, just a readability thing for the next person who reads _build_env.
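A minimal sketch of what the suggested guard could look like in isolation. The function name `build_env`, its parameters, and the `base_env` argument are assumptions made for illustration; the real `_build_env` method in `e2e_test/infra/worker.py` is not shown in this thread and may be shaped differently.

```python
import os
from typing import Dict, List, Optional


def build_env(
    engine: str,
    gpu_ids: List[int],
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical stand-in for Worker._build_env (not the repo's actual code)."""
    # Copy the parent environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    # MLX runs on Apple Metal, so CUDA_VISIBLE_DEVICES carries no meaning there.
    if engine != "mlx":
        env["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, gpu_ids))
    return env
```

Passing an explicit `base_env` keeps the helper easy to test without touching the process environment.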
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/pr-test-mlx.yml:
- Around line 6-16: The workflow's paths filter in pr-test-mlx.yml is missing
the e2e_test/infra/__init__.py file which allows changes there to skip the MLX
CI; update the "paths" array in that workflow to include
e2e_test/infra/__init__.py (or better, add the broader pattern
e2e_test/infra/**) so changes to the package initializer don't bypass the job.
In `@e2e_test/chat_completions/test_mlx_backend.py`:
- Around line 9-10: Update the runner wording in the comment that currently
reads "macos-14" to "macos-latest" so it matches the workflow configuration;
locate the string "macos-14" in test_mlx_backend.py (the comment referencing CI
runner) and replace it with "macos-latest" and optionally mention the workflow
file ".github/workflows/pr-test-mlx.yml" for clarity.
In `@e2e_test/infra/worker.py`:
- Around line 181-183: Update any docstrings and inline comments that enumerate
supported engine values to include the new "mlx" option: specifically update the
Worker.engine comment/annotation and the start_workers function argument docs to
mention "mlx" alongside existing engines (e.g., "local", "remote", etc.); search
for other user-facing comments or README sections that list engine options and
add "mlx" there so the documentation stays consistent with the new branch that
calls _build_mlx_cmd.
📒 Files selected for processing (6)
- .github/workflows/pr-test-mlx.yml
- e2e_test/chat_completions/test_mlx_backend.py
- e2e_test/infra/__init__.py
- e2e_test/infra/constants.py
- e2e_test/infra/model_specs.py
- e2e_test/infra/worker.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 987e121b39
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- "e2e_test/infra/model_specs.py"
- "e2e_test/infra/worker.py"
- "e2e_test/infra/constants.py"
Add infra init to MLX workflow path filters
The new workflow watches several MLX-related E2E infra files but omits e2e_test/infra/__init__.py, even though this commit changes that module and e2e_test/conftest.py imports from infra at startup. A future PR that only updates infra/__init__.py can break MLX test startup/import resolution without triggering this workflow, so regressions can merge untested.
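The gap described above can be illustrated with a small check of changed files against a filter list. This is a sketch only: `triggers_workflow` is a hypothetical helper, the filter list is the subset quoted in the snippet above, and Python's `fnmatch` merely approximates GitHub's path-filter globbing.

```python
from fnmatch import fnmatch

# Subset of the workflow's path filters, copied from the review snippet.
PATH_FILTERS = [
    "e2e_test/infra/model_specs.py",
    "e2e_test/infra/worker.py",
    "e2e_test/infra/constants.py",
]


def triggers_workflow(changed_files, filters):
    """Return True if any changed file matches any filter pattern.

    fnmatch is an approximation of GitHub's glob semantics, used here
    only to show which files fall outside the listed filters.
    """
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in filters
    )


# A PR touching only the package initializer would not start the job...
init_only = ["e2e_test/infra/__init__.py"]
skipped = not triggers_workflow(init_only, PATH_FILTERS)
# ...unless the broader pattern suggested in the review is added.
covered = triggers_workflow(init_only, PATH_FILTERS + ["e2e_test/infra/**"])
```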
♻️ Duplicate comments (1)
e2e_test/chat_completions/test_mlx_backend.py (1)
9-10: ⚠️ Potential issue | 🟡 Minor
Align CI runner wording with workflow configuration.
Line 9 still says macos-14, while this PR's workflow uses macos-latest. Please keep them consistent.
Suggested patch
-CI runs on macos-14 (Apple Silicon) with the smallest Qwen3 model (~400 MB).
+CI runs on macos-latest (Apple Silicon) with the smallest Qwen3 model (~400 MB).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e_test/chat_completions/test_mlx_backend.py` around lines 9 - 10, Update the CI runner wording in the test file to match the workflow: replace the string "macos-14" with "macos-latest" in e2e_test/chat_completions/test_mlx_backend.py (the comment at the top referencing the CI runner) so the file text aligns with .github/workflows/pr-test-mlx.yml.
📒 Files selected for processing (1)
e2e_test/chat_completions/test_mlx_backend.py
@@ -0,0 +1,165 @@
"""MLX backend E2E tests (Apple Silicon only).
I'm not sure about the file location here. Our chat completions suite is pretty self-contained across all backends at the moment. Introducing a new file here is kind of orthogonal to the current structure.
Good call — moved in 1fca167. The file now lives at e2e_test/mlx/test_mlx_backend.py, mirroring the existing concern-scoped sibling dirs e2e_test/router/ and e2e_test/bindings_go/. Folding into the existing chat_completions/ engine parameterization would have required Apple-Silicon-specific infra surgery (different model, different runner, no GPU) that doesn't fit the @pytest.mark.engine("sglang", "vllm", "trtllm") shape, so a backend-scoped sibling dir is the right structural fix. The CI workflow path filters and the pytest invocation in .github/workflows/pr-test-mlx.yml were updated to the new path; module docstring carries the rationale for future readers.
Addresses @CatherineSue's review on PR #1398: e2e_test/chat_completions/ is organized by API surface (basic chat, function-calling, multimodal, structured-output, validation, …) and each file parameterizes over engines via @pytest.mark.engine("sglang", "vllm", "trtllm"). Dropping test_mlx_backend.py in there was orthogonal to that structure: the file is scoped to one backend that bundles several API behaviors, not one API behavior across engines.

Move it to a sibling concern-scoped directory `e2e_test/mlx/`, mirroring the existing precedents `e2e_test/router/` and `e2e_test/bindings_go/`, which are also top-level dirs scoped by concern rather than API surface.

The MLX tests can't fold into the chat_completions/ engine parameterization without significant infra surgery anyway: they require Apple Silicon macOS, a different model (mlx-community/Qwen3-0.6B-4bit vs meta-llama/Llama-3.1-8B-Instruct), and a different runner. The CI is already separated via .github/workflows/pr-test-mlx.yml on macos-latest, so the test file's location should match its scope.

Changes:
- git mv e2e_test/chat_completions/test_mlx_backend.py e2e_test/mlx/test_mlx_backend.py
- Add e2e_test/mlx/__init__.py
- Update .github/workflows/pr-test-mlx.yml path filters and the pytest invocation to point at the new location
- Module docstring updated with the rationale for living outside the API-surface dirs and the corrected `pytest e2e_test/mlx/...` command

No code or test logic changes, only file location and path references.

Signed-off-by: key4ng <rukeyang@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 57a65e791c
content = msg.content or ""
reasoning = getattr(msg, "reasoning_content", None) or ""
full = content + reasoning
assert "3" in full, (
Assert reasoning output in thinking-mode coverage
This test is intended to verify that MLX thinking mode populates reasoning_content, but the current assertion only checks whether '3' appears in content + reasoning. If the reasoning parser/regression causes reasoning_content to be empty (or never emitted), the test can still pass as long as the final answer includes 3, so CI would miss exactly the failure this case is supposed to catch.
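One way to tighten the check, sketched under assumptions: `check_thinking_response` is a hypothetical helper, and the `SimpleNamespace` objects stand in for the chat message returned by the client. The actual test body in `test_mlx_backend.py` is only partially visible in this thread.

```python
from types import SimpleNamespace


def check_thinking_response(msg, expected="3"):
    """Hypothetical helper: require reasoning output and the expected answer."""
    content = msg.content or ""
    reasoning = getattr(msg, "reasoning_content", None) or ""
    # Fails if the reasoning parser regresses and nothing is extracted.
    assert reasoning, "reasoning_content is empty; thinking mode was not surfaced"
    # The existing answer-correctness check stays in place.
    assert expected in content + reasoning, f"expected {expected!r} in output"


# Stand-in message objects for illustration only.
good = SimpleNamespace(content="The answer is 3.", reasoning_content="1 + 2 = 3")
bad = SimpleNamespace(content="The answer is 3.", reasoning_content=None)
```

With the original combined assertion, `bad` would pass because the final answer contains "3"; with the explicit non-empty check it fails, which is the regression the test is meant to catch.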
Actionable comments posted: 2
♻️ Duplicate comments (1)
.github/workflows/pr-test-mlx.yml (1)
6-16: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Path filters still miss MLX infra initializer changes.
e2e_test/infra/__init__.py changes can still bypass this workflow. Please include it (or broaden to e2e_test/infra/**) in both trigger blocks.
Suggested trigger fix
 push:
   branches: [main]
   paths:
+    - "e2e_test/infra/__init__.py"
     - "crates/grpc_client/proto/mlx_engine.proto"
     - "crates/grpc_client/src/mlx_engine.rs"
     - "crates/grpc_client/python/**"
     - "grpc_servicer/smg_grpc_servicer/mlx/**"
     - "grpc_servicer/pyproject.toml"
     - "e2e_test/mlx/test_mlx_backend.py"
     - "e2e_test/infra/model_specs.py"
     - "e2e_test/infra/worker.py"
     - "e2e_test/infra/constants.py"
     - ".github/workflows/pr-test-mlx.yml"
 pull_request:
   branches: [main]
   types: [opened, synchronize, reopened]
   paths:
+    - "e2e_test/infra/__init__.py"
     - "crates/grpc_client/proto/mlx_engine.proto"
     - "crates/grpc_client/src/mlx_engine.rs"
     - "crates/grpc_client/python/**"
     - "grpc_servicer/smg_grpc_servicer/mlx/**"
     - "grpc_servicer/pyproject.toml"
     - "e2e_test/mlx/test_mlx_backend.py"
     - "e2e_test/infra/model_specs.py"
     - "e2e_test/infra/worker.py"
     - "e2e_test/infra/constants.py"
     - ".github/workflows/pr-test-mlx.yml"
Also applies to: 20-30
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/pr-test-mlx.yml around lines 6 - 16, The workflow's path filters miss changes to e2e_test/infra/__init__.py so update the workflow triggers in .github/workflows/pr-test-mlx.yml: in both trigger blocks (push and pull_request) include either the specific file "e2e_test/infra/__init__.py" or broaden the rule to "e2e_test/infra/**" so modifications to the infra initializer cannot bypass the MLX test workflow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/pr-test-mlx.yml:
- Around line 66-119: Add a cache step for the Hugging Face model cache
directory before the "Run MLX E2E tests" step: create a step named like "Cache
Hugging Face models" using actions/cache@v4 that caches ~/.cache/huggingface
(and subdirs like hub and transformers), with a stable key (e.g. "${{ runner.os
}}-hf-${{ hashFiles('**/requirements*.txt') }}-${{ github.ref }}" ) and sensible
restore-keys; ensure the job sets HF_HOME or exports the same cache path so the
MLX test run (the "Run MLX E2E tests" step) and earlier verification steps
("Verify imports") use the cached directory. This will reduce network fetches
and CI timeout variance when the MLX E2E test downloads HF models.
In `@e2e_test/mlx/test_mlx_backend.py`:
- Around line 131-155: The test test_reasoning_thinking_mode is too lax because
it accepts the digit "3" from either msg.content or msg.reasoning_content;
instead explicitly assert that msg.reasoning_content is present and contains the
reasoning/steps and the expected answer. Update the test to (1) read reasoning =
getattr(msg, "reasoning_content", None) and assert reasoning is not None and not
empty, and (2) assert "3" (or the expected final answer) appears in reasoning
(in addition to any existing check on content/full if you want), so failures
surface if reasoning extraction breaks.
📒 Files selected for processing (3)
- .github/workflows/pr-test-mlx.yml
- e2e_test/mlx/__init__.py
- e2e_test/mlx/test_mlx_backend.py
57a65e7 to f7a6a29
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
e2e_test/infra/constants.py (1)
55-70: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Update runtime docs/comments to include mlx.
After adding MLX runtime support, these docs still enumerate only older values, which is misleading for contributors and fixture users.
Suggested doc fix
-ENV_RUNTIME = "E2E_RUNTIME"  # Runtime for gRPC tests: "sglang", "vllm", or "trtllm"
+ENV_RUNTIME = "E2E_RUNTIME"  # Runtime for gRPC tests: "sglang", "vllm", "trtllm", or "mlx"
 def get_runtime() -> str:
-    """Get the current test runtime (sglang or vllm).
+    """Get the current test runtime.
@@
-    Runtime name from E2E_RUNTIME environment variable, defaults to "sglang".
+    Runtime name from E2E_RUNTIME environment variable, defaults to "sglang".
     """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@e2e_test/infra/constants.py` around lines 55 - 70, The comment/docs around runtime detection only list "sglang" and "vllm" but MLX was added; update the strings and docstrings to include "mlx" everywhere: amend the inline comment on ENV_RUNTIME (currently "sglang", "vllm", or "trtllm") to mention "mlx", and update the get_runtime() docstring return description to list "sglang", "vllm", and "mlx" (and any other supported values) so the constant ENV_RUNTIME and function get_runtime() accurately reflect the new runtime option.
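For orientation, the documented behavior could look roughly like the sketch below. `ENV_RUNTIME`, `get_runtime()`, `is_mlx()`, and the "sglang" default are named in this thread; the function bodies and the optional `environ` parameter are assumptions added so the example runs on its own, not the contents of `e2e_test/infra/constants.py`.

```python
import os

ENV_RUNTIME = "E2E_RUNTIME"  # Runtime for gRPC tests: "sglang", "vllm", "trtllm", or "mlx"


def get_runtime(environ=None) -> str:
    """Get the current test runtime.

    Returns:
        Runtime name from the E2E_RUNTIME environment variable,
        defaulting to "sglang".
    """
    # The environ parameter is an illustrative addition for testability.
    env = os.environ if environ is None else environ
    return env.get(ENV_RUNTIME, "sglang")


def is_mlx(environ=None) -> bool:
    """Hypothetical shape of the is_mlx() helper mentioned in the PR."""
    return get_runtime(environ) == "mlx"
```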
📒 Files selected for processing (7)
- .github/workflows/pr-test-mlx.yml
- e2e_test/infra/__init__.py
- e2e_test/infra/constants.py
- e2e_test/infra/model_specs.py
- e2e_test/infra/worker.py
- e2e_test/mlx/__init__.py
- e2e_test/mlx/test_mlx_backend.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f7a6a2907b
- "e2e_test/mlx/test_mlx_backend.py"
- "e2e_test/infra/model_specs.py"
- "e2e_test/infra/worker.py"
- "e2e_test/infra/constants.py"
Include MLX fixture files in path filters
The MLX test module is run through e2e_test/conftest.py and the setup_backend/api_client fixtures in e2e_test/fixtures/setup_backend.py, but this new workflow only watches the test file plus a few infra modules. A PR that changes those shared fixtures can break MLX startup/client wiring without triggering this macOS workflow; I checked the existing GPU workflow filters and they already treat e2e_test/conftest.py and e2e_test/fixtures/** as common E2E inputs, but this new workflow does not.
Adds the full E2E test surface for the MLX gRPC servicer that landed in #1099. Split off from #1099 to keep that review focused on the servicer itself.

What's new
----------
- e2e_test/chat_completions/test_mlx_backend.py: 5 tests covering basic chat, streaming, tool calling (Qwen3 native <tool_call>), reasoning_content (Qwen3 thinking mode), and max_tokens finish reason. Module skips on non-Apple-Silicon hosts.
- .github/workflows/pr-test-mlx.yml: runs the test suite on macos-latest (Apple Silicon) with HF model + Cargo target caching. Triggers only on MLX-related path changes.

Plumbed through existing infra
------------------------------
- e2e_test/infra/constants.py: Runtime.MLX, is_mlx() helper, RUNTIME_LABELS entry, MLX added to LOCAL_RUNTIMES.
- e2e_test/infra/__init__.py: re-exports is_mlx.
- e2e_test/infra/model_specs.py: mlx-community/Qwen3-0.6B-4bit spec (~400 MB; smallest model that has both tool calling and thinking mode for combined coverage).
- e2e_test/infra/worker.py: _build_mlx_cmd(), gRPC-only (rejects HTTP since the MLX servicer doesn't expose one).

Test plan
---------
Verified locally on Apple Silicon: 5/5 pass against the MLX servicer already on main.

Signed-off-by: key4ng <rukeyang@gmail.com>
Signed-off-by: key4ng <rukeyang@gmail.com>
Qwen3-0.6B-4bit is ~400 MB and downloads in ~30s on the GitHub runner's network — not worth a separate cache entry against the shared 10 GB repo cache budget. The Rust cache is the dominant slow path; HF download is in the noise. Signed-off-by: key4ng <rukeyang@gmail.com>
Addresses @CatherineSue's review on PR #1398: e2e_test/chat_completions/ is organized by API surface (basic chat, function-calling, multimodal, structured-output, validation, …) and each file parameterizes over engines via @pytest.mark.engine("sglang", "vllm", "trtllm"). Dropping test_mlx_backend.py in there was orthogonal to that structure — the file is scoped to one backend that bundles several API behaviors, not one API behavior across engines. Move it to a sibling concern-scoped directory `e2e_test/mlx/`, mirroring the existing precedents `e2e_test/router/` and `e2e_test/bindings_go/` which are also top-level dirs scoped by concern rather than API surface. The MLX tests can't fold into the chat_completions/ engine parameterization without significant infra surgery anyway: they require Apple Silicon macOS, a different model (mlx-community/Qwen3-0.6B-4bit vs meta-llama/Llama-3.1-8B-Instruct), and a different runner. The CI is already separated via .github/workflows/pr-test-mlx.yml on macos-latest, so the test file's location should match its scope. Changes: - git mv e2e_test/chat_completions/test_mlx_backend.py e2e_test/mlx/test_mlx_backend.py - Add e2e_test/mlx/__init__.py - Update .github/workflows/pr-test-mlx.yml path filters and the pytest invocation to point at the new location - Module docstring updated with the rationale for living outside the API-surface dirs and the corrected `pytest e2e_test/mlx/...` command No code or test logic changes — only file location and path references. Signed-off-by: key4ng <rukeyang@gmail.com>
Compress the rationale from 22 lines to 9. Keeps the essential signal (why this lives in e2e_test/mlx/ rather than chat_completions/, the local run command, and the CI workflow pointer) and drops the prose. Signed-off-by: key4ng <rukeyang@gmail.com>
…infra/__init__.py path-filter gap
* Wire `--reasoning-parser qwen3` into the MLX test gateway. Without
it, the `<think>...</think>` block lands in content, so any assertion
on reasoning_content was vacuously satisfied.
* Make `test_reasoning_thinking_mode` actually test what its docstring
claims: pass `enable_thinking=True` explicitly and assert
`reasoning_content` is non-empty in addition to the existing
answer-correctness check ("3" in content+reasoning).
* Add `e2e_test/infra/__init__.py` to push/pull_request paths in
pr-test-mlx.yml. This PR already modifies that file (adds is_mlx)
and conftest.py imports from infra at startup, so future
__init__.py-only edits could break MLX import resolution without
triggering this workflow.
Signed-off-by: key4ng <rukeyang@gmail.com>
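To see why the missing parser made the earlier assertion vacuous, here is a toy illustration of separating a `<think>...</think>` block from the final answer. It is not the gateway's `qwen3` reasoning parser; `split_reasoning` and its regex are assumptions made for this example.

```python
import re

# Matches the first <think>...</think> block, including newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str):
    """Split raw model text into (reasoning_content, content).

    Toy stand-in for a reasoning parser; without such a step the whole
    string, think block included, would land in content.
    """
    match = THINK_RE.search(raw)
    if match is None:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    content = (raw[: match.start()] + raw[match.end():]).strip()
    return reasoning, content


reasoning, content = split_reasoning("<think>1 + 2 = 3</think>The answer is 3.")
```

In this sketch, skipping the split leaves "3" visible in the unparsed content, so a check on content plus reasoning would pass even with an empty `reasoning_content`.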
7701a1b to d84ae03
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d84ae03c62
- "crates/grpc_client/proto/mlx_engine.proto"
- "crates/grpc_client/src/mlx_engine.rs"
- "crates/grpc_client/python/**"
- "grpc_servicer/smg_grpc_servicer/mlx/**"
Include router-side MLX paths in workflow trigger
When a PR only changes the MLX branches in the router, e.g. model_gateway/src/routers/grpc/client.rs or proto_wrapper.rs, this workflow will not run because the pull_request.paths list here only covers the proto/client package, servicer, and E2E infra. I checked the regular PR GPU workflow and its reusable E2E matrix is for sglang, vllm, and trtllm, so those MLX router paths are not exercised there either; an MLX-specific routing regression can therefore merge without the new Apple Silicon E2E job.
Description
Problem
#1099 added the MLX gRPC servicer (grpc_servicer/smg_grpc_servicer/mlx/). That PR was deliberately split (servicer-only) to keep the review focused. This PR follows up with the E2E test suite and CI workflow.
Solution
Five Apple-Silicon-only E2E tests exercising the SMG router → gRPC → MLX servicer pipeline, plus a macos-latest GitHub Actions workflow that runs them on PR.
Changes
Tests (e2e_test/chat_completions/test_mlx_backend.py)
- test_basic_chat
- test_streaming_chat
- test_tool_calling: <tool_call> tags + SMG's qwenToolParser
- test_reasoning_thinking_mode: reasoning_content
- test_max_tokens_finish_reason: max_tokens reached → finish_reason="length"
Module-level skip on non-Apple-Silicon (sys.platform != "darwin" or platform.machine() != "arm64").
Test infra plumbing (e2e_test/infra/)
- constants.py: Runtime.MLX, is_mlx() helper, RUNTIME_LABELS["mlx"], MLX added to LOCAL_RUNTIMES.
- __init__.py: re-exports is_mlx.
- model_specs.py: entry for mlx-community/Qwen3-0.6B-4bit (~400 MB; smallest model that combines native tool calling + thinking mode).
- worker.py: _build_mlx_cmd(). gRPC-only (rejects HTTP because the MLX servicer doesn't expose an HTTP API).
CI (.github/workflows/pr-test-mlx.yml)
Runs on macos-latest (Apple Silicon) GitHub-hosted runners.
- Cargo target cache (Swatinem/rust-cache) and HF model bundle (actions/cache)
- maturin build --profile ci for SMG Python bindings (faster than release, fast enough for correctness tests)
Test Plan
# Local on Apple Silicon
E2E_RUNTIME=mlx pytest e2e_test/chat_completions/test_mlx_backend.py -v
5/5 pass locally against the MLX servicer already on main.
Checklist
- cargo +nightly fmt passes (no Rust changes)
- cargo clippy --all-targets --all-features -- -D warnings passes (no Rust changes)
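The module-level skip condition quoted in the description can be expressed as a small predicate. This is a sketch: `is_apple_silicon` and its optional parameters are hypothetical names chosen so the logic can be exercised on any host, and the test module itself may inline the condition differently.

```python
import platform
import sys


def is_apple_silicon(sys_platform=None, machine=None) -> bool:
    """Return True only on macOS running on an arm64 CPU.

    Both arguments default to the live host values; they are exposed
    here purely so the predicate can be checked without a Mac.
    """
    sys_platform = sys.platform if sys_platform is None else sys_platform
    machine = platform.machine() if machine is None else machine
    return sys_platform == "darwin" and machine == "arm64"


# In a pytest module this predicate would typically feed a module-level marker:
#   pytestmark = pytest.mark.skipif(not is_apple_silicon(), reason="Apple Silicon only")
```

Negating the predicate gives the skip condition from the description: `sys.platform != "darwin" or platform.machine() != "arm64"`.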