bump mlx-lm to 0.31.3 and mlx-vlm to latest main #675
Open
Chedrian07 wants to merge 1 commit into jundot:main from
Conversation
mlx-lm: `dcbf6e3` -> `d9c63ff` (v0.31.3 patch bump, #1124)
mlx-vlm: `23e1dff` -> `3472132`

- Fix Gemma 4 tool parser to accept hyphenated function names (#963)
- Fix Gemma 4 audio: mel preprocessing, weight loading, feature extractor (#931)
- Fix race condition in TurboQuant fused fast-quantize kernels (#967)
- Fix Gemma 4 quantized per-layer projection loading (#935)
- Snapshot `cache.offset` to prevent alias mutation under batched caches (#966)

Both ranges contain only bug fixes and the patch version bump; no API or signature changes. Updates `pyproject.toml` dependencies, the uv override-dependencies, and `packaging/venvstacks.toml` in lockstep.
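Keeping the three files in lockstep means every place that names a git pin carries the same commit. A minimal sketch of what that looks like for `pyproject.toml` is below; the repository URLs and exact table layout are illustrative, not copied from this repository, and `venvstacks.toml` would mirror the same two pins:

```toml
# Hypothetical sketch: the same commit pin repeated in both the project
# dependencies and the uv override table, so the resolver cannot drift.
[project]
dependencies = [
    "mlx-lm @ git+https://github.com/ml-explore/mlx-lm@d9c63ff",
    "mlx-vlm @ git+https://github.com/Blaizzy/mlx-vlm@3472132",
]

[tool.uv]
override-dependencies = [
    "mlx-lm @ git+https://github.com/ml-explore/mlx-lm@d9c63ff",
    "mlx-vlm @ git+https://github.com/Blaizzy/mlx-vlm@3472132",
]
```

Bumping only one of the locations is the usual failure mode: the override silently wins over the project pin, so a mismatch hides until a clean rebuild.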
I'm encountering issues with thread-local vs. thread-global behaviour from mlx@0.31.2 onward with Gemma 4. Is this PR well tested? Do we have e2e integration tests? I'm not going into details here because I'm working on custom mlx-lm and mlx forks, and I'm having a hard time pinpointing the exact root cause, but mlx 0.31.2 seems to introduce changes to thread locality that oMLX isn't prepared for. Ref tracking: ml-explore/mlx#3078
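To make the class of bug the comment describes concrete: state that moves from a process-wide global to thread-local storage stops being visible in worker threads unless each thread initializes it. This is a minimal Python sketch of that contrast, not oMLX or mlx code; all names are hypothetical:

```python
import threading

# Hypothetical sketch: a "default device" held in a module-global dict
# versus in threading.local(). A library switching from the former to the
# latter changes what worker threads observe.
GLOBAL = {"device": "gpu"}   # one shared mapping, visible to every thread
LOCAL = threading.local()    # each thread gets its own attribute namespace
LOCAL.device = "gpu"         # set only in the main thread

def probe(out):
    # The global mutation below is visible here; the thread-local attribute
    # is not, because a new thread starts with an empty local() namespace.
    out["global"] = GLOBAL["device"]
    out["local"] = getattr(LOCAL, "device", "unset")

GLOBAL["device"] = "cpu"     # mutate before spawning the worker

result = {}
t = threading.Thread(target=probe, args=(result,))
t.start()
t.join()
# result is {"global": "cpu", "local": "unset"}
```

Code that assumed "set the default once, every thread sees it" breaks exactly this way when the underlying library makes that default thread-local.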
Summary
- `mlx-lm` pin from `dcbf6e3` to `d9c63ff` (v0.31.3 patch bump, #1124)
- `mlx-vlm` pin from `23e1dff` to `3472132` (5 upstream fixes, non-breaking)
- Updates `pyproject.toml` dependencies, `[tool.uv]` override-dependencies, and `packaging/venvstacks.toml` in lockstep

mlx-vlm changes pulled in
- Fix Gemma 4 tool parser to accept hyphenated function names (Blaizzy/mlx-vlm#963)
- Fix Gemma 4 audio: mel preprocessing, weight loading, feature extractor (Blaizzy/mlx-vlm#931)
- Fix race condition in TurboQuant fused fast-quantize kernels (Blaizzy/mlx-vlm#967)
- Fix Gemma 4 quantized per-layer projection loading (Blaizzy/mlx-vlm#935)
- Snapshot `cache.offset` to prevent alias mutation under batched caches (Blaizzy/mlx-vlm#966)

mlx-lm changes pulled in
- `0.31.2` → `0.31.3` patch bump only (Bump the patch version, ml-explore/mlx-lm#1124)

Both compare ranges (`dcbf6e3..d9c63ff` and `23e1dff..3472132`) contain only bug fixes and a patch version bump. No public API, signature, or required-argument changes, so existing oMLX adapters (`VLMModelAdapter`, `BatchedEngine`, scheduler integration with `BatchGenerator`, Gemma 4 tool-call paths) should be fully compatible.

Rationale for picking HEAD of both `main` branches:

- `mlx-vlm@3472132` includes the Gemma 4 hyphenated-tool-name fix, which improves OpenAI-spec compliance for oMLX's Gemma 4 tool-calling path.
- `mlx-vlm@3472132` also fixes a TurboQuant kernel race that can affect quantized VLMs under oMLX continuous batching.
- `mlx-lm@d9c63ff` is just a patch-version bump; staying in lockstep keeps the override free of surprises.

Test plan
- `pip install -e .` resolves against the new git pins in a clean venv
- `pytest -m "not slow"` passes
- `--max-concurrent-requests 8` → no TurboQuant race
- `packaging/build.py --skip-venv` still builds the app bundle without resolver errors
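The `cache.offset` snapshot fix pulled in above (Blaizzy/mlx-vlm#966) addresses an aliasing pattern worth spelling out: under batched caches, a closure that re-reads a shared cache attribute lazily can observe a value already advanced by another sequence. This is a hypothetical sketch of the pattern, with illustrative names rather than mlx-vlm's actual code:

```python
# Hypothetical sketch of the alias-mutation bug class behind the
# "snapshot cache.offset" fix. Names are illustrative only.

class KVCache:
    def __init__(self):
        self.offset = 0  # number of tokens already written into the cache

def mask_start_lazy(cache):
    # Buggy pattern: the closure keeps an alias to the shared cache and
    # re-reads .offset at call time, after other sequences may have advanced it.
    return lambda: cache.offset

def mask_start_snapshot(cache):
    offset = cache.offset  # snapshot the int now; later mutation can't reach it
    return lambda: offset

cache = KVCache()
lazy = mask_start_lazy(cache)
snap = mask_start_snapshot(cache)

cache.offset += 4  # another batched sequence advances the shared cache

assert lazy() == 4  # sees the mutated shared value
assert snap() == 0  # sees the value captured at snapshot time
```

Snapshotting works here because an `int` is immutable: copying the value severs the alias, whereas holding the cache object keeps every reader exposed to concurrent advancement.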