Summary
JSON validity of VLM structured extraction drops significantly under concurrent requests on oMLX v0.3.4. Even concurrency=2 shows consistent degradation.
Data
Three separate runs at concurrency=2 and a single run at every other level, 20 photos per run, Qwen3-VL-30B-A3B-Instruct-4bit, structured JSON extraction prompt, temperature 0.1:
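The sweep can be reproduced with a small harness like the one below. This is a minimal sketch, not the script used for the numbers in this issue. The request itself is abstracted behind `send`, a hypothetical callable you supply (for example, a wrapper around whatever HTTP client you use to reach the server) that takes one photo and returns the response content or `None`. "Valid" here means the content parses with `json.loads`, which may be looser than a schema check.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical: sends one extraction request for a photo and returns the
# message content (None if the server returned no content). Not an oMLX API.
SendFn = Callable[[str], Optional[str]]


def is_valid_json(content: Optional[str]) -> bool:
    """True if the model returned content that parses as JSON."""
    if content is None:
        return False
    try:
        json.loads(content)
    except json.JSONDecodeError:
        return False
    return True


def validity_rate(photos: Sequence[str], send: SendFn, concurrency: int) -> float:
    """Send one request per photo, at most `concurrency` in flight; return valid fraction."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outputs = list(pool.map(send, photos))
    valid = sum(is_valid_json(o) for o in outputs)
    return valid / len(photos)


def sweep(
    photos: Sequence[str],
    send: SendFn,
    levels: Sequence[int] = (1, 2, 4, 8, 16),
    runs: int = 1,
) -> Dict[int, List[float]]:
    """Return {concurrency: [validity rate for each run]}."""
    return {c: [validity_rate(photos, send, c) for _ in range(runs)] for c in levels}
```

Running `sweep(photos, send, runs=3)` with 20 photos gives one row per concurrency level in the same shape as the table below. Using a thread pool with `max_workers=concurrency` keeps exactly that many requests outstanding until the queue drains.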
Concurrency Sweep (unpatched v0.3.4)
| Concurrency | Run 1 | Run 2 | Run 3 | Avg |
|---|---|---|---|---|
| 1 | 100% (20/20) | n/a | n/a | 100% |
| 2 | 65% (13/20) | 60% (12/20) | 45% (9/20) | 57% |
| 4 | 100% (20/20) | n/a | n/a | 100% |
| 8 | 100% (20/20) | n/a | n/a | 100% |
| 16 | 70% (14/20) | n/a | n/a | 70% |
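The Avg column is the arithmetic mean of the per-run validity percentages. For conc=2 that is (65 + 60 + 45) / 3 ≈ 56.7%, shown rounded as 57%. A short script that recomputes the column from the raw valid-response counts in the table:

```python
from statistics import mean

# Valid responses per run (out of 20 photos), taken from the sweep table.
PHOTOS_PER_RUN = 20
runs_by_concurrency = {
    1: [20],
    2: [13, 12, 9],
    4: [20],
    8: [20],
    16: [14],
}

# Mean validity percentage per concurrency level.
averages = {
    conc: mean(100 * v / PHOTOS_PER_RUN for v in counts)
    for conc, counts in runs_by_concurrency.items()
}

for conc, avg in averages.items():
    print(f"conc={conc:>2}: avg={avg:.0f}%")
```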
Failure Types
conc=2: Output contains corrupted/merged JSON in which two separate responses appear concatenated. Example: `{"description": "A young girl in a pink, "description": "A y`
conc=16: `content: null` with `completion_tokens: 0`; the model fails to generate any content. 5 of the 6 failures were this type.
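To tally failures by type across runs, each response can be bucketed with a heuristic like the one below. This is a sketch under stated assumptions: `classify` and its return labels are hypothetical, and the default `key="description"` is taken from the example output above rather than from the actual extraction schema. "merged" is a guess based on the same top-level key appearing more than once in output that fails to parse; it cannot prove that two requests were interleaved.

```python
import json
from typing import Optional


def classify(content: Optional[str], completion_tokens: int, key: str = "description") -> str:
    """Heuristically bucket one response into the failure types seen in this issue."""
    if content is None:
        # The conc=16 pattern: null content and zero generated tokens.
        return "empty" if completion_tokens == 0 else "other"
    try:
        json.loads(content)
        return "valid"
    except json.JSONDecodeError:
        pass
    # The conc=2 pattern: unparseable output in which the same key starts twice,
    # suggesting two responses were concatenated. Heuristic only.
    if content.count(f'"{key}":') > 1:
        return "merged"
    return "invalid_json"
```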
Key Observations
- conc=2 is consistently degraded (45-65% across 3 runs). This is NOT noise.
- conc=4 and conc=8 show 100% (single runs each), so the degradation is NOT monotonic with concurrency. This suggests a specific scheduling/batching edge case at low concurrency.
- conc=16 failures are mostly `content: null` (5 of 6): the model produces no output at all.
- The corruption at conc=2 looks like cross-request contamination (merged JSON from different requests).
Correction
We previously reported 100% validity at conc=2 (on our review of PR #648). That measurement may have had different conditions or methodology. The data here is from a clean unpatched v0.3.4 install with 3 repeated runs.
Environment
- oMLX v0.3.4 (Homebrew, unpatched)
- Mac Studio M3 Ultra 96GB
- Qwen3-VL-30B-A3B-Instruct-4bit
- Structured JSON extraction prompt
- 20 diverse photos per run
Not related to #648
We tested the IOKit underflow fix from #648 and saw no change in this behavior. The deferred-clear race is a different issue.