bug: VLM structured output quality degrades under concurrent requests (conc >= 2)

## Summary

JSON validity of VLM structured extraction drops under concurrent requests on oMLX v0.3.4: from 100% at concurrency=1 to an average of 57% at concurrency=2 and 70% at concurrency=16. The degradation at concurrency=2 is consistent across three runs.

## Data

Each run used 20 photos with Qwen3-VL-30B-A3B-Instruct-4bit, a structured JSON extraction prompt, and `temperature: 0.1`. conc=2 was run three times; every other concurrency level was run once:

### Concurrency Sweep (unpatched v0.3.4)

| Concurrency | Run 1 | Run 2 | Run 3 | Avg |
|---|---|---|---|---|
| 1 | 100% (20/20) | — | — | 100% |
| 2 | 65% (13/20) | 60% (12/20) | 45% (9/20) | **57%** |
| 4 | 100% (20/20) | — | — | 100% |
| 8 | 100% (20/20) | — | — | 100% |
| 16 | 70% (14/20) | — | — | 70% |
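For anyone re-checking the averages, here is a minimal sketch of how the Avg column can be reproduced from the raw counts in the table. The names `RUNS`, `validity`, and `summarize` are illustrative and not part of any oMLX tooling; the rate is pooled over all photos at a given concurrency level.

```python
# Valid-JSON counts per run, copied from the table above (20 photos per run).
RUNS = {
    1: [20],
    2: [13, 12, 9],
    4: [20],
    8: [20],
    16: [14],
}
PHOTOS_PER_RUN = 20


def validity(counts, per_run=PHOTOS_PER_RUN):
    """Pooled valid-JSON rate across all runs at one concurrency level."""
    return sum(counts) / (per_run * len(counts))


def summarize(runs=RUNS):
    """Map each concurrency level to its pooled validity as a rounded percentage."""
    return {conc: round(100 * validity(counts)) for conc, counts in runs.items()}
```

Because every run has the same number of photos, the pooled rate equals the mean of the per-run percentages.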

### Failure Types

**conc=2:** Output contains corrupted/merged JSON. Example: `{"description": "A young girl in a pink, "description": "A y` — two separate responses appear concatenated.

**conc=16:** `content: null` with `completion_tokens: 0` — model fails to generate any content. 5 of 6 failures were this type.
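The two failure types can be told apart mechanically. The sketch below assumes an OpenAI-style chat completion response (`choices[0].message.content`); that shape is an assumption, and `classify` is a hypothetical helper, not the script used for the numbers above. Since Python's `json.loads` silently accepts repeated keys, the sketch also flags them, on the guess that merged output could sometimes still parse.

```python
import json


def _reject_duplicates(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key")
    return dict(pairs)


def classify(response):
    """Label one response as 'valid', 'empty', 'invalid_json', or 'duplicate_keys'.

    Assumes an OpenAI-style response dict (an assumption, not confirmed here).
    """
    content = response["choices"][0]["message"].get("content")
    if content is None or not content.strip():
        return "empty"  # the conc=16 pattern: content is null
    try:
        parsed = json.loads(content, object_pairs_hook=_reject_duplicates)
    except json.JSONDecodeError:
        return "invalid_json"  # the conc=2 pattern: corrupted or merged text
    except ValueError:
        return "duplicate_keys"  # parses, but a key appears twice
    return "valid" if isinstance(parsed, dict) else "invalid_json"
```

Applied to the conc=2 example above, `classify` returns `invalid_json`, because the first string value is never closed before the second `"description"` key begins.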

## Key Observations

1. conc=2 is **consistently** degraded (45-65% across 3 runs). This is NOT noise.
2. conc=4 and conc=8 each show 100% in a single run, so the degradation is NOT monotonic with concurrency. This suggests a specific scheduling/batching edge case at low concurrency.
3. conc=16 failures are `content: null` — the model produces no output at all.
4. The corruption at conc=2 looks like cross-request contamination (merged JSON from different requests).
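To help others reproduce the sweep, here is a rough sketch of a harness that keeps a fixed number of requests in flight and reports the valid-JSON fraction. It is not the script used for the numbers above. `send` is a hypothetical coroutine that you would replace with a real call to the server, and `fake_send` is only a stand-in so the sketch runs on its own.

```python
import asyncio
import json


async def run_sweep(send, prompts, concurrency):
    """Send every prompt with at most `concurrency` in flight; return the valid-JSON fraction."""
    gate = asyncio.Semaphore(concurrency)

    async def one(prompt):
        async with gate:
            text = await send(prompt)
        if text is None:
            return False  # counts `content: null` as a failure
        try:
            return isinstance(json.loads(text), dict)
        except json.JSONDecodeError:
            return False

    results = await asyncio.gather(*(one(p) for p in prompts))
    return sum(results) / len(results) if results else 0.0


async def fake_send(prompt):
    """Stand-in for a real client call (hypothetical); echoes the prompt back as JSON."""
    await asyncio.sleep(0)
    return json.dumps({"id": prompt})
```

For example, `asyncio.run(run_sweep(fake_send, list(range(20)), 2))` exercises the conc=2 case against the stand-in. Repeating each level several times, as was done for conc=2, would show whether the single-run results at conc=4, 8, and 16 are stable.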

## Correction

We previously reported 100% validity at conc=2 in our review of PR #648. That measurement may have been taken under different conditions or with a different methodology. The data here is from a clean, unpatched v0.3.4 install with 3 repeated runs at conc=2.

## Environment

- oMLX v0.3.4 (Homebrew, unpatched)
- Mac Studio M3 Ultra 96GB
- Qwen3-VL-30B-A3B-Instruct-4bit
- Structured JSON extraction prompt
- 20 diverse photos per run

## Not related to #648

We tested the IOKit underflow fix from #648 and saw no change in this behavior. The deferred-clear race is a different issue.
