Bogus output from Qwen3-Next-80B-A3B-Thinking-8bit #8

```
=== INFERENCE PIPELINE DIAGNOSTICS ===
Loading model: models--mlx-community--Qwen3-Next-80B-A3B-Thinking-8bit/snapshots/d093dbe8233828ca0cc420f75466133c542a1e96
Model loaded.

--- TEST 1: Basic GPU ops ---
[DIAG] matmul(ones, 2*ones) expect=8	   shape=(4,4) min=8.000000 max=8.000000 mean=8.000000 |mean|=8.000000
[VALS] matmul result: [8.0000, 8.0000, 8.0000, 8.0000, 8.0000, 8.0000, 8.0000, 8.0000]
[hipBLASLt] first call
[hipBLASLt] M=4 N=4 K=4 ta=0 tb=0 lda=4 ldb=4 ldc=4
[DIAG] bf16 matmul expect=8		   shape=(4,4) min=8.000000 max=8.000000 mean=8.000000 |mean|=8.000000

--- TEST 2: quantized_matmul vs dequant ---
[DIAG] q_proj weights not found (w=0 s=0 b=0)

--- TEST 3: RMS Norm ---
[DIAG] rms_norm([1,2,3,4])		   shape=(1,1,4) min=0.365148 max=1.460593 mean=0.912871 |mean|=0.912871
[VALS] rms_norm([1,2,3,4]) expect≈[.365,.730,1.095,1.461]: [0.3651, 0.7303, 1.0954, 1.4606]
[DIAG] rms_norm(rand bf16 4096)		   shape=(1,3,4096) min=-4.593750 max=3.890625 mean=0.007758 |mean|=0.797820

--- TEST 4: RoPE ---
[DIAG] rope(ones, off=0)		   shape=(1,1,1,128) min=1.000000 max=1.000000 mean=1.000000 |mean|=1.000000
[VALS] rope(ones, off=0): [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000]
[DIAG] rope(ones, off=100)		   shape=(1,1,1,128) min=-1.414062 max=1.406250 mean=0.586235 |mean|=0.953492
[VALS] rope(ones, off=100): [1.3672, 1.3438, -1.3672, -1.3516, 0.7305, -1.3828, -1.4062, -0.9219, 1.3594, -1.1719, 1.3750, -1.1094, -0.5898, 1.2109, 1.1406, -0.0040, -0.9805, -1.3906, -1.3516, -1.0781]

--- TEST 5: Full forward pass ---
[DIAG] logits(token=1)			   shape=(1,1,151936) min=0.000000 max=0.000000 mean=0.000000 |mean|=0.000000
[VALS] logits(token=1): [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
[DIAG] Top-10:
  token=0 logit=0.0000
  token=0 logit=0.0000
  token=0 logit=0.0000
  token=0 logit=0.0000
  token=0 logit=0.0000
  token=0 logit=0.0000
  token=0 logit=0.0000
  token=0 logit=0.0000
  token=0 logit=0.0000
  token=0 logit=0.0000
[DIAG] logits(step2)			   shape=(1,1,151936) min=0.000000 max=0.000000 mean=0.000000 |mean|=0.000000

--- TEST 6: dequantize() sanity ---
[DIAG] dequant([0..7],s=1,b=0)		   shape=(1,8) min=0.000000 max=7.000000 mean=3.500000 |mean|=3.500000
[VALS] dequant expect=[0,1,2,3,4,5,6,7]: [0.0000, 1.0000, 2.0000, 3.0000, 4.0000, 5.0000, 6.0000, 7.0000]

--- TEST 6b: Warmup pass ---
[DIAG] warmup logits			   shape=(1,1,151936) min=0.000000 max=0.000000 mean=0.000000 |mean|=0.000000
[DIAG] Warmup complete

--- TEST 7: Token-level generation trace ---
[DIAG] encode("What is 2+2?") = [3838, 374, 220, 17, 10, 17, 30] (7 tokens)
[DIAG] Token-by-token decode:
  token 3838 -> "What"
  token 374 -> " is"
  token 220 -> " "
  token 17 -> "2"
  token 10 -> "+"
  token 17 -> "2"
  token 30 -> "?"
[DIAG] Chat template tokens (17): [151644, 872, 198, 3838, 374, 220, 17, 10, 17, 30, 151645, 198, 151644, 77091, 198, 151667, 198]
[DIAG] Chat template decoded: "<|im_start|>user
What is 2+2?<|im_end|>
<|im_start|>assistant
<think>
"
[DIAG] prefill logits			   shape=(1,17,151936) min=-14.937500 max=11.312500 mean=-1.203189 |mean|=2.082019
[DIAG] Generating 20 tokens (argmax):
  step=0 token=1406 text="


"
  step=1 token=148583 text="ᑋ"
  step=2 token=82434 text="okino"
  step=3 token=148583 text="ᑋ"
  step=4 token=82434 text="okino"
  step=5 token=148583 text="ᑋ"
  step=6 token=82434 text="okino"
  step=7 token=148583 text="ᑋ"
  step=8 token=82434 text="okino"
  step=9 token=148583 text="ᑋ"
  step=10 token=82434 text="okino"
  step=11 token=148583 text="ᑋ"
  step=12 token=82434 text="okino"
  step=13 token=148583 text="ᑋ"
  step=14 token=82434 text="okino"
  step=15 token=148583 text="ᑋ"
  step=16 token=82434 text="okino"
  step=17 token=148583 text="ᑋ"
  step=18 token=82434 text="okino"
  step=19 token=148583 text="ᑋ"
[DIAG] Full output (argmax): "


ᑋokinoᑋokinoᑋokinoᑋokinoᑋokinoᑋokinoᑋokinoᑋokinoᑋokinoᑋ"

[DIAG] Generating 20 tokens (categorical T=0.7):
  step=0 token=4122 text="ams"
  step=1 token=130051 text=" toán"
  step=2 token=82434 text="okino"
  step=3 token=14871 text="details"
  step=4 token=82434 text="okino"
  step=5 token=36384 text="(sb"
  step=6 token=82434 text="okino"
  step=7 token=37377 text="uded"
  step=8 token=82434 text="okino"
  step=9 token=71168 text="legt"
  step=10 token=82434 text="okino"
  step=11 token=142349 text="مراجعة"
  step=12 token=82434 text="okino"
  step=13 token=128513 text=" pháp"
  step=14 token=82434 text="okino"
  step=15 token=2645 text="ension"
  step=16 token=82434 text="okino"
  step=17 token=38401 text=" Sketch"
  step=18 token=82434 text="okino"
  step=19 token=8520 text="VICE"
[DIAG] Full output (categorical): "ams toánokinodetailsokino(sbokinoudedokinolegtokinoمراجعةokino phápokinoensionokino SketchokinoVICE"

[DIAG] Testing via generate_text (chat.cpp path):
  token=7563 text="-th"
  token=13400 text="_loc"
  token=140149 text=" hükü"
  token=19640 text=" Ministry"
  token=139383 text="öğretim"
  token=17332 text=" Dating"
  token=140149 text=" hükü"
  token=137604 text=" Вер"
  token=139383 text="öğretim"
  token=66380 text=" scouting"
  token=140149 text=" hükü"
  token=105557 text="煤炭"
  token=140149 text=" hükü"
  token=8959 text="з"
  token=140149 text=" hükü"
  token=125012 text="ính"
  token=140149 text=" hükü"
  token=105399 text="两个人"
  token=139383 text="öğretim"
  token=59012 text=" versatility"
[DIAG] generate_text output: "-th_loc hükü Ministryöğretim Dating hükü Верöğretim scouting hükü煤炭 hüküз hüküính hükü两个人öğretim versatility"
[DIAG] Prompt:	   17 tokens, 80.9736 tokens/s, 0.209945s
Generation: 20 tokens, 30.8932 tokens/s, 0.647393s

--- TEST 8: random::categorical ---
[DIAG] categorical([..., 10, ...]) = 2 (expect 2)
[DIAG] categorical([..., 10, ...]) = 2 (expect 2)
[DIAG] categorical([..., 10, ...]) = 2 (expect 2)
[DIAG] categorical([..., 10, ...]) = 2 (expect 2)
[DIAG] categorical([..., 10, ...]) = 2 (expect 2)
[DIAG] categorical(peak@17, V=151936) = 17 (expect 17)
[DIAG] categorical(peak@17, V=151936) = 17 (expect 17)
[DIAG] categorical(peak@17, V=151936) = 17 (expect 17)
[DIAG] Testing categorical with real model logits...
[DIAG] argmax = 99706
[DIAG] categorical(T=0.7) = 19034
[DIAG] categorical(T=0.7) = 112762
[DIAG] categorical(T=0.7) = 38164
[DIAG] categorical(T=0.7) = 101202
[DIAG] categorical(T=0.7) = 112762

```
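
For anyone cross-checking the log, the expected values in TEST 3 and the value range in TEST 4 can be reproduced on the CPU. The sketch below is not the code that produced the log. It assumes RMS norm with unit weight and `eps=1e-6`, and it assumes RoPE rotates feature pairs by some angle; the names `rms_norm` and `rope_pair` are hypothetical helpers introduced here. Under those assumptions, `rms_norm([1,2,3,4])` matches the logged values, and any rotation of a `(1, 1)` pair stays within ±√2 ≈ 1.4142, which is consistent with the logged min/max of -1.414062 and 1.406250 for `rope(ones, off=100)`.

```python
import math

def rms_norm(x, eps=1e-6):
    """Reference RMS norm with unit weight: x / sqrt(mean(x^2) + eps).

    eps=1e-6 is an assumption; the log does not state the value used.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]

def rope_pair(x0, x1, theta):
    """Rotate one (x0, x1) feature pair by angle theta (a plain 2D rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return x0 * c - x1 * s, x0 * s + x1 * c

# TEST 3 reference: compare against the [VALS] line for rms_norm([1,2,3,4]).
normed = rms_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 4) for v in normed])

# TEST 4 reference: for an all-ones input, every rotated component is
# cos(theta) - sin(theta) or sin(theta) + cos(theta), so |value| <= sqrt(2)
# whatever the position offset or per-pair frequency happens to be.
bound = math.sqrt(2.0)
thetas = [0.01 * k for k in range(1000)]
worst = max(abs(v) for t in thetas for v in rope_pair(1.0, 1.0, t))
print(worst <= bound + 1e-12)
```

Both checks agree with the log, which suggests RMS norm and RoPE are behaving plausibly in isolation and that the fault lies elsewhere in the forward pass.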
