
Conversation

@GandalfTea (Contributor) commented Nov 25, 2025

Summary

Add support for new model architectures:

  • Olmo3
  • GLM4
  • Qwen3-MoE

Changes

 src/dnet/core/models/__init__.py  |   6 +++
 src/dnet/core/models/glm4.py      | 119 ++++++++++++++++++++++++++++++++++++++++++
 src/dnet/core/models/olmo3.py     | 120 ++++++++++++++++++++++++++++++++++++++++++
 src/dnet/core/models/qwen3_moe.py | 187 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 432 insertions(+)
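
For reviewers unfamiliar with the layout, a rough, hypothetical sketch of how a new architecture module could be wired up. The registry shape, keys, and `load_architecture` helper below are assumptions for illustration only, not the actual contents of `src/dnet/core/models/__init__.py`:

```python
# Hypothetical sketch only -- dnet's real registration may look different.
import importlib

# Map the "model_type" field of a checkpoint's config.json to the module
# implementing that architecture (keys and paths are assumptions).
_ARCHITECTURES = {
    "olmo3": "dnet.core.models.olmo3",
    "glm4": "dnet.core.models.glm4",
    "qwen3_moe": "dnet.core.models.qwen3_moe",
}

def load_architecture(model_type: str):
    """Import and return the module that implements the given architecture."""
    if model_type not in _ARCHITECTURES:
        raise ValueError(f"Unsupported model architecture: {model_type}")
    return importlib.import_module(_ARCHITECTURES[model_type])
```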

Testing

  Olmo 3
! mlx-community/Olmo-3-1025-7B-4bit          [FAIL] (junk output) 
+ mlx-community/Olmo-3-7B-Think-4bit
+ mlx-community/Olmo-3-7B-Think-SFT-4bit
+ mlx-community/Olmo-3-7B-Instruct-4bit
+ mlx-community/Olmo-3-7B-Instruct-SFT-4bit

+ mlx-community/Olmo-3-1025-7B-8bit
+ mlx-community/Olmo-3-7B-Think-8bit
+ mlx-community/Olmo-3-7B-Think-SFT-8bit
+ mlx-community/Olmo-3-7B-Instruct-8bit
+ mlx-community/Olmo-3-7B-Instruct-SFT-8bit

! mlx-community/Olmo-3-7B-Instruct-bf16          [FAIL] (bf16 fails sampling)
! mlx-community/Olmo-3-7B-Instruct-SFT-bfloat16  [FAIL]
! mlx-community/Olmo-3-7B-Think-bfloat16         [FAIL]
! mlx-community/Olmo-3-7B-Think-SFT-bfloat16     [FAIL]
! mlx-community/Olmo-3-1025-7B-bfloat16          [FAIL]

+ mlx-community/Olmo-3-1125-32B-4bit
+ mlx-community/Olmo-3-1125-32B-8bit


  GLM
+ mlx-community/GLM-4-9B-0414-4bit
+ mlx-community/GLM-Z1-9B-0414-4bit
+ mlx-community/GLM-4-9B-0414-8bit 
+ mlx-community/GLM-Z1-9B-0414-8bit
+ mlx-community/GLM-4-32B-0414-4bit
+ mlx-community/GLM-Z1-32B-0414-4bit
! mlx-community/GLM-Z1-9B-0414-bf16 [FAIL] (failed sampling)
! mlx-community/GLM-4-9B-0414-bf16  [FAIL] (failed sampling)

  Qwen3-MoE
+ mlx-community/Qwen3-30B-A3B-4bit
+ mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
+ mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit
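
The +/! results above come from simple generation smoke tests. Below is a minimal standalone sketch of that kind of check, using mlx_lm directly rather than dnet's own runner (the actual test harness is not part of this PR); the repo list is just a sample from the tables above:

```python
# Minimal smoke-test sketch (assumption: mlx_lm is installed; dnet's real
# test path differs). "+" above means the model loads and generates coherent
# text; "!" means loading or sampling fails, or the output is junk.
from mlx_lm import load, generate

REPOS = [
    "mlx-community/Olmo-3-7B-Instruct-4bit",
    "mlx-community/GLM-4-9B-0414-4bit",
    "mlx-community/Qwen3-30B-A3B-4bit",
]

for repo in REPOS:
    model, tokenizer = load(repo)
    text = generate(model, tokenizer, prompt="Briefly explain what you are.", max_tokens=32)
    print(f"{repo}: {text!r}")
```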

Also modified existing catalogue entries:

- mlx-community/Qwen3-8B-bf16 (failed sampling)

Dependencies

This commit depends on the distilp PRs firstbatchxyz/distilp#18 and firstbatchxyz/distilp#17.

@GandalfTea force-pushed the oto/add-models branch 2 times, most recently from bb41004 to a282bef on November 25, 2025 at 14:54
@GandalfTea (Contributor, Author)

mlx-community/GLM-Z1-9B-0414-bf16 fails with:

2025-11-25 07:52:38,633 - dnet - ERROR - fit_in_memory.py:164 - End-shard sampling failed: [matmul] Last dimension of first input with shape (1,14,4096) must match second to last dimension of second input with shape (151552,4096).
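
Reading the shapes in that message: (1, 14, 4096) is (batch, seq, hidden) and (151552, 4096) is (vocab, hidden), so the end-shard logits projection looks like it is multiplying by the tied embedding matrix without transposing it. A minimal reproduction of the mismatch, under that assumption:

```python
# Assumption: the failure is an untransposed tied-embedding projection.
import mlx.core as mx

hidden = mx.zeros((1, 14, 4096))    # (batch, seq, hidden_size)
embed_w = mx.zeros((151552, 4096))  # (vocab_size, hidden_size)

# hidden @ embed_w  -> raises the same [matmul] error: 4096 != 151552
logits = hidden @ embed_w.T         # (1, 14, 151552), what sampling expects
print(logits.shape)
```

In MLX model code this projection is typically written as `self.embed_tokens.as_linear(h)`, which handles the transpose internally; why only the bf16 checkpoints hit this path isn't clear from the log alone.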

@andthattoo (Member)

Models are now tracked within the catalogue. See the catalog.

@GandalfTea (Contributor, Author)

Added the working models to the catalogue. I'll look into the 6-bit and bf16 quantization problems.

@GandalfTea marked this pull request as ready for review on November 25, 2025 at 18:07
@GandalfTea marked this pull request as draft on November 26, 2025 at 05:08
@ShivaThomas

I'm curious: are you not using the native MLX engine that already supports so many more model architectures? It would be wonderful to be able to use any MLX-supported model. Currently the project doesn't make much sense with my 128GB Mac. Only one supported model would be bigger than what I can run conventionally: Hermes-4-405B-MLX-4bit, at 228GB. But MoE models like Qwen3 235B-A22B (132GB at 4bit) or GLM 4.6 355B-A32B (198GB at 4bit, 154GB at 3bit) would be much more relevant. Large dense models are too slow for inference on Apple Silicon.

@andthattoo (Member)

You're right on this. The reason we started with a very minimal set is to test and expand the software itself. Although we are using MLX, models are not directly usable; we need to update the model scripts and test them accordingly. In short, we'll add all the models supported by MLX.

We are also working on many optimizations, including MoE runtime routing, expert assignment, and sparsity.
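
For context, "runtime routing" presumably refers to the per-token top-k expert selection in MoE layers. A generic sketch of that step (illustrative only, not dnet's implementation):

```python
# Generic top-k MoE routing sketch -- not dnet's code.
import mlx.core as mx

def route_tokens(router_logits: mx.array, k: int = 2):
    """router_logits: (tokens, num_experts) -> (indices, weights), each (tokens, k)."""
    # Indices of the k highest-scoring experts per token (unordered within the top k).
    idx = mx.argpartition(-router_logits, k - 1, axis=-1)[..., :k]
    scores = mx.take_along_axis(router_logits, idx, axis=-1)
    # Normalise the selected scores into mixing weights for the chosen experts.
    weights = mx.softmax(scores, axis=-1)
    return idx, weights

# Example: 4 tokens routed over 8 experts, 2 experts active per token.
idx, w = route_tokens(mx.random.normal((4, 8)), k=2)
print(idx.shape, w.shape)  # (4, 2) (4, 2)
```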
