
Conversation

@GandalfTea (Contributor) commented Nov 25, 2025

Summary

Add support for new model architectures:

  • Olmo3
  • GLM4
  • Qwen3-MoE

Changes

 src/dnet/core/models/__init__.py  |   6 +++
 src/dnet/core/models/glm4.py      | 119 ++++++++++++++++++++++++++++++++++++++++++
 src/dnet/core/models/olmo3.py     | 120 ++++++++++++++++++++++++++++++++++++++++++
 src/dnet/core/models/qwen3_moe.py | 187 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 432 insertions(+)
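
For reviewers unfamiliar with the layout, a rough, hypothetical sketch of how a new architecture module could be wired up. The registry shape, keys, and `load_architecture` helper below are assumptions for illustration only, not the actual contents of `src/dnet/core/models/__init__.py`:

```python
# Hypothetical sketch only -- dnet's real registration may look different.
import importlib

# Map the "model_type" field of a checkpoint's config.json to the module
# implementing that architecture (keys and paths are assumptions).
_ARCHITECTURES = {
    "olmo3": "dnet.core.models.olmo3",
    "glm4": "dnet.core.models.glm4",
    "qwen3_moe": "dnet.core.models.qwen3_moe",
}

def load_architecture(model_type: str):
    """Import and return the module that implements the given architecture."""
    if model_type not in _ARCHITECTURES:
        raise ValueError(f"Unsupported model architecture: {model_type}")
    return importlib.import_module(_ARCHITECTURES[model_type])
```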

Testing

  Olmo 3
! mlx-community/Olmo-3-1025-7B-4bit          [FAIL] (junk output) 
+ mlx-community/Olmo-3-7B-Think-4bit
+ mlx-community/Olmo-3-7B-Think-SFT-4bit
+ mlx-community/Olmo-3-7B-Instruct-4bit
+ mlx-community/Olmo-3-7B-Instruct-SFT-4bit

+ mlx-community/Olmo-3-1025-7B-8bit
+ mlx-community/Olmo-3-7B-Think-8bit
+ mlx-community/Olmo-3-7B-Think-SFT-8bit
+ mlx-community/Olmo-3-7B-Instruct-8bit
+ mlx-community/Olmo-3-7B-Instruct-SFT-8bit

! mlx-community/Olmo-3-7B-Instruct-bf16          [FAIL] (bf16 fails sampling)
! mlx-community/Olmo-3-7B-Instruct-SFT-bfloat16  [FAIL]
! mlx-community/Olmo-3-7B-Think-bfloat16         [FAIL]
! mlx-community/Olmo-3-7B-Think-SFT-bfloat16     [FAIL]
! mlx-community/Olmo-3-1025-7B-bfloat16          [FAIL]

+ mlx-community/Olmo-3-1125-32B-4bit
+ mlx-community/Olmo-3-1125-32B-8bit


  GLM
+ mlx-community/GLM-4-9B-0414-4bit
+ mlx-community/GLM-Z1-9B-0414-4bit
+ mlx-community/GLM-4-9B-0414-8bit 
+ mlx-community/GLM-Z1-9B-0414-8bit
+ mlx-community/GLM-4-32B-0414-4bit
+ mlx-community/GLM-Z1-32B-0414-4bit
! mlx-community/GLM-Z1-9B-0414-bf16 [FAIL] (failed sampling)
! mlx-community/GLM-4-9B-0414-bf16  [FAIL] (failed sampling)

  Qwen3-MoE
+ mlx-community/Qwen3-30B-A3B-4bit
+ mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit
+ mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit
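
The +/! results above come from simple generation smoke tests. Below is a minimal standalone sketch of that kind of check, using mlx_lm directly rather than dnet's own runner (the actual test harness is not part of this PR); the repo list is just a sample from the tables above:

```python
# Minimal smoke-test sketch (assumption: mlx_lm is installed; dnet's real
# test path differs). "+" above means the model loads and generates coherent
# text; "!" means loading or sampling fails, or the output is junk.
from mlx_lm import load, generate

REPOS = [
    "mlx-community/Olmo-3-7B-Instruct-4bit",
    "mlx-community/GLM-4-9B-0414-4bit",
    "mlx-community/Qwen3-30B-A3B-4bit",
]

for repo in REPOS:
    model, tokenizer = load(repo)
    text = generate(model, tokenizer, prompt="Briefly explain what you are.", max_tokens=32)
    print(f"{repo}: {text!r}")
```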

Also modified existing catalogue entries:

- mlx-community/Qwen3-8B-bf16 (failed sampling)

Dependencies

This commit depends on the distilp PRs firstbatchxyz/distilp#18 and firstbatchxyz/distilp#17.

@GandalfTea force-pushed the oto/add-models branch 2 times, most recently from bb41004 to a282bef on November 25, 2025 at 14:54
@GandalfTea (Contributor, Author)

mlx-community/GLM-Z1-9B-0414-bf16 fails with:

2025-11-25 07:52:38,633 - dnet - ERROR - fit_in_memory.py:164 - End-shard sampling failed: [matmul] Last dimension of first input with shape (1,14,4096) must match second to last dimension of second input with shape (151552,4096).
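
Reading the shapes in that message: (1, 14, 4096) is (batch, seq, hidden) and (151552, 4096) is (vocab, hidden), so the end-shard logits projection looks like it is multiplying by the tied embedding matrix without transposing it. A minimal reproduction of the mismatch, under that assumption:

```python
# Assumption: the failure is an untransposed tied-embedding projection.
import mlx.core as mx

hidden = mx.zeros((1, 14, 4096))    # (batch, seq, hidden_size)
embed_w = mx.zeros((151552, 4096))  # (vocab_size, hidden_size)

# hidden @ embed_w  -> raises the same [matmul] error: 4096 != 151552
logits = hidden @ embed_w.T         # (1, 14, 151552), what sampling expects
print(logits.shape)
```

In MLX model code this projection is typically written as `self.embed_tokens.as_linear(h)`, which handles the transpose internally; why only the bf16 checkpoints hit this path isn't clear from the log alone.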

@andthattoo (Member)

Models are now tracked within the catalogue. See the catalog.

@GandalfTea (Contributor, Author)

Added the working models to the catalogue. I'll look into the 6-bit and bf16 quantization problems.

@GandalfTea marked this pull request as ready for review on November 25, 2025 at 18:07
@GandalfTea marked this pull request as draft on November 26, 2025 at 05:08
@ShivaThomas

I'm curious: are you not using the native MLX engine that already supports so many more model architectures? It would be wonderful to be able to use any MLX-supported model. Currently the project doesn't make much sense with my 128GB Mac. Only one supported model would be bigger than what I can run conventionally: Hermes-4-405B-MLX-4bit, at 228GB. But MoE models like Qwen3 235B-A22B (132GB at 4bit) or GLM 4.6 355B-A32B (198GB at 4bit, 154GB at 3bit) would be much more relevant. Large dense models are too slow for inference on Apple Silicon.

@andthattoo (Member)

You're right on this. The reason we started with a very minimal set is to test and expand the software itself. Although we are using MLX, models are not directly usable; we need to update the model scripts and test them accordingly. In short, we'll add all the models supported by MLX.

We are also working on many optimizations, including MoE runtime routing, expert assignment, and sparsity.
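
For context, "runtime routing" presumably refers to the per-token top-k expert selection in MoE layers. A generic sketch of that step (illustrative only, not dnet's implementation):

```python
# Generic top-k MoE routing sketch -- not dnet's code.
import mlx.core as mx

def route_tokens(router_logits: mx.array, k: int = 2):
    """router_logits: (tokens, num_experts) -> (indices, weights), each (tokens, k)."""
    # Indices of the k highest-scoring experts per token (unordered within the top k).
    idx = mx.argpartition(-router_logits, k - 1, axis=-1)[..., :k]
    scores = mx.take_along_axis(router_logits, idx, axis=-1)
    # Normalise the selected scores into mixing weights for the chosen experts.
    weights = mx.softmax(scores, axis=-1)
    return idx, weights

# Example: 4 tokens routed over 8 experts, 2 experts active per token.
idx, w = route_tokens(mx.random.normal((4, 8)), k=2)
print(idx.shape, w.shape)  # (4, 2) (4, 2)
```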
