add more models #57
Conversation
mlx-community/GLM-Z1-9B-0414-bf16 fails with:
Models are now tracked within the catalogue. See catalog.
Added the working models to the catalogue. I'll look into the 6-bit and bf16 quantization problems.
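(For reproducing the quantization issue above, a hedged sketch: assuming the variants were produced with mlx_lm's convert utility, a 6-bit conversion would look roughly like this. The source repo and output path are illustrative, not necessarily what this PR used.)

```python
# Illustrative only: producing a 6-bit quant with mlx_lm's convert API.
# Paths are placeholders, not this PR's actual settings.
from mlx_lm import convert

convert(
    hf_path="THUDM/GLM-Z1-9B-0414",    # assumed upstream Hugging Face repo
    mlx_path="GLM-Z1-9B-0414-6bit",    # hypothetical output directory
    quantize=True,
    q_bits=6,                          # 6-bit weights
)
```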
I'm curious: are you not using the native MLX engine that already supports so many more model architectures? It would be wonderful to be able to use any MLX-supported model. Currently the project doesn't make much sense with my 128GB Mac. Only one supported model would be bigger than what I can run conventionally: Hermes-4-405B-MLX-4bit, at 228GB. But MoE models like Qwen3 235B-A22B (132GB at 4-bit) or GLM 4.6 355B-A32B (198GB at 4-bit, 154GB at 3-bit) would be much more relevant. Large dense models are too slow for inference on Apple Silicon.
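(As a back-of-the-envelope check on those figures: quantized checkpoint size is roughly parameters × bits / 8. The sizes quoted above run somewhat higher because real checkpoints typically keep some tensors, such as embeddings and norms, at higher precision. A minimal sketch:)

```python
# Rough rule of thumb: size_GB ≈ params_in_billions × bits / 8.
# Actual checkpoints are somewhat larger due to mixed-precision layers.
def approx_size_gb(params_billions: float, bits: int) -> float:
    return params_billions * bits / 8

for name, params, bits in [
    ("Hermes-4-405B @ 4-bit", 405, 4),   # quoted above as 228GB
    ("Qwen3 235B-A22B @ 4-bit", 235, 4), # quoted above as 132GB
    ("GLM 4.6 355B-A32B @ 4-bit", 355, 4),
    ("GLM 4.6 355B-A32B @ 3-bit", 355, 3),
]:
    print(f"{name}: ~{approx_size_gb(params, bits):.0f} GB")
```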
You're right about this. The reason we started so minimal is to test and expand the software itself. Although we are using MLX, models are not directly usable; we need to update the model scripts and test them accordingly. In short, we'll add all the models supported by MLX. We are also working on many optimizations, including MoE runtime routing, expert assignment, and sparsity.
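(For readers unfamiliar with the terms: MoE runtime routing selects a small subset of experts per token, which is why an A22B model activates only ~22B of its 235B parameters on each forward pass. A toy NumPy sketch of top-k routing, not this project's implementation:)

```python
import numpy as np

def top_k_route(hidden, w_gate, k=2):
    """Toy top-k MoE router: pick k experts per token and
    softmax-normalize their gate scores. Illustrative only."""
    logits = hidden @ w_gate                       # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]     # indices of k best experts
    scores = np.take_along_axis(logits, topk, axis=-1)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)      # softmax over selected experts
    return topk, weights                           # which experts, with what mix

# Only the selected experts' FFNs run for each token; the remaining
# expert weights stay untouched, which keeps per-token compute sparse.
```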
Summary
Add support for new model architectures:
Changes
Testing
Also modified existing catalogue entries:
Dependencies
Commit is dependent on distilp PRs: firstbatchxyz/distilp#18 and firstbatchxyz/distilp#17