
feat: add GLM-4.7-Flash (glm4_moe_lite) model support#104

Open
Realmhang wants to merge 1 commit into flagos-ai:main from Realmhang:feat/glm47_flash

Conversation

@Realmhang

PR Category

Core

PR Type

New Features

Description

  • Add support for the GLM-4.7-Flash (glm4_moe_lite) model, which combines MLA (Multi-head Latent Attention) from DeepSeek V2/V3 with the MoE architecture from GLM-4 MoE
  • Add config bridge Glm4MoeLiteConfig extending Glm4MoeConfig with MLA fields, DSA Indexer fields, and MTP support
  • Add model implementation (Glm4MoeLiteForCausalLM) inheriting from both glm4_moe and deepseek_v2 components
  • Register model and config in plugin entry point
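The config-bridge approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the field names and default values are assumptions modeled on published DeepSeek V2/V3 MLA configurations, and the DSA Indexer and MTP field names are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Glm4MoeConfig:
    # Illustrative subset of base GLM-4 MoE fields (values are assumptions)
    hidden_size: int = 4096
    num_attention_heads: int = 32
    n_routed_experts: int = 64
    num_experts_per_tok: int = 4

@dataclass
class Glm4MoeLiteConfig(Glm4MoeConfig):
    # MLA fields, names modeled on DeepSeek V2/V3 configs (assumed)
    q_lora_rank: int = 1536
    kv_lora_rank: int = 512
    qk_rope_head_dim: int = 64
    qk_nope_head_dim: int = 128
    v_head_dim: int = 128
    # DSA Indexer fields (hypothetical names)
    index_head_dim: int = 64
    index_topk: int = 2048
    # MTP (multi-token prediction) support (hypothetical name)
    num_nextn_predict_layers: int = 1

# The bridge inherits every GLM-4 MoE field and layers MLA/Indexer/MTP
# fields on top, so existing glm4_moe code paths keep working.
cfg = Glm4MoeLiteConfig()
print(cfg.n_routed_experts, cfg.kv_lora_rank)
```

The same pattern applies on the model side: `Glm4MoeLiteForCausalLM` can reuse the MoE layers from `glm4_moe` and the MLA attention from `deepseek_v2`, with the bridge config supplying the fields each side expects.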

Changes

@github-actions github-actions bot added the core label Mar 24, 2026
@CLAassistant

CLAassistant commented Mar 24, 2026

CLA assistant check
All committers have signed the CLA.

