
Add custom paths for qwen3 and 3.5 dense #2164

Open
faresobeid wants to merge 2 commits into main from qwen3-3.5-dense



faresobeid (Contributor) commented Apr 1, 2026

Doing this to support the quack RMSNorm kernel and custom selective activation checkpointing (AC) targets. It's also cleaner in general to have custom paths for the models we care about and serve, even non-MoE ones, especially as we later adopt fp8/fp4 kernels.
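For reference, RMSNorm (which the quack kernel computes in fused form) scales each hidden vector by the reciprocal of its root-mean-square and then by a learned per-channel weight. The sketch below is a pure-Python reference of that math only; it is not quack's API, and it assumes the standard formulation `y_i = x_i / sqrt(mean(x^2) + eps) * w_i` (weight-offset variants are not covered).

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """Reference RMSNorm over a single hidden vector.

    Assumes the standard formulation: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the hidden dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    # Reciprocal RMS, computed once and reused for every element.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


# mean(x^2) = (9 + 16) / 2 = 12.5, so each element is divided by sqrt(12.5).
out = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
print(out)  # ≈ [0.8485, 1.1314]
```

With unit weights and `eps=0`, the output always has a mean square of exactly 1, which is a quick sanity check when comparing a fused kernel against this reference.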


Note

Medium Risk
Adds new custom model implementations and changes model-config selection for Qwen3.5, which can affect training/inference correctness and attention masking/position-id behavior for these models. Main risk is regressions in Qwen3/Qwen3.5 loading and attention backends (SDPA/Flash/ring attention) rather than broader system impact.

Overview
Adds custom PrimeRL dense implementations for Qwen3 and text-only Qwen3.5, and wires them into AutoModelForCausalLMPrimeRL so that impl=custom (or auto-selection) can instantiate these models.
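The impl=custom / auto-selection wiring can be pictured as a lookup from model type to a registered custom class. This is a hypothetical sketch, not PrimeRL's actual code: the registry, the `resolve_impl` helper, the model-type keys, the class names, and the `"hf"` fallback value are all placeholders chosen for illustration.

```python
# Hypothetical registry: model_type -> custom implementation class name.
# All keys and class names below are placeholders, not PrimeRL's real identifiers.
CUSTOM_IMPLS: dict[str, str] = {
    "qwen3": "Qwen3ForCausalLMPrimeRL",
    "qwen3_5_text": "Qwen3_5ForCausalLMPrimeRL",
}


def resolve_impl(model_type: str, impl: str) -> str:
    """Pick which implementation to instantiate (hypothetical helper).

    impl="custom": require a registered custom class, error otherwise.
    impl="hf":     always use the Hugging Face implementation (assumed option).
    impl="auto":   prefer the custom class when one is registered.
    """
    if impl not in ("custom", "hf", "auto"):
        raise ValueError(f"unknown impl: {impl!r}")
    custom = CUSTOM_IMPLS.get(model_type)
    if impl == "custom":
        if custom is None:
            raise ValueError(f"no custom implementation for {model_type!r}")
        return custom
    if impl == "auto" and custom is not None:
        return custom
    return "hf"


print(resolve_impl("qwen3", "auto"))             # custom class is registered
print(resolve_impl("some_other_model", "auto"))  # falls back to "hf"
```

Registering a model once makes auto-selection pick it up without further changes, which is why adding dense Qwen3/Qwen3.5 paths is mostly a matter of implementing the classes and adding registry entries.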

Updates model loading to force Qwen3.5 text-only config when not doing VLM training (switching from composite config to text_config while preserving _attn_implementation and _name_or_path). Extends substitute_ring_attn to patch ring-attention _compute_attention for the new Qwen3/Qwen3.5 FlashAttention classes.
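Both loading changes can be sketched with plain Python stand-ins. The config switch keeps the composite config for VLM training and otherwise returns its `text_config` after copying over `_attn_implementation` and `_name_or_path`; the ring-attention change is a monkey-patch of `_compute_attention` on each attention class. `SimpleNamespace` objects stand in for real `transformers` configs, and `select_qwen3_5_config`, `patch_compute_attention`, `DummyFlashAttention`, and `ring_compute_attention` are hypothetical names, not the PR's code.

```python
from types import SimpleNamespace


def select_qwen3_5_config(config, vlm_training: bool):
    """Return the config to build the model from (hypothetical helper).

    VLM training keeps the composite (vision + text) config. Otherwise switch
    to text_config, carrying over the attention backend and checkpoint path
    that were set on the outer composite config.
    """
    if vlm_training:
        return config
    text_config = config.text_config
    text_config._attn_implementation = config._attn_implementation
    text_config._name_or_path = config._name_or_path
    return text_config


def patch_compute_attention(attention_classes, ring_compute_attention):
    """Replace _compute_attention on each class; return originals for restoring."""
    originals = {}
    for cls in attention_classes:
        originals[cls] = cls._compute_attention
        cls._compute_attention = ring_compute_attention
    return originals


# --- usage with stand-in objects ---
composite = SimpleNamespace(
    text_config=SimpleNamespace(hidden_size=8),
    _attn_implementation="flash_attention_2",
    _name_or_path="path/to/checkpoint",  # placeholder path
)
cfg = select_qwen3_5_config(composite, vlm_training=False)


class DummyFlashAttention:  # stand-in for the Qwen3/Qwen3.5 FlashAttention classes
    def _compute_attention(self, q, k, v):
        return "local"


def ring_compute_attention(self, q, k, v):  # stand-in for the ring-attention path
    return "ring"


originals = patch_compute_attention([DummyFlashAttention], ring_compute_attention)
print(cfg._attn_implementation, DummyFlashAttention()._compute_attention(None, None, None))
```

Copying the two private attributes matters because they are set on the outer config at load time; without them the text-only model would lose the chosen attention backend and its checkpoint path.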

Written by Cursor Bugbot for commit 7d59e25. This will update automatically on new commits.


cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

