Summary
Change norm weight (gamma/beta) tensor parameters from 2D shape [1, H] to 1D shape [H] in Qwen3 model examples, to match the actual weight layout in HuggingFace model checkpoints.
Motivation / Use Case
HuggingFace model weights for RMSNorm and LayerNorm are stored as 1D tensors of shape [hidden_size]. Currently, Qwen3 examples define these weights as 2D tensors with shape [1, hidden_size]. This mismatch means users must reshape weights when loading from HuggingFace checkpoints, adding unnecessary friction for model deployment.
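The reshape friction can be sketched as follows. This is a minimal illustration using numpy stand-ins for the tensors (the `hidden_size`, array values, and variable names are hypothetical, not taken from the actual checkpoints):

```python
import numpy as np

# HF checkpoints store norm weights as 1D tensors of shape [hidden_size].
hidden_size = 4
hf_weight = np.ones(hidden_size, dtype=np.float32)
assert hf_weight.shape == (hidden_size,)

# The current Qwen3 example signatures declare the weight as [1, hidden_size],
# so users must reshape every norm weight when loading a checkpoint:
reshaped = hf_weight.reshape(1, hidden_size)
assert reshaped.shape == (1, hidden_size)

# With the proposed 1D declaration, the checkpoint tensor is usable directly.
```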
Affected files:
examples/models/qwen3/qwen3_14b_decode.py
examples/models/qwen3/qwen3_14b_prefill.py
examples/models/qwen3/qwen3_32b_decode.py
examples/models/qwen3/qwen3_32b_prefill.py
examples/models/qwen3/qwen3_32b_training_draft.py
Proposed API / Behavior
Change norm weight parameter declarations from:
gamma: pl.Tensor[[1, HIDDEN], pl.FP32]
to:
gamma: pl.Tensor[[HIDDEN], pl.FP32]
This applies to all rms_norm_weight, input_norm_weight, and post_norm_weight parameters in the Qwen3 model examples.
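A 1D weight is sufficient because normalization scales are applied per hidden dimension and broadcast across the leading (batch/sequence) axes. The sketch below demonstrates this with a standard RMSNorm formula in numpy; the function name, shapes, and epsilon are illustrative assumptions, not the `pl` implementation:

```python
import numpy as np

def rms_norm(x, gamma, eps=1e-6):
    # Normalize over the hidden axis, then scale by the per-channel gamma.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * gamma  # gamma of shape [H] broadcasts over [seq, H]

x = np.arange(8, dtype=np.float32).reshape(2, 4)   # activations, [seq=2, hidden=4]
gamma = np.ones(4, dtype=np.float32)               # 1D weight, as stored in HF checkpoints
y = rms_norm(x, gamma)
assert y.shape == (2, 4)
```

A 2D gamma of shape [1, H] broadcasts to the same result, which is why the change is purely a declaration fix rather than a behavioral one.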