
[Feature] Use 1D weight tensors for norm layers to match HuggingFace conventions #151

@lyfne123

Summary

Change the norm weight (gamma/beta) tensor parameters in the Qwen3 model examples from 2D shape [1, H] to 1D shape [H], matching the actual weight layout in HuggingFace model checkpoints.

Motivation / Use Case

HuggingFace model weights for RMSNorm and LayerNorm are stored as 1D tensors of shape [hidden_size]. Currently, Qwen3 examples define these weights as 2D tensors with shape [1, hidden_size]. This mismatch means users must reshape weights when loading from HuggingFace checkpoints, adding unnecessary friction for model deployment.
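
To illustrate the friction, here is a minimal sketch of loading a norm weight from a checkpoint with `safetensors`. The checkpoint key follows the standard HuggingFace Qwen3 layout; the single-file name `model.safetensors` and the `params` dict standing in for the example's parameter storage are illustrative assumptions:

```python
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")  # illustrative single-file checkpoint
params = {}  # hypothetical parameter store for the example model

# HuggingFace stores norm weights as 1D tensors of shape [hidden_size].
w = state_dict["model.layers.0.input_layernorm.weight"]
print(w.shape)  # torch.Size([hidden_size])

# Current examples declare the parameter as [1, hidden_size],
# so every norm weight must be reshaped before loading:
params["input_norm_weight"] = w.unsqueeze(0)  # [1, H] -- extra step today

# With the proposed 1D declaration, the tensor loads as-is:
params["input_norm_weight"] = w               # [H]    -- no reshape needed
```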

Affected files:

  • examples/models/qwen3/qwen3_14b_decode.py
  • examples/models/qwen3/qwen3_14b_prefill.py
  • examples/models/qwen3/qwen3_32b_decode.py
  • examples/models/qwen3/qwen3_32b_prefill.py
  • examples/models/qwen3/qwen3_32b_training_draft.py

Proposed API / Behavior

Change norm weight parameter declarations from:

gamma: pl.Tensor[[1, HIDDEN], pl.FP32]

to:

gamma: pl.Tensor[[HIDDEN], pl.FP32]

This applies to all rms_norm_weight, input_norm_weight, and post_norm_weight parameters in the Qwen3 model examples.
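
The change is layout-only; the numerics are unchanged, because a 1D weight of shape [H] broadcasts over the hidden dimension exactly as a [1, H] weight does. A minimal PyTorch reference sketch demonstrating the equivalence (not the pl DSL; the eps value is illustrative):

```python
import torch

def rms_norm(x: torch.Tensor, gamma: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """RMSNorm reference: x is [..., H], gamma broadcasts over leading dims."""
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    x_normed = x * torch.rsqrt(variance + eps)
    # A 1D gamma of shape [H] and a 2D gamma of shape [1, H]
    # broadcast to the same result here.
    return x_normed * gamma

x = torch.randn(2, 8, 4096)        # [batch, seq, hidden]
gamma_1d = torch.ones(4096)        # proposed: [H]
gamma_2d = gamma_1d.unsqueeze(0)   # current:  [1, H]
assert torch.equal(rms_norm(x, gamma_1d), rms_norm(x, gamma_2d))
```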
