@jlamypoirier jlamypoirier commented Sep 22, 2025

✨ Description

Handle concatenated weights, i.e., weights that are stored as a single tensor in memory for computation or optimization purposes but logically consist of multiple distinct ones. Examples: key and value projections, gated MLP layer 1, MoE layers, SSM in_proj, etc.

This makes the engine aware of the structure and allows configuring each sub-parameter independently (lr_scale, initialization, weight decay, peft enabling; splitting bias enabling is possible but not implemented).
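To illustrate the idea, here is a minimal sketch of one contiguous buffer exposing logically distinct sub-weights as views, each with its own configuration. All names (`ConcatenatedWeight`, `sub_specs`, the config keys) are hypothetical and do not reflect the actual implementation in this PR.

```python
import numpy as np


class ConcatenatedWeight:
    """Hypothetical sketch: one contiguous buffer, per-sub-weight config.

    Each sub-weight is a view into the shared buffer, so computation can
    use the concatenated tensor while configuration (lr_scale, weight
    decay, initialization) stays per-sub-parameter.
    """

    def __init__(self, sub_specs):
        # sub_specs: list of (name, size, config_dict)
        self.configs = {name: cfg for name, _, cfg in sub_specs}
        sizes = [size for _, size, _ in sub_specs]
        # Single contiguous buffer holding all sub-weights.
        self.data = np.empty(sum(sizes))
        offsets = np.cumsum([0] + sizes)
        # Views into the shared buffer, one per logical sub-weight.
        self.views = {
            name: self.data[offsets[i]:offsets[i + 1]]
            for i, (name, _, _) in enumerate(sub_specs)
        }
        # Initialize each sub-weight independently from its own config.
        rng = np.random.default_rng(0)
        for name, view in self.views.items():
            std = self.configs[name].get("init_std", 0.02)
            view[:] = rng.normal(0.0, std, view.size)


# Example: a concatenated key/value projection with independent settings.
kv = ConcatenatedWeight([
    ("key", 8, {"lr_scale": 1.0, "weight_decay": 0.0}),
    ("value", 8, {"lr_scale": 0.5, "weight_decay": 0.01}),
])
# Writing through a view mutates the shared buffer.
kv.views["key"][:] = 0.0
```

The key design point is that the optimizer and compute kernels can keep operating on `kv.data` as one tensor, while the engine consults `kv.configs` per sub-parameter.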

The prototype is working, but finalization is postponed in favor of higher-priority work. Remaining:

  • Simplify linear layer concatenation and make it safer (especially for peft)
  • Use it for the remaining linear layers
  • Make the remaining tests pass
