
MerlinRaptor (Collaborator)

Aside from the mesh_shape assignment at line 96 of src/lm_saes/runners/train.py (train_sae), there should be no changes outside of molt, so this is expected to be a safe merge. Please let me know if you notice any unintended changes outside of molt.

Things to do:
1. a better rank-distribution config strategy for pivoting model size easily;
2. distributed training logic;
3. aligning the output of prepare_input;
4. unit tests.

Implement low-rank decomposed matrix multiplication; implement a tiny kernel fusion (a minimal sketch follows below).
There is a bug that makes distributed and non-distributed training misaligned; it will be fixed later.
…s done right
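
For reference, a minimal sketch of what low-rank decomposed matrix multiplication with a tiny fused step can look like in PyTorch; `LowRankLinear`, `d_in`, `d_out`, and `rank` are illustrative names, not the actual molt implementation.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """y = x @ (A @ B): a rank-r factorization of a (d_in, d_out) weight."""

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_in, rank) / d_in**0.5)
        self.B = nn.Parameter(torch.randn(rank, d_out) / rank**0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Associating left-to-right, (x @ A) @ B costs O(n*d_in*r + n*r*d_out)
        # instead of O(n*d_in*d_out) for the equivalent dense matmul.
        return (x @ self.A) @ self.B

# One simple form of kernel fusion: let torch.compile fold the elementwise
# activation into the matmul epilogue rather than launching a separate kernel.
@torch.compile
def lowrank_relu(x: torch.Tensor, A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    return torch.relu((x @ A) @ B)
```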

Calling from_local(reconstruction) directly will not all-reduce reconstructions across devices; reconstruction now also supports data parallelism and is aligned with the SAE.
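
To make the from_local pitfall concrete, here is a minimal sketch using PyTorch DTensor; `local_partial` is a random stand-in for a per-rank partial reconstruction, and the mesh setup assumes a process group already initialized via torchrun.

```python
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Partial, Replicate

mesh = init_device_mesh("cuda", (dist.get_world_size(),), mesh_dim_names=("model",))

# Stand-in for this rank's partial sum of the reconstruction (in practice,
# feature activations times a column shard of the decoder).
local_partial = torch.randn(8, 512, device="cuda")

# Pitfall: from_local defaults to Replicate placements, performs no
# communication, and silently leaves every rank with a different tensor.
wrong = DTensor.from_local(local_partial, mesh)

# Fix: declare the local tensor as a Partial sum; redistributing to Replicate
# then triggers the all-reduce across the model-parallel group.
right = DTensor.from_local(local_partial, mesh, placements=[Partial()])
right = right.redistribute(mesh, placements=[Replicate()])
```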

There may be minor bugs in decoder_norm and initialization (one plausible pitfall is sketched below); they will be fixed later if necessary.
…t inference and dist training
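
One plausible source of such a mismatch, sketched with plain collectives; `sharded_decoder_norm` and `W_dec_local` are hypothetical names, assuming the decoder weight is sharded along d_model.

```python
import torch
import torch.distributed as dist

def sharded_decoder_norm(W_dec_local: torch.Tensor) -> torch.Tensor:
    """Per-feature decoder norms when each rank holds a [d_model_shard,
    n_features] slice of W_dec: local norms alone are wrong; the squared
    sums must be combined across the shard group before the square root."""
    local_sq = W_dec_local.pow(2).sum(dim=0)          # this rank's squared sums
    dist.all_reduce(local_sq, op=dist.ReduceOp.SUM)   # full squared norms
    return local_sq.sqrt()
```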

Distinguish model_parallel_size_training from model_parallel_size_running.
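
A sketch of how keeping the two knobs distinct might look in a config; the dataclass and the `train_mesh_shape` helper are illustrative, not the actual lm_saes config.

```python
from dataclasses import dataclass

@dataclass
class ParallelismConfig:
    # Model-parallel degree for the SAE while it is being trained.
    model_parallel_size_training: int = 1
    # Model-parallel degree for the base model when running it
    # to generate activations (inference).
    model_parallel_size_running: int = 1

    def train_mesh_shape(self, world_size: int) -> tuple[int, int]:
        # Whatever is left after model parallelism becomes data parallelism.
        assert world_size % self.model_parallel_size_training == 0
        return (world_size // self.model_parallel_size_training,
                self.model_parallel_size_training)
```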
@MerlinRaptor requested review from Frankstein73, Hzfinfdu, and dest1n1s and removed the request for Hzfinfdu and dest1n1s · August 11, 2025 12:37
