+ learned attention, - fused loss, by RolandBERTINJOHANNET · Pull Request #194 · ruflab/shimmer

RolandBERTINJOHANNET · 2025-12-08T14:33:28Z

Adding the changes from the attention paper where they are relevant.

The general idea is that I:

removed the fused loss (it now counts as a demi-cycle, dcy);
kept logging all the losses individually (by encoded and decoded modalities);
separated the cycle loss from the broadcast loss function (the broadcast loss returns the elements required for cycling, so nothing has to be re-encoded);
added the paper's attention version, plus a helper that lets the user switch to learned attention after training a global workspace (gw) with RandomSelection;
added tests for stability.

We decided against including the branching between different selection mechanisms inside the broadcast loss, since it was only useful for the case where we were doing augmentation on MM-IMDB.
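For readers who have not looked at the selection code, here is a minimal, dependency-free sketch of the difference between a random selection and a content-based (learned) one. It is an illustration only and not shimmer's implementation: the function names, the list-based latents, the fixed `query` vector, and the identity key projections are all assumptions made for the example, and the random baseline is assumed to be a softmax over uniform scores.

```python
import math
import random


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Numerically stable softmax over per-domain scores."""
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def random_selection(domains: list[str], rng: random.Random) -> dict[str, float]:
    """Content-independent baseline (assumed): softmax over uniform random scores."""
    return softmax({name: rng.random() for name in domains})


def learned_selection(
    latents: dict[str, list[float]],
    key_proj: dict[str, list[list[float]]],
    query: list[float],
) -> dict[str, float]:
    """Content-based weights: score each domain by dot(key_proj @ latent, query)."""
    scores = {}
    for name, latent in latents.items():
        # Hypothetical per-domain key projection, stored as a list of rows.
        key = [sum(w * x for w, x in zip(row, latent)) for row in key_proj[name]]
        scores[name] = sum(k * q for k, q in zip(key, query))
    return softmax(scores)


def fuse(latents: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Weighted sum of the per-domain latents (the fused workspace state)."""
    dim = len(next(iter(latents.values())))
    return [sum(weights[n] * latents[n][i] for n in latents) for i in range(dim)]


# Toy inputs: two domains with 2-d latents and identity key projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
latents = {"v": [1.0, 0.0], "t": [0.0, 1.0]}
key_proj = {"v": identity, "t": identity}
query = [2.0, 0.0]

random_weights = random_selection(list(latents), random.Random(0))
learned_weights = learned_selection(latents, key_proj, query)
fused = fuse(latents, learned_weights)
```

With these toy inputs the query aligns with the "v" latent, so the learned weights favour "v", whereas the random weights ignore the content entirely. In a trained model the key projections and the query would be learned parameters rather than fixed values.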

RolandBERTINJOHANNET added 30 commits December 8, 2025 12:56

Add content-based selection and set as fusion default

fd4145a

Add helper to attach learned attention, keep fusion default random

2039528

Move learned attention helper to fusion class

6930d1d

Rebalance broadcast loss coefs and aggregates

a8004b1

Fix indentation in selection helper causing test import failure

e646158

Run ruff cleanups in selection

4f09afc

Format global_workspace with ruff

b31049a

Align selection signatures and move learned attention helper

95d054f

Apply ruff formatting to global workspace

7bca399

Update broadcasts docstrings

95b3b62

Remove fused loss handler

cc9dbc3

Rename learned attention module

1b2fd83

Add LearnedAttention coverage

c16fd1d

Fix LearnedAttention test types for mypy

4db5e55

Split cycle loss from broadcast path

bc53230

Format code with ruff

931df04

Move cycle reconstruction into cycle loss

9f15e82

Ensure LearnedAttention only builds selected key projection

3dca8a4

Annotate LearnedAttention key layers as optional for mypy

c192aac

Fix mypy issues in LearnedAttention and ckpt migration CLI

bf3f75c

Revert ckpt migration typing tweaks

664dc4f

Fix mypy for ckpt migration CLI by using string paths

6b66406

Add domain-latent key option to LearnedAttention

101f7f1

Format LearnedAttention per ruff

d1badae

Pass domain key options through init_learned_attention

02167d1

Drop duplicate missing-domain-dims check in attention tests

8388448

Warn before initializing LearnedAttention

553ab6f

Refactor broadcast naming and stop logging metrics

887cbcf

Warn when loss coef missing

17b467a

Add warning test for missing loss coef

c6506ef

RolandBERTINJOHANNET added 3 commits December 17, 2025 12:09

Clarify broadcast docstring

4dc4e7c

Run ruff format

8601c92

Restore broadcast metrics logging (minus aggregate)

3213179

rufinv approved these changes Jan 8, 2026

rufinv merged commit 5cce857 into main Jan 8, 2026
3 checks passed

RolandBERTINJOHANNET deleted the modules-pr-only branch January 26, 2026 10:24

rufinv merged 33 commits into main from modules-pr-only
