Conversation
Cursor Bugbot has reviewed your changes and found 1 potential issue.
> Activation checkpointing discards intermediate activations during the forward pass and recomputes them during the backward pass, trading compute for memory.
>
> To enable it, use:
>
> If `trainer.model.ac` is unset, supported custom implementations default to selective AC on the cheapest target:
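For illustration, the defaulting rule the quoted docs describe could look like the following minimal pure-Python sketch. The names `ACConfig` and `resolve_ac` are assumptions for this example, not the project's actual API:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ACConfig:
    # Hypothetical config object; field names assumed from the docs excerpt.
    mode: str = "selective"                        # "full" | "selective"
    targets: List[str] = field(default_factory=lambda: ["norm"])

def resolve_ac(ac: Optional[ACConfig], is_custom_model: bool) -> Optional[ACConfig]:
    # An explicit user setting always wins (full-layer checkpointing when set).
    if ac is not None:
        return ac
    # Unset + custom implementation -> implicit selective AC on the "norm" target.
    if is_custom_model:
        return ACConfig(mode="selective", targets=["norm"])
    # Unset + HF model -> AC stays disabled.
    return None
```

This implicit branch is exactly what the reviewers below object to: an unset config silently resolving to selective AC instead of staying off.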
Huh? Where is it defined? I don't like this, tbh.
Would agree here. An unset config should not default to doing it anyway.
We can of course default to selective AC if that is reasonable, but it should be explicit in the configs IMO (e.g. an agent should see `ac.mode = selective` instead of `ac.mode = None`).
samsja left a comment
I don't like the magic logic that auto-selects AC even when the CLI doesn't enable it; let's just enable AC by default instead?

Adds `moe_act` selective AC target and adds some docs on selective AC tuning
Note
Medium Risk
Changes the trainer’s default activation-checkpointing behavior for custom models (now implicitly enables selective AC), which can affect memory/throughput characteristics and recomputation. Adds a new selective AC hook in MoE expert code paths, so correctness/perf should be validated on representative MoE workloads.
Overview
Selective activation checkpointing is expanded and made easier to use for custom models. When `trainer.model.ac` is unset, the trainer now implicitly enables selective activation checkpointing (`mode="selective"`, `targets=["norm"]`) for the custom implementation, while HF models still default to AC disabled; explicitly setting `trainer.model.ac` (or `--model.ac`) continues to mean full-layer checkpointing.

Adds a new selective target `moe_act` for MoE layers. The selective AC system can now checkpoint only the routed expert activation function, and it is automatically skipped when `routed_experts` is also enabled, to avoid nested/double checkpointing; MoE expert implementations were refactored to expose `moe_act` as a hookable method.

Docs and changelog were updated to describe the new defaulting behavior and provide selective-AC tuning guidance, and unit tests were added/updated to cover the new defaults and `moe_act` patching/subsumption behavior.

Written by Cursor Bugbot for commit 2b0abaa. This will update automatically on new commits.
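The subsumption rule in the overview (skipping `moe_act` when `routed_experts` is also checkpointed) can be sketched as a small pure-Python filter. The function name `effective_ac_targets` is an assumption for illustration, not the project's actual helper:

```python
from typing import List

def effective_ac_targets(targets: List[str]) -> List[str]:
    """Drop 'moe_act' when 'routed_experts' is also selected: checkpointing the
    whole routed-experts block already covers the expert activation function,
    so keeping both would nest/double the recomputation."""
    if "routed_experts" in targets and "moe_act" in targets:
        return [t for t in targets if t != "moe_act"]
    return list(targets)
```

With both targets requested, only `routed_experts` survives; `moe_act` on its own is kept unchanged.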