The code in train/SeleKT/selekt.py computes top-k separately within each parameter tensor, not globally across all parameters.
However, the paper (https://arxiv.org/pdf/2503.03656), Section 4 "Robust Model Adaptation / Proposed Solution", says:
"we first compute dense gradients by doing full finetuning of the model θ, and then compute the top-k non-zero entries (by magnitude) on the (accumulated) gradient vector or the "task vector" θ − θbase. This also ensures that the parameter selection is global and not confined to specific layers or other heuristics employed in earlier robust finetuning strategies (Lee et al., 2023)."