The code in train/SeleKT/selekt.py computes top-k separately within each parameter tensor, not globally across all parameters.
However, the paper (https://arxiv.org/pdf/2503.03656), Section 4 "Robust Model Adaptation / Proposed Solution", says:
"we first compute dense gradients by doing full finetuning of the model θ, and then compute the top-k non-zero entries (by magnitude) on the (accumulated) gradient vector or the "task vector" θ − θbase. This also ensures that the parameter selection is global and not confined to specific layers or other heuristics employed in earlier robust finetuning strategies (Lee et al., 2023)."