README.md (3 additions, 2 deletions)
@@ -10,8 +10,8 @@
## The reasons why you use `pytorch-optimizer`.
-* Wide range of supported optimizers. Currently, **89 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
-* Including many variants such as `Cautious`, `AdamD`, `Gradient Centrailiaztion`
+* Wide range of supported optimizers. Currently, **90 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Including many variants such as `ADOPT`, `Cautious`, `AdamD`, `StableAdamW`, and `Gradient Centralization`
* Easy to use, clean, and tested code
* Active maintenance
* Somewhat more optimized compared to the original implementations
| Grams |*Gradient Descent with Adaptive Momentum Scaling*||<https://arxiv.org/abs/2412.17107>|[cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241217107C/exportcitation)|
| OrthoGrad |*Grokking at the Edge of Numerical Stability*|[github](https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability)|<https://arxiv.org/abs/2501.04697>|[cite](https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability?tab=readme-ov-file#citation)|
| Adam-ATAN2 |*Scaling Exponents Across Parameterizations and Optimizers*||<https://arxiv.org/abs/2407.05872>|[cite](https://ui.adsabs.harvard.edu/abs/2024arXiv240705872E/exportcitation)|
+| SPAM |*Spike-Aware Adam with Momentum Reset for Stable LLM Training*|[github](https://github.com/TianjinYellow/SPAM-Optimizer)|<https://arxiv.org/abs/2501.06842>|[cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250106842H/exportcitation)|
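
For context on how the optimizers listed above are used, here is a minimal sketch of pulling one of them out of `pytorch-optimizer` and running a single training step. The lookup-by-name pattern via `load_optimizer` and the `'stableadamw'` key are assumptions about the library's API and may differ from the actual release.

```python
# Minimal sketch (not part of this diff): one training step with an optimizer
# from pytorch-optimizer. `load_optimizer` and the 'stableadamw' key are assumed
# names and may differ in the installed version.
import torch
from pytorch_optimizer import load_optimizer

model = torch.nn.Linear(10, 2)

# Look up an optimizer class by name, then construct it like any torch optimizer.
opt_cls = load_optimizer(optimizer='stableadamw')
optimizer = opt_cls(model.parameters(), lr=1e-3)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```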
docs/index.md (3 additions, 2 deletions)
@@ -10,8 +10,8 @@
## The reasons why you use `pytorch-optimizer`.
-* Wide range of supported optimizers. Currently, **89 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
-* Including many variants such as `Cautious`, `AdamD`, `Gradient Centrailiaztion`
+* Wide range of supported optimizers. Currently, **90 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Including many variants such as `ADOPT`, `Cautious`, `AdamD`, `StableAdamW`, and `Gradient Centralization`
* Easy to use, clean, and tested code
* Active maintenance
* Somewhat more optimized compared to the original implementations
| Grams |*Gradient Descent with Adaptive Momentum Scaling*||<https://arxiv.org/abs/2412.17107>|[cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241217107C/exportcitation)|
| OrthoGrad |*Grokking at the Edge of Numerical Stability*|[github](https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability)|<https://arxiv.org/abs/2501.04697>|[cite](https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability?tab=readme-ov-file#citation)|
| Adam-ATAN2 |*Scaling Exponents Across Parameterizations and Optimizers*||<https://arxiv.org/abs/2407.05872>|[cite](https://ui.adsabs.harvard.edu/abs/2024arXiv240705872E/exportcitation)|
+| SPAM |*Spike-Aware Adam with Momentum Reset for Stable LLM Training*|[github](https://github.com/TianjinYellow/SPAM-Optimizer)|<https://arxiv.org/abs/2501.06842>|[cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250106842H/exportcitation)|