README.md (+2 -1)

@@ -10,7 +10,7 @@
 ## The reasons why you use `pytorch-optimizer`.

-* Wide range of supported optimizers. Currently, **85 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Wide range of supported optimizers. Currently, **86 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
 * Including many variants such as `Cautious`, `AdamD`, `Gradient Centrailiaztion`

@@ ... @@
 | LaProp |*Separating Momentum and Adaptivity in Adam*|[github](https://github.com/Z-T-WANG/LaProp-Optimizer)|<https://arxiv.org/abs/2002.04839>|[cite](https://github.com/Z-T-WANG/LaProp-Optimizer?tab=readme-ov-file#citation)|
 | APOLLO |*SGD-like Memory, AdamW-level Performance*|[github](https://github.com/zhuhanqing/APOLLO)|<https://arxiv.org/abs/2412.05270>|[cite](https://github.com/zhuhanqing/APOLLO?tab=readme-ov-file#-citation)|
 | MARS |*Unleashing the Power of Variance Reduction for Training Large Models*|[github](https://github.com/AGI-Arena/MARS)|<https://arxiv.org/abs/2411.10438>|[cite](https://github.com/AGI-Arena/MARS/tree/main?tab=readme-ov-file#citation)|
+| SGDSaI |*No More Adam: Learning Rate Scaling at Initialization is All You Need*|[github](https://github.com/AnonymousAlethiometer/SGD_SaI)|<https://arxiv.org/abs/2411.10438>|[cite](https://github.com/AnonymousAlethiometer/SGD_SaI?tab=readme-ov-file#citation)|
docs/index.md (+2 -1)

@@ -10,7 +10,7 @@
 ## The reasons why you use `pytorch-optimizer`.

-* Wide range of supported optimizers. Currently, **85 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Wide range of supported optimizers. Currently, **86 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
 * Including many variants such as `Cautious`, `AdamD`, `Gradient Centrailiaztion`

@@ ... @@
 | LaProp |*Separating Momentum and Adaptivity in Adam*|[github](https://github.com/Z-T-WANG/LaProp-Optimizer)|<https://arxiv.org/abs/2002.04839>|[cite](https://github.com/Z-T-WANG/LaProp-Optimizer?tab=readme-ov-file#citation)|
 | APOLLO |*SGD-like Memory, AdamW-level Performance*|[github](https://github.com/zhuhanqing/APOLLO)|<https://arxiv.org/abs/2412.05270>|[cite](https://github.com/zhuhanqing/APOLLO?tab=readme-ov-file#-citation)|
 | MARS |*Unleashing the Power of Variance Reduction for Training Large Models*|[github](https://github.com/AGI-Arena/MARS)|<https://arxiv.org/abs/2411.10438>|[cite](https://github.com/AGI-Arena/MARS/tree/main?tab=readme-ov-file#citation)|
+| SGDSaI |*No More Adam: Learning Rate Scaling at Initialization is All You Need*|[github](https://github.com/AnonymousAlethiometer/SGD_SaI)|<https://arxiv.org/abs/2411.10438>|[cite](https://github.com/AnonymousAlethiometer/SGD_SaI?tab=readme-ov-file#citation)|
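For context on the new table entry, here is a minimal usage sketch of SGDSaI with `pytorch-optimizer`. The export name `SGDSaI` and the standard `(params, lr=...)` constructor are assumptions inferred from the table row; this diff only adds the documentation entry.

```python
# Minimal sketch (not part of this PR): one training step with the newly listed SGDSaI.
# Assumption: the optimizer is exported as `SGDSaI` from `pytorch_optimizer` and
# follows the usual `(params, lr=...)` constructor convention.
import torch
import torch.nn.functional as F

from pytorch_optimizer import SGDSaI  # assumed export name

model = torch.nn.Linear(10, 1)
optimizer = SGDSaI(model.parameters(), lr=1e-2)

# One optimization step on random data.
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = F.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Per the paper title, the method scales learning rates at initialization rather than adapting them Adam-style during training, so it is used here exactly like a plain `torch.optim` optimizer.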