Description
Prerequisite
- I have searched Issues and Discussions but cannot get the expected help.
- The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmengine).
Environment
mmcv==2.1.0
mmdet==3.3.0
mmdet3d==1.4.0
mmengine==0.10.5
Reproduces the problem - code sample
During training, I want the learning rate of the img_backbone to stay at 0.1 times the base learning rate, so I set the following in the config file:
```python
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01),
    clip_grad=dict(max_norm=35, norm_type=2),
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }))
```
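As I understand it, `paramwise_cfg` with `lr_mult=0.1` scales the img_backbone group's initial learning rate once, when the optimizer is built. A minimal plain-PyTorch sketch of the equivalent setup (the model, module names, and lr value are illustrative stand-ins, not my real model):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the real model; only the submodule names matter here.
model = nn.ModuleDict({
    'img_backbone': nn.Linear(4, 4),
    'head': nn.Linear(4, 4),
})

lr = 1e-4  # illustrative base learning rate
optimizer = torch.optim.AdamW(
    [
        {'params': model['head'].parameters(), 'lr': lr},
        # lr_mult=0.1 is applied once, at build time:
        {'params': model['img_backbone'].parameters(), 'lr': lr * 0.1},
    ],
    weight_decay=0.01)

print([g['lr'] for g in optimizer.param_groups])  # [0.0001, 1e-05]
```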
And set the param_scheduler:

```python
param_scheduler = [
    # learning rate scheduler
    # During the first 8 epochs, the learning rate increases from lr to lr * 100;
    # during the next 12 epochs, it decreases from lr * 100 back to lr.
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min=lr * 100,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min=lr,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
    # momentum scheduler
    # During the first 8 epochs, momentum anneals from its initial value to 0.85 / 0.95;
    # during the next 12 epochs, it anneals from 0.85 / 0.95 to 1.
    dict(
        type='CosineAnnealingMomentum',
        T_max=8,
        eta_min=0.85 / 0.95,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingMomentum',
        T_max=12,
        eta_min=1,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True)
]
```
Reproduces the problem - command or script
Same as the configuration above.
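For a self-contained check, here is a minimal sketch using plain `torch.optim.lr_scheduler.CosineAnnealingLR` (not mmengine) that reproduces the drift; the model, module names, and lr value are illustrative stand-ins:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR

# Two param groups whose lrs start at a 10:1 ratio, annealed toward a shared
# *absolute* eta_min, mirroring the first phase of my config.
model = nn.ModuleDict({'img_backbone': nn.Linear(4, 4), 'head': nn.Linear(4, 4)})
lr = 1e-4  # illustrative
optimizer = torch.optim.AdamW([
    {'params': model['head'].parameters(), 'lr': lr},
    {'params': model['img_backbone'].parameters(), 'lr': lr * 0.1},
])
scheduler = CosineAnnealingLR(optimizer, T_max=8, eta_min=lr * 100)

for epoch in range(9):
    base, backbone = (g['lr'] for g in optimizer.param_groups)
    print(f'epoch {epoch}: base={base:.3e} backbone={backbone:.3e} '
          f'ratio={backbone / base:.3f}')
    optimizer.step()
    scheduler.step()
# The printed ratio climbs from 0.100 toward 1.000: every group converges to
# the same absolute eta_min, so the initial lr_mult scaling is progressively
# erased.
```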
Reproduces the problem - error message
At the beginning, the learning rate of img_backbone is indeed 0.1 times the base learning rate:
```
2024/11/21 21:40:30 - mmengine - INFO - Epoch(train) [1][ 100/3517] base_lr: 5.0005e-05 lr: 5.0060e-06 eta: 20:29:23 time: 0.9889 data_time: 0.0563 memory: 32041 grad_norm: 52825.8107 loss: 5568.1427 task0.loss_heatmap: 49.9860 task0.loss_bbox: 0.9099 task1.loss_heatmap: 596.6443 task1.loss_bbox: 1.1417 task2.loss_heatmap: 2504.7168 task2.loss_bbox: 1.5418 task3.loss_heatmap: 620.9393 task3.loss_bbox: 0.8771 task4.loss_heatmap: 1612.4171 task4.loss_bbox: 0.9113 task5.loss_heatmap: 177.1250 task5.loss_bbox: 0.9324
```
However, the learning rate of img_backbone slowly catches up to the base learning rate as training progresses:
```
2024/11/21 23:45:30 - mmengine - INFO - Epoch(train) [3][ 100/3517] base_lr: 7.2556e-05 lr: 3.4323e-05 eta: 18:42:58 time: 1.0738 data_time: 0.0613 memory: 32031 grad_norm: 64.6705 loss: 14.8487 task0.loss_heatmap: 1.4181 task0.loss_bbox: 0.6505 task1.loss_heatmap: 2.0847 task1.loss_bbox: 0.7157 task2.loss_heatmap: 2.0074 task2.loss_bbox: 0.7194 task3.loss_heatmap: 1.4966 task3.loss_bbox: 0.5754 task4.loss_heatmap: 1.9814 task4.loss_bbox: 0.6894 task5.loss_heatmap: 1.8084 task5.loss_bbox: 0.7016
...
...
2024/11/22 01:50:03 - mmengine - INFO - Epoch(train) [5][ 100/3517] base_lr: 1.2583e-04 lr: 1.0358e-04 eta: 16:36:18 time: 1.0527 data_time: 0.0568 memory: 32069 grad_norm: 52.7803 loss: 15.1927 task0.loss_heatmap: 1.4274 task0.loss_bbox: 0.6234 task1.loss_heatmap: 2.1254 task1.loss_bbox: 0.6715 task2.loss_heatmap: 2.0836 task2.loss_bbox: 0.7248 task3.loss_heatmap: 1.9199 task3.loss_bbox: 0.6361 task4.loss_heatmap: 1.8900 task4.loss_bbox: 0.6479 task5.loss_heatmap: 1.7534 task5.loss_bbox: 0.6891
```
It looks like lr_mult only takes effect once, when the initial learning rates are set. How can I make the 0.1 ratio hold throughout the training process?
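If I read the CosineAnnealingLR implementation correctly (this is my assumption), each param group is annealed toward the same absolute eta_min, which would explain the drift; the lr value below is illustrative:

```python
import math

# Closed-form cosine annealing, per param group (my understanding of the formula):
#   lr_t = eta_min + (base_lr - eta_min) * (1 + cos(pi * t / T_max)) / 2
def cosine_lr(base_lr, eta_min, t, t_max):
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2

lr = 5e-5  # roughly the base lr in my logs; illustrative
for t in range(9):
    base = cosine_lr(lr, lr * 100, t, 8)
    backbone = cosine_lr(lr * 0.1, lr * 100, t, 8)
    print(f't={t}: ratio={backbone / base:.3f}')
# The ratio starts at 0.1 and approaches 1.0, because both groups head toward
# the same absolute eta_min = lr * 100; lr_mult only sets the starting points.
```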
Additional information
I expect that with lr_mult set, the learning rate of the image backbone should remain at 0.1 times the base learning rate for the entire training process.
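One possible fix I am considering, assuming the installed mmengine version supports the `eta_min_ratio` argument of CosineAnnealingLR (a per-group ratio relative to each group's initial lr, instead of the absolute `eta_min`), is the following sketch; it should preserve the 0.1 factor since each group then anneals toward a target proportional to its own initial lr:

```python
# Sketch of a possible workaround: replace the absolute eta_min with
# eta_min_ratio, assuming this argument is available in mmengine 0.10.5.
# Each group anneals toward (its own initial lr) * ratio, so the 10:1
# relationship between the groups is kept for the whole schedule.
param_scheduler = [
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min_ratio=100,  # toward 100x each group's own initial lr
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min_ratio=1,  # back toward each group's own initial lr
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
]
```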