
[Bug] Strange behavior of the lr_mult #1612

@AlphaPlusTT

Description



Environment

mmcv==2.1.0
mmdet==3.3.0
mmdet3d==1.4.0
mmengine==0.10.5

Reproduces the problem - code sample

During training, I want the learning rate of the img_backbone to stay at 0.1 times the base learning rate. Therefore, I set the following in the config file:

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01),
    clip_grad=dict(max_norm=35, norm_type=2),
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }
    )
)
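For reference, here is a minimal standalone sketch (using a hypothetical ToyModel with an img_backbone submodule, not the actual detector) confirming that lr_mult scales the initial learning rate of the matching parameter groups:

import torch.nn as nn
from mmengine.optim import build_optim_wrapper

class ToyModel(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.img_backbone = nn.Linear(4, 4)
        self.pts_head = nn.Linear(4, 4)

wrapper = build_optim_wrapper(
    ToyModel(),
    dict(
        type='OptimWrapper',
        optimizer=dict(type='AdamW', lr=1e-4, weight_decay=0.01),
        paramwise_cfg=dict(custom_keys={'img_backbone': dict(lr_mult=0.1)})))
for group in wrapper.optimizer.param_groups:
    # img_backbone params start at 1e-5, everything else at 1e-4
    print(group['lr'])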

I also set the param_scheduler:

param_scheduler = [
    # learning rate scheduler
    # During the first 8 epochs, learning rate increases from lr to lr * 100
    # during the next 12 epochs, learning rate decreases from lr * 100 to lr
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min=lr * 100,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min=lr,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
    # momentum scheduler
    # During the first 8 epochs, momentum increases from 0 to 0.85 / 0.95
    # during the next 12 epochs, momentum increases from 0.85 / 0.95 to 1
    dict(
        type='CosineAnnealingMomentum',
        T_max=8,
        eta_min=0.85 / 0.95,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingMomentum',
        T_max=12,
        eta_min=1,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True)
]
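A closed-form sketch of the standard cosine annealing formula (my assumption about what CosineAnnealingLR computes per param group) may explain the behavior reported below: eta_min is an absolute value shared by all param groups, so a group that starts at 0.1 * lr anneals toward the same target as the others and the 0.1 ratio erodes:

import math

def cosine_lr(initial_lr, eta_min, t, t_max):
    # standard closed-form cosine annealing schedule
    return eta_min + (initial_lr - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2

lr = 1e-4
for t in (0, 4, 8):
    base = cosine_lr(lr, eta_min=lr * 100, t=t, t_max=8)
    backbone = cosine_lr(0.1 * lr, eta_min=lr * 100, t=t, t_max=8)
    print(f't={t}: base={base:.3e}  img_backbone={backbone:.3e}  '
          f'ratio={backbone / base:.3f}')

# t=0: ratio=0.100, t=4: ratio~0.991, t=8: ratio=1.000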

Reproduces the problem - command or script

Consistent with the above.

Reproduces the problem - error message

At the beginning, the learning rate of img_backbone is indeed 0.1 times the base learning rate:

2024/11/21 21:40:30 - mmengine - INFO - Epoch(train)  [1][ 100/3517]  base_lr: 5.0005e-05 lr: 5.0060e-06  eta: 20:29:23  time: 0.9889  data_time: 0.0563  memory: 32041  grad_norm: 52825.8107  loss: 5568.1427  task0.loss_heatmap: 49.9860  task0.loss_bbox: 0.9099  task1.loss_heatmap: 596.6443  task1.loss_bbox: 1.1417  task2.loss_heatmap: 2504.7168  task2.loss_bbox: 1.5418  task3.loss_heatmap: 620.9393  task3.loss_bbox: 0.8771  task4.loss_heatmap: 1612.4171  task4.loss_bbox: 0.9113  task5.loss_heatmap: 177.1250  task5.loss_bbox: 0.9324

However, the img_backbone learning rate slowly caught up to the base learning rate as training progressed:

2024/11/21 23:45:30 - mmengine - INFO - Epoch(train)  [3][ 100/3517]  base_lr: 7.2556e-05 lr: 3.4323e-05  eta: 18:42:58  time: 1.0738  data_time: 0.0613  memory: 32031  grad_norm: 64.6705  loss: 14.8487  task0.loss_heatmap: 1.4181  task0.loss_bbox: 0.6505  task1.loss_heatmap: 2.0847  task1.loss_bbox: 0.7157  task2.loss_heatmap: 2.0074  task2.loss_bbox: 0.7194  task3.loss_heatmap: 1.4966  task3.loss_bbox: 0.5754  task4.loss_heatmap: 1.9814  task4.loss_bbox: 0.6894  task5.loss_heatmap: 1.8084  task5.loss_bbox: 0.7016
...
...
2024/11/22 01:50:03 - mmengine - INFO - Epoch(train)  [5][ 100/3517]  base_lr: 1.2583e-04 lr: 1.0358e-04  eta: 16:36:18  time: 1.0527  data_time: 0.0568  memory: 32069  grad_norm: 52.7803  loss: 15.1927  task0.loss_heatmap: 1.4274  task0.loss_bbox: 0.6234  task1.loss_heatmap: 2.1254  task1.loss_bbox: 0.6715  task2.loss_heatmap: 2.0836  task2.loss_bbox: 0.7248  task3.loss_heatmap: 1.9199  task3.loss_bbox: 0.6361  task4.loss_heatmap: 1.8900  task4.loss_bbox: 0.6479  task5.loss_heatmap: 1.7534  task5.loss_bbox: 0.6891

It looks like lr_mult only scales the initial learning rate. How can I keep the lr_mult ratio in effect throughout training?

Additional information

I expected that with lr_mult set, the learning rate of the image backbone would stay at 0.1 times the base learning rate for the entire training process.
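If my reading of the cosine formula above is right, one possible workaround (assuming the installed mmengine version supports the eta_min_ratio argument of CosineAnnealingLR, documented as a per-group ratio alternative to the absolute eta_min) would be to express the annealing targets as ratios, so each param group anneals relative to its own initial lr and the 0.1x scale is preserved:

param_scheduler = [
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min_ratio=100,  # each group anneals toward 100x its OWN initial lr
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min_ratio=1,  # back down to each group's own initial lr
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
    # momentum schedulers unchanged
]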
