Description
Prerequisite
- I have searched Issues and Discussions but cannot get the expected help.
- The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmengine).
Environment
mmcv==2.1.0
mmdet==3.3.0
mmdet3d==1.4.0
mmengine==0.10.5
Reproduces the problem - code sample
During training, I want the learning rate of the img_backbone to stay at 0.1 times the base learning rate, so I set the following in the config file:
```python
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01),
    clip_grad=dict(max_norm=35, norm_type=2),
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }))
```
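As I understand it, `paramwise_cfg` with `lr_mult=0.1` scales the img_backbone group's initial learning rate once, when the optimizer is built. A minimal plain-PyTorch sketch of the equivalent setup (the model, module names, and lr value are illustrative stand-ins, not my real model):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the real model; only the submodule names matter here.
model = nn.ModuleDict({
    'img_backbone': nn.Linear(4, 4),
    'head': nn.Linear(4, 4),
})

lr = 1e-4  # illustrative base learning rate
optimizer = torch.optim.AdamW(
    [
        {'params': model['head'].parameters(), 'lr': lr},
        # lr_mult=0.1 is applied once, at build time:
        {'params': model['img_backbone'].parameters(), 'lr': lr * 0.1},
    ],
    weight_decay=0.01)

print([g['lr'] for g in optimizer.param_groups])  # [0.0001, 1e-05]
```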
And set the param_scheduler:

```python
param_scheduler = [
    # learning rate scheduler
    # During the first 8 epochs, the learning rate increases from lr to lr * 100;
    # during the next 12 epochs, it decreases from lr * 100 back to lr.
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min=lr * 100,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min=lr,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
    # momentum scheduler
    # During the first 8 epochs, momentum anneals from its initial value to 0.85 / 0.95;
    # during the next 12 epochs, it anneals from 0.85 / 0.95 to 1.
    dict(
        type='CosineAnnealingMomentum',
        T_max=8,
        eta_min=0.85 / 0.95,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingMomentum',
        T_max=12,
        eta_min=1,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True)
]
```
Reproduces the problem - command or script
Same as the configuration above.
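For a self-contained check, here is a minimal sketch using plain `torch.optim.lr_scheduler.CosineAnnealingLR` (not mmengine) that reproduces the drift; the model, module names, and lr value are illustrative stand-ins:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR

# Two param groups whose lrs start at a 10:1 ratio, annealed toward a shared
# *absolute* eta_min, mirroring the first phase of my config.
model = nn.ModuleDict({'img_backbone': nn.Linear(4, 4), 'head': nn.Linear(4, 4)})
lr = 1e-4  # illustrative
optimizer = torch.optim.AdamW([
    {'params': model['head'].parameters(), 'lr': lr},
    {'params': model['img_backbone'].parameters(), 'lr': lr * 0.1},
])
scheduler = CosineAnnealingLR(optimizer, T_max=8, eta_min=lr * 100)

for epoch in range(9):
    base, backbone = (g['lr'] for g in optimizer.param_groups)
    print(f'epoch {epoch}: base={base:.3e} backbone={backbone:.3e} '
          f'ratio={backbone / base:.3f}')
    optimizer.step()
    scheduler.step()
# The printed ratio climbs from 0.100 toward 1.000: every group converges to
# the same absolute eta_min, so the initial lr_mult scaling is progressively
# erased.
```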
Reproduces the problem - error message
At the beginning, the learning rate of img_backbone is indeed 0.1 times the base learning rate:
```
2024/11/21 21:40:30 - mmengine - INFO - Epoch(train) [1][ 100/3517] base_lr: 5.0005e-05 lr: 5.0060e-06 eta: 20:29:23 time: 0.9889 data_time: 0.0563 memory: 32041 grad_norm: 52825.8107 loss: 5568.1427 task0.loss_heatmap: 49.9860 task0.loss_bbox: 0.9099 task1.loss_heatmap: 596.6443 task1.loss_bbox: 1.1417 task2.loss_heatmap: 2504.7168 task2.loss_bbox: 1.5418 task3.loss_heatmap: 620.9393 task3.loss_bbox: 0.8771 task4.loss_heatmap: 1612.4171 task4.loss_bbox: 0.9113 task5.loss_heatmap: 177.1250 task5.loss_bbox: 0.9324
```
However, the learning rate of img_backbone slowly catches up to the base learning rate as training progresses:
```
2024/11/21 23:45:30 - mmengine - INFO - Epoch(train) [3][ 100/3517] base_lr: 7.2556e-05 lr: 3.4323e-05 eta: 18:42:58 time: 1.0738 data_time: 0.0613 memory: 32031 grad_norm: 64.6705 loss: 14.8487 task0.loss_heatmap: 1.4181 task0.loss_bbox: 0.6505 task1.loss_heatmap: 2.0847 task1.loss_bbox: 0.7157 task2.loss_heatmap: 2.0074 task2.loss_bbox: 0.7194 task3.loss_heatmap: 1.4966 task3.loss_bbox: 0.5754 task4.loss_heatmap: 1.9814 task4.loss_bbox: 0.6894 task5.loss_heatmap: 1.8084 task5.loss_bbox: 0.7016
...
...
2024/11/22 01:50:03 - mmengine - INFO - Epoch(train) [5][ 100/3517] base_lr: 1.2583e-04 lr: 1.0358e-04 eta: 16:36:18 time: 1.0527 data_time: 0.0568 memory: 32069 grad_norm: 52.7803 loss: 15.1927 task0.loss_heatmap: 1.4274 task0.loss_bbox: 0.6234 task1.loss_heatmap: 2.1254 task1.loss_bbox: 0.6715 task2.loss_heatmap: 2.0836 task2.loss_bbox: 0.7248 task3.loss_heatmap: 1.9199 task3.loss_bbox: 0.6361 task4.loss_heatmap: 1.8900 task4.loss_bbox: 0.6479 task5.loss_heatmap: 1.7534 task5.loss_bbox: 0.6891
```
It looks like lr_mult only takes effect once, when the initial learning rates are set. How can I make the 0.1 ratio hold throughout the training process?
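If I read the CosineAnnealingLR implementation correctly (this is my assumption), each param group is annealed toward the same absolute eta_min, which would explain the drift; the lr value below is illustrative:

```python
import math

# Closed-form cosine annealing, per param group (my understanding of the formula):
#   lr_t = eta_min + (base_lr - eta_min) * (1 + cos(pi * t / T_max)) / 2
def cosine_lr(base_lr, eta_min, t, t_max):
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2

lr = 5e-5  # roughly the base lr in my logs; illustrative
for t in range(9):
    base = cosine_lr(lr, lr * 100, t, 8)
    backbone = cosine_lr(lr * 0.1, lr * 100, t, 8)
    print(f't={t}: ratio={backbone / base:.3f}')
# The ratio starts at 0.1 and approaches 1.0, because both groups head toward
# the same absolute eta_min = lr * 100; lr_mult only sets the starting points.
```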
Additional information
I expect that with lr_mult set, the learning rate of the image backbone should remain at 0.1 times the base learning rate for the entire training process.
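One possible fix I am considering, assuming the installed mmengine version supports the `eta_min_ratio` argument of CosineAnnealingLR (a per-group ratio relative to each group's initial lr, instead of the absolute `eta_min`), is the following sketch; it should preserve the 0.1 factor since each group then anneals toward a target proportional to its own initial lr:

```python
# Sketch of a possible workaround: replace the absolute eta_min with
# eta_min_ratio, assuming this argument is available in mmengine 0.10.5.
# Each group anneals toward (its own initial lr) * ratio, so the 10:1
# relationship between the groups is kept for the whole schedule.
param_scheduler = [
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min_ratio=100,  # toward 100x each group's own initial lr
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min_ratio=1,  # back toward each group's own initial lr
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
]
```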