This repository has been archived by the owner on Mar 4, 2023. It is now read-only.

Reproduce SM on tinyperson dataset #16

Open
Hshuqin opened this issue Nov 5, 2021 · 15 comments

Comments

@Hshuqin

Hshuqin commented Nov 5, 2021

Hello, I tried running a few algorithms from this repository, but RetinaNet and FCOS both perform better than SM (with iou_thrs=[0.25, 0.5, 0.75]). I am not sure whether there is a problem with my configuration file. Could you help me look at it, or point out which parameters I have not modified correctly?
Config file:
TOV_mmdetection-main/configs2/TinyPerson/scale_match/retinanet_r50_fpns4_1x_coco_sm_tinyperson.py

A few important changes are as follows.
Data:
data = dict(
    samples_per_gpu=4,  # 2
    workers_per_gpu=1,
    train=dict(
        type=dataset_type,
        # ann_file=data_root + 'erase_with_uncertain_dataset/annotations/corner/task/tiny_set_train_sw640_sh512_all.json',
        ann_file=data_root + 'mini_annotations/tiny_set_train_sw640_sh512_all_erase.json',  # same as last line
        img_prefix=data_root + 'erase_with_uncertain_dataset/train/',
        pipeline=train_pipeline,
        # train_ignore_as_bg=False,
    ),
    val=dict(
        type=dataset_type,
        # ann_file=data_root + 'annotations/corner/task/tiny_set_test_sw640_sh512_all.json',
        ann_file=data_root + 'mini_annotations/tiny_set_test_all.json',
        img_prefix=data_root + 'test/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        # ann_file=data_root + 'annotations/corner/task/tiny_set_test_sw640_sh512_all.json',
        ann_file=data_root + 'mini_annotations/tiny_set_test_all.json',
        img_prefix=data_root + 'test/',
        pipeline=test_pipeline)
)

Evaluation:
evaluation = dict(
    interval=1,
    metric='bbox',
    iou_thrs=[0.25, 0.5, 0.75],  # set None mean use 0.5:1.0::0.05
    proposal_nums=[200],
    cocofmt_kwargs=dict(
        ignore_uncertain=True,
        use_ignore_attr=True,
        use_iod_for_ignore=True,
        iod_th_of_iou_f="lambda iou: iou",  # "lambda iou: (2*iou)/(1+iou)",
        cocofmt_param=dict(
            evaluate_standard='tiny',  # or 'coco'
            # iouThrs=[0.25, 0.5, 0.75],  # set this same as set evaluation.iou_thrs
            # maxDets=[200],  # set this same as set evaluation.proposal_nums
        )
    ))

In the test pipeline, the img_scale was modified:
# img_scale=(333, 200),
img_scale=(640, 512),

In the train pipeline, anno_file was modified:
anno_file="/home/xxxxx/data/tiny_set/mini_annotations/tiny_set_train_all_erase.json",

Other configurations follow the original settings.
Looking forward to your suggestions

@Vivek-23-Titan

Hi Hshuqin, I am also doing some experiments with different config files. I would like to know what was the maximum mAP @Tiny50 that you were able to achieve?

@Hshuqin
Author

Hshuqin commented Nov 9, 2021 via email

@yinglang
Contributor

yinglang commented Nov 9, 2021

Can you provide the following information?

  • The AP@tiny at IoU=0.5, which is the main result used for comparison (with and without SM).
  • Did you re-run RetinaNet starting from the checkpoint trained with retinanet_r50_fpns4_1x_coco_sm_tinyperson.py, as configs2/TinyPerson/scale_match/ScaleMatch_TinyPerson.sh suggests?

@Vivek-23-Titan

Sorry, maybe I wasn't precise enough. I meant the following mAP:

Average Precision (AP) @[ IoU=0.50 | area= tiny | maxDets=1000 ]

@Hshuqin
Author

Hshuqin commented Nov 9, 2021 via email

@yinglang
Contributor

yinglang commented Nov 9, 2021

  1. OK, thanks. There seems to be some problem with SM, but the basic RetinaNet result should be right. We also found that the number of GPUs and the batch size can cause some performance fluctuation for RetinaNet.

  2. How many GPUs did you use?

  3. SM has two steps:

  • train SM COCO: train on COCO to prepare the SM pretrained weights.
  • train on TinyPerson: load the SM COCO pretrained weights and train on TinyPerson.

And how many GPUs did you use while training SM COCO? Can you give the performance on COCO val, which should be printed while you train SM COCO?
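
The two steps can be sketched as a config fragment. This is only an illustration under assumptions: `load_from` is the usual MMDetection config key for initializing a model from a checkpoint, while the variable `sm_coco_checkpoint` and the work directory path below are hypothetical placeholders, not paths taken from this repository.

```python
# Step 1 (train SM COCO): train with the scale-matched COCO config. The thread
# mentions retinanet_r50_fpns4_1x_coco_sm_tinyperson.py as the config this
# checkpoint comes from. The output location below is a hypothetical example;
# substitute your own work_dir.
sm_coco_checkpoint = 'work_dirs/sm_coco_retinanet/latest.pth'

# Step 2 (train on TinyPerson): in the TinyPerson training config, initialize
# the detector from the step-1 weights instead of the default initialization.
load_from = sm_coco_checkpoint
```

If step 2 is run without pointing `load_from` at the step-1 checkpoint, the run is effectively the baseline detector rather than SM.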

@Vivek-23-Titan

Okay cool! Thanks @Hshuqin

Also, @yinglang till now, I can only replicate the results for Faster RCNN-FPN (exp 2.1) but not for other configurations.

As given in the detector results, the Faster RCNN-FPN SM achieves 50.85 (exp 4.0) and the Adap Retinanet-c (exp 5.1) gets 51.78 mAP_{50}^{tiny}.

So what is the correct way to follow the experiments to replicate the above mAP_50^{tiny} results?

@yinglang
Contributor

yinglang commented Nov 9, 2021

@Vivek-23-Titan Did you run with the same settings as given in the corresponding *.sh file? Can you provide the performance of all the experiments you have run, tagged with their expx.x labels, so that we can give a detailed analysis? Thanks very much.

@Vivek-23-Titan

@yinglang just to make sure: if I want to run exp4.0, can I directly do the 2nd step, i.e., train on TinyPerson with the pretrained COCO weights under the directory FPN_SM_tinyperson_b4 (instead of lastest.pth from the 1st step as you mentioned above)?

@yinglang
Contributor

yinglang commented Nov 9, 2021

Both steps should be run.

Is FPN_SM_tinyperson_b4 the pretrained weight from the old TinyBenchmark version?
If it is, that may be the key reason the results cannot be reproduced.
I have not tried training with these weights.

I may need to upload new pretrained weights for this mmdetection version, if you need them.

@Vivek-23-Titan

Vivek-23-Titan commented Nov 9, 2021

Yes, the FPN_SM_tinyperson_b4 is from the old TinyBenchmark version. It would be really helpful if you would provide the new pre-trained weights for exp4.0 and exp5.1.

I would like to know the total time required to train these scaled pre-trained COCO (assuming 2017) weights with 2 GPUs as given in the exp (or more, if you have tried that).

Also, how can I accurately replicate the results if I use, say, 4 or 8 GPUs instead of the 2 GPUs given in the exps (for example, by scaling the lr in proportion to the number of GPUs)?

@Hshuqin
Author

Hshuqin commented Nov 9, 2021 via email

@yinglang
Contributor

yinglang commented Nov 9, 2021

The links to the weights have been uploaded, as described here. However, I am not completely sure they are correct, because these experiments were run a long time ago, and for now I do not have enough time to re-run them. So if there is any problem, just let me know and I will try my best to fix it.

For settings with a different number of GPUs, normally we follow what the paper "Bag of Tricks for Image Classification with Convolutional Neural Networks" describes.
Linear scaling learning rate: learning_rate / (samples_per_gpu x num_gpus) should be kept fixed when comparing.

Unfortunately, for RetinaNet, even with the linear scaling learning rate applied, the performance still fluctuated in our experiments. We don't know why that happened.
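
As a minimal sketch of the linear scaling rule above: the helper `scale_lr` and its parameter names are hypothetical, and the numbers in the usage line (lr 0.01, 4 samples per GPU, 2 GPUs) are illustrative assumptions rather than confirmed settings of any exp in this repository.

```python
def scale_lr(base_lr, base_samples_per_gpu, base_num_gpus,
             samples_per_gpu, num_gpus):
    """Keep learning_rate / (samples_per_gpu x num_gpus) fixed.

    The new learning rate is the reference rate multiplied by the ratio of
    the new total batch size to the reference total batch size.
    """
    base_batch = base_samples_per_gpu * base_num_gpus
    new_batch = samples_per_gpu * num_gpus
    return base_lr * new_batch / base_batch


# Illustrative only: going from 2 GPUs to 4 GPUs at 4 samples per GPU
# doubles the total batch size, so the learning rate is doubled as well.
new_lr = scale_lr(base_lr=0.01, base_samples_per_gpu=4, base_num_gpus=2,
                  samples_per_gpu=4, num_gpus=4)
print(new_lr)
```

The resulting value would then go into the `lr` field of the optimizer in the config. As noted above, this rule may still not remove the fluctuation seen with RetinaNet.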

@Hshuqin
Author

Hshuqin commented Nov 10, 2021 via email

@Vivek-23-Titan

Thanks a lot for the info @yinglang! The new weight for faster_rcnn_r50_fpn_1x_coco_sm_tinyperson_lr0.01_8b2g_latest.pth worked like a charm!

However, I had tried running exp 5.1 with old weights and it seemed to run fine but with the new weight retinanet_r50_fpns4_1x_coco_sm_tinyperson_lr0.01_4b2g_latest.pth , it throws an error stating:
[image: issue]

And when I searched for this issue this was the response:
In my case, this error was caused by a corrupted saved file. So I switch to older checkpoints and the problem is gone.

(Reference: pytorch/pytorch#31620 (comment))

Can you please check if there is some issue with the model weights or perhaps loading/saving it?
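
One way to narrow this down before re-uploading is to compare the file on both sides. The sketch below is an assumption-laden helper, not part of this repository: `inspect_checkpoint` is a hypothetical name, it uses only the Python standard library, and the zip check assumes the checkpoint was saved in PyTorch's zip-based format (the default since PyTorch 1.6). A checkpoint saved in the older legacy format would report `is_zip` as False without being corrupted.

```python
import hashlib
import zipfile
from pathlib import Path


def inspect_checkpoint(path):
    """Collect size, SHA-256 and zip integrity info for a checkpoint file."""
    path = Path(path)

    # Hash the file in 1 MiB chunks so large checkpoints fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)

    report = {
        "size_bytes": path.stat().st_size,
        "sha256": digest.hexdigest(),
        "is_zip": zipfile.is_zipfile(str(path)),
        "first_bad_member": None,
    }

    # For zip-based checkpoints, testzip() re-reads every member and returns
    # the name of the first one whose CRC does not match, or None if all pass.
    if report["is_zip"]:
        with zipfile.ZipFile(str(path)) as zf:
            report["first_bad_member"] = zf.testzip()
    return report
```

If the size or SHA-256 differs between the uploaded file and the downloaded copy, the download is the likely culprit. If they match, the problem is more likely in how the file was saved.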
