
Question about the hyperparameters used to train the model #29

@beotborry

Description

Hi, first of all, congratulations on your work accepted to ICML!

I have a few questions about the code details to reproduce your reported results.

First, I have found some mismatches between the code and the paper.

  1. The paper says \mathcal{L}_{lowlevel} is used with \alpha_2 = 0.016. However, the --no_blurry_recon argument is used in the accel.slurm file, and blur_scale is set to 0.5 in that file.
  2. The paper says \alpha_1 = 0.033, but clip_scale is 1 in the accel.slurm file.
  3. The paper gives no scalar for \mathcal{L}_{prior}, but prior_scale is 30 in the accel.slurm file.
  4. The paper gives no scalar for \mathcal{L}_{softclip}(VIC, \hat{VIC}), but I found that this loss is multiplied by 0.1 in the training loop:
    if blurry_recon:
        image_enc_pred, transformer_feats = blurry_image_enc_

        image_enc = autoenc.encode(2*image-1).latent_dist.mode() * 0.18215
        loss_blurry = l1(image_enc_pred, image_enc)
        loss_blurry_total += loss_blurry.item()

        if epoch < int(mixup_pct * num_epochs):
            image_enc_shuf = image_enc[perm]
            betas_shape = [-1] + [1]*(len(image_enc.shape)-1)
            image_enc[select] = image_enc[select] * betas[select].reshape(*betas_shape) + \
                image_enc_shuf[select] * (1 - betas[select]).reshape(*betas_shape)

        image_norm = (image - mean)/std
        image_aug = (blur_augs(image) - mean)/std
        _, cnx_embeds = cnx(image_norm)
        _, cnx_aug_embeds = cnx(image_aug)

        cont_loss = utils.soft_cont_loss(
            nn.functional.normalize(transformer_feats.reshape(-1, transformer_feats.shape[-1]), dim=-1),
            nn.functional.normalize(cnx_embeds.reshape(-1, cnx_embeds.shape[-1]), dim=-1),
            nn.functional.normalize(cnx_aug_embeds.reshape(-1, cnx_embeds.shape[-1]), dim=-1),
            temp=0.2)
        loss_blurry_cont_total += cont_loss.item()

        loss += (loss_blurry + 0.1*cont_loss) * blur_scale #/.18215
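To summarize the four mismatches above, here is a small comparison table written as a script. This is my own summary for illustration, not code from the repository; the dictionary names and labels are mine, and the values are the ones quoted from the paper and from accel.slurm above.

```python
# Side-by-side view of the loss coefficients implied by the paper vs. those in
# accel.slurm / the training loop. All names here are my own, for illustration.

paper = {
    "lowlevel (alpha_2)": 0.016,  # stated in the paper
    "clip (alpha_1)":     0.033,  # stated in the paper
    "prior":              1.0,    # no scalar given in the paper
    "softclip":           1.0,    # no scalar given in the paper
}

code = {
    "lowlevel (alpha_2)": 0.5,    # blur_scale in accel.slurm (with --no_blurry_recon set)
    "clip (alpha_1)":     1.0,    # clip_scale in accel.slurm
    "prior":              30.0,   # prior_scale in accel.slurm
    "softclip":           0.1,    # hard-coded multiplier in the training loop
}

for key in paper:
    print(f"{key}: paper={paper[key]} vs code={code[key]}")
```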

So I wonder which of these values I should use to reproduce your results.

Secondly, I wonder whether you adjusted the learning rate or the number of epochs to compensate for the reduced number of iterations when training on multiple GPUs.
For example, batch_size is set to 24 for single-subject training in the accel.slurm file, but is that the per-GPU value when using 4 GPUs (in other words, is the global batch size 24*4 = 96)?
Similarly, max_lr is 3e-4, but is it multiplied by 4 when using 4 GPUs?
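To be explicit about the convention I am asking about, here is a minimal sketch of the linear scaling rule (global batch = per-GPU batch × #GPUs, learning rate scaled by the same factor). This is only my assumption about what might have been done, not code from the repository, and the helper name is mine.

```python
def scaled_hparams(per_gpu_batch: int, base_lr: float, num_gpus: int):
    """Linear-scaling convention (an assumption, not the repo's confirmed setup):
    the global batch size is per_gpu_batch * num_gpus, and the learning rate
    is scaled by the same factor."""
    return per_gpu_batch * num_gpus, base_lr * num_gpus

# With the values from accel.slurm (batch_size=24, max_lr=3e-4) on 4 GPUs:
global_batch, lr = scaled_hparams(24, 3e-4, 4)
print(global_batch, lr)  # prints: 96 0.0012
```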

Thank you in advance.
Juhyeon Park
