For discriminative loss, is the true NCE batch size the number of masked patches? #26

hillup · 2023-10-18T11:29:30Z

In this piece of code, it seems that the loss is calculated at the granularity of samples.
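To make "granularity of samples" concrete, here is a minimal pure-Python sketch of a per-sample InfoNCE loss. It is not the repository's code. The function name `per_sample_info_nce`, the dot-product similarity, and the list-of-vectors layout are all assumptions for illustration. Each prediction is scored only against the masked-patch targets of its own sample. The log-probabilities are then accumulated over the whole batch and averaged.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Similarity score between two embeddings (plain dot product, an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def per_sample_info_nce(preds: List[List[Vector]], targets: List[List[Vector]]) -> float:
    """Hypothetical per-sample InfoNCE (not the repository's implementation).

    preds[i][j]   -- predicted embedding for masked patch j of sample i
    targets[i][j] -- target embedding (the positive) for masked patch j of sample i

    The candidates for prediction j are only the masked-patch targets of the
    same sample i, so each prediction is contrasted against
    len(targets[i]) - 1 negatives, regardless of the batch size.
    """
    total_log_prob = 0.0
    num_predictions = 0
    for sample_preds, sample_targets in zip(preds, targets):
        for j, p in enumerate(sample_preds):
            # Scores of prediction j against every masked patch of THIS sample only.
            logits = [dot(t, p) for t in sample_targets]
            # Numerically stable log-softmax; index j is the positive.
            m = max(logits)
            log_z = m + math.log(sum(math.exp(s - m) for s in logits))
            total_log_prob += logits[j] - log_z
            num_predictions += 1
    if num_predictions == 0:
        raise ValueError("need at least one masked patch")
    # Accumulated over the whole batch, then averaged over all predictions.
    return -total_log_prob / num_predictions
```

Because the inner softmax only ever spans one sample's patches, stacking more samples into the batch changes how many terms are averaged. It does not change how many candidates each term competes against.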

hillup · 2023-10-18T11:31:26Z

So even if you increase the number of GPUs, contrastive learning will not see more negative examples.

YuanGongND · 2023-12-16T22:00:01Z

> For discriminative loss, is the true NCE batch size the number of masked patches?

In line 347 in your screenshot, the NCE loss is accumulated over all samples in the batch, but the negative samples for each prediction all come from the same spectrogram. For example, say B=12 (you have 12 spectrograms in a batch), each spectrogram has 512 patches, and you mask 400 of them. Then the number of negative samples is always 400-1=399, but the NCE loss isn't updated until the loop has gone through all 12 spectrograms.
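The arithmetic in this example can be written out as a small helper. `nce_counts` is a hypothetical name. The formulas assume, as described above, that each prediction's candidates are exactly the masked patches of its own spectrogram.

```python
def nce_counts(batch_size: int, num_patches: int, num_masked: int) -> dict:
    """Hypothetical helper: counts for a per-sample NCE loss."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    if not 1 <= num_masked <= num_patches:
        raise ValueError("num_masked must be between 1 and num_patches")
    return {
        # candidates for one prediction = masked patches of the same spectrogram
        "candidates_per_prediction": num_masked,
        # one candidate is the positive; the rest are negatives
        "negatives_per_prediction": num_masked - 1,
        # predictions accumulated into the loss before it is updated
        "predictions_per_batch": batch_size * num_masked,
    }


print(nce_counts(batch_size=12, num_patches=512, num_masked=400))
# {'candidates_per_prediction': 400, 'negatives_per_prediction': 399, 'predictions_per_batch': 4800}
```

The batch size only scales `predictions_per_batch`. The negatives per prediction stay at 399.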

> So even if you increase the number of GPUs, contrastive learning will not see more negative examples.

The number of negative samples will always be #masked_patches-1.
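This sketch shows why adding GPUs does not help under this scheme: the per-sample count depends only on the number of masked patches. The names `negatives_per_prediction`, `per_gpu_batch`, and `num_gpus` are hypothetical. The `cross_sample=True` branch is a hypothetical alternative, not something this thread says the code does. In that variant, masked patches from the entire global batch would be pooled as candidates, which would require gathering embeddings across GPUs.

```python
def negatives_per_prediction(num_masked: int, per_gpu_batch: int, num_gpus: int,
                             cross_sample: bool = False) -> int:
    """Negatives seen by one masked-patch prediction.

    cross_sample=False: the scheme discussed in this thread; candidates are the
        masked patches of the same spectrogram, so batch size and GPU count
        do not enter the formula.
    cross_sample=True: HYPOTHETICAL alternative in which masked patches from
        every spectrogram in the global batch (all GPUs) are pooled as candidates.
    """
    if cross_sample:
        return per_gpu_batch * num_gpus * num_masked - 1
    return num_masked - 1


for gpus in (1, 2, 4, 8):
    print(gpus,
          negatives_per_prediction(400, per_gpu_batch=12, num_gpus=gpus),
          negatives_per_prediction(400, per_gpu_batch=12, num_gpus=gpus, cross_sample=True))
# 1 399 4799
# 2 399 9599
# 4 399 19199
# 8 399 38399
```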

YuanGongND added the question Further information is requested label Dec 16, 2023
