The paper states that the backbone uses the same ViT-Base as UniHCP, but in the code, generate_random_masks is called in the ViT to shuffle tokens. What is the purpose of this, and is there any literature on this approach?
During the project, we attempted to unify supervised training with masked image modeling. However, we found that introducing the masked-image-modeling strategy decreased performance, so we removed it. Note that all tokens have positional embeddings added first, are then shuffled before the encoder, and are restored to their original order after the encoder. This is effectively the same as not shuffling at all, because the positional information is carried by the positional embeddings rather than by the tokens' relative order inside the self-attention module.
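To make the argument concrete, here is a minimal sketch (not the actual repository code) showing that shuffling tokens after positional embeddings are added, then unshuffling after the encoder, reproduces the unshuffled output. A single nn.MultiheadAttention layer stands in for the ViT encoder, and all tensor names and shapes are hypothetical:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the ViT encoder: one self-attention layer.
# Self-attention is permutation-equivariant over tokens, which is why the
# shuffle/unshuffle has no effect once positions are encoded in the embeddings.
encoder = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
encoder.eval()

B, N, D = 2, 16, 64
tokens = torch.randn(B, N, D)
pos_embed = torch.randn(1, N, D)

x = tokens + pos_embed  # positional info is baked into each token

# Plain forward pass, no shuffling.
with torch.no_grad():
    out_plain, _ = encoder(x, x, x)

# Shuffle token order, run the encoder, then restore the original order.
perm = torch.randperm(N)
inv_perm = torch.argsort(perm)
with torch.no_grad():
    out_shuffled, _ = encoder(x[:, perm], x[:, perm], x[:, perm])
out_restored = out_shuffled[:, inv_perm]

# The two outputs match up to numerical noise.
print(torch.allclose(out_plain, out_restored, atol=1e-5))  # True
```

Because each token carries its own positional embedding, permuting the sequence and inverting the permutation afterward leaves the self-attention result unchanged, so removing the masking strategy while keeping the shuffle code does not alter the backbone's behavior.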