An archive of some interesting papers
- PolyViT: Co-training Vision Transformers on Images, Videos and Audio
- Learning Transferable Visual Models From Natural Language Supervision (CLIP): contrastive image-text training, zero-shot classifier
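The zero-shot classifier idea can be sketched with plain NumPy: embed the image and one text prompt per class, L2-normalize both, and pick the class whose text embedding has the highest cosine similarity. This is a minimal toy sketch with random embeddings standing in for CLIP's actual encoders; the function names are illustrative, not from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Scale vectors to unit length so dot products become cosine similarities
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_classify(image_emb, text_embs):
    """Return the index of the class prompt most similar to the image."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_embs)
    logits = txt @ img  # cosine similarity against each class prompt
    return int(np.argmax(logits))

# Toy stand-ins for encoder outputs: 3 class prompts, 8-dim embeddings
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 8))
image_emb = text_embs[1] + 0.01 * rng.normal(size=8)  # image "close to" class 1
print(zero_shot_classify(image_emb, text_embs))  # → 1
```

The key point is that no classifier head is trained: new classes are added just by writing new prompts.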
- ICLR 2021: An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT): transformers for image recognition
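The "16x16 words" part is just the patch-embedding step: cut the image into non-overlapping 16x16 patches and flatten each into a token vector. A minimal sketch of that reshaping (before the learned linear projection and position embeddings the paper adds):

```python
import numpy as np

def patchify(image, patch=16):
    """Split an H x W x C image into flattened patch tokens of size patch*patch*C."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)            # group pixels by patch: (h/p, w/p, p, p, c)
    return x.reshape(-1, patch * patch * c)   # one token vector per patch

tokens = patchify(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768) -- a 224x224 image becomes 196 "words" of dim 768
```

After this step the sequence of patch tokens is fed to a standard transformer encoder, exactly as word tokens would be.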
- ICLR 2021: PMI-Masking: Principled Masking of Correlated Spans: a new masking strategy based on n-gram corpus statistics (pointwise mutual information). Works better than random masking and whole-word masking on SQuAD 2.0 and GLUE.
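The core statistic is pointwise mutual information computed from corpus counts: spans whose words co-occur far more often than chance (high PMI) are masked jointly instead of token by token. A minimal bigram sketch, simplified from the paper's n-gram collocation measure:

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) ), estimated from counts."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    return {
        bg: math.log((c / n_bi) / ((unigrams[bg[0]] / n_uni) * (unigrams[bg[1]] / n_uni)))
        for bg, c in bigrams.items()
    }

corpus = "new york is big and new york is old but boston is small".split()
pmi = bigram_pmi(corpus)
# "new york" always co-occurs, so it outscores a loose pair like "is big";
# a PMI-masking scheme would then mask "new york" as one span.
assert pmi[("new", "york")] > pmi[("is", "big")]
```

The intuition is the same one behind whole-word masking, generalized: masking only part of a highly correlated span makes the cloze task too easy, so correlated spans should be masked together.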