The first image is your Table 1, and the second is DINOv2's Table 4.


The gap between the 82.0% reported in your paper and the 83.5% reported in DINOv2's is quite large. It is also apparent that DINOv2-L is a patch-size-14 model, not a patch-size-16 one. Are you accounting for this difference by resizing DINOv2's patch projector? Are you letting the models produce different numbers of patches? Or are you feeding them different image sizes?
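To make the three options concrete, here is a small sketch of the token-count arithmetic. It assumes square inputs and a non-overlapping patch embedding. `num_patches` is a hypothetical helper written for illustration and does not come from either codebase. The 224 and 196 input sizes are example values, not the resolutions either paper necessarily used.

```python
# Hypothetical helper (not from either codebase): number of patch tokens a ViT
# with a non-overlapping patch embedding produces for a square input image.
def num_patches(image_size: int, patch_size: int) -> int:
    if image_size % patch_size != 0:
        raise ValueError(f"{image_size} is not divisible by patch size {patch_size}")
    per_side = image_size // patch_size
    return per_side * per_side


# Same 224x224 input for both models: the token counts differ.
print(num_patches(224, 16))  # 196 tokens (14x14 grid) for a patch-16 model
print(num_patches(224, 14))  # 256 tokens (16x16 grid) for a patch-14 model like DINOv2-L/14

# Smaller input for the patch-14 model so both see a 14x14 grid.
print(num_patches(196, 14))  # 196 tokens

# Resizing the patch-14 projector to 16x16 keeps the 224 input and gives
# the same 196-token grid as a native patch-16 model.
print(num_patches(224, 16))  # 196 tokens
```

Each option changes something different: the first changes the sequence length, the second changes the input resolution, and the third changes the pretrained patch-embedding weights. Any of these could plausibly account for part of the accuracy gap.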