Image Classification on the ImageNet-64 Dataset Variant.
ImageNet-64 is a down-sampled variant of ImageNet. It comprises 1,281,167 training images and 50,000 test images across 1,000 classes.
(Image source: https://patrykchrabaszcz.github.io/Imagenet32/)
We use this downsampled variant of ImageNet as an alternative to the CIFAR datasets.
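For reference, below is a minimal loading sketch. It assumes the CIFAR-style pickle batches described on the project page (a flat uint8 'data' array plus 1-indexed 'labels'); the exact keys and label indexing are assumptions and should be checked against the downloaded files.

```python
import pickle
import numpy as np

def load_imagenet64_batch(path):
    """Load one ImageNet-64 batch file (hypothetical helper).

    Assumes the CIFAR-style pickle format from the project page:
    a dict with a flat uint8 'data' array of shape (N, 3*64*64)
    and integer 'labels'.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f)
    data = np.asarray(batch["data"], dtype=np.uint8)
    labels = np.asarray(batch["labels"])
    images = data.reshape(-1, 3, 64, 64)   # channel-first layout for PyTorch
    return images, labels - 1              # shift 1-indexed labels to 0-indexed
```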
The ImageNet-64 authors use Wide Residual Networks (WRN-N-k) by Zagoruyko and Komodakis (2016); the original model and code are given in Lua.
Since the ImageNet-64 authors' implementation uses the Lasagne library, it is not directly compatible with our framework. We therefore use the closest available implementation from torchvision.
The best-performing configuration in the paper was N=36, k=5. The closest available torchvision model is wide_resnet50_2, with N=50, k=2.
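As a reference, here is a minimal sketch of how the torchvision model can be instantiated; training from scratch (weights=None) is an assumption, since the pretrained weights target 224x224 inputs rather than the 64x64 images used here.

```python
import torch
from torchvision.models import wide_resnet50_2

# Minimal sketch: WRN-50-2 from torchvision, trained from scratch (assumption),
# with the 1,000 ImageNet classes as output.
model = wide_resnet50_2(weights=None, num_classes=1000)

# Shape sanity check with a dummy batch of 64x64 RGB images; the network's
# adaptive average pooling makes it agnostic to the smaller input resolution.
dummy = torch.randn(8, 3, 64, 64)
logits = model(dummy)
print(logits.shape)  # torch.Size([8, 1000])
```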
We also tried a vision transformer model, namely DaViT; however, it did not yield satisfactory performance.
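For completeness, a DaViT backbone can be instantiated via the timm library; the model name and library choice below are assumptions and may differ from the variant we actually evaluated.

```python
import timm

# Hypothetical sketch: a small DaViT variant from timm, trained from scratch,
# with a 1,000-class head. The exact variant we evaluated may differ.
vit_model = timm.create_model("davit_tiny", pretrained=False, num_classes=1000)
print(sum(p.numel() for p in vit_model.parameters()))  # parameter count
```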
We compare top-1 accuracy. Our model achieved a top-1 accuracy of around 69% (no exact result yet; experiments are still running). The search grid used to find the (currently) best hyperparameters can be found here.
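To illustrate how such a grid can be expressed, here is a schematic sketch; the hyperparameter names and values below are placeholders, not the actual search grid linked above.

```python
from itertools import product

# Placeholder grid only -- the real values are in the linked search grid.
search_grid = {
    "lr": [0.1, 0.01, 0.001],
    "weight_decay": [5e-4, 1e-4],
    "batch_size": [128, 256],
}

for lr, wd, bs in product(*search_grid.values()):
    # A hypothetical train_and_evaluate(lr, wd, bs) would run one configuration here.
    print(f"lr={lr}, weight_decay={wd}, batch_size={bs}")
```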
They report a top-1 accuracy of 67.66% using a WRN (N=36, k=5). We tried to replicate their performance by using the same hyperparameters, which can be found here.