Image Classification on the ImageNet-64 Dataset Variant.
ImageNet-64 is a down-sampled variant of ImageNet. It comprises 1,281,167 training images and 50,000 test images across 1,000 classes.
(Image source: https://patrykchrabaszcz.github.io/Imagenet32/)
We use this downsampled variant of ImageNet as an alternative to the CIFAR datasets.
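For reference, below is a minimal loading sketch. It assumes the CIFAR-style pickle batches described on the project page (a flat uint8 'data' array plus 1-indexed 'labels'); the exact keys and label indexing are assumptions and should be checked against the downloaded files.

```python
import pickle
import numpy as np

def load_imagenet64_batch(path):
    """Load one ImageNet-64 batch file (hypothetical helper).

    Assumes the CIFAR-style pickle format from the project page:
    a dict with a flat uint8 'data' array of shape (N, 3*64*64)
    and integer 'labels'.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f)
    data = np.asarray(batch["data"], dtype=np.uint8)
    labels = np.asarray(batch["labels"])
    images = data.reshape(-1, 3, 64, 64)   # channel-first layout for PyTorch
    return images, labels - 1              # shift 1-indexed labels to 0-indexed
```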
The ImageNet-64 authors use Wide Residual Networks (WRN-N-k) by Zagoruyko and Komodakis (2016); the original model and code are given in Lua.
Since the ImageNet-64 authors' implementation uses the Lasagne library, it is not directly compatible with our framework. We therefore use the closest available implementation from torchvision.
The best-performing configuration in the paper was N=36, k=5. The closest available torchvision model is wide_resnet50_2, with N=50, k=2.
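As a reference, here is a minimal sketch of how the torchvision model can be instantiated; training from scratch (weights=None) is an assumption, since the pretrained weights target 224x224 inputs rather than the 64x64 images used here.

```python
import torch
from torchvision.models import wide_resnet50_2

# Minimal sketch: WRN-50-2 from torchvision, trained from scratch (assumption),
# with the 1,000 ImageNet classes as output.
model = wide_resnet50_2(weights=None, num_classes=1000)

# Shape sanity check with a dummy batch of 64x64 RGB images; the network's
# adaptive average pooling makes it agnostic to the smaller input resolution.
dummy = torch.randn(8, 3, 64, 64)
logits = model(dummy)
print(logits.shape)  # torch.Size([8, 1000])
```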
We also tried a vision transformer model, namely DaViT; however, it did not yield satisfactory performance.
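For completeness, a DaViT backbone can be instantiated via the timm library; the model name and library choice below are assumptions and may differ from the variant we actually evaluated.

```python
import timm

# Hypothetical sketch: a small DaViT variant from timm, trained from scratch,
# with a 1,000-class head. The exact variant we evaluated may differ.
vit_model = timm.create_model("davit_tiny", pretrained=False, num_classes=1000)
print(sum(p.numel() for p in vit_model.parameters()))  # parameter count
```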
We compare top-1 accuracy. Our model achieved a top-1 accuracy of around 69% (no exact result yet; experiments are still running). The search grid used to find the (currently) best hyperparameters can be found here.
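To illustrate how such a grid can be expressed, here is a schematic sketch; the hyperparameter names and values below are placeholders, not the actual search grid linked above.

```python
from itertools import product

# Placeholder grid only -- the real values are in the linked search grid.
search_grid = {
    "lr": [0.1, 0.01, 0.001],
    "weight_decay": [5e-4, 1e-4],
    "batch_size": [128, 256],
}

for lr, wd, bs in product(*search_grid.values()):
    # A hypothetical train_and_evaluate(lr, wd, bs) would run one configuration here.
    print(f"lr={lr}, weight_decay={wd}, batch_size={bs}")
```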
They report a top-1 accuracy of 67.66% using a WRN (N=36, k=5). We tried to replicate their performance by using the same hyperparameters, which can be found here.