diff --git a/README.md b/README.md
index 929ebced..51c9caf2 100644
--- a/README.md
+++ b/README.md
@@ -59,7 +59,7 @@ The V-JEPA feature predictions are indeed grounded, and exhibit spatio-temporal
3072 |
VideoMix2M |
checkpoint |
- configs |
+ configs |
| ViT-H |
@@ -69,7 +69,7 @@ The V-JEPA feature predictions are indeed grounded, and exhibit spatio-temporal
3072 |
VideoMix2M |
checkpoint |
- configs |
+ configs |
| ViT-H |
@@ -79,7 +79,7 @@ The V-JEPA feature predictions are indeed grounded, and exhibit spatio-temporal
2400 |
VideoMix2M |
checkpoint |
- configs |
+ configs |
@@ -97,21 +97,21 @@ The V-JEPA feature predictions are indeed grounded, and exhibit spatio-temporal
224x224 |
80.8 |
attentive probe checkpoint |
- configs |
+ configs |
| ViT-H/16 |
224x224 |
82.0 |
attentive probe checkpoint |
- configs |
+ configs |
| ViT-H/16 |
384x384 |
81.9 |
attentive probe checkpoint |
- configs |
+ configs |
@@ -129,21 +129,21 @@ The V-JEPA feature predictions are indeed grounded, and exhibit spatio-temporal
224x224 |
69.5 |
attentive probe checkpoint |
- configs |
+ configs |
| ViT-H/16 |
224x224 |
71.4 |
attentive probe checkpoint |
- configs |
+ configs |
| ViT-H/16 |
384x384 |
72.2 |
attentive probe checkpoint |
- configs |
+ configs |
@@ -161,21 +161,21 @@ The V-JEPA feature predictions are indeed grounded, and exhibit spatio-temporal
224x224 |
74.8 |
attentive probe checkpoint |
- configs |
+ configs |
| ViT-H/16 |
224x224 |
75.9 |
attentive probe checkpoint |
- configs |
+ configs |
| ViT-H/16 |
384x384 |
77.4 |
attentive probe checkpoint |
- configs |
+ configs |
@@ -193,21 +193,21 @@ The V-JEPA feature predictions are indeed grounded, and exhibit spatio-temporal
224x224 |
60.3 |
attentive probe checkpoint |
- configs |
+ configs |
| ViT-H/16 |
224x224 |
61.7 |
attentive probe checkpoint |
- configs |
+ configs |
| ViT-H/16 |
384x384 |
62.8 |
attentive probe checkpoint |
- configs |
+ configs |
@@ -225,21 +225,21 @@ The V-JEPA feature predictions are indeed grounded, and exhibit spatio-temporal
224x224 |
67.8 |
attentive probe checkpoint |
- configs |
+ configs |
| ViT-H/16 |
224x224 |
67.9 |
attentive probe checkpoint |
- configs |
+ configs |
| ViT-H/16 |
384x384 |
72.6 |
attentive probe checkpoint |
- configs |
+ configs |
@@ -330,7 +330,7 @@ For example, suppose we have a directory called ``my_image_datasets``. We would
### Local training
If you wish to debug your code or setup before launching a distributed training run, you can run the pretraining script locally on a multi-GPU (or single-GPU) machine; however, reproducing our results requires launching distributed training.
-The single-machine implementation starts from the [app/main.py](appmain.py), which parses the experiment config file and runs the pretraining locally on a multi-GPU (or single-GPU) machine.
+The single-machine implementation starts from the [app/main.py](app/main.py), which parses the experiment config file and runs the pretraining locally on a multi-GPU (or single-GPU) machine.
For example, to run V-JEPA pretraining on GPUs "0", "1", and "2" on a local machine using the config [configs/pretrain/vitl16.yaml](configs/pretrain/vitl16.yaml), type the command:
```bash
python -m app.main \
@@ -353,31 +353,31 @@ python -m app.main_distributed \
### Local training
If you wish to debug your eval code or setup before launching a distributed run, you can run the evaluation script locally on a multi-GPU (or single-GPU) machine; however, reproducing the full eval requires launching distributed training.
-The single-machine implementation starts from the [eval/main.py](eval/main.py), which parses the experiment config file and runs the eval locally on a multi-GPU (or single-GPU) machine.
+The single-machine implementation starts from the [evals/main.py](evals/main.py), which parses the experiment config file and runs the eval locally on a multi-GPU (or single-GPU) machine.
-For example, to run ImageNet image classification on GPUs "0", "1", and "2" on a local machine using the config [configs/eval/vitl16_in1k.yaml](configs/eval/vitl16_in1k.yaml), type the command:
+For example, to run ImageNet image classification on GPUs "0", "1", and "2" on a local machine using the config [configs/evals/vitl16_in1k.yaml](configs/evals/vitl16_in1k.yaml), type the command:
```bash
python -m evals.main \
- --fname configs/eval/vitl16_in1k.yaml \
+ --fname configs/evals/vitl16_in1k.yaml \
--devices cuda:0 cuda:1 cuda:2
```
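The `--devices` flag above takes a list of device strings, one worker per device. As a rough illustration of how such a flag could be parsed, here is a minimal `argparse` sketch; the parser below is hypothetical and not the repo's actual argument handling:

```python
import argparse


def parse_devices(argv):
    """Illustrative parser for a --fname / --devices CLI like the one above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--fname", type=str, help="path to experiment config")
    parser.add_argument(
        "--devices", type=str, nargs="+", default=["cuda:0"],
        help="devices to run single-machine training on",
    )
    args = parser.parse_args(argv)
    # A launcher would typically spawn one worker process per listed device,
    # binding local rank i to args.devices[i].
    return args.devices


if __name__ == "__main__":
    devices = parse_devices(
        ["--fname", "configs/evals/vitl16_in1k.yaml",
         "--devices", "cuda:0", "cuda:1", "cuda:2"]
    )
    print(devices)
```

With the flags from the command above, this yields `["cuda:0", "cuda:1", "cuda:2"]`, i.e. three local workers.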
### Distributed training
-To launch a distributed evaluation run, the implementation starts from [eval/main_distributed.py](eval/main_distributed.py), which, in addition to parsing the config file, also allows for specifying details about distributed training. For distributed training, we use the popular open-source [submitit](https://github.com/facebookincubator/submitit) tool and provide examples for a SLURM cluster.
+To launch a distributed evaluation run, the implementation starts from [evals/main_distributed.py](evals/main_distributed.py), which, in addition to parsing the config file, also allows for specifying details about distributed training. For distributed training, we use the popular open-source [submitit](https://github.com/facebookincubator/submitit) tool and provide examples for a SLURM cluster.
-For example, to launch a distributed ImageNet image classification experiment using the config [configs/eval/vitl16_in1k.yaml](configs/eval/vitl16_in1k.yaml), type the command:
+For example, to launch a distributed ImageNet image classification experiment using the config [configs/evals/vitl16_in1k.yaml](configs/evals/vitl16_in1k.yaml), type the command:
```bash
python -m evals.main_distributed \
- --fname configs/eval/vitl16_in1k.yaml \
+ --fname configs/evals/vitl16_in1k.yaml \
--folder $path_to_save_stderr_and_stdout \
--partition $slurm_partition
```
-Similarly, to launch a distributed K400 video classification experiment using the config [configs/eval/vitl16_k400.yaml](configs/eval/vitl16_k400.yaml), type the command:
+Similarly, to launch a distributed K400 video classification experiment using the config [configs/evals/vitl16_k400_16x8x3.yaml](configs/evals/vitl16_k400_16x8x3.yaml), type the command:
```bash
python -m evals.main_distributed \
- --fname configs/eval/vitl16_k400.yaml \
+ --fname configs/evals/vitl16_k400_16x8x3.yaml \
--folder $path_to_save_stderr_and_stdout \
--partition $slurm_partition
```
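The distributed launches above hand `--partition` and `--folder` to submitit, which turns them into a SLURM job. A minimal sketch of that pattern, using only the documented submitit API (`AutoExecutor`, `update_parameters`, `submit`); the helper names and default parameter values below are illustrative, not the repo's actual launcher:

```python
def slurm_launch_params(partition, timeout_min=60, gpus_per_node=8):
    """Collect SLURM-facing parameters analogous to the CLI flags above.

    The timeout and GPU counts are placeholder defaults, not the values
    used in the paper's experiments.
    """
    return {
        "slurm_partition": partition,
        "timeout_min": timeout_min,
        "gpus_per_node": gpus_per_node,
    }


def launch(fn, partition, folder):
    """Submit `fn` as a SLURM job via submitit (requires submitit installed).

    stderr/stdout land under `folder`, mirroring --folder in the commands above.
    """
    import submitit  # https://github.com/facebookincubator/submitit

    executor = submitit.AutoExecutor(folder=folder)
    executor.update_parameters(**slurm_launch_params(partition))
    return executor.submit(fn)
```

A caller would pass the eval entrypoint as `fn`, e.g. `launch(run_eval, partition="my_partition", folder="/path/to/logs")`, and submitit would write the job's logs under that folder.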
diff --git a/src/datasets/utils/video/randaugment.py b/src/datasets/utils/video/randaugment.py
index 4c80a990..8d1d6789 100644
--- a/src/datasets/utils/video/randaugment.py
+++ b/src/datasets/utils/video/randaugment.py
@@ -7,8 +7,8 @@
"""
This implementation is based on
-https://github.com/rwightman/pytorch-image-models/blob/master/timm/data/auto_augment.py
-pulished under an Apache License 2.0.
+https://github.com/huggingface/pytorch-image-models/blob/main/timm/data/auto_augment.py
+published under an Apache License 2.0.
"""
import math
diff --git a/src/datasets/utils/video/randerase.py b/src/datasets/utils/video/randerase.py
index d1f185c8..b073588c 100644
--- a/src/datasets/utils/video/randerase.py
+++ b/src/datasets/utils/video/randerase.py
@@ -7,8 +7,8 @@
"""
This implementation is based on
-https://github.com/rwightman/pytorch-image-models/blob/master/timm/data/random_erasing.py
-pulished under an Apache License 2.0.
+https://github.com/huggingface/pytorch-image-models/blob/main/timm/data/random_erasing.py
+published under an Apache License 2.0.
"""
import math
import random