Currently the run_experiments.ipynb notebook runs one of the baseline configurations, which takes something like 1.5 hours to complete on an A100.
We should be able to do a minimal (token-effort) training run that completes in under 10 minutes on a T4, and have the notebook point to that by default instead.
Maybe call it /configs/quick_run.json or something like that.
So we just need to find some settings that achieve that. Running for fewer steps is probably the main thing, and then making sure the run fits in the T4's memory (probably by reducing the pre-training batch size).
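
One way to get a starting point is to derive the quick config from an existing baseline by overriding only the step count and batch size. This is a rough sketch, not the repo's actual tooling: the key names `num_train_steps` and `pretrain_batch_size`, the override values, and the assumption that the config is a flat JSON object are all hypothetical and would need to be matched to the real config schema and tuned on a T4.

```python
import json
from pathlib import Path

# Hypothetical override keys and values; the real config schema may use
# different names, and the numbers still need to be tuned on a T4.
QUICK_OVERRIDES = {
    "num_train_steps": 200,
    "pretrain_batch_size": 8,
}


def make_quick_config(baseline: dict, overrides: dict = QUICK_OVERRIDES) -> dict:
    """Return a copy of the baseline config with the quick-run overrides applied."""
    # Fail loudly if an override key is missing from the baseline, so a typo
    # does not silently add a setting the training code never reads.
    unknown = set(overrides) - set(baseline)
    if unknown:
        raise KeyError(f"override keys not in baseline: {sorted(unknown)}")
    return {**baseline, **overrides}


def write_quick_config(baseline_path, out_path, overrides: dict = QUICK_OVERRIDES) -> dict:
    """Read a baseline JSON config, apply the overrides, and write the result."""
    baseline = json.loads(Path(baseline_path).read_text())
    quick = make_quick_config(baseline, overrides)
    Path(out_path).write_text(json.dumps(quick, indent=2) + "\n")
    return quick
```

Calling `write_quick_config(<baseline config path>, "configs/quick_run.json")` would then produce the new file. Everything not overridden stays identical to the baseline, which keeps the quick run representative of the full one.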