Here, you'll find instructions on how to prepare the environment, train, and run inference.
Training:
- Disk space: ~1Tb
- RAM: 200Gb
- GPUs: 1 node, 8x H100 80Gb HBM3
Inference:
- Disk space: ~50Gb
- RAM: 80Gb
- GPUs: 1 node, 1 V100 (16Gb VRAM)
You can probably train using a single GPU such as an A100 SXM4 40 GB, but the training time will be significantly longer and parameters such as batch size, learning rate, and devices may have to be adjusted.
You may need a bit more RAM for inference depending on segment size.
- Image size: 13.4 Gb
- Docker version
24.0.7
You can build the image by running:
docker build -t scroll-ink-det-gpu -f docker/scroll-ink-detection-gpu/Dockerfile .
You can run the following to create a new container with an interactive shell:
docker run -it --rm --gpus all -v /$(pwd)/data:/workspace/data scroll-ink-det-gpu
Feel free to change $(pwd)
to wherever you want to the data downloaded on the host.
Note: Depending on your system, you may need to increase the shared memory size for this container for training.
We had to use --shm-size=10g
, making the full docker run command:
docker run -it --rm --gpus all -v /$(pwd)/data:/workspace/data --shm-size=10g scroll-ink-det-gpu
- Preparing the labels
Unzip the labels.zip
file and place them in data/labels
(the mounted volume).
- Downloading the surface volumes
You can download the data after you have registered for the
Vesuvius Challenge (see data agreement here).
This will give you the username and password that you'll need for the next step. You must set the environment variables
USER
and PASS
. You can create an .env
file (see .env.example
for an example; run cp .env.example .env
to
create this file) and fill it out. You'd then have to set the environment variables, which can be done as such in a Linux shell:
export $(grep -v '^#' .env | tr -d '\r' | xargs)
From within the container's new shell session, you should download the surface volumes associated with the labels:
export LABEL_DIR=data/labels
./scripts/download-scroll-surface-vols-by-segment.sh 1 $(python3 tools/print_segment_ids_from_label_dir.py $LABEL_DIR)
- Generate sub-segments. Now we're ready to create sub-segments (cropped versions of the original segments). You can create them by running:
python3 vesuvius_challenge_rnd/scroll_ink_detection/evaluation/tools/create_all_subsegments.py --load_in_memory
You should now be able to find these sub-segments in the data/scrolls
directory with _C<i>
postfixes, where i
indicates the i
th column.
To train a model, you can execute the following:
python3 -m vesuvius_challenge_rnd.scroll_ink_detection fit --config path/to/config.yaml
You'll have to pass a config.yaml
file for each run.
To check that everything is working, you can run the following fast development run config:
python3 -m vesuvius_challenge_rnd.scroll_ink_detection fit --config vesuvius_challenge_rnd/scroll_ink_detection/experiment_runner/configs/unet3d_segformer/fast_dev_run_docker.yaml
To train on the full dataset, you can train the following models (one run per validation segment).
It will prompt you to sign in to Weights & Biases. Feel free to skip this. You can find all the submission
run configs under vesuvius_challenge_rnd/scroll_ink_detection/experiment_runner/configs/unet3d_segformer/submission
.
Run 1 (*3336_C3 as validation):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection fit --config vesuvius_challenge_rnd/scroll_ink_detection/experiment_runner/configs/unet3d_segformer/submission/val_3336_C3.yaml
Run 2 (*5753 as validation):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection fit --config vesuvius_challenge_rnd/scroll_ink_detection/experiment_runner/configs/unet3d_segformer/submission/val_5753.yaml
Run 3 (*4422_C2 as validation):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection fit --config vesuvius_challenge_rnd/scroll_ink_detection/experiment_runner/configs/unet3d_segformer/submission/val_4422_C2.yaml
Run 4 (*4422_C3 as validation):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection fit --config vesuvius_challenge_rnd/scroll_ink_detection/experiment_runner/configs/unet3d_segformer/submission/val_4422_C3.yaml
Run 5 (*1321 as validation):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection fit --config vesuvius_challenge_rnd/scroll_ink_detection/experiment_runner/configs/unet3d_segformer/submission/val_1321_no0901c3_no5351.yaml
You can find the saved model checkpoints under /workspace/lightning_logs/version_<v_num>/checkpoints/
. There should only be
one checkpoint per training run. Note that v_num
is an automatically generated experiment version ID and can be seen
in the progress bar. For each run, note the associated v_num
so that it can be used later on for inference.
To reproduce the run using *3336_C3 with a smaller dataset (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection fit --config vesuvius_challenge_rnd/scroll_ink_detection/experiment_runner/configs/unet3d_segformer/submission/val_3336_C3_reduced.yaml
With the model checkpoints, you can generate predictions by running:
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference path/to/checkpoint.ckpt <segment id> --infer_z_reversal
By default, it will write the output images to /workspace/outputs
, but you can change that using -o path/to/new/output/dir
.
Assuming you ran the experiments in the order above and have the checkpoint paths you can run the following to reproduce the predictions:
20231005123336_C3 predictions (run 1):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 1 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231005123336_C3 --infer_z_reversal -o outputs/20231005123336_C3
20230519215753 predictions (run 2):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 2 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20230519215753 --infer_z_reversal -o outputs/20230519215753
20231012184422_superseded_C2 predictions (run 3):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 3 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231012184422_superseded_C2 --infer_z_reversal -o outputs/20231012184422_superseded_C2
20231012184422_superseded_C3 predictions (run 4):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 4 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231012184422_superseded_C3 --infer_z_reversal -o outputs/20231012184422_superseded_C3
20231210121321 predictions (run 5):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 5 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231210121321 --infer_z_reversal -o outputs/20231210121321
20231022170901_C3 predictions (run 5):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 5 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231022170901_C3 --infer_z_reversal -o outputs/20231022170901_C3
20231106155351 predictions (run 5):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 5 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231106155351 --infer_z_reversal -o outputs/20231106155351
These will all be saved in outputs/<segment name>
.
To run predictions from run 6 (*3336 as validation with reduced dataset size), you'll need to first ensure any segment that was not included in training or validation is downloaded as such:
./scripts/download-scroll-surface-vols-by-segment.sh 1 20231221180251 20231007101619
You might need to generate sub-segments for *1619:
python3 vesuvius_challenge_rnd/scroll_ink_detection/evaluation/tools/create_all_subsegments.py --load_in_memory --skip_if_exists
Now, you are ready to run inference on the additional segments (saved to the same output directory).
20231221180251 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231221180251 --infer_z_reversal -o outputs/20231221180251
20231031143852 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231031143852 --infer_z_reversal -o outputs/20231031143852
20231005123336_C2 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231005123336_C2 --infer_z_reversal -o outputs/20231005123336_C2
20231022170901_C2 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231022170901_C2 --infer_z_reversal -o outputs/20231022170901_C2
20231022170901_C3 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231022170901_C3 --infer_z_reversal -o outputs/20231022170901_C3
20231022170901_C4 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231022170901_C4 --infer_z_reversal -o outputs/20231022170901_C4
20230929220926_C2 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20230929220926_C2 --infer_z_reversal -o outputs/20230929220926_C2
20230929220926_C3 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20230929220926_C3 --infer_z_reversal -o outputs/20230929220926_C3
20231007101619 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231007101619 --infer_z_reversal -o outputs/20231007101619
You can also run *1619 predictions as sub-segments if needed: 20231007101619_C1 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231007101619_C1 --infer_z_reversal -o outputs/20231007101619_C1
20231007101619_C2 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231007101619_C2 --infer_z_reversal -o outputs/20231007101619_C2
20231007101619_C3 predictions (run 6):
python3 -m vesuvius_challenge_rnd.scroll_ink_detection.evaluation.inference /workspace/lightning_logs/<Add v_num from run 6 here>/checkpoints/<ADD CHECKPOINT NAME HERE>.ckpt 20231007101619_C3 --infer_z_reversal -o outputs/20231007101619_C3