From 4c88ee14f9131df7aa008b6526f3593c3e14ce95 Mon Sep 17 00:00:00 2001
From: Dan Saattrup Nielsen <dan.nielsen@alexandra.dk>
Date: Thu, 24 Oct 2024 11:45:53 +0200
Subject: [PATCH] docs: Add ASR finetuning section to readme

---
 README.md | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 90916f4e..3c94907f 100644
--- a/README.md
+++ b/README.md
@@ -16,13 +16,68 @@ Developers:
 
 - Dan Saattrup Nielsen (dan.nielsen@alexandra.dk)
 
-## Quickstart
+## Installation
 
 1. Run `make install`, which installs Poetry (if it isn't already installed), sets up
    a virtual environment and installs all Python dependencies therein.
 2. Run `source .venv/bin/activate` to activate the virtual environment.
 3. Run `make` to see a list of available commands.
 
+
+## Usage
+
+### Finetuning an Acoustic Model for Automatic Speech Recognition (ASR)
+
+You can use the `finetune_asr_model` script to finetune your own ASR model:
+
+```bash
+python src/scripts/finetune_asr_model.py [key=value]...
+```
+
+Here are some of the more important available keys:
+
+- `model`: The base model to finetune. Supports the following values:
+  - `wav2vec2-small`
+  - `wav2vec2-medium`
+  - `wav2vec2-large`
+  - `whisper-xxsmall`
+  - `whisper-xsmall`
+  - `whisper-small`
+  - `whisper-medium`
+  - `whisper-large`
+  - `whisper-large-turbo`
+- `datasets`: The datasets to finetune the models on. Can be a single dataset or an
+  array of datasets (written like `[dataset1,dataset2,...]`). Supports the following
+  values:
+  - `coral`
+  - `common_voice_17`
+  - `common_voice_9`
+  - `fleurs`
+  - `ftspeech`
+  - `nota`
+  - `nst`
+- `dataset_probabilities`: In case you are finetuning on several datasets, you can
+  specify the probability of sampling each one. This is an array of probabilities
+  that must sum to 1. If not set, the datasets are sampled uniformly.
+- `model_id`: The model ID of the finetuned model. Defaults to the model type along
+  with a timestamp.
+- `push_to_hub`, `hub_organisation` and `private`: Whether to push the finetuned model
+  to the Hugging Face Hub, and if so, which organisation to push it to. If `private`
+  is set to `True`, the model will be private. The default is not to push the model
+  to the Hub.
+- `wandb`: Whether Weights & Biases should be used for monitoring during training.
+  Defaults to false.
+- `per_device_batch_size` and `dataloader_num_workers`: The batch size and number of
+  workers to use for training. Defaults to 8 and 4, respectively. Lower these if you
+  are running out of GPU memory.
+- `learning_rate`, `total_batch_size`, `max_steps` and `warmup_steps`: Training
+  parameters that you can tweak, although this shouldn't normally be needed.
+
+See all the finetuning options in the `config/asr_finetuning.yaml` file.
+
+
+## Troubleshooting
+
 If you're on MacOS and get an error saying something along the lines of "fatal
 error: 'lzma.h' file not found" then try the following and rerun `make install`
 afterwards:
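The options documented in the new Usage section above can be combined in a single invocation. As a hypothetical illustration (every value shown is made up for the example, not a default or recommendation of the script):

```bash
# Hypothetical example: finetune whisper-small on a mix of two datasets,
# sampling the first 80% of the time, with a smaller batch size to fit
# in limited GPU memory. All values are illustrative.
python src/scripts/finetune_asr_model.py \
    model=whisper-small \
    datasets=[coral,fleurs] \
    dataset_probabilities=[0.8,0.2] \
    per_device_batch_size=4 \
    wandb=false
```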