Worker that extracts audio from video (for further processing in other workers).
There are two ways in which the worker can be run: with Docker, or locally with Poetry.

To run with Docker:
- Check if Docker is installed
- Make sure you have the `.env.override` file in your local repo folder
- Open your preferred terminal and navigate to the local repository folder
- To build the image, execute the following command:
```
docker build . -t audio-extraction-worker
```
- To run the worker, execute the following command:
```
docker compose up
```

All commands should be run within WSL if on Windows, or within your terminal if on Linux.
To run locally:
- Follow the steps here (under "Adding `pyproject.toml` and generating a `poetry.lock` based on it") to install Poetry and the dependencies required to run the worker
- Make sure you have the `.env.override` file in your local repo folder
- Install `ffmpeg`. You can run this command, for example:
```
apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
```
- Navigate inside the `scripts` folder, then execute the following command:
```
./run.sh
```
The expected run of this worker downloads the input video file if it isn't already present in `/data/input/`, runs ffmpeg with the arguments specified in `.env.override`, and outputs an audio file in `/data/output/`. You can also configure the transfer of the output to an S3 bucket.
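As an illustration, the worker's variables could map onto an ffmpeg invocation roughly as follows. This is a sketch of the idea, not the worker's actual code: the function name and flag mapping are assumptions.

```python
def ffmpeg_args(input_path, output_path, samplerate_hz=0, mono=False):
    """Illustrative mapping of the worker's variables onto common ffmpeg flags.
    Hypothetical helper -- not taken from the worker's source."""
    args = ["ffmpeg", "-i", input_path, "-vn"]  # -vn drops the video stream
    if samplerate_hz:  # 0 means "keep the input's sampling rate"
        args += ["-ar", str(samplerate_hz)]  # -ar sets the audio sampling rate
    if mono:
        args += ["-ac", "1"]  # -ac 1 downmixes to a single channel
    args.append(output_path)
    return args
```

For example, a 16 kHz mono extraction would produce `ffmpeg -i in.mp4 -vn -ar 16000 -ac 1 out.wav`.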
If you want to test the input file download, we recommend deleting the `/data/input/` folder (NOT the `/data` folder).
The variables unique to this worker affect the output and are the following:
- `AE_SAMPLERATE_HZ`: The sampling rate of the resulting audio file. The default value is `0`, which means the sampling rate of the input video file will be used
- `AE_FILE_EXTENSION`: The file extension of the output audio file. The default is `wav`
- `AE_CONVERT_TO_MONO`: Whether the audio output should be converted to mono format. The default is `n` (no/False)
They can all be modified through the `.env.override` file, and the full list of variables can be found in `.env`.
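For example, a minimal `.env.override` overriding these defaults might look like this (the values below are illustrative, not recommendations):

```
AE_SAMPLERATE_HZ=16000
AE_FILE_EXTENSION=wav
AE_CONVERT_TO_MONO=y
```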
You can find an example video input file in `/data/input/` and the resulting audio output file in `/data/output/`.
The pipeline is as follows:
- `./run.sh` / `docker compose up` -> `main.py`
- `main.py` checks if the configuration is correct and, if so, runs the pipeline
- `main.py` -> `run_pipeline.py`
- `run_pipeline.py` makes sure each step of the pipeline is executed successfully:
  - Downloading the input file if it's not present -> `download.py`
  - Running the audio extraction of the input -> `transcode.py`
  - Transferring the output to S3 if configured
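The flow above can be sketched roughly as follows. The function names mirror the modules listed, but the bodies are stubs and the exact signatures are assumptions, not the worker's real API:

```python
import os

# Hypothetical sketch of the orchestration in run_pipeline.py; the real
# worker's signatures and return values may differ.

def download(input_dir: str, filename: str) -> str:
    """Return the input path, fetching the file first if it is missing (stubbed)."""
    path = os.path.join(input_dir, filename)
    if not os.path.exists(path):
        pass  # the real worker would download the input file here
    return path

def transcode(input_path: str, output_dir: str, extension: str) -> str:
    """Compute the output path; the real step would invoke ffmpeg here (stubbed)."""
    base = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(output_dir, f"{base}.{extension}")

def run_pipeline(input_dir: str, output_dir: str, filename: str) -> str:
    # Read the worker variables from the environment, using the documented default
    extension = os.environ.get("AE_FILE_EXTENSION", "wav")
    input_path = download(input_dir, filename)
    output_path = transcode(input_path, output_dir, extension)
    # A final step would transfer output_path to S3 if configured
    return output_path
```

Each step runs only if the previous one succeeded, which matches the sequential flow described above.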