Worker that extracts audio from video (for further processing in other workers).
There are two ways in which the worker can be run: with Docker, or locally with Poetry.

To run with Docker:
- Check if Docker is installed
- Make sure you have the `.env.override` file in your local repo folder
- Open your preferred terminal and navigate to the local repository folder
- To build the image, execute the following command:
```
docker build . -t audio-extraction-worker
```
- To run the worker, execute the following command:
```
docker compose up
```

All commands should be run within WSL if on Windows, or within your terminal if on Linux.
To run locally:
- Follow the steps here (under "Adding `pyproject.toml` and generating a `poetry.lock` based on it") to install Poetry and the dependencies required to run the worker
- Make sure you have the `.env.override` file in your local repo folder
- Install `ffmpeg`. You can run this command, for example:
```
apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
```
- Navigate inside the `scripts` folder, then execute the following command:
```
./run.sh
```
The expected run of this worker downloads the input video file if it isn't already present in `/data/input/`, runs ffmpeg with the arguments specified in `.env.override`, and outputs an audio file in `/data/output/`. You can also configure the transfer of the output to an S3 bucket.
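As an illustration, the worker's variables could map onto an ffmpeg invocation roughly as follows. This is a sketch of the idea, not the worker's actual code: the function name and flag mapping are assumptions.

```python
def ffmpeg_args(input_path, output_path, samplerate_hz=0, mono=False):
    """Illustrative mapping of the worker's variables onto common ffmpeg flags.
    Hypothetical helper -- not taken from the worker's source."""
    args = ["ffmpeg", "-i", input_path, "-vn"]  # -vn drops the video stream
    if samplerate_hz:  # 0 means "keep the input's sampling rate"
        args += ["-ar", str(samplerate_hz)]  # -ar sets the audio sampling rate
    if mono:
        args += ["-ac", "1"]  # -ac 1 downmixes to a single channel
    args.append(output_path)
    return args
```

For example, a 16 kHz mono extraction would produce `ffmpeg -i in.mp4 -vn -ar 16000 -ac 1 out.wav`.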
If you want to test the input file download, we recommend deleting the `/data/input/` folder (NOT the `/data` folder).
The variables unique to this worker affect the output and are the following:
- `AE_SAMPLERATE_HZ`: The sampling rate of the resulting audio file. The default value is `0`, which means the sampling rate of the input video file will be used
- `AE_FILE_EXTENSION`: The file extension of the output audio file. The default is `wav`
- `AE_CONVERT_TO_MONO`: Whether the audio output should be converted to mono format. The default is `n` (no/False)
They can all be modified through the `.env.override` file, and the full list of variables can be found in `.env`.
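For example, a minimal `.env.override` overriding these defaults might look like this (the values below are illustrative, not recommendations):

```
AE_SAMPLERATE_HZ=16000
AE_FILE_EXTENSION=wav
AE_CONVERT_TO_MONO=y
```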
You can find an example video input file in `/data/input/` and the resulting audio output file in `/data/output/`.
The pipeline is as follows:
- `./run.sh` / `docker compose up` -> `main.py`
- `main.py` checks if the configuration is correct and, if so, runs the pipeline
- `main.py` -> `run_pipeline.py`
- `run_pipeline.py` makes sure each step of the pipeline is executed successfully:
  - Downloading the input file if it's not present -> `download.py`
  - Running the audio extraction of the input -> `transcode.py`
  - Transferring the output to S3 if configured
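The flow above can be sketched roughly as follows. The function names mirror the modules listed, but the bodies are stubs and the exact signatures are assumptions, not the worker's real API:

```python
import os

# Hypothetical sketch of the orchestration in run_pipeline.py; the real
# worker's signatures and return values may differ.

def download(input_dir: str, filename: str) -> str:
    """Return the input path, fetching the file first if it is missing (stubbed)."""
    path = os.path.join(input_dir, filename)
    if not os.path.exists(path):
        pass  # the real worker would download the input file here
    return path

def transcode(input_path: str, output_dir: str, extension: str) -> str:
    """Compute the output path; the real step would invoke ffmpeg here (stubbed)."""
    base = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(output_dir, f"{base}.{extension}")

def run_pipeline(input_dir: str, output_dir: str, filename: str) -> str:
    # Read the worker variables from the environment, using the documented default
    extension = os.environ.get("AE_FILE_EXTENSION", "wav")
    input_path = download(input_dir, filename)
    output_path = transcode(input_path, output_dir, extension)
    # A final step would transfer output_path to S3 if configured
    return output_path
```

Each step runs only if the previous one succeeded, which matches the sequential flow described above.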