Warning
This project is a work in progress. Do not believe any statements in this README. Do not use the code either for now.
Demucs is a state-of-the-art music source separation model, currently capable of separating drums, bass, and vocals from the rest of the accompaniment. Samples are available online for both Hybrid Demucs and Hybrid Transformer Demucs. Check out the paper for more information.
Before installing Demucs, you need:
- Python 3.9 or later
- FFmpeg (required for audio processing):
  - macOS:
    brew install ffmpeg
  - Ubuntu/Debian:
    sudo apt-get install ffmpeg
  - Windows: Download from FFmpeg.org
If you want to use Demucs as a command-line tool to separate audio tracks:
UV is a fast, modern Python package manager with isolated environments:
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows PowerShell
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
- UV Tool Install (Recommended) - Creates an isolated environment:
  uv tool install demucs-inference
- Run Without Installing - For one-time or occasional use:
  uvx --with demucs-inference demucs audio_file.mp3
- Standard pip Install:
  pip install demucs-inference
If you want to use Demucs as a library in your Python applications:
# Using UV (recommended)
uv pip install demucs-inference
# or
uv add demucs-inference
# Using standard pip
pip install demucs-inference
To upgrade an existing installation:
# If installed with uv tool
uv tool upgrade demucs-inference
# If installed with standard uv
uv pip install -U demucs-inference
# or
uv add -U demucs-inference
# If installed with pip
pip install -U demucs-inference
After installing Demucs, you can use it as follows:
# View all options
demucs --help
# Separate one audio file
demucs audio_file.mp3
# Separate multiple audio files
demucs audio_file_1.mp3 audio_file_2.mp3
# Separate all audio files in the current directory
demucs *.mp3
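The last command relies on the shell to expand the wildcard into a list of files. When the CLI is driven from a script, or from a shell that does not expand wildcards, the file list has to be built explicitly. The following is a minimal sketch of that step; build_demucs_command is a hypothetical helper name (not part of Demucs), and it assumes the demucs executable is on PATH when the command is eventually run.

```python
from pathlib import Path


def build_demucs_command(directory, pattern="*.mp3"):
    """Return a demucs argument list covering every matching file.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    # Sort so the processing order is stable across platforms.
    files = sorted(str(p) for p in Path(directory).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory}")
    return ["demucs", *files]


# Usage (assumes the demucs executable is on PATH):
#   import subprocess
#   subprocess.run(build_demucs_command("."), check=True)
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces.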
You can download and cache models ahead of time for offline use:
# Download the default model (htdemucs)
demucs download
# Download a specific model
demucs download mdx
# Download multiple models
demucs download mdx htdemucs_ft htdemucs_6s
# List all available models
demucs download --list-models
# Download all available models
demucs download --all
This is useful when:
- You want to prepare for offline use
- You want to avoid downloading during audio processing
- You're running batch jobs and want to ensure models are ready
If you want to use GPU acceleration:
- Minimum requirement: 3GB of GPU RAM (default settings need about 7GB)
- For devices with limited memory:
  - Use --segment SEGMENT to reduce split length (set to integer seconds)
  - For 3GB GPU memory, try SEGMENT=8 (quality may be affected by smaller values)
  - Hybrid Transformer models only support a maximum segment length of 7.8 seconds
  - Set environment variable PYTORCH_NO_CUDA_MEMORY_CACHING=1 to further reduce usage
  - For very limited memory (2GB or less), use -d cpu to run on CPU instead
- Processing time on CPU is roughly 1.5× the duration of the track
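The memory guidance above can be turned into a small decision helper. This is only a sketch: suggest_flags and estimate_cpu_seconds are hypothetical names, the thresholds are the figures quoted in this section, and the choice of 7 seconds for Hybrid Transformer models is an assumption derived from the 7.8 second limit combined with the integer-seconds requirement.

```python
def suggest_flags(gpu_memory_gb, hybrid_transformer=True):
    """Map available GPU memory (in GB) to extra demucs CLI flags.

    The thresholds come from the guidance in this section; the mapping
    itself is an illustrative assumption, not Demucs behaviour.
    """
    if gpu_memory_gb <= 2:
        return ["-d", "cpu"]  # too little GPU memory: run on CPU
    if gpu_memory_gb >= 7:
        return []  # default settings should fit
    # In between: shorten the segment. Hybrid Transformer models cap at
    # 7.8 s and the flag takes integer seconds, so 7 is assumed there.
    segment = 7 if hybrid_transformer else 8
    return ["--segment", str(segment)]


def estimate_cpu_seconds(track_seconds, factor=1.5):
    """Rough CPU processing time using the 1.5x rule of thumb."""
    return track_seconds * factor
```

For example, a 200-second track would take roughly 300 seconds on CPU under this rule of thumb.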
Separated tracks are stored in the separated/MODEL_NAME/TRACK_NAME folder. There you will find four stereo wav files sampled at 44.1 kHz: drums.wav, bass.wav, other.wav, vocals.wav (or .mp3 if you used the --mp3 option).
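A script that post-processes the stems needs to know where they will land. The sketch below computes the expected paths from the layout described above. expected_outputs is a hypothetical helper, and it assumes that TRACK_NAME is the input file name without its extension and that the default four-stem model is used.

```python
from pathlib import Path

DEFAULT_STEMS = ("drums", "bass", "other", "vocals")


def expected_outputs(track, model="htdemucs", mp3=False, root="separated"):
    """Return {stem: path} for one separated track.

    Assumes TRACK_NAME is the input file name without its extension.
    """
    ext = ".mp3" if mp3 else ".wav"
    folder = Path(root) / model / Path(track).stem
    return {stem: folder / f"{stem}{ext}" for stem in DEFAULT_STEMS}


# Usage: check that separation produced every stem.
#   missing = [p for p in expected_outputs("audio_file.mp3").values()
#              if not p.exists()]
```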
All audio formats supported by torchaudio can be processed (e.g. wav, mp3, flac, and ogg/vorbis on Linux/macOS). On Windows, torchaudio has limited support, so we rely on ffmpeg, which should support pretty much anything.
Audio is resampled on the fly if necessary.
- Default: WAV files encoded as int16
- --float32: Save as float32 WAV files
- --int24: Save as 24-bit integer WAV files
- --mp3: Save as MP3 files
- --mp3-bitrate: Set MP3 bitrate in kbps (default is 320)
Demucs will automatically rescale each output stem to avoid clipping, which may affect relative volume between stems. Options:
- --clip-mode clamp: Use hard clipping if you prefer preserving relative volumes
- Alternatively, try reducing the volume of the input mixture before processing
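To see why the two strategies differ, consider a stem whose peak exceeds full scale. The following is a simplified illustration on plain Python lists, not the code Demucs actually runs: rescale and clamp are hypothetical helpers, and the exact thresholds Demucs uses may differ.

```python
def rescale(stem):
    """Scale the whole stem down so that its peak fits in [-1, 1]."""
    peak = max((abs(x) for x in stem), default=0.0)
    return [x / peak for x in stem] if peak > 1.0 else list(stem)


def clamp(stem):
    """Hard-clip each sample to [-1, 1]; in-range samples are untouched."""
    return [max(-1.0, min(1.0, x)) for x in stem]


loud_stem = [0.5, 2.0, -1.0]
rescaled = rescale(loud_stem)  # every sample is halved
clamped = clamp(loud_stem)     # only the out-of-range sample changes
```

Rescaling lowers every sample of the affected stem, so that stem becomes quieter relative to the others. Clamping leaves in-range samples alone, which preserves relative volume at the cost of distorting the peaks.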
Select pre-trained models with the -n flag:
- htdemucs: First version of Hybrid Transformer Demucs (default). Trained on MusDB + 800 songs.
- htdemucs_ft: Fine-tuned version of htdemucs. Better quality but 4× slower.
- htdemucs_6s: 6-source version adding piano and guitar (piano performance is limited).
- hdemucs_mmi: Hybrid Demucs v3, retrained on MusDB + 800 songs.
- mdx: Trained only on MusDB HQ. Winner on track A at the MDX challenge.
- mdx_extra: Trained with extra data (including MusDB test set). Ranked 2nd on track B.
- mdx_q, mdx_extra_q: Quantized versions. Smaller size but slightly lower quality.
- --two-stems=vocals: Separate vocals from accompaniment (karaoke mode). Replace "vocals" with any source.
- --shifts=SHIFTS: Perform multiple predictions with random shifts and average them. Makes processing SHIFTS times slower (GPU recommended).
- --overlap: Control overlap between prediction windows (default: 0.25). Can be reduced to 0.1 for faster processing.
- -j N: Specify number of parallel jobs (e.g., -j 2). Multiplies RAM usage by the same amount.
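The model and separation options above can be combined in a single invocation. As a minimal sketch, the hypothetical helper below (separation_command is not part of Demucs) assembles an argument list using only the flags documented in this README, leaving out any option that is not set so that the Demucs defaults apply.

```python
def separation_command(files, model=None, two_stems=None, shifts=None,
                       overlap=None, jobs=None):
    """Build a demucs argument list from the options described above."""
    cmd = ["demucs"]
    if model is not None:
        cmd += ["-n", model]
    if two_stems is not None:
        cmd.append(f"--two-stems={two_stems}")
    if shifts is not None:
        cmd.append(f"--shifts={shifts}")
    if overlap is not None:
        cmd += ["--overlap", str(overlap)]
    if jobs is not None:
        cmd += ["-j", str(jobs)]
    return cmd + list(files)


# Karaoke-style separation with the fine-tuned model and a smaller overlap.
karaoke = separation_command(["song.mp3"], model="htdemucs_ft",
                             two_stems="vocals", overlap=0.1)
```

The resulting list can be handed to subprocess.run once the demucs executable is installed.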
Demucs provides an API that can be used to separate audio files programmatically.
from demucs.api import Separator, save_audio
# Initialize the separator with desired model and parameters
separator = Separator(model="htdemucs", device="cuda")
# Separate an audio file
original, separated_stems = separator.separate_audio_file("audio_file.mp3")
# Save the separated stems
for stem_name, stem_audio in separated_stems.items():
    save_audio(stem_audio, f"{stem_name}.wav", samplerate=separator.samplerate)
View the API docs for more information on the Separator class.
@inproceedings{rouard2022hybrid,
title={Hybrid Transformers for Music Source Separation},
author={Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
booktitle={ICASSP 23},
year={2023}
}
@inproceedings{defossez2021hybrid,
title={Hybrid Spectrogram and Waveform Source Separation},
author={D{\'e}fossez, Alexandre},
booktitle={Proceedings of the ISMIR 2021 Workshop on Music Source Separation},
year={2021}
}
Demucs is released under the MIT license as found in the LICENSE file.