
Audio Source Separation Issue in DPRNN-TasNet Model #708

Closed

drAliMollaei opened this issue Feb 21, 2025 · 1 comment
Labels
question Further information is requested

Comments

drAliMollaei commented Feb 21, 2025

Hello everyone.
I am currently working on audio source separation using the DPRNNTasNet model, and I have encountered an issue with the output when performing separation. The training loss (PITLossWrapper with pairwise_neg_sisdr) reaches a value of -10 during training, i.e. roughly +10 dB SI-SDR since the loss is the negated SI-SDR, so training itself appears healthy.

I am using the following setup:

Model architecture: DPRNNTasNet
Loss function: PITLossWrapper with pairwise_neg_sisdr (see the sketch after this list)
Checkpoint: Trained model from checkpoint
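
For reference, the loss above is typically built with asteroid's standard API; a minimal sketch (the actual training script is not part of this issue):

from asteroid.losses import PITLossWrapper, pairwise_neg_sisdr

# Permutation-invariant training: the wrapper computes the pairwise
# negative SI-SDR matrix and picks the estimate/source assignment
# that minimizes the loss.
loss_func = PITLossWrapper(pairwise_neg_sisdr, pit_from="pw_mtx")
# loss = loss_func(est_sources, targets)  # both shaped (batch, n_src, time)
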
After performing the separation, both separated signals (separated_1.wav and separated_2.wav) are nearly identical, so there may be an issue with how the separation is being performed or how the sources are being generated.
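
One quick, illustrative way to quantify "similar" (not part of the original report; the paths assume the output directory used in the code below):

import numpy as np
import soundfile as sf

s1, _ = sf.read("./output/separated_1.wav")
s2, _ = sf.read("./output/separated_2.wav")

# A Pearson correlation near 1.0 means the model emitted essentially
# the same waveform on both output channels.
print(np.corrcoef(s1, s2)[0, 1])
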

Would you be able to provide any insights or suggestions on why this might be happening? Are there specific parameters I need to adjust, or additional steps I might have missed during the separation process?

Thanks

Code:
import torch
import soundfile as sf
from asteroid.models import DPRNNTasNet


def separate_and_save(input_wav, model_path, output_dir, use_gpu=True):
    """
    Separate the sources from a mixed audio file using a trained DPRNNTasNet model.

    Args:
        input_wav (str): Path to the mixed audio file.
        model_path (str): Path to the trained model checkpoint.
        output_dir (str): Directory to save the separated audio files.
        use_gpu (bool): Whether to use GPU for inference or not.
    """
    # Rebuild the architecture from the hyperparameters stored in the checkpoint.
    checkpoint = torch.load(model_path, map_location='cpu')
    params = checkpoint['training_config']

    model = DPRNNTasNet(
        n_src=params['masknet']['n_src'],
        n_repeats=params['masknet']['n_repeats'],
        bn_chan=params['masknet']['bn_chan'],
        hid_size=params['masknet']['hid_size'],
        chunk_size=params['masknet']['chunk_size'],
        hop_size=params['masknet']['hop_size'],
        mask_act=params['masknet']['mask_act'],
        bidirectional=params['masknet']['bidirectional'],
        dropout=params['masknet']['dropout'],
        in_chan=params['masknet']['in_chan'],
        out_chan=params['masknet']['out_chan'],
        n_filters=params['filterbank']['n_filters'],
        kernel_size=params['filterbank']['kernel_size'],
        stride=params['filterbank']['stride'],
    )
    model.load_state_dict(checkpoint['state_dict'], strict=False)
    model.eval()

    device = torch.device("cuda" if use_gpu and torch.cuda.is_available() else "cpu")
    model.to(device)

    # Load the mixed audio file.
    print(f"Loading input audio from {input_wav}...")
    mix_wav, sample_rate = sf.read(input_wav)
    mix_tensor = torch.tensor(mix_wav, dtype=torch.float32).unsqueeze(0).to(device)

    # Perform the separation; est_sources has shape (batch, n_src, time).
    with torch.no_grad():
        print("Separating sources...")
        est_sources = model(mix_tensor)
        # est_sources = model.separate(input_wav)
        print(est_sources)

    # Save the separated sources as WAV files.
    output_paths = [f"{output_dir}/separated_{i + 1}.wav" for i in range(est_sources.shape[1])]
    for i, separated_signal in enumerate(est_sources[0]):
        separated_signal_np = separated_signal.cpu().numpy()
        sf.write(output_paths[i], separated_signal_np, sample_rate)
        print(f"Separated audio saved as {output_paths[i]}")

    print("Separation completed!")


input_wav = "13.wav"       # Example: "input/test_mixed.wav"
output_dir = "./output"    # Example: "separated_output/"
model_path = "./exp/tmp/checkpoints/epoch=131-step=351252.ckpt"
separate_and_save(input_wav, model_path, output_dir)
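
One sanity check worth adding inside the function above (illustrative, using the checkpoint and model names from that code): with strict=False, load_state_dict() silently skips any keys that do not match, so it pays to inspect what was actually loaded.

# load_state_dict returns (missing_keys, unexpected_keys).
missing, unexpected = model.load_state_dict(checkpoint['state_dict'], strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
# If every checkpoint key shows up as "unexpected", no trained weights
# were loaded and the model is still randomly initialized.
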

@drAliMollaei drAliMollaei added the question Further information is requested label Feb 21, 2025
@drAliMollaei drAliMollaei changed the title Request for Help with Audio Source Separation Issue in DPRNN-TasNet Model Audio Source Separation Issue in DPRNN-TasNet Model Feb 21, 2025
drAliMollaei (Author) commented
My problem is solved: in the above code, the trained weights of the model were not actually being loaded; they need to be loaded as well.
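
For anyone hitting the same symptom, a minimal sketch of what actually loading the weights can look like, assuming the checkpoint was written by PyTorch Lightning. Lightning typically prefixes each parameter name with the attribute name of the wrapped network (often "model."), so load_state_dict(..., strict=False) matches nothing and silently leaves the model randomly initialized:

checkpoint = torch.load(model_path, map_location='cpu')

# Strip the assumed "model." prefix so the keys line up with DPRNNTasNet's
# own parameter names, then load strictly so any remaining mismatch raises.
state_dict = {
    (k[len("model."):] if k.startswith("model.") else k): v
    for k, v in checkpoint['state_dict'].items()
}
model.load_state_dict(state_dict, strict=True)
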
