Hello everyone.
I am currently working on audio source separation using the DPRNNTasNet model, and I have run into an issue with the output when performing separation. The training loss (a PITLossWrapper around pairwise_neg_sisdr) reaches about -10 during training.
I am using the following setup:
Model architecture: DPRNNTasNet
Loss function: PITLossWrapper with pairwise_neg_sisdr (set up as in the sketch after this list)
Checkpoint: Trained model from checkpoint
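For reference, the loss is constructed along these lines; the pit_from="pw_mtx" mode is an assumption based on the standard Asteroid recipes:

from asteroid.losses import PITLossWrapper, pairwise_neg_sisdr

# Permutation-invariant negative SI-SDR: a training loss of -10 corresponds
# to roughly 10 dB SI-SDR on the training batches.
loss_func = PITLossWrapper(pairwise_neg_sisdr, pit_from="pw_mtx")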
After performing the separation, the two output signals (separated_1.wav and separated_2.wav) are nearly identical, so there may be an issue with how the separation is performed or how the sources are generated.
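One quick way to quantify that similarity (a sketch, assuming mono files at the paths written by the code below):

import numpy as np
import soundfile as sf

s1, _ = sf.read("./output/separated_1.wav")
s2, _ = sf.read("./output/separated_2.wav")
# Normalized correlation close to 1.0 means the model returns near-copies
corr = np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2) + 1e-8)
print(f"Correlation between the separated sources: {corr:.3f}")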
Would you be able to provide any insights or suggestions on why this might be happening? Perhaps there are specific parameters I need to adjust or additional steps I might have missed during the separation process?
Thanks
Code:
------------------------------------------------------------------------
import os
import torch
import soundfile as sf
from asteroid.models import DPRNNTasNet

def separate_and_save(input_wav, model_path, output_dir, use_gpu=True):
    """
    Separate the sources from a mixed audio file using a trained DPRNNTasNet model.

    Args:
        input_wav (str): Path to the mixed audio file.
        model_path (str): Path to the trained model checkpoint.
        output_dir (str): Directory to save the separated audio files.
        use_gpu (bool): Whether to use GPU for inference or not.
    """
    # Rebuild the model from the hyperparameters stored in the checkpoint
    checkpoint = torch.load(model_path, map_location='cpu')
    params = checkpoint['training_config']
    model = DPRNNTasNet(
        n_src=params['masknet']['n_src'],
        n_repeats=params['masknet']['n_repeats'],
        bn_chan=params['masknet']['bn_chan'],
        hid_size=params['masknet']['hid_size'],
        chunk_size=params['masknet']['chunk_size'],
        hop_size=params['masknet']['hop_size'],
        mask_act=params['masknet']['mask_act'],
        bidirectional=params['masknet']['bidirectional'],
        dropout=params['masknet']['dropout'],
        in_chan=params['masknet']['in_chan'],
        out_chan=params['masknet']['out_chan'],
        n_filters=params['filterbank']['n_filters'],
        kernel_size=params['filterbank']['kernel_size'],
        stride=params['filterbank']['stride'],
    )
    model.load_state_dict(checkpoint['state_dict'], strict=False)
    model.eval()
    device = torch.device("cuda" if use_gpu and torch.cuda.is_available() else "cpu")
    model.to(device)

    # Load the mixed audio file (assumed to be mono)
    print(f"Loading input audio from {input_wav}...")
    mix_wav, sample_rate = sf.read(input_wav)
    mix_tensor = torch.tensor(mix_wav, dtype=torch.float32).unsqueeze(0).to(device)

    # Perform the separation
    with torch.no_grad():
        print("Separating sources...")
        est_sources = model(mix_tensor)  # shape: (batch, n_src, n_samples)
        # est_sources = model.separate(input_wav)
    print("------------------------------------------------------------")
    print(est_sources)
    print("------------------------------------------------------------")

    # Save each separated source as a WAV file
    os.makedirs(output_dir, exist_ok=True)
    output_paths = [f"{output_dir}/separated_{i + 1}.wav" for i in range(est_sources.shape[1])]
    for i, separated_signal in enumerate(est_sources[0]):
        separated_signal_np = separated_signal.cpu().numpy()
        sf.write(output_paths[i], separated_signal_np, sample_rate)
        print(f"Separated audio saved as {output_paths[i]}")
    print("Separation completed!")
input_wav = "13.wav" # مثال: "input/test_mixed.wav"
output_dir = "./output" # مثال: "separated_output/"
model_path = "./exp/tmp/checkpoints/epoch=131-step=351252.ckpt"
separate_and_save(input_wav, model_path, output_dir)
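If the recipe also produced a serialized model file, Asteroid can rebuild the architecture and weights in one call, which sidesteps the manual config handling entirely. A sketch, assuming such a file exists (the best_model.pth path is hypothetical):

from asteroid.models import DPRNNTasNet

# from_pretrained reads the architecture and weights saved by model.serialize()
model = DPRNNTasNet.from_pretrained("./exp/tmp/best_model.pth")
# Given a file path, separate() writes the estimated sources next to the
# input file (e.g. 13_est1.wav, 13_est2.wav).
model.separate("13.wav")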