
[Bug] xtts voice generation is better in 0.24.3 than in 0.25 and above #228

Closed
C00reNUT opened this issue Dec 21, 2024 · 2 comments
Labels
bug (Something isn't working) · question (Further information is requested) · XTTS

Comments

C00reNUT commented Dec 21, 2024

Describe the bug

Hello, thank you for maintaining this library. This is probably related to #198: when I use version 0.24.3 for inference of the XTTS model, I get much better results than with 0.25.1, so there must still be a bug somewhere in the inference. I didn't try exactly the same generation with the same seed, but the quality difference is obvious.

I would provide some samples, but I am using a Czech fine-tuned model, and you couldn't really hear the difference unless you are a native speaker...

To Reproduce

import torch
import torchaudio

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

print("Loading model...")
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=True)
model.cuda()

print("Computing speaker latents...")
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=["reference.wav"])

print("Inference...")
out = model.inference(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
    temperature=0.7,  # Add custom parameters here
)
torchaudio.save("xtts.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)

Expected behavior

The outputs should be similar when using the same parameters, accounting for the inherent sampling variability of generation.

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3090"
        ],
        "available": true,
        "version": "12.4"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.5.1",
        "TTS": "0.24.3",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.11.9",
        "version": "#49-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov  4 02:06:24 UTC 2024"
    }
}

Additional context

No response

C00reNUT added the bug (Something isn't working) label on Dec 21, 2024
@eginhard
Member

I just double-checked for some examples that 0.25.1 and 0.24.3 (and other previous versions) produce exactly the same output when fixing the seed. If you could share some samples and/or test with a fixed seed that would be helpful.
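For anyone wanting to run the same comparison, a minimal sketch of what "fixing the seed" could look like before each inference call — the `set_seed` helper below is hypothetical (not part of the TTS API) and simply seeds the RNGs that sampling may draw from, so identical calls under different TTS versions produce comparable output:

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed every RNG that autoregressive sampling may use."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines


set_seed(42)
# ...then run the identical model.inference(...) call under each TTS version
# and compare the resulting waveforms sample-for-sample.
```

Calling `set_seed` immediately before each `model.inference(...)` invocation, with the same value, is what makes outputs from 0.24.3 and 0.25.1 directly comparable.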

eginhard added the question (Further information is requested) and XTTS labels on Dec 22, 2024
@C00reNUT
Author

I tried it once again, and it turned out I was using the wrong config for the selected model version. Sorry about the confusion, and happy new year!
