audio time != mel time #589

Open
mathpopo opened this issue Nov 22, 2023 · 1 comment
mathpopo commented Nov 22, 2023

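I use a 16 kHz mono WAV file that is 6.7025 s long. The mel has 167 frames, and 167 * 40 ms = 6.68 s.
Likewise, with a 6 s WAV file I get 147 mel frames, which is 5.58 s. Why?
The issue: the audio is still playing after the mel frames have run out.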

@xuxianren

    # inference.py:328
    mel = audio.melspectrogram(wav)
    print(mel.shape)  # (n_mels, n_mel_frames)

    if np.isnan(mel.reshape(-1)).sum() > 0:
        raise ValueError(
            "Mel contains nan! Using a TTS voice? Add a small epsilon noise to the wav file and try again"
        )

    # Cut one window of mel_step_size mel frames per video frame.
    mel_chunks = []
    mel_idx_multiplier = 80.0 / fps  # mel frames per video frame
    i = 0
    while True:
        start_idx = int(i * mel_idx_multiplier)
        if start_idx + mel_step_size > len(mel[0]):
            # Last window: take the final mel_step_size frames, then stop.
            mel_chunks.append(mel[:, len(mel[0]) - mel_step_size :])
            break
        mel_chunks.append(mel[:, start_idx : start_idx + mel_step_size])
        i += 1

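melspectrogram pads the signal, and the logic that builds mel_chunks ignores some mel windows at the end. Together these cause the time mismatch.
By my estimate the difference should be within 15/80 - 1/25 = 0.1475 s, so I don't quite understand why yours is so large.
You can debug the code above to analyze it.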
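
One way to check this without running the model is to count how many chunks the loop above would emit for a given audio length. The sketch below is only an approximation under stated assumptions: a 16 kHz sample rate, a hop size of 200 samples (which gives the 80 mel frames per second used in `mel_idx_multiplier`), `mel_step_size = 16`, `fps = 25`, and a centered STFT that produces `1 + n_samples // hop_size` frames. The helper names `n_mel_frames_from_samples` and `count_mel_chunks` are hypothetical and are not part of inference.py.

```python
def n_mel_frames_from_samples(n_samples, hop_size=200):
    """Assumed frame count of a centered (padded) STFT: 1 + n_samples // hop_size."""
    return 1 + n_samples // hop_size


def count_mel_chunks(n_mel_frames, fps=25.0, mel_step_size=16, mel_fps=80.0):
    """Count the chunks the while-loop in the snippet above would append."""
    multiplier = mel_fps / fps  # mel frames advanced per video frame
    i = 0
    # Every full window that still fits inside the mel is appended as-is.
    while int(i * multiplier) + mel_step_size <= n_mel_frames:
        i += 1
    # The loop then appends one final chunk (the last mel_step_size frames) and breaks.
    return i + 1


# Example: the 6.7025 s clip from the question, assuming 16 kHz audio.
sample_rate = 16000
duration_s = 6.7025
n_samples = round(duration_s * sample_rate)
n_frames = n_mel_frames_from_samples(n_samples)
n_chunks = count_mel_chunks(n_frames)
video_s = n_chunks / 25.0  # one chunk per video frame at 25 fps
print(n_frames, n_chunks, video_s, duration_s - video_s)
```

Under these assumptions the 6.7025 s clip gives 537 mel frames and 165 chunks, so about 6.6 s of video and a gap of roughly 0.10 s, which is inside the 0.1475 s estimate. If your numbers differ, comparing `mel.shape` and `len(mel_chunks)` against this count should show which assumption does not hold in your setup.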
