audio time != mel time #589

mathpopo · 2023-11-22T08:00:25Z

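I am using a 16 kHz mono WAV file that is 6.7025 s long. The mel output has 167 frames, and 167 × 40 ms = 6.68 s.
Likewise, with a 6 s WAV file I get 147 mel frames, which I count as 5.58 s. Why is the mel shorter than the audio?
Issue: the audio is still playing at the end, but there is no mel for that part.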

xuxianren · 2024-01-18T09:22:58Z

    # inference.py:328
    mel = audio.melspectrogram(wav)
    print(mel.shape)

    if np.isnan(mel.reshape(-1)).sum() > 0:
        raise ValueError(
            "Mel contains nan! Using a TTS voice? Add a small epsilon noise to the wav file and try again"
        )

    mel_chunks = []
    mel_idx_multiplier = 80.0 / fps
    i = 0
    while 1:
        start_idx = int(i * mel_idx_multiplier)
        if start_idx + mel_step_size > len(mel[0]):
            mel_chunks.append(mel[:, len(mel[0]) - mel_step_size :])
            break
        mel_chunks.append(mel[:, start_idx : start_idx + mel_step_size])
        i += 1

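`melspectrogram` pads the signal, and the logic that builds `mel_chunks` ignores some mel windows at the end. Together these cause the time mismatch.
By my estimate the difference should be within 15/80 - 1/25 = 0.1475 s, so I don't quite understand why yours is so large.
You can debug the code above to analyze it.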
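As a starting point for that debugging, here is a minimal sketch that replays the chunking loop on a frame count alone, so the covered duration can be compared with the audio length. The defaults are assumptions, not values confirmed in this thread: `mel_step_size=16`, 80 mel frames per second (a hop of 200 samples at 16 kHz), and a centered STFT that yields `1 + n_samples // hop_size` frames. `chunk_starts`, `covered_seconds`, and `estimated_mel_frames` are hypothetical helper names, not part of the repository.

```python
def chunk_starts(n_mel_frames, fps=25.0, mel_step_size=16, mel_fps=80.0):
    """Replay the mel_chunks loop and return the start index of every chunk.

    Assumes n_mel_frames >= mel_step_size; the defaults are assumptions.
    """
    starts = []
    mel_idx_multiplier = mel_fps / fps
    i = 0
    while True:
        start_idx = int(i * mel_idx_multiplier)
        if start_idx + mel_step_size > n_mel_frames:
            # Last chunk is taken from the tail of the mel, then the loop stops.
            starts.append(n_mel_frames - mel_step_size)
            break
        starts.append(start_idx)
        i += 1
    return starts


def covered_seconds(n_mel_frames, fps=25.0, mel_step_size=16, mel_fps=80.0):
    """Duration represented by the chunks: one chunk per video frame."""
    n_chunks = len(chunk_starts(n_mel_frames, fps, mel_step_size, mel_fps))
    return n_chunks / fps


def estimated_mel_frames(n_samples, hop_size=200):
    """Frame count of a centered STFT (assumption: one frame per hop, plus one)."""
    return 1 + n_samples // hop_size


if __name__ == "__main__":
    n_samples = round(6.7025 * 16000)
    n_frames = estimated_mel_frames(n_samples)
    print(n_frames, len(chunk_starts(n_frames)), covered_seconds(n_frames))
```

If the chunk count printed here differs from what `inference.py` reports for the same file, the `fps` read from the video or the actual `mel.shape` is probably not what these defaults assume, and that is the first thing to check.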
