I use a 16 kHz mono WAV that is 6.7025 s long and get 167 mel frames: 167 × 40 ms = 6.68 s.
Likewise, with a 6 s WAV I get 147 mel frames, i.e. 5.58 s. Why?
Issue: the audio is still playing, but the mel has no corresponding frames.
    # inference.py:328
    mel = audio.melspectrogram(wav)
    print(mel.shape)

    if np.isnan(mel.reshape(-1)).sum() > 0:
        raise ValueError(
            "Mel contains nan! Using a TTS voice? Add a small epsilon noise to the wav file and try again"
        )

    mel_chunks = []
    # Number of mel frames advanced per video frame (80 mel frames per second of audio).
    mel_idx_multiplier = 80.0 / fps
    i = 0
    while 1:
        start_idx = int(i * mel_idx_multiplier)
        if start_idx + mel_step_size > len(mel[0]):
            # Not enough frames left for a full window: take the last window and stop.
            mel_chunks.append(mel[:, len(mel[0]) - mel_step_size :])
            break
        mel_chunks.append(mel[:, start_idx : start_idx + mel_step_size])
        i += 1
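melspectrogram pads its input, and the logic that builds mel_chunks drops some mel windows at the end; together these cause the duration mismatch. By my estimate the difference should stay within 15/80 - 1/25 = 0.1475 s, so I don't quite understand why yours is so large. You can debug the code above to analyze it.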
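To make that estimate easier to check, here is a minimal sketch that replays only the index arithmetic of the loop above, without loading any audio. The function name `count_mel_chunks` and its defaults are hypothetical: `fps=25` comes from the 40 ms per frame in the question, `mel_rate=80` from `80.0 / fps` in the snippet, and `mel_step_size=16` is an assumption because its value is not shown here. The real frame count also depends on how `melspectrogram` pads, which this sketch does not model.

```python
def count_mel_chunks(n_mel_frames, fps=25.0, mel_step_size=16, mel_rate=80.0):
    """Count the chunks the loop in inference.py would emit.

    n_mel_frames plays the role of len(mel[0]).
    mel_step_size=16 is an assumed value, not taken from the snippet.
    """
    mel_idx_multiplier = mel_rate / fps
    n_chunks = 0
    i = 0
    while True:
        start_idx = int(i * mel_idx_multiplier)
        if start_idx + mel_step_size > n_mel_frames:
            n_chunks += 1  # the final chunk, taken from the tail of the mel
            break
        n_chunks += 1
        i += 1
    return n_chunks


def covered_seconds(n_chunks, fps=25.0):
    """Duration represented by the chunks, at one chunk per video frame."""
    return n_chunks / fps


if __name__ == "__main__":
    # Assumes a 6 s clip yields exactly 6 * 80 = 480 mel frames (no padding).
    n = count_mel_chunks(480)
    print(n, covered_seconds(n))
```

Under those assumptions a 6 s clip produces 147 chunks, which at 40 ms each is 5.88 s, so the shortfall is 0.12 s and stays inside the 0.1475 s bound. Printing `mel.shape` for your file and passing `mel.shape[1]` to this function should show whether your chunk count follows the same arithmetic.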