Thank you for open-sourcing the Conan project!
I’ve been trying to reproduce the results by following the paper and the released code, but I still can’t reach demo-level quality. Specifically:
Emformer accuracy: 63%
Generated waveform: The output from the main Conan model sounds noticeably degraded (training loss curve attached).
It seems that the only component missing from the repository is the data preprocessing stage, particularly the HuBERT token extraction, so I suspect that my HuBERT tokens might be the issue.
Would you consider releasing the preprocessing scripts or providing a minimal example for HuBERT token generation?
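To make the request concrete, here is a minimal sketch of the quantization step as it is commonly done for HuBERT tokens: each frame-level feature is mapped to the index of its nearest k-means centroid. This is a generic illustration, not Conan's actual pipeline. The function name `assign_tokens`, the feature dimension, and the codebook size are all assumptions, and random arrays stand in for real HuBERT features and a trained codebook. The settings I would most like confirmed are the checkpoint, the layer the features are taken from, and the number of clusters.

```python
import numpy as np


def assign_tokens(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map frame features of shape (T, D) to nearest-centroid indices of shape (T,).

    `centroids` has shape (K, D) and would normally come from a k-means model
    fitted on HuBERT hidden states (hypothetical here).
    """
    # Squared Euclidean distance expanded as ||x||^2 - 2 x.c + ||c||^2,
    # which yields a (T, K) distance matrix without an explicit loop.
    dists = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return dists.argmin(axis=1)


# Toy stand-ins: 8 centroids of dimension 16 (both values are assumptions).
rng = np.random.default_rng(0)
centroids = rng.normal(size=(8, 16))

# Four "frames" placed very close to centroids 3, 3, 5 and 0.
features = centroids[[3, 3, 5, 0]] + 0.01 * rng.normal(size=(4, 16))

tokens = assign_tokens(features, centroids)
print(tokens.tolist())
```

If the released checkpoints expect a different feature layer or codebook than the one I used, the token IDs would not match even though the code runs without errors, which could explain the degraded output.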
Any guidance or clarification would be greatly appreciated!