Preprocessing

Thank you for open-sourcing the Conan project!
I’ve been trying to reproduce the results by following the paper and the released code, but I still can’t reach demo-level quality. Specifically:

Emformer accuracy: 63%

Generated waveform: The output from the main Conan model sounds noticeably degraded. The training loss curve for that run is shown here:

<img width="2102" height="602" alt="Image" src="https://github.com/user-attachments/assets/66322bc5-c35b-44ef-82e9-73b9426f2c38" />

It seems that the only missing component from the repository is the data preprocessing stage (particularly the HuBERT token extraction). I suspect that my HuBERT tokens might be the issue.
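
For concreteness, HuBERT tokens are commonly obtained by assigning each frame-level HuBERT feature vector to its nearest k-means centroid. The sketch below covers only that assignment step, assuming the features and centroids have already been computed. The function name, the array shapes, and the use of Euclidean distance are illustrative assumptions, not necessarily what Conan uses; the HuBERT checkpoint, feature layer, and codebook size are exactly the details that would need confirming.

```python
import numpy as np


def assign_tokens(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each frame feature to the index of its nearest centroid.

    features:  (T, D) frame-level features (hypothetical HuBERT layer output)
    centroids: (K, D) k-means codebook (hypothetical, fitted separately)
    returns:   (T,) integer token ids in [0, K)
    """
    # Squared Euclidean distance, expanded as ||x||^2 - 2 x.c + ||c||^2,
    # which broadcasts to a (T, K) distance matrix.
    dists = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return dists.argmin(axis=1)


# Toy usage with 2-D features and a 2-entry codebook.
toy_features = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 1.2]])
toy_centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
toy_tokens = assign_tokens(toy_features, toy_centroids)
```

If the real pipeline differs from this in any of those choices (for example a different layer or a different number of clusters), the resulting token ids would not match the ones the released checkpoints expect, which could explain the degraded output.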

Would you consider releasing the preprocessing scripts or providing a minimal example for HuBERT token generation?
Any guidance or clarification would be greatly appreciated!
