Thank you for the excellent work and sharing this implementation.
I converted the model to ONNX and ran inference, but I ran into the issue below. Any suggestions would be appreciated.
The ONNX input dimension is fixed, so the phoneme array has to be padded with additional IDs. The existing code repeats the phonemes until they fill the ONNX input length, which produces audio that repeats the same content. Is there a specific ID I can pad with to avoid the unwanted audio at the end? Or is there a way to pass a dynamic-length phoneme array to the ONNX model? Please let me know if I'm missing anything here and how to avoid this.
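To make the question concrete, here is a minimal sketch of the two padding strategies I am comparing. It is not the repository's code: `PAD_ID`, `FIXED_LEN`, and both function names are hypothetical, and whether the model treats any particular ID as silence is exactly what I am unsure about.

```python
from typing import List

PAD_ID = 0     # hypothetical: the real padding ID depends on the model's symbol table
FIXED_LEN = 8  # hypothetical fixed input length of the exported ONNX model


def pad_by_repeating(ids: List[int], length: int) -> List[int]:
    """Repeat the phoneme IDs until the fixed length is filled (the behaviour I see now)."""
    if not ids:
        raise ValueError("ids must not be empty")
    reps = -(-length // len(ids))  # ceiling division
    return (ids * reps)[:length]


def pad_with_id(ids: List[int], length: int, pad_id: int = PAD_ID) -> List[int]:
    """Fill the remaining positions with a single padding ID instead of repeating."""
    if len(ids) > length:
        raise ValueError("ids is longer than the fixed input length")
    return ids + [pad_id] * (length - len(ids))


phonemes = [12, 5, 33]  # made-up phoneme IDs, for illustration only
print(pad_by_repeating(phonemes, FIXED_LEN))  # [12, 5, 33, 12, 5, 33, 12, 5]
print(pad_with_id(phonemes, FIXED_LEN))       # [12, 5, 33, 0, 0, 0, 0, 0]
```

The first output is what leads to the repeated audio. The second is what I would like to use, if there is an ID the model renders as silence.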