Enhance the generation experience and reduce the difficulty of script usage #16
Open
ppeev001 wants to merge 2 commits into ysharma3501:master
1. Skip Whisper recognition if text exists in speaker.yml.
2. Add 50 ms/80 ms silence via NumPy in post-processing.
3. Language-specific t_shift/guidance_scale by text proportion.
4. Independent Chinese speech rate with token padding coefficient.
Author
The README change is wrong; please revert it. Thanks.
Skip Whisper recognition if text exists in speaker.yml
Whisper ASR performs poorly on Chinese, which can corrupt the recognized text. When the text is already present in speaker.yml, recognition is skipped, so this is no longer a concern.
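The skip logic can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_prompt_text`, the `"text"` key, and the `transcribe` callable are hypothetical names, and the speaker entry is assumed to be already loaded from speaker.yml into a dict.

```python
from typing import Callable, Mapping, Optional


def resolve_prompt_text(
    speaker_cfg: Mapping[str, Optional[str]],
    audio_path: str,
    transcribe: Callable[[str], str],
) -> str:
    """Return the reference text, calling ASR only when none is configured.

    speaker_cfg: one speaker entry as loaded from speaker.yml (assumed layout).
    transcribe:  any callable that maps an audio path to text, e.g. a Whisper
                 wrapper (hypothetical; passed in so ASR is never loaded eagerly).
    """
    text = (speaker_cfg.get("text") or "").strip()
    if text:
        # A transcript was supplied by hand: skip recognition entirely.
        return text
    # No usable transcript: fall back to ASR.
    return transcribe(audio_path).strip()
```

Passing the recognizer in as a callable means the Whisper model only needs to be loaded when at least one speaker entry lacks text.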
Add 50 ms/80 ms silence via NumPy in post-processing
This improves the naturalness of concatenated non-streaming TTS output.
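For reference, padding a waveform with silence in NumPy looks roughly like this. It is a sketch under assumptions: `pad_silence` is a hypothetical name, and treating 50 ms as leading and 80 ms as trailing silence is a guess, since the description does not say which value applies where.

```python
import numpy as np


def pad_silence(
    wav: np.ndarray,
    sample_rate: int,
    lead_ms: float = 50.0,  # assumed: silence before the segment
    tail_ms: float = 80.0,  # assumed: silence after the segment
) -> np.ndarray:
    """Return a copy of a mono waveform with zero-valued samples on both ends."""
    lead = np.zeros(int(round(sample_rate * lead_ms / 1000)), dtype=wav.dtype)
    tail = np.zeros(int(round(sample_rate * tail_ms / 1000)), dtype=wav.dtype)
    return np.concatenate([lead, wav, tail])
```

Padding each segment before concatenation leaves a short pause between sentences, so they do not butt directly against each other.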
Language-specific t_shift/guidance_scale by text proportion
Chinese and English need different values for parameters such as t_shift, and using the same values for both produces uneven synthesis quality. The script now detects the proportion of Chinese and English in the text and selects suitable parameters automatically, which improves synthesis quality.
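One way to pick parameters by text proportion is sketched below. Everything here is illustrative: `chinese_ratio`, `pick_params`, the 0.5 threshold, and the numeric values in `PARAMS` are placeholders, not the values used in this PR.

```python
# Placeholder values for illustration only; the PR's actual numbers are not shown here.
PARAMS = {
    "zh": {"t_shift": 0.5, "guidance_scale": 2.0},
    "en": {"t_shift": 0.9, "guidance_scale": 3.0},
}


def chinese_ratio(text: str) -> float:
    """Share of CJK ideographs among CJK ideographs plus ASCII letters."""
    zh = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    en = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = zh + en
    return zh / total if total else 0.0


def pick_params(text: str, threshold: float = 0.5) -> dict:
    """Choose the Chinese parameter set when Chinese dominates the text."""
    lang = "zh" if chinese_ratio(text) >= threshold else "en"
    return PARAMS[lang]
```

Digits, punctuation, and whitespace are ignored, so a mostly Chinese sentence with a few English terms still gets the Chinese settings.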
Independent Chinese speech rate with token padding coefficient
Fixed the Chinese token calculation when using Emilia, which had caused Chinese speech to come out significantly too fast. Chinese speech rate is now controlled independently through a token padding coefficient.
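A rough sketch of how a padding coefficient could slow Chinese down, assuming the output length is estimated from the prompt's frames-per-token rate. The function names, the `zh_pad_coeff` parameter, and the use of one token per non-space character are all assumptions for illustration; the real tokenizer and formula in this PR may differ.

```python
def _is_zh(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def estimate_frames(
    prompt_frames: int,
    prompt_text: str,
    target_text: str,
    speed: float = 1.0,
    zh_pad_coeff: float = 1.0,  # hypothetical: > 1.0 allots more frames to Chinese
) -> int:
    """Estimate output frames from the prompt's frames-per-token rate.

    Stand-in tokenization: one token per non-space character.
    """
    prompt_tokens = max(sum(1 for ch in prompt_text if not ch.isspace()), 1)
    zh = sum(1 for ch in target_text if _is_zh(ch))
    other = sum(1 for ch in target_text if not ch.isspace()) - zh
    # Only Chinese tokens in the target are inflated by the coefficient.
    target_tokens = other + zh * zh_pad_coeff
    return int(round(prompt_frames / prompt_tokens * target_tokens / speed))
```

With the coefficient applied only to Chinese characters in the target, English timing is unchanged while Chinese gets proportionally more frames, which lowers its speech rate.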