Enhance the generation experience and reduce the difficulty of script usage#16

Open
ppeev001 wants to merge 2 commits into ysharma3501:master from ppeev001:master

@ppeev001

  1. Skip Whisper recognition if text exists in speaker.yml.
    Whisper ASR performs poorly on Chinese, and its transcription errors degrade the results. When the text is already provided in speaker.yml, recognition is now skipped, so this is no longer a concern.

  2. Add 50ms/80ms silence via NumPy in post-processing.
    This makes concatenated non-streaming TTS output sound more natural.

  3. Language-specific t_shift/guidance_scale by text proportion.
    Chinese and English need different values for parameters such as t_shift, and using the same values for both gives uneven synthesis quality. The script now detects the proportion of Chinese and English in the text and selects suitable parameters automatically, which improves the synthesis results.

  4. Independent Chinese speech rate with token padding coefficient.
    Fixed the token calculation for Chinese text when using Emilia, which had made the synthesized Chinese speech significantly too fast.
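
For readers who have not looked at the diff, here is a minimal sketch of how items 1 to 3 could work. It is not the code from this PR. All function names, the `text` and `audio` keys, the 0.5 threshold, and the parameter values in `PARAMS` are hypothetical placeholders. It also assumes that 50 ms is added before the audio and 80 ms after it, which the description does not state, and that speaker.yml has already been loaded into a dict.

```python
import numpy as np

# Placeholder values for illustration only; they are NOT the values used in the PR.
PARAMS = {
    "zh": {"t_shift": 0.5, "guidance_scale": 2.0},
    "en": {"t_shift": 0.9, "guidance_scale": 3.0},
}


def reference_text(speaker_cfg: dict, transcribe) -> str:
    """Item 1: use the text from the speaker config if present, else run ASR.

    `speaker_cfg` stands in for one entry of speaker.yml (hypothetical keys),
    and `transcribe` is any callable mapping an audio path to text.
    """
    text = (speaker_cfg.get("text") or "").strip()
    return text if text else transcribe(speaker_cfg["audio"])


def pad_with_silence(audio: np.ndarray, sample_rate: int,
                     lead_ms: float = 50.0, tail_ms: float = 80.0) -> np.ndarray:
    """Item 2: add zero-valued samples before and after a 1-D waveform."""
    lead = np.zeros(round(sample_rate * lead_ms / 1000), dtype=audio.dtype)
    tail = np.zeros(round(sample_rate * tail_ms / 1000), dtype=audio.dtype)
    return np.concatenate([lead, audio, tail])


def chinese_ratio(text: str) -> float:
    """Share of CJK ideographs among CJK ideographs plus ASCII letters."""
    zh = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    en = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = zh + en
    return zh / total if total else 0.0


def pick_params(text: str, threshold: float = 0.5) -> dict:
    """Item 3: choose a parameter set by the proportion of Chinese text."""
    return PARAMS["zh"] if chinese_ratio(text) >= threshold else PARAMS["en"]
```

Each generated segment would go through `pad_with_silence` before the segments are concatenated, and `pick_params(text)` would be called once per input text to choose `t_shift` and `guidance_scale`.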

@ppeev001
Author

The README change is wrong. Please discard that change. Thanks.
