-
For alignment: we first calculate the log-probability of z_p for every phoneme at every timestep in the spectrogram, using the projections m_p and logs_p. Then we use MAS (Monotonic Alignment Search) to find a path that maximizes the total log-probability of the chosen z_p over all timesteps. After that, each timestep has a single m_p and logs_p assigned to it. This spectrogram-length sequence of (m_p, logs_p) defines the distribution our z_p is supposed to be sampled from — hence the KL loss.
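A minimal NumPy sketch of the two steps above — building the per-(phoneme, frame) Gaussian log-likelihood grid, then running the MAS dynamic program. Function and variable names here are my own for illustration, not taken from the VITS codebase:

```python
import numpy as np

def gaussian_log_prob(z, m, logs):
    """Log-density of each frame of z under N(m_p, exp(logs_p)^2),
    summed over channels.
    z: (C, T_spec); m, logs: (C, T_text) -> returns (T_text, T_spec)."""
    z = z[:, None, :]        # (C, 1, T_spec)
    m = m[:, :, None]        # (C, T_text, 1)
    logs = logs[:, :, None]  # (C, T_text, 1)
    ll = -0.5 * np.log(2 * np.pi) - logs \
         - 0.5 * ((z - m) ** 2) * np.exp(-2 * logs)
    return ll.sum(axis=0)    # (T_text, T_spec)

def monotonic_alignment_search(log_p):
    """Dynamic program over the (phoneme, frame) grid.
    Finds the monotonic, non-skipping path that maximizes the
    total log-probability. log_p: (T_text, T_spec), T_text <= T_spec."""
    T_text, T_spec = log_p.shape
    Q = np.full((T_text, T_spec), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for t in range(1, T_spec):
        # a path can reach phoneme s at frame t only if s <= t
        for s in range(min(t + 1, T_text)):
            stay = Q[s, t - 1]                              # repeat phoneme s
            advance = Q[s - 1, t - 1] if s > 0 else -np.inf # move to next phoneme
            Q[s, t] = log_p[s, t] + max(stay, advance)
    # backtrack from the last phoneme at the last frame
    path = np.zeros((T_text, T_spec), dtype=np.int64)
    s = T_text - 1
    for t in range(T_spec - 1, -1, -1):
        path[s, t] = 1
        # step down when forced (s == t) or when it scores better
        if t > 0 and s > 0 and (s == t or Q[s - 1, t - 1] >= Q[s, t - 1]):
            s -= 1
    return path  # hard alignment: path[s, t] == 1 iff frame t belongs to phoneme s
```

With the resulting 0/1 `path`, `m_p @ path.T` and `logs_p @ path.T`-style gathers expand the phoneme statistics to spectrogram length, giving the per-frame Gaussian parameters that the KL term is computed against.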
-
I have drawn a simple diagram explaining the VITS architecture.
Following VITS, NaturalSpeech-1 uses a very similar architecture (https://arxiv.org/pdf/2205.04421.pdf).
The DurationPredictor of VITS is also a very interesting architecture in its own right. If anyone has questions or wants to go in depth, I'm happy to help.