Allow downloading extra formats in the demo #617

roedoejet · 2024-12-20T22:02:44Z

PR Goal?

This PR allows us to download any of the available output formats through the demo (TextGrid, ReadAlong, Spectrogram, etc.). We can also optionally disable certain output formats.

Fixes?

This partially addresses #607.

Feedback sought?

Test it out, confirm it works as expected.

Priority?

medium

Tests added?

We currently don't test the demo, but we're moving to do that in @joanise's PR.

How to test?

Spin up a demo and try synthesizing the other output formats.

Confidence?

medium-high (I've tested it and it works well on my Mac)

Version change?

Related PRs?


semanticdiff-com · 2024-12-20T22:02:46Z


Changed Files

File	Status
everyvoice/demo/app.py	3% smaller
everyvoice/cli.py	0% smaller
everyvoice/model/feature_prediction/FastSpeech2_lightning	0% smaller

codecov · 2024-12-20T22:05:35Z

Codecov Report

Attention: Patch coverage is 10.52632% with 34 lines in your changes missing coverage. Please review.

Project coverage is 76.07%. Comparing base (1c2aae6) to head (aa365d0).

Files with missing lines	Patch %	Lines
everyvoice/demo/app.py	10.52%	34 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #617      +/-   ##
==========================================
- Coverage   76.75%   76.07%   -0.69%     
==========================================
  Files          46       46              
  Lines        3446     3481      +35     
  Branches      470      479       +9     
==========================================
+ Hits         2645     2648       +3     
- Misses        700      732      +32     
  Partials      101      101


github-actions · 2024-12-20T22:07:12Z

CLI load time: 0:00.35
Pull Request HEAD: aa365d07cdc949f8eab4aeca0473feeec5d81188
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
import time:      1054 |     103621 |     typer.main
import time:       291 |     122599 |   typer
import time:       244 |     101070 |       everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.cli.cli
import time:       179 |     101686 |     everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.cli
import time:        18 |     101704 |   everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.cli.preprocess
import time:      7947 |     258926 | everyvoice.cli

joanise

Nice work, though I have a number of comments and suggestions.

In general, I find it not very intuitive to have to re-synthesize if I change the output format. I guess from how things work, it's probably unavoidable, but it's not ideal UX. Maybe the File Output box could have a hint indicating that Output Format is where you change what's available for download here?

When you keep just the wav output format, it's confusing that the File Output box is present but you can't interact with it. I wanted to download my audio file, and I tried to find it in the File Output box. It took me a little while to locate the download button in the Audio box above.

joanise · 2025-01-06T21:39:52Z

everyvoice/cli.py

@@ -616,6 +616,12 @@ def demo(
        "-s",
        help="Specify speakers to be included in the demo. Example: everyvoice demo <path_to_text_to_spec_model> <path_to_spec_to_wav_model> --speaker speaker_1 --speaker Sue",
    ),
+    outputs: List[str] = typer.Option(


It would be helpful to enumerate the valid options in the help message.

Ditto for --accelerator, while I'm thinking about it... And for --language and --speaker we should state that they have to be language(s) and speaker(s) known to the model.

For this PR, please address listing valid values for --output-format, fixing the other help messages is gravy and could go into a separate PR or issue.
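A minimal sketch of deriving the --output-format help text from the output-format enum, so the listed values cannot drift from what the installed version of EveryVoice supports. The enum here is a stand-in whose member names and values are assumptions, and output_format_help() is a hypothetical helper.

```python
from enum import Enum


# Stand-in for the real output-format enum; member names and values are
# assumptions made for this sketch.
class SynthesizeOutputFormats(str, Enum):
    wav = "wav"
    spec = "spec"
    textgrid = "textgrid"
    readalong_xml = "readalong-xml"
    readalong_html = "readalong-html"


def output_format_help() -> str:
    """Build the --output-format help text from the enum so the two can't drift."""
    valid = ", ".join(fmt.value for fmt in SynthesizeOutputFormats)
    return (
        "Output format(s) to offer for download in the demo; may be repeated. "
        f"Valid values: {valid}."
    )
```

The string would be passed as help=output_format_help() to the typer.Option(...) declared for outputs. Alternatively, typing the parameter as List[SynthesizeOutputFormats] should let typer print the choices and reject unknown values by itself.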

> Ditto for --accelerator, while I'm thinking about it... And for --language and --speaker we should state that they have to be language(s) and speaker(s) known to the model.

I think they do get listed, don't they? Like if you type a speaker that doesn't exist, I thought the error message listed out all the possible speakers. The output formats are dependent on the version of everyvoice installed, so we could include that in the help message, but the language and speaker are model-dependent, so we wouldn't be able to include the lists of those in the help message, just in the error message.

What I mean is that the everyvoice demo -h message should say something like "valid values are the language(s) and speaker(s) the model was trained on", or something to that effect, maybe more concisely. As the documentation stands, if you're not familiar with things yet, it's a bit mysterious how you're supposed to know what values you can use there.
And I know if you've just trained things, it's going to be obvious, but the point of the help message is to support you when the information is not already obvious to you.
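Since speakers and languages are only known once a checkpoint is loaded, the help text can only describe them generically, while the concrete list belongs in the error message. A sketch of that runtime check, using a hypothetical check_known() helper; where the model's known speakers and languages come from is assumed, not taken from the code.

```python
from typing import Iterable, List


def check_known(kind: str, requested: Iterable[str], known: Iterable[str]) -> List[str]:
    """Return the requested values, or raise ValueError listing the model's values.

    kind is a label such as "speaker" or "language"; known is whatever the
    loaded model reports it was trained on (assumed to be available).
    """
    known_set = set(known)
    requested_list = list(requested)
    unknown = [value for value in requested_list if value not in known_set]
    if unknown:
        raise ValueError(
            f"Unknown {kind}(s): {', '.join(unknown)}. "
            f"This model was trained on: {', '.join(sorted(known_set))}."
        )
    return requested_list
```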

joanise · 2025-01-06T21:55:49Z

everyvoice/demo/app.py

+            else:
+                print(
+                    f"Attention: This model is not able to produce '{output}' as an output. The '{output}' option will not be available for selection. Please choose from the following possible outputs: {', '.join(possible_outputs)}"
+                )


This needs to be a fatal error with an immediate exit, and the message is misleading: it's not that the model can't produce the requested output, it's that the software has no implementation for it. Right now, everyvoice demo -O foo fs2.ckpt voc.ckpt prints this message about foo and then continues anyway and crashes with an exception a few lines later.

This is really CLI error checking, it should happen much earlier in this function, in particular before we load any checkpoint, so the error is dumped right away without having to wait 20 seconds or more for models to load first. You might get all this for nearly free if you define the list of valid values for outputs in cli.py's demo() function as I already suggested elsewhere.
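A sketch of the fatal, early check described above: validate the requested outputs before anything expensive happens, print an error listing the valid values, and exit with a non-zero status. validate_outputs(), build_demo() and the hard-coded VALID_OUTPUTS tuple are hypothetical, and load_models stands in for the slow checkpoint loading.

```python
import sys
from typing import Callable, Iterable, List, Tuple, TypeVar

M = TypeVar("M")

# Hypothetical list of supported formats; in practice it would be derived
# from the output-format enum rather than hard-coded.
VALID_OUTPUTS = ("wav", "spec", "textgrid", "readalong-xml", "readalong-html")


def validate_outputs(requested: Iterable[str]) -> List[str]:
    """Exit with status 1 if any requested output format is unsupported."""
    requested_list = list(requested)
    unsupported = [o for o in requested_list if o not in VALID_OUTPUTS]
    if unsupported:
        print(
            f"Error: unsupported output format(s): {', '.join(unsupported)}. "
            f"Valid values are: {', '.join(VALID_OUTPUTS)}.",
            file=sys.stderr,
        )
        sys.exit(1)
    return requested_list


def build_demo(outputs: Iterable[str], load_models: Callable[[], M]) -> Tuple[M, List[str]]:
    # Check the cheap CLI arguments first, so a typo fails immediately
    # instead of after waiting for the checkpoints to load.
    validated = validate_outputs(outputs)
    models = load_models()
    return models, validated
```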

BTW, the RAS output specifiers are readalong_xml and readalong_html with an underscore instead of a hyphen like in everyvoice synthesize from-text. They should be unified, using hyphens here too.
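If the enum follows the common pattern of underscore member names with hyphenated values (an assumption here), the underscore spellings come from using .name, as in the comparison against SynthesizeOutputFormats.readalong_html.name in app.py, rather than .value. A minimal sketch of the difference, with a hypothetical parse_output_format() that only accepts the hyphenated spellings:

```python
from enum import Enum


# Assumed enum shape: underscore member names (valid Python identifiers) with
# hyphenated values matching the `everyvoice synthesize from-text` spellings.
class SynthesizeOutputFormats(str, Enum):
    wav = "wav"
    readalong_xml = "readalong-xml"
    readalong_html = "readalong-html"


def parse_output_format(text: str) -> SynthesizeOutputFormats:
    """Look a user-supplied spelling up by value; underscore forms raise ValueError."""
    return SynthesizeOutputFormats(text)


# .name is the identifier, .value is what the CLI should expose.
print(SynthesizeOutputFormats.readalong_html.name)   # readalong_html
print(SynthesizeOutputFormats.readalong_html.value)  # readalong-html
```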

joanise · 2025-01-06T22:08:06Z

everyvoice/demo/app.py

+    wav_output = wav_writer.get_filename(basename, speaker, language)
+    file_writer = None
+    file_output = None
+    if output_format == SynthesizeOutputFormats.readalong_html.name:


This and all the other callback constructor calls really ought to be able to be replaced by a single call to get_synthesis_output_callbacks, no?

I realize you need the wav writer to create the RAS_html writer, but that's already done too.

Refactoring suggestion: have get_synthesis_output_callbacks return a dict where the keys are the output types, and the values are the writers. Then here you can use writers["wav"] and writers[output_format] to access the two writers you need, having passed ["wav", output_format] (or the proper Enum form if need be) as the output_type argument to get_synthesis_output_callbacks.
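A sketch of the suggested refactor, with a hypothetical Writer class standing in for the real callback types: this version of get_synthesis_output_callbacks returns a dict keyed by output type, builds the wav writer whenever it is needed, and hands it to the readalong-html writer. The real function's signature and writer classes will differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Writer:
    """Hypothetical stand-in for one synthesis output callback."""

    output_type: str
    depends_on: Optional["Writer"] = None


def get_synthesis_output_callbacks(output_type: List[str]) -> Dict[str, Writer]:
    """Sketch: return one writer per requested output type, keyed by type.

    The readalong-html writer needs the wav writer, so wav is built first
    whenever it is needed and shared, instead of being built by the caller.
    """
    writers: Dict[str, Writer] = {}
    if "wav" in output_type or "readalong-html" in output_type:
        writers["wav"] = Writer("wav")
    for fmt in output_type:
        if fmt in writers:
            continue
        depends_on = writers["wav"] if fmt == "readalong-html" else None
        writers[fmt] = Writer(fmt, depends_on)
    return writers


# Caller side, mirroring the suggestion: request wav plus the selected format.
writers = get_synthesis_output_callbacks(["wav", "readalong-html"])
wav_writer, file_writer = writers["wav"], writers["readalong-html"]
```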

Yeah, this is a good idea. I would definitely be in favour of this refactor, which I think would clean up some of the acrobatics we're currently doing in order to get the wav writer separately.

roedoejet · 2025-01-07T01:37:51Z

> In general, I find it not very intuitive to have to re-synthesize if I change the output format. I guess from how things work, it's probably unavoidable, but it's not ideal UX. Maybe the File Output box could have a hint indicating that Output Format is where you change what's available for download here?

I agree, but I'm not sure gradio provides a callback for when the output format gets changed. It's possible, but given how expensive it is to synthesize (depending on the underlying hardware), I'm not super opposed to this UX, despite agreeing that it's clunky, as you say.

> When you keep just the wav output format, it's confusing that the File Output box is present but you can't interact with it. I wanted to download my audio file, and I tried to find it in the File Output box. It took me a little while to locate the download button in the Audio box above.

Yes, the problem is that I don't believe gradio allows you to dynamically change the interface once it's been rendered. So we might be stuck with that. Note that it doesn't get rendered if the only possible output format is "wav" (i.e. if the demo command disables all the other output formats).

roedoejet added 2 commits December 20, 2024 13:59

feat: add dropdown options for downloading other formats other than wav in the demo

a018e64

chore: update submodule to allow downloading extra formats in demo

aa365d0

roedoejet mentioned this pull request Dec 20, 2024

Dev.ap/demo formats EveryVoiceTTS/FastSpeech2_lightning#105

Open

roedoejet requested review from joanise and marctessier December 20, 2024 22:03

joanise requested changes Jan 6, 2025
