@zhenyujia23-crypto

  • Add subtitle_enable parameter for streaming word-level timestamps
  • Remove emotion parameter and related processing logic
  • Keep exclude_aggregated_audio parameter as requested
  • Maintain backward compatibility with english_normalization deprecation
  • Add subtitle file processing and logging
  • Simplify code and remove complex validation

InputParams now has 9 parameters:

  • language, speed, volume, pitch
  • english_normalization (deprecated), text_normalization
  • latex_read, exclude_aggregated_audio, subtitle_enable

When subtitle_enable=True, the service provides streaming subtitles with word-level timestamps.

if service_lang:
    self._settings["language_boost"] = service_lang

# Add optional emotion if provided
Contributor

Why are you removing emotion? This feature was just added in the last release, so removing it is a breaking change. If it's no longer relevant, you should instead deprecate it and emit a warning.
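
A rough sketch of the deprecate-and-warn approach, using only the standard library. The `InputParams` class here is a trimmed-down, hypothetical stand-in, and `build_voice_settings` is an illustrative helper name, not code from this PR. The assumption is that a deprecated `emotion` value should still be accepted but left out of the request.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class InputParams:
    """Hypothetical, trimmed-down stand-in for the service's InputParams."""

    speed: float = 1.0
    # Deprecated: still accepted so existing callers keep working.
    emotion: Optional[str] = None


def build_voice_settings(params: InputParams) -> Dict[str, Any]:
    """Build the settings dict, warning about (and skipping) deprecated fields."""
    settings: Dict[str, Any] = {"speed": params.speed}
    if params.emotion is not None:
        # Warn instead of failing, so upgrading is not a breaking change.
        warnings.warn(
            "The 'emotion' parameter is deprecated and is ignored.",
            DeprecationWarning,
            stacklevel=2,
        )
    return settings
```

With this shape, callers that still pass `emotion` keep working and see a `DeprecationWarning`, while the value never reaches the request body.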

Author

Hey markbackman, if we add emotion to the request body, the latency becomes very high.

text_normalization: Enable text normalization (Chinese/English).
latex_read: Enable LaTeX formula reading.
exclude_aggregated_audio: Whether to exclude aggregated audio in final chunk.
subtitle_enable: Enable subtitle generation with word-level timestamps.
Contributor

Word timestamps require a broader change in this class. The base class needs to change to reflect this: the service would become a WordTTSService and would need to implement add_word_timestamps, etc. to work with that base class.
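
To illustrate the shape of that change, here is a minimal, self-contained sketch. It does not import the real library: `WordTimestampSink` is a toy stand-in for a WordTTSService-style base class, the `add_word_timestamps` signature (a list of `(word, seconds)` pairs) is an assumption, and both `SubtitleTTSService` and the subtitle entry format (`word`, `start_ms`) are hypothetical.

```python
import asyncio
from typing import Dict, List, Tuple, Union

WordTime = Tuple[str, float]


class WordTimestampSink:
    """Toy stand-in for a word-timestamp-aware TTS base class (not the real one)."""

    def __init__(self) -> None:
        self.word_times: List[WordTime] = []

    async def add_word_timestamps(self, word_times: List[WordTime]) -> None:
        # Assumed signature: a batch of (word, start time in seconds) pairs.
        self.word_times.extend(word_times)


class SubtitleTTSService(WordTimestampSink):
    """Hypothetical service that forwards subtitle entries as word timestamps."""

    async def handle_subtitles(
        self, entries: List[Dict[str, Union[str, int]]]
    ) -> None:
        # Hypothetical entry shape: {"word": str, "start_ms": int}.
        pairs = [(str(e["word"]), int(e["start_ms"]) / 1000.0) for e in entries]
        await self.add_word_timestamps(pairs)


async def demo() -> List[WordTime]:
    service = SubtitleTTSService()
    await service.handle_subtitles(
        [{"word": "hello", "start_ms": 0}, {"word": "world", "start_ms": 500}]
    )
    return service.word_times
```

The point of the sketch is only that subtitle data would flow through the base class's timestamp hook rather than being handled by a standalone flag. The real base class may expect a different signature or time unit.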
