feat(minimax): add streaming word-level subtitles and simplify API #3159

zhenyujia23-crypto · 2025-11-28T11:08:22Z

- Add subtitle_enable parameter for streaming word-level timestamps
- Remove emotion parameter and related processing logic
- Keep exclude_aggregated_audio parameter as requested
- Maintain backward compatibility with english_normalization deprecation
- Add subtitle file processing and logging
- Simplify code and remove complex validation

InputParams now has 9 parameters:

- language, speed, volume, pitch
- english_normalization (deprecated), text_normalization
- latex_read, exclude_aggregated_audio, subtitle_enable

When subtitle_enable=True, the service provides streaming subtitles with word-level timestamps.


markbackman · 2025-11-30T15:28:55Z

src/pipecat/services/minimax/tts.py

            if service_lang:
                self._settings["language_boost"] = service_lang

-        # Add optional emotion if provided


Why are you removing emotion? This feature was just added in the last release, so removing it is a breaking change. If it's no longer relevant, you should deprecate it and emit a warning instead.

zhenyujia23-crypto · 2025-12-02

Hey markbackman, if we add emotion to the request body, the latency becomes very high.
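For reference, the deprecate-and-warn approach suggested in this thread could look roughly like the sketch below. It is a minimal, self-contained illustration and not the code in src/pipecat/services/minimax/tts.py: the `InputParams` dataclass here is a hypothetical stand-in, `build_voice_settings` is an invented helper, and ignoring the value (rather than forwarding it) is an assumption based on the latency concern raised above.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputParams:
    """Hypothetical stand-in for the service's InputParams (not the real class)."""

    speed: float = 1.0
    emotion: Optional[str] = None  # deprecated: still accepted, but ignored
    subtitle_enable: bool = False

    def __post_init__(self) -> None:
        # Warn instead of failing so existing callers keep working.
        if self.emotion is not None:
            warnings.warn(
                "The 'emotion' parameter is deprecated and is ignored.",
                DeprecationWarning,
                stacklevel=3,  # point at the code that constructed InputParams
            )


def build_voice_settings(params: InputParams) -> dict:
    """Build a request fragment, leaving the deprecated field out (hypothetical helper)."""
    return {"speed": params.speed}
```

With this shape, callers that still pass `emotion` keep running and see a `DeprecationWarning`, while the value never reaches the request body.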

markbackman · 2025-11-30T15:30:09Z

src/pipecat/services/minimax/tts.py

            text_normalization: Enable text normalization (Chinese/English).
            latex_read: Enable LaTeX formula reading.
            exclude_aggregated_audio: Whether to exclude aggregated audio in final chunk.
+            subtitle_enable: Enable subtitle generation with word-level timestamps.


Word timestamps require a broader change in this class. The base class needs to change to reflect this: the service would become a WordTTSService and would need to implement add_word_timestamps, etc. to work with that base class.
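As a rough illustration of the data handling such a change would involve, the sketch below converts word-level subtitle entries into (word, start time in seconds) pairs. Everything in it is an assumption for illustration: the entry keys `text` and `time_begin`, the millisecond unit, and the `to_word_times` helper are hypothetical, and this thread does not confirm that add_word_timestamps accepts pairs of this shape.

```python
from typing import Dict, List, Tuple


def to_word_times(entries: List[Dict], offset_ms: float = 0.0) -> List[Tuple[str, float]]:
    """Convert hypothetical subtitle entries to (word, start_seconds) pairs.

    `offset_ms` shifts every timestamp, e.g. by the audio already played
    before this chunk started.
    """
    pairs: List[Tuple[str, float]] = []
    for entry in entries:
        word = str(entry.get("text", "")).strip()
        if not word:
            continue  # skip whitespace-only entries
        start_s = (float(entry["time_begin"]) + offset_ms) / 1000.0
        pairs.append((word, start_s))
    return pairs
```

A service built on a word-timestamp base class would presumably pass pairs like these on as each subtitle chunk arrives, rather than only logging the subtitle file.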
