-
Notifications
You must be signed in to change notification settings - Fork 810
Timestamps in parakeet_runner
#16545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timestamps in parakeet_runner
#16545
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16545
Note: Links to docs will display an error until the docs builds have been completed. ⏳ No Failures, 55 PendingAs of commit 7ada1eb with merge base dbf3c37 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
831f714 to
0c9768d
Compare
| if (!num_rnn_layers_result.ok() || !pred_hidden_result.ok() || | ||
| !vocab_size_result.ok() || !blank_id_result.ok() || | ||
| !sample_rate_result.ok()) { | ||
| !sample_rate_result.ok() || !window_stride_result.ok() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this will break compat with previously exported parakeet models. I chose to do this b/c early in development and to avoid having to make a separate path that allows everything but timestamps if the new metadata isn't present.
Open to doing such a thing if reviewers feel strongly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, i don't mind bc breaking at this early stage.
And it's in examples anyway not in core directory (like /extensions)
mergennachin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on extracting out the abstraction for TimestampExtractor into its own file that could be used for other models in the future. The main runner is becoming pretty large now.
examples/models/parakeet/main.cpp
Outdated
|
|
||
| if (!word.empty() && (ends_with_delimiter || is_delimiter_word)) { | ||
| segment_words.push_back(word); | ||
| if (!segment_words.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems redundant check right after push back
Made the following adjustments:
I decided against a I think this walks an appropriate middle ground for the current point in development - modularized code that can be easily extended if/when use with other models arrises, while avoiding too deep of an abstraction that may not work for other models. This is also why these utils are in namespaces This is also nice because the NeMo ported code remains almost trivially mappable to the original NeMo files/methods. @mergennachin does this seem reasonable to you? Open to discussion if you still feel there is a better approach. |
mergennachin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See inline comment
examples/models/parakeet/README.md
Outdated
| | `--audio_path` | Path to input audio file (.wav) | | ||
| | `--tokenizer_path` | Path to tokenizer file (default: `tokenizer.json`) | | ||
| | `--data_path` | Path to data file (.ptd) for delegate data (optional, required for CUDA) | | ||
| | `--timestamps` | Timestamp output mode: `none\|token\|word\|segment\|all` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we default to one of the options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, defaulted to "segment" because the others can get pretty verbose in output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say in README.md that segment is the default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes :)
|
|
||
| namespace parakeet::tokenizer_utils { | ||
|
|
||
| std::unordered_set<std::string> derive_supported_punctuation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on moving this to timestap_utils?
It's exclusively called once in main.cpp right before timestamp computation and passed to timestamp functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think it fits a bit better in tokenizer_utils, since logically it only touches the tokenizer and does nothing timestamp related. Additionally it is in tokenizer_utils.py in NeMo reference implementation.
Contextually though, agree it is only needed in the parakeet_runner when we compute timestamps. Sortof a growing pain of some level of abstraction/modularization before needed elsewhere?
I personally would leave it, but can move it if you feel its important/a blocker. Don't feel strongly enough about it to blockingly oppose
Summary
Enable computation of timestamps within
parakeet_runnerthrough a--timestampsflag (none|token|word|segment|all). Followed reference implementation from NVIDIA-NeMo/NeMo, which is cited as the way to run parakeet and compute timestamps for nvidia/parakeet-tdt-0.6b-v3 on HF.Requires meta-pytorch/tokenizers#163 for the
id_to_piecemethod onTokenizers.Test plan
Outputs the exact same transcription/timestamps as NVIDIA-NeMo/NeMo for audio files basketball.wav and audio.wav
NeMo (basketball.wav)
Click to see output
parakeet_runner(basketball.wav)Click to see output