@wu6u3tw wu6u3tw commented Jul 14, 2025

The raw text output from whisper-large-v3 includes numbers, and the normalization step is handled in the accuracy_eval script.
Therefore, so that the digits in the output are preserved by the label dict, I added digits and some symbols to the labels.
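
To illustrate the motivation, here is a minimal sketch, assuming the evaluation path keeps only characters that appear in a label dictionary. The names `BASE_LABELS`, `EXTENDED_LABELS`, `build_label_dict`, and `filter_to_labels` are hypothetical and are not taken from the accuracy_eval script; the actual label list in this PR may differ.

```python
import string

# Hypothetical label sets for illustration only.
BASE_LABELS = [" "] + list(string.ascii_lowercase) + ["'"]
# Assumed extension: digits plus the "$" symbol discussed in this PR.
EXTENDED_LABELS = BASE_LABELS + list(string.digits) + ["$"]


def build_label_dict(labels):
    """Map each label character to its index."""
    return {ch: idx for idx, ch in enumerate(labels)}


def filter_to_labels(text, label_dict):
    """Lowercase the text and drop every character missing from label_dict."""
    return "".join(ch for ch in text.lower() if ch in label_dict)


raw = "It costs $25 in 2024"
base_out = filter_to_labels(raw, build_label_dict(BASE_LABELS))
ext_out = filter_to_labels(raw, build_label_dict(EXTENDED_LABELS))

print(repr(base_out))  # digits and "$" are silently dropped
print(repr(ext_out))   # digits and "$" survive for later normalization
```

With the base labels the numeric content disappears before normalization can see it, which is the situation this change is meant to avoid.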

@wu6u3tw wu6u3tw requested a review from a team as a code owner July 14, 2025 06:44

github-actions bot commented Jul 14, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@wu6u3tw wu6u3tw force-pushed the dev-tinyinl-add_labels_in_accuracy_eval_whisper branch from bf35648 to b0faf8a Compare July 14, 2025 06:46

wu6u3tw commented Jul 14, 2025

recheck

@keithachorn-intel

I'm not sure this will align with the text normalization elsewhere in the reference. Numerical values were previously expanded to full words. Will have to check how the OpenAI normalizer handles numeric values.
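
The alignment concern can be made concrete with a toy example. This is a sketch under stated assumptions: `word_error_rate` is a hypothetical helper using plain word-level edit distance, not the metric implementation used by the reference, and the sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Reference expanded to words, hypothesis left as digits: the forms disagree.
mismatched = word_error_rate("the ticket costs twenty five dollars",
                             "the ticket costs 25 dollars")
# Both sides in the same numeric form: no error is counted.
matched = word_error_rate("the ticket costs 25 dollars",
                          "the ticket costs 25 dollars")
print(mismatched, matched)
```

The point of the sketch is only that the reference and the hypothesis need to pass through the same numeric convention; which convention the OpenAI normalizer applies is the open question in this comment.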

@wu6u3tw wu6u3tw changed the title [Whisper] Add labels' in the whisper output [Draft] [Whisper] Add labels' in the whisper output Jul 15, 2025
@hanyunfan

> I'm not sure this will align with the text normalization elsewhere in the reference. Numerical values were previously expanded to full words. Will have to check how the OpenAI normalizer handles numeric values.

@wu6u3tw Could you review Keith’s question and provide additional context or a possible response?

@keithachorn-intel keithachorn-intel left a comment

Per the TF meeting, let's remove the euro (€), pound (£), and cent (¢) symbols, since they interact strangely with the Vim editor and don't appear to impact measured accuracy. Also, let's replicate this change in the reference_SUT.py file.

"$",
"¢",
"£",
"€",
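
One observation about these four entries, offered as a sketch rather than a diagnosis: the three symbols proposed for removal are exactly the non-ASCII ones, each taking more than one byte in UTF-8. Whether that is what caused the editor behavior is an assumption; the thread does not confirm it. The variable names below are hypothetical.

```python
# The four symbol labels shown in the diff context above.
symbols = ["$", "¢", "£", "€"]

# Split into ASCII and non-ASCII entries (str.isascii needs Python 3.7+).
kept = [s for s in symbols if s.isascii()]
removed = [s for s in symbols if not s.isascii()]

# UTF-8 byte length of each symbol.
utf8_lengths = {s: len(s.encode("utf-8")) for s in symbols}

print(kept, removed, utf8_lengths)
```

A check like this could also be run over the label lists in both files to confirm they stay in sync after the change.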

Contributor Author

Done: removed the euro, cent, and pound signs.

Contributor

Re-reviewed and approved. Thanks!

@wu6u3tw wu6u3tw force-pushed the dev-tinyinl-add_labels_in_accuracy_eval_whisper branch from b0faf8a to a51a425 Compare October 1, 2025 18:45
@wu6u3tw wu6u3tw changed the title [Draft] [Whisper] Add labels' in the whisper output [Whisper] Add labels' in the whisper output Oct 1, 2025
@keithachorn-intel keithachorn-intel left a comment

Reviewed and approved by Speech-to-text TF.

@keithachorn-intel

@pgmpablo157321 - This PR is approved per the TF discussion today. I think we can close it before the next WG sync (no need for wider input).

@hanyunfan hanyunfan left a comment

LGTM

@hanyunfan hanyunfan merged commit 0b8ca03 into mlcommons:master Oct 13, 2025
29 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Oct 13, 2025
3 participants