[Whisper] Add large-v3 version support #27336
Conversation
Thanks, this can be extremely helpful!
Hey! Thanks for the PR, the conversion script is fixed in #26834, which should be merged today, cc @sanchit-gandhi.
Otherwise good to add the checkpoint path, and #27338 will add the tokenizer.
@ArthurZucker Thanks! I did not see the download-fixing PR, and you were quite fast with tokenizer support, congrats!
Yes for sure!
@ArthurZucker Added feature extractor export.
Nice, just merged #27338, can you rebase?
@ArthurZucker merged, I can instead rebase/force-push if that's preferable.
Merging should be fine, reviewing now!
Thanks a lot for the prompt PR and reactivity 🔥, let's try to isolate the changes (so keep the downloading utils that were just merged) and try to match the mel creation. I think @sanchit-gandhi is having a look at that as well. Otherwise LGTM.
@ArthurZucker removed everything related to downloading the pre-computed filters; they are indeed equivalent to the constructed ones.
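For reference, the equivalence can be sanity-checked with something like the snippet below (a sketch, assuming `librosa` is installed and using large-v3's 128 mel bins with the slaney-normalized parameters the feature extractor uses):

```python
import numpy as np
import librosa
from transformers.audio_utils import mel_filter_bank

# Filters as OpenAI pre-computes them (librosa, slaney norm by default).
librosa_filters = librosa.filters.mel(sr=16000, n_fft=400, n_mels=128)

# Filters as WhisperFeatureExtractor constructs them internally.
hf_filters = mel_filter_bank(
    num_frequency_bins=1 + 400 // 2,
    num_mel_filters=128,
    min_frequency=0.0,
    max_frequency=8000.0,
    sampling_rate=16000,
    norm="slaney",
    mel_scale="slaney",
)

# mel_filter_bank returns (num_frequency_bins, num_mel_filters); transpose to compare.
print(np.allclose(librosa_filters, hf_filters.T, atol=1e-6))
```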
Good, `num_mel_bins` is properly set to `dimensions["n_mels"]` in the config and in the feature extractor. Those should be the only places where it's needed. LGTM, we usually add integration tests to make sure the converted checkpoints match the original model in `test_modeling_whisper`. Not sure if you have the hardware to do this?
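For reference, an end-to-end sanity check along those lines could look roughly like the sketch below; the converted-checkpoint folder name is a placeholder, and it assumes the original openai-whisper package (with large-v3 support) and `datasets` are installed. The real integration tests in `test_modeling_whisper.py` compare against hard-coded expected outputs instead.

```python
# Rough end-to-end check, not the actual test: transcribe the same clip with the
# converted checkpoint and with the original OpenAI model, then compare the outputs.
import numpy as np
import whisper  # the original openai-whisper package
from datasets import load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio = ds[0]["audio"]["array"].astype(np.float32)

# "whisper-large-v3-converted" is a placeholder for the conversion output folder.
processor = WhisperProcessor.from_pretrained("whisper-large-v3-converted")
model = WhisperForConditionalGeneration.from_pretrained("whisper-large-v3-converted")

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
hf_text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

openai_text = whisper.load_model("large-v3").transcribe(audio)["text"]
print(hf_text.strip(), "|", openai_text.strip())
```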
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for adding this @flyingleafe! The only difference we likely now have is in the generation config. After an offline discussion with @ArthurZucker we concluded that we previously hard-coded these arguments. What might be best is loading the appropriate generation config from the existing ones on the Hub, e.g. from `openai/whisper-medium.en` for English, `openai/whisper-large-v2` for multilingual v1 and v2, and `openai/whisper-large-v3` (coming soon) for v3:

```python
from transformers import GenerationConfig

generation_config = GenerationConfig.from_pretrained("openai/whisper-large-v2")
model.generation_config = generation_config
...
```
```
@@ -186,6 +188,13 @@ def convert_openai_whisper_to_tfms(checkpoint_path, pytorch_dump_folder_path):

    model.save_pretrained(pytorch_dump_folder_path)

    # Export the feature extractor
    feature_extractor = WhisperFeatureExtractor(
```
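For readers of the truncated diff hunk above: the essential point is that the feature extractor's mel dimension follows the checkpoint. A sketch of the idea, not the literal PR code (the folder name is a placeholder, the other arguments are the `WhisperFeatureExtractor` defaults):

```python
from transformers import WhisperFeatureExtractor

# feature_size follows dimensions["n_mels"] from the conversion script:
# 80 for earlier checkpoints, 128 for large-v3.
n_mels = 128
feature_extractor = WhisperFeatureExtractor(
    feature_size=n_mels,
    sampling_rate=16000,
    hop_length=160,
    chunk_length=30,
    n_fft=400,
)
feature_extractor.save_pretrained("whisper-large-v3-converted")  # placeholder folder
```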
Super small request from me would be to also save the `WhisperProcessor`:

```python
from transformers import WhisperProcessor

processor = WhisperProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
processor.save_pretrained(pytorch_dump_folder_path)
```
Okay LGTM with the changes to save the processor as a whole now! 🔥 🚀
@ArthurZucker @sanchit-gandhi Your comment about the full preprocessor export is addressed. Since the preprocessor has the tokenizer as its constituent part, I renamed the … I also took the liberty of removing additional parameters of …

@sanchit-gandhi I implemented fetching the generation config from the HF Hub based on the number of supported languages, as you suggested, but it is kind of a chicken-and-egg situation. The alignment heads can be hardcoded into the dictionary, as OpenAI does that, and the other parameters are either derived from the tokenizer or hardcoded as well. The only setting I don't quite understand how to derive is the set of suppressed tokens; if you give me a hint on that, I can remove the dependency on downloading extra stuff from the HF Hub completely.
Thanks, it's cleaner and relies on the same logic as OpenAI for the number of languages! 🔥
Co-authored-by: Arthur <[email protected]>
Thanks for your contribution @flyingleafe! Looks pretty much ready to go from me. Just one small comment about the generation config below.
```
@@ -51,6 +60,20 @@
}


def _get_generation_config(is_multilingual: bool, num_languages: int = 100) -> GenerationConfig:
```
Thanks for adding this! The only generation config attribute that is checkpoint-specific is the alignment heads: https://gist.github.com/hollance/42e32852f24243b748ae6bc1f985b13a

The alignment heads can only really be worked out by looking at the cross-attention plots: https://github.com/openai/whisper/blob/main/notebooks/Multilingual_ASR.ipynb

=> since it's checkpoint-specific, I think we should remove this attribute from the generation config. The user will then be prompted to set it themselves if they require word-level timestamps:

```python
if not hasattr(generation_config, "alignment_heads"):
```

This just requires adding the following lines of code before we return the generation config:

```python
generation_config = GenerationConfig.from_pretrained(repo)

if hasattr(generation_config, "alignment_heads"):
    delattr(generation_config, "alignment_heads")

return generation_config
```
WDYT @flyingleafe @ArthurZucker?
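Putting the two pieces together (the repo selection suggested earlier and the removal of `alignment_heads`), a rough sketch of what the helper could look like; this is an illustration under those assumptions, not the final code in the PR:

```python
from transformers import GenerationConfig


def _get_generation_config(is_multilingual: bool, num_languages: int = 100) -> GenerationConfig:
    # Pick the closest official checkpoint on the Hub to copy the generation config from.
    if not is_multilingual:
        repo = "openai/whisper-medium.en"
    elif num_languages < 100:
        repo = "openai/whisper-large-v2"
    else:
        repo = "openai/whisper-large-v3"

    generation_config = GenerationConfig.from_pretrained(repo)

    # alignment_heads is checkpoint-specific, so drop it; users who need
    # word-level timestamps set it themselves.
    if hasattr(generation_config, "alignment_heads"):
        delattr(generation_config, "alignment_heads")

    return generation_config
```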
Agreed 😉 but it's also kinda specific to word timestamps, so we can add a comment.
@sanchit-gandhi The alignment heads can be copy-pasted right from the OpenAI repository, without looking at the cross-attention plots. They are provided there in quite a compact way.
Let us set up the alignment heads directly from this dictionary if the user provides the version of the OpenAI model, and skip setting them (with a warning to the user) if the checkpoint is custom.
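A purely illustrative sketch of that idea; the dictionary keys and (layer, head) pairs below are placeholders, not the real values, which would be copied verbatim from the OpenAI repository:

```python
import warnings
from typing import Optional

# Placeholder entries only; the actual alignment heads live in the OpenAI repository.
_ALIGNMENT_HEADS = {
    "large-v3": [[1, 2], [3, 4]],  # NOT the real values
}


def _maybe_set_alignment_heads(generation_config, openai_version: Optional[str]):
    if openai_version in _ALIGNMENT_HEADS:
        generation_config.alignment_heads = _ALIGNMENT_HEADS[openai_version]
    else:
        warnings.warn(
            "Custom or unknown checkpoint: alignment_heads was not set, so "
            "word-level timestamps will require setting it manually."
        )
```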
Sure! If you have a clean way of determining whether the checkpoint is 'official' or 'custom', this works!
@sanchit-gandhi

(force-pushed from 7a6e18f to 495bc9f)
@sanchit-gandhi Your point is valid: why do extra work if we are downloading generation configs from the HF Hub anyway?
@sanchit-gandhi People in downstream community projects complain that they expect tokenizer files in the fast format. I added a couple of lines here for conversion and export of the fast tokenizer as well. Only you and your colleagues can add that to the official checkpoint, though.
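A sketch of one way to do that slow-to-fast conversion (the folder name is a placeholder; loading the already-saved slow tokenizer files through `WhisperTokenizerFast` triggers the conversion, and `save_pretrained` then writes the fast `tokenizer.json`):

```python
from transformers import WhisperTokenizerFast

# Assumes the slow tokenizer was already saved to this folder by the conversion script.
fast_tokenizer = WhisperTokenizerFast.from_pretrained("whisper-large-v3-converted")
fast_tokenizer.save_pretrained("whisper-large-v3-converted")
```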
@sanchit-gandhi bump, is this good to merge?
LGTM @flyingleafe! Just one super minor update, then good to merge!
```
@@ -154,6 +201,9 @@ def convert_openai_whisper_to_tfms(checkpoint_path, pytorch_dump_folder_path):
    tie_embeds = True
    ffn_dim = state_dict["decoder.layers.0.fc1.weight"].shape[0]

    # a hacky way to properly set up the bos/eos/pad token ids in the model
    endoftext_id = 50257 if dimensions["n_vocab"] > 51865 else 50256

    config = WhisperConfig(
```
Nice! The only missing config to update here is the `decoder_start_token_id` (`endoftext_id + 1`).
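Concretely, the token-id wiring in the config would then look something like the sketch below; it is based on the diff above rather than the literal PR code, with the vocabulary size written out for illustration:

```python
from transformers import WhisperConfig

# <|endoftext|> id computed as in the diff above; <|startoftranscript|> immediately
# follows it, hence the "+ 1" for decoder_start_token_id.
n_vocab = 51866  # large-v3 vocabulary size, dimensions["n_vocab"] in the script
endoftext_id = 50257 if n_vocab > 51865 else 50256

config = WhisperConfig(
    vocab_size=n_vocab,
    bos_token_id=endoftext_id,
    eos_token_id=endoftext_id,
    pad_token_id=endoftext_id,
    decoder_start_token_id=endoftext_id + 1,
)
```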
@sanchit-gandhi done
@sanchit-gandhi Your last suggestion was addressed three days ago; let's merge if it's good to go.
Thanks for bearing with both of us 😉
What does this PR do?
Adds the ability to download and convert the fresh `large-v3` version of Whisper (https://github.com/openai/whisper/pull/1761/files). Closes #27331.

The usage of the `_download` method in `convert_openai_to_hf.py` turned out to be broken; that was fixed. I also plan to add automatic file export for the processor (feature extractor + tokenizer) today and to make sure the subtle changes in language tag tokenization are supported, hence the draft status.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.