feat: code for whisper-large-v3 #548
Changes from 7 commits: 9c378b6, 0b9ab4d, bb052ea, a477d84, 00594e6, 4023924, bf2b670, 3983fb7, 1fab6ee
@@ -1,4 +1,5 @@
 import itertools
+import json
 import logging
 import os
 import zlib
@@ -92,8 +93,8 @@ def __init__(

         Args:
           model_size_or_path: Size of the model to use (tiny, tiny.en, base, base.en,
-            small, small.en, medium, medium.en, large-v1, large-v2, or large), a path to a converted
-            model directory, or a CTranslate2-converted Whisper model ID from the Hugging Face Hub.
+            small, small.en, medium, medium.en, large-v1, large-v2, large-v3, or large), a path to a
+            converted model directory, or a CTranslate2-converted Whisper model ID from the HF Hub.
             When a size or a model ID is configured, the converted model is downloaded
             from the Hugging Face Hub.
           device: Device to use for computation ("cpu", "cuda", "auto").
@@ -113,6 +114,9 @@ def __init__(
             are saved in the standard Hugging Face cache directory.
           local_files_only: If True, avoid downloading the file and return the path to the
             local cached file if it exists.
+          feature_size: Number of mel filters to use for feature extraction. If not set,
+            the number of mel filters is inferred from the model version. The first release
+            used 80 bins, but the large-v3 model uses 128 bins.
         """
         self.logger = get_logger()
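The 80-versus-128 mel-bin distinction in the new `feature_size` docstring can be sketched as a tiny helper. The function name and the exact mapping rule are illustrative assumptions, not code from this PR; the only grounded facts are that pre-large-v3 checkpoints use 80 bins and large-v3 uses 128:

```python
def infer_feature_size(model_size: str) -> int:
    """Hypothetical helper mirroring the docstring: Whisper releases
    before large-v3 use 80 mel bins; large-v3 uses 128."""
    return 128 if model_size == "large-v3" else 80
```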
@@ -142,7 +146,25 @@ def __init__(
             "openai/whisper-tiny" + ("" if self.model.is_multilingual else ".en")
         )

-        self.feature_extractor = FeatureExtractor()
+        feature_extractor_file = os.path.join(model_path, "preprocessor_config.json")
+
+        if os.path.isfile(feature_extractor_file):
+            with open(feature_extractor_file, "r") as f:
+                config = json.load(f)
+            feat_kwargs = {
+                k: config[k]
+                for k in [
+                    "n_fft",
+                    "hop_length",
+                    "feature_size",
+                    "sampling_rate",
+                    "chunk_length",
+                ]
+                if k in config
+            }
+        else:
+            feat_kwargs = {}
+
+        self.feature_extractor = FeatureExtractor(**feat_kwargs)
         self.num_samples_per_token = self.feature_extractor.hop_length * 2
         self.frames_per_second = (
             self.feature_extractor.sampling_rate // self.feature_extractor.hop_length

Review thread on the `feature_extractor_file` line:
- Maybe move that into a specific method?
- I'll try to get to this tomorrow. Extremely limited bandwidth with the American holidays currently.

Review comment on lines +155 to +161:
- Minor remark: in this new method, could you make these parameters less hard-coded? They come from this class: https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/feature_extractor.py#L8-L12
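The reviewer's suggestion above — derive the accepted keys from `FeatureExtractor` itself instead of hard-coding the list — could look roughly like this. The `FeatureExtractor` stub (with the defaults from the class the reviewer links) and the `load_feature_kwargs` helper are illustrative sketches, not the library's actual code:

```python
import inspect
import json
import os


class FeatureExtractor:
    # Simplified stand-in for faster_whisper.feature_extractor.FeatureExtractor;
    # parameter names and defaults follow the linked class.
    def __init__(self, feature_size=80, sampling_rate=16000, hop_length=160,
                 chunk_length=30, n_fft=400):
        self.feature_size = feature_size
        self.sampling_rate = sampling_rate
        self.hop_length = hop_length
        self.chunk_length = chunk_length
        self.n_fft = n_fft


def load_feature_kwargs(model_path):
    """Read preprocessor_config.json, keeping only keys FeatureExtractor accepts."""
    config_file = os.path.join(model_path, "preprocessor_config.json")
    if not os.path.isfile(config_file):
        return {}
    with open(config_file, "r") as f:
        config = json.load(f)
    # Derive the accepted parameter names from the constructor signature
    # instead of hard-coding them, as the reviewer suggested.
    valid_keys = set(inspect.signature(FeatureExtractor.__init__).parameters) - {"self"}
    return {k: v for k, v in config.items() if k in valid_keys}
```

This way, adding a parameter to `FeatureExtractor.__init__` automatically makes it loadable from the config file, with unknown keys (e.g. `processor_class`) silently ignored.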
@@ -21,6 +21,7 @@
     "large-v1": "guillaumekln/faster-whisper-large-v1",
     "large-v2": "guillaumekln/faster-whisper-large-v2",
     "large": "guillaumekln/faster-whisper-large-v2",
+    "large-v3": "bababababooey/faster-whisper-large-v3",
 }

Review thread on the "large-v3" entry:
- Who is owning that?
- Some user posted that link on the issues. Best would be to keep the model under an official account like "systran"; maybe move all models there.
- Good point. We are waiting for the release of CTranslate2, then will push the new converted model tomorrow (with the fix from OpenNMT/CTranslate2#1546) to the Systran organization.
- FYI, several models are now available under the Systran organization: https://huggingface.co/Systran (including large-v3, converted with the latest CTranslate2 3.22.0).
- @nguyendc-systran thank you! So essentially we are just waiting for @stillmatic to make the last fixes you mentioned before merge?
- @Purfview I think you accidentally pasted the same URL.
- @AvivSham Fixed it.
- Not sure if I missed any, but in tokenizer.json there is a difference in token 50363: nospeech vs. nocaptions.
- IMHO, yes.
- Yep, the inference between openai/hf/faster_whisper is not exactly the same, but very similar. I guess I was witnessing the differences between v2 and v3.
@@ -50,7 +51,7 @@ def download_model(
     Args:
       size_or_id: Size of the model to download from https://huggingface.co/guillaumekln
         (tiny, tiny.en, base, base.en, small, small.en medium, medium.en, large-v1, large-v2,
-        large), or a CTranslate2-converted model ID from the Hugging Face Hub
+        large, large-v3), or a CTranslate2-converted model ID from the Hugging Face Hub
         (e.g. guillaumekln/faster-whisper-large-v2).
       output_dir: Directory where the model should be saved. If not set, the model is saved in
         the cache directory.
@@ -76,6 +77,7 @@ def download_model(

     allow_patterns = [
         "config.json",
+        "preprocessor_config.json",
         "model.bin",
         "tokenizer.json",
         "vocabulary.*",
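The effect of adding `preprocessor_config.json` to `allow_patterns` can be sketched with the standard-library `fnmatch` module, which has roughly the glob semantics a Hub download filter applies. The `is_allowed` helper is illustrative, not this PR's code:

```python
import fnmatch

# allow_patterns as in the hunk above, after preprocessor_config.json is added.
allow_patterns = [
    "config.json",
    "preprocessor_config.json",
    "model.bin",
    "tokenizer.json",
    "vocabulary.*",
]


def is_allowed(filename: str) -> bool:
    """Illustrative check: a repo file is fetched only if it matches a pattern."""
    return any(fnmatch.fnmatch(filename, p) for p in allow_patterns)
```

Without the new entry, `preprocessor_config.json` would never be downloaded, and the feature-extractor configuration (including the 128-bin `feature_size` for large-v3) could not be read from the model directory.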
Review comment:
- Not used anymore.