
set python tests to transformers 4.35 #1551

Merged · 4 commits into OpenNMT:master · Nov 20, 2023
Conversation

vince62s
Member

No description provided.

@vince62s
Member Author

@homink so obviously this is the transformers version that triggers the issue with wav2vec.
I'll let you investigate.

@homink
Contributor

homink commented Nov 17, 2023

Transformers version 4.35.2 still works fine for wav2vec2 testing. Can you guess what other package I should test further?

>>> import transformers
>>> transformers.__version__
'4.35.2'
>>> import torch
>>> torch.__version__
'2.0.1+cu117'
>>> import ctranslate2
>>> import os
>>> import numpy as np
>>> model_name="facebook/wav2vec2-large-robust-ft-swbd-300h"
>>> expected_transcription = [
...                     "MISTER QUILTER IS THE APOSSEL OF THE MIDDLE CLASSES AND"
...                     " WE ARE GLAD TO WELCOME HIS GOSPEL",
...                 ]
>>> converter = ctranslate2.converters.TransformersConverter(
...     model_name, load_as_float16="int8"
... )
>>> output_dir = converter.convert("ctranslate2_model")
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
>>> w2v2_model = transformers.Wav2Vec2ForCTC.from_pretrained(model_name)
>>> del w2v2_model.wav2vec2.encoder.layers
>>> del w2v2_model.wav2vec2.encoder.layer_norm
>>> torch.save(w2v2_model, output_dir + "/wav2vec2_partial.bin")
>>> w2v2_processor = transformers.Wav2Vec2Processor.from_pretrained(model_name)
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
>>> torch.save(w2v2_processor, output_dir + "/wav2vec2_processor.bin")
>>> device = "cuda" if os.environ.get("CUDA_VISIBLE_DEVICES") else "cpu"
>>> cpu_threads = int(os.environ.get("OMP_NUM_THREADS", 0))
>>> w2v2_model = torch.load(output_dir + "/wav2vec2_partial.bin").to(device)
>>> w2v2_processor = torch.load(output_dir + "/wav2vec2_processor.bin")
>>> ct2_w2v2_model = ctranslate2.models.Wav2Vec2(
...     output_dir,
...     device=device,
...     device_index=[0],
...     compute_type="int8",
...     intra_threads=cpu_threads,
...     inter_threads=1,
... )
>>> speech_array = np.load("/nobackup/hkwon2/3rdParty/CTranslate2_20231024/tests/data/audio/mr_quilter.npy")
>>> input_values = w2v2_processor(
...     speech_array,
...     padding=True,
...     return_tensors="pt",
...     sampling_rate=16000,
... ).input_values
2023-11-16 19:53:24.343000: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> with torch.no_grad():
...     extract_features = w2v2_model.wav2vec2.feature_extractor(
...         input_values.to(w2v2_model.device)
...     ).transpose(1, 2)
...     hidden_states, extract_features = w2v2_model.wav2vec2.feature_projection(
...         extract_features
...     )
...     position_embeddings = w2v2_model.wav2vec2.encoder.pos_conv_embed(
...         hidden_states
...     )
...     hidden_states = position_embeddings + hidden_states
...     # hidden_states = w2v2_model.encoder.dropout(hidden_states)
...     # Dropout(p=0.0, inplace=False) bypassed
... 
>>> if ct2_w2v2_model.device == "cuda":
...     hidden_states = hidden_states.cpu()
... else:
...     hidden_states.numpy()
... 
array([[[ 0.1693899 ,  2.2634823 , -1.2306788 , ..., -3.899944  ,
         -3.0574274 ,  6.486378  ],
        [ 1.0232238 ,  0.15435292, -1.3267676 , ..., -4.3173304 ,
          0.4245553 ,  4.386175  ],
        [ 2.9067817 ,  2.274394  , -2.0829062 , ...,  1.2998496 ,
         -0.8333913 ,  1.6946803 ],
        ...,
        [ 1.6564841 ,  1.2391592 , -3.0736432 , ...,  6.115755  ,
          5.7456884 ,  5.0317454 ],
        [ 2.7244632 ,  1.0191028 , -1.9882839 , ...,  9.378099  ,
         -0.4236964 ,  3.1421895 ],
        [ 2.2139602 ,  2.5174143 , -3.9946754 , ...,  3.1492426 ,
         -0.53378797,  5.6609497 ]]], dtype=float32)
>>> hidden_states = np.ascontiguousarray(hidden_states)
>>> hidden_states = ctranslate2.StorageView.from_array(hidden_states)
>>> to_cpu = (
...     ct2_w2v2_model.device == "cuda" and len(ct2_w2v2_model.device_index) > 1
... )
>>> ct2_output = ct2_w2v2_model.encode(
...     hidden_states,
...     to_cpu=to_cpu,
... )  # 24 x Wav2Vec2EncoderLayerStableLayerNorm processed
>>> if ct2_w2v2_model.device == "cuda":
...     hidden_states = torch.as_tensor(
...         ct2_output,
...         device=ct2_w2v2_model.device,
...     )
... else:
...     hidden_states = torch.as_tensor(
...         np.array(ct2_output),
...         dtype=torch.float32,
...         device=ct2_w2v2_model.device,
...     )
... 
>>> encoder_outputs = transformers.modeling_outputs.BaseModelOutput(
...     last_hidden_state=hidden_states,
...     hidden_states=None,
...     attentions=None,
... )
>>> hidden_states = encoder_outputs[0]
>>> outputs = transformers.modeling_outputs.Wav2Vec2BaseModelOutput(
...     last_hidden_state=hidden_states,
...     extract_features=extract_features,
...     hidden_states=encoder_outputs.hidden_states,
...     attentions=encoder_outputs.attentions,
... )
>>> hidden_states = outputs[0]
>>> # hidden_states = w2v2_model.dropout(hidden_states)
>>> # Dropout(p=0.0, inplace=False) bypassed
>>> with torch.no_grad():
...     logits = w2v2_model.lm_head(hidden_states.to(torch.float32))[0]
... 
>>> predicted_ids = torch.argmax(logits, dim=-1)
>>> transcription = w2v2_processor.decode(predicted_ids, output_word_offsets=True)
>>> assert transcription[0] == expected_transcription[0]
>>>

@vince62s
Member Author

You seem to be using torch 2.0.1.
But can you try to replicate the test script that triggers the failure?

@homink
Contributor

homink commented Nov 18, 2023

I ran python/tests/test_transformers.py but don't see any errors. I have torch 2.0.1+cu117 and CTranslate2 3.21.0. If this wav2vec2 testing is really a bother, I don't mind removing it. Let me know how we should proceed.

(base) hkwon@titan9:/nobackup/hkwon2/3rdParty/CTranslate2-3.21.0 % ls python/tests/
conftest.py  requirements.txt   test_fairseq.py  test_opennmt_py.py  test_spec.py          test_transformers.py  test_utils.py
__pycache__  requirements.txt~  test_marian.py   test_opennmt_tf.py  test_storage_view.py  test_translator.py
(base) hkwon@titan9:/nobackup/hkwon2/3rdParty/CTranslate2-3.21.0 % python python/tests/test_transformers.py
(base) hkwon@titan9:/nobackup/hkwon2/3rdParty/CTranslate2-3.21.0 % python
Python 3.9.12 (main, Apr  5 2022, 06:56:58) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctranslate2
>>> ctranslate2.__version__
'3.21.0'
>>> import torch
>>> torch.__version__
'2.0.1+cu117'
>>> 
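For what it's worth, invoking the test file directly with python (as above) likely runs nothing at all, since the suite is pytest-based (note the conftest.py in the listing), which would explain the empty output. A minimal sketch of actually exercising the wav2vec2 test with pytest (the -k selector is an assumed match for the test name):

pip install pytest
pytest python/tests/test_transformers.py -k wav2vec2 -v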

@vince62s
Member Author

vince62s commented Nov 18, 2023

You keep testing with a setup that is not the test config setup. What you need to do is use the current setup, which is shown in the detailed log of the failing test: torch==2.1.0, ctranslate2==master, transformers==4.35.
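For reference, a minimal sketch of pinning that environment (the exact transformers patch release is an assumption, and ctranslate2==master has to be built from a checkout rather than installed as a released wheel):

pip install torch==2.1.0 "transformers==4.35.2"
# ctranslate2==master: build and install the Python package from a clone of
# https://github.com/OpenNMT/CTranslate2, per the repo's build instructions;
# a released wheel such as 3.21.0 will not match master.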

@homink
Contributor

homink commented Nov 19, 2023

Thanks for the comment. Now I see what went wrong. The model saving and loading methods have changed and no longer support my script. I will see what I can do and get back to you.

@homink
Contributor

homink commented Nov 19, 2023

OK. The following changes will work. Can you apply them in your PR?

torch.save(w2v2_model, output_dir + "/wav2vec2_partial.bin")

should be

w2v2_model.save_pretrained(output_dir + "/wav2vec2_partial.bin")

w2v2_model = torch.load(output_dir + "/wav2vec2_partial.bin").to(device)

should be

w2v2_model = transformers.Wav2Vec2ForCTC.from_pretrained(output_dir + "/wav2vec2_partial.bin").to(device)
del w2v2_model.wav2vec2.encoder.layers
del w2v2_model.wav2vec2.encoder.layer_norm
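
For clarity, a sketch of the affected part of the script with both substitutions applied (names as in the script above; from_pretrained rebuilds the full architecture from the saved config, which is why the two del statements are repeated after loading):

w2v2_model = transformers.Wav2Vec2ForCTC.from_pretrained(model_name)
del w2v2_model.wav2vec2.encoder.layers
del w2v2_model.wav2vec2.encoder.layer_norm
# was: torch.save(w2v2_model, output_dir + "/wav2vec2_partial.bin")
w2v2_model.save_pretrained(output_dir + "/wav2vec2_partial.bin")

# ... converter and processor steps unchanged ...

# was: w2v2_model = torch.load(output_dir + "/wav2vec2_partial.bin").to(device)
w2v2_model = transformers.Wav2Vec2ForCTC.from_pretrained(
    output_dir + "/wav2vec2_partial.bin"
).to(device)
del w2v2_model.wav2vec2.encoder.layers
del w2v2_model.wav2vec2.encoder.layer_norm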

@vince62s changed the title from "test transformers 4.35" to "set python tests to transformers 4.35" on Nov 20, 2023
@vince62s merged commit 46f57e2 into OpenNMT:master on Nov 20, 2023