
Conversion from Eole to CTranslate2 #72

Closed
ArtanisTheOne opened this issue Jul 13, 2024 · 14 comments
Labels
enhancement New feature or request

Comments

@ArtanisTheOne

A lot of the OpenNMT-py ecosystem encourages the use of CTranslate2 downstream for efficient inference. I would really love for this to be added to the new eole.
I'm retraining some custom multilingual NMT models and am using eole to keep everything as up to date as possible.

@francoishernandez francoishernandez added the enhancement New feature or request label Aug 23, 2024
@isanvicente
Contributor

Hi! Any news on the CTranslate2 converter? I'd be happy to help if needed, though I would need some guidance. Would it differ much from the OpenNMT-py converter for eole?

@vince62s
Contributor

vince62s commented Nov 4, 2024

No, it should be similar, but reading from the safetensors file. Also, if you have the will, we'll need to add the estimator, but I'm not sure @minhthuc2502 has done the layer part already.

@isanvicente
Contributor

isanvicente commented Nov 12, 2024

Hi!

Sorry for taking so long to answer. I've been trying to implement this for the past few days. Code here: https://github.com/isanvicente/CTranslate2/blob/master/python/ctranslate2/converters/eole.py

So far, I've mapped the config and layers of the old ONMT models to the new eole format (starting from _get_model_spec_seq2seq). The conversion runs to completion, but when translating with the model all I get is gibberish. My guess is that I either messed up the layer mapping at some point (the decoder layers, most probably) or some config parameter is not parsed properly. Could you take a look and see if you can find what I missed? You surely know better what changes were made from ONMT to eole.

Thanks!

@vince62s
Contributor

Many options have changed.
I suggest you first add a print here: https://github.com/isanvicente/CTranslate2/blob/master/python/ctranslate2/converters/eole.py#L208
to check the content of checkpoint["opt"] and look at the options.

For instance, all of these: https://github.com/isanvicente/CTranslate2/blob/master/python/ctranslate2/converters/eole.py#L24-L29
are set differently now.
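One way to follow this suggestion is to flatten and print the checkpoint's config dict so old ONMT option names can be compared against the new eole ones side by side. A minimal stdlib-only sketch (the nested keys shown are illustrative, not eole's exact schema):

```python
import json

def dump_options(config, prefix=""):
    """Recursively flatten a nested option dict into dotted names,
    so renamed/moved options are easy to spot."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(dump_options(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

# Illustrative fragment only -- in the converter the real dict would come
# from the checkpoint (e.g. checkpoint["opt"] or the model's config.json).
example = {"model": {"hidden_size": 512, "layers": 6},
           "training": {"compute_dtype": "fp16"}}
print(json.dumps(dump_options(example), indent=2))
```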

@vince62s
Contributor

vince62s commented Dec 16, 2024

@isanvicente I just pushed a PR here: OpenNMT/CTranslate2#1832

I tested it with EuroLLM-9B-instruct, seems to run fine.
I have not tested a seq2seq model yet; if you get a chance, you're welcome to try.
It may not work out of the box, but the logic is almost there.

@isanvicente
Contributor

isanvicente commented Dec 17, 2024

Hi @vince62s,

Thanks! I tried one of the seq2seq models, but I got errors similar to those from my first tries. This is what the layer names look like in my eole model:

['encoder.transformer_layers.0.input_layernorm.weight', 'encoder.transformer_layers.0.input_layernorm.bias', 'encoder.transformer_layers.0.self_attn.linear_keys.weight', 'encoder.transformer_layers.0.self_attn.linear_values.weight', 'encoder.transformer_layers.0.self_attn.linear_query.weight', 'encoder.transformer_layers.0.self_attn.final_linear.weight', 'encoder.transformer_layers.0.post_attention_layernorm.weight', 'encoder.transformer_layers.0.post_attention_layernorm.bias', 'encoder.transformer_layers.0.mlp.gate_up_proj.weight', 'encoder.transformer_layers.0.mlp.down_proj.weight', 'encoder.transformer_layers.1.input_layernorm.weight', 'encoder.transformer_layers.1.input_layernorm.bias', 'encoder.transformer_layers.1.self_attn.linear_keys.weight', 'encoder.transformer_layers.1.self_attn.linear_values.weight', 'encoder.transformer_layers.1.self_attn.linear_query.weight', 'encoder.transformer_layers.1.self_attn.final_linear.weight', 'encoder.transformer_layers.1.post_attention_layernorm.weight', 'encoder.transformer_layers.1.post_attention_layernorm.bias', 'encoder.transformer_layers.1.mlp.gate_up_proj.weight', 'encoder.transformer_layers.1.mlp.down_proj.weight', 'encoder.transformer_layers.2.input_layernorm.weight', 'encoder.transformer_layers.2.input_layernorm.bias', 'encoder.transformer_layers.2.self_attn.linear_keys.weight', 'encoder.transformer_layers.2.self_attn.linear_values.weight', 'encoder.transformer_layers.2.self_attn.linear_query.weight', 'encoder.transformer_layers.2.self_attn.final_linear.weight', 'encoder.transformer_layers.2.post_attention_layernorm.weight', 'encoder.transformer_layers.2.post_attention_layernorm.bias', 'encoder.transformer_layers.2.mlp.gate_up_proj.weight', 'encoder.transformer_layers.2.mlp.down_proj.weight', 'encoder.transformer_layers.3.input_layernorm.weight', 'encoder.transformer_layers.3.input_layernorm.bias', 'encoder.transformer_layers.3.self_attn.linear_keys.weight', 
'encoder.transformer_layers.3.self_attn.linear_values.weight', 'encoder.transformer_layers.3.self_attn.linear_query.weight', 'encoder.transformer_layers.3.self_attn.final_linear.weight', 'encoder.transformer_layers.3.post_attention_layernorm.weight', 'encoder.transformer_layers.3.post_attention_layernorm.bias', 'encoder.transformer_layers.3.mlp.gate_up_proj.weight', 'encoder.transformer_layers.3.mlp.down_proj.weight', 'encoder.transformer_layers.4.input_layernorm.weight', 'encoder.transformer_layers.4.input_layernorm.bias', 'encoder.transformer_layers.4.self_attn.linear_keys.weight', 'encoder.transformer_layers.4.self_attn.linear_values.weight', 'encoder.transformer_layers.4.self_attn.linear_query.weight', 'encoder.transformer_layers.4.self_attn.final_linear.weight', 'encoder.transformer_layers.4.post_attention_layernorm.weight', 'encoder.transformer_layers.4.post_attention_layernorm.bias', 'encoder.transformer_layers.4.mlp.gate_up_proj.weight', 'encoder.transformer_layers.4.mlp.down_proj.weight', 'encoder.transformer_layers.5.input_layernorm.weight', 'encoder.transformer_layers.5.input_layernorm.bias', 'encoder.transformer_layers.5.self_attn.linear_keys.weight', 'encoder.transformer_layers.5.self_attn.linear_values.weight', 'encoder.transformer_layers.5.self_attn.linear_query.weight', 'encoder.transformer_layers.5.self_attn.final_linear.weight', 'encoder.transformer_layers.5.post_attention_layernorm.weight', 'encoder.transformer_layers.5.post_attention_layernorm.bias', 'encoder.transformer_layers.5.mlp.gate_up_proj.weight', 'encoder.transformer_layers.5.mlp.down_proj.weight', 'encoder.layer_norm.weight', 'encoder.layer_norm.bias', 'decoder.transformer_layers.0.input_layernorm.weight', 'decoder.transformer_layers.0.input_layernorm.bias', 'decoder.transformer_layers.0.self_attn.linear_keys.weight', 'decoder.transformer_layers.0.self_attn.linear_values.weight', 'decoder.transformer_layers.0.self_attn.linear_query.weight', 
'decoder.transformer_layers.0.self_attn.final_linear.weight', 'decoder.transformer_layers.0.post_attention_layernorm.weight', 'decoder.transformer_layers.0.post_attention_layernorm.bias', 'decoder.transformer_layers.0.mlp.gate_up_proj.weight', 'decoder.transformer_layers.0.mlp.down_proj.weight', 'decoder.transformer_layers.0.precontext_layernorm.weight', 'decoder.transformer_layers.0.precontext_layernorm.bias', 'decoder.transformer_layers.0.context_attn.linear_keys.weight', 'decoder.transformer_layers.0.context_attn.linear_values.weight', 'decoder.transformer_layers.0.context_attn.linear_query.weight', 'decoder.transformer_layers.0.context_attn.final_linear.weight', 'decoder.transformer_layers.1.input_layernorm.weight', 'decoder.transformer_layers.1.input_layernorm.bias', 'decoder.transformer_layers.1.self_attn.linear_keys.weight', 'decoder.transformer_layers.1.self_attn.linear_values.weight', 'decoder.transformer_layers.1.self_attn.linear_query.weight', 'decoder.transformer_layers.1.self_attn.final_linear.weight', 'decoder.transformer_layers.1.post_attention_layernorm.weight', 'decoder.transformer_layers.1.post_attention_layernorm.bias', 'decoder.transformer_layers.1.mlp.gate_up_proj.weight', 'decoder.transformer_layers.1.mlp.down_proj.weight', 'decoder.transformer_layers.1.precontext_layernorm.weight', 'decoder.transformer_layers.1.precontext_layernorm.bias', 'decoder.transformer_layers.1.context_attn.linear_keys.weight', 'decoder.transformer_layers.1.context_attn.linear_values.weight', 'decoder.transformer_layers.1.context_attn.linear_query.weight', 'decoder.transformer_layers.1.context_attn.final_linear.weight', 'decoder.transformer_layers.2.input_layernorm.weight', 'decoder.transformer_layers.2.input_layernorm.bias', 'decoder.transformer_layers.2.self_attn.linear_keys.weight', 'decoder.transformer_layers.2.self_attn.linear_values.weight', 'decoder.transformer_layers.2.self_attn.linear_query.weight', 'decoder.transformer_layers.2.self_attn.final_linear.weight', 
'decoder.transformer_layers.2.post_attention_layernorm.weight', 'decoder.transformer_layers.2.post_attention_layernorm.bias', 'decoder.transformer_layers.2.mlp.gate_up_proj.weight', 'decoder.transformer_layers.2.mlp.down_proj.weight', 'decoder.transformer_layers.2.precontext_layernorm.weight', 'decoder.transformer_layers.2.precontext_layernorm.bias', 'decoder.transformer_layers.2.context_attn.linear_keys.weight', 'decoder.transformer_layers.2.context_attn.linear_values.weight', 'decoder.transformer_layers.2.context_attn.linear_query.weight', 'decoder.transformer_layers.2.context_attn.final_linear.weight', 'decoder.transformer_layers.3.input_layernorm.weight', 'decoder.transformer_layers.3.input_layernorm.bias', 'decoder.transformer_layers.3.self_attn.linear_keys.weight', 'decoder.transformer_layers.3.self_attn.linear_values.weight', 'decoder.transformer_layers.3.self_attn.linear_query.weight', 'decoder.transformer_layers.3.self_attn.final_linear.weight', 'decoder.transformer_layers.3.post_attention_layernorm.weight', 'decoder.transformer_layers.3.post_attention_layernorm.bias', 'decoder.transformer_layers.3.mlp.gate_up_proj.weight', 'decoder.transformer_layers.3.mlp.down_proj.weight', 'decoder.transformer_layers.3.precontext_layernorm.weight', 'decoder.transformer_layers.3.precontext_layernorm.bias', 'decoder.transformer_layers.3.context_attn.linear_keys.weight', 'decoder.transformer_layers.3.context_attn.linear_values.weight', 'decoder.transformer_layers.3.context_attn.linear_query.weight', 'decoder.transformer_layers.3.context_attn.final_linear.weight', 'decoder.transformer_layers.4.input_layernorm.weight', 'decoder.transformer_layers.4.input_layernorm.bias', 'decoder.transformer_layers.4.self_attn.linear_keys.weight', 'decoder.transformer_layers.4.self_attn.linear_values.weight', 'decoder.transformer_layers.4.self_attn.linear_query.weight', 'decoder.transformer_layers.4.self_attn.final_linear.weight', 
'decoder.transformer_layers.4.post_attention_layernorm.weight', 'decoder.transformer_layers.4.post_attention_layernorm.bias', 'decoder.transformer_layers.4.mlp.gate_up_proj.weight', 'decoder.transformer_layers.4.mlp.down_proj.weight', 'decoder.transformer_layers.4.precontext_layernorm.weight', 'decoder.transformer_layers.4.precontext_layernorm.bias', 'decoder.transformer_layers.4.context_attn.linear_keys.weight', 'decoder.transformer_layers.4.context_attn.linear_values.weight', 'decoder.transformer_layers.4.context_attn.linear_query.weight', 'decoder.transformer_layers.4.context_attn.final_linear.weight', 'decoder.transformer_layers.5.input_layernorm.weight', 'decoder.transformer_layers.5.input_layernorm.bias', 'decoder.transformer_layers.5.self_attn.linear_keys.weight', 'decoder.transformer_layers.5.self_attn.linear_values.weight', 'decoder.transformer_layers.5.self_attn.linear_query.weight', 'decoder.transformer_layers.5.self_attn.final_linear.weight', 'decoder.transformer_layers.5.post_attention_layernorm.weight', 'decoder.transformer_layers.5.post_attention_layernorm.bias', 'decoder.transformer_layers.5.mlp.gate_up_proj.weight', 'decoder.transformer_layers.5.mlp.down_proj.weight', 'decoder.transformer_layers.5.precontext_layernorm.weight', 'decoder.transformer_layers.5.precontext_layernorm.bias', 'decoder.transformer_layers.5.context_attn.linear_keys.weight', 'decoder.transformer_layers.5.context_attn.linear_values.weight', 'decoder.transformer_layers.5.context_attn.linear_query.weight', 'decoder.transformer_layers.5.context_attn.final_linear.weight', 'decoder.layer_norm.weight', 'decoder.layer_norm.bias', 'src_emb.embeddings.weight', 'src_emb.pe.pe', 'tgt_emb.embeddings.weight', 'tgt_emb.pe.pe', 'generator.weight', 'generator.bias']
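As a sanity check on a dump like the one above, the tensor names can be grouped by side and layer index to recover the architecture before attempting any mapping. A small stdlib-only sketch using a few names from the list:

```python
import re
from collections import defaultdict

def layer_counts(names):
    """Count distinct transformer layer indices per side (encoder/decoder)
    from eole tensor names like 'encoder.transformer_layers.3.mlp...'."""
    layers = defaultdict(set)
    for name in names:
        m = re.match(r"(encoder|decoder)\.transformer_layers\.(\d+)\.", name)
        if m:
            layers[m.group(1)].add(int(m.group(2)))
    return {side: len(idxs) for side, idxs in layers.items()}

# A handful of names taken from the dump above.
names = [
    "encoder.transformer_layers.0.input_layernorm.weight",
    "encoder.transformer_layers.5.mlp.down_proj.weight",
    "decoder.transformer_layers.0.context_attn.linear_query.weight",
    "decoder.transformer_layers.5.self_attn.final_linear.weight",
]
print(layer_counts(names))  # {'encoder': 2, 'decoder': 2}
```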

What I've tried so far:

  • The conversion complained about a few options from the config (TransformerSpec.from_config()) that are present in TransformerDecoderSpec but not in TransformerEncoderSpec: specifically alibi, rotary_dim, rotary_interleave, num_heads_kv and sliding_window.

  • The converter complained about not finding 'encoder.embeddings.pe.pe', so I changed the layer names in set_transformer_encoder.

I've had to make a few more changes to the encoder layer names, but I'm still not there. The last error is:

Traceback (most recent call last):
  File "/home/inaki/venv_ct2_test/bin/ct2-eole-converter", line 8, in <module>
    sys.exit(main())
  File "/home/inaki/venv_ct2_test/lib/python3.10/site-packages/ctranslate2/converters/eole_ct2.py", line 344, in main
    args = parser.parse_args()
  File "/home/inaki/venv_ct2_test/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
    return self.convert(
  File "/home/inaki/venv_ct2_test/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 97, in convert
    model_spec.validate()
  File "/home/inaki/venv_ct2_test/lib/python3.10/site-packages/ctranslate2/specs/model_spec.py", line 513, in validate
    super().validate()
  File "/home/inaki/venv_ct2_test/lib/python3.10/site-packages/ctranslate2/specs/model_spec.py", line 138, in validate
    raise ValueError(
ValueError: Some required model attributes are not set:

encoder/layer_0/ffn/layer_norm/gamma
encoder/layer_0/ffn/layer_norm/beta
encoder/layer_1/ffn/layer_norm/gamma
encoder/layer_1/ffn/layer_norm/beta
encoder/layer_2/ffn/layer_norm/gamma
encoder/layer_2/ffn/layer_norm/beta
encoder/layer_3/ffn/layer_norm/gamma
encoder/layer_3/ffn/layer_norm/beta
encoder/layer_4/ffn/layer_norm/gamma
encoder/layer_4/ffn/layer_norm/beta
encoder/layer_5/ffn/layer_norm/gamma
encoder/layer_5/ffn/layer_norm/beta
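The missing gamma/beta attributes reflect a naming split: eole checkpoints store layer norms as .weight/.bias, while CT2 specs expect gamma/beta, and eole's post_attention_layernorm appears to play the role of CT2's ffn layer norm in a pre-norm layer (that correspondence is my assumption here, not something stated in the converter). A minimal sketch of the renaming, with a plain dict standing in for the CT2 LayerNormSpec object:

```python
def set_layer_norm(spec, variables, scope):
    """Map eole's <scope>.weight/.bias onto CT2's gamma/beta slots.
    `spec` is a plain dict standing in for a CT2 LayerNormSpec."""
    spec["gamma"] = variables[scope + ".weight"]
    spec["beta"] = variables[scope + ".bias"]

# Toy checkpoint variables, named like the eole dump above.
variables = {
    "encoder.transformer_layers.0.post_attention_layernorm.weight": [1.0],
    "encoder.transformer_layers.0.post_attention_layernorm.bias": [0.0],
}
ffn_layer_norm = {}
# Assumption: eole's post_attention_layernorm fills CT2's ffn/layer_norm slot.
set_layer_norm(ffn_layer_norm, variables,
               "encoder.transformer_layers.0.post_attention_layernorm")
print(ffn_layer_norm)  # {'gamma': [1.0], 'beta': [0.0]}
```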

I'll document it and try to get back to you as soon as possible.

@vince62s
Contributor

vince62s commented Dec 17, 2024

I'll fix those.
@isanvicente Can you try again?

I would be interested in the speed you get, because with a 9B LLM (EuroLLM-9B-instruct) I am getting the same speed with eole vs CT2.

@isanvicente
Contributor

Yes! Got it working! I had to make two minor changes:

  1. Line 208: the layer scope must be "encoder.transformer_layers.%d" instead of "encoder.transformer.%d".

  2. Line 235: spec.embeddings is a list for the encoder (this comes from ONMT, I assume, since ONMT had the possibility of adding feature embeddings; eole does not). I added the following code to fix the issue:

+    embeddings_specs = spec.embeddings
+    ## encoder embeddings are stored in a list(onmt legacy), ONMT version could contain various embedding sources, not eole 
+    if isinstance(embeddings_specs, list):
+        embeddings_specs = embeddings_specs[0]
+    set_embeddings(embeddings_specs, variables, "%s.embeddings" % scope)

The whole diff looks like this:

diff --git a/python/ctranslate2/converters/eole_ct2.py b/python/ctranslate2/converters/eole_ct2.py
index 9b4c2fb3..68ddd628 100644
--- a/python/ctranslate2/converters/eole_ct2.py
+++ b/python/ctranslate2/converters/eole_ct2.py
@@ -205,7 +205,7 @@ def set_transformer_encoder(spec, variables):
     set_input_layers(spec, variables, "src_emb")
     set_layer_norm(spec.layer_norm, variables, "encoder.layer_norm")
     for i, layer in enumerate(spec.layer):
-        set_transformer_encoder_layer(layer, variables, "encoder.transformer.%d" % i)
+        set_transformer_encoder_layer(layer, variables, "encoder.transformer_layers.%d" % i)
 
 
 def set_transformer_decoder(spec, variables, with_encoder_attention=True):
@@ -232,7 +232,11 @@ def set_input_layers(spec, variables, scope):
     else:
         spec.scale_embeddings = False
 
-    set_embeddings(spec.embeddings, variables, "%s.embeddings" % scope)
+    embeddings_specs = spec.embeddings
+    ## encoder embeddings are stored in a list(onmt legacy), ONMT version could contain various embedding sources, not eole 
+    if isinstance(embeddings_specs, list):
+        embeddings_specs = embeddings_specs[0]
+    set_embeddings(embeddings_specs, variables, "%s.embeddings" % scope)

I'll come back later today with speed results. I need to tweak the inference parameters.

cheers!

@isanvicente
Contributor

isanvicente commented Dec 18, 2024

Hi again @vince62s ,

I tested the speed of a Basque→French model on an NVIDIA RTX A5000 GPU. The compute type is float16 (for training, conversion to CT2, and all inferences).

Inference parameters:

batch-size=64
max-length=300

CT2 is giving me roughly 2x speed. The table below shows results in seconds for a 5000-sentence file.

beam-size   ct2       eole
5           13.38 s   23.8 s
3           9.2 s     18.3 s
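For reference, the speedups implied by the table can be computed directly (a quick check of the 2x figure):

```python
# beam size -> (ct2 seconds, eole seconds), taken from the table above
timings = {5: (13.38, 23.8), 3: (9.2, 18.3)}
for beam, (ct2_s, eole_s) in timings.items():
    print(f"beam {beam}: {eole_s / ct2_s:.2f}x faster with CT2")
# beam 5: 1.78x faster with CT2
# beam 3: 1.99x faster with CT2
```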

I'm checking the actual outputs, because the translations are not 100% identical (very close, but I'm seeing a few differences here and there).

@vince62s
Contributor

Can you fix your stats table? Can you post your inference config?

@isanvicente
Contributor

isanvicente commented Dec 18, 2024

My inference config only contains transforms (eole_inference_config.yaml):

transforms: [onmt_tokenize]

transforms_configs:
    onmt_tokenize:
        src_subword_type: bpe
        tgt_subword_type: bpe
        src_subword_model: /mnt/nfs/NMT/eu-fr/v1.0/codes.eu-fr.bi
        tgt_subword_model: /mnt/nfs/NMT/eu-fr/v1.0/codes.eu-fr.bi
        src_onmttok_kwargs: {"mode": "conservative", "joiner": "■", "joiner_annotate": true, "case_markup": true, "support_prior_joiners": true}
        tgt_onmttok_kwargs: {"mode": "conservative", "joiner": "■", "joiner_annotate": true, "case_markup": true, "support_prior_joiners": true}
    filtertoolong:
        src_seq_length: 150
        tgt_seq_length: 150

I execute inference with the following command:

 $eole predict -model_path ~/NMT/eu-fr/EXP_eufr_back41bicln09_fulldev/models/step_670000 -src test_5000.eu.txt -output test_5000.eufr-eole-beam3.out -world_size 1 -gpu_ranks 0 --max_length 300 --batch_size 64 --beam_size 3 --config ~/NMT/eu-fr/test.yaml --report_time

CT2 model inference is done with a Python script. I can't attach the script, so here is the relevant code:

import os

import ctranslate2
import pyonmttok

# `args` comes from argparse elsewhere in the script; the relevant fields are
# model, bpe_model, root_dir, input_file, output_file and batch_size.
translator = ctranslate2.Translator(
    args.model, device="cuda", device_index=[0], compute_type="bfloat16"
)

tokenizer = pyonmttok.Tokenizer(
    bpe_model_path=args.bpe_model,
    mode="conservative",
    joiner="■",
    joiner_annotate=True,
    case_markup=True,
    support_prior_joiners=True)

def tokenize_fn(text):
    return tokenizer.tokenize(text)[0]

def detokenize_fn(tokens):
    return tokenizer.detokenize(tokens)

print(translator.translate_file(
    os.path.join(args.root_dir, args.input_file),
    os.path.join(args.root_dir, args.output_file),
    max_batch_size=args.batch_size, beam_size=3, num_hypotheses=1,
    max_decoding_length=300,
    source_tokenize_fn=tokenize_fn,
    target_detokenize_fn=detokenize_fn,
))

@vince62s
Contributor

vince62s commented Dec 18, 2024

With my latest commit you should be able to run the same "predict" command line, but with an extra flag -engine, which defaults to "eole" and can take "ct2".
Let me know if this works for you, and report the speed you get.

BUT: the CTranslate2 model must be in a "ctranslate2" subfolder of the model_path given in the config file / command line.
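Assuming the convention described above, the on-disk layout would look roughly like this (step_670000 is the model_path from the earlier command line; file names are illustrative):

```
step_670000/                # model_path from the config / command line
├── config.json
├── model.00.safetensors
└── ctranslate2/
    └── model.bin           # converted CTranslate2 model goes here
```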

@isanvicente
Contributor

isanvicente commented Dec 19, 2024

Hi @vince62s,

this took longer than expected, because the latest commit requires PyTorch >2.5 (I had 2.3.1), and I had to put together a new virtualenv.

So far I could not make the "predict" command run with ct2. First, predict with ct2 is hardcoded to "decoder"-type models, so I got: RuntimeError: This model cannot be used as a sequence generator.

When I tried to set the model type to "encoder_decoder", it complained about not finding model.bin, so I gave it the ct2 path directly, but got stuck with a pydantic config error:

  File "/home/inaki/eole-latest/eole/bin/run/__init__.py", line 42, in build_config
    config = cls.config_class(**config_dict)
  File "/home/inaki/venv_ct2_test/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for PredictConfig
  Unable to extract tag using discriminator 'architecture' [type=union_tag_not_found, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/union_tag_not_found

I can't dig deeper right now. I'll check it later.

@vince62s
Contributor

My fault, will push a fix.
