update nllb recipe and readme
francoishernandez committed Jan 31, 2025
1 parent 3fd59e6 commit f3de9cd
Showing 3 changed files with 47 additions and 6 deletions.
24 changes: 23 additions & 1 deletion recipes/nllb/README.md
@@ -2,12 +2,34 @@

## Conversion

### 1. SentencePiece with OpenNMT Tokenizer

```bash
eole convert HF --model_dir facebook/nllb-200-1.3B --output ./nllb-1.3b --token $HF_TOKEN --tokenizer onmt
```

### 2. HuggingFace Tokenizer

```bash
eole convert HF --model_dir facebook/nllb-200-1.3B --output ./nllb-1.3b --token $HF_TOKEN
```


## Inference

```bash
echo "What is the weather like in Tahiti?" > test.en
```


### 1. SentencePiece with OpenNMT Tokenizer

```bash
eole predict -c inference-pyonmttok.yaml
```

### 2. HuggingFace Tokenizer

```bash
eole predict -c inference-hf.yaml
```
@@ -1,13 +1,11 @@
 model_path: "nllb-1.3b"
-transforms: ["sentencepiece", "prefix"]
+transforms: ["prefix", "huggingface_tokenize"]
 transforms_configs:
   prefix:
     src_prefix: "</s> eng_Latn"
     tgt_prefix: "deu_Latn"
-  sentencepiece:
-    src_subword_model: ./flores200_sacrebleu_tokenizer_spm.model
-    tgt_subword_model: ./flores200_sacrebleu_tokenizer_spm.model
-
+  huggingface_tokenize:
+    huggingface_model: facebook/nllb-200-1.3B
 
 tgt_file_prefix: true
 
21 changes: 21 additions & 0 deletions recipes/nllb/inference-pyonmttok.yaml
@@ -0,0 +1,21 @@
model_path: "nllb-1.3b"
# transforms: ["sentencepiece", "prefix"]
transforms: ["prefix", "huggingface_tokenize"]
transforms_configs:
  prefix:
    src_prefix: "</s> eng_Latn"
    tgt_prefix: "deu_Latn"
  # sentencepiece:
  #   src_subword_model: ./flores200_sacrebleu_tokenizer_spm.model
  #   tgt_subword_model: ./flores200_sacrebleu_tokenizer_spm.model
  huggingface_tokenize:
    huggingface_model: facebook/nllb-200-1.3B

tgt_file_prefix: true

gpu_ranks: [0]
world_size: 1
beam_size: 5

src: test.en
output: test.de
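
As a hedged illustration of the `prefix` transform configured in these files, the shell sketch below does not call eole at all. It only approximates what the source side would look like for the README's `test.en` input, assuming the transform prepends `src_prefix` to each raw source line with a single separating space before `huggingface_tokenize` runs. The output file name `test.prefixed.en` is hypothetical, and eole's actual transform may operate on token lists rather than raw strings.

```bash
# Hypothetical sketch: approximates the assumed effect of the "prefix"
# transform on the source side. It does not invoke eole.
set -eu

# Copied from transforms_configs.prefix.src_prefix in the configs above.
src_prefix="</s> eng_Latn"

# Same input file the recipe's README creates.
echo "What is the weather like in Tahiti?" > test.en

# Prepend the source prefix, separated by one space, to every input line.
while IFS= read -r line; do
  printf '%s %s\n' "$src_prefix" "$line"
done < test.en > test.prefixed.en

cat test.prefixed.en
```

Under that assumption, the single input line becomes `</s> eng_Latn What is the weather like in Tahiti?`, which is then handed to the tokenizer.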
