Add Qwen2.5 results and update README
ikergarcia1996 committed Feb 4, 2025
1 parent 894d2a1 commit 1e24e35
Showing 32 changed files with 254,386 additions and 112 deletions.
67 changes: 59 additions & 8 deletions README.md
@@ -110,48 +110,99 @@ outlines
pydantic
bitsandbytes
jinja2
seqeval
```

To reproduce our environment, install the pinned dependencies from the `requirements.txt` file:

```bash
pip install -r requirements.txt
```


You should unzip the `.zip` file in [data/](data/).
The expected data structure is
```
data/
  diann_2023/
    DIANN_2023_T1_en.json
    ...
  dipromats_2023/
    ...
  exist_2022/
    ...
  exist_2023/
    ...
  sqac_squad_2024/
    ...
```
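
Before running anything, it can help to confirm that the archive was unzipped into the expected layout. The following is a minimal sketch, not part of the repository: the function name `missing_dataset_dirs` is hypothetical, and it only checks the top-level directory names listed above, not the files inside them.

```python
from pathlib import Path

# Dataset directories named in the expected structure above.
EXPECTED_DIRS = [
    "diann_2023",
    "dipromats_2023",
    "exist_2022",
    "exist_2023",
    "sqac_squad_2024",
]


def missing_dataset_dirs(data_root: str = "data") -> list[str]:
    """Return the expected dataset directories that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories:", ", ".join(missing))
    else:
        print("All expected dataset directories are present.")
```

Run it from the repository root so that the relative `data` path resolves correctly.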

## Models
We have trained three models of different sizes:
- HiTZ/Qwen2.5-14B-Instruct_ODESIA: https://huggingface.co/HiTZ/Qwen2.5-14B-Instruct_ODESIA
- HiTZ/Hermes-3-Llama-3.1-8B_ODESIA: https://huggingface.co/HiTZ/Hermes-3-Llama-3.1-8B_ODESIA
- HiTZ/gemma-2b-it_ODESIA: https://huggingface.co/HiTZ/gemma-2b-it_ODESIA

## Run Evaluation/Inference

You can evaluate any model on the development set with the following command:

```bash
python3 -m src.evaluate --tasks all --model_name HiTZ/Hermes-3-Llama-3.1-8B_ODESIA --output_dir results/finetune/Hermes-3-Llama-3.1-8B_ODESIA
torchrun --standalone --nproc_per_node=1 src/evaluate.py --tasks all --model_name HiTZ/Qwen2.5-14B-Instruct_ODESIA --output_dir results/finetune/Qwen2.5-14B-Instruct

```

To reproduce our leaderboard results, you can run inference on the test sets using the following command. The resulting output files are ready to be submitted to the ODESIA challenge:

```bash
python3 -m src.inference --tasks all --model_name HiTZ/Hermes-3-Llama-3.1-8B_ODESIA --output_dir results/finetune/Hermes-3-Llama-3.1-8B_ODESIA
torchrun --standalone --nproc_per_node=1 src/inference.py --tasks all --model_name HiTZ/Qwen2.5-14B-Instruct_ODESIA --output_dir results/finetune/Qwen2.5-14B-Instruct
```

You can also run inference on selected tasks:
```bash
torchrun --standalone --nproc_per_node=1 src/inference.py \
--tasks \
exist_2022_t1_es \
exist_2022_t2_es \
exist_2023_t1_es \
exist_2023_t2_es \
exist_2023_t3_es \
dipromats_2023_t1_es \
dipromats_2023_t2_es \
dipromats_2023_t3_es \
diann_2023_t1_es \
squad_2024_t1_es \
--model_name HiTZ/Qwen2.5-14B-Instruct_ODESIA \
--output_dir results/finetune/Qwen2.5-14B-Instruct
```
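
If you launch inference for many task subsets, assembling the command in a script avoids copy-paste mistakes. The sketch below is a hypothetical helper, not part of the repository; it only reuses the flags that appear in the commands above (`--tasks`, `--model_name`, `--output_dir`, `--quantization`) and prints the command rather than running it.

```python
import shlex


def build_inference_command(tasks, model_name, output_dir, quantization=False):
    """Assemble the torchrun invocation shown above as an argument list."""
    cmd = ["torchrun", "--standalone", "--nproc_per_node=1", "src/inference.py"]
    if quantization:
        cmd.append("--quantization")
    cmd += ["--tasks", *tasks, "--model_name", model_name, "--output_dir", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_inference_command(
        tasks=["exist_2022_t1_es", "exist_2022_t2_es"],
        model_name="HiTZ/Qwen2.5-14B-Instruct_ODESIA",
        output_dir="results/finetune/Qwen2.5-14B-Instruct",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The returned list can also be passed directly to `subprocess.run(cmd, check=True)` if you prefer to launch the job from Python.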


> Warning: The test sets do not contain labels. To evaluate your predictions, submit them to the ODESIA leaderboard [https://leaderboard.odesia.uned.es/leaderboard/challenge](https://leaderboard.odesia.uned.es/leaderboard/challenge) or use the PyEvALL library [https://github.com/UNEDLENAR/PyEvALL/tree/main](https://github.com/UNEDLENAR/PyEvALL/tree/main).
>
> Warning: We randomly sample few-shot examples from the train split for every input. These few-shot examples vary with each evaluation run, so the evaluation results may change slightly each time you run an evaluation.
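
To see why scores move between runs, consider the following illustrative sketch. It is not the repository's sampling code: the helper `sample_few_shot`, the placeholder train split, and the use of a seed are all assumptions made for the example. Whether the scripts expose a seed option is not stated here.

```python
import random


def sample_few_shot(train_examples, k, rng):
    """Draw k distinct few-shot examples from the train split."""
    return rng.sample(train_examples, k)


# Placeholder data standing in for a real train split.
train_split = [f"train example {i}" for i in range(100)]

# An unseeded generator: the selected examples usually differ from run to run,
# so the prompt (and potentially the prediction) differs too.
print(sample_few_shot(train_split, 3, random.Random()))

# Two generators built from the same seed draw exactly the same examples.
run_a = sample_few_shot(train_split, 3, random.Random(0))
run_b = sample_few_shot(train_split, 3, random.Random(0))
print(run_a == run_b)  # True
```

Because each input gets its own random prompt, small run-to-run differences in the reported metrics are expected and do not by themselves indicate a problem.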

### 4-bit quantization
If you do not have enough VRAM to run a model, you can use 4-bit quantization by adding the `--quantization` flag to the previous commands. Example:


```bash
python3 -m src.evaluate --tasks all --model_name meta-llama/Meta-Llama-3-70B-Instruct --output_dir results/zero-shot/Llama-3-70B-Instruct --quantization
torchrun --standalone --nproc_per_node=1 src/inference.py --quantization --tasks all --model_name HiTZ/Qwen2.5-14B-Instruct_ODESIA --output_dir results/finetune/Qwen2.5-14B-Instruct
```
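
As a rough guide to when `--quantization` is worth using, the arithmetic below estimates the memory needed just to hold the weights. It is a back-of-the-envelope sketch under stated assumptions: parameter counts are the nominal sizes from the model names, unquantized weights are assumed to be 16-bit, and activations, the KV cache, and quantization overhead are ignored, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


# Nominal sizes taken from the model names; real parameter counts differ somewhat.
NOMINAL_PARAMS = {"14B": 14e9, "8B": 8e9, "2B": 2e9}

for name, n_params in NOMINAL_PARAMS.items():
    full = weight_memory_gib(n_params, 16)  # assumed 16-bit weights
    quant = weight_memory_gib(n_params, 4)  # 4-bit quantized weights
    print(f"{name}: ~{full:.1f} GiB at 16-bit, ~{quant:.1f} GiB at 4-bit")
```

For the 14B model this works out to roughly 26 GiB of weights at 16-bit versus about 6.5 GiB at 4-bit, which is the difference between needing a large data-center GPU and fitting on a much smaller card.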
> Warning: We randomly sample few-shot examples from the train split for every input. These few-shot examples vary with each evaluation run, so the evaluation results may change slightly each time you run an evaluation.
>
> Warning: Quantization affects model performance. Expect lower scores when running the model with 4-bit quantization.

## Run Training

To finetune a model, you first need to define a `Training config`. Config examples for Llama 3.1 and Gemma using full fine-tuning and LoRA are available in the [train_configs/](train_configs/) directory. Full fine-tuning achieves slightly better results but requires a lot of VRAM (we use 4x A100 80GB for the 8B model and 8x A100 80GB for the 14B model). LoRA uses much less VRAM and supports model quantization, so it can be run on a single GPU.

We use DeepSpeed to split the model across 4x A100 80GB GPUs. You can reproduce our fine-tuning results with the following command:

```bash
export PYTHONPATH="$PYTHONPATH:$PWD"
accelerate launch --config_file train_configs/deepspeed.json src/train.py train_configs/llama8b.yaml
accelerate launch --config_file train_configs/deepspeed.json src/train.py train_configs/qwen14B.yaml

```

37 changes: 37 additions & 0 deletions requirements.txt
@@ -0,0 +1,37 @@
accelerate==1.3.0
bitsandbytes==0.45.1
deepspeed==0.16.3
einops==0.8.0
fastjsonschema==2.21.1
flash-attn==2.7.3
hf_transfer==0.1.9
huggingface-hub==0.27.1
Jinja2==3.1.5
liger_kernel==0.5.2
ninja==1.11.1.3
numpy==1.26.4
outlines==0.1.14
outlines_core==0.1.26
peft==0.14.0
polars==1.20.0
pydantic==2.10.6
pydantic_core==2.27.2
pyparsing==3.2.1
regex==2024.11.6
safetensors==0.5.2
scikit-learn==1.6.1
scipy==1.15.1
sentencepiece==0.2.0
seqeval==1.2.2
timm==1.0.14
tokenizers==0.21.0
torch==2.5.1
tqdm==4.67.1
traitlets==5.14.3
transformers @ git+https://github.com/huggingface/transformers@86d7564611d21731fc004b4e79e472d48c4b0fec
triton==3.1.0
trl @ git+https://github.com/huggingface/trl.git@f34b70a32ef2820d3fd5c5b1ff6d1fd1e7799f04
typer==0.15.1
types-python-dateutil==2.9.0.20241206
typing_extensions==4.12.2
wandb==0.19.4
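
After installing, you may want to confirm that your environment actually matches these pins. The sketch below is a hypothetical check, not part of the repository. It uses only the standard-library `importlib.metadata` module, compares `name==version` lines, and skips the two `git+` entries because a commit hash cannot be compared against an installed version string.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Extract {package: version} from 'name==version' lines; skip everything else."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or " @ " in line or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# A few lines copied from the file above; to check everything, pass
# open("requirements.txt").read().splitlines() instead.
sample = [
    "torch==2.5.1",
    "seqeval==1.2.2",
    "trl @ git+https://github.com/huggingface/trl.git@f34b70a32ef2820d3fd5c5b1ff6d1fd1e7799f04",
]
for name, (pinned, installed) in find_mismatches(parse_pins(sample)).items():
    print(f"{name}: pinned {pinned}, installed {installed or 'not installed'}")
```

The comparison is an exact string match, so a build with a local version suffix (for example a CUDA-specific `torch` wheel) is reported as a mismatch even when the base version agrees.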