Add Qwen2.5 results and update README

hitz-zentroa · Feb 4, 2025 · 1e24e35
1 parent 894d2a1

Showing 32 changed files with 254,386 additions and 112 deletions.
diff --git a/README.md b/README.md
@@ -110,48 +110,99 @@ outlines
 pydantic
 bitsandbytes
 jinja2
+seqeval
 ```
 
+To reproduce our environment, install the dependencies listed in `requirements.txt`:
 
-You should unzip the `.zip` file in [data/](data/)
+```bash
+pip install -r requirements.txt
+```
+
+
+You should unzip the `.zip` file in [data/](data/).  
+The expected data structure is:
+```
+data/
+    diann_2023/
+        DIANN_2023_T1_en.json
+        DIANN_2023_T1_es.json
+        ...
+    dipromats_2023/
+        ...
+    exist_2022/
+        ...
+    exist_2023/
+        ...
+    sqac_squad_2024/
+        ...
+```
+
+## Models
+We have trained three models of different sizes:
+- HiTZ/Qwen2.5-14B-Instruct_ODESIA: https://huggingface.co/HiTZ/Qwen2.5-14B-Instruct_ODESIA
+- HiTZ/Hermes-3-Llama-3.1-8B_ODESIA: https://huggingface.co/HiTZ/Hermes-3-Llama-3.1-8B_ODESIA
+- HiTZ/gemma-2b-it_ODESIA: https://huggingface.co/HiTZ/gemma-2b-it_ODESIA
 
 ## Run Evaluation/Inference
 
 You can evaluate any model on the development set with the following command:
 
 ```bash
-python3 -m src.evaluate --tasks all --model_name HiTZ/Hermes-3-Llama-3.1-8B_ODESIA --output_dir results/finetune/Hermes-3-Llama-3.1-8B_ODESIA
+torchrun --standalone --nproc_per_node=1 src/evaluate.py --tasks all --model_name HiTZ/Qwen2.5-14B-Instruct_ODESIA --output_dir results/finetune/Qwen2.5-14B-Instruct
+
 ```
 
 To reproduce our leaderboard results, you can run inference on the test sets using the following command. The resulting output files are ready to be submitted to the ODESIA challenge:
 
 ```bash
-python3 -m src.inference --tasks all --model_name HiTZ/Hermes-3-Llama-3.1-8B_ODESIA --output_dir results/finetune/Hermes-3-Llama-3.1-8B_ODESIA
+torchrun --standalone --nproc_per_node=1 src/inference.py --tasks all --model_name HiTZ/Qwen2.5-14B-Instruct_ODESIA --output_dir results/finetune/Qwen2.5-14B-Instruct
+```
+
+You can also run inference on selected tasks:
+```bash
+torchrun --standalone --nproc_per_node=1 src/inference.py \
+--tasks \
+exist_2022_t1_es \
+exist_2022_t2_es \
+exist_2023_t1_es \
+exist_2023_t2_es \
+exist_2023_t3_es \
+dipromats_2023_t1_es \
+dipromats_2023_t2_es \
+dipromats_2023_t3_es \
+diann_2023_t1_es \
+squad_2024_t1_es \
+--model_name HiTZ/Qwen2.5-14B-Instruct_ODESIA \
+--output_dir results/finetune/Qwen2.5-14B-Instruct
 ```
 
 
 > Warning: The test sets do not contain the labels. If you want to evaluate the predictions, you should submit them to the ODESIA leaderboard [https://leaderboard.odesia.uned.es/leaderboard/challenge](https://leaderboard.odesia.uned.es/leaderboard/challenge) or use the PyEvAll library [https://github.com/UNEDLENAR/PyEvALL/tree/main](https://github.com/UNEDLENAR/PyEvALL/tree/main)
 
+> Warning: We randomly sample few-shot examples from the train split for every input. These few-shot examples vary with each evaluation run, so the evaluation results may change slightly each time you run an evaluation.
+ 
+
 ### 4-bit quantization
 If you do not have enough VRAM to run a model, you can use 4-bit quantization by adding the `--quantization` flag to the previous commands. Example:
 
 
 ```bash
-python3 -m src.evaluate --tasks all --model_name meta-llama/Meta-Llama-3-70B-Instruct --output_dir results/zero-shot/Llama-3-70B-Instruct --quantization
+torchrun --standalone --nproc_per_node=1 src/inference.py --quantization --tasks all --model_name HiTZ/Qwen2.5-14B-Instruct_ODESIA --output_dir results/finetune/Qwen2.5-14B-Instruct 
 ```
-> Warning: We randomly sample few-shot examples from the train split for every input. These few-shot examples vary each evaluation run, so the evaluation results may change slightly  each time you run an evaluation. 
 
-> Warning: We randomly sample few-shot examples from the train split for every input. These few-shot examples vary with each evaluation run, so the evaluation results may change slightly each time you run an evaluation. 
+> Warning: Quantization affects model performance. Expect lower scores when running the model with 4-bit quantization.
+ 
 
 ## Run Training
 
-To finetune a model, you first need to define a `Training config`. Config examples for LLama3.1 and Gemma using Full-Finetuning and LoRA are available in the [train_configs/](train_configs/) directory. Full-Finetuning will achieve slightly better results but requires a lot of VRAM (We use 4x A100 80GB). LoRA uses much less VRAM and supports model quantization, so it can be run on a single GPU. 
+To fine-tune a model, you first need to define a `Training config`. Config examples for Llama 3.1 and Gemma using full fine-tuning and LoRA are available in the [train_configs/](train_configs/) directory. Full fine-tuning achieves slightly better results but requires a lot of VRAM (we use 4x A100 80GB for the 8B model and 8x A100 80GB for the 14B model). LoRA uses much less VRAM and supports model quantization, so it can be run on a single GPU.
 
 We use Deepspeed to split the model across 4 x A100 80GB GPUs. You can reproduce our fine-tuning results with the following command:
 
 ```bash
 export PYTHONPATH="$PYTHONPATH:$PWD"
-accelerate launch --config_file train_configs/deepspeed.json src/train.py train_configs/llama8b.yaml
+accelerate launch --config_file train_configs/deepspeed.json src/train.py train_configs/qwen14B.yaml
 
 ```
 
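Note on the data layout described in the README change above: a quick check that the unzipped archive produced the expected folders can save a failed run. The sketch below is not part of this commit. The helper name `missing_dataset_dirs` is hypothetical, and it only checks the five top-level directories named in the README, not the files inside them.

```python
from pathlib import Path

# Top-level dataset folders listed in the README's expected data structure.
EXPECTED_DIRS = (
    "diann_2023",
    "dipromats_2023",
    "exist_2022",
    "exist_2023",
    "sqac_squad_2024",
)


def missing_dataset_dirs(data_root="data"):
    """Return the expected dataset folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


missing = missing_dataset_dirs("data")
if missing:
    print("Missing dataset folders:", ", ".join(missing))
else:
    print("All expected dataset folders are present.")
```

Running it from the repository root before launching evaluation reports which folders still need to be unzipped.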

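Note on the new `torchrun` commands: when running several task subsets, it can be convenient to assemble the command line programmatically. The following is a minimal sketch, not code from this repository. The function `build_command` is hypothetical, and it uses only the flags that appear in the README (`--standalone`, `--nproc_per_node`, `--tasks`, `--model_name`, `--output_dir`, `--quantization`).

```python
import shlex


def build_command(script, tasks, model_name, output_dir,
                  quantization=False, nproc_per_node=1):
    """Assemble the torchrun invocation from the README as an argument list."""
    cmd = ["torchrun", "--standalone", f"--nproc_per_node={nproc_per_node}", script]
    if quantization:
        # Optional 4-bit quantization flag described in the README.
        cmd.append("--quantization")
    cmd += ["--tasks", *tasks, "--model_name", model_name, "--output_dir", output_dir]
    return cmd


cmd = build_command(
    "src/inference.py",
    ["exist_2022_t1_es", "diann_2023_t1_es"],
    "HiTZ/Qwen2.5-14B-Instruct_ODESIA",
    "results/finetune/Qwen2.5-14B-Instruct",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with long task lists.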
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,37 @@
+accelerate==1.3.0
+bitsandbytes==0.45.1
+deepspeed==0.16.3
+einops==0.8.0
+fastjsonschema==2.21.1
+flash-attn==2.7.3
+hf_transfer==0.1.9
+huggingface-hub==0.27.1
+Jinja2==3.1.5
+liger_kernel==0.5.2
+ninja==1.11.1.3
+numpy==1.26.4
+outlines==0.1.14
+outlines_core==0.1.26
+peft==0.14.0
+polars==1.20.0
+pydantic==2.10.6
+pydantic_core==2.27.2
+pyparsing==3.2.1
+regex==2024.11.6
+safetensors==0.5.2
+scikit-learn==1.6.1
+scipy==1.15.1
+sentencepiece==0.2.0
+seqeval==1.2.2
+timm==1.0.14
+tokenizers==0.21.0
+torch==2.5.1
+tqdm==4.67.1
+traitlets==5.14.3
+transformers @ git+https://github.com/huggingface/transformers@86d7564611d21731fc004b4e79e472d48c4b0fec
+triton==3.1.0
+trl @ git+https://github.com/huggingface/trl.git@f34b70a32ef2820d3fd5c5b1ff6d1fd1e7799f04
+typer==0.15.1
+types-python-dateutil==2.9.0.20241206
+typing_extensions==4.12.2
+wandb==0.19.4
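Note on the new `requirements.txt`: most entries are pinned with `==`, but `transformers` and `trl` are direct references to specific git commits, so `pip` needs `git` available at install time. The sketch below is an illustration only, with a hypothetical helper `split_requirements` that separates the two kinds of entries using plain string handling rather than a full requirements parser.

```python
def split_requirements(lines):
    """Split requirement lines into version-pinned and direct-reference entries."""
    pinned, direct = {}, {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if " @ " in line:
            # Direct reference, e.g. "name @ git+https://...@<commit>"
            name, url = line.split(" @ ", 1)
            direct[name.strip()] = url.strip()
        elif "==" in line:
            name, version = line.split("==", 1)
            pinned[name.strip()] = version.strip()
    return pinned, direct


# A few lines copied from the requirements.txt added in this commit.
sample = [
    "torch==2.5.1",
    "seqeval==1.2.2",
    "transformers @ git+https://github.com/huggingface/transformers@86d7564611d21731fc004b4e79e472d48c4b0fec",
]
pinned, direct = split_requirements(sample)
print("Pinned:", pinned)
print("Installed from git:", sorted(direct))
```

Listing the git-based entries separately makes it easier to spot the packages that will not resolve from a package index mirror.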