# Odesia-Struct
<p align="center">
<br>
<img src="https://leaderboard.odesia.uned.es/themes/custom/b5subtheme/images/logos/ODESIA_Final-08-1.png" style="height: 250px;">
<br>
<h3 align="center">Evaluation of NLP models in Spanish</h3>
<h1 align="center">IXA Submission for the 2024 ODESIA Challenge</h1>



<p align="center">
<a href="https://twitter.com/intent/tweet?text=The+IXA+Code+for+Odesia:&url=https%3A%2F%2Fgithub.com%2Fhitz-zentroa%2FOdesia-Struct"><img alt="Twitter" src="https://img.shields.io/twitter/url?style=social&url=https%3A%2F%2Fgithub.com%2Fhitz-zentroa%2FOdesia-Struct"></a>
<a href="https://github.com/hitz-zentroa/Odesia-Struct/blob/main/LICENSE"><img alt="GitHub license" src="https://img.shields.io/github/license/hitz-zentroa/Odesia-Struct"></a>
<a href="https://huggingface.co/collections/HiTZ/odesia-challenge-2024-66ea8e1731b52eaa8f3d8cd8"><img alt="Pretrained Models" src="https://img.shields.io/badge/🤗HuggingFace-Pretrained Models-green"></a>
<a href="https://upload.wikimedia.org/wikipedia/commons/8/80/Comingsoon.png"><img alt="Paper" src="https://img.shields.io/badge/📖-Paper-orange"></a>
<br>
<a href="http://www.hitz.eus/"><img src="https://img.shields.io/badge/HiTZ-Basque%20Center%20for%20Language%20Technology-blueviolet"></a>
<a href="http://www.ixa.eus/?language=en"><img src="https://img.shields.io/badge/IXA-%20NLP%20Group-ff3333"></a>
<br>
<br>
</p>


This repository contains the IXA submission for the 2024 ODESIA Challenge.
- 📈 ODESIA Leaderboard: https://leaderboard.odesia.uned.es/leaderboard/challenge



# Explanation of the approach

Every task is converted into a text-to-text task in a format suitable for current state-of-the-art decoder-only models.

## ⤵️ Model Input
We format every task following the same prompt schema. The prompt includes the guidelines for the task, up to 20 few-shot examples randomly sampled from the train split, and the input to analyze.

```jinja
{{ Guidelines }}
Examples
--------
{% for example in examples %}
Input: {{ example.question }}
Output: {{ example.answer.model_dump_json() }}
{% endfor %}
--------
Now, analyze the following input:
Input: {{ question }}
```
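
To make the schema concrete, the sketch below shows how such a template could be rendered with `jinja2`; the guideline text, example records, and field access are illustrative placeholders rather than the repository's actual task definitions.

```python
# Illustrative rendering of the prompt schema with jinja2; the guideline text,
# example records, and field names are placeholders, not the real task setup.
import random
from jinja2 import Template

PROMPT = Template(
    "{{ guidelines }}\n"
    "Examples\n"
    "--------\n"
    "{% for example in examples %}"
    "Input: {{ example['question'] }}\n"
    "Output: {{ example['answer'] }}\n"
    "{% endfor %}"
    "--------\n"
    "Now, analyze the following input:\n"
    "Input: {{ question }}"
)

train_split = [
    {"question": "Placeholder tweet 1", "answer": '{"label":["non-propaganda"]}'},
    {"question": "Placeholder tweet 2", "answer": '{"label":["flag-waving"]}'},
]

prompt = PROMPT.render(
    guidelines="Placeholder task guidelines.",
    # Up to 20 few-shot examples, randomly sampled from the train split.
    examples=random.sample(train_split, k=min(20, len(train_split))),
    question="Placeholder input to analyze",
)
print(prompt)
```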

## ➡️ Model output

Every task output is defined as a JSON schema using Pydantic. For example, for `DIPROMATS_2023`, which is a multi-label classification task, the output is defined as follows:


```python
from enum import Enum
from typing import List

from pydantic import BaseModel


class LabelEnum(str, Enum):
    ad_populum = "ad-populum"
    flag_waving = "flag-waving"
    absurdity_appeal = "absurdity-appeal"
    demonization = "demonization"
    doubt = "doubt"
    fear_appeals_destructive = "fear-appeals-destructive"
    name_calling = "name-calling"
    propaganda_slinging = "propaganda-slinging"
    scapegoating = "scapegoating"
    undiplomatic_assertiveness_whataboutism = "undiplomatic-assertiveness-whataboutism"
    loaded_language = "loaded-language"
    appeal_to_false_authority = "appeal-to-false-authority"
    bandwagoning = "bandwagoning"
    non_propaganda = "non-propaganda"


class Identification(BaseModel):
    label: List[LabelEnum]
```
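
This schema connects directly to the prompt template above: a gold answer is serialized with Pydantic's `model_dump_json()`, and a raw prediction can be checked against the same schema with `model_validate_json()`. The snippet below is a small illustration, not code from this repository.

```python
# Small illustration of the schema above (not code from this repository).
answer = Identification(label=[LabelEnum.ad_populum, LabelEnum.loaded_language])
print(answer.model_dump_json())  # {"label":["ad-populum","loaded-language"]}

# A raw JSON prediction can be validated against the same schema.
parsed = Identification.model_validate_json('{"label":["doubt"]}')
print(parsed.label)  # [<LabelEnum.doubt: 'doubt'>]
```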


We use [🗒️ Outlines](https://github.com/dottxt-ai/outlines) for guided generation: at inference time, the model is constrained to produce a valid `JSON` output that complies with the Pydantic specification. For example:


```json
{"label":["ad-populum","loaded-language"]}
```
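
The snippet below sketches what this guided generation looks like with the Outlines 0.x-style API; the model name and the prompt string are placeholders, and the exact calls used in this repository may differ.

```python
# Sketch of constrained JSON generation with Outlines (0.x-style API);
# the model name and the prompt string are placeholders.
import outlines

model = outlines.models.transformers("HiTZ/Hermes-3-Llama-3.1-8B_ODESIA")
generator = outlines.generate.json(model, Identification)

# The generator can only emit text that parses into a valid `Identification`.
prediction = generator("<prompt built from guidelines, few-shot examples and the input>")
print(prediction.label)
```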


The guidelines and output specification for every task are defined in [src/tasks](src/tasks).

## Model finetuning

We finetune a decoder-only model (Gemma 2B or Llama 3.1) in a multi-task setting, that is, we train a single model that handles every task. Our pretrained models are available on 🤗 Hugging Face: https://huggingface.co/collections/HiTZ/odesia-challenge-2024-66ea8e1731b52eaa8f3d8cd8
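
The released checkpoints can be loaded like any other causal language model on the Hub; the minimal sketch below assumes only the `transformers` and `accelerate` packages listed in the requirements further down.

```python
# Minimal sketch: load one of the released checkpoints with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HiTZ/Hermes-3-Llama-3.1-8B_ODESIA"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # requires accelerate
    torch_dtype="auto",
)
```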

# Reproduce our results

## Requirements

Install the following requirements; each can be installed with `pip install [requirement]`:


```
torch
transformers
accelerate
deepspeed
outlines
pydantic
bitsandbytes
jinja2
```


You should also unzip the `.zip` file in [data/](data/).

## Run Evaluation/Inference

You can evaluate any model on the development set with the following command:

```bash
python3 -m src.evaluate --tasks all --model_name HiTZ/Hermes-3-Llama-3.1-8B_ODESIA --output_dir results/finetune/Hermes-3-Llama-3.1-8B_ODESIA
```

To reproduce our leaderboard results, you can run inference on the test sets using the following command. The resulting output files are ready to be submitted to the ODESIA challenge:

```bash
python3 -m src.inference --tasks all --model_name HiTZ/Hermes-3-Llama-3.1-8B_ODESIA --output_dir results/finetune/Hermes-3-Llama-3.1-8B_ODESIA
```


> Warning: The test sets do not contain the labels. If you want to evaluate the predictions, you should submit them to the ODESIA leaderboard [https://leaderboard.odesia.uned.es/leaderboard/challenge](https://leaderboard.odesia.uned.es/leaderboard/challenge) or use the PyEvALL library [https://github.com/UNEDLENAR/PyEvALL/tree/main](https://github.com/UNEDLENAR/PyEvALL/tree/main).

### 4-bit quantization
If you do not have enough VRAM to run a model, you can use 4-bit quantization by adding the `--quantization` flag to the previous commands. Example:


```bash
python3 -m src.evaluate --tasks all --model_name meta-llama/Meta-Llama-3-70B-Instruct --output_dir results/zero-shot/Llama-3-70B-Instruct --quantization
```
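
For reference, 4-bit loading with `bitsandbytes` and `transformers` roughly corresponds to a configuration like the one below; this is a sketch of the general mechanism, not necessarily the exact settings used behind the `--quantization` flag.

```python
# Rough sketch of 4-bit NF4 loading with bitsandbytes + transformers;
# the exact settings behind the --quantization flag may differ.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```
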
> Warning: We randomly sample few-shot examples from the train split for every input. These few-shot examples vary with each evaluation run, so the evaluation results may change slightly each time you run an evaluation.

## Run Training

To finetune a model, you first need to define a training config. Config examples for Llama 3.1 and Gemma using full finetuning and LoRA are available in the [train_configs/](train_configs/) directory. Full finetuning achieves slightly better results but requires a lot of VRAM (we use 4x A100 80GB GPUs). LoRA uses much less VRAM and supports model quantization, so it can run on a single GPU.
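
As a rough illustration of what the LoRA option configures, the snippet below expresses a LoRA setup with the `peft` library (not listed in the requirements above); the hyperparameter values are placeholders, and the actual settings live in the YAML files under [train_configs/](train_configs/).

```python
# Illustrative LoRA configuration with peft; hyperparameter values are
# placeholders, the real settings are in the YAML files under train_configs/.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```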

We use DeepSpeed to split the model across 4x A100 80GB GPUs. You can reproduce our finetuning results with the following command:

```bash
export PYTHONPATH="$PYTHONPATH:$PWD"
accelerate launch --config_file train_configs/deepspeed.json src/train.py train_configs/llama8b.yaml
```


If you want to run LoRA finetuning with a single GPU, you can use the following command:


```bash
python3 -m src.train train_configs/gemma2B_LoRa.yaml
```


> Warning: Our inputs are very long because we use many few-shot examples, so training requires a lot of VRAM and can be slow. You can reduce the number of few-shot examples by changing the default parameter of each task's `__init__` in [src/tasks](src/tasks).
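
For illustration only, the kind of default parameter to lower might look like the sketch below; the class name and argument name here are hypothetical, and the real signatures are in [src/tasks](src/tasks).

```python
# Hypothetical sketch; the real task classes and argument names are in src/tasks.
class ExampleTask:
    def __init__(self, num_examples_few_shot: int = 20):
        # Fewer few-shot examples -> shorter prompts -> less VRAM and faster training.
        self.num_examples_few_shot = num_examples_few_shot
```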