HiTZ at VarDial 2025 Shared Tasks: Norwegian slot and intent detection and dialect identification (NorSID)
This repository contains the code and resources for the HiTZ team's participation in the VarDial 2025 Shared Tasks. The team participated in the following tasks: Intent Detection, Slot Filling and Dialect Identification.
The team consists of the following members, all affiliated with the Ixa Group of the HiTZ center of the University of the Basque Country: Jaione Bengoetxea, Mikel Zubillaga, Ekhi Azurmendi, Maite Heredia, Julen Etxaniz, Markel Ferro and Jeremy Barnes.
The instructions and files of the NorSID shared task are in the NorSID directory. It contains the training, development and test data, as well as the evaluation scripts.
Intent detection and slot filling code is in the sid directory. It contains code to replicate our experiments, including data download, preprocessing and model training.
intent_only: contains the code to finetune a BERT model of your choosing for intent detection.
slot_intent: contains the code of the combined slot and intent detection model used for the submissions.
submission: contains the final submission files for the intent detection and slot filling tasks.
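Slot filling is commonly framed as token-level BIO tagging next to a single sentence-level intent label. The sketch below shows how a BIO tag sequence can be turned into labelled spans. It is only an illustration, not code from this repository: the function name `bio_to_spans`, the example tokens, the slot label and the intent label are all made up, and the BIO scheme itself is an assumption about the data format.

```python
def bio_to_spans(tags):
    """Convert BIO tags into (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag with the same label continues the currently open span.
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            # Any other tag closes the open span.
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            # A stray I- tag with no open span is treated as a span start.
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical example: one intent per sentence, one slot tag per token.
example = {
    "tokens": ["set", "an", "alarm", "for", "7", "am"],
    "intent": "set_alarm",
    "slots": ["O", "O", "O", "O", "B-datetime", "I-datetime"],
}

for label, start, end in bio_to_spans(example["slots"]):
    print(label, example["tokens"][start:end])
```

Comparing spans rather than individual tags is the usual basis for span-level slot F1, while intent detection is scored per sentence.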
Dialect identification code is in the dialect directory. It contains code to replicate our experiments, including data download, preprocessing and model training.
few_shot: contains the code to evaluate decoder models in a few-shot setting.
finetune: contains the code to finetune encoder and decoder models for dialect identification.
lex_map: contains the code for the baseline model based on lexical mapping.
lia: contains the code to download and preprocess the LIA dataset.
nb_samtale: contains the code to download and preprocess the NB Samtale dataset.
ndc: contains the code to download and preprocess the NDC dataset.
nordial: contains the code to download and preprocess the NorDial dataset. It also contains the code to train the dialect identification model.
norsid: contains the code to download and preprocess the NorSID dataset.
nts: contains the code to download and preprocess the NTS dataset.
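Few-shot evaluation of a decoder model generally means placing a handful of labelled examples in the prompt and letting the model complete the label for a new sentence. The following is a minimal sketch of such a prompt builder. It does not reproduce the prompts used in this repository: the function name `build_few_shot_prompt`, the prompt wording, the placeholder sentences and the label names are all hypothetical.

```python
def build_few_shot_prompt(examples, query, labels):
    """Build a classification prompt from (sentence, label) demonstrations."""
    # Instruction line listing the allowed labels.
    blocks = ["Classify the dialect of each sentence. Options: " + ", ".join(labels) + "."]
    # One block per labelled demonstration.
    for sentence, label in examples:
        blocks.append(f"Sentence: {sentence}\nDialect: {label}")
    # The query is left unlabelled so the model completes the label.
    blocks.append(f"Sentence: {query}\nDialect:")
    return "\n\n".join(blocks)


# Placeholder demonstrations and labels (hypothetical, not real data).
demos = [
    ("<sentence in dialect A>", "A"),
    ("<sentence in dialect B>", "B"),
]
prompt = build_few_shot_prompt(demos, "<sentence to classify>", ["A", "B"])
print(prompt)
```

The resulting string can be passed to any text-generation model; the predicted dialect is then read from the first tokens generated after the final `Dialect:`.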