HiTZ at VarDial 2025 Shared Tasks: Norwegian slot and intent detection and dialect identification (NorSID)

hitz-zentroa/vardial-2025

This repository contains the code and resources for the HiTZ team's participation in the VarDial 2025 Shared Tasks. The team participated in the following tasks: Intent Detection, Slot Filling and Dialect Identification.

Our team consists of the following members, all affiliated with the Ixa Group of the HiTZ center at the University of the Basque Country: Jaione Bengoetxea, Mikel Zubillaga, Ekhi Azurmendi, Maite Heredia, Julen Etxaniz, Markel Ferro and Jeremy Barnes.

NorSID

The instructions and files for the NorSID shared task are in the NorSID directory. It contains the training, development and test data, as well as the evaluation scripts.

Intent Detection & Slot Filling

Intent detection and slot filling code is in the sid directory. It contains code to replicate our experiments, including data download, preprocessing and model training.

  • intent_only: contains the code to finetune a BERT model of your choosing for intent detection.
  • slot_intent: contains the code of the combined slot and intent detection model used for the submissions.
  • submission: contains the final submission files for the intent detection and slot filling tasks.

Dialect Identification

Dialect identification code is in the dialect directory. It contains code to replicate our experiments, including data download, preprocessing and model training.

  • few_shot: contains the code to evaluate decoder models in a few-shot setting.
  • finetune: contains the code to finetune encoder and decoder models for dialect identification.
  • lex_map: contains the code for the baseline model based on lexical mapping.
  • lia: contains the code to download and preprocess the LIA dataset.
  • nb_samtale: contains the code to download and preprocess the NB Samtale dataset.
  • ndc: contains the code to download and preprocess the NDC dataset.
  • nordial: contains the code to download and preprocess the NorDial dataset. It also contains the code to train the dialect identification model.
  • norsid: contains the code to download and preprocess the NorSID dataset.
  • nts: contains the code to download and preprocess the NTS dataset.
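
To give a rough idea of what a lexical-mapping baseline such as the one in lex_map can look like, here is a minimal sketch. It is not the repository's implementation: the marker lexicon, the dialect codes (B, N, T, V) and the function name predict_dialect are all assumptions made for illustration. Each token that appears in the lexicon votes for a dialect, and the dialect with the most votes wins.

```python
from collections import Counter

# Hypothetical marker lexicon (illustrative only, not taken from lex_map).
# Keys are lowercase word forms; values are placeholder dialect codes.
MARKERS = {
    "ikke": "B",
    "jeg": "B",
    "ikkje": "V",
    "eg": "V",
    "itj": "T",
    "æ": "N",
}


def predict_dialect(utterance, markers=MARKERS, default="B"):
    """Return the dialect code whose marker words occur most often.

    Falls back to `default` when no marker word is found.
    """
    tokens = utterance.lower().split()
    votes = Counter(markers[t] for t in tokens if t in markers)
    if not votes:
        return default
    return votes.most_common(1)[0][0]
```

A call such as predict_dialect("eg veit ikkje") counts two votes for the code mapped to "eg" and "ikkje". A real baseline would presumably derive its lexicon from the training data rather than hard-coding it.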
