CantoNLU

This repository accompanies our preprint "CantoNLU: A benchmark for Cantonese natural language understanding", in which we introduce a general language understanding benchmark for Cantonese. This work is a collaboration with York Hay Ng, Sophia Chan, Helena Zhao, and Annie En-Shiun Lee.

Download model first

This repository requires local copies of a BERT model and a Wikipedia dataset to run.

To download the resources, simply run

python download.py --lang=yue

where --lang is either yue (Cantonese) or wuu (Wu Chinese).
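For reference, a --lang option restricted to these two values could be declared with argparse as below. This is a hypothetical sketch: build_parser() is an illustrative name, and download.py's actual argument definitions (defaults, extra flags) are not shown in this README and may differ.

```python
import argparse

# Hypothetical sketch; download.py may define its flags differently.
SUPPORTED_LANGS = ("yue", "wuu")  # ISO 639-3: yue = Cantonese, wuu = Wu Chinese


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser whose --lang flag only accepts supported languages."""
    parser = argparse.ArgumentParser(
        description="Download a BERT model and a Wikipedia dataset"
    )
    parser.add_argument(
        "--lang",
        choices=SUPPORTED_LANGS,  # argparse rejects any other value
        required=True,
        help="language code of the resources to download",
    )
    return parser
```

With this parser, build_parser().parse_args(["--lang=yue"]).lang is "yue", while a value such as --lang=cmn makes argparse print a usage error and exit.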

Model pre-training

To continue pre-training from a Mandarin BERT model, run

python run.py --pretrain --lang=yue

where --lang is again yue or wuu. Additional flags are available; see run.py.
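The pre-training objective used by run.py is not described in this README. Continued pre-training of BERT usually reuses its masked-language-modelling (MLM) objective, so, assuming that objective, the sketch below shows the standard BERT masking step: about 15% of non-special tokens are selected, and of those 80% become [MASK], 10% become a random token, and 10% stay unchanged (the same scheme implemented by Hugging Face's DataCollatorForLanguageModeling). The function mask_tokens() and its arguments are hypothetical and not taken from run.py.

```python
import random
from typing import AbstractSet, List, Optional, Sequence, Tuple

# Label value skipped by PyTorch's CrossEntropyLoss (its default ignore_index).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: AbstractSet[int],
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Apply BERT-style MLM masking to one tokenised sequence (hypothetical helper).

    Returns (inputs, labels): labels hold the original id at selected positions
    and IGNORE_INDEX everywhere else, so the loss covers only selected tokens.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never select special tokens such as [CLS], [SEP] or [PAD].
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model is trained to recover this id
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token in the input
    return inputs, labels
```

Passing a seeded random.Random makes the masking reproducible, which is convenient when comparing runs.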

Fine-tuning

To fine-tune on the POS and DEPS tasks, the code requires the Cantonese Universal Dependencies (UD) treebank. Download its CoNLL-U file, place it in data/, and then convert it with the conllu_2_pos_dataset() function in utils.py.
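The internals of conllu_2_pos_dataset() are not documented here. As a hedged illustration of what such a conversion involves, the hypothetical parse_conllu() below reads a CoNLL-U string and keeps, for each sentence, the FORM, UPOS, HEAD and DEPREL columns that POS tagging and dependency parsing typically train on. It skips comment lines, multiword-token ranges and empty nodes; the third-party conllu package on PyPI provides a more complete parser.

```python
from typing import List, NamedTuple


class Sentence(NamedTuple):
    tokens: List[str]   # FORM column
    upos: List[str]     # UPOS column (universal POS tags)
    heads: List[int]    # HEAD column (0 = root)
    deprels: List[str]  # DEPREL column (dependency relation labels)


def parse_conllu(text: str) -> List[Sentence]:
    """Hypothetical stand-in for conllu_2_pos_dataset(); not the repo's code."""
    sentences: List[Sentence] = []
    tokens, upos, heads, deprels = [], [], [], []
    # A trailing "" guarantees the last sentence is flushed.
    for line in text.splitlines() + [""]:
        if not line.strip():
            # Blank line marks the end of a sentence.
            if tokens:
                sentences.append(Sentence(tokens, upos, heads, deprels))
                tokens, upos, heads, deprels = [], [], [], []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges (1-2) and empty nodes (1.1)
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        tokens.append(cols[1])
        upos.append(cols[3])
        heads.append(int(cols[6]))
        deprels.append(cols[7])
    return sentences
```

Each Sentence pairs tokens with their tags, so zip(s.tokens, s.upos) yields (token, POS) training pairs for a tagger.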

Pre-trained model weights

The monolingual and transfer models are available at the following Google Drive links.

Monolingual model: https://drive.google.com/file/d/1wl4MYqPRxj5FPdHJR8SXC7Z7SZCFsLNw/view?usp=drive_link
Transfer model: https://drive.google.com/file/d/19QKyw-lzbNmU1_EcBUuDuO4TiEfFFuFF/view?usp=drive_link
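The links above point to Drive share pages rather than to the files themselves. The sketch below extracts the file ID from such a link and builds a direct-download URL. The helper names are hypothetical, and the uc?export=download endpoint is an assumption about Google Drive's current behaviour: large files may still return a confirmation page instead of the file, which tools such as gdown handle.

```python
import re
from urllib.parse import urlencode

# Matches the ID in share links of the form .../file/d/<FILE_ID>/view...
_DRIVE_ID = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def drive_file_id(share_url: str) -> str:
    """Return the file ID embedded in a Google Drive share link (hypothetical helper)."""
    match = _DRIVE_ID.search(share_url)
    if match is None:
        raise ValueError(f"not a Google Drive file link: {share_url}")
    return match.group(1)


def direct_download_url(share_url: str) -> str:
    """Build an export URL for the file (assumed endpoint; see caveat above)."""
    query = urlencode({"export": "download", "id": drive_file_id(share_url)})
    return "https://drive.google.com/uc?" + query
```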
