gr-nlp-toolkit is a Python toolkit with state-of-the-art performance in (modern) Greek, supporting the following functionalities:
- Named Entity Recognition (NER)
- Part-of-Speech Tagging (POS Tagging)
- Morphological tagging
- Dependency parsing
- Greeklish to Greek transliteration ("kalimera" -> "καλημερα")
Apart from the Python library (details below), you can also interact with gr-nlp-toolkit in a no-code fashion by visiting our web playground here: https://huggingface.co/spaces/AUEB-NLP/greek-nlp-toolkit-demo
Thanks to HuggingFace 🤗 for the GPUs.
The toolkit supports Python 3.9+.
You can install it from PyPI by executing the following in the command line:
    pip install gr-nlp-toolkit

To use the toolkit, first initialize a Pipeline, specifying which task processors you need. Each processor annotates the text with a specific task's annotations.
For example:
- To obtain Part-of-Speech and Morphological Tagging annotations, add the `pos` processor
- To obtain Named Entity Recognition annotations, add the `ner` processor
- To obtain Dependency Parsing annotations, add the `dp` processor
- To enable the transliteration from Greeklish to Greek, add the `g2g` processor or the `g2g_lite` processor for a lighter but less accurate model (Greeklish to Greek transliteration example: "thessalonikh" -> "θεσσαλονίκη")
- DP, POS, NER processors (input text in Greek)

      from gr_nlp_toolkit import Pipeline

      # Instantiate the Pipeline with the DP, POS and NER processors
      nlp = Pipeline("pos,ner,dp")
      # Apply the pipeline to a sentence in Greek
      doc = nlp("Η Ιταλία κέρδισε την Αγγλία στον τελικό του Euro 2020.")

  A `Document` object is created and annotated. The original text is split into tokens:

      # Iterate over the generated tokens
      for token in doc.tokens:
          print(token.text)    # the text of the token
          print(token.ner)     # the named entity label in IOBES encoding (str)
          print(token.upos)    # the UPOS tag of the token
          print(token.feats)   # the morphological features of the token
          print(token.head)    # the head of the token
          print(token.deprel)  # the dependency relation between the current token and its head

  `token.ner` is set by the `ner` processor, `token.upos` and `token.feats` are set by the `pos` processor, and `token.head` and `token.deprel` are set by the `dp` processor.

  A small detail: to get the `Token` object that is the head of another token, you need to access `doc.tokens[head-1]`. The reason is that token enumeration starts from 1, and when the field `token.head` is set to 0, the token is the root of the sentence.

- Greeklish to Greek Conversion (input text in Greeklish)

      from gr_nlp_toolkit import Pipeline

      # Instantiate the pipeline with the g2g processor
      nlp = Pipeline("g2g")
      # Apply the pipeline to a sentence in Greeklish
      doc = nlp("O Volos kai h Larisa einai sth Thessalia")
      # Access the transliterated text, which is "ο Βόλος και η Λάρισα είναι στη Θεσσαλία"
      print(doc.text)

- Use all the processors together (input text in Greeklish)

      from gr_nlp_toolkit import Pipeline

      # Instantiate the Pipeline with the G2G, DP, POS and NER processors
      nlp = Pipeline("pos,ner,dp,g2g")
      # Apply the pipeline to a sentence in Greeklish
      doc = nlp("O Volos kai h Larisa einai sthn Thessalia")
      # Print the transliterated text
      print(doc.text)

      # Iterate over the generated tokens
      for token in doc.tokens:
          print(token.text)    # the text of the token
          print(token.ner)     # the named entity label in IOBES encoding (str)
          print(token.upos)    # the UPOS tag of the token
          print(token.feats)   # the morphological features of the token
          print(token.head)    # the head of the token
          print(token.deprel)  # the dependency relation between the current token and its head
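Two things in the examples above are easy to trip over: heads are 1-based (with 0 marking the root), and NER labels come one per token in IOBES encoding. The sketch below shows one way to handle both. It is not part of the toolkit: `head_of` and `entity_spans` are hypothetical helper names, the tokens are hand-written stand-ins (not real model output) exposing only the `text`, `head` and `ner` fields described above, and the `PREFIX-TYPE` label shape (for example `S-GPE`) is an assumption.

```python
from types import SimpleNamespace


def head_of(tokens, token):
    """Return the head token of `token`, or None if `token` is the root."""
    if token.head == 0:            # 0 marks the root
        return None
    return tokens[token.head - 1]  # heads are 1-based, the list is 0-based


def entity_spans(tokens):
    """Group per-token IOBES tags into (label, text) entity spans."""
    spans, current, label = [], [], None
    for tok in tokens:
        prefix, _, kind = tok.ner.partition("-")
        if prefix == "S":                      # single-token entity
            spans.append((kind, tok.text))
            current, label = [], None
        elif prefix == "B":                    # start of a multi-token entity
            current, label = [tok.text], kind
        elif prefix in ("I", "E") and current and kind == label:
            current.append(tok.text)
            if prefix == "E":                  # end of the entity
                spans.append((label, " ".join(current)))
                current, label = [], None
        else:                                  # "O" or an inconsistent sequence
            current, label = [], None
    return spans


# Hand-written stand-ins for doc.tokens (illustrative values only).
tokens = [
    SimpleNamespace(text="Η", head=2, ner="O"),
    SimpleNamespace(text="Ιταλία", head=3, ner="S-GPE"),
    SimpleNamespace(text="κέρδισε", head=0, ner="O"),
    SimpleNamespace(text="το", head=5, ner="O"),
    SimpleNamespace(text="Euro", head=3, ner="B-EVENT"),
    SimpleNamespace(text="2020", head=5, ner="E-EVENT"),
]

for tok in tokens:
    head = head_of(tokens, tok)
    print(tok.text, "->", head.text if head else "ROOT")

print(entity_spans(tokens))
```

With a real pipeline, passing `doc.tokens` in place of the stand-in list should work the same way, provided the tokens expose the fields as described above.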
The software was presented as a paper at COLING 2025. Read the full technical report/paper here: https://aclanthology.org/2025.coling-demos.17/
If you use our toolkit, please cite it:
@inproceedings{loukas-etal-coling2025-greek-nlp-toolkit,
title = "{GR}-{NLP}-{TOOLKIT}: An Open-Source {NLP} Toolkit for {M}odern {G}reek",
author = "Loukas, Lefteris and
Smyrnioudis, Nikolaos and
Dikonomaki, Chrysa and
Barbakos, Spiros and
Toumazatos, Anastasios and
Koutsikakis, John and
Kyriakakis, Manolis and
Georgiou, Mary and
Vassos, Stavros and
Pavlopoulos, John and
Androutsopoulos, Ion",
editor = "Rambow, Owen and
Wanner, Leo and
Apidianaki, Marianna and
Al-Khalifa, Hend and
Eugenio, Barbara Di and
Schockaert, Steven and
Mather, Brodie and
Dras, Mark",
booktitle = "Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations",
month = jan,
year = "2025",
address = "Abu Dhabi, UAE",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.coling-demos.17/",
pages = "174--182",
}

- The first time you use a processor, its models are downloaded from Hugging Face and stored in the .cache folder. The NER, DP and POS processors are each about 500 MB, while the G2G processor is about 1.2 GB in size.
- If the input text is already in Greek, the G2G (Greeklish-to-Greek) processor is skipped.
- If your machine has an accelerator but you want to run the process on the CPU, you can pass the flag `use_cpu=True` to the Pipeline object. By default, `use_cpu` is set to False.
- The Greeklish-to-Greek transliteration processor (ByT5) weights can be found in HuggingFace: https://huggingface.co/AUEB-NLP/ByT5_g2g
- The NER/POS/DP processors/weights can be found in HuggingFace: https://huggingface.co/AUEB-NLP/gr-nlp-toolkit
While many methodological details are given in the GR-NLP-TOOLKIT paper published at COLING 2025 (see above), additional research details can be found here:
- C. Dikonimaki, "A Transformer-based natural language processing toolkit for Greek -- Part of speech tagging and dependency parsing", BSc thesis, Department of Informatics, Athens University of Economics and Business, 2021. http://nlp.cs.aueb.gr/theses/dikonimaki_bsc_thesis.pdf (POS/DP/Morphological tagging processor)
- N. Smyrnioudis, "A Transformer-based natural language processing toolkit for Greek -- Named entity recognition and multi-task learning", BSc thesis, Department of Informatics, Athens University of Economics and Business, 2021. http://nlp.cs.aueb.gr/theses/smyrnioudis_bsc_thesis.pdf (NER processor)
- A. Toumazatos, J. Pavlopoulos, I. Androutsopoulos, & S. Vassos, "Still All Greeklish to Me: Greeklish to Greek Transliteration." In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 15309–15319). https://aclanthology.org/2024.lrec-main.1330/ (Greeklish-to-Greek processor)
