Part of Speech
Definition from Wikipedia:
In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech,[1] based on both its definition and its context—i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph.
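Here is a concrete illustration of what tagging takes in and produces: a deliberately naive most-frequent-tag baseline in Python. It tags each word on its own, so it ignores the context that the definition emphasizes. A trained model such as spaCy's tagger uses neighbouring words to resolve ambiguous forms. The function names, the `NOUN` fallback for unknown words, and the toy Greek sentences are illustrative assumptions, not part of this project.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Map each (lower-cased) word form to the tag it carries most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_sentence(tokens, model, default="NOUN"):
    """Tag each token independently; unseen words get the hypothetical `default`."""
    return [(tok, model.get(tok.lower(), default)) for tok in tokens]

# Toy, hand-written training data (illustrative only, not from the treebank).
train = [
    [("Η", "DET"), ("γάτα", "NOUN"), ("κοιμάται", "VERB")],
    [("Ο", "DET"), ("σκύλος", "NOUN"), ("τρέχει", "VERB"), ("γρήγορα", "ADV")],
]
model = train_most_frequent_tag(train)
print(tag_sentence(["Η", "γάτα", "τρέχει", "γρήγορα"], model))
```

Because the baseline never looks at adjacent words, a form that is ambiguous between two parts of speech always receives the same tag. This is the weakness that context-aware taggers address.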
The POS tags for the Greek language are based on the Universal POS tags and are defined in the tag map file.
The Universal POS tag schema was chosen because there is a public, annotated Greek Dependency Treebank here that uses the Universal POS tags, which offered a good starting point for a spaCy POS tagging model for Greek.
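Universal Dependencies treebanks are distributed in the CoNLL-U format. Each token occupies one line with ten tab-separated columns, comment lines start with `#`, and a blank line ends each sentence. Greek contractions such as στη (σε + τη) appear as a range line (`2-3`) followed by their parts, and only the parts carry a UPOS tag. Below is a minimal standard-library sketch of extracting the (word form, UPOS) pairs a tagger is trained on. The function name and the hand-written sample fragment are hypothetical. spaCy's own `convert` command also accepts CoNLL-U input.

```python
def read_conllu(text):
    """Yield one list of (FORM, UPOS) pairs per sentence in CoNLL-U text."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):      # sentence-level comments, e.g. "# text = ..."
            continue
        cols = line.split("\t")
        token_id = cols[0]
        # Skip multiword-token ranges ("2-3") and empty nodes ("3.1");
        # only syntactic words carry a UPOS tag.
        if "-" in token_id or "." in token_id:
            continue
        sentence.append((cols[1], cols[3]))  # FORM is column 2, UPOS is column 4
    if sentence:                      # the text may not end with a blank line
        yield sentence

# Hand-written fragment in CoNLL-U layout (illustrative, not copied from the treebank).
rows = [
    "# text = Πάω στη θάλασσα.",
    "1\tΠάω\tπάω\tVERB\t_\t_\t0\troot\t_\t_",
    "2-3\tστη\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tσε\tσε\tADP\t_\t_\t4\tcase\t_\t_",
    "3\tτη\tο\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tθάλασσα\tθάλασσα\tNOUN\t_\t_\t1\tobl\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
]
sample = "\n".join(rows) + "\n"
for sent in read_conllu(sample):
    print(sent)
```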
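The Universal POS tags (with their Greek names, English translations, and some clarifications) are the following:
- ADJ: adjectives (επίθετα)
- ADV: adverbs (επιρρήματα)
- ADP: prepositions (προθέσεις)
- AUX: verbs used to form tenses, i.e. auxiliary verbs (ρήματα για σχηματισμό χρόνων)
- INTJ: interjections (επιφωνήματα)
- PROPN: nouns used as names, i.e. proper nouns (ουσιαστικά που χρησιμοποιούνται ως ονόματα)
- VERB: verbs (ρήματα)
- CCONJ: coordinating conjunctions (παρατακτικοί σύνδεσμοι)
- SCONJ: subordinating conjunctions (υποτακτικοί σύνδεσμοι)
- PART: particles (μόρια)
- PUNCT: punctuation marks (σημεία στίξης)
- SYM: symbols (σύμβολα)
- NUM: numerals (αριθμητικά)
- PRON: pronouns (αντωνυμίες)
- SPACE: whitespace (κενό)
- DET: articles (άρθρα)
- NOUN: nouns (ουσιαστικά)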
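The tag map file itself is not reproduced on this page. Here is a rough sketch, assuming the tag set is exactly the list above: it is modelled as a plain Python dictionary and used to catch annotations that fall outside it. `TAG_MAP` and `invalid_tags` are hypothetical names, and this is not the structure of spaCy's own tag map.

```python
# Hypothetical tag map: the 17 tags listed above, with English glosses.
TAG_MAP = {
    "ADJ": "adjective", "ADV": "adverb", "ADP": "preposition",
    "AUX": "auxiliary verb", "INTJ": "interjection", "PROPN": "proper noun",
    "VERB": "verb", "CCONJ": "coordinating conjunction",
    "SCONJ": "subordinating conjunction", "PART": "particle",
    "PUNCT": "punctuation", "SYM": "symbol", "NUM": "numeral",
    "PRON": "pronoun", "SPACE": "whitespace", "DET": "article",
    "NOUN": "noun",
}

def invalid_tags(tagged_sentence, tag_map=TAG_MAP):
    """Return (position, word, tag) for every tag the tag map does not define."""
    return [
        (i, word, tag)
        for i, (word, tag) in enumerate(tagged_sentence)
        if tag not in tag_map
    ]

print(invalid_tags([("Η", "DET"), ("γάτα", "NOUN"), ("κοιμάται", "V")]))
```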
Note: The Greek UD Treebank contains no annotations for the SPACE tag, so further annotation is needed before the model can learn it. Coming soon.
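You can check whether a tag is learnable by counting how often each tag occurs in the training data. A tag with zero occurrences, as the note reports for SPACE, gives the model nothing to learn from. Below is a minimal sketch, assuming sentences are already lists of (word, tag) pairs, for instance as extracted from the treebank's CoNLL-U files. The function names and the one-sentence toy corpus are hypothetical, not real treebank output.

```python
from collections import Counter

# The tag set listed above (SPACE included).
TAGSET = {
    "ADJ", "ADV", "ADP", "AUX", "INTJ", "PROPN", "VERB", "CCONJ", "SCONJ",
    "PART", "PUNCT", "SYM", "NUM", "PRON", "SPACE", "DET", "NOUN",
}

def tag_counts(tagged_sentences):
    """Count how often each tag occurs across all sentences."""
    return Counter(tag for sentence in tagged_sentences for _, tag in sentence)

def unannotated_tags(tagged_sentences, tagset=TAGSET):
    """Tags in the tag set that never occur in the data, sorted alphabetically."""
    counts = tag_counts(tagged_sentences)
    return sorted(t for t in tagset if counts[t] == 0)

corpus = [
    [("Πάω", "VERB"), ("σε", "ADP"), ("τη", "DET"), ("θάλασσα", "NOUN"), (".", "PUNCT")],
]
print(unannotated_tags(corpus))
```

On the full treebank, the same check would be expected to list SPACE among the unannotated tags.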
There is also an extended list of Greek POS tags for a more sophisticated model, which could be used in the future if an appropriately annotated dataset becomes available. These tags are listed here for future reference.