A note on extended and composite tags
The extended tags consist of features adapted from Biber (2006) and Biber et al. (1999). These features are semantic categories of verbs, nouns, adverbs, and adjectives. These semantic subclasses are subsets of the original tags in the simple tag set. For example, the various types of adjectives, such as evaluative adjectives (`JJEVAL`), are based on `JJAT` and `JJPR`. So if you use one of the semantic types of adjectives in your analysis, e.g. `JJEVAL`, its counts will overlap with those of `JJAT` and `JJPR`. In other words, `JJEVAL` is also counted in the simple tags `JJAT` and `JJPR`. To eliminate this double counting, use `JJATother` and `JJPRother`, which are mutually exclusive with the semantic subclasses of adjectives. The same logic applies to the semantic classes of adverbs and nouns and to one semantic subclass of prepositions (`PrepNSTNC`). In summary, there are simple tags for adjectives, adverbs, nouns, and prepositions, and there are `other` versions of these tags which have been added to avoid double counts (the same is true for `THSC`, `THRC` and `WHSC`). It also means that simple tags should not be used in combination with their `other` variants: use either `JJAT` or `JJATother`, but never both at once.
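The rule above can be checked mechanically before running an analysis. The following is only a minimal sketch: the `SUBCLASSES` mapping and the `find_double_counts` helper are hypothetical names introduced for illustration and are not part of the tagger, and the mapping lists only `JJEVAL` because that is the only adjective subclass named here.

```python
# Hypothetical mapping from simple tags to semantic subclasses counted inside them.
# Only JJEVAL is taken from the text; a real mapping would list every subclass.
SUBCLASSES = {
    "JJAT": {"JJEVAL"},
    "JJPR": {"JJEVAL"},
}


def find_double_counts(selected):
    """Return (tag, reason) pairs for simple tags that would cause double counting."""
    chosen = set(selected)
    problems = []
    for simple, subclasses in SUBCLASSES.items():
        if simple not in chosen:
            continue
        # Rule 1: a simple tag must not be combined with its "other" variant.
        if simple + "other" in chosen:
            problems.append((simple, "used together with " + simple + "other"))
        # Rule 2: a simple tag already contains the counts of its subclasses.
        overlap = sorted(subclasses & chosen)
        if overlap:
            problems.append((simple, "overlaps with " + ", ".join(overlap)))
    return problems


# JJAT is problematic twice over in this selection.
print(find_double_counts(["JJAT", "JJATother", "JJEVAL"]))
# Using the "other" variants alongside JJEVAL raises no problem.
print(find_double_counts(["JJATother", "JJPRother", "JJEVAL"]))
```

An empty list means the selection follows the rule as far as this (incomplete) mapping can tell.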
The composite tags have been added to facilitate the gradual process of feature selection. Composite tags generally end in `all`. For example, `WhVSTNCall` is the sum of `WhVATT`, `WhVFCT`, `WhVLIK`, and `WhVCOM`. This option follows Egbert and Staples' (2019) suggestion. Put simply, you select fine-grained features as a first step (in the example above, the four subtypes of `WH` clauses). If none of these features load on your dimensions or factors, the next step is to exclude the four subtypes, use only the combined version `WhVSTNCall`, and re-run the analysis. You can also select the `all` variants from the very beginning and discard the subclasses if they are too infrequent. The important thing is to avoid overlap: do not use `WhVSTNCall` together with the four subclasses it represents.
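As a rough illustration of the two-step selection, the sketch below shows how a composite count relates to its subtypes and how a feature list could be collapsed for the second run. The helper names and the counts are made up for this example; the tagger already outputs `WhVSTNCall`, so the sum is recomputed here only to show the relationship.

```python
WH_SUBTYPES = ["WhVATT", "WhVFCT", "WhVLIK", "WhVCOM"]


def composite_count(counts, parts=WH_SUBTYPES):
    """Sum the subtype counts, which is what a composite tag represents."""
    return sum(counts.get(tag, 0) for tag in parts)


def collapse_to_composite(features, composite="WhVSTNCall", parts=WH_SUBTYPES):
    """Drop the subtypes from a feature list and add the composite tag instead."""
    kept = [f for f in features if f not in parts]
    if composite not in kept:
        kept.append(composite)
    return kept


# Made-up counts for a single text, for illustration only.
counts = {"WhVATT": 2, "WhVFCT": 5, "WhVLIK": 1, "WhVCOM": 3}
print(composite_count(counts))  # 11

# Step one uses the fine-grained subtypes; step two swaps them for the composite.
step_one = ["WhVATT", "WhVFCT", "WhVLIK", "WhVCOM", "JJEVAL"]
step_two = collapse_to_composite(step_one)
print(step_two)  # ['JJEVAL', 'WhVSTNCall']
```

Because the subtypes are removed whenever the composite is added, the resulting list never contains both.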
Composite tags that are specifically based on Biber (2006) are, like their individual siblings, counted based on simple tags. For example, `WhVSTNCall`, like its individual siblings `WhVATT`, `WhVFCT`, `WhVLIK`, and `WhVCOM`, is a subset of `WHSC`. So it is not advisable to combine `WHSC` and `WhVSTNCall` in the same factor or principal component analysis. To avoid overlap, use `WHSCother` instead of `WHSC` whenever you are using the semantic subclasses.
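The substitution described above could be applied to a feature list along these lines. This is a sketch under assumptions: `use_other_variant` is a hypothetical helper, and the list of tags treated as subsets of `WHSC` contains only the names mentioned in this note.

```python
def use_other_variant(features, simple, subclasses):
    """Swap `simple` for `simple + 'other'` when any of its subclasses is selected."""
    if not any(tag in features for tag in subclasses):
        # No subclass selected, so the simple tag can stay as it is.
        return list(features)
    other = simple + "other"
    result = []
    for tag in features:
        replacement = other if tag == simple else tag
        if replacement not in result:  # avoid listing the same tag twice
            result.append(replacement)
    return result


# Tags named in this note as subsets of WHSC (possibly incomplete).
wh_related = ["WhVATT", "WhVFCT", "WhVLIK", "WhVCOM", "WhVSTNCall"]

print(use_other_variant(["WHSC", "WhVSTNCall"], "WHSC", wh_related))
# ['WHSCother', 'WhVSTNCall']
print(use_other_variant(["WHSC", "THSC"], "WHSC", wh_related))
# ['WHSC', 'THSC']
```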
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman Grammar of Spoken and Written English. Longman Publications Group.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers. Benjamins.
Egbert, J., & Staples, S. (2019). Doing multidimensional analysis in SPSS, SAS and R. In T. Berber-Sardinha & M. V. Pinto (Eds.), Multidimensional analysis: Research methods and current issues (pp. 125–144). Bloomsbury.