Code compound words consistently #5

@lmaurits

Description

Currently rainbow (and possibly others, though probably not) is coded so that each compound form gets one cognate set for each part of the compound, in order to represent partial cognacy (e.g. Finnish sateenkaari and Karelian ukonkoari, where kaari and koari are related but sateen ("of rain") and ukon ("of thunder") are not).

This is:

  1. Inconsistent with our explanation of how we code cognates in the included documentation.
  2. Inconsistent with how we have coded other compound words with partial cognacy (e.g. vulture, where partial cognacy between Finnish and Estonian is not represented).
  3. Problematic for phylogenetic inference, because it introduces an exceptionally high number of singleton cognate sets, which can skew rate/age estimates.
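
To make point 3 concrete, here is a minimal sketch of how per-component coding produces singleton cognate sets. The cognate set IDs (`rain_1`, `thunder_1`, `arc_1`) and the dictionary layout are assumptions made up for illustration; they are not the identifiers or file format used in the dataset.

```python
from collections import Counter

# Hypothetical per-component coding: one cognate set ID per compound part.
# The set IDs are illustrative only, not the dataset's real identifiers.
per_component = {
    "Finnish sateenkaari": ["rain_1", "arc_1"],
    "Karelian ukonkoari": ["thunder_1", "arc_1"],
}

def singleton_sets(coding):
    """Return the cognate set IDs that are attested by exactly one form."""
    counts = Counter(s for sets in coding.values() for s in set(sets))
    return sorted(s for s, n in counts.items() if n == 1)

print(singleton_sets(per_component))
```

In this toy example the two non-cognate first components each end up as a singleton set, while only the shared second component groups the two forms.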

rainbow should be recoded so that each form is associated with only one cognate set, and two forms count as cognate only if both components of the compound are cognate.
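
One way the proposed recoding could be sketched, assuming the per-component codes are available as lists: forms are placed in the same cognate set only when every component matches. The function name `recode_whole_forms`, the set IDs, and the third entry ("Variety X") are hypothetical and exist only to show a fully cognate pair.

```python
def recode_whole_forms(component_coding):
    """Assign one cognate set per form.

    Two forms share a set only if all of their components are cognate,
    i.e. their per-component codes are identical.
    """
    set_ids = {}   # tuple of component codes -> whole-form set number
    recoded = {}
    for form, components in component_coding.items():
        key = tuple(components)
        if key not in set_ids:
            set_ids[key] = len(set_ids) + 1
        recoded[form] = set_ids[key]
    return recoded

# Illustrative input; set IDs and "Variety X" are made up for this sketch.
per_component = {
    "Finnish sateenkaari": ["rain_1", "arc_1"],
    "Karelian ukonkoari": ["thunder_1", "arc_1"],
    "Variety X (hypothetical)": ["rain_1", "arc_1"],
}

print(recode_whole_forms(per_component))
```

Under this scheme the Finnish and Karelian forms fall into different sets, because only the second component is shared, and each form carries exactly one cognate set.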
