Merged
3 changes: 2 additions & 1 deletion concepticondata/conceptlists.tsv
@@ -472,4 +472,5 @@ Soederholm-2013-420 Söderholm, Carina and Häyry, Emilia and Laine, Matti and K
Amsel-2012-559 Amsel, Ben D. and Urbach, Thomas P. and Kutas, Marta 2012 559 ratings English English https://doi.org/10.3758/s13428-012-0215-z Amsel2012 This list of 559 object concepts was rated for perceptual modalities (color, motion, auditory, olfactory, gustatory, graspability, pain) as well as familiarity. Ratings were given on an 8-point scale. 1028-1041
Eilola-2010-210 Eilola, Tiina M. and Havelka, Jelena 2010 210 ratings English, Finnish English, Finnish https://doi.org/10.3758/BRM.42.1.134 Eilola2010 This list of 210 words contains ratings of familiarity, valence, emotionality, offensiveness, and concreteness for Finnish and British English nouns, including 34 taboo words. Ratings were provided by native speakers of each language. For British English in particular, the aim of the study was to collect data comparable to the American English norms in the Affective Norms for English Words database [(Bradley & Lang, 1999)](:bib:Bradley1999). The present mappings were based on the English words. 134-140
Pache-2023-207 Pache, Matthias 2023 207 basic English Chibchan https://doi.org/10.1086/722240 Pache2023 This list was used for a comparative analysis of the Chibchan languages with the aim of revising their internal genealogical classification. The author claims that the list represents the Swadesh 207 list, however, it is unclear which list is meant exactly, since Swadesh never published a list containing 207 words. The list is likely very similar to [Comrie(1977)](:bib:Comrie1977) but uses slightly different glosses. The data for the Chibchan languages was gathered from existing sources on various Chibchan languages. 81-103
Guenther-2022-30 Günther, Fritz and Rinaldi, Luca 2022 30 norms English global https://doi.org/10.1038/s41598-022-12027-5 Guenther2022 This dataset combines cross-linguistic lexical frequency information for body-part terms with anatomical and neurobiological measures of body representation. For each concept, lexical forms are provided for a wide range of languages (Amharic, Arabic, Bengali, Chinese, Croatian, Czech, Dutch, English, German, Greek, Hebrew, Hindi, Hungarian, Italian, Japanese, Latin, Latvian, Malay, Polish, Portuguese, Russian, Somali, Spanish, Swahili, Tagalog, Tamil, Turkish, Urdu, and Yoruba). For each language, separate variables encode the base form, the plural form, overall corpus frequency, and frequency per million words, derived from language-specific corpora. Additionally, the list contains measures of cortical representational size and physical body surface area, which is encoded using anterior–posterior (AP) distinctions. Further, different definitions of "arm" were used: one unit spanning shoulder to wrist (with the hand treated separately), or upper arm (shoulder to elbow) and forearm (elbow to wrist) treated separately. 1-13
Khalid-2020-200 Khalid, Zoya 2020 200 basic English Asur https://aclanthology.org/2020.icon-main.6.pdf Khalid2020 This list of 200 words covers some basic lexical items for the language Asur (or Asuri). 45-47
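A quick consistency check for a new row like the one above can be sketched in Python. This is a minimal sketch, not the project's own tooling: the field names `ID`, `YEAR`, and `ITEMS` and the helper `check_list_row` are assumptions made for illustration, since the header of `conceptlists.tsv` is not shown in this diff.

```python
import re

# Assumed shape of a concept list ID: Author-Year-Items, with an optional
# lowercase suffix (for example "Khalid-2020-200").
LIST_ID = re.compile(
    r"(?P<author>[A-Za-z]+)-(?P<year>[0-9]{4})-(?P<items>[0-9]+)[a-z]?"
)


def check_list_row(row):
    """Return a list of problems found in one row (hypothetical helper)."""
    match = LIST_ID.fullmatch(row["ID"])
    if match is None:
        return [f"{row['ID']}: ID is not of the form Author-Year-Items"]
    problems = []
    # The year embedded in the ID should agree with the YEAR field.
    if match["year"] != row["YEAR"]:
        problems.append(f"{row['ID']}: year in ID differs from {row['YEAR']}")
    # The item count embedded in the ID should agree with the ITEMS field.
    if match["items"] != row["ITEMS"]:
        problems.append(f"{row['ID']}: items in ID differ from {row['ITEMS']}")
    return problems


# Values taken from the row added in this diff; the keys are assumed names.
row = {"ID": "Khalid-2020-200", "YEAR": "2020", "ITEMS": "200"}
print(check_list_row(row))
```

An empty list means the ID, year, and item count agree with each other.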
201 changes: 201 additions & 0 deletions concepticondata/conceptlists/Khalid-2020-200.tsv
@@ -0,0 +1,201 @@
ID NUMBER ENGLISH CONCEPTICON_ID CONCEPTICON_GLOSS
Khalid-2020-200-1 1 I 1209 I
Khalid-2020-200-2 2 you (singular) 1215 THOU
Khalid-2020-200-3 3 he 2642 HE OR SHE
Contributor


Link to "he" unless you have SUPERPERFECT evidence that the "he" in that language also means "she". If it does, make sure it does not mean "it" as well.

Contributor Author

@abishekjs, Feb 25, 2026


It means both 'he' and 'she', but I am not sure whether it also means 'it'. I checked Table 1 ('Pronouns in Asur') in the original source PDF, and additionally page 6 of the Asur dictionary: https://b0def920-9729-447b-97c8-5f6e2e57c168.filesusr.com/ugd/3e3b0a_20c16ff844024dbea47b405720964b0d.pdf

However, the language uses demonstratives that distinguish animacy, so 'it' could be expressed with the inanimate 'this' or 'that'.

Khalid-2020-200-4 4 we 1131 WE (INCLUSIVE)
Khalid-2020-200-5 5 you (plural) 1213 YOU
Khalid-2020-200-6 6 they 817 THEY
Khalid-2020-200-7 7 this 1214 THIS
Khalid-2020-200-8 8 that 78 THAT
Khalid-2020-200-9 9 here 136 HERE
Khalid-2020-200-10 10 there 1937 THERE
Khalid-2020-200-11 11 who 1235 WHO
Khalid-2020-200-12 12 what 1236 WHAT
Khalid-2020-200-13 13 where 1237 WHERE
Khalid-2020-200-14 14 when 1238 WHEN
Khalid-2020-200-15 15 how 1239 HOW
Khalid-2020-200-16 16 not 1240 NOT
Khalid-2020-200-17 17 all 98 ALL
Khalid-2020-200-18 18 many 1198 MANY
Khalid-2020-200-19 19 some 1241 SOME
Khalid-2020-200-20 20 few 1242 FEW
Khalid-2020-200-21 21 one 1493 ONE
Khalid-2020-200-22 22 two 1498 TWO
Khalid-2020-200-23 23 three 492 THREE
Khalid-2020-200-24 24 four 1500 FOUR
Khalid-2020-200-25 25 five 493 FIVE
Khalid-2020-200-26 26 big 1202 BIG
Khalid-2020-200-27 27 long 1203 LONG
Khalid-2020-200-28 28 wide 1243 WIDE
Khalid-2020-200-29 29 thick 1244 THICK
Khalid-2020-200-30 30 heavy 1210 HEAVY
Khalid-2020-200-31 31 small 1246 SMALL
Khalid-2020-200-32 32 tall 711 TALL
Khalid-2020-200-33 33 short 1645 SHORT
Khalid-2020-200-34 34 narrow 1267 NARROW
Khalid-2020-200-35 35 thin 2308 THIN
Khalid-2020-200-36 36 girl 1646 GIRL
Khalid-2020-200-37 37 boy 1366 BOY
Khalid-2020-200-38 38 man (human being) 683 PERSON
Khalid-2020-200-39 39 child 2099 CHILD
Khalid-2020-200-40 40 wife 1199 WIFE
Khalid-2020-200-41 41 husband 1200 HUSBAND
Khalid-2020-200-42 42 mother 1216 MOTHER
Khalid-2020-200-43 43 father 1217 FATHER
Khalid-2020-200-44 44 animal 619 ANIMAL
Khalid-2020-200-45 45 fish 227 FISH
Khalid-2020-200-46 46 bird 937 BIRD
Khalid-2020-200-47 47 dog 2009 DOG
Khalid-2020-200-48 48 deer 1936 DEER
Khalid-2020-200-49 49 rabbit 1136 RABBIT
Khalid-2020-200-50 50 goat 1502 GOAT
Khalid-2020-200-51 51 pig 1337 PIG
Khalid-2020-200-52 52 louse 1392 LOUSE
Khalid-2020-200-53 53 snake 730 SNAKE
Khalid-2020-200-54 54 tree 906 TREE
Khalid-2020-200-55 55 forest 420 FOREST
Khalid-2020-200-56 56 stick 1295 STICK
Khalid-2020-200-57 57 fruit 1507 FRUIT
Khalid-2020-200-58 58 mango 2398 MANGO
Khalid-2020-200-59 59 seed 714 SEED
Khalid-2020-200-60 60 leaf 628 LEAF
Khalid-2020-200-61 61 root 670 ROOT
Khalid-2020-200-62 62 bark (of a tree) 1204 BARK
Khalid-2020-200-63 63 flower 239 FLOWER
Khalid-2020-200-64 64 grass 606 GRASS
Khalid-2020-200-65 65 rope 1218 ROPE
Khalid-2020-200-66 66 skin 2613 SKIN (HUMAN)
Khalid-2020-200-67 67 meat 634 MEAT
Khalid-2020-200-68 68 blood 946 BLOOD
Khalid-2020-200-69 69 bone 1394 BONE
Khalid-2020-200-70 70 fat (noun) 323 FAT (ORGANIC SUBSTANCE)
Khalid-2020-200-71 71 egg 744 EGG
Khalid-2020-200-72 72 horn 1393 HORN (ANATOMY)
Khalid-2020-200-73 73 tail 1220 TAIL
Khalid-2020-200-74 74 hair 1040 HAIR
Khalid-2020-200-75 75 head 1256 HEAD
Khalid-2020-200-76 76 ear 1247 EAR
Khalid-2020-200-77 77 eye 1248 EYE
Khalid-2020-200-78 78 nose 1221 NOSE
Khalid-2020-200-79 79 mouth 674 MOUTH
Khalid-2020-200-80 80 lips 478 LIP
Khalid-2020-200-81 81 tooth 1380 TOOTH
Khalid-2020-200-82 82 tongue (organ) 1205 TONGUE
Khalid-2020-200-83 83 fingernail 1258 FINGERNAIL
Khalid-2020-200-84 84 leg 1297 LEG
Khalid-2020-200-85 85 knee 1371 KNEE
Khalid-2020-200-86 86 hand 1277 HAND
Khalid-2020-200-87 87 wing 1257 WING
Khalid-2020-200-88 88 belly 1251 BELLY
Khalid-2020-200-89 89 guts 1334 GUTS
Khalid-2020-200-90 90 neck 1333 NECK
Khalid-2020-200-91 91 back 1291 BACK
Khalid-2020-200-92 92 breast 1402 BREAST
Khalid-2020-200-93 93 heart 1223 HEART
Khalid-2020-200-94 94 liver 1224 LIVER
Khalid-2020-200-95 95 to drink 1401 DRINK
Khalid-2020-200-96 96 to eat 1336 EAT
Khalid-2020-200-97 97 to bite 1403 BITE
Khalid-2020-200-98 98 to suck 1421 SUCK
Khalid-2020-200-99 99 to spit 1440 SPIT
Khalid-2020-200-100 100 to vomit 1278 VOMIT
Khalid-2020-200-101 101 to blow 176 BLOW (WITH MOUTH)
Khalid-2020-200-102 102 to breathe 1407 BREATHE
Khalid-2020-200-103 103 to laugh 1355 LAUGH
Khalid-2020-200-104 104 to see 1409 SEE
Khalid-2020-200-105 105 to hear 1408 HEAR
Khalid-2020-200-106 106 to know 1410 KNOW (SOMETHING)
Khalid-2020-200-107 107 to think 2271 THINK
Khalid-2020-200-108 108 to fear 1419 FEAR (BE AFRAID)
Khalid-2020-200-109 109 to sleep 1585 SLEEP
Khalid-2020-200-110 110 to live 1422 BE ALIVE
Khalid-2020-200-111 111 to die 1494 DIE
Khalid-2020-200-112 112 to fight 1423 FIGHT
Khalid-2020-200-113 113 to hit 1433 HIT
Khalid-2020-200-114 114 to cut 1432 CUT
Khalid-2020-200-115 115 to split 1437 SPLIT
Khalid-2020-200-116 116 to stab 1434 STAB
Khalid-2020-200-117 117 to scratch 1436 SCRATCH
Khalid-2020-200-118 118 to dig 1418 DIG
Khalid-2020-200-119 119 to swim 1439 SWIM
Khalid-2020-200-120 120 to fly 1441 FLY (MOVE THROUGH AIR)
Khalid-2020-200-121 121 to walk 1443 WALK
Khalid-2020-200-122 122 to come 1446 COME
Khalid-2020-200-123 123 to lie (as in a bed) 215 LIE DOWN
Khalid-2020-200-124 124 to sit 1416 SIT
Khalid-2020-200-125 125 to turn (intransitive) 1444 TURN AROUND
Khalid-2020-200-126 126 to fall 1280 FALL
Khalid-2020-200-127 127 to catch 702 CATCH
Khalid-2020-200-128 128 to squeeze 1414 SQUEEZE
Khalid-2020-200-129 129 to wash 1453 WASH
Khalid-2020-200-130 130 to wipe 1454 WIPE
Khalid-2020-200-131 131 to push 1452 PUSH
Khalid-2020-200-132 132 to tie 1094 FASTEN
Khalid-2020-200-133 133 to sew 1457 SEW
Khalid-2020-200-134 134 to count 1420 COUNT
Khalid-2020-200-135 135 to say 1623 SPEAK
Khalid-2020-200-136 136 to sing 1261 SING
Khalid-2020-200-137 137 to play 1413 PLAY
Khalid-2020-200-138 138 to float 1574 FLOAT
Khalid-2020-200-139 139 to flow 2003 FLOW
Khalid-2020-200-140 140 to freeze 1431 FREEZE
Khalid-2020-200-141 141 to swell 1573 SWELL
Khalid-2020-200-142 142 sun 1343 SUN
Khalid-2020-200-143 143 moon 1313 MOON
Khalid-2020-200-144 144 star 1430 STAR
Khalid-2020-200-145 145 water 948 WATER
Khalid-2020-200-146 146 rain 658 RAIN (PRECIPITATION)
Khalid-2020-200-147 147 river 666 RIVER
Khalid-2020-200-148 148 pond 2035 POND
Khalid-2020-200-149 149 salt 1274 SALT
Khalid-2020-200-150 150 stone 857 STONE
Khalid-2020-200-151 151 sand 671 SAND
Khalid-2020-200-152 152 dust 2 DUST
Khalid-2020-200-153 153 earth 2159 GROUND
Khalid-2020-200-154 154 cloud 1489 CLOUD
Khalid-2020-200-155 155 fog 249 FOG
Khalid-2020-200-156 156 wind 960 WIND
Khalid-2020-200-157 157 ice 617 ICE
Khalid-2020-200-158 158 smoke 778 SMOKE (EXHAUST)
Khalid-2020-200-159 159 fire 221 FIRE
Khalid-2020-200-160 160 ash 646 ASH
Khalid-2020-200-161 161 to burn 2102 BURN
Khalid-2020-200-162 162 road/path 2457 PATH OR ROAD
Khalid-2020-200-163 163 mountain/hill 2118 MOUNTAIN OR HILL
Khalid-2020-200-164 164 red 156 RED
Khalid-2020-200-165 165 green 1425 GREEN
Khalid-2020-200-166 166 yellow 1424 YELLOW
Khalid-2020-200-167 167 white 1335 WHITE
Khalid-2020-200-168 168 black 163 BLACK
Khalid-2020-200-169 169 night 1233 NIGHT
Khalid-2020-200-170 170 day 1225 DAY (NOT NIGHT)
Khalid-2020-200-171 171 year 1226 YEAR
Khalid-2020-200-172 172 today 1283 TODAY
Khalid-2020-200-173 173 tomorrow 1329 TOMORROW
Khalid-2020-200-174 174 yesterday 1174 YESTERDAY
Khalid-2020-200-175 175 warm 1232 WARM
Khalid-2020-200-176 176 cold 1287 COLD
Khalid-2020-200-177 177 full 1429 FULL
Khalid-2020-200-178 178 new 1231 NEW
Khalid-2020-200-179 179 old 1229 OLD
Khalid-2020-200-180 180 good 1035 GOOD
Khalid-2020-200-181 181 bad 1292 BAD
Khalid-2020-200-182 182 rotten 1728 ROTTEN
Khalid-2020-200-183 183 dirty 1230 DIRTY
Khalid-2020-200-184 184 straight 1404 STRAIGHT
Khalid-2020-200-185 185 round 1395 ROUND
Khalid-2020-200-186 186 square 850 SQUARE
Khalid-2020-200-187 187 sharp (as a knife) 1396 SHARP
Khalid-2020-200-188 188 dull (as a knife) 379 BLUNT
Khalid-2020-200-189 189 wet 1726 WET
Khalid-2020-200-190 190 dry 1398 DRY
Khalid-2020-200-191 191 correct 1725 CORRECT (RIGHT)
Khalid-2020-200-192 192 near 1942 NEAR
Khalid-2020-200-193 193 far 1406 FAR
Khalid-2020-200-194 194 right 1019 RIGHT
Khalid-2020-200-195 195 left 244 LEFT
Khalid-2020-200-196 196 at 1461 AT
Khalid-2020-200-197 197 in 1460 IN
Khalid-2020-200-198 198 with 1340 WITH
Khalid-2020-200-199 199 and 1577 AND
Khalid-2020-200-200 200 name 1405 NAME
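The structure of the list above can be spot-checked with a short script. The following is a sketch under stated assumptions: `SAMPLE` holds only the first three rows of the file, tab-separated, and `check_concept_list` is a hypothetical helper rather than part of any existing Concepticon tool.

```python
import csv
import io

# First three rows of the list, tab-separated, as in the diff above.
SAMPLE = (
    "ID\tNUMBER\tENGLISH\tCONCEPTICON_ID\tCONCEPTICON_GLOSS\n"
    "Khalid-2020-200-1\t1\tI\t1209\tI\n"
    "Khalid-2020-200-2\t2\tyou (singular)\t1215\tTHOU\n"
    "Khalid-2020-200-3\t3\the\t2642\tHE OR SHE\n"
)


def check_concept_list(text, list_id):
    """Return a list of structural problems (hypothetical helper)."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    problems = []
    seen_ids = set()
    for position, row in enumerate(reader, start=1):
        # Each row ID should be the list ID followed by the row's NUMBER.
        if row["ID"] != f"{list_id}-{row['NUMBER']}":
            problems.append(f"row {position}: ID does not match NUMBER")
        # IDs must be unique because ID is the primary key.
        if row["ID"] in seen_ids:
            problems.append(f"row {position}: duplicate ID {row['ID']}")
        seen_ids.add(row["ID"])
        # In this list, NUMBER simply counts up from 1.
        if row["NUMBER"] != str(position):
            problems.append(f"row {position}: NUMBER is not sequential")
        # CONCEPTICON_ID must be a positive integer.
        if not row["CONCEPTICON_ID"].isdigit() or int(row["CONCEPTICON_ID"]) < 1:
            problems.append(f"row {position}: invalid CONCEPTICON_ID")
    return problems


print(check_concept_list(SAMPLE, "Khalid-2020-200"))
```

Run against the full file, the same function would flag gaps in the numbering or duplicated IDs. It does not judge whether a mapping such as `HE OR SHE` is the right one.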
54 changes: 54 additions & 0 deletions concepticondata/conceptlists/Khalid-2020-200.tsv-metadata.json
@@ -0,0 +1,54 @@
{
    "@context": [
        "http://www.w3.org/ns/csvw",
        {
            "@language": "en"
        }
    ],
    "dialect": {
        "encoding": "utf-8-sig",
        "delimiter": "\t",
        "skipBlankRows": true
    },
    "tables": [
        {
            "tableSchema": {
                "columns": [
                    {
                        "datatype": {
                            "base": "string",
                            "format": "[a-zA-Z]+\\-[0-9]{4}\\-[0-9]+[a-z]?\\-[0-9]+[a-z]?$"
                        },
                        "name": "ID"
                    },
                    {
                        "datatype": {
                            "base": "string",
                            "format": "[0-9\\.]+([a-z\\\u2013]+)?$"
                        },
                        "name": "NUMBER"
                    },
                    {
                        "datatype": {
                            "base": "integer",
                            "minimum": 1
                        },
                        "name": "CONCEPTICON_ID"
                    },
                    {
                        "datatype": "string",
                        "name": "CONCEPTICON_GLOSS"
                    },
                    {
                        "datatype": "string",
                        "name": "ENGLISH"
                    }
                ],
                "primaryKey": [
                    "ID"
                ]
            },
            "url": "Khalid-2020-200.tsv"
        }
    ]
}
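To see what the two `format` patterns in the metadata accept, they can be tried out with Python's `re` module. This is only an approximation: a CSVW validator may interpret the patterns with a different regular-expression dialect, and the `FORMATS` mapping and `matches` helper are names introduced here for illustration.

```python
import json
import re

# The two "format" strings, copied from the metadata as JSON so that the
# escape sequences are decoded exactly as a JSON parser would decode them.
FORMATS = json.loads(r'''
{
    "ID": "[a-zA-Z]+\\-[0-9]{4}\\-[0-9]+[a-z]?\\-[0-9]+[a-z]?$",
    "NUMBER": "[0-9\\.]+([a-z\\\u2013]+)?$"
}
''')


def matches(column, value):
    """Check a cell value against the column's pattern (hypothetical helper)."""
    # fullmatch requires the whole value to match, which approximates
    # how a datatype format is meant to constrain the entire cell.
    return re.fullmatch(FORMATS[column], value) is not None


print(matches("ID", "Khalid-2020-200-1"))   # a row ID
print(matches("ID", "Khalid-2020-200"))     # a list ID without a row number
print(matches("NUMBER", "200"))
```

Under this reading, a row ID needs all four hyphen-separated parts, so a bare list ID such as `Khalid-2020-200` is rejected, while `NUMBER` allows digits with an optional trailing lowercase suffix.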
15 changes: 14 additions & 1 deletion concepticondata/references/references.bib
@@ -4964,4 +4964,17 @@ @article{Guenther2022
number = {8043},
pages = {1--13},
year = {2022}
}
}

@inproceedings{Khalid2020,
title = {A Grammatical Sketch of Asur: A {N}orth {M}unda language},
author = {Khalid, Zoya},
editor = {Bhattacharyya, Pushpak and Sharma, Dipti Misra and Sangal, Rajeev},
booktitle = {Proceedings of the 17th International Conference on Natural Language Processing (ICON)},
month = {dec},
year = {2020},
address = {Indian Institute of Technology Patna, Patna, India},
publisher = {NLP Association of India (NLPAI)},
url = {https://aclanthology.org/2020.icon-main.6/},
pages = {40--49}
}