Connecting Content and Logical Words, 2016-2019

Dautriche, Isabelle (2021). Connecting Content and Logical Words, 2016-2019. [Data Collection]. Colchester, Essex: UK Data Service. 10.5255/UKDA-SN-855112

As anyone who has learnt a foreign language or travelled abroad will have noticed, languages differ in the sounds they employ, the names they give to things, and the rules of grammar. However, linguists have long observed that, beneath this surface diversity, all human languages share a number of fundamental structural similarities. Most obviously, all languages use sounds, all languages have words, and all languages have a grammar. More subtly and more surprisingly, similarities can also be observed in more fine-grained linguistic features: for instance, George Zipf famously observed that, across multiple languages, short words tend also to be more frequent, and in my own recent work I have shown that languages prefer to use words that sound alike (e.g., cat, mat, rat, bat, fat, ...). Why do all languages exhibit these shared features?
This project aims to tackle exactly this key question by studying how languages are shaped by the human mind. In particular, I will explore how the way we learn language and use it to communicate drives the emergence of important features of the lexicon, the set of all words in a language. To simulate the process of language change and evolution in the lab, I will use an experimental paradigm where an artificial language is passed between learners (language learning), and used by individuals to communicate with each other (language use). This paradigm has been successfully applied in previous research showing that key structural features of language can be explained as a consequence of repeated learning and use; my contribution will be to apply the same methods to study the evolution of the lexicon. I will then use two complementary techniques to evaluate the ecological validity of these results. First, do the artificial lexicons obtained after repeated learning and communication match the structure of lexicons found in real human languages? I will assess this by analyzing real natural language corpora using computational methods. Second, are these lexicons easily learnable by young children, the primary conduit of natural language transmission in the wild? This will be assessed using methods from developmental psychology to study word learning in toddlers.
The present project requires an unprecedented integration of techniques and concepts from language evolution, computational linguistics and developmental psychology, three fields that have so far worked independently to understand the structure of language. The outcomes of the project will be of vital interest for all these communities, and will provide insights into the foundational properties found in all human languages, as well as the nature of the constraints underlying language processing and language acquisition. This project will provide a springboard for my future work at the intersection of computational and experimental approaches to language and cognitive development.

Data description (abstract)

Content words (e.g. nouns and adjectives) are generally connected: there are no gaps in their denotations; no noun means ‘table or shoe’ or ‘animal or house’. We explore a formulation of connectedness which is applicable to content and logical words alike, and which compares well with the classic notion of monotonicity for quantifiers. On first inspection, logical words satisfy this generalized version of the connectedness property at least as well as content words do, both in terms of what may be observed in the lexicons of natural languages (although our investigations remain modest in that respect) and in terms of acquisition biases (tested with an artificial rule learning experiment). This reduces the putative differences between content and logical words, as well as the associated challenges that these differences would pose, e.g., for learners.
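
To make the contrast between connectedness and monotonicity concrete, here is a minimal Python sketch. It is an illustration only, not the formulation or the analysis code used in this data collection. It assumes a simplified setting in which a quantifier is a true/false test on subsets of a small universe, and it assumes that "connected" means: whenever two sets are accepted, every set lying between them (by inclusion) is accepted too. The function names and the three example quantifiers are hypothetical.

```python
from itertools import combinations


def powerset(universe):
    """Return every subset of the universe as a frozenset."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_connected(quantifier, universe):
    """Assumed definition: no rejected set lies between two accepted sets."""
    sets = powerset(universe)
    accepted = [s for s in sets if quantifier(s)]
    for low in accepted:
        for high in accepted:
            if not low <= high:
                continue
            for mid in sets:
                if low <= mid <= high and not quantifier(mid):
                    return False  # a gap in the denotation
    return True


def is_upward_monotone(quantifier, universe):
    """Classic upward monotonicity: truth is preserved when the set grows."""
    sets = powerset(universe)
    return all(quantifier(b)
               for a in sets if quantifier(a)
               for b in sets if a <= b)


UNIVERSE = {1, 2, 3, 4}

# Hypothetical quantifier meanings, stated over the size of the set.
def at_least_two(s):
    return len(s) >= 2

def exactly_two(s):
    return len(s) == 2

def none_or_all(s):
    return len(s) in (0, len(UNIVERSE))


for name, q in [("at_least_two", at_least_two),
                ("exactly_two", exactly_two),
                ("none_or_all", none_or_all)]:
    print(name, is_connected(q, UNIVERSE), is_upward_monotone(q, UNIVERSE))
```

Under these assumptions, a meaning like "at least two" is both monotone and connected, "exactly two" is connected without being upward monotone, and a gapped meaning like "none or all" is not connected, which parallels the absence of nouns meaning ‘table or shoe’.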

Data creators: