The Oxford Aesop Corpus 2010

Kochanski, Greg and Loukina, Anastassia (2015). The Oxford Aesop Corpus 2010. [Data Collection]. Colchester, Essex: UK Data Archive. 10.5255/UKDA-SN-851830

When we say that music, poetry and language all have rhythms, what is meant by rhythm? What accounts for the rhythmic differences between languages or dialects? Within the last decade, techniques for measuring rhythm quantitatively have begun to appear. So far, however, these rhythm measures require extensive, careful manual marking of the speech and are highly dependent on the choice of words. As a result, they have been limited to carefully designed laboratory experiments.
The aim of our project is to systematically test and improve these rhythm measurements so that they are more reliable, easier to apply, and robust enough to use outside the laboratory. This process will give us clues as to which sounds of speech contribute most to rhythm and, ultimately, a better understanding of what we mean by the term rhythm. We aim to build tools that will open part of linguistics to quantitative measurement. These tools will allow researchers to work with more natural speech and could perhaps find medical uses.
Finally, we will use our optimised measures to produce the first survey of the rhythm of British English dialects. We will investigate how the differences among British dialects compare with the differences between English and other languages.

Data description (abstract)

The aim of our project is to systematically test and improve quantitative measures of speech rhythm so that they are more reliable, easier to apply, and robust enough to use outside the laboratory. This corpus consists of short paragraphs and children's poetry read by native speakers of Southern British English, Russian (Moscow and St. Petersburg), Greek (Athens), Taiwanese Mandarin, and French (Paris). The data consist of speech recordings, together with the orthographic texts, automatically generated transcriptions and metadata files.

Speakers read text from a computer screen in laboratory experiments. They were 20 to 28 years old, born to monolingual parents, and had grown up in their respective countries. At the time of recording, all speakers were living in Oxford, UK; the non-English participants had lived outside their home country for less than four years. Speakers also read up to 700 randomly selected short sentences, which were intended for training an automatic speech recognition system.

Data creators:
Kochanski, Greg (University of Oxford)
Loukina, Anastassia (University of Oxford)
Sponsors: ESRC
Grant reference: RES-062-23-1323
Topic classification: Media, communication and language; Psychology
Keywords: speech, linguistics, linguistic analysis, multilingualism
Project title: Comparing dialects using statistical measures of rhythm.
Grant holders: Greg Kochanski, Elinor Keane
Project dates: 1 August 2008 to 31 October 2010
Date published: 28 Apr 2015 14:18
Last modified: 19 Aug 2015 08:53
