WYRED - West Yorkshire Regional English Database 2016-2019

Gold, Erica (2020). WYRED - West Yorkshire Regional English Database 2016-2019. [Data Collection]. Colchester, Essex: UK Data Service. 10.5255/UKDA-SN-854354

Forensic speech science (FSS) - an applied sub-discipline of phonetics - has come to play a critical role in criminal cases involving voice evidence. Within FSS, forensic speaker comparison (FSC) involves comparing a criminal recording (e.g. a threatening phone call) with a known suspect sample (e.g. a police interview). It is the role of an expert forensic phonetician to advise the trier of fact (e.g. judge or jury) on the likelihood of the two samples coming from the same speaker. There are two important elements involved in making such a comparison. First, the expert will assess the similarity of the speech characteristics in the criminal recording and the suspect sample. Second, the expert will assess how typical those same speech features in the criminal sample are of a given speaker group. The speaker group will typically be defined by age, sex and geographical region (or accent). This second element is critical in providing context for the first: the suspect could have speech very similar to that in the criminal recording, but this could be purely coincidental if they exhibit speech characteristics that are common to their speaker group. In contrast, if the criminal and the suspect both show speech features that are atypical for their speaker group, this would provide strong evidence that they are the same speaker.
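
The interplay of similarity and typicality described above is often expressed as a likelihood ratio. The following is a minimal Python sketch of that idea and not a description of the project's own method: it assumes a single speech feature modelled with Gaussian distributions, and the function names and all numeric values are hypothetical, chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(observed, suspect_mean, suspect_sd, population_mean, population_sd):
    """Similarity (fit to the suspect model) divided by typicality (fit to the population model)."""
    similarity = gaussian_pdf(observed, suspect_mean, suspect_sd)
    typicality = gaussian_pdf(observed, population_mean, population_sd)
    return similarity / typicality


# Hypothetical feature values; not taken from WYRED.
# Case 1: the criminal sample matches the suspect, but the value is common in the population.
lr_typical = likelihood_ratio(
    observed=110.0, suspect_mean=110.0, suspect_sd=5.0,
    population_mean=110.0, population_sd=15.0,
)

# Case 2: the criminal sample matches the suspect equally well, but the value is rare in the population.
lr_atypical = likelihood_ratio(
    observed=150.0, suspect_mean=150.0, suspect_sd=5.0,
    population_mean=110.0, population_sd=15.0,
)

print(f"typical feature:  LR = {lr_typical:.1f}")
print(f"atypical feature: LR = {lr_atypical:.1f}")
```

In both cases the criminal sample is equally similar to the suspect; only the typicality term changes, and the atypical case yields the larger ratio, which mirrors the reasoning in the paragraph above.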

One complication associated with FSC is that the data needed to estimate whether a speech feature is typical or atypical for a given speaker group, commonly known as population data, are scarce. Population data are typically obtained by collecting a set of recordings containing the voices of a homogeneous group of speakers similar in age, sex, and geographical region (or accent). Unfortunately, the time and expense involved in collecting population data mean that forensic phoneticians face a huge challenge in obtaining such data for casework. This problem is further complicated by the high degree of variation that exists in speech across different speaker groups. Methodological research in the field of FSS has demonstrated that identifying the correct population for an FSC is vital in accurately representing the strength of evidence. It is largely for these reasons that experts argue that the biggest problem facing the field is the limited availability of population data.

The primary aim of this research is to explore a novel set of proposed methods that seek to remedy the aforementioned problems. The current lack of a platform on which to exchange data means that population data for a specific speaker group might have already been collected, unbeknown to experts in need of such data. This project intends to bring an end to this type of scenario by developing an international platform on which to share data, and also encouraging fellow researchers and experts to participate in data sharing. In addition, the project will explore the extent to which population data are generalizable; specifically, this will entail identifying the geographical (or regional accent) level at which speaker groups can be defined. For example, an expert might define a population group as having a Wakefield accent, when in actuality a population defined more generally as West Yorkshire would suffice. This would clearly have implications for the way in which population data would be collected.

In order to explore the issue of defining the population data, a West Yorkshire (WY) database of 180 male speakers will be collected (60 speakers from each of three boroughs: Bradford, Kirklees, and Wakefield). The database will be used to test the sensitivity of the strength of evidence when FSC cases are simulated using varying definitions of accent for the population data. In addition to serving a methodological purpose, the WY database will also serve as a practical resource for casework and research in its own right.
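
As a rough illustration of what testing sensitivity to the population definition could look like, the sketch below computes a likelihood ratio for one simulated case twice: once against a single-borough reference population and once against a pooled West Yorkshire population. This is an assumption-laden toy, not the project's procedure: the feature values, the suspect model, the Gaussian modelling, and the names `fit` and `likelihood_ratio` are all hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical feature values per borough (e.g. one measurement per speaker).
# These numbers are invented for illustration and are not WYRED data.
samples = {
    "Bradford": [104.0, 108.0, 112.0, 116.0, 120.0],
    "Kirklees": [110.0, 114.0, 118.0, 122.0, 126.0],
    "Wakefield": [98.0, 102.0, 106.0, 110.0, 114.0],
}


def fit(values):
    """Fit a simple Gaussian population model to a list of feature values."""
    return NormalDist(mu=mean(values), sigma=stdev(values))


def likelihood_ratio(observed, suspect_model, population_model):
    """Similarity to the suspect divided by typicality in the reference population."""
    return suspect_model.pdf(observed) / population_model.pdf(observed)


# Hypothetical simulated case: a suspect model and one observed value from the criminal sample.
suspect_model = NormalDist(mu=100.0, sigma=4.0)
observed = 101.0

# Two competing definitions of the reference population.
narrow_population = fit(samples["Wakefield"])
broad_population = fit([v for values in samples.values() for v in values])

lr_by_definition = {
    "Wakefield only": likelihood_ratio(observed, suspect_model, narrow_population),
    "West Yorkshire (pooled)": likelihood_ratio(observed, suspect_model, broad_population),
}

for label, lr in lr_by_definition.items():
    print(f"{label}: LR = {lr:.2f}")
```

If the two ratios were consistently close across many simulated cases, that would suggest the broader definition suffices; a large gap would suggest that the narrower, borough-level population matters.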

Data description (abstract)

The West Yorkshire Regional English Database (WYRED) consists of approximately 200 hours of high-quality audio recordings of 180 West Yorkshire (British English) speakers. All participants are male, aged 18 to 30, and are divided evenly (60 per borough) across three boroughs within West Yorkshire (Northern England): Bradford, Kirklees, and Wakefield. Speakers participated in four spontaneous speaking tasks. The first two tasks relate to a mock crime, in which the participant speaks to a police officer (Research Assistant 1) and then to an accomplice (Research Assistant 2). Speakers returned a minimum of 6 days later, at which point they were paired with someone from their borough and recorded having a conversation on any topics they wished. The final task is an experimental task in which speakers are asked to leave a voicemail message related to the fictitious crime from the first recording session. In total, each speaker participated in approximately 1 hour of spontaneous speech recordings. The primary motivation for constructing WYRED was to provide a collection of regionally stratified speech recordings (by borough) from within a single, politically defined region (a county). The corpus aims to facilitate research on methodological issues surrounding the delimitation of the reference population when considering the typicality of a speech sample for a given forensic speaker comparison case, while also providing valuable insight into the West Yorkshire accent(s).

Data creators:

Creator Name	Affiliation	ORCID (as URL)
Gold Erica		https://orcid.org/0000-0003-3638-0511

Contributors:

Name	Affiliation	ORCID (as URL)
Earnshaw Kate	University of Huddersfield
Ross Sula	University of Huddersfield

Sponsors:

Economic and Social Research Council

Grant reference:

ES/N003268/1

Topic classification:

Media, communication and language
Law, crime and legal systems
Demography (population, vital statistics and censuses)
Society and culture

Keywords:

SPEECH, ENGLISH (LANGUAGE), ACCENTS (DIALECT), NATIVE LANGUAGE, MALES, SPEAKING, FORENSIC SCIENCE, ACCENTS, REGIONAL MINORITY LANGUAGES

Project title:

(WYRED) Using BIG data to understand the BIG picture: Overcoming heterogeneity in speech for forensic applications

Grant holders:

Erica Gold

Project dates:

From	To
15 February 2016	31 August 2019

Date published:

08 Sep 2020 12:13

Last modified:

10 Sep 2020 10:32

Coverage and Methodology

Collection period:

Date from:	Date to:
15 February 2016	31 August 2019

Geographical area:

West Yorkshire

Country:

United Kingdom

Data collection method:

WYRED consists of recordings from 180 male speakers, aged between 18 and 30 at the time of recording. All participants are British English speakers from the county of West Yorkshire in Northern England. The 180 speakers are divided between three of the five boroughs within West Yorkshire (Bradford, Kirklees, Wakefield), such that there are 60 speakers from each borough. Participants were assigned to a borough based on the postcode (zip code) where they grew up and went to primary and secondary school. All participants are native English speakers who grew up in English-only speaking households and did not speak any other languages. None of the participants reported any speech or hearing impairments. Speakers were excluded from the database, however, if they were deemed to have spent a significant period (more than a few years) outside the area, or if they had missing or broken front teeth or facial piercings that affected their speech.
Recruitment largely took place through email advertisements, but also via flyers, in-class presentations, Facebook Ads, and referrals. All interested participants registered their interest through an online survey that allowed us to screen for eligibility. Eligible speakers were then invited to participate via email. All participants were compensated for their participation.
In addition to each participant’s age, WYRED also contains metadata that may be of interest to other researchers. The following metadata were collected for each participant: relationship status and where their partner was from, where their parents were from, employment status and type of work, highest level of education, smoker/non-smoker, left- or right-handed, height and weight.

Observation unit:

Individual

Kind of data:

Text, Audio

Type of data:

Cohort and longitudinal studies, Experimental data

Resource language:

English

Related Resources

Publications

The ‘West Yorkshire Regional English Database’: Investigations into the Generalizability of Reference Populations for Forensic Speaker Comparison Casework

Website

Using BIG data to understand the BIG picture: Overcoming heterogeneity in speech for forensic applications

WYRED Project
