The data in the file IJSLLheatmapdatawithspeakers.csv are ‘speaker scores’ for the 60 male speakers from the TUULS corpus and the 100 male speakers from the DyViS (Dynamic Variability in Speech) corpus (Nolan, McDougall, de Jong & Hudson 2009). The first 100 rows in the spreadsheet represent the DyViS speakers, who are labelled ‘001’ to ‘121’ (there are gaps in the sequence of number labels). The second block of 60 rows contains the data for the TUULS speakers, labelled using a ‘P’ prefix followed by a number. The columns are arranged in the same sequence.

The speaker scores were generated using an algorithm implemented in the Nuance Forensics Automatic Speaker Recognition (ASR) system, v. 11.1 (Nuance Communications 2018). In this context, a speaker score is a measure of the difference between a pair of statistical models, each representing the resonance characteristics of an individual speaker’s voice. These models are arrived at through the extraction of ‘mel-frequency cepstral coefficients’ (MFCCs), a vector of values based on a small number of parameters describing the long-term shape of the acoustic spectrum of the speaker’s voice. The analyses were based on 120-second extracts from the recordings of each of the 160 speakers. These extracts were then divided into two 60-second halves, A and B. Nuance Forensics performed pairwise comparisons of the MFCC vectors for the A sample of every speaker against the B sample of every other speaker, as well as of the same-speaker A and B samples (i.e. every speaker was also compared with himself). Sixty seconds’ worth of net speech is more than sufficient as a basis from which to derive a reliable model for each speaker. The heatmap shown on page 14 of Watt et al. (2020) indicates the relative resemblances of each pair of speech samples (N = 25,600).
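The comparison design described above (every speaker’s A sample scored against every speaker’s B sample, same-speaker pairs on the diagonal) can be sketched as follows. This is only an illustration of the matrix layout, not of the Nuance scoring algorithm: the score values below are random placeholders, and the score distributions (higher same-speaker scores) are assumptions for demonstration purposes, not figures from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 DyViS speakers followed by 60 TUULS speakers, as in the CSV layout.
n_dyvis, n_tuuls = 100, 60
n = n_dyvis + n_tuuls  # 160 speakers in total

# Placeholder score matrix: rows = A samples, columns = B samples,
# both ordered in the same speaker sequence, giving 160 x 160 = 25,600
# pairwise comparisons. The normal distributions here are arbitrary.
scores = rng.normal(loc=-5.0, scale=2.0, size=(n, n))

# Same-speaker (A vs B) comparisons sit on the main diagonal and are
# expected to score higher than different-speaker comparisons:
np.fill_diagonal(scores, rng.normal(loc=10.0, scale=1.0, size=n))

print(scores.shape)  # (160, 160)
print(scores.size)   # 25600 pairwise comparisons

same_speaker = np.diag(scores)                    # the diagonal
diff_speaker = scores[~np.eye(n, dtype=bool)]     # everything else
print(same_speaker.mean() > diff_speaker.mean())  # expected: True here
```

Reading the real file with, say, `pandas.read_csv` would yield the same 160 × 160 arrangement, with the first 100 rows/columns belonging to DyViS and the remaining 60 to TUULS.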
For obvious reasons, the highest level of resemblance is between the models for the A and B samples of individual speakers (represented on the diagonal from top left to bottom right). Overall, there is greater similarity among the 100 DyViS speakers than among the TUULS speakers, but this is to be expected given the much greater homogeneity of the DyViS corpus, which is composed of recordings of young men of around the same age (late teens/early twenties) who were all students at the University of Cambridge at the time of recording, and who were selected specifically because they were good exemplars of the Standard Southern British English accent (SSBE, also known as Received Pronunciation). They were also screened to ensure that no speakers with abnormal (e.g. pathological) voice qualities were included.

The male half of the TUULS corpus, by contrast, is made up of recordings of the speech of men in two different age groups (young = 18-25 years; older = 45+) from three different urban centres in North-East England, viz. Middlesbrough, Sunderland and Newcastle upon Tyne. In the interests of gathering a representative and forensically realistic sample of the speech of the region, no attempt was made to control for voice quality, and there is also more variation in terms of accent features. For further information, see Watt et al. (2020).

References

Nolan, F., McDougall, K., de Jong, G. & Hudson, T. (2009). The DyViS database: Style-controlled recordings of 100 homogeneous speakers for forensic phonetic research. International Journal of Speech, Language and the Law 16(1): 31-57.

Nuance Communications (2018). Nuance Forensics: Prosecute criminals using their voice. NUAN–CS–2236–03–DS, 4th March 2018. Online resource: , accessed 12th April 2021.

Watt, D., Harrison, P., Hughes, V., French, P., Llamas, C., Braun, A. & Robertson, D. (2020). Assessing the effects of accent-mismatched reference population databases on the performance of an automatic speaker recognition system. International Journal of Speech, Language and the Law 27(1): 1-34.