Assuming identities online: Experimental chatlogs

Grant, Timothy David (2016). Assuming identities online: Experimental chatlogs. [Data Collection]. Colchester, Essex: UK Data Archive. 10.5255/UKDA-SN-852099

Preventive policing of serious crime sometimes involves deception and disguise. A case in point is the prevention of abuse arising from paedophile grooming and from peer-to-peer networks in which abuse images of children are discussed and exchanged. The preventive techniques used by police investigators include assuming the identities of existing community members, and of children, so that interventions and arrests can be made. This process often runs under tight time constraints - investigators have only a small window in which to learn and assume the identity in question before arousing suspicion in their target(s). The training that undercover online investigators currently receive, although broadly informed by linguistic theory, is in need of development. Furthermore, the time constraints mean that a semi-automated system to assist in identity assumption would represent a crucial contribution to the investigative toolkit.

Taking an inductive approach, in which the phenomena of interest rather than a specific theoretical paradigm are primary, this research aims to bridge the gap between complex theories of the discursive construction of online identities on the one hand, and computational approaches to analysing online communications on the other. A small-scale study in which the Centre for Forensic Linguistics (CFL) and Lexegesys are currently engaged addresses the challenges of automation at the pragmatic and interactional levels, working towards the semi-automated identification of phenomena such as indirect speech acts and topic management. The work is highly practical and is informed by real-world police investigations. A project partner, the West Midlands Police Technical Intelligence Development Unit, is committed to providing data and operational insights. In addition to empirical applied linguistics, the project conducts proof-of-concept work for software that will assist in the ethical use of assumed identities in policing. It will also assess the ethical and policy implications, for policing and security, of complexity in online identity performance.

Data description (abstract)

Research taking a computational approach to the analysis of online communications has thus far focused overwhelmingly on the structural elements of Computer Mediated Discourse (CMD), such as typography, orthography and other low-level features, with little to no attention paid to the socially situated discourses in which these features are embedded. The Centre for Forensic Linguistics (CFL) - a research centre within Aston University combining leading-edge research and investigative forensic practice - and Lexegesys - a consultancy and technology company specialising in developing and implementing data analysis solutions - recently collaborated on a project that succeeded in automating the identification and extraction of low-level features for the purpose of attributing authorship of unknown texts on Twitter. Yet CMD has widely been recognised to operate on a number of linguistic levels, such as those of meaning, of interaction, and of social practice. Outside the computational linguistic field, the characteristic features of CMD are understood as resources that users draw on in constructing identities in particular contexts, and CMD is understood to constitute social practice in and of itself rather than simply being shaped by social variables.

This data collection consists of transcripts of Instant Messaging conversations between a 'Judge' and an 'Interlocutor', the latter being replaced at some point by an 'Impersonator'. Each file contains three 15-minute chats, one for each of three conditions of preparation for the Impersonator: No Preparation, Over the Shoulder preparation, and Homework preparation. Files 1-12 correspond to postgraduate students and files 13-30 to undergraduate students. Judges were asked to record when they thought a switch had taken place, what linguistic criteria led them to think this, and how confident they were in their decision. Information on when switches actually occurred was also collected and cross-referenced with these judgements.

Data creators:
Name: Grant, Timothy David
Affiliation: Aston University
Contributors:
Name: Sorell, Tom
Affiliation: University of Warwick
Sponsors: Economic and Social Research Council
Grant reference: ES/L003279/1
Topic classification: Media, communication and language; Law, crime and legal systems
Keywords: instant messaging, identity, english (language)
Project title: Assuming Identities Online: description, development and ethical implications.
Alternative title: Experimental Chatlogs - Postgraduates (1-12) and Undergraduates (13-30)
Grant holders: Professor Timothy Grant
Project dates:
From: 1 August 2014
To: 31 July 2016
Date published: 17 Feb 2016 10:20
Last modified: 17 Feb 2016 10:20
