Brezina, Vaclav (2024). The Hansard Corpus, 1802-2023. [Data Collection]. Colchester, Essex: UK Data Service. 10.5255/UKDA-SN-857509
Corpus linguistics is a UK success story. It is an approach to the study of language, pioneered in large part by UK researchers, that uses computers to permit the analysis of millions, or even billions, of words of data to look for patterns of usage that are not necessarily observable otherwise. Corpus linguistics has revolutionised linguistics, changing the ways that language is analysed and how languages are taught. It is therefore an increasingly well established approach to the study of language among linguists. Yet the analysis of language is not the sole preserve of linguists but, rather, is a thread that runs through all of the social sciences. Since 2013, the Centre for Corpus Approaches to Social Science has brought the benefits of the corpus approach to a range of social science disciplines (including Criminology, Sociology, Accountancy and Psychology), and has enabled new approaches to answering questions in those disciplines (e.g. on understanding hate crime, views on climate change, financial accounting, and learning in primary schools). The Centre has also produced new tools and resources for the large-scale study of language, and made them available free of charge to academics and non-academics in the UK and internationally through the Centre website, Summer Schools, training events, and a dedicated Corpus MOOC. Overall, the Centre has established itself as the world leader in innovative research in Corpus Linguistics and its applications beyond linguistics. The next phase of the Centre's work will maintain and enhance that position, by continuing our core activities and undertaking new ones. In particular, we will maximise the practical impact in four of our areas of activity (Corporate Communication, Climate Change, Language Development and Language Assessment) and bring the transformative benefits of Corpus Linguistics to a social scientific understanding of communication for, about and by people who are ill. 
We will analyse both existing and new language datasets that are relevant to mental health (including anxiety disorder, psychosis and depression), chronic pain, obesity and medical communication. Our analyses will bring about new understandings of the experience of illness, manifestations of stigma and the communicative needs of medical professionals, and will feed into recommendations and interventions for support, training and policy. This will address three of the ESRC's strategic priorities, and make a positive difference to research and practice in healthcare, and communication about health and illness more generally.
Data description (abstract)
Corpus linguistics is a British success story, revolutionising language analysis and teaching through computer-assisted examination of vast datasets. Since 2013, the Centre for Corpus Approaches to Social Science has extended this method beyond linguistics to social sciences, tackling issues like hate crime, climate change, financial reporting, and education. It has also developed freely accessible tools and resources, establishing itself as a global leader.
The Hansard Corpus is an extensive dataset comprising 2 billion words of UK parliamentary speeches from 1802 to 2023. The corpus is annotated and can be searched by grammatical and semantic category. The data is available via the free software tool #LancsBox X, which provides full access to the dataset as well as complex statistical analyses (R package) and visualisations.
Data creators: | |
---|---|
Sponsors: | ESRC |
Grant reference: | ES/R008906/1 |
Topic classification: | Media, communication and language; Politics; History |
Keywords: | PARLIAMENTARY DEBATES, LINGUISTICS, LINGUISTIC ANALYSIS, LANGUAGES AND LINGUISTICS EDUCATION, SPEECH, HISTORY |
Project title: | ESRC Centre for Corpus Approaches to Social Science (Transition Review) |
Grant holders: | Elena Semino, Vaclav Brezina, Basil Germond, Steven Young, Dana Gablasova, Garrath Williams, Andrew Hardie, Kate Cain, Claire Hardaker, John Baker |
Project dates: | |
Date published: | 16 Dec 2024 13:22 |
Last modified: | 16 Dec 2024 13:23 |
Available Files
No Files to display
Related Resources
Data collections
#LancsBox
Website
ESRC Centre for Corpus Approaches to Social Science (Transition Review)
ESRC Centre for Corpus Approaches to Social Science