Östling, Andreas and Sargeant, Holli and Xie, Huiyuan and Bull, Ludwig and Terenin, Alexander and Jonsson, Leif and Magnusson, Måns and Steffek, Felix (2024). Cambridge Law Corpus, 1550-2023. [Data Collection]. Colchester, Essex: UK Data Service. DOI: 10.5255/UKDA-SN-856927
The Cambridge Law Corpus is a corpus designed for legal AI research. It consists of over 250,000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases dating from the 16th century. It was funded by the research project Legal Systems and Artificial Intelligence, which was jointly supported by the UK’s Economic and Social Research Council (ESRC), part of UKRI, and the Japan Science and Technology Agency (JST). The project involved collaboration between the University of Cambridge (the Centre for Business Research, Department of Computer Science and Faculty of Law) and Hitotsubashi University, Tokyo (the Graduate Schools of Law and Business Administration).
Data description (abstract)
The Cambridge Law Corpus (CLC) is a corpus designed for legal AI research. It consists of over 250,000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases as old as the 16th century. The corpus is accompanied by case-outcome annotations for 638 cases, made by legal experts. The original cases were supplied as Microsoft Word and PDF files. The Word files were cleaned and transformed into an XML format. The PDF files were converted to text via optical character recognition (OCR), and the resulting text files were then converted to the XML format. Because of legal and ethical considerations, the full CLC is available only for research purposes, under restrictions, via Related Resources. A smaller dataset of 15 selected cases from the CLC is available on the University of Cambridge Apollo Data Repository, which can also be accessed via Related Resources.
Data creators: |
Creator Name | Affiliation | ORCID (as URL) |
---|
Östling, Andreas | University of Uppsala | |
Sargeant, Holli | University of Cambridge | |
Xie, Huiyuan | University of Cambridge | |
Bull, Ludwig | CourtCorrect | |
Terenin, Alexander | University of Cambridge | |
Jonsson, Leif | Ericsson | |
Magnusson, Måns | University of Uppsala | |
Steffek, Felix | University of Cambridge | |
|
Sponsors: |
Economic and Social Research Council
|
Grant reference: |
ES/T006315/1
|
Topic classification: |
Law, crime and legal systems; History
|
Keywords: |
LAW, LEGAL DECISIONS, COURTS, LEGAL RECORDS
|
Project title: |
Legal Systems and Artificial Intelligence
|
Grant holders: |
Simon Deakin, Jennifer Cobbe, Jon Crowcroft, Christopher Markou, Jatinder Singh, Felix Steffek
|
Project dates: |
From | To |
---|
1 January 2020 | 31 January 2023 |
|
Date published: |
19 Feb 2024 13:29
|
Last modified: |
19 Feb 2024 13:29
|
Temporal coverage: |
From | To |
---|
1 January 1550 | 1 September 2023 |
|
Collection period: |
Date from: | Date to: |
---|
1 January 2020 | 31 December 2023 |
|
Geographical area: |
UK |
Country: |
United Kingdom |
Data collection method: |
The original cases of the Cambridge Law Corpus were supplied by the legal technology company CourtCorrect in raw form, including Microsoft Word and PDF files. |
Observation unit: |
Text unit |
Kind of data: |
Text, Software |
Type of data: |
Historical data |
Resource language: |
English |
|
Data sourcing, processing and preparation: |
Because of legal and ethical considerations, the Cambridge Law Corpus (CLC) is only available for research purposes under restrictions. For further details on how to access the dataset see https://www.cst.cam.ac.uk/research/srg/projects/law. A smaller dataset consisting of 15 selected cases from the CLC is available on the University of Cambridge Apollo Data Repository, at: https://www.repository.cam.ac.uk/handle/1810/357329. For further information, please contact Simon Deakin (s.deakin@cbr.cam.ac.uk).
|
Rights owners: |
Name | Affiliation | ORCID (as URL) |
---|
Östling, Andreas | University of Uppsala | |
Sargeant, Holli | University of Cambridge | |
Xie, Huiyuan | University of Cambridge | |
Bull, Ludwig | CourtCorrect | |
Terenin, Alexander | University of Cambridge | |
Jonsson, Leif | Ericsson | |
Magnusson, Måns | University of Uppsala | |
Steffek, Felix | University of Cambridge | |
|
Contact: |
Name | Email | Affiliation | ORCID (as URL) |
---|
Deakin, Simon | s.deakin@cbr.cam.ac.uk | University of Cambridge | Unspecified |
|
Notes on access: |
The Data Collection is available from an external repository. Access is available via Related Resources.
|
Publisher: |
UK Data Service
|