Banducci, Susan (2020). 2016 EU Referendum campaign online news and information URLs. [Data Collection]. Colchester, Essex: UK Data Service. DOI: 10.5255/UKDA-SN-854256
The advent of Web 2.0 - the second generation of the World Wide Web, which allows users to interact, collaborate, and create and share information online in virtual communities - has radically changed the media environment, the types of content the public is exposed to, and the exposure process itself. Individuals face a wider range of options (from social and traditional media), new patterns of exposure (socially mediated and selective), and alternative modes of content production (e.g. user-generated content). In order to understand change (and stability) in opinions and behaviour, it is necessary to measure what information a person has been exposed to. The measures social scientists have traditionally used to capture information exposure rely on self-reports of newspaper reading and television news broadcast viewing. These measures do not take into account that individuals browse and share diverse information from social and traditional media across a wide range of platforms. According to the OECD's Global Science Forum 2013 report, social scientists' inability to anticipate the Arab Spring was partly due to a failure to understand 'the new ways in which humans communicate' via social media and the ways they are exposed to information. Social media's mixed record in predicting the results of recent UK elections suggests that better tools and a unified methodology are needed to analyse and extract political meaning from this new type of data.
We argue that a new set of tools, which models exposure as a network and incorporates both social and traditional media sources, is needed in the social sciences to understand media exposure and its effects in the age of digital information. Whether one is consuming the news online or producing/consuming information on social media, the fundamental dynamic of consuming public affairs news involves formation of ties between users and media content by a variety of means (e.g. browsing, social sharing, search). Online media exposure is then a process of network formation that links sources and consumers of content via their interactions, requiring a network perspective for its proper understanding. We propose a set of scalable network-oriented tools to 1) extract, analyse, and measure media content in the age of "big media data", 2) model the linkages between consumers and producers of media content in complex information networks, and 3) understand co-development of network structures with consumer attitudes/behaviours.
In order to develop and validate these tools, we bring together an interdisciplinary and international team of researchers at the interface of social science and computer science. Expertise in network analysis, text mining, statistical methods and media analysis will be combined to test innovative methodologies in three case studies including information dynamics in the 2015 British election and opinion formation on climate change. Developing a set of sophisticated network and text analysis tools is not enough, however. We also seek to build national capacity in computational methods for the analysis of online 'big' data.
Data description (abstract)
The data set comprises processed data from individual web browsing histories collected during the EU Referendum campaign via ICM Unlimited's Reflected Life panel. Each line of data represents the number of times an individual user visited a news and information domain during the data collection period.
Data creators: Susan Banducci
Sponsors: Economic and Social Research Council
Grant reference: ES/N012283/1
Topic classification: Media, communication and language; Politics
Keywords: INTERNET NEWS, INTERNET, INTERNET USE, ELECTIONS, EU REFERENDUM 2016, EUROPEAN UNION
Project title: Measuring Information Exposure in Dynamic and Dependent Networks (ExpoNet)
Grant holders: Susan Banducci, Travis Coan, Hywel Williams, David Lazer, Lorien Jasny, Gabriel Katz
Project dates: 1 January 2016 to 28 June 2019
Date published: 05 May 2020 15:53
Last modified: 05 May 2020 15:53
Temporal coverage: 3 February 2016 to 24 June 2016
Collection period: 3 February 2016 to 24 June 2016
Country: United Kingdom
Data collection method: We contracted with ICM Unlimited to capture web browsing history data from their Reflected Life panel. Reflected Life is a digital toolkit ICM uses to track the digital profile of online panel members. Panel members download the Reflected Life app onto their phones, tablets and desktops, from which it tracks and shares every URL they visit along with their search history. Over the course of the study, ICM provided every URL our panel visited. These web browsing histories were collected for 3,310 panel members during the UK's EU referendum campaign, capturing the digital footprint of respondents over the 12 weeks prior to the referendum.
Observation unit: Individual
Kind of data: Numeric
Type of data: Other surveys
Resource language: English
Data sourcing, processing and preparation:
The clickstream data was collected between 17 February (three days before the EU referendum was announced) and 23 June 2016 (the day of the vote), for a total of 3,310 users, 959 of whom were also present in at least one of the survey panels. Our analysis is therefore based on the 959 users for whom we have both survey data and online browsing histories. The periods covered by the clickstream data are 7-26 February, 15-30 April and 1 May to 23 June, all in 2016. Opening a new page generates a request that shows up as one URL in the data, but this request is accompanied by many others, generating URLs that correspond to ads, widgets and trackers which are not relevant for our analysis. In fact, for every loaded page containing relevant information (such as a newspaper article), there are on average at least 10 other irrelevant URLs loading at the same time in the clickstream data. Many of these URLs are on the same domain as the page of interest, so the challenge is to identify pages that contain articles and distinguish them from irrelevant URLs on the same domain.
To achieve this goal, we started from a list of the most popular news domains, as identified by Alexa, an Amazon company that ranks websites by traffic and classifies them into multiple categories based on content. Within the "News" category, we selected websites placed in the "Newspapers", "Analysis and Opinion", "Breaking News", "Current Events", "Extended Coverage", "Internet Broadcasts", "Magazines and E-zines", "Journalism" and "Weblogs" sub-categories, and selected the top 400 domains in each news category, as well as the top domains categorized by UK region. The total number of news domains considered was 4,179. Of these, 750 domains appeared in our web browsing history/clickstream data.
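As an illustrative sketch of this domain-matching step (not the project's actual code), the snippet below intersects a candidate list of news domains with the domains observed in a clickstream; all domain names and URLs here are invented placeholders.

```python
from urllib.parse import urlparse

# Hypothetical inputs: a candidate list of news domains (from Alexa categories)
# and raw clickstream URLs; both are invented placeholders.
candidate_domains = {"www.theguardian.com", "www.bbc.co.uk", "www.localpaper-example.co.uk"}
clickstream_urls = [
    "https://www.theguardian.com/uk",
    "https://ads.tracker-example.com/pixel?id=123",
    "https://www.bbc.co.uk/news/uk-politics-36602443",
]

# Domains actually visited by the panel: the intersection of the candidate list
# with the domains observed in the browsing histories (750 of 4,179 in the study).
visited_domains = {urlparse(u).netloc for u in clickstream_urls}
news_domains_in_data = candidate_domains & visited_domains
print(sorted(news_domains_in_data))  # ['www.bbc.co.uk', 'www.theguardian.com']
```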
We then visited each of these domains, manually coded whether the website contained articles or not, and recorded the structure of URLs pointing to articles on the website. Knowing the structure of links that point to articles allows us to write regular expressions that match all the possible articles on a domain while excluding any other types of pages on that domain. Most news websites have a clear subdirectory structure which can be used for this purpose. For example, article pages on the Guardian website have the following structure: www.theguardian.com/section/year/month/day/article-title. We can therefore identify all the Guardian articles that show up in a user's browsing history (and only articles) with the following general regular expression: www.theguardian.com/.+/\d{4}/\w{3}/\d{2}/.+$. The coded news domains were further pruned to eliminate those that only included weather and other procedurally generated content (such as traffic information, sports results, TV programming guides, stock monitoring pages), news aggregation websites (such as Google and Yahoo News, Flipboard, etc.), videos without an attached article or description, and guides and how-to pages (recipes, reviews, self-diagnosis, travel guides, etc.), which left a total of 508 domains.
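As a minimal sketch of how such per-domain regular expressions can be applied (assuming one hand-coded pattern per domain, as described above), the snippet below filters clickstream URLs down to article pages. Apart from the Guardian pattern quoted above, the domains, patterns and example URLs are hypothetical.

```python
import re

# Hypothetical per-domain article patterns, following the Guardian example above
# (anchored and with dots escaped); the project coded one pattern per news domain.
ARTICLE_PATTERNS = {
    "www.theguardian.com": re.compile(r"^www\.theguardian\.com/.+/\d{4}/\w{3}/\d{2}/.+$"),
    # further domains and their patterns would be added here
}

def is_article(url: str) -> bool:
    """Return True if a clickstream URL matches the article pattern for its domain."""
    url = re.sub(r"^https?://", "", url)       # drop the scheme, keep host + path
    domain = url.split("/", 1)[0].lower()      # host part of the URL
    pattern = ARTICLE_PATTERNS.get(domain)
    return bool(pattern and pattern.match(url))

# Example clickstream URLs (invented): only the first is kept as an article page.
urls = [
    "https://www.theguardian.com/politics/2016/jun/23/eu-referendum-polling-day",
    "https://www.theguardian.com/uk",                # section front page, not an article
    "https://ads.tracker-example.com/pixel?id=123",  # ad/tracker request
]
print([u for u in urls if is_article(u)])
```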
In total, we extracted 332 news and information domains visited by the sample. The deposited data represent the number of times a user accessed each extracted news and information domain over the 12-week period.
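A minimal sketch of this final aggregation step, assuming the filtered clickstream is available as (user, URL) pairs; the user identifiers, URLs and field layout are illustrative and do not reflect the deposited file's actual schema.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical filtered clickstream: (user_id, article URL) pairs that survived
# the domain and article filtering described above.
clickstream = [
    ("user_001", "https://www.theguardian.com/politics/2016/jun/23/eu-referendum-polling-day"),
    ("user_001", "https://www.theguardian.com/politics/2016/jun/20/campaign-final-week"),
    ("user_002", "https://www.bbc.co.uk/news/uk-politics-36602443"),
]

# Count visits per (user, news domain), mirroring the structure of the deposited data:
# one row per user/domain pair with the number of visits in the collection period.
counts = Counter((user_id, urlparse(url).netloc) for user_id, url in clickstream)

for (user_id, domain), n_visits in sorted(counts.items()):
    print(user_id, domain, n_visits)
```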
Rights owners:
Contact:
Notes on access: The Data Collection is available to any user without the requirement for registration for download/access.
Publisher: UK Data Service
Available Files
Data
Documentation