Conference paper accepted: Classification and Event Identification Using Word Embedding

Our new paper has just been accepted for presentation at CLEF 2019 in September.

Classification and Event Identification Using Word Embedding

This paper presents our contribution to the CLEF 2019 ProtestNews Track, which aims to classify and identify protest events in English-language news from India and China. We used traditional classification models, namely, support vector machines and XGBoost classifiers, combined with various word embedding approaches. Multiple models were tested for experimental purposes, in addition to the two models evaluated within the official campaign. Results show promising performance, especially in terms of precision on both document and sentence classification tasks.
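As a rough illustration of how word embeddings can feed a traditional classifier, the sketch below averages the vectors of a document's words into one fixed-length feature vector. This is only one common approach and is not necessarily the one used in the paper. The 3-dimensional vectors, the `TOY_EMBEDDINGS` table and the `document_vector` helper are all made up for the example.

```python
from typing import Dict, List

# Hypothetical toy embeddings; real ones would come from a pretrained model.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "protest": [0.9, 0.1, 0.0],
    "march": [0.8, 0.2, 0.1],
    "police": [0.7, 0.0, 0.3],
    "market": [0.0, 0.9, 0.2],
    "shares": [0.1, 0.8, 0.1],
}


def document_vector(text: str, embeddings: Dict[str, List[float]], dim: int = 3) -> List[float]:
    """Average the embeddings of known words; return a zero vector if none are known."""
    tokens = text.lower().split()
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# "confront" has no embedding here, so only three words contribute.
features = document_vector("Police confront protest march", TOY_EMBEDDINGS)
```

A vector like `features` could then be passed, one per document or sentence, to a support vector machine or a gradient-boosted classifier.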

Come and talk to us if you would like to know more.

New paper: Communities of online news exposure during the UK General Election 2015

New paper available in Online Social Networks and Media.

Communities of online news exposure during the UK General Election 2015

Media exposure has become increasingly complex and hard to measure with the rise in online news consumption. Furthermore, since many people now routinely access news via social media, questions arise as to whether social news-sharing is affected by the polarisation and partisan echo chambers that are often observed in social media communication. This study considers news-sharing on Twitter during the UK General Election in 2015, using the act of sharing as an indicator that the sharer has been exposed to that online news content. Analysis of the network structure of users and the news articles they share identifies multiple distinct user communities, which are characterised by analysis of the articles shared within them. Communities are characterised by news article sources (web domains), geographical origin and content; time of article publication was also considered but showed no significant relationships. There is evidence for ideologically biased audiences that predominantly share content from either left-leaning or right-leaning news sources, but these audiences also see content from opposing viewpoints. Other audiences are characterised by geography and/or specialise in particular news topics. Overall, these findings suggest that many people consume a diverse range of news content over the election period and that the level of political bias in content exposure varies widely across the Twitter user population.

New paper: Scaling Laws in Geo-located Twitter Data

New paper accepted for publication in PLOS One.

Scaling Laws in Geo-located Twitter Data

We observe and report on a systematic relationship between population density and Twitter use. Number of tweets, number of users and population per unit area are related by power laws, with exponents greater than one, that are consistent with each other and across a range of spatial scales. This implies that population density can accurately predict Twitter activity. Furthermore, this trend can be used to identify ‘anomalous’ areas that deviate from the trend. Analysis of geo-tagged and place-tagged tweets shows that geo-tagged tweets differ from place-tagged tweets with respect to user type and content. Our findings have implications for the spatial analysis of Twitter data and for understanding demographic biases in the Twitter user base.
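To make the power-law idea concrete, here is a minimal sketch that fits y = c * x^alpha by least squares in log-log space and measures how far an area deviates from the fitted trend. The data are synthetic, and the exponent 1.2, the prefactor 0.01 and the function names are illustrative assumptions, not values or methods taken from the paper.

```python
import math
from typing import List, Tuple


def fit_power_law(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(y) = log(c) + alpha * log(x); returns (c, alpha)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return c, alpha


def log_residual(x: float, y: float, c: float, alpha: float) -> float:
    """Deviation from the trend in log space; large absolute values flag anomalies."""
    return math.log(y) - (math.log(c) + alpha * math.log(x))


# Synthetic areas: tweets per unit area = 0.01 * (population density) ** 1.2
densities = [10.0, 100.0, 1000.0, 10000.0]
tweets = [0.01 * d ** 1.2 for d in densities]

c, alpha = fit_power_law(densities, tweets)

# A hypothetical area with five times the tweet density the trend predicts.
anomaly = log_residual(500.0, 5 * c * 500.0 ** alpha, c, alpha)
```

With noise-free synthetic data the fit recovers the assumed exponent, and the residual for the hypothetical area is log(5). Real data would scatter around the line, so a threshold on the residual would have to be chosen empirically.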