Information flow estimation: a study of news on Twitter
News media has long been an ecosystem of creation, reproduction, and critique, where news outlets report on current events and add commentary to ongoing stories. Understanding the dynamics of news information creation and dispersion is important to accurately ascribe credit to influential work and understand how societal narratives develop. These dynamics can be modelled through a combination of information-theoretic natural language processing and networks; and can be parameterised using large quantities of textual data. However, it is challenging to see "the wood for the trees", i.e., to detect small but important flows of information in a sea of noise. Here we develop new comparative techniques to estimate temporal information flow between pairs of text producers. Using both simulated and real text data we compare the reliability and sensitivity of methods for estimating textual information flow, showing that a metric that normalises by local neighbourhood structure provides a robust estimate of information flow in large networks. We apply this metric to a large corpus of news organisations on Twitter and demonstrate its usefulness in identifying influence within an information ecosystem, finding that average information contribution to the network is not correlated with the number of followers or the number of tweets. This suggests that small local organisations and right-wing organisations which have lower average follower counts still contribute significant information to the ecosystem. Further, the methods are applied to smaller full-text datasets of specific news events across news sites and Russian troll bots on Twitter. The information flow estimation reveals and quantifies features of how these events develop and the role of groups of bots in setting disinformation narratives.
READ FULL TEXT