Facilitating on-line opinion dynamics by mining expressions of causation. The case of climate change debates on The Guardian

by Tom Willaert, et al.

News website comment sections are spaces where potentially conflicting opinions and beliefs are voiced. Addressing questions of how to study such cultural and societal conflicts through technological means, the present article critically examines possibilities and limitations of machine-guided exploration and potential facilitation of on-line opinion dynamics. These investigations are guided by a discussion of an experimental observatory for mining and analyzing opinions from climate change-related user comments on news articles from TheGuardian.com. This observatory combines causal mapping methods with computational text analysis in order to mine beliefs and visualize opinion landscapes based on expressions of causation. By (1) introducing digital methods and open infrastructures for data exploration and analysis and (2) engaging in debates about the implications of such methods and infrastructures, notably in terms of the leap from opinion observation to debate facilitation, the article aims to make a practical and theoretical contribution to the study of opinion dynamics and conflict in new media environments.




1 Introduction

1.1 Background

Over the past two decades, the rise of social media and the digitization of news and discussion platforms have radically transformed how individuals and groups create, process and share news and information. As Alan Rusbridger, former editor-in-chief of the newspaper The Guardian has it, these technologically-driven shifts in the ways people communicate, organize themselves and express their beliefs and opinions, have

empower[ed] those that were never heard, creating a new form of politics and turning traditional news corporations inside out. It is impossible to think of Donald Trump; of Brexit; of Bernie Sanders; of Podemos; of the growth of the far right in Europe; of the spasms of hope and violent despair in the Middle East and North Africa without thinking also of the total inversion of how news is created, shared and distributed. Much of it is liberating and inspiring. Some of it is ugly and dark. And something - the centuries-old craft of journalism - is in danger of being lost (Rusbridger, 2018, xx-xxi).

Rusbridger’s observation that the present media-ecology puts traditional notions of politics, journalism, trust and truth at stake is a widely shared one (see for instance Lichfield, 2018; Singer and Brooking, 2018; Sunstein, 2018). As such, it has sparked interdisciplinary investigations, diagnoses and ideas for remedies across the economic, socio-political, and technological spectrum, challenging our existing assumptions and epistemologies (see Floridi, 2013, 2014). Among these lines of inquiry, particular strands of research from the computational social sciences are addressing pressing questions of how emerging technologies and digital methods might be operationalized to regain a grip on the dynamics that govern the flow of on-line news and its associated multitudes of voices, opinions and conflicts. Could the information circulating on on-line (social) news platforms for instance be mined to better understand and analyze the problems facing our contemporary society? Might such data mining and analysis help us to monitor the growing number of social conflicts and crises due to cultural differences and diverging world-views? And finally, would such an approach potentially facilitate early detection of conflicts and even ways to resolve them before they turn violent?

Answering these questions requires further advances in the study of cultural conflict based on digital media data. This includes the development of fine-grained representations of cultural conflict based on theoretically-informed text analysis, the integration of game-theoretical approaches to models of polarization and alignment, as well as the construction of accessible tools and media-monitoring observatories: platforms that foster insight into the complexities of social behaviour and opinion dynamics through automated computational analyses of (social) media data. Through an interdisciplinary approach, the present article aims to make both a practical and theoretical contribution to these aspects of the study of opinion dynamics and conflict in new media environments.

1.2 Objective

The objective of the present article is to critically examine possibilities and limitations of machine-guided exploration and potential facilitation of on-line opinion dynamics on the basis of an experimental data analytics pipeline or observatory for mining and analyzing climate change-related user comments from the news website of The Guardian (TheGuardian.com). Combining insights from the social and political sciences with computational methods for the linguistic analysis of texts, this observatory provides a series of spatial (network) representations of the opinion landscapes on climate change on the basis of causation frames expressed in news website comments. This allows for the exploration of opinion spaces at different levels of detail and aggregation.

Technical and theoretical questions related to the proposed method and infrastructure for the exploration and facilitation of debates will be discussed in three sections. The first section concerns notions of how to define what constitutes a belief or opinion and how these can be mined from texts. To this end, an approach based on the automated extraction of semantic frames expressing causation is proposed. The observatory thus builds on the theoretical premise that expressions of causation such as ‘global warming causes rises in sea levels’ can be revelatory for a person or group’s underlying belief systems. Through a further technical description of the observatory’s data-analytical components, section two of the paper deals with matters of spatially modelling the output of the semantic frame extractor and how this might be achieved without sacrificing nuances of meaning. The final section of the paper, then, discusses how insights gained from technologically observing opinion dynamics can inform conceptual modelling efforts and approaches to on-line opinion facilitation. As such, the paper brings into view and critically evaluates the fundamental conceptual leap from machine-guided observation to debate facilitation and intervention.

Through the case examples from The Guardian’s website and the theoretical discussions explored in these sections, the paper intends to make a twofold contribution to the fields of media studies, opinion dynamics and computational social science. Firstly, the paper introduces and chains together a number of data analytics components for social media monitoring (and facilitation) that were developed in the context of the <project name anonymized for review> infrastructure project. The <project name anonymized for review> infrastructure makes the components discussed in this paper available as open web services in order to foster reproducibility and further experimentation and development <infrastructure reference URL anonymized for review>. Secondly, and supplementing these technological and methodological gains, the paper addresses a number of theoretical, epistemological and ethical questions that are raised by experimental approaches to opinion exploration and facilitation. This notably includes methodological questions on the preservation of meaning through text and data mining, as well as the role of human interpretation, responsibility and incentivisation in observing and potentially facilitating opinion dynamics.

1.3 Data: the communicative setting of TheGuardian.com

In order to study on-line opinion dynamics and build the corresponding climate change opinion observatory discussed in this paper, a corpus of climate change-related news articles and news website comments was analyzed. Concretely, articles from the ‘climate change’ subsection from the news website of The Guardian dated from 2009 up to April 2019 were processed, along with up to 200 comments and associated metadata for articles where commenting was enabled at the time of publication. The choice for studying opinion dynamics using data from The Guardian is motivated by this news website’s prominent position in the media landscape as well as its communicative setting, which is geared towards user engagement. Through this interaction with readers, the news platform embodies many of the recent shifts that characterize our present-day media ecology.

TheGuardian.com is generally acknowledged to be one of the UK’s leading online newspapers, with 8.2 million unique visitors per month as of May 2013 (Reid, 2018). The website consists of a core news site, as well as a range of subsections that allow for further classification and navigation of articles. Articles related to climate change can for instance be accessed by navigating through the ‘News’ section, over the subsection ‘environment’, to the sub-subsection ‘climate change’ (Guardian, 2019b). All articles on the website can be read free of charge, as The Guardian relies on a business model that combines revenues from advertising, voluntary donations and paid subscriptions.

Apart from offering high-quality, independent journalism on a range of topics, a distinguishing characteristic of The Guardian is its penchant for reader involvement and engagement. Adapting to the changing media landscape and appropriating business models that fit the transition from print to on-line news media, The Guardian has transformed itself into a platform that enables forms of citizen journalism and blogging, and welcomes readers’ comments on news articles (for this transition, see for instance Rusbridger, 2018, chap. 11). In order for a reader to comment on articles, it is required that a user account is made, which provides a user with a unique user name and a user profile page with a stable URL. According to the website’s help pages, providing users with an identity that is consistently recognized by the community fosters proper on-line community behaviour (Guardian, 2010). Registered users can post comments on content that is open to commenting, and these comments are moderated by a dedicated moderation team according to The Guardian’s community standards and participation guidelines (Guardian, 2009). In support of digital methods and innovative approaches to journalism and data mining, The Guardian has launched an open API (application programming interface) through which developers can access different types of content (Guardian, 2018). It should be noted that at the moment of writing this article, readers’ comments are not accessible through this API. For the scientific and educational purposes of this paper, comments were thus consulted using a dedicated scraper.

Taking into account this community and technologically-driven orientation, the communicative setting of The Guardian from which opinions are to be mined and the underlying belief system revealed, is defined by articles, participating commenters and comment spheres (that is, the actual comments aggregated by user, individual article or collection of articles) (see Figure 1).

Figure 1: Communicative setting of many online newspaper sites. The newspaper publishes articles on different topics and users can comment on these articles and previous comments.

In this setting, articles (and previous comments on those articles) can be commented on by participating commenters, each of whom brings to the debate his or her own opinions or belief system. What this belief system might consist of can be inferred on a number of levels, with varying degrees of precision. On the most general level, a generic description of the profile of the average reader of The Guardian can be informative. Such profiles have been compiled by market researchers with the purpose of informing advertisers about the demographic that might be reached through this news website (and other products carrying The Guardian’s brand). As of the writing of this article, the audience of The Guardian is presented to advertisers as a ‘progressive’ audience:

Living in a world of unprecedented societal change, with the public narratives around politics, gender, body image, sexuality and diet all being challenged. The Guardian is committed to reflecting the progressive agenda, and reaching the crowd that uphold those values. It’s helpful that we reach over half of progressives in the UK (Guardian, 2019a).

A second, equally high-level indicator of the beliefs that might be present on the platform is the set of links through which articles on climate change can be accessed. An article on climate change might for instance be consulted through the environment section of the news website, but also through the business section. Assuming that business interests might potentially be at odds with environmental concerns, it could be hypothesized that the particular comment sphere for that article consists of at least two potentially clashing frames of mind or belief systems.

However, as will be expanded upon further in this article, truly capturing opinion dynamics requires a more systemic and fine-grained approach. The present article therefore proposes a method for harvesting opinions from the actual comment texts. The presupposition is thereby that comment spheres are marked by a diversity of potentially related opinions and beliefs. Opinions might for instance be connected through the reply structure that marks the comment section of an article, but this connection might also manifest itself on a semantic level (that is, the level of meaning or the actual contents of the comments). To capture this multidimensional, interconnected nature of the comment spheres, it is proposed to represent comment spheres as networks, where the nodes represent opinions and beliefs, and edges the relationships between these beliefs (see the spatial representation of beliefs infra). The use of precision language tools to extract such beliefs and their mutual relationships, as will be explored in the following sections, can open up new pathways of model validation and creation.

2 Mining opinions and beliefs from texts

In traditional experimental settings, survey techniques and associated statistical models provide researchers with established methods to gauge and analyze the opinions of a population. When studying opinion landscapes through on-line social media, however, harvesting beliefs from big textual data such as news website comments and developing or appropriating models for their analysis is a non-trivial task (for an overview of methodological challenges facing computational social science and digital methods, see for instance Watts, 2013; Rogers, 2013, 2019).

In the present context, two challenges related to data-gathering and text mining need to be addressed: (1) defining what constitutes an expression of an opinion or belief, and (2) associating this definition with a pattern that might be extracted from texts. Recent scholarship in the fields of natural language processing (NLP) and argumentation mining has yielded a range of instruments and methods for the (automatic) identification of argumentative claims in texts (see for instance Farzindar et al., 2017; Stede et al., 2018). Adding to these instruments and methods, the present article proposes an approach in which belief systems or opinions on climate change are accessed through expressions of causation.

2.1 Causal mapping methods and the climate change debate

The climate change debate is often characterized by expressions of causation, that is, expressions linking a certain cause with a certain effect. Cultural or societal clashes on climate change might for instance concern diverging assessments of whether global warming is man-made or not (for a sample of arguments in favour of or against anthropogenic global warming, see ProCon, 2019). Based on such examples, it can be stated that expressions of causation are closely associated with opinions or beliefs, and that as such, these expressions can be considered a valuable indicator for the range and diversity of the opinions and beliefs that constitute the climate change debate. The observatory under discussion therefore focuses on the extraction and analysis of linguistic patterns called causation frames. As will be further demonstrated in this section, the benefit of this causation-based approach is that it offers a systemic approach to opinion dynamics that comprises different layers of meaning, notably the cognitive or social meaningfulness of patterns on account of their being expressions of causation, as well as further lexical and semantic information that might be used for analysis and comparison.

The study of expressions of causation as a method for accessing and assessing belief systems and opinions has been formalized and streamlined since the 1970s. Pioneered by political scientist Robert Axelrod and others, this causal mapping method (also referred to as ‘cognitive mapping’) was introduced as a means of reconstructing and evaluating administrative and political decision-making processes, based on the principle that

the notion of causation is vital to the process of evaluating alternatives. Regardless of philosophical difficulties involved in the meaning of causation, people do evaluate complex policy alternatives in terms of the consequences a particular choice would cause, and ultimately of what the sum of these effects would be. Indeed, such causal analysis is built into our language, and it would be very difficult for us to think completely in other terms, even if we tried (Axelrod, 2016, 5).

Axelrod’s causal mapping method comprises a set of conventions to graphically represent networks of causes and effects (the nodes in a network) as well as the qualitative aspects of this relation (the network’s directed edges, notably assertions of whether the causal linkage is positive or negative). These causes and effects are to be extracted from relevant sources by means of a series of heuristics and an encoding scheme (it should be noted that for this task Axelrod had human readers in mind). The graphs resulting from these efforts provide a structural overview of the relations among causal assertions (and thus beliefs):

The basic elements of the proposed system are quite simple. The concepts a person uses are represented as points, and the causal links between these concepts are represented as arrows between these points. This gives a pictorial representation of the causal assertions of a person as a graph of points and arrows. This kind of representation of assertions as a graph will be called a cognitive map. The policy alternatives, all of the various causes and effects, the goals, and the ultimate utility of the decision maker can all be thought of as concept variables, and represented as points in the cognitive map. The real power of this approach appears when a cognitive map is pictured in graph form; it is then relatively easy to see how each of the concepts and causal relationships relate to each other, and to see the overall structure of the whole set of portrayed assertions (Axelrod, 2016, 5).
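As a rough illustration of the conventions described above, a cognitive map can be held in a small signed, directed graph structure. The sketch below is not Axelrod's notation nor the interface of any existing tool: the class name `CognitiveMap`, its methods and the second sample assertion are hypothetical, chosen only to show concepts as points and signed causal links as arrows.

```python
from collections import defaultdict


class CognitiveMap:
    """Hypothetical minimal container for a signed, directed causal graph."""

    def __init__(self):
        # cause concept -> {effect concept: sign}, where sign is +1 or -1
        self.links = defaultdict(dict)

    def add_assertion(self, cause, effect, sign=+1):
        """Record one causal assertion as an arrow from cause to effect."""
        if sign not in (+1, -1):
            raise ValueError("sign must be +1 (positive) or -1 (negative)")
        self.links[cause][effect] = sign

    def concepts(self):
        """All points in the map: every cause and every effect."""
        points = set(self.links)
        for effects in self.links.values():
            points.update(effects)
        return points

    def effects_of(self, cause):
        """Arrows leaving a concept, as sorted (effect, sign) pairs."""
        return sorted(self.links.get(cause, {}).items())


cmap = CognitiveMap()
# Example assertion taken from the introduction of this paper.
cmap.add_assertion("global warming", "rises in sea levels", +1)
# Invented assertion, added only to show a negative link.
cmap.add_assertion("emission cuts", "global warming", -1)
```

Storing the sign on the edge rather than on the concepts keeps the same concept usable as both a cause and an effect, which is what allows chains of assertions to be followed through the map.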

In order to construct these cognitive maps based on textual information, Margaret Tucker Wrightson provides a set of reading and coding rules for extracting cause concepts, linkages (relations) and effect concepts from expressions in the English language. The assertion ‘Our present topic is the militarism of Germany, which is maintaining a state of tension in the Baltic Area’ might for instance be encoded as follows: ‘the militarism of Germany’ (cause concept), /+/ (a positive relationship), ‘maintaining a state of tension in the Baltic area’ (effect concept) (Tucker Wrightson, 2016, 296-297). Emphasizing the role of human interpretation, it is acknowledged that no strict set of rules can capture the entire spectrum of causal assertions:

The fact that the English language is as varied as those who use it makes the coder’s task complex and difficult. No set of rules will completely solve the problems he or she might encounter. These rules, however, provide the coder with guidelines which, if conscientiously followed, will result in outcomes meeting social scientific standards of comparative validity and reliability (Tucker Wrightson, 2016, 332).

To facilitate the task of encoders, the causal mapping method has gone through various iterations since its original inception, all the while preserving its original premises. Recent software packages have for instance been devised to support the data encoding and drawing process (see for instance Laukkanen and Wang, 2015). As such, causal or cognitive mapping has become an established opinion and decision mining method within political science, business and management, and other domains. It has notably proven to be a valuable method for the study of recent societal and cultural conflicts. Thomas Homer-Dixon et al. for instance rely on cognitive-affective maps created from survey data to analyze interpretations of the housing crisis in Germany, Israeli attitudes toward the Western Wall, and moderate versus skeptical positions on climate change (Homer-Dixon et al., 2014). Similarly, Duncan Shaw et al. venture to answer the question of ‘Why did Brexit happen?’ by building causal maps of nine televised debates that were broadcast during the four weeks leading up to the Brexit referendum (Shaw et al., 2017).

In order to appropriate the method of causal mapping to the study of on-line opinion dynamics, it needs to be expanded from applications at the scale of human readers and relatively small corpora of archival documents and survey answers, to the realm of ‘big’ textual data and larger quantities of information. This attuning of cognitive mapping methods to the large-scale processing of texts required for media monitoring necessarily involves a degree of automation, as will be explored in the next section.

2.2 Automated causation tracking with the Penelope semantic frame extractor

As outlined in the previous section, causal mapping is based on the extraction of so-called cause concepts, (causal) relations, and effect concepts from texts. The complexity of each of these concepts can range from the relatively simple (as illustrated by the easily-identifiable cause and effect relation in the example of ‘German militarism’ cited earlier), to more complex assertions such as ‘The development of international cooperation in all fields across the ideological frontiers will gradually remove the hostility and fear that poison international relations’, which contains two effect concepts (viz. ‘the hostility that poisons international relations’ and ‘the fear that poisons international relations’). As such, this statement would have to be encoded as a double relationship (Tucker Wrightson, 2016, 297-298).

The coding guidelines in Tucker Wrightson (2016) further reflect that extracting cause and effect concepts from texts is an operation that works on both the syntactical and semantic levels of assertions. This can be illustrated by means of the guidelines for analyzing the aforementioned causal assertion on German militarism:

1. The first step is the realization of the relationship. Does a subject affect an object? 2. Having recognized that it does, the isolation of the cause and effects concepts is the second step. As the sentence structure indicates, “the militarism of Germany” is the causal concept, because it is the initiator of the action, while the direct object clause, “a state of tension in the Baltic area,” constitutes that which is somehow influenced, the effect concept (Tucker Wrightson, 2016, 296).

In the field of computational linguistics, from which the present paper borrows part of its methods, this procedure for extracting information related to causal assertions from texts can be considered an instance of an operation called semantic frame extraction (for the concept of semantic frames, see Fillmore, 1982). A semantic frame captures a coherent part of the meaning of a sentence in a structured way. As documented in the FrameNet project (Baker et al., 1998), the Causation frame is defined as follows:

A Cause causes an Effect. Alternatively, an Actor, a participant of a (implicit) Cause, may stand in for the Cause. The entity Affected by the Causation may stand in for the overall Effect situation or event (Framenet, 2001).

In a linguistic utterance such as a statement in a news website comment, the Causation frame can be evoked by a series of lexical units, such as ‘cause’, ‘bring on’, etc. In the example ‘If such a small earthquake CAUSES problems, just imagine a big one!’, the Causation frame is triggered by the verb ‘causes’, which therefore is called the frame evoking element. The Cause slot is filled by ‘a small earthquake’, the Effect slot by ‘problems’ (Framenet, 2001).

In order to automatically mine cause and effect concepts from the corpus of comments on The Guardian, the present paper uses the Penelope semantic frame extractor: a tool that exploits the fact that semantic frames can be expressed as form-meaning mappings called constructions. Notably, frames were extracted from Guardian comments by focusing on the following lexical units (verbs, prepositions and conjunctions), listed in FrameNet as frame evoking elements of the Causation frame: Cause.v, Due to.prep, Because.c, Because of.prep, Give rise to.v, Lead to.v or Result in.v.
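To make the role of frame evoking elements more tangible, the following sketch spots the listed lexical units in an utterance with regular expressions. It is emphatically not the Penelope extractor, which works with constructions. The pattern list, the inflected forms it covers and the function name `find_frame_evoking_elements` are assumptions of this illustration, and the sketch only locates triggers without filling the Cause and Effect slots.

```python
import re

# Lexical units named in the text, paired with simple surface patterns.
# The patterns (and the inflections they cover) are assumptions of this sketch.
FRAME_EVOKING_PATTERNS = [
    ("Because of.prep", r"\bbecause\s+of\b"),
    ("Because.c", r"\bbecause\b(?!\s+of\b)"),
    ("Due to.prep", r"\bdue\s+to\b"),
    ("Give rise to.v", r"\b(?:give|gives|gave|given|giving)\s+rise\s+to\b"),
    ("Lead to.v", r"\b(?:lead|leads|led|leading)\s+to\b"),
    ("Result in.v", r"\b(?:result|results|resulted|resulting)\s+in\b"),
    ("Cause.v", r"\b(?:cause|causes|caused|causing)\b"),
]


def find_frame_evoking_elements(utterance):
    """Return (lexical_unit, matched_text) pairs found in the utterance."""
    hits = []
    for unit, pattern in FRAME_EVOKING_PATTERNS:
        for match in re.finditer(pattern, utterance, flags=re.IGNORECASE):
            hits.append((unit, match.group(0)))
    return hits


hits = find_frame_evoking_elements(
    "If such a small earthquake CAUSES problems, just imagine a big one!"
)
```

Because the patterns ignore part of speech, the noun ‘cause’ would also be reported as Cause.v; a surface matcher of this kind can therefore only serve as a first filter before proper frame extraction.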

As illustrated by the following examples, the strings output by the semantic frame extractor adhere closely to the original utterance, preserving all of the real-world noisiness of the comments’ causation frames:

    "causalRelations": [
        {
            "utterance": "Has anyone totted up the extra pollution on London streets emanating from traffic jams caused by Extinction Rebellion ?",
            "cause": "extinction rebellion",
            "effect": "traffic jams"
        }
    ]

The output of the semantic frame extractor as such is used as the input for the ensuing pipeline components in the climate change opinion observatory. The aim of a further analysis of these frames is to find patterns in the beliefs and opinions they express. As will be discussed in the following section, which focuses on applications and cases, maintaining semantic nuances in this further analytic process foregrounds the role of models and aggregation levels.
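How a downstream component might consume this output can be sketched as follows. The sketch assumes that the fragment shown above is wrapped in an enclosing JSON object and that each relation is an object with `utterance`, `cause` and `effect` keys; the complete schema of the extractor's response is not given in this paper, and the function name `cause_effect_pairs` is hypothetical.

```python
import json

# Assumed shape: the fragment shown in the text, wrapped in an enclosing object.
raw_output = """
{
  "causalRelations": [
    {
      "utterance": "Has anyone totted up the extra pollution on London streets emanating from traffic jams caused by Extinction Rebellion ?",
      "cause": "extinction rebellion",
      "effect": "traffic jams"
    }
  ]
}
"""


def cause_effect_pairs(json_text):
    """Extract (cause, effect) tuples from extractor-style JSON output."""
    data = json.loads(json_text)
    return [
        (relation["cause"], relation["effect"])
        for relation in data.get("causalRelations", [])
    ]


pairs = cause_effect_pairs(raw_output)
```

Keeping the original utterance alongside each pair, rather than discarding it at this point, is what later makes it possible to return from an aggregated view to the actual comment.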

3 Analyses and applications

Based on the presupposition that relations between causation frames reveal beliefs, the output of the semantic frame extractor creates various opportunities for exploring opinion landscapes and empirically validating conceptual models for opinion dynamics.

In general, any alignment of conceptual models and real-world data is an exercise in compromising, as the idealized, abstract nature of models is likely to be at odds with the messiness of the actual data. Finding such a compromise might for instance involve a reduction of the simplicity or elegance of the model, or, on the other hand, an increased aggregation (and thus reduced granularity) of the data.

Addressing this challenge, the current section reflects on questions of data modelling, aggregation and meaning by exploring, through case examples, different spatial representations of opinion landscapes mined from TheGuardian.com’s comment sphere. These spatial renditions will be understood as network visualizations in which nodes represent argumentative statements (beliefs) and edges the degree of similarity between these statements. On the most general level, then, such a representation can consist of an overview of all the causes expressed in the corpus of climate change-related Guardian comments. This type of visualization provides a bird’s-eye view of the entire opinion landscape as mined from the comment texts. In turn, such a general overview might elicit more fine-grained, micro-level investigations, in which a particular cause is singled out and its more specific associated effects are mapped. These macro and micro level overviews come with their own proper potential for theory building and evaluation, as well as distinct requirements for the depth or detail of meaning that needs to be represented. To get the most general sense of an opinion landscape one might for instance be more tolerant of abstract renditions of beliefs (e.g. by reducing statements to their most frequently used terms), but for more fine-grained analysis one requires more context and nuance (e.g. adhering as closely as possible to the original comment).

3.1 Aggregation

As follows from the above, one of the most fundamental questions when building automated tools that observe opinion dynamics, and that may eventually inform debate facilitation, concerns the level of meaning aggregation. A clear argumentative or causal association between, for instance, climate change and catastrophic events such as floods or hurricanes may become detectable by automatic causal frame tracking at the scale of large collections of articles, where this association is likely to appear more often. Detection becomes considerably more challenging, however, when the aim is to classify sets of only a few statements in freer environments of expression such as comment spheres.

In other words, the problem of meaning aggregation is closely related to issues of scale and aggregation over utterances. The more fine-grained the semantic resolution is, that is, the more specific the cause or effect is that one is interested in, the less probable it is to observe the same statement twice. Moreover, with every independent variable (such as time, different commenters or user groups, etc.), less data on which fine-grained opinion statements are to be detected is available. In the present case of parsed comments from TheGuardian.com, providing insights into the belief system of individual commenters, even if all their statements are aggregated over time, relies on a relatively small set of argumentative statements. This relative sparseness is in part due to the fact that the scope of the semantic frame extractor is confined to the frame evoking elements listed earlier, thus omitting more implicit assertions of causation (i.e. expressions of causation that can only be derived from context and from reading between the lines).
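The relation between semantic resolution and the chance of observing the same statement twice can be illustrated with a toy calculation. The cause strings below are invented for the purpose of the example and are not taken from the corpus; the function name `share_of_repeated` and the two resolution functions are likewise assumptions of this sketch.

```python
from collections import Counter


def share_of_repeated(statements, resolution):
    """Fraction of statements whose key, at the given resolution, recurs."""
    keys = [resolution(statement) for statement in statements]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] > 1) / len(keys)


# Invented cause strings, for illustration only.
causes = [
    "man-made co2 emissions",
    "co2 emissions from cars",
    "rising co2 emissions",
    "solar activity",
]

# Fine resolution: two statements count as equal only if the strings match.
fine = share_of_repeated(causes, resolution=lambda s: s)

# Coarse resolution: statements mentioning 'co2' collapse into one key.
coarse = share_of_repeated(
    causes, resolution=lambda s: "co2" if "co2" in s.split() else s
)
```

At the fine resolution no statement recurs, whereas at the coarse resolution three of the four statements fall together, at the price of losing the distinctions between them.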

Similarly, as will be explored in the ensuing paragraphs, matters of scale and aggregation determine the types of further linguistic analyses that can be performed on the output of the frame extractor. Within the field of computational linguistics, various techniques have been developed to represent the meaning of words as vectors that capture the contexts in which these words are typically used. Such analyses might reveal patterns of statistical significance, but it is also likely that in creating novel, numerical representations of the original utterances, the semantic structure of argumentatively linked beliefs is lost.

In sum, developing opinion observatories and (potential) debate facilitators entails finding a trade-off, or, in fact, a middle way between macro- and micro-level analyses. On the one hand, one needs to leverage automated analysis methods at the scale of larger collections to maximum advantage. But one also needs to integrate opportunities to interactively zoom into specific aspects of interest and provide more fine-grained information at these levels down to the actual statements. This interplay between macro- and micro-level analyses is explored in the case studies below.

3.2 Spatial renditions of TheGuardian.com’s opinion landscape

The main purpose of the observatory under discussion is to provide insight into the belief structures that characterize the opinion landscape on climate change. For reasons outlined above, this raises the questions of how to represent opinions and, correspondingly, of which representation is most suited as the atomic unit of comparison between opinions. In general terms, the desired outcome of further processing of the output of the semantic frame extractor is a network representation in which similar cause or effect strings are displayed in close proximity to one another. A high-level description of the pipeline under discussion thus goes as follows. In a first step, it is decided whether to map cause statements or effect statements. Next, the selected statements are grouped per commenter (i.e. a list is made of all cause statements or effect statements per commenter). These statements are filtered in order to retain only nouns, adjectives and verbs (thereby also omitting frequently occurring verbs such as ‘to be’). The remaining words are then lemmatized, that is, reduced to their dictionary forms. This output is finally translated into a network representation, whereby nodes represent (aggregated) statements, and edges express the semantic relatedness between statements (based on a set overlap whereby the number of shared lemmata is counted).
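The last two steps of this pipeline (grouping per commenter and linking by lemma overlap) can be sketched as follows. This is one possible reading of the description, not the observatory's actual code: it assumes that part-of-speech filtering and lemmatization have already been done upstream, and the function names and toy records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def group_by_commenter(records):
    """Aggregate the lemmata of all statements made by each commenter into one set."""
    grouped = defaultdict(set)
    for commenter, lemmas in records:
        grouped[commenter].update(lemmas)
    return dict(grouped)


def overlap_edges(nodes):
    """Link two nodes when their lemma sets intersect; weight = number of shared lemmata."""
    edges = {}
    for (a, lemmas_a), (b, lemmas_b) in combinations(sorted(nodes.items()), 2):
        shared = lemmas_a & lemmas_b
        if shared:
            edges[(a, b)] = len(shared)
    return edges


# Toy input: (commenter, lemmatized nouns/verbs/adjectives of one statement).
records = [
    ("user_a", ["nuclear", "power", "reduce", "emission"]),
    ("user_a", ["coal", "plant", "increase", "emission"]),
    ("user_b", ["emission", "warm", "planet"]),
    ("user_c", ["government", "subsidy"]),
]
nodes = group_by_commenter(records)
edges = overlap_edges(nodes)
print(edges)  # {('user_a', 'user_b'): 1}
```

The resulting weighted edge list could then be exported to a network tool for layout. Nodes without any shared lemma, such as `user_c` here, remain isolated.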

As illustrated by two spatial renditions that were created using this approach and visualized using the network analysis tool Gephi (Bastian et al., 2009), the labels assigned to these nodes (lemmata, full statements, or other) can be adapted to the scope of the analysis.

3.2.1 A macro-level overview: causes addressed in the climate change debate

Suppose one wants to get a first idea about the scope and diversity of an opinion landscape, without any preconceived notions of this landscape’s structure or composition. One way of doing this would be to map all of the causes that are mentioned in comments related to articles on climate change, that is, creating an overview of all the causes that have been retrieved by the frame extractor in a single representation. Such a representation would not immediately provide the granularity to state what the beliefs or opinions in the debates actually are, but rather, it might inspire a sense of what those opinions might be about, thus pointing towards potentially interesting phenomena that might warrant closer examination.

Figure 2: This is a global representation of the data produced by considering a 10 percent subsample of all the causes identified by the causation tracker on the set of comments. It treats statements as nodes of a network and two statements are linked if they share the same lemma (the number of shared lemmata corresponds to the link weight). In this analysis, only nouns, verbs and adjectives are considered (the text processing is done with spaCy (Honnibal and Montani, 2019)). For this global view, each cause statement is labeled by that word within the statement that is most frequent in all the data. The visual output was created using the network exploration tool Gephi (0.92). The 2D layout is the result of the OpenOrd layout algorithm integrated in Gephi followed by the label adjustment tool to avoid too much overlap of labels.
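The labelling rule described in the caption (each statement is labelled by whichever of its words is most frequent across all the data) can be sketched as follows. The function name, the alphabetical tie-break and the toy statements are assumptions made for illustration. Statements are assumed to be non-empty lists of lemmata.

```python
from collections import Counter


def global_frequency_labels(statements):
    """Label each statement with its lemma that is most frequent over all statements."""
    counts = Counter(lemma for lemmas in statements for lemma in lemmas)
    # Ties are broken alphabetically (an arbitrary choice, not taken from the paper).
    return [max(lemmas, key=lambda lemma: (counts[lemma], lemma)) for lemmas in statements]


# Toy lemmatized cause statements (hypothetical data).
statements = [
    ["people", "burn", "fuel"],
    ["people", "country"],
    ["high", "co2", "emission"],
    ["co2", "level"],
]
labels = global_frequency_labels(statements)
print(labels)  # ['people', 'people', 'co2', 'co2']
```

Because frequent words are shared by many statements, this labelling makes densely connected regions of the network readable at a glance, at the cost of hiding what each individual statement says.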

Figure 2, a high-level overview of the opinion landscape, reveals a number of areas to which opinions and beliefs might pertain. The top-left clusters in the diagram for instance reveal opinions about the role of people and countries, whereas on the right-hand side, we find a complementary cluster that might point to beliefs concerning the influence of high or increased CO2-emissions. In between, there is a cluster on power and energy sources, reflecting the energy debate’s association with both issues of human responsibility and CO2 emissions. As such, the overview can at best inspire some very general hypotheses about the types of opinions that figure in the climate change debate.

3.2.2 Micro-level investigations: opinions on nuclear power and global warming

Based on the range of topics on which beliefs are expressed, a micro-level analysis can be conducted to reveal what those beliefs are and, for instance, whether they align or contradict each other. This can be achieved by singling out a cause of interest, and mapping out its associated effects.

As revealed by the global overview of the climate change opinion landscape, a portion of the debate concerns power and energy sources. One topic with a particularly interesting role in this debate is nuclear power. Figure 3 illustrates how a more detailed representation of opinions on this matter can be created by spatially representing all of the effects associated with causes containing the expression ‘nuclear power’. Again, similar beliefs (in terms of words used in the effects) are positioned closer to each other, thus facilitating the detection of clusters. Commenters on The Guardian for instance express concerns about the deaths or extinction that might be caused by this energy resource. They also voice opinions on its cleanliness, whether or not it might decrease pollution or be its own source of pollution, and how it reduces CO2-emissions in different countries.
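Singling out a cause of interest and collecting its associated effects can be sketched as a simple filter over extracted (cause, effect) pairs. The pair format, the function name and the example strings are assumptions for illustration and do not reproduce the frame extractor's actual output.

```python
def effects_of(pairs, cause_phrase):
    """Return the effects of every pair whose cause contains `cause_phrase` (case-insensitive)."""
    needle = cause_phrase.lower()
    return [effect for cause, effect in pairs if needle in cause.lower()]


# Toy (cause, effect) pairs (made-up strings, not taken from the corpus).
pairs = [
    ("nuclear power", "lower CO2 emissions"),
    ("Nuclear power plants", "radioactive pollution"),
    ("global warming", "more intense storms"),
]
nuclear_effects = effects_of(pairs, "nuclear power")
print(nuclear_effects)  # ['lower CO2 emissions', 'radioactive pollution']
```

The selected effect statements can then be passed through the same lemma-overlap network construction as before, so that similar effects end up close to each other.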

Figure 3: A detailed representation of effect statements associated with nuclear power. Clusters concern potential extinction or deaths, notions of cleanliness and pollution, and the reduction of CO2 levels in different countries. Labels represent the full output of the semantic frame extractor.

Whereas the detailed opinion landscape on ‘nuclear power’ is relatively limited in terms of the number of mined opinions, other topics might reveal more elaborate belief systems. This is for instance the case for the phenomenon of ‘global warming’. As shown in Figure 4, opinions on global warming are clustered around the idea of ‘increases’, notably in terms of evaporation, drought, heat waves, intensity of cyclones and storms, etc. An adjacent cluster is related to ‘extremes’, such as extreme summers and weather events, but also extreme colds.

Figure 4: A detailed representation of the effects of global warming. This graph conveys the diversity of opinions, as well as emerging patterns. It can for instance be observed that certain opinions are clustered around the idea of ‘increases’, notably in terms of evaporation, drought, heat waves, intensity of cyclones and storms, etc. An adjacent cluster is related to ‘extremes’, such as extreme summers and weather events, but also extreme colds. Labels represent the full output of the semantic frame extractor.

4 From opinion observation to debate facilitation

The observatory introduced in the preceding paragraphs provides preliminary insights into the range and scope of the beliefs that figure in climate change debates on TheGuardian.com. The observatory as such takes a distinctly descriptive stance, and aims to satisfy, at least in part, the information needs of researchers, activists, journalists and other stakeholders whose main concern is to document, investigate and understand on-line opinion dynamics. However, in the current information sphere, which is marked by polarization, misinformation and a close entanglement with real-world conflicts, taking a mere descriptive or neutral stance might not serve every stakeholder’s needs. Indeed, given the often skewed relations between power and information, questions arise as to how media observations might in turn be translated into (political, social or economic) action. Knowledge about opinion dynamics might for instance inform interventions that remedy polarization or disarm conflict. In other words, the construction of (social) media observatories unavoidably raises questions about the possibilities, limitations and, especially, implications of the machine-guided and human-incentivized facilitation of on-line discussions and debates.

Addressing these questions, the present section introduces and explores the concept of a debate facilitator, that is, a device that extends the capabilities of the previously discussed observatory to also promote more interesting and constructive discussions. Concretely, we will conceptualize a device that reveals how the personal opinion landscapes of commenters relate to each other (in terms of overlap or lack thereof), and we will discuss what steps might potentially be taken on the basis of such a representation to balance the debate. Geared towards possible interventions in the debate, such a device may thus go well beyond the observatory’s objectives of making opinion processes and conflicts more transparent, which concomitantly raises a number of serious concerns that need to be acknowledged.

On rather fundamental ground, tools that steer debates in one way or another may easily become manipulative and dangerous instruments in the hands of certain interest groups. Various aspects of our daily lives are for instance already implicitly guided by recommender systems, the purpose and impact of which can be rather opaque. For this reason, research efforts across disciplines are directed at scrutinizing and rendering such systems more transparent (Milano et al., 2019). Such scrutiny is particularly pressing in the context of interventions on on-line communication platforms, which have already been argued to enforce affective communication styles that feed rather than resolve conflict. The objectives behind any facilitation device should therefore be made maximally transparent and potential biases should be fully acknowledged at every level, from data ingest to the dissemination of results (for a thorough discussion of challenges facing social media research in a post-truth era, see Rogers, 2018). More concretely, the endeavour of constructing opinion observatories and facilitators foregrounds matters of ‘openness’ of data and tools, security, ensuring data quality and representative sampling, accounting for evolving data legislation and policy, building communities and trust, and envisioning beneficial implications. By documenting the development process for a potential facilitation device, the present paper aims to contribute to these on-going investigations and debates. Furthermore, every effort has been made to protect the identities of the commenters involved. In the words of media and technology visionary Jaron Lanier, developers and computational social scientists entering this space should remain fundamentally aware of the fact that ‘digital information is really just people in disguise’ (Lanier, 2013, 19).

With these reservations in mind, the proposed approach can be situated among ongoing efforts that lead from debate observation to facilitation. One such pathway, for instance, involves the construction of filters to detect hate speech, misinformation and other forms of expression that might render debates toxic (see for instance De Smedt et al., 2018; Van Hee et al., 2018). Combined with community outreach, language-based filtering and detection tools have proven to raise awareness among social media users about the nature and potential implications of their on-line contributions (see Grey, 2019). Similarly, advances can be expected from approaches that aim to extend the scope of analysis beyond descriptions of a present debate situation in order to model how a debate might evolve over time and how intentions of the participants could be included in such an analysis.

Progress in any of these areas hinges on a further integration of real-world data in the modelling process, as well as a further socio-technical and media-theoretical investigation of how activity on social media platforms and technologies correlates with real-world conflicts. The remainder of this section therefore ventures to explore how conceptual argument communication models for polarization and alignment (Banisch and Olbrich, 2018) might be reconciled with real-world data, and how such models might inform debate facilitation efforts.

4.0.1 Debate facilitation through models of alignment and polarization

As discussed in previous sections, news websites like TheGuardian.com establish a communicative setting in which agents (users, commenters) exchange arguments about different issues or topics. For those seeking to establish a healthy debate, it could thus be of interest to know how different users relate to each other in terms of their beliefs about a certain issue or topic (in this case climate change). Which beliefs are for instance shared by users and which ones are not? In other words, can we map patterns of alignment or polarization among users?

Figure 5 ventures to demonstrate how representations of opinion landscapes (generated using the methods outlined above) can be enriched with user information to answer such questions. Specifically, the graph represents the beliefs of two among the most active commenters in the corpus. The opinions of each user are marked using a colour coding scheme: red nodes represent the beliefs of the first user, blue nodes represent the beliefs of the second user. Nodes with a green colour represent beliefs that are shared by both users.

Figure 5: A representation of the opinion landscapes of two active commenters on TheGuardian.com. Statements by the first commenter are marked with a red colour, statements by the second commenter with a blue colour. Overlapping statements are marked in green. The graph reveals that the commenters’ beliefs are positioned most closely to each other on the most general aspects of the debate, whereas there is less relatedness on the social and more technical aspects of the discussion.
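The colour coding of such a two-user graph can be sketched as a set comparison. The sketch below assumes that each node is a term and that a node counts as shared when both commenters use it. The function name is hypothetical, and the small term sets only borrow a few of the example terms mentioned in the discussion of the figure.

```python
def colour_nodes(red_terms, blue_terms):
    """Assign 'green' to terms used by both commenters, else the commenter's own colour."""
    colours = {}
    for term in red_terms | blue_terms:
        if term in red_terms and term in blue_terms:
            colours[term] = "green"
        elif term in red_terms:
            colours[term] = "red"
        else:
            colours[term] = "blue"
    return colours


# Toy term sets for the two commenters (illustrative selection only).
red_terms = {"people", "government", "industry", "climate", "science"}
blue_terms = {"feedback", "trend", "variability", "climate", "science"}
colours = colour_nodes(red_terms, blue_terms)

shared = sorted(term for term, colour in colours.items() if colour == "green")
print(shared)  # ['climate', 'science']
```

The resulting mapping can be attached to the network as a node attribute, so that a visualization tool can colour the nodes accordingly.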

Taking into account again the factors of aggregation that were discussed in the previous section, Figure 5 supports some preliminary observations about the relationship between the two users in terms of their beliefs. Generally, given the fact that the graph concerns two of the most active commenters on the website, it can be seen that the rendered opinion landscape is quite extensive. It is also clear that the belief systems of both users are not unrelated, as nodes of all colours can be found distributed throughout the graph. This is especially the case for the right-hand top cluster and right-hand bottom cluster of the graph, where green, red, and blue nodes are mixed. Since both users are commenting on articles on climate change, a degree of affinity between opinions or beliefs is to be expected.

Upon closer examination, a number of disparities between the belief systems of the two commenters can be detected. Considering the left-hand top cluster and center of the graph, it becomes clear that exclusively the red commenter is using a selection of terms related to the economic and socio-political realm (e.g. ‘people’, ‘american’, ‘nation’, ‘government’) and industry (e.g. ‘fuel’, ‘industry’, ‘car’, etc.). The blue commenter, on the other hand, exclusively engages in using a range of terms that could be deemed more technical and scientific in nature (e.g. ‘feedback’, ‘property’, ‘output’, ‘trend’, ‘variability’, etc.). From the graph, it also follows that the blue commenter does not enter into the red commenter’s ‘social’ segments of the graph as frequently as the red commenter enters the more scientifically-oriented clusters of the graph (although in the latter cases the red commenter does not use the specific technical terminology of the blue commenter). The cluster where both beliefs mingle the most (and where overlap can be observed) is the top right cluster. This overlap is constituted by very general terms (e.g. ‘climate’, ‘change’, and ‘science’). In sum, the graph reveals that the commenters’ beliefs are positioned most closely to each other on the most general aspects of the debate, whereas there is less relatedness on the social and more technical aspects of the debate. In this regard, the depicted situation seemingly evokes ongoing debates about the role or responsibilities of the people or individuals versus that of experts when it comes to climate change (see for instance Katz, 2016; Mäki, 2019; Fibieger Byskov, 2019).

What forms of debate facilitation, then, could be based on these observations? And what kind of collective effects can be expected? As follows from the above, beliefs expressed by the two commenters shown here (which are selected based on their active participation rather than actual engagement or dialogue with one another) are to some extent complementary, as the blue commenter, who displays a scientifically-oriented system of beliefs, does not readily engage with the social topics discussed by the red commenter. As such, the overall opinion landscape of the climate change debate could potentially be enriched with novel perspectives if the blue commenter were invited to engage in a debate about such topics as industry and government. Similarly, one could explore the possibility of providing explanatory tools or additional references on occasions where the debate takes a more technical turn.

However, argument-based models of collective attitude formation (Mäs et al., 2013; Banisch and Olbrich, 2018) also tell us to be cautious about such potential interventions. Following the theory underlying these models, different opinion groups prevailing during different periods of a debate will activate different argumentative associations. Facilitating exchange between users with complementary arguments supporting similar opinions may reinforce biased argument pools (Sunstein, 2002) and lead to increasing polarization at the collective level. In the example considered here, the two commenters agree on the general topic, but the analysis suggests that they might have different opinions about the adequate direction of specific climate change action. A more fine-grained automatic detection of cognitive and evaluative associations between arguments and opinions is needed before such models can reliably be used to predict the outcome of facilitating exchange between two specific users. In this regard, computational approaches to the linguistic analysis of texts such as semantic frame extraction offer productive opportunities for empirically modelling opinion dynamics. Extraction of causation frames allows one to disentangle cause-effect relations between semantic units, which provides a productive step towards mapping and measuring structures of cognitive associations. These opportunities are to be explored by future work.

5 Conclusion

Ongoing transitions from a print-based media ecology to on-line news and discussion platforms have put traditional forms of news production and consumption at stake. Many challenges related to how information is currently produced and consumed come to a head in news website comment sections, which harbour the potential of providing new insights into how cultural conflicts emerge and evolve. On the basis of an observatory for analyzing climate change-related comments from TheGuardian.com, this article has critically examined possibilities and limitations of the machine-assisted exploration and possible facilitation of on-line opinion dynamics and debates.

Beyond technical and modelling pathways, this examination brings into view broader methodological and epistemological aspects of the use of digital methods to capture and study the flow of on-line information and opinions. Notably, the proposed approaches raise questions of computational analysis and interpretation that can be tied to an overarching tension between ‘distant’ and ‘close reading’ (Moretti, 2013). In other words, monitoring on-line opinion dynamics means embracing the challenges and associated trade-offs that come with investigating large quantities of information through computational, text-analytical means, but doing this in such a way that nuance and meaning are not lost in the process.

Establishing productive cross-overs between the level of opinions mined at scale (for instance through the lens of causation frames) and the detailed, closer looks at specific conversations, interactions and contexts depends on a series of preliminaries. One of these is the continued availability of high-quality, accessible data. As the current on-line media ecology is recovering from recent privacy-related scandals (e.g. Cambridge Analytica), such data for obvious reasons is not always easy to come by. In the same legal and ethical vein, reproducibility and transparency of models are crucial to the further development of analytical tools and methods. As the experiments discussed in this paper have revealed, a key factor in this undertaking is the human faculty of interpretation. Just like the encoding schemes introduced by Axelrod and others before the wide-spread use of computational methods, present-day pipelines and tools foreground the role of human agents as the primary source of meaning attribution.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732942 (Opinion Dynamics and Cultural Conflict in European Spaces – www.Odycceus.eu).


  • Axelrod (2016) Axelrod R (2016) The cognitive mapping approach to decision making. In: Axelrod R (ed.) Structure of Decision: The Cognitive Maps of Political Elites, chapter 1. Princeton, New Jersey: Princeton University Press, pp. 3–17.
  • Baker et al. (1998) Baker CF, Fillmore CJ and Lowe JB (1998) The berkeley framenet project. In: Proceedings of the 17th international conference on Computational linguistics-Volume 1. Association for Computational Linguistics, pp. 86–90.
  • Banisch and Olbrich (2018) Banisch S and Olbrich E (2018) An argument communication model of polarization and ideological alignment. arXiv preprint arXiv:1809.06134.
  • Bastian et al. (2009) Bastian M, Heymann S and Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. In: Third international AAAI conference on weblogs and social media.
  • De Smedt et al. (2018) De Smedt T, De Pauw G and Van Ostaeyen P (2018) Automatic detection of online jihadist hate speech. arXiv preprint arXiv:1803.04596.
  • Farzindar et al. (2017) Farzindar A, Inkpen D and Hirst G (2017) Natural Language Processing for Social Media: Second Edition. Synthesis Lectures on Human Language Technologies. San Rafael, CA: Morgan & Claypool Publishers.
  • Fibieger Byskov (2019) Fibieger Byskov M (2019) Climate change: focusing on how individuals can help is very convenient for corporations. URL https://theconversation.com/climate-change-focusing-on-how-individuals-can-help-is-very-convenient-for-corporations-108546.
  • Fillmore (1982) Fillmore C (1982) Frame semantics. Linguistics in the Morning Calm : 111–137.
  • Floridi (2013) Floridi L (2013) The Philosophy of Information. Oxford: Oxford University Press.
  • Floridi (2014) Floridi L (2014) The Fourth Revolution: How the Infosphere is Reshaping Human Reality. Oxford: Oxford University Press.
  • Framenet (2001) Framenet (2001) Causation. URL https://framenet2.icsi.berkeley.edu/fnReports/data/frameIndex.xml?frame=Causation.
  • Grey (2019) Grey P (2019) What is project grey? URL https://projectgrey.eu/about-project-grey/?lang=en.
  • Guardian (2009) Guardian (2009) Frequently asked questions about the community on the guardian website. URL https://www.theguardian.com/community-faqs.
  • Guardian (2010) Guardian (2010) Sign in and registration faq. URL https://www.theguardian.com/help/identity-faq.
  • Guardian (2018) Guardian (2018) The guardian openplatform. URL https://open-platform.theguardian.com/.
  • Guardian (2019a) Guardian (2019a) The guardian advertising: Modern advertising. URL https://advertising.theguardian.com/advertising.
  • Guardian (2019b) Guardian (2019b) The guardian climate change. URL https://www.theguardian.com/environment/climate-change.
  • Homer-Dixon et al. (2014) Homer-Dixon T, Milkoreit M, Mock SJ, Schröder T and Thagard P (2014) The conceptual structure of social disputes: Cognitive-affective maps as a tool for conflict analysis and resolution. SAGE Open 4(1): 2158244014526210.
  • Honnibal and Montani (2019) Honnibal M and Montani I (2019) Industrial-strength natural language processing (nlp) with python and cython. URL https://github.com/explosion/spaCy.
  • Katz (2016) Katz C (2016) Climate change and individual responsibility. URL https://blog.apaonline.org/2016/05/24/climate-change-and-individual-responsibility/.
  • Lanier (2013) Lanier J (2013) Who Owns the Future? New York: Simon & Schuster.
  • Laukkanen and Wang (2015) Laukkanen M and Wang M (2015) Comparative Causal Mapping: The CMAP3 Method. Farnham: Gower Publishing, Ltd.
  • Lichfield (2018) Lichfield G (ed.) (2018) MIT Technology Review. The Politics Issue. Technology is threatening our democracy. How do we save it? MIT Technology Review. URL https://www.technologyreview.com/magazine/2018/09/.
  • Mäki (2019) Mäki S (2019) Who is responsible for tackling climate change? you, me, politicians or energy producers? URL https://www.tuni.fi/unit-magazine/en/articles/who-responsible-tackling-climate-change-you-me-politicians-or-energy-producers.
  • Mäs et al. (2013) Mäs M, Flache A, Takács K and Jehn KA (2013) In the short term we divide, in the long term we unite: Demographic crisscrossing and the effects of faultlines on subgroup polarization. Organization science 24(3): 716–736.
  • Milano et al. (2019) Milano S, Taddeo M and Floridi L (2019) Recommender systems and their ethical challenges. URL http://dx.doi.org/10.2139/ssrn.3378581.
  • Moretti (2013) Moretti F (2013) Distant reading. London and New York: Verso Books.
  • ProCon (2019) ProCon (2019) Is human activity primarily responsible for global climate change? URL https://climatechange.procon.org.
  • Reid (2018) Reid A (2018) Guardian.co.uk most read newspaper site in uk in march. URL https://www.journalism.co.uk/news/nrs-guardian-co-uk-is-uk-s-top-monthly-news-site/s2/a553108/.
  • Rogers (2013) Rogers R (2013) Digital methods. Cambridge, MA and London: MIT press.
  • Rogers (2018) Rogers R (2018) Social media research after the fake news debacle. Partecipazione e Conflitto 11(2): 557–570.
  • Rogers (2019) Rogers R (2019) Doing Digital Methods. Los Angeles, London, New Delhi, Singapore, Washington DC, Melbourne: SAGE Publications. ISBN 9781526476067. URL https://books.google.be/books?id=DLuODwAAQBAJ.
  • Rusbridger (2018) Rusbridger A (2018) Breaking News: The Remaking of Journalism and Why It Matters Now. Edinburgh: Canongate Books Limited.
  • Shaw et al. (2017) Shaw D, Smith CM and Scully J (2017) Why did brexit happen? using causal mapping to analyse secondary, longitudinal data. European Journal of Operational Research 263(3): 1019–1032.
  • Singer and Brooking (2018) Singer P and Brooking E (2018) LikeWar: The Weaponization of Social Media. New York: Houghton Mifflin Harcourt Publishing Co.
  • Stede et al. (2018) Stede M, Schneider J and Hirst G (2018) Argumentation Mining. Synthesis Lectures on Human Language Technologies. San Rafael, CA: Morgan & Claypool Publishers.
  • Sunstein (2018) Sunstein C (2018) #Republic: Divided Democracy in the Age of Social Media. Princeton and Oxford: Princeton University Press.
  • Sunstein (2002) Sunstein CR (2002) The law of group polarization. Journal of political philosophy 10(2): 175–195.
  • Tucker Wrightson (2016) Tucker Wrightson M (2016) The documentary coding method. In: Axelrod R (ed.) Structure of Decision: The Cognitive Maps of Political Elites, chapter Appendix 1. Princeton, New Jersey: Princeton University Press, pp. 291–332.
  • Van Hee et al. (2018) Van Hee C, Jacobs G, Emmery C, Desmet B, Lefever E, Verhoeven B, De Pauw G, Daelemans W and Hoste V (2018) Automatic detection of cyberbullying in social media text. PloS one 13(10): e0203794.
  • Watts (2013) Watts DJ (2013) Computational social science: Exciting progress and future directions. The Bridge on Frontiers of Engineering 43(4): 5–10.