Encoding Heterogeneous Social and Political Context for Entity Stance Prediction

08/09/2021
by Shangbin Feng, et al.
Xi'an Jiaotong University

Political stance detection has become an important task due to increasingly polarized political ideologies. Most existing works focus on identifying perspectives in news articles or social media posts, while social entities, such as individuals and organizations, produce these texts and are the ones that actually take stances. In this paper, we propose the novel task of entity stance prediction, which aims to predict entities' stances given their social and political context. Specifically, we retrieve facts from Wikipedia about social entities in contemporary U.S. politics. We then annotate social entities' stances towards political ideologies with the help of domain experts. After defining the task of entity stance prediction, we propose a graph-based solution, which constructs a heterogeneous information network from the collected facts and adopts gated relational graph convolutional networks for representation learning. Our model is then trained with a combination of supervised, self-supervised and unsupervised loss functions, which are motivated by multiple social and political phenomena. We conduct extensive experiments to compare our method with existing text and graph analysis baselines. Our model achieves the highest stance detection accuracy and yields inspiring insights regarding social entity stances. We further conduct an ablation study and parameter analysis to examine the mechanism and effectiveness of our proposed approach.


Introduction

Over the last decade we have witnessed an intensification of civil discourse and increasingly polarized political ideologies. As the Associated Press put it, "Americans are more divided than ever, gridlocked over social issues, race, gender and the economy" (ap.org/explore/divided-america/). This trend of polarization holds true for the rest of the world [carothers2019democracies]. Political stances have become a dominant factor in daily activities, while we live in a time of echo chambers, partisan misinformation and campaigns for ideological extremes. These unprecedented circumstances call for research efforts to study individuals' political perspectives and automatically identify them based on external information.

Traditionally, the task of political perspective detection focuses on detecting stances in natural language texts such as social media posts and news articles. For perspective mining in social media, [CIKMstance33] adopts self-attention networks to classify tweets, [CIKMstance23] predicts user stances on major events with user interactions and network dynamics, and [CIKMstance17] uses sentiment analysis techniques to identify political perspectives expressed on Twitter. For perspective detection in news media, deep neural networks [li2021mean, HLSTM, CNNglove], bias features [MEANbiasfeature] and linguistic features [MEANlinguistic] are adopted by various works. [ACL19] collects Twitter users that interact with certain news outlets and supplements news text with online graph structures for stance detection. [Pan2016inMEAN] jointly leverages news text and social information for perspective detection. [feng2021knowledge] introduces external knowledge to boost task performance.

Figure 1: Examples of social entities whose quotes express consistent political stances over many years.

Although existing works focus on identifying stances in news articles and tweets, they fail to recognize the fact that individuals and organizations are the actual producers of news and tweets. As a result, political perspectives expressed in texts are merely snapshots of these social entities' stances. For example, Figure 1 presents two U.S. senators and their quotes spanning several years. Perspective detection on these quotes would identify their stances, but the issues of quote abundance and semantic ambiguity prevent text-based stance detection methods from accurately predicting their stances as individuals. Consequently, research efforts should shift the focus of political stance detection from natural language texts to social entities.

In light of the fact that social entities are the true holders of political stances, we propose a novel task of entity stance prediction, which aims to identify individuals and organizations’ perspectives towards political ideologies. Specifically, we select social entities from contemporary U.S. politics and retrieve facts about them from Wikipedia to form the knowledge base for entity stance prediction. We then annotate these entities’ perspectives towards political ideologies with the help of domain experts. After defining the task, we propose to construct a heterogeneous information network to represent the social and political context. We then adopt relational graph convolutional networks to learn representations and train our model with three losses of different levels of supervision to simulate social and political phenomena. Our main contributions are summarized as follows:

  • We propose a novel task, entity stance prediction, which predicts perspectives of social entities since they actually take stances. We also collect and annotate a contemporary U.S. politics dataset for entity stance prediction.

  • We propose a graph-based solution to the new task, which constructs a heterogeneous information network from social and political context, adopts graph neural networks and trains with three levels of supervision to incorporate social phenomena and political insights.

  • We conduct extensive experiments to compare our method with existing text and graph analysis techniques. Our model consistently outperforms baselines, successfully addresses the task of entity stance prediction and yields new insights into the stances of unlabeled entities such as U.S. governors, states and political parties.

Entity Type | Example | Item | Count / Value
President | Joe Biden | # entity | 1,069
Senator | Elizabeth Warren | # liberal label | 777
Congressperson | Kevin McCarthy | # liberal SO-O-N-F-SF | 143-214-76-26-318
Governor | Ron DeSantis | # conservative label | 679
Government Institution | the U.S. Senate | # conservative SO-O-N-F-SF | 296-12-92-45-234
State | California | # Wikipedia summary | 1,069
Political Party | Democratic Party | # sentences | 10,401
Supreme Court Justice | Amy Coney Barrett | # tokens | 154,934
Office Term | 117th Congress | / | /
Table 1: Summary of our entity stance prediction dataset. SO, O, N, F and SF are defined as in Figure 2.

Related Work

Political Perspective Detection

Previous works focused on identifying political stances expressed in news articles and social media. Perspective detection in news media is often treated as text classification. Text analysis techniques such as linguistic features [MEANbiasfeature, MEANlinguistic], multi-head attention networks [li2021mean], convolutional neural networks [CNNglove] and recurrent neural networks [HLSTM] are adopted to identify stances in news documents. Later proposals attempt to leverage information in addition to news text to boost task performance. [ACL19] supplements news with Twitter users who interact with various news outlets and form graph structures. [Pan2016inMEAN] studies the problem of text representation learning with the help of social information.

Apart from news articles, perspective detection in social media is also generally treated as a text classification task on tweets and posts. Text analysis techniques such as neural attention networks [CIKMstance9], sentiment analysis [CIKMstance32], language models [CIKMstance24] and self-attention [CIKMstance33] are adopted to identify stances in social media posts. Other research efforts explore identifying user stances instead of the stances of individual tweets. [CIKMstance29] uses label propagation algorithms in a semi-supervised approach to identifying perspectives in social media. [CIKMstance7] proposes to cluster users into different stance groups. [CIKMstance23] predicts user perspectives based on Twitter network dynamics and interactions between users.

Figure 2: Illustration of annotating elected officials’ stances with the help of domain experts. Scores of 10, 25, 75 and 90 partition both political spectrums into five stances, namely SF, F, N, O, SO, which are short for strongly favor, favor, neutral, oppose and strongly oppose respectively.

Graph Neural Networks

Graph neural networks (GNNs) have broadened the horizons of deep learning from structured data such as images and text to unstructured data types such as graphs and manifolds. Among them, GCN [GCN], GAT [GAT] and GraphSAGE [SAGE] are effective GNN architectures. Many works have also contributed to scaling graph neural networks [S-GCN, FastGCN, ClusterGCN, GraphSAINT] and to unsupervised graph representation learning [Min, Min11, SAGE, Min41].

For heterogeneous graphs that consist of different types of nodes and edges, [RGCN] proposes relational graph convolutional networks to learn representations. [wang2019heterogeneous] proposes heterogeneous graph attention networks to extend GAT to heterogeneous graphs. In tasks such as fake news detection [nguyen2020fang], Twitter bot detection [feng2021botrgcn] and question answering [de2018question], there are proposals to transform the task into learning on heterogeneous graphs. Since the social and political context of entities is often heterogeneous, we build on these works to learn representations for the context graph and conduct entity stance prediction.

Entity Stance Prediction

Task Definition

Existing research on political perspective detection typically focuses on identifying stances in texts such as tweets and news articles. In this paper, we propose a new task to identify political stances of social entities since they actually take stances. We firstly define social entities:

Definition 1. Social entities are people or groups of people that share common characteristics.

For instance, elected officials, political parties, social organizations and geographical locations are considered social entities. We then define the task of entity stance prediction:

Definition 2. The task of entity stance prediction is to predict social entities' stances towards issues or ideologies with their social and political context.

For instance, entity stance prediction models could learn to predict U.S. senators’ stances towards gun control or conservative values in general based on their home states, years in office, party affiliations and voting records.

Our proposed task aims to shift the focus of stance detection from text to entities. This approach is closer to the root of the problem, since individuals and organizations actually take stances. Apart from that, entity stance prediction results could serve as external knowledge for relevant tasks such as fake news detection and sentiment analysis.

Relation | Example
party affiliation | Joe Biden is affiliated with the Democratic Party
home state | Bernie Sanders is from Vermont
hold office | Ted Cruz is a senator
elected tenure | Mitt Romney serves in the 117th Congress
appoint | Donald Trump appointed Neil Gorsuch
Table 2: Five types of relations extracted between entities from their Wikipedia pages. These relations serve as different types of edges in our social and political context HIN.

Data Collection

After defining the novel task, we collect and annotate a dataset for it. Firstly, we select contemporary U.S. politics as the scenario and retrieve relevant entities. Specifically, we select diversified entities that were active in the past decade and present them in Table 1. A wide range of social entities, from elected officials and political parties to government institutions and geographical locations, are covered in our dataset. We then retrieve the Wikipedia pages of these entities to serve as the social and political context for entity stance prediction.
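The retrieval step can be approximated with off-the-shelf tooling. Below is a minimal sketch, assuming the `wikipedia` Python package and a hypothetical list of entity names; it is illustrative rather than the authors' released pipeline.

```python
# Minimal sketch of retrieving Wikipedia summaries for social entities.
# The entity names and output file are hypothetical examples.
import json
import wikipedia

entities = ["Joe Biden", "Elizabeth Warren", "California"]  # illustrative subset

summaries = {}
for name in entities:
    try:
        # auto_suggest=False avoids silently resolving to a different page
        summaries[name] = wikipedia.summary(name, auto_suggest=False)
    except (wikipedia.DisambiguationError, wikipedia.PageError):
        summaries[name] = None  # ambiguous or missing page; resolve manually

with open("entity_summaries.json", "w") as f:
    json.dump(summaries, f, indent=2)
```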

Model | Liberal Macro-F1 | Liberal Micro-F1 | Conservative Macro-F1 | Conservative Micro-F1 | Overall Accuracy | Overall Macro-F1 | Overall Micro-F1
Linear BoW | 0.4990 | 0.6795 | 0.3338 | 0.6912 | 0.6849 | 0.4000 | 0.6853
Bias Features | 0.2463 | 0.4872 | 0.1695 | 0.4559 | 0.4726 | 0.2008 | 0.4710
Average WEs | 0.3174 | 0.5256 | 0.2340 | 0.5147 | 0.5205 | 0.2694 | 0.5201
RoBERTa | 0.4936 | 0.6667 | 0.5004 | 0.7794 | 0.7192 | 0.4970 | 0.7187
LongFormer | 0.4395 | 0.6538 | 0.4072 | 0.7206 | 0.6849 | 0.4227 | 0.6856
GCN | 0.5012 | 0.6795 | 0.5891 | 0.8235 | 0.7466 | 0.5416 | 0.7446
GAT | 0.5831 | 0.7564 | 0.5353 | 0.8088 | 0.7808 | 0.5582 | 0.7817
GraphSAGE | 0.5518 | 0.7308 | 0.4809 | 0.7794 | 0.7534 | 0.5139 | 0.7543
TransformerConv | 0.5814 | 0.7564 | 0.5332 | 0.7941 | 0.7740 | 0.5563 | 0.7748
ResGatedGraphConv | 0.5111 | 0.7051 | 0.5793 | 0.8235 | 0.7603 | 0.5431 | 0.7597
Ours | 0.6167 | 0.7949 | 0.5913 | 0.8235 | 0.8082 | 0.6037 | 0.8089
Table 3: Performance of our model and competitive text and graph analysis baselines on the entity stance prediction dataset.

After determining the input of entity stance prediction, we annotate elected officials' stances towards political ideologies with the help of domain experts. Specifically, AFL-CIO (aflcio.org/scorecard) and Heritage Action (heritageaction.com/scorecard) are representative organizations on the left and right of the political spectrum. They score U.S. representatives and senators on a scale from 0 to 100 based on their voting records to evaluate how liberal or conservative elected officials are. We adapt our annotations from their scorecards with a process illustrated in Figure 2.

In the end, we obtain a contemporary U.S. politics dataset for entity stance prediction. A summary of the dataset is presented in Table 1. We randomly conduct a 7:2:1 split on the annotated entities to serve as the training, validation and test sets.
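For illustration, such a 7:2:1 random split over the annotated entities can be produced as below; the id list and random seed are hypothetical.

```python
# Sketch of the 7:2:1 train / validation / test split over annotated entities.
import random

random.seed(42)
annotated_ids = list(range(777))  # e.g., entities with liberal labels (illustrative)
random.shuffle(annotated_ids)

n = len(annotated_ids)
train_ids = annotated_ids[: int(0.7 * n)]
val_ids = annotated_ids[int(0.7 * n): int(0.9 * n)]
test_ids = annotated_ids[int(0.9 * n):]
```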

Methodology

In this section, we present our proposal for the novel task of entity stance prediction. We firstly construct a heterogeneous information network to represent the social and political context. We then adopt text encoders and gated relational graph convolutional networks to learn representations for social entities. We train the model to conduct entity stance prediction with a combination of three levels of supervision to simulate social and political phenomena.

Graph Construction

The task of entity stance prediction aims to predict social entities' perspectives towards political ideologies with their social and political context. To better capture the interactions and relations between entities, we propose to construct a heterogeneous information network (HIN) from Wikipedia pages. Specifically, we take social entities as nodes in the HIN. We then conduct named entity recognition [NER] and coreference resolution [CoreferenceResolution] to identify entity mentions across all Wikipedia pages. We extract five types of relations among entity mentions, which are listed in Table 2, and adopt the extracted relations between entities as edges. As a result, we obtain the social and political HIN with 1,069 nodes and 9,248 edges. A sample of the context HIN is presented in Figure 3.
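A minimal sketch of turning extracted relations into graph tensors is given below, assuming PyTorch-style edge_index / edge_type encodings; the triples and id maps are illustrative stand-ins, not the actual extracted data.

```python
# Sketch of assembling the context HIN as tensors consumable by PyTorch Geometric.
import torch

entity2id = {"Joe Biden": 0, "Democratic Party": 1, "Delaware": 2}
relation2id = {"party affiliation": 0, "home state": 1, "hold office": 2,
               "elected tenure": 3, "appoint": 4}

triples = [  # (head entity, relation, tail entity), illustrative examples
    ("Joe Biden", "party affiliation", "Democratic Party"),
    ("Joe Biden", "home state", "Delaware"),
]

src = [entity2id[h] for h, _, _ in triples]
dst = [entity2id[t] for _, _, t in triples]
rel = [relation2id[r] for _, r, _ in triples]

edge_index = torch.tensor([src, dst], dtype=torch.long)  # shape [2, num_edges]
edge_type = torch.tensor(rel, dtype=torch.long)          # one relation id per edge
```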

Figure 3: Sample of our social and political context HIN.

Model Architecture

Let $\mathcal{E} = \{e_1, e_2, \ldots, e_N\}$ be the entities in the social and political context HIN and $s_i$ be the Wikipedia summary of $e_i$. Let $R$ be the set of edge types in Table 2 and $\mathcal{N}_i^r$ be the $i$-th entity's neighborhood with regard to edge type $r$. Firstly, we encode the Wikipedia summary text with the robust pre-trained language model RoBERTa [liu2019roberta]:

$t_i = \mathrm{RoBERTa}(s_i)$,   (1)

where $\mathrm{RoBERTa}(\cdot)$ is the pre-trained text encoder and $t_i$ denotes the representation of the $i$-th entity's Wikipedia summary. We then transform it with a fully connected layer to serve as the initial features for nodes in the HIN:

$h_i^{(0)} = \phi(W_I \, t_i + b_I)$,   (2)

where $\phi$ is leaky-relu, and $W_I$ and $b_I$ are learnable parameters. We then propagate node messages and aggregate them with gated relational graph convolutional networks (gated R-GCN). For the $l$-th layer of gated R-GCN,

$u_i^{(l)} = \theta_0\big(h_i^{(l)}\big) + \sum_{r \in R} \frac{1}{|\mathcal{N}_i^r|} \sum_{j \in \mathcal{N}_i^r} \theta_r\big(h_j^{(l)}\big)$,   (3)
RoBERTa size | 768 | GNN size | 512
optimizer | Adam | learning rate | 1e-3
batch size | 64 | max epochs | 100
gated R-GCN layers $L$ | 2 | activation | Leaky-ReLU
negative sample weight $\delta$ | -0.1 | # negative samples | 2
loss weight $\lambda_1$ | 0.01 | loss weight $\lambda_2$ | 0.2
loss weight $\lambda_3$ | 1 | weight decay $\lambda_4$ | 1e-5
Table 4: Implementation details and hyperparameter settings of our entity stance prediction model.

where $\theta_0$ and $\theta_r$ are parameterized linear functions for self loops and edges of relation $r$, and $h_i^{(l)}$ is the hidden representation for entity $e_i$ at layer $l$. We then calculate gate levels,

$g_i^{(l)} = \sigma\big(W_G \big[u_i^{(l)} \, \| \, h_i^{(l)}\big] + b_G\big)$,   (4)

where $W_G$ and $b_G$ are learnable parameters, $\sigma$ denotes the sigmoid function and $\|$ denotes the concatenation operation. We then apply the gate mechanism to $u_i^{(l)}$ and $h_i^{(l)}$,

$h_i^{(l+1)} = \tanh\big(u_i^{(l)}\big) \odot g_i^{(l)} + h_i^{(l)} \odot \big(1 - g_i^{(l)}\big)$,   (5)

where $\odot$ is the Hadamard product operation. After $L$ layer(s) of gated R-GCN, we obtain the representation of entities as $z_i = h_i^{(L)}$. We then predict their stances towards liberal and conservative values,

$\hat{y}_i^{lib} = \mathrm{softmax}\big(W_{lib} \, z_i + b_{lib}\big), \quad \hat{y}_i^{con} = \mathrm{softmax}\big(W_{con} \, z_i + b_{con}\big)$,   (6)

where $\hat{y}_i^{lib}$ and $\hat{y}_i^{con}$ are predictions of stances towards liberal and conservative values respectively, and $W_{lib}$, $b_{lib}$, $W_{con}$ and $b_{con}$ are learnable parameters.
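As one concrete reading of Eqs. (2)-(6), the sketch below implements a gated R-GCN layer with PyTorch Geometric's RGCNConv and the two stance heads. Hidden sizes follow Table 4, but the exact gate wiring is our assumption rather than the authors' released code; the input t is assumed to be the RoBERTa summary features of Eq. (1).

```python
# Hedged sketch of the gated R-GCN model described above (not the official code).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import RGCNConv


class GatedRGCNLayer(nn.Module):
    """Relational aggregation (Eq. 3), gate (Eq. 4), gated update (Eq. 5)."""

    def __init__(self, hidden_size: int, num_relations: int):
        super().__init__()
        self.rgcn = RGCNConv(hidden_size, hidden_size, num_relations)
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, h, edge_index, edge_type):
        u = self.rgcn(h, edge_index, edge_type)                   # aggregated messages
        g = torch.sigmoid(self.gate(torch.cat([u, h], dim=-1)))   # gate levels
        return torch.tanh(u) * g + h * (1 - g)                    # gated update


class EntityStanceModel(nn.Module):
    """Project RoBERTa features, run L gated R-GCN layers, predict two stance heads."""

    def __init__(self, roberta_size=768, hidden_size=512, num_relations=5,
                 num_stances=5, num_layers=2):
        super().__init__()
        self.proj = nn.Linear(roberta_size, hidden_size)           # Eq. (2)
        self.layers = nn.ModuleList(
            [GatedRGCNLayer(hidden_size, num_relations) for _ in range(num_layers)])
        self.head_lib = nn.Linear(hidden_size, num_stances)        # Eq. (6)
        self.head_con = nn.Linear(hidden_size, num_stances)

    def forward(self, t, edge_index, edge_type):
        h = F.leaky_relu(self.proj(t))
        for layer in self.layers:
            h = layer(h, edge_index, edge_type)
        return self.head_lib(h).softmax(-1), self.head_con(h).softmax(-1)
```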

Figure 4: Predictions of state stances, averaged over predicted probabilities, compared with the results of the 2020 U.S. presidential election.

Learning and Optimization

We propose to train our model with a combination of unsupervised, self-supervised and supervised losses, which simulates the echo chamber phenomenon, ensures stance consistency and learns from entity annotations respectively. The total loss of our model is as follows:

$\mathcal{L} = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_2 + \lambda_3 \mathcal{L}_3 + \lambda_4 \|\theta\|_2^2$,   (7)

where $\lambda_i$ is the weight of loss $\mathcal{L}_i$ and $\theta$ are all learnable parameters in the model. We then present the motivation and details of each loss $\mathcal{L}_1$, $\mathcal{L}_2$ and $\mathcal{L}_3$.

Unsupervised Loss: Echo Chamber

The unsupervised loss is motivated by the echo chamber phenomenon, where social entities tend to reinforce their narratives by forming small and closely connected interaction circles. We simulate echo chambers by assuming that neighboring nodes on the context HIN have similar representations while non-neighboring nodes have different representations. We firstly define the positive and negative neighborhood of entity $e_i$,

$\mathcal{N}^+(i) = \bigcup_{r \in R} \mathcal{N}_i^r, \quad \mathcal{N}^-(i) \subseteq \{\, j \mid j \notin \mathcal{N}^+(i), \; j \neq i \,\}$,   (8)

where $\mathcal{N}^-(i)$ is randomly sampled from non-neighboring entities. We then calculate the unsupervised loss,

$\mathcal{L}_1 = -\sum_{i} \Big( \sum_{j \in \mathcal{N}^+(i)} \log \sigma\big(z_i^\top z_j\big) + \delta \sum_{k \in \mathcal{N}^-(i)} \log \sigma\big(-z_i^\top z_k\big) \Big)$,   (9)

where $z_i^\top$ denotes the transpose of $z_i$ and $\delta$ is the weight for negative samples.
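A minimal sketch of this echo-chamber objective, under the reconstruction of Eq. (9) above (dot-product similarity with log-sigmoid scoring and a weight on sampled non-neighbors), could look as follows; the pair tensors and the default delta value are assumptions.

```python
# Sketch of the unsupervised echo-chamber loss: pull neighbors together,
# push sampled non-neighbors apart.
import torch
import torch.nn.functional as F

def echo_chamber_loss(z, pos_pairs, neg_pairs, delta=0.1):
    """z: [N, d] entity representations; *_pairs: [2, num_pairs] index tensors."""
    pos = F.logsigmoid((z[pos_pairs[0]] * z[pos_pairs[1]]).sum(-1))    # z_i^T z_j
    neg = F.logsigmoid(-(z[neg_pairs[0]] * z[neg_pairs[1]]).sum(-1))   # -z_i^T z_k
    return -(pos.mean() + delta * neg.mean())  # delta is illustrative, see Table 4
```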

Self-supervised Loss: Stance Consistency

The self-supervised loss is motivated by the fact that liberalism and conservatism are opposite ideologies, thus individuals often take inversely correlated stances towards them. We firstly speculate entities' stances towards the opposite ideology by taking the opposite of the predicted stance,

$\tilde{y}_i^{con} = \mathrm{onehot}\big(K - 1 - \mathrm{argmax}\big(\hat{y}_i^{lib}\big)\big), \quad \tilde{y}_i^{lib} = \mathrm{onehot}\big(K - 1 - \mathrm{argmax}\big(\hat{y}_i^{con}\big)\big)$,   (10)

where $\mathrm{onehot}(\cdot)$ is the one-hot encoder, $\mathrm{argmax}(\cdot)$ calculates the index with the largest value, $K$ is the number of stance labels, and $\tilde{y}_i^{lib}$ and $\tilde{y}_i^{con}$ are self-supervised labels derived by stance consistency. We then calculate the self-supervised loss measuring stance consistency,

$\mathcal{L}_2 = -\sum_{i} \Big( \tilde{y}_i^{lib} \cdot \log \hat{y}_i^{lib} + \tilde{y}_i^{con} \cdot \log \hat{y}_i^{con} \Big)$.   (11)
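Under the reading of Eqs. (10)-(11) above, the consistency loss can be sketched as below; flipping via K-1-argmax assumes the stance labels are indexed 0..K-1 from strongly oppose to strongly favor.

```python
# Sketch of the stance-consistency (self-supervised) loss.
import torch.nn.functional as F

def stance_consistency_loss(p_lib, p_con, num_stances=5):
    """p_lib, p_con: [N, K] predicted stance distributions."""
    pseudo_con = num_stances - 1 - p_lib.argmax(dim=-1)  # opposite of liberal stance
    pseudo_lib = num_stances - 1 - p_con.argmax(dim=-1)  # opposite of conservative stance
    return (F.nll_loss(p_con.clamp_min(1e-8).log(), pseudo_con)
            + F.nll_loss(p_lib.clamp_min(1e-8).log(), pseudo_lib))
```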

Supervised Loss: Entity Annotation

We annotated certain entities' stances according to Figure 2. The supervised loss aims to train the model to correctly identify these known stances. Let $D^{lib}$ and $D^{con}$ denote the liberal and conservative training sets, and let $y_i^{lib}$ and $y_i^{con}$ denote the ground-truth liberal and conservative stances. We calculate the supervised loss,

$\mathcal{L}_3 = -\sum_{i \in D^{lib}} y_i^{lib} \cdot \log \hat{y}_i^{lib} \; - \sum_{i \in D^{con}} y_i^{con} \cdot \log \hat{y}_i^{con}$.   (12)
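A sketch of the supervised loss and the weighted total of Eq. (7) follows; the lambda values mirror Table 4 as reconstructed and should be treated as assumptions rather than confirmed settings.

```python
# Sketch of the supervised cross-entropy loss and the combined objective.
import torch.nn.functional as F

def supervised_loss(p_lib, p_con, lib_idx, y_lib, con_idx, y_con):
    """Cross-entropy on the annotated liberal / conservative training sets."""
    return (F.nll_loss(p_lib[lib_idx].clamp_min(1e-8).log(), y_lib)
            + F.nll_loss(p_con[con_idx].clamp_min(1e-8).log(), y_con))

def total_loss(l_un, l_self, l_sup, model, lambdas=(0.01, 0.2, 1.0, 1e-5)):
    l1_w, l2_w, l3_w, reg_w = lambdas  # illustrative weights, cf. Table 4
    l2_reg = sum((p ** 2).sum() for p in model.parameters())
    return l1_w * l_un + l2_w * l_self + l3_w * l_sup + reg_w * l2_reg
```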

Experiments

In this section, we conduct extensive experiments to compare our model with competitive text and graph analysis methods on the entity stance prediction dataset. We then present the task's novel findings regarding social entity stances and conduct an ablation study and parameter analysis to examine the mechanism and effectiveness of our proposed approach.

Baselines

The task of entity stance prediction involves both textual data and graph structures to represent real-world context. We select representative methods from text and graph analysis techniques as baselines.

For the following text-based baselines, we encode Wikipedia summaries of entities with these methods and predict their stances with fully connected layers.

Figure 5: United States sitting governors' stances towards liberal and conservative values as predicted by our model.
Figure 6: Major political parties in the United States and their members' stance distributions as predicted by our model. CDF stands for cumulative distribution function.

  • Linear BoW encodes Wikipedia summaries with TF-IDF unigram vectors extracted with the help of scikit-learn [pedregosa2011scikit] (a minimal sketch follows this list).

  • Bias Features are content-based features drawn from a wide range of approaches described in the political bias literature [MEANbiasfeature].

  • Average Word Embeddings (WEs) uses an average of the pre-trained GloVe [pennington2014glove] word embeddings.

  • RoBERTa is a pre-trained language model that effectively encodes text sequences [liu2019roberta].

  • LongFormer is short for long-document transformer, which aims to effectively encode longer text sequences [beltagy2020longformer].
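As referenced in the Linear BoW item, a possible rendering of that baseline with scikit-learn is sketched below; the choice of logistic regression as the linear classifier is an assumption, since the paper only specifies fully connected layers on top of the features.

```python
# Hedged sketch of the Linear BoW baseline: TF-IDF unigrams + a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

bow_baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 1)),   # unigram TF-IDF vectors
    LogisticRegression(max_iter=1000),
)
# bow_baseline.fit(train_summaries, train_labels)
# preds = bow_baseline.predict(test_summaries)
```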

Ablation Setting | Acc | Macro-F1 | Micro-F1
full model | 0.8082 | 0.6037 | 0.8089
loss: supervised loss only | 0.7808 | 0.5643 | 0.7817
loss: supervised + unsupervised losses | 0.7603 | 0.5191 | 0.7611
loss: supervised + self-supervised losses | 0.7945 | 0.5597 | 0.7952
structure: no edge 1 | 0.7877 | 0.5902 | 0.7881
structure: no edge 2 | 0.7877 | 0.5712 | 0.7885
structure: no edge 3 | 0.7945 | 0.5880 | 0.7954
structure: no edge 4 | 0.7808 | 0.5717 | 0.7815
structure: no edge 5 | 0.7740 | 0.5501 | 0.7748
operator: GCN | 0.7603 | 0.5831 | 0.7808
operator: GAT | 0.7740 | 0.5901 | 0.7885
operator: SAGE | 0.7877 | 0.5804 | 0.7881
operator: R-GCN | 0.7808 | 0.5561 | 0.7815
Table 5: Overall model performance in different ablation settings. Definitions of evaluation metrics follow those of Table 3. Edge numbers follow those of Table 2. We substitute the gated R-GCN in our method with different graph operators, which are settings different from the graph-based baselines.

For the following graph-based models, we use them to conduct node classification in a supervised manner.

  • GCN extends convolution to graphs, propagates node messages and learns representations for nodes and graphs [GCN].

  • GAT incorporates the attention mechanism into graph neural networks [GAT].

  • GraphSAGE is a general and inductive framework that leverages node features to learn representations [SAGE].

  • TransformerConv adopts a graph transformer network, which takes feature and label embeddings to learn node representations [TransformerConv].

  • ResGatedGraphConv is a graph neural network framework that handles graphs of arbitrary length and size [ResGatedGraphConv].

Implementation

We use PyTorch [paszke2019pytorch], PyTorch Lightning [pytorchlightning], PyTorch Geometric [torchgeometric] and the transformers library [wolf-etal-2020-transformers] for an efficient implementation of our entity stance prediction model. We present hyperparameter settings in Table 4. We submit our entity stance prediction dataset and all implementation code as supplementary material to facilitate reproduction.
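A hedged sketch of the text-encoding and optimization setup follows, using the transformers library for the RoBERTa features of Eq. (1) and the Adam learning rate from Table 4; mean pooling over token states and the example summary are assumptions.

```python
# Sketch of producing RoBERTa summary features and setting up the optimizer.
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
encoder = RobertaModel.from_pretrained("roberta-base")

summaries = ["Joseph Robinette Biden Jr. is an American politician ..."]  # truncated example
with torch.no_grad():
    batch = tokenizer(summaries, padding=True, truncation=True, return_tensors="pt")
    t = encoder(**batch).last_hidden_state.mean(dim=1)  # [num_entities, 768]

# model = EntityStanceModel()  # see the gated R-GCN sketch above
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```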

Experiment Results

Performance of our method and competitive baselines on the entity stance prediction task is presented in Table 3. We use macro- and micro-averaged F1 scores to evaluate each method's ability to predict stances towards liberal and conservative ideologies. We also calculate the overall accuracy and balanced F1 scores, defined as the harmonic mean of the two F1 scores on the liberal and conservative labels. Our model outperforms all text and graph analysis baselines, which indicates that our proposal for the entity stance prediction task is generally effective. Apart from that, graph-based models generally outperform text-based models, which suggests that the graph structure of the context HIN is essential in the task of entity stance prediction.
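The balanced (overall) F1 can be computed as sketched below with scikit-learn; the variable names for the label arrays are hypothetical.

```python
# Sketch of the evaluation metrics: per-ideology F1 plus their harmonic mean.
from sklearn.metrics import f1_score

def overall_f1(y_lib_true, y_lib_pred, y_con_true, y_con_pred, average="macro"):
    f_lib = f1_score(y_lib_true, y_lib_pred, average=average)
    f_con = f1_score(y_con_true, y_con_pred, average=average)
    return 2 * f_lib * f_con / (f_lib + f_con)  # harmonic mean of the two F1 scores
```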

Entity Stance Findings

An important objective of entity stance prediction is to predict entity stances that are not evaluated by domain experts. We analyze political stances of governors, states and political parties that are not annotated in the dataset and examine whether they follow conventional wisdom.

Figure 7: Model accuracy when trained with different unsupervised and self-supervised loss weights $\lambda_1$ and $\lambda_2$.

State Stances

States are geographical zones in the United States. Conventional wisdom often classifies these states into red, blue and swing states. We present our model's predictions of state stances and compare them with the results of the 2020 U.S. presidential election in Figure 4. Our stance predictions correlate strongly with the 2020 presidential election results. Our model also predicts that the ideological structures of traditional swing states, such as North Carolina and Ohio, are actually more conservative than expected. The predictions also suggest that among states that went to Republicans in 2020, Florida and Montana might be easier to flip due to their rather liberal context.

Governor Stances

Political experts typically study and evaluate the stances of legislators, while state-level officials such as governors are also essential in governance and policy making. We present our model's predictions of governor stances in all 50 U.S. states in Figure 5 (created with the help of mapchart.net). It is no surprise that governors from partisan strongholds such as California and Utah hold firm stances. Conventional wisdom often assumes that in order to win electorally difficult races, one has to sacrifice political ideology. However, our model's predictions indicate exceptions to this rule, such as Andy Beshear (D-KY) and Ron DeSantis (R-FL).

Political Party Stances

Elected officials in the United States are typically Republicans or Democrats. We illustrate the model's predictions of party stances in Figure 6, which shows that there are generally more moderates in the Republican Party of the United States than in the Democratic Party.

To sum up, the task of entity stance prediction and our proposed model yield valuable insights about the political stances of governors, states and political parties. In fact, our proposal could evaluate any social entity, not necessarily in the political realm, and predict its stance towards political ideologies given its social and political context.

Figure 8: Task accuracy when our model is trained with zero to five layers of gated R-GCN.

Ablation Study

Our proposed entity stance prediction model constructs a HIN to represent social and political context, adopts gated relational graph convolutional networks for representation learning, and is trained with a combination of differently supervised losses. To examine these design choices and their contributions to the model's performance, we conduct an ablation study of the graph structure, graph operator and loss functions and present the results in Table 5. It is demonstrated that our strategy of training the entity stance prediction model with different levels of supervision is generally effective. Besides, the different edge types in the context HIN and the gated R-GCN operator also contribute to our model's performance.

Parameter Analysis

There are two significant sets of parameters in our proposed entity stance prediction model. Firstly, the weights of the different losses govern the training process of our model. Specifically, $\lambda_1$ and $\lambda_2$ determine the importance of the auxiliary tasks of modeling echo chambers and stance consistency on the social and political context HIN. We fix $\lambda_3$ and $\lambda_4$ and train our model with different combinations of $\lambda_1$ and $\lambda_2$. We present the results in Figure 7. It is illustrated that the settings reported in Table 4 generally lead to an effective balance of the differently supervised loss functions.

Besides, the layer count $L$ of gated R-GCN governs the range of message propagation and aggregation across the topological structure of the context HIN. We train our model with different values of $L$ and present the results in Figure 8, which shows that $L = 2$ layers of gated R-GCN lead to the best performance, where information in the context HIN is sufficiently but not excessively propagated and aggregated.

Conclusion and Future Work

Political stance detection is an important and challenging task, while previous efforts focus on identifying perspectives in textual data. In this paper, we shift the focus from text to social entities and propose the novel task of entity stance prediction. We firstly collect and annotate a dataset for the novel task. We then propose a graph-based approach to address the problem, conduct extensive experiments to evaluate different methods and gain insights about entity stances. In the future, we plan to broaden the scope of entity stance prediction to scenarios outside politics.

References