
Modelling Social Context for Fake News Detection: A Graph Neural Network Based Approach

07/27/2022
by   Pallabi Saikia, et al.
IEEE

Detection of fake news is crucial to ensure the authenticity of information and to maintain the reliability of the news ecosystem. Recently, there has been an increase in fake news content due to the proliferation of social media and of fake-content generation techniques such as DeepFake. The majority of existing fake news detection methods focus on content-based approaches. However, most of these techniques fail to deal with the ultra-realistic synthesized media produced by generative models. Our recent studies find that the propagation characteristics of authentic and fake news are distinguishable, irrespective of their modalities. In this regard, we investigate auxiliary information based on social context to detect fake news. This paper analyzes the social context of fake news with a hybrid graph neural network based approach. The hybrid model integrates a graph neural network over the propagation structure of news with a Bidirectional Encoder Representations from Transformers (BERT) model over the news content to learn text features. The proposed approach thus learns both content and context features and is hence able to outperform the baseline models, with F1-scores of 0.91 and 0.93 on the PolitiFact and GossipCop datasets, respectively.



I Introduction

Social platforms like Facebook and Twitter are becoming increasingly popular for day-to-day news consumption due to ease of access, low cost, and fast news dissemination [1]. As a result, these platforms have increasingly become the dominant source of information. However, without any regulatory mechanism, the authenticity of news on these platforms is suspect. Hence, social media also enable the wide propagation of fake news implanted with false information. False or misleading information, commonly known as fake news, can cause significant damage to individuals and society. A series of recent incidents have demonstrated the potential of fake news to damage personal, economic and national integrity. For example, a recent rumor caused the death of 800 people who consumed alcohol-based cleaning products as a supposed cure for Covid-19 [2], and fake news has influenced the democratic process of a country [13]. Looking at these consequences, many organizations have come to consider fake news a global challenge.

Fake news can be classified as parody, satire, fabricated news, propaganda, etc. [34]. Moreover, the term also encompasses the concepts of disinformation (intentionally misleading information), misinformation (information that can be proven to be false), manipulation, and rumors [9]. Although many definitions of fake news exist, there is no universally accepted one. One generalized definition introduced in the literature [43] is "fake news is intentionally and verifiably false news published by a news outlet."

Traditional media such as television and newspapers have a one-to-many structure. However, with their millions of monthly active users, social platforms such as Twitter are examples of a many-to-many approach. Therefore, surveillance of information diffused on such platforms is relatively complicated. Moreover, news on social platforms emerges at an unprecedented rate, making it increasingly difficult to fact-check. Many fact-checking websites, such as Snopes [32] and Politifact [23], exist to combat the problem of fake news. However, most of these rely purely on manual methods and are difficult to scale up. Therefore, researchers are now focusing on data-driven or machine learning based approaches to detect fake news automatically and accurately. Most of these approaches are based on user and content-based features, which are insufficient to address state-of-the-art generative modalities. However, a recent study shows that the propagation characteristics of news vary based on their nature, irrespective of their content properties [37]. Therefore, features corresponding to the propagation patterns of news can be effectively used as a basis for fake news detection on social media [5] [14] [31]. Propagation patterns are useful in incorporating the context of social influence and, in contrast to content-based features, have the advantage of being language- and content-agnostic.

This paper explores a method of constructing the propagation graph of the social media network, following the propagation structure of Twitter posts. We then apply a graph neural network based representation learning algorithm to automatically extract propagation features from the constructed graph. A hybrid model is built that exploits both the context features extracted with the graph neural network and the content features extracted with the transformer model, and embeds both the textual and structural information into a high-level abstract representation that enables better analysis of a propagated tweet in a social network. We empirically evaluate our proposed model on two public Twitter datasets, Politifact and GossipCop. The proposed model is compared with the baseline models on multiple evaluation metrics: Accuracy, Precision, Recall, F1-score, and AUC. Moreover, we also analyze and compare the performance of the proposed model with a model based on manually extracted propagation characteristics of news. The details of the proposed approach are discussed in the following sections.

II Background and Related Work

Fake news detection has received much attention in recent years as a research subject. The existing fake news detection approaches in the literature are typically of three types (Shu et al. (2017) [29]): news-based, user-based, and propagation-based, corresponding to the different types of information available on social media. News-based approaches fall into the category of content-based approaches, whereas the other two are context-based. Approaches that merge content-based and context-based signals, also called mixed approaches, are likewise becoming popular for effective detection of fake news propagation [31].

Content-based approaches attempt to solve the problem of fake news classification using the news article's headline and body; hence the name. The underlying idea is that fake news exhibits a significantly different presentation style than real news. In a work presented by Perez et al. [22], different text-based content features, namely n-grams, punctuation, psycholinguistic features, readability, and syntax, are extracted from the text of the news, and a linear SVM model was trained on these features. In another work, presented by Horne et al. [6], similar feature engineering was applied, but satire was considered as a class alongside fake and real. In the work proposed by Nor et al. [19], weak labeling was utilized, where news articles were labeled based on the category their source belonged to. The features extracted from the news content are Style, Complexity, Bias, Affect, Moral, and Event. Wang et al. [39] experimented on the Politifact dataset and used six unique labels for the target variable. Traditional ML algorithms were trained, with a main focus on neural nets, which provided promising results over all the considered models.

Context-based approaches mainly rely on the propagation patterns of news on social media networks (e.g., Twitter) to classify news articles. Propagation patterns are constructed by considering interactions between tweets and users' follows, retweets, and likes. In the work proposed by Wu et al. [40], a hybrid kernel function combining a random-walk graph kernel and an RBF kernel over propagation features was proposed to model the propagation behavior of fake news. Fake news spreads can easily be modeled as graphs on social media platforms, and Graph Neural Networks have recently become popular for automatically extracting propagation features from such graphs and for designing better fake news detection models [16, 41, 14, 5]. Propagation patterns have the distinct advantage of being language- and content-agnostic. Comparatively few studies [5] [14] [31] leverage propagation features for the detection of fake news. The graph classification approach aims to make optimal use of propagation features, and the success of graph neural network approaches in prior works [5, 42, 11] motivates further investigation of graph neural network models for the characterization of fake news.

Mixed approaches are receiving the most attention nowadays, as they combat the limitations of both content- and context-based approaches by combining their advantages. A mixed approach uses both the propagation pattern and the content, usually the text, to verify the validity of news articles. Ruchansky et al. [26] proposed a model consisting of multiple components to extract representations of articles and of users. These components are then integrated to obtain a resultant vector used for the final classification task. Kai et al. [30] proposed a framework consisting of five components: a semi-supervised classification component, a publisher-news relation embedding component, a user embedding component, a news content embedding component, and a user-news interaction embedding component. The latent representations of news content and users are learned via non-negative matrix factorization, and the problem is then formalised as an optimization over the above components. In the work proposed by Nguyen et al. [18], a propagation graph was built using news sources, news articles, social users, and interactions between pairs of entities at a given time. Additionally, the stance of each tweet with respect to the news title was taken into account. A Bi-directional LSTM (Bi-LSTM) was used to optimize the fake news detection objective. The approach emphasizes learning generalizable representations for social entities by optimizing three concurrent losses: a stance loss, a proximity loss, and a fake news detection loss.

III Dataset Description and Pre-processing

(a) Politifact Fake
(b) Politifact Real
(c) GossipCop Fake
(d) GossipCop Real
Fig. 1: Word clouds of news content

In this work, we have used a public data repository, FakeNewsNet [28]. The repository consists of comprehensive datasets from two popular fact-checking websites, Politifact and GossipCop, and includes social context, news content, and other dynamic information. The Politifact [23] project covers U.S. politics and reports on the accuracy of statements made by elected officials, their staffs, lobbyists, candidates, interest groups, and many others involved in U.S. politics, whereas the GossipCop [4] website fact-checks celebrity reporting. Both datasets contain a number of articles whose ground truth is provided by the source (i.e., assigned by independent journalists). Word clouds of both types of news are provided in Figure 1.

III-A Data Collection and Pre-processing

We extracted the available information for each news item following the approach in [28]: the news body, tweets, retweets, and user profiles relevant to the tweet ids of every news article in the datasets. For every news article, the relevant tweet ids are available. However, we discarded news articles with missing text content. Statistics of the collected datasets are provided in Table I.

                    Politifact           Gossipcop
                  Fake      Real       Fake      Real
News Articles      432       624      5,323    16,817
Tweets         164,892   399,237    519,581   876,967
Unique Users   201,748   596,435    504,638   199,031
TABLE I: Dataset Statistics

III-B Propagation Graph Construction

For every news article in the dataset, the Twitter API was used to retrieve its tweets and retweets. A propagation graph is then constructed to model how information disseminates from one user to another. However, the Twitter API does not provide the immediate source of a retweet. For example, suppose a tweet t1 has been retweeted as t2. If t2 is retweeted again as t3, Twitter stores t1 as the source of both retweets. In order to determine the immediate source of a retweet, all tweets and retweets for a news item are sorted by timestamp. Let {t1, t2, ...} be the sorted set of tweets and retweets. For each ti, its immediate source is searched for among the tweets/retweets of the same set that were published earlier, using the following heuristics:

  1. tj is identified as the source of ti if the owner of ti mentions the owner of tj.

  2. Otherwise, tj is the source if ti is published within a certain period of time after tj.

  3. Otherwise, the source news node is considered the source of ti.

Following the above-mentioned heuristics returns an information cascade for each tweet. All such cascades are connected to the source news node to construct the propagation graph of the news article, as shown in Figure 2. Followers/following information is not considered in the construction of propagation graphs due to the strict Twitter API rate limits, which would limit its availability at inference time.
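The cascade-construction heuristics above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the post records, field names, and the one-hour time window are assumed for the example (the paper does not state its exact threshold).

```python
from datetime import datetime, timedelta

# Illustrative posts: id, author, mentioned users, timestamp (all values made up).
posts = [
    {"id": "t1", "author": "alice", "mentions": [], "ts": datetime(2022, 1, 1, 9, 0)},
    {"id": "t2", "author": "bob", "mentions": ["alice"], "ts": datetime(2022, 1, 1, 9, 5)},
    {"id": "t3", "author": "carol", "mentions": [], "ts": datetime(2022, 1, 1, 9, 7)},
    {"id": "t4", "author": "dan", "mentions": [], "ts": datetime(2022, 1, 1, 18, 0)},
]

WINDOW = timedelta(hours=1)  # assumed "certain period of time" threshold

def build_cascade(posts):
    """Assign each post an immediate source using the three heuristics above."""
    posts = sorted(posts, key=lambda p: p["ts"])  # sort by timestamp first
    edges = {}
    for i, p in enumerate(posts):
        source = "NEWS"  # heuristic 3: fall back to the source news node
        for q in reversed(posts[:i]):  # only earlier posts can be sources
            if q["author"] in p["mentions"]:  # heuristic 1: explicit mention
                source = q["id"]
                break
            if p["ts"] - q["ts"] <= WINDOW:   # heuristic 2: published soon after
                source = q["id"]
                break
        edges[p["id"]] = source
    return edges

edges = build_cascade(posts)
```

With these toy posts, t2 attaches to t1 via the mention, t3 attaches to the most recent post within the window (t2), and t4, being hours later, falls back to the news node.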

Fig. 2: Yellow nodes indicate real news, while black nodes indicate fake news. Squares are tweets and triangles are retweets of the news item considered. Orange nodes indicate that the author of the node (tweet/retweet) has more than 10,000 followers. Blue nodes indicate that the author of the node is verified. The size of a node is proportional to the number of followers of its author.

III-C Feature Engineering

Feature extraction performed in this work is divided into two parts - node-level and graph-level.

III-C1 Node-level features

Each node in the propagation graph is either a tweet or a retweet with a corresponding user. Node-level features are extracted on the basis of a node's characteristics and neighborhood. Different node-level features are considered, including user-based, text-based, and temporal features, as listed in Table II. User-based features are characteristics of the node's author, including verified status, number of followers, and number of friends (in Twitter terminology, friends are the accounts a user follows). Text-based features are extracted from the text content of the node (tweet/retweet) and try to capture its sentiment. These include the number of hashtags, number of users mentioned, sentiment score computed using VADER [7], and the frequencies of positive and negative words. The temporal features include the differences in publication time with the source node, the parent, and the neighbors, which account for the timeline of the node and its neighbors.

Feature Class  Feature Name
User Based     Is verified user
               Number of Friends
               Number of Followers
Text Based     Number of Hashtags
               Number of mentions
               Sentiment score computed using VADER
               Frequency of positive words
               Frequency of negative words
Temporal       User account timestamp
               Time difference with source node
               Time difference with immediate predecessor
               Average time difference with the immediate successors
TABLE II: Node-level features
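A minimal sketch of the text-based node features in Table II. The paper uses the VADER sentiment model; here a tiny hand-made lexicon stands in for it so the example is self-contained, and the example tweet is invented.

```python
import re

# Toy stand-in lexicons; the paper computes sentiment with VADER instead.
POSITIVE = {"great", "good", "true", "love"}
NEGATIVE = {"fake", "hoax", "bad", "lie"}

def node_text_features(tweet_text):
    """Text-based node features: hashtag/mention counts and lexicon word frequencies."""
    tokens = re.findall(r"\w+", tweet_text.lower())
    return {
        "num_hashtags": tweet_text.count("#"),
        "num_mentions": len(re.findall(r"@\w+", tweet_text)),
        "pos_freq": sum(t in POSITIVE for t in tokens),
        "neg_freq": sum(t in NEGATIVE for t in tokens),
    }

feats = node_text_features("This is #fake news, a total hoax @newsdesk @factcheck")
```

The user-based and temporal features in the table would be read directly from the tweet object and its timestamps in the same fashion.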

III-C2 Graph-level features

A simple approach to extracting a vector representation of a propagation graph is to apply aggregation techniques (averaging, max, min) to handcrafted node-level features [14]. In this paper, we use simple averaging of node-level features along with graph meta-information, as proposed by Meyers et al. [14], as a baseline. We start with basic graph-level statistics: the number of nodes, number of tweets, and number of users. Next, mean aggregation is used to incorporate node-level information such as the average numbers of friends, followers, and retweets per tweet, and the average time between a tweet and its retweets. Finally, we collect temporal features such as the total amount of time the news article was referenced on Twitter (essentially the difference between the publication times of the first and last tweets), the number of users involved in propagation within 10 hours of news publication, and the percentage of tweets/retweets published in the first 60 minutes. Table III lists the graph-level features collected.

num_nodes               Total number of tweets and retweets for a news article
num_tweets              Number of tweets
avg_num_retweets        Average number of retweets per tweet
retweet_perc            Percentage of retweets among all posts
num_users               Number of unique users
total_propagation_time  Amount of time the news was referenced on Twitter
avg_num_followers       Number of followers averaged over all users
avg_num_friends         Number of friends averaged over all users
perc_posts_1_hour       Percentage of tweets in the first hour after news publish time
users_10h               Users reached in 10 hours
avg_time_diff           Avg. time between a tweet and its retweets
TABLE III: Graph-level Features
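The mean-aggregation baseline can be sketched for a subset of Table III. The node records and their values below are illustrative assumptions, not data from the paper.

```python
# Toy node records for one propagation graph; values are illustrative.
nodes = [
    {"is_tweet": True,  "user": "u1", "followers": 120, "friends": 80},
    {"is_tweet": False, "user": "u2", "followers": 40,  "friends": 200},
    {"is_tweet": False, "user": "u3", "followers": 940, "friends": 20},
]

def graph_level_features(nodes):
    """Basic graph statistics plus mean-aggregated node features (subset of Table III)."""
    num_tweets = sum(n["is_tweet"] for n in nodes)
    return {
        "num_nodes": len(nodes),
        "num_tweets": num_tweets,
        "num_users": len({n["user"] for n in nodes}),
        "avg_num_retweets": (len(nodes) - num_tweets) / max(num_tweets, 1),
        "avg_num_followers": sum(n["followers"] for n in nodes) / len(nodes),
        "avg_num_friends": sum(n["friends"] for n in nodes) / len(nodes),
    }

g = graph_level_features(nodes)
```

The temporal entries of Table III (total_propagation_time, users_10h, perc_posts_1_hour) would be computed the same way from node timestamps.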
Fig. 3: Proposed workflow of the early and late fusion methods.

IV Methodology

Our proposed methodology incorporates three modules: (a) a graph neural network for modelling the propagation context of news, (b) a pretrained transformer model to learn from the news content, and (c) a mixed module to combine the representations of the two.

IV-A Graph Neural Network for Modelling Propagation Patterns

Graph Neural Networks (GNNs) are a class of neural networks that operate directly on graph structures. Social media platforms, like Twitter, can be modelled as graphs, as shown in Figure 2. GNNs work on the principle of message passing or neighborhood aggregation, an iterative process that generates node embeddings by aggregating information from the local neighborhood. Consider a graph G with node feature matrix X and adjacency matrix A. After k iterations of message passing, the node embeddings can be represented by

H^(k) = f(H^(k-1), A; θ^(k))        (1)

where H^(k) is the node embedding matrix (the output of the GNN after k iterations, with H^(0) = X) and f is the message-passing function with trainable parameters θ^(k). GNNs aggregate the neighbourhood representation within k hops and then apply a pooling operation, such as mean, max, or sum, to obtain the final representation of each node. This representation, which incorporates the social-context information, can then be used to classify the graphs. The general steps involved in training a GNN are:

  1. Generate node embeddings using multiple iterations of message passing

  2. Extract graph embedding by aggregation of node embedding

  3. Feed the embeddings into fully-connected layers for classification
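The three steps above can be sketched without any deep learning framework by using plain mean aggregation: no trainable weights, just the propagation and pooling skeleton. The toy graph and features below are assumptions for illustration; a real GNN layer would apply learned transformations at each step.

```python
def message_pass(features, adj, iterations=2):
    """Step 1: iteratively replace each node's embedding by the mean of itself
    and its neighbours (a weight-free stand-in for learned message passing)."""
    h = {n: list(v) for n, v in features.items()}
    for _ in range(iterations):
        new_h = {}
        for n, vec in h.items():
            group = [vec] + [h[m] for m in adj.get(n, [])]
            new_h[n] = [sum(col) / len(group) for col in zip(*group)]
        h = new_h
    return h

def readout(h):
    """Step 2: mean-pool node embeddings into one graph embedding.
    Step 3 would feed this vector into fully connected layers."""
    vecs = list(h.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Tiny path graph a - b - c with 2-d node features (illustrative).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
emb = readout(message_pass(features, adj, iterations=1))
```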

Our work builds on the apparent potential of abstract features extracted by GNNs from the propagation network of Twitter to detect fake news. The working principle can be stated as: given the propagation graph of a specific news item, consisting of a set of tweets and retweets, how effective are the propagation features at classifying the news as fake or real? We apply some of the most recent GNN layer types, namely GCNConv [8], GATConv [36], and GraphConv [17], to model the propagation behaviour of fake news. The details of each model are discussed below:

GCNConv: GCNConv [8] is a graph-based semi-supervised learning algorithm in which the learner is provided with an adjacency matrix A and node features X as input, and a subset of node labels for training. GCN is spectral-based: an eigen-decomposition of the graph Laplacian is used in network propagation. This spectral method aggregates the neighboring nodes of a graph to infer the value of the current node. In GCNConv, the eigen-decomposition is approximated to reduce runtime. The propagation rule of GCNConv can be represented by the following equation:

H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )        (2)

where Ã = A + I is the adjacency matrix with self-loops added for every node, H^(l+1) is the intermediate node embedding matrix obtained by applying the message-passing function to H^(l) with layer-specific trainable parameters W^(l), and D̃ is the corresponding diagonal degree matrix of Ã, which normalises high-degree nodes and circumvents numerical instabilities. The adjacency matrix can carry edge weights via the optional edge_weight tensor. The node-wise formulation is:

x_i^(l+1) = Θᵀ Σ_{j ∈ N(i) ∪ {i}} ( e_{j,i} / sqrt(d̂_j d̂_i) ) x_j^(l)        (3)

where N(i) is the set of neighbors of node i, d̂_i = 1 + Σ_{j ∈ N(i)} e_{j,i}, and e_{j,i} denotes the edge weight from source node j to target node i.
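The symmetrically normalised propagation rule of Eq. (2) can be sketched in a few lines of numpy. The tiny path graph, identity features, and identity weight matrix below are illustrative assumptions; with H = W = I the output is exactly the normalised adjacency D̃^(-1/2) Ã D̃^(-1/2).

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN propagation step: ReLU( D^{-1/2} (A+I) D^{-1/2} H W )."""
    A_tilde = A + np.eye(A.shape[0])        # add self-loops (A + I)
    d = A_tilde.sum(axis=1)                 # degrees of the self-looped graph
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D~^{-1/2}
    return np.maximum(0, D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)

# Path graph 0 - 1 - 2 with one-hot features and identity weights (illustrative).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)
W = np.eye(3)
out = gcn_layer(H, A, W)
```

Note how the entry for the degree-3 centre node (1/3) is smaller than for the degree-2 end nodes (1/2): the normalisation damps high-degree nodes, as described above.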

GraphConv: GraphConv [17] is a generalization of graph neural networks capable of taking higher-order graph structures at multiple scales into account. The message-passing function for GraphConv is given by

x_i^(l+1) = W_1 x_i^(l) + W_2 Σ_{j ∈ N(i)} e_{j,i} x_j^(l)        (4)

where W_1 and W_2 are trainable weight matrices and e_{j,i} denotes the edge weight from source node j to target node i.

GATConv: GATConv [36] is an attention-based graph neural network algorithm. It is an extension of GCNConv in which, instead of assigning the same weight to each neighboring node, different weights are assigned through attention coefficients. This is achieved without expensive matrix computations or prior knowledge of the graph structure:

x_i^(l+1) = α_{i,i} Θ x_i^(l) + Σ_{j ∈ N(i)} α_{i,j} Θ x_j^(l)        (5)

where the attention coefficients are computed as

α_{i,j} = exp( LeakyReLU( aᵀ [Θ x_i ∥ Θ x_j] ) ) / Σ_{k ∈ N(i) ∪ {i}} exp( LeakyReLU( aᵀ [Θ x_i ∥ Θ x_k] ) )        (6)

where α_{i,j} denotes the importance of node j's features to node i, N(i) is the neighbourhood of node i, a is a learnable attention vector, and ∥ denotes concatenation.
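The attention coefficients of Eq. (6) amount to a softmax over LeakyReLU-scored pairs. The vectors a, Θ, and the node features below are invented for illustration; in GATConv they are learned.

```python
import numpy as np

def attention_coefficients(x_i, neighbours, a, Theta):
    """Eq. (6): softmax over LeakyReLU( a^T [Theta x_i || Theta x_j] ),
    for j in N(i) ∪ {i}."""
    def score(x_j):
        z = np.concatenate([Theta @ x_i, Theta @ x_j])
        s = a @ z
        return s if s > 0 else 0.2 * s  # LeakyReLU, negative slope 0.2
    cand = [x_i] + neighbours           # include the self-attention term
    scores = np.array([score(x_j) for x_j in cand])
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

# Illustrative (not learned) parameters and features.
Theta = np.eye(2)
a = np.array([1.0, 0.0, 1.0, 0.0])
alpha = attention_coefficients(np.array([1.0, 0.0]),
                               [np.array([0.0, 1.0]), np.array([2.0, 0.0])],
                               a, Theta)
```

The coefficients sum to one, and neighbours whose transformed features score higher against the attention vector receive larger weights.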

IV-B News Content Representation

The text content of a news article can provide important signals for distinguishing fake from real news. We adopt two approaches to obtain vector representations of text content: Doc2Vec [10] and embeddings from pretrained transformer models [25]. Doc2Vec is an unsupervised algorithm and an extension of Word2Vec [15] that computes vector representations of variable-length documents. The difference between Word2Vec and Doc2Vec is the addition of a special token, the document ID, which learns a vector representation of the entire document. The other approach we considered for encoding text content makes use of pretrained transformer models from the sentence-transformers library [25]. Specifically, we considered (1) all-MiniLM-L12-v2, based on [38], (2) all-distilroberta-v1, based on a distilled version of [12], and (3) all-mpnet-base-v2, based on [33]. Transformer networks output an embedding for each token in the input text, which are then averaged to obtain a fixed-length embedding for the document. Since all the models we considered have a maximum sequence length of 512 tokens (roughly 500 English words), we consider different parts of the text as input when the text is longer than 512 tokens. Specifically, we consider the first 512 tokens, the last 512 tokens, and the combination of the first 256 and last 256 tokens.
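The three truncation strategies can be sketched directly; the function and strategy names are ours, not an API of sentence-transformers.

```python
def truncate_tokens(tokens, max_len=512, strategy="first_last"):
    """Pick which part of an over-length document to encode
    (models have a maximum sequence length of 512 tokens)."""
    if len(tokens) <= max_len:
        return tokens
    if strategy == "first":
        return tokens[:max_len]
    if strategy == "last":
        return tokens[-max_len:]
    half = max_len // 2                 # "first_last": first 256 + last 256 tokens
    return tokens[:half] + tokens[-half:]

tokens = [f"tok{i}" for i in range(600)]
combo = truncate_tokens(tokens)
```

For a 600-token document, the first/last combination keeps tokens 0-255 and 344-599, which is what lets it "summarize" both the lead and the conclusion of an article.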

IV-C Mixed Approach: Combining Context and Content Features

Research on multi-modal fusion has shown that models trained by combining data from multiple sources have a clear advantage over those trained using only one source [27, 35]. In our research, we explore two fusion techniques for combining the content and context features: early fusion and late fusion. The mixed approach takes the benefits of both modalities and can hence be expected to be more effective. Early fusion, shown in Figure 3, concatenates the vector representations of text content and propagation for use as input to fully connected layers for classification. The dimensions of both modalities are reduced to 32 to prevent one modality from overwhelming the other. Conversely, late fusion outputs a final prediction by aggregating the predictions of base-classifiers. We explore two aggregation strategies for late fusion: mean and classifier-based. In the late fusion mean approach, predictions from the base-classifiers are aggregated using a simple mean, while in the classifier approach a meta-classifier is trained on out-of-fold predictions of the base-classifiers. Figure 3 also illustrates the late fusion architecture of our approach.
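The two fusion operations themselves are simple; a minimal sketch, with the probabilities and vectors invented for illustration (the classifier-based late fusion would replace the mean with a trained meta-classifier):

```python
import numpy as np

def early_fusion(text_vec, graph_vec):
    """Early fusion: concatenate the two 32-d modality vectors into one
    input vector for the fully connected classification layers."""
    return np.concatenate([text_vec, graph_vec])

def late_fusion_mean(prob_text, prob_graph, threshold=0.5):
    """Late fusion (mean): average the base-classifiers' fake-probabilities,
    then threshold to get the final label."""
    p = (prob_text + prob_graph) / 2.0
    return p, int(p >= threshold)

fused = early_fusion(np.ones(32), np.zeros(32))  # 64-d joint representation
p, label = late_fusion_mean(0.8, 0.4)            # illustrative probabilities
```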

V Experimental Results and Analysis

The experiments were conducted on Google Colab Pro with 25 GB of RAM, and the code was developed in Python 3. The libraries used for the experimentation are PyTorch-Geometric [3], sklearn [21], sentence-transformers [25], Gensim [24], and Pandas [20]. The classification metrics used are Accuracy, F1-score, Precision, and Recall.

Dataset preparation details (splitting and sampling) for modelling are provided in Table IV. For Politifact, the train-test ratio was kept at 4:1. For model training, 90% of the samples in the train set are randomly selected for training and the remainder are used for validation. This process is repeated 10 times and the average of the evaluation metrics is reported; this strategy is used because of the small size of the Politifact dataset. For GossipCop, the dataset is split into train-test-val in the ratio 70:15:15; cross-validation is not used because of the large size of the dataset. For the late fusion classifier, 3-fold inner cross-validation (CV) is used to generate out-of-fold predictions. The dataset is split in a stratified manner so that the ratio of fake to real news remains the same in all sets. The same splits are used for all experiments to ensure consistency and a fair estimate of performance. Wherever applicable, each model is trained for a maximum of 50 epochs, and the best model weights are selected from the epoch with the lowest validation loss. The learning rate is set to 0.001 and a batch size of 64 is used.

Politifact GossipCop
Train-Test split 80:20 85:15
Train-Val split 10-fold CV 82.35:17.65
Sampling Random over sampling None
Class Weights None Uniform
TABLE IV: Dataset Splitting
Classifier Politifact GossipCop
F1 Precision Recall Accuracy F1 Precision Recall Accuracy
PassiveAggressiveClassifier 0.394982 0.475000 0.347086 0.601018 0.554933 0.898990 0.496567 0.571099
RidgeClassifier 0.443599 0.481250 0.289315 0.720028 0.678024 0.874576 0.775107 0.742265
LogisticRegression 0.464448 0.480000 0.325717 0.766790 0.680060 0.882470 0.760515 0.738644
SGDClassifier 0.441990 0.469565 0.330019 0.709389 0.697815 0.894500 0.767811 0.752469
ExtraTreesClassifier 0.463866 0.473913 0.341212 0.773682 0.875082 0.933305 0.954936 0.913101
RandomForestClassifier 0.484482 0.475000 0.403515 0.816466 0.882283 0.935565 0.959657 0.918367
TABLE V: Classification scores using graph-level features
Convolutional Layer
Politifact GossipCop
F1 Precision Recall Accuracy F1 Precision Recall Accuracy
GraphConv 0.78861 0.755794 0.79047 0.788201 0.90702 0.96215 0.94935 0.93252
GATConv 0.76672 0.72966 0.79047 0.766926 0.90483 0.95616 0.95493 0.93186
GCNConv 0.794199 0.73147 0.866664 0.794632 0.9022 0.95372 0.95536 0.93021
TABLE VI: Classification scores using Graph Neural Networks
Encoding Technique Truncation type Politifact GossipCop
F1 Precision Recall Accuracy F1 Precision Recall Accuracy
all-mpnet-base-v2 First 512 0.85044 0.81605 0.8619 0.85032 0.85088 0.88491 0.9339 0.85615
Last 512 0.85044 0.81179 0.87142 0.85027 0.84626 0.88537 0.92489 0.85055
First 256 Last 256 0.87043 0.848 0.87619 0.8715 0.84859 0.87699 0.94549 0.85648
all-distilroberta-v1 First 512 0.8418 0.80286 0.8619 0.84167 0.83795 0.86042 0.96309 0.85187
Last 512 0.84329 0.81243 0.8619 0.8438 0.83589 0.87144 0.9339 0.84364
First 256 Last 256 0.86712 0.8375 0.88571 0.86734 0.83935 0.87656 0.92961 0.84891
all-MiniLM-L12-v2 First 512 0.83946 0.81125 0.84761 0.84761 0.83808 0.87606 0.92832 0.8443
Last 512 0.82649 0.79331 0.83809 0.82678 0.83517 0.87243 0.93047 0.84233
First 256 Last 256 0.83733 0.81724 0.83333 0.83741 0.84094 0.8824 0.92103 0.84529
doc2Vec 0.74438 0.69244 0.79047 0.74574 0.7239 0.78986 0.96952 0.778801
TABLE VII: Classification scores using Text features
Fusion Technique Politifact GossipCop
F1 Precision Recall Accuracy F1 Precision Recall Accuracy
Early Fusion 0.871431 0.853061 0.871426 0.871642 0.93632 0.95681 0.96051 0.93647
Late Fusion - Mean 0.885229 0.873164 0.88095 0.88649 0.93111 0.96386 0.97296 0.95128
Late Fusion - Classifier 0.914531 0.891666 0.940691 0.917442 0.93011 0.9569 0.95567 0.93245
TABLE VIII: Classification scores using fusion techniques

For classification using graph-level features, we experimented with traditional machine learning algorithms such as ensemble methods, logistic regression, and stochastic gradient descent. Specifically, the following algorithms are considered: PassiveAggressiveClassifier, RidgeClassifier, LogisticRegression, SGDClassifier, ExtraTreesClassifier, and RandomForestClassifier. The performance of the different classifiers using graph-level features on Politifact and GossipCop is shown in Table V.

We performed comprehensive experiments on the considered datasets to gauge the effectiveness of the different modalities (text features and context features) in fake news classification. Four sets of experiments are performed, as listed below:

  1. Classification based on manually extracted graph-level features

  2. Automatic graph-level classification (GNNs applied directly to the propagation graph of news)

  3. Classification based on content (i.e., text) features of news

  4. Mixed-model classification, i.e., a combination of both text-based and propagation-based features

The classifiers did not perform well on Politifact, with RandomForestClassifier reaching a maximum F1-score of only 0.48. The likely reason is the small size of the Politifact dataset, which does not allow meaningful patterns to be learned. However, a decent F1-score of 0.88 is reached on the GossipCop dataset. In both cases, RandomForestClassifier performs best, closely followed by ExtraTreesClassifier.

Using graph neural networks, we achieve a maximum F1-score of 0.79 on Politifact with GCNConv and 0.907 on GossipCop with GraphConv. We experimented with different numbers of layers and embedding sizes for the convolutional layers and found that 4 layers with 64 dimensions provide results comparable to other settings while requiring less training time. Results are shown in Table VI.

The text content of a news article provides important signals for distinguishing fake from real news. The results for this modality using the different models discussed in the methodology section are provided in Table VII. When classifying using text content, we consider a specific part of the text as input if it is longer than the maximum sequence length of 512 tokens. For the Politifact dataset, we found that using the combination of the first 256 and last 256 tokens provides significantly better results than considering the first 512 or last 512 tokens, as shown in Table VII. A similar but less pronounced trend can be observed for GossipCop. This can be attributed to the fact that taking the first 256 and last 256 tokens effectively summarizes the entire news article. all-mpnet-base-v2 outperforms all-distilroberta-v1 and all-MiniLM-L12-v2 for all truncation techniques on both datasets. At the cost of a slight decrease in accuracy, all-distilroberta-v1 computes the embeddings in half the time of all-mpnet-base-v2.

The proposed fusion techniques perform significantly better than the unimodal models (text and GNN), as shown in Table VIII. All fusion techniques achieved significant improvements over the baseline text and GNN models on the GossipCop dataset, with an improvement of 3% over the best-performing unimodal model. On Politifact, late fusion with mean aggregation performs slightly better than early fusion and requires significantly less training time than the classifier approach, while late fusion with a classifier outperforms both the other fusion techniques and the baseline models by at least 3%. A likely conclusion from these results is that late fusion with mean aggregation offers a good trade-off between accuracy and computational expense when training data are sparse. Since there was no significant difference between the F1-scores of the fusion techniques on GossipCop, early fusion is the best choice there because of its low training time compared to the other fusion techniques.

VI. Conclusion and Future Scope

In this work, we explored a method of detecting fake news based on its propagation characteristics on social media together with its content. Experiments demonstrated that using both content and propagation characteristics provides better performance than relying on a single modality. Fake news detection is most beneficial when fake news can be identified at an early propagation stage, and GNN approaches, which can learn directly on structured data such as graphs, appear promising for investigating this direction. Moreover, most past approaches to fake news detection are not interpretable. Hence, extending the proposed model toward interpretation and explanation of its predictions is another future direction of our research.

References

  • [1] P.R. Centre (2021) News consumption across social media in 2021. https://www.pewresearch.org/journalism/2021/09/20/news-consumption-across-social-media-in-2021/ Accessed: 2022-01-19.
  • [2] A. Coleman (2020) 'Hundreds dead' because of covid-19 misinformation. https://www.bbc.com/news/world-53755067 Accessed: 2021-07-15.
  • [3] M. Fey and J. E. Lenssen (2019) Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
  • [4] Gossipcop.com. https://www.suggest.com/c/entertainment/gossip-cop/
  • [5] Y. Han, S. Karunasekera, and C. Leckie (2020) Graph neural networks with continual learning for fake news detection from social media. arXiv preprint arXiv:2007.03316.
  • [6] B. Horne and S. Adali (2017) This just in: fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 11.
  • [7] C. Hutto and E. Gilbert (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 8, pp. 216–225.
  • [8] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
  • [9] D. M. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, M. J. Metzger, B. Nyhan, G. Pennycook, D. Rothschild, et al. (2018) The science of fake news. Science 359 (6380), pp. 1094–1096.
  • [10] Q. Le and T. Mikolov (2014) Distributed representations of sentences and documents. In International Conference on Machine Learning, pp. 1188–1196.
  • [11] Y. Liu and Y. B. Wu (2018) Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • [12] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  • [13] A. Marwick and R. Lewis (2017) Media manipulation and disinformation online. New York: Data & Society Research Institute, pp. 7–19.
  • [14] M. Meyers, G. Weiss, and G. Spanakis (2020) Fake news detection on Twitter using propagation structures. In Multidisciplinary International Symposium on Disinformation in Open Online Media, pp. 138–158.
  • [15] T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  • [16] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5115–5124.
  • [17] C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe (2019) Weisfeiler and Leman go neural: higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 4602–4609.
  • [18] V. Nguyen, K. Sugiyama, P. Nakov, and M. Kan (2020) FANG: leveraging social context for fake news detection using graph representation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1165–1174.
  • [19] J. Nørregaard, B. D. Horne, and S. Adalı (2019) NELA-GT-2018: a large multi-labelled news dataset for the study of misinformation in news articles. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13, pp. 630–638.
  • [20] pandas-dev/pandas: pandas.
  • [21] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830.
  • [22] V. Pérez-Rosas, B. Kleinberg, A. Lefevre, and R. Mihalcea (2018) Automatic fake news detection. In Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics, Santa Fe.
  • [23] Politifact.com. https://www.politifact.com/
  • [24] R. Rehurek and P. Sojka (2011) Gensim–Python framework for vector space modelling. NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic 3 (2).
  • [25] N. Reimers and I. Gurevych (2019) Sentence-BERT: sentence embeddings using siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing.
  • [26] N. Ruchansky, S. Seo, and Y. Liu (2017) CSI: a hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 797–806.
  • [27] A. Sebastianelli, M. P. Del Rosso, P. P. Mathieu, and S. L. Ullo (2021) Paradigm selection for data fusion of SAR and multispectral Sentinel data applied to land-cover classification. arXiv preprint arXiv:2106.11056.
  • [28] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu (2018) FakeNewsNet: a data repository with news content, social context and spatialtemporal information for studying fake news on social media. arXiv preprint arXiv:1809.01286.
  • [29] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu (2017) Fake news detection on social media: a data mining perspective. ACM SIGKDD Explorations Newsletter 19 (1), pp. 22–36.
  • [30] K. Shu, S. Wang, and H. Liu (2019) Beyond news contents: the role of social context for fake news detection. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 312–320.
  • [31] A. Silva, Y. Han, L. Luo, S. Karunasekera, and C. Leckie (2021) Propagation2Vec: embedding partial propagation networks for explainable fake news early detection. Information Processing & Management 58 (5), pp. 102618.
  • [32] Snopes.com. https://www.snopes.com/
  • [33] K. Song, X. Tan, T. Qin, J. Lu, and T. Liu (2020) MPNet: masked and permuted pre-training for language understanding. Advances in Neural Information Processing Systems 33, pp. 16857–16867.
  • [34] E. C. Tandoc Jr, Z. W. Lim, and R. Ling (2018) Defining “fake news”: a typology of scholarly definitions. Digital Journalism 6 (2), pp. 137–153.
  • [35] B. Tardy, J. Inglada, and J. Michel (2017) Fusion approaches for land cover map production using high resolution image time series without reference data of the corresponding period. Remote Sensing 9 (11), pp. 1151.
  • [36] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903.
  • [37] S. Vosoughi, D. Roy, and S. Aral (2018) The spread of true and false news online. Science 359 (6380), pp. 1146–1151.
  • [38] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, and M. Zhou (2020) MiniLM: deep self-attention distillation for task-agnostic compression of pre-trained transformers. Advances in Neural Information Processing Systems 33, pp. 5776–5788.
  • [39] W. Y. Wang (2017) “Liar, liar pants on fire”: a new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648.
  • [40] K. Wu, S. Yang, and K. Q. Zhu (2015) False rumors detection on Sina Weibo by propagation structures. In 2015 IEEE 31st International Conference on Data Engineering, pp. 651–662.
  • [41] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip (2020) A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32 (1), pp. 4–24.
  • [42] X. Zhou and R. Zafarani (2018) Fake news: a survey of research, detection methods, and opportunities. arXiv preprint arXiv:1812.00315.
  • [43] X. Zhou and R. Zafarani (2020) A survey of fake news: fundamental theories, detection methods, and opportunities. ACM Computing Surveys (CSUR) 53 (5), pp. 1–40.