Rumour Detection via News Propagation Dynamics and User Representation Learning

by   Tien Huu Do, et al.

Rumours have existed for a long time and are known to have serious consequences. The rapid growth of social media platforms has multiplied the negative impact of rumours; it has thus become important to detect them early. Many methods have been introduced to detect rumours using the content or the social context of news. However, most existing methods ignore or do not effectively explore the propagation pattern of news in social media, including the sequence of interactions of social media users with news across time. In this work, we propose a novel method for rumour detection based on deep learning. Our method leverages the propagation process of the news by learning the users' representation and the temporal interrelation of users' responses. Experiments conducted on Twitter and Weibo datasets demonstrate the state-of-the-art performance of the proposed method.



I Introduction

Rumours are items of unverified circulating information [1], which have been known for serious consequences. The growth of social media platforms creates fertile ground for rumours, thereby rendering rumour detection of great significance. However, detecting rumours is a challenging task; studies have reported that humans are not good at identifying rumours [2]. On the other hand, researchers have studied rumours from different points of view. There exist two prominent approaches for rumour detection: the content-based and social-context-based approaches. In the content-based approach, rumours are detected based on the content of news and prior knowledge extracted from vast data sources [3] or the writing style of the news [4]. Alternatively, the social-context-based approach exploits the social engagements of social media users, e.g., replies on Twitter [5]. Using this approach, the massive quantity of user opinion can be aggregated, revealing the credibility level of the news [6]. Furthermore, the social-context-based methods can uncover the hidden temporal propagation pattern of the news [7]. As such, the social-context-based approach has recently become popular thanks to its good performance and the availability of additional information [1].

Fig. 1: Responses of social media users toward news on the Twitter (left) and Weibo (right) datasets [8, 9, 10]. For each one-hour interval, we calculate the average number of social posts associated to all news articles. The blue and red lines show the average number of posts for genuine news and rumours, respectively.

In this work, we address the problem of rumour detection on social media using social context information. We consider it a binary classification problem with two classes, i.e., non-rumour and rumour. By analyzing existing datasets, i.e., the Twitter and Weibo datasets [8, 9, 10], we observed some peculiarities in the propagation process of news through social media users. Firstly, there is a difference in the numbers of posts towards rumours and genuine news across time instances, as illustrated in Fig. 1. Secondly, some users are more vulnerable to misleading information than others. As a result, these users tend to be involved in the spreading of many rumours in social media. Inspired by these observations, we aim to detect rumours by recognizing the peculiarities of the propagation process of the news. To this end, we design a novel propagation-driven model based on recurrent neural networks (RNNs) for rumour detection, which we name Dual RNN for Rumour Detection (DRRD).

Our contributions in this paper are: (i) we propose the DRRD model, which can effectively learn the propagation pattern of news via its social engagements. We conjecture that the propagation pattern is an important factor to detect rumours. Furthermore, (ii) we design a novel padding-and-scaling procedure to improve the input features of the proposed model leveraging our observations; (iii) we propose a novel user representation learning technique exploiting the historical interactions of social media users across multiple news articles; and (iv) we perform a series of experiments on two benchmark datasets and show that our model outperforms the existing methods in detecting rumours.

The rest of this paper is organized as follows. In Section II we review related studies. The details of our method are given in Section III, and the experimental study is presented in Section IV. Section V concludes our work.

II Related Work

The content-based rumour detection approach considers the textual content of news. Methods following this approach can be further divided into knowledge-based and style-based. Knowledge-based methods often rely on domain experts to perform rumour detection, and thus, require a huge amount of laborious effort. Moreover, human experts cannot keep up with the enormous volume of online information. Therefore, computational knowledge-based methods have been introduced, including the key fact extraction [11] and the knowledge graph [3] methods. On the other hand, the style-based methods leverage the language peculiarities of the news to detect rumours by using natural language processing (NLP) features, such as lexical, part-of-speech, linguistic inquiry and word count (LIWC) or deep syntax features [12, 4, 13, 14]. Style-based methods do not require additional data; however, their performance is limited as misleading information is often manipulated meticulously, making it difficult to detect deceptive writing styles. There also exist content-based methods that exploit news creators' profiles, partisan information or enclosed media. These methods often employ deep learning models, leveraging their advantage of fusing high-level features [15, 16, 17]. Although different types of information about news are integrated in these models, the propagation pattern of the news is ignored. In contrast, our method is not based on the news content; instead, we focus on the propagation process of the news and the interactions of social media users.

Alternative methods rely on the reactions of social media users towards news. These methods can be subcategorized into stance-based and propagation-based ones. In the stance-based methods, the viewpoints of relevant posts are taken into account to assess the veracity of the news. This idea has been realized in [6] and [18] using label propagation and boolean label crowdsourcing (BLC), respectively. Alternatively, a number of studies have proposed to leverage the propagation process by means of retweet trees [8], temporal interrelation [10], conditional random fields [19], or a hierarchical propagation model [20]. Recently, many studies have applied deep learning for debunking rumours based on the propagation pattern by using recurrent neural networks (RNNs) [9, 21, 22], convolutional neural networks (CNNs) [23, 24] and combined CNN-RNN models [25]. In [26], a deep neural network model was proposed for fake news classification. While the model is able to effectively capture the temporal propagation pattern of the news, its capacity to generalize to unseen users is restricted because of the singular-value-decomposition (SVD) based approach deployed to learn the user feature. Motivated by [26], we design a novel model capable of learning the propagation pattern from multiple perspectives. Furthermore, we devise a special padding-and-scaling procedure to support the learning of the propagation pattern. To overcome the limitation of the SVD-based approach in [26], we propose using a doc2vec [27] model to learn the users' representation, which is generalizable to unseen users and less computationally expensive to calculate.

III The Proposed Rumour Detection Method

Fig. 2: The architecture of the proposed DRRD model.

III-A Problem Formulation and Notation

We address the rumour detection problem using social context information. Let us assume that a news article reports a unique event and let $\mathcal{E} = \{e_i\}$ be the set of such events. An event has multiple social engagements, which refer to posts on social networks created by users that share or like the corresponding news article. Let $S_i$ denote the set of social engagements concerning the event $e_i$; then $S_i = \{(p_{ij}, u_{ij}, t_{ij})\}$, where $p_{ij}$ represents the social post, $u_{ij}$ is the user who makes the post, and $t_{ij}$ is the corresponding timestamp. Let $\mathcal{Y} = \{0, 1\}$ be the binary label set of the events. Our goal is to establish a mathematical model $f$ predicting the probability for an event $e_i$ to be a rumour given its social engagements $S_i$, that is,

$$P(y_i = 1 \mid S_i) = f(S_i).$$

Concerning the early detection of rumours, we consider a set of social engagements within a deadline $T$. Let $S_i^{T} \subseteq S_i$ denote the set of social engagements established before the deadline $T$; then the rumour probability of the event $e_i$ within $T$ is $f(S_i^{T})$.
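As a minimal sketch of the early-detection restriction, the snippet below keeps only the engagements posted before a deadline. It assumes, for illustration only, that each engagement is a (post, user, timestamp) triple with the timestamp in seconds and that the deadline is measured in hours from the earliest post of the event; the function name `engagements_before_deadline` is hypothetical.

```python
from typing import List, Tuple

# Assumed layout of one social engagement: (post_text, user_id, timestamp_in_seconds).
Engagement = Tuple[str, str, float]


def engagements_before_deadline(
    engagements: List[Engagement], deadline_hours: float
) -> List[Engagement]:
    """Keep the engagements posted strictly before the deadline.

    The deadline is counted from the earliest post of the event (an assumption).
    """
    if not engagements:
        return []
    start = min(timestamp for _, _, timestamp in engagements)
    cutoff = start + deadline_hours * 3600.0
    return [e for e in engagements if e[2] < cutoff]
```

The filtered list can then be processed exactly like the full set of engagements, so the same model is usable in both settings.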

III-B Data Partitioning Strategy

In order to exploit the propagation pattern of news on social media, the relevant social posts have to be organized following a chronological order, i.e., by means of partitioning. For example, [23] divided the posts into partitions of different time intervals such that the numbers of posts in the intervals are equal. However, we argue that the partitioning technique in [23] ignores the intrinsic variation in the number of posts across the propagation process of the news, as indicated in Fig. 1. Therefore, we follow a natural way of partitioning by grouping posts by hour [10, 26]. Specifically, the timestamp of the earliest post concerning an event indicates the first appearance of the event. Moreover, the difference in hour(s) of a relevant post and the earliest post defines the hour index of the post. The posts of an event with the same hour index are then put into the same partition. An event is thereby represented by a sequence of hour partitions. We introduce a special padding-and-scaling technique to promote the variation of posts in partitions, presented in the following section.
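The hour-based partitioning can be sketched as follows. This is an illustrative sketch rather than the original implementation: it assumes timestamps are given in seconds, represents an engagement as a (post, user, timestamp) triple, and uses the hypothetical function name `partition_by_hour`. Hours without any post are kept as empty partitions so that the sequence stays aligned with the time axis.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed layout of one social engagement: (post_text, user_id, timestamp_in_seconds).
Engagement = Tuple[str, str, float]


def partition_by_hour(engagements: List[Engagement]) -> List[List[Engagement]]:
    """Group the engagements of one event by hour index.

    The hour index of a post is the number of whole hours elapsed since the
    earliest post of the event. Hours with no posts yield empty partitions.
    """
    if not engagements:
        return []
    start = min(timestamp for _, _, timestamp in engagements)
    buckets: Dict[int, List[Engagement]] = defaultdict(list)
    for engagement in engagements:
        hour_index = int((engagement[2] - start) // 3600)
        buckets[hour_index].append(engagement)
    n_hours = max(buckets) + 1
    return [buckets.get(hour, []) for hour in range(n_hours)]
```

For an event whose posts arrive in hours 0 and 2 only, the result has three partitions, the middle one being empty.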

III-C Model Intuition and Structure

Our model, which is depicted in Fig. 2, is based on recurrent neural networks. It consists of three modules, namely, the Text, User and Integration modules.

III-C1 The Text Module

In [9], it was shown that the frequency of question words in rumour posts is much higher than in non-rumour posts in certain time windows. Furthermore, as indicated in Fig. 1, there exists a difference in the number of social posts regarding rumours and true news. The text module is designed to capture these patterns.

Firstly, using the corpus of social posts associated with the events in the training set, we train a doc2vec model [27], which has been proven useful in many NLP-related tasks [28, 5]. Using the trained doc2vec model, we obtain an embedding with $d$ dimensions for each social post. Subsequently, the embeddings of the posts in the same hour partition are averaged element-wise, constructing the representation of the partition. We employ identity vectors, i.e., vectors with all entries to represent partitions that contain no posts. An event is, therefore, represented by a matrix $X \in \mathbb{R}^{N \times d}$, where $N$ is the number of hour partitions. Each partition embedding is then scaled by a logarithmic coefficient, defined in (1) as a function of $n_i$, the number of posts of the $i$-th partition. The purpose of this scaling is to capture the variation of the number of posts across partitions. Moreover, the logarithm is used to smooth the coefficients, as the values of $n_i$ may vary significantly across the partitions; for instance, the number of posts within an hour in the Weibo dataset ranges from to posts.
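The averaging, padding and scaling steps can be illustrated with the NumPy sketch below. Several choices in it are assumptions made only for illustration and are not taken from the text: the coefficient is taken to be `log(1 + n)`, empty partitions are filled with a constant `pad_value` of 1 and left unscaled, and the function name `partition_features` is hypothetical. The coefficient and the padding value are exposed as parameters so that another form can be substituted.

```python
import numpy as np


def partition_features(partition_embeddings, dim, pad_value=1.0, coeff=np.log1p):
    """Build one feature row per hour partition.

    partition_embeddings: list with one entry per hour; each entry is a list of
    post embeddings (each of length `dim`) and may be empty.
    coeff: maps the number of posts n to a scaling coefficient; log(1 + n) is
    an assumed form, not necessarily the one used in the paper.
    """
    rows = []
    for embeddings in partition_embeddings:
        n_posts = len(embeddings)
        if n_posts == 0:
            # Padding for hours without posts (constant vector, assumed value).
            rows.append(np.full(dim, pad_value))
        else:
            mean_embedding = np.asarray(embeddings, dtype=float).mean(axis=0)
            rows.append(coeff(n_posts) * mean_embedding)
    return np.vstack(rows) if rows else np.empty((0, dim))
```

The returned matrix has one row per hour partition and can be fed to a recurrent network as a sequence.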

The padded and scaled representation is then passed to a two-layer RNN [29]. We choose the gated recurrent unit (GRU) architecture [30] as it is easier to train than its long short-term memory (LSTM) counterpart [31], which was deployed in [26]. We then track the outputs $\mathbf{h}_t \in \mathbb{R}^{k}$ of the RNN for all time steps $t = 1, \dots, N$, with $N$ the number of hour partitions and $k$ denoting the dimension of the output vector, and apply max-pooling-over-time to obtain the output feature vector $\mathbf{o} \in \mathbb{R}^{k}$ of the Text module. Namely, the $j$-th element of the output feature vector is calculated as

$$o_j = \max_{1 \le t \le N} h_{t,j}.$$
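Max-pooling-over-time reduces the sequence of RNN outputs to a single vector by taking, for every output dimension, the maximum over all time steps. The NumPy sketch below illustrates only this pooling step; the RNN itself is not modelled, and the function name `max_pool_over_time` is hypothetical.

```python
import numpy as np


def max_pool_over_time(outputs):
    """Return the element-wise maximum over the time axis.

    outputs: array of shape (time_steps, output_dim) holding the RNN output
    at every time step.
    """
    outputs = np.asarray(outputs, dtype=float)
    if outputs.ndim != 2 or outputs.shape[0] == 0:
        raise ValueError("expected a non-empty (time_steps, output_dim) array")
    return outputs.max(axis=0)
```

Because the maximum is taken independently per dimension, the pooled vector has a fixed length regardless of how many hour partitions an event spans.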


III-C2 The User Module

The user module is designed to capture the involvement of social media users in the propagation of news. In [26], it was shown that suspicious users tend to present a group behaviour, namely, most suspicious users are often involved in the rumours. To leverage this behaviour, [26] established a user adjacency matrix, which was factorized using the SVD to obtain a representation for all users. However, the method is computationally expensive, especially for a large number of users, and non-scalable, since the adjacency matrix and, in turn, the SVD need to be re-calculated for every new user.

Unlike [26], we do not focus on the group behaviour but on the sequence of user interactions with events across time. Specifically, we encode each user as a short document whose words are the names of the events the user interacts with. For instance, if the user $u$ tweets about the events $e_1$, $e_2$ and $e_3$, we use the document of the names $\{e_1, e_2, e_3\}$ to represent $u$. The resulting document is then used to learn the user representation by means of the doc2vec [32] model. Per hour partition, the embeddings of users are averaged and scaled [using (1)], similarly to the operations in the text module. The resulting embedding per partition is passed to a two-layer RNN; then, max-pooling-over-time is applied, yielding the output of this module.
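The construction of the per-user documents can be sketched as below. The sketch assumes that the engagements are available as a mapping from event name to (post, user, timestamp) triples, orders the event names of each user chronologically, and keeps one word per interaction; these choices and the function name `build_user_documents` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed layout of one social engagement: (post_text, user_id, timestamp_in_seconds).
Engagement = Tuple[str, str, float]


def build_user_documents(
    event_engagements: Dict[str, List[Engagement]]
) -> Dict[str, List[str]]:
    """Represent each user as a list of event names, ordered by interaction time."""
    interactions: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for event_name, engagements in event_engagements.items():
        for _, user_id, timestamp in engagements:
            interactions[user_id].append((timestamp, event_name))
    documents: Dict[str, List[str]] = {}
    for user_id, items in interactions.items():
        items.sort()  # chronological order of the user's interactions
        documents[user_id] = [event_name for _, event_name in items]
    return documents
```

Each resulting token list, tagged with its user ID, could then serve as one training document for a doc2vec implementation.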

It is worth noting that, as shown in Table I, a user makes on average only a few posts. This means that a user appearing in the training set is less likely to be present in the test set as well. Even in this case, user embeddings are still effectively learned thanks to the generalizability of the doc2vec model.

III-C3 Integration

The outputs $\mathbf{o}^{\text{text}}$ and $\mathbf{o}^{\text{user}}$ of the text and user modules are concatenated to achieve a high-level representation characterizing the propagation dynamics of news. The concatenated vector is then fed to a fully connected layer, performing linear and softmax transformations to obtain the final prediction. We use the cross-entropy loss for binary classification with labels {rumour, non-rumour} as the objective function, and we minimize it using the Adam algorithm [33].
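The integration step can be illustrated with a small NumPy forward pass. This is a simplified sketch under stated assumptions: it uses a single linear transformation followed by a softmax, omits dropout and the optimizer, and the names `integrate`, `cross_entropy`, `W` and `b` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def integrate(o_text, o_user, W, b):
    """Concatenate the module outputs and map them to class probabilities.

    W has shape (2, len(o_text) + len(o_user)); b has shape (2,).
    """
    features = np.concatenate([o_text, o_user])
    return softmax(W @ features + b)


def cross_entropy(probabilities, label):
    """Cross-entropy loss for one example with an integer class label."""
    return -float(np.log(probabilities[label]))
```

With all-zero weights the two classes receive probability 0.5 each and the loss equals log 2, which is the expected starting point before training.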

                    Weibo       Twitter    Twitter (incomplete)
Num. users          2,819,338   233,719    210,838
Num. events         4,664       992        991
Num. posts          3,752,459   592,391    510,147
Num. rumours        2,313       498        498
Num. non-rumours    2,351       494        493

TABLE I: The description of the Weibo and Twitter datasets.

IV Experiments

IV-A Datasets

We employed two real-world datasets to evaluate the proposed model, collected from Weibo [9] and Twitter [8, 10, 9], respectively. Table I gives the description of these datasets. Only the IDs of relevant posts and the labels for each event are provided in each dataset, which means that one needs to crawl the data from the Weibo and Twitter application programming interfaces (APIs). The posts in the Weibo dataset can be retrieved completely, whereas many tweets in the Twitter dataset have been removed and can therefore no longer be retrieved via the Twitter API. According to our calculation, the number of missing tweets is about of the original number reported in [9]. Our experiments are conducted on the Weibo and the incomplete Twitter datasets. In what follows, when we mention the Twitter dataset we refer to the incomplete Twitter dataset.

IV-B Experimental Setting

                 Weibo                                  Twitter
Model    Class   Accuracy  Precision  Recall  F1        Accuracy  Precision  Recall  F1
SVM-RBF  R       0.818     0.822      0.812   0.817     0.715     0.698      0.809   0.749
         N                 0.815      0.824   0.819               0.741      0.610   0.669
DTC      R       0.831     0.847      0.815   0.831     0.718     0.721      0.711   0.716
         N                 0.815      0.847   0.830               0.715      0.725   0.720
RFC      R       0.849     0.786      0.959   0.864     0.728     0.742      0.737   0.740
         N                 0.947      0.739   0.830               0.713      0.718   0.716
SVM-TS   R       0.857     0.878      0.830   0.857     0.745     0.707      0.864   0.778
         N                 0.947      0.739   0.830               0.809      0.618   0.701
GRU-2    R       0.910     0.876      0.956   0.914     0.757     0.732      0.815   0.771
         N                 0.952      0.864   0.906               0.788      0.698   0.771
CAMI     R       0.933     0.921      0.945   0.933     0.777     0.744      0.848   0.793
         N                 0.945      0.921   0.932               0.820      0.705   0.758
CSI      R       0.932     0.938      0.924   0.931     0.787     0.755      0.854   0.802
         N                 0.926      0.94    0.933               0.828      0.719   0.77
SRRD     R       0.949     0.953      0.944   0.949     0.748     0.764      0.723   0.743
         N                 0.946      0.955   0.950               0.732      0.773   0.752
DRRD     R       0.968     0.959      0.979   0.969     0.806     0.817      0.795   0.804
         N                 0.978      0.958   0.968               0.798      0.804   0.809

TABLE II: Extended rumour detection performance of the DRRD model in comparison with baseline models (R: Rumour, N: Non-Rumour). Accuracy is reported once per model.

For the doc2vec model, we employ the Distributed Bag-of-Words (DBOW) version with dimensions for both the text and user embeddings. In the RNN, we set the number of hidden units to for both hidden layers. Similarly, the final fully connected layer has hidden units. In all layers, we use tanh as the activation function. To avoid overfitting, we use dropout regularization [34] for the RNN and the final fully connected layer. We empirically choose a dropout rate of . Our model is implemented using TensorFlow.

In order to evaluate the performance of the proposed model, we conduct experiments using two settings. In the first setting, all the posts in the entire time-span of the given dataset are considered. We call it the extended detection setting. In the second setting, we consider only the posts that appeared within specific deadlines; this setting is referred to as early detection. In both settings, we adhere to the data splitting that is considered in previous studies [9, 23]. Namely, for each dataset, we hold out a random set of 10% of the events for model fine-tuning. The rest of the events are split with a 3:1 ratio for training and testing, respectively, leading to a 4-fold cross-validation scheme. Similar to [9, 23], we compare our method against the following schemes: 1) SVM-RBF [35], 2) DTC [8], 3) RFC [36], 4) SVM-TS [10], 5) GRU-2 [9], 6) CAMI [23] and 7) CSI [26]. The results of the first six methods are taken from [23, 10], whereas those of the CSI model [26] are obtained with our own implementation, because the evaluation in [26] considers a different dataset splitting strategy. Furthermore, in order to validate the capacity of our DRRD model in learning user representations, we replace the proposed user module with the SVD-based module presented in [26]. We refer to this modified DRRD model as the SRRD model (SVD-based RNN rumour detection). We assess the performance of the considered models in terms of the accuracy, precision, recall and F1-score metrics.
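The splitting protocol can be sketched as follows: a random 10% of the events is held out for tuning, and the remaining events are divided into four folds, each serving once as the test set (a 3:1 train/test ratio). The shuffling seed, the strided fold assignment and the function name `split_events` are assumptions made for illustration; the exact splits of [9, 23] are not reproduced here.

```python
import random


def split_events(event_ids, n_folds=4, dev_fraction=0.1, seed=0):
    """Return (dev_set, [(train, test), ...]) for cross-validation.

    dev_fraction of the events is held out for tuning; the rest is divided
    into n_folds folds, each used once for testing.
    """
    ids = list(event_ids)
    random.Random(seed).shuffle(ids)
    n_dev = int(round(dev_fraction * len(ids)))
    dev, rest = ids[:n_dev], ids[n_dev:]
    # Strided slicing gives folds whose sizes differ by at most one.
    folds = [rest[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = [e for j, fold in enumerate(folds) if j != k for e in fold]
        splits.append((train, test))
    return dev, splits
```

Reported scores would then be averaged over the four train/test pairs.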

IV-C Extended Rumour Detection Results

Fig. 3: Early detection performance of baselines and our method on the incomplete Twitter (left) and Weibo (right) datasets.

The results for the proposed model (both the DRRD and the SRRD versions) and the baselines are reported in Table II. The CAMI and CSI models, which are deep-learning-based models, achieve good performance; nevertheless, the proposed model delivers the best performance for both datasets. Specifically, our model yields the best detection results in terms of accuracy, precision, recall and F1-score on the Weibo dataset. On the Twitter dataset, our model achieves comparable results with other models in terms of the precision and recall metrics, and the best results in terms of the accuracy and F1-score metrics.

Furthermore, the results in Table II corroborate the superior performance of the proposed user module in learning user representation in comparison with the SVD-based approach (see the results obtained with the SRRD version). Specifically, using the proposed user module improves the accuracy over the SVD-based counterpart by 1.9 percentage points on the Weibo dataset (0.968 versus 0.949) and by 5.8 percentage points on the Twitter dataset (0.806 versus 0.748). It also leads to better performance in terms of the precision, recall and F1-score metrics on both datasets.
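For reference, the per-class metrics used in Table II follow the standard definitions, which the sketch below implements for a chosen positive class. The function names are hypothetical and the label encoding ("R" for rumour, "N" for non-rumour) is an assumption that mirrors the table legend.

```python
def accuracy(y_true, y_pred):
    """Fraction of events whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def per_class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, F1) treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_precision_recall(precision, recall)
```

As a consistency check, the precision (0.959) and recall (0.979) reported for the rumour class of DRRD on Weibo give an F1-score of 0.969 after rounding, in agreement with Table II.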

IV-D Early Rumour Detection Results

Figure 3 shows the accuracy of the DRRD model and the baseline models on the Weibo and Twitter datasets for the early detection setting. On the Weibo dataset, the proposed model outperforms the other models at all considered deadlines and the best performance of the DRRD is achieved when h. Although we observe some fluctuations in the performance on the Twitter dataset, the DRRD model still outperforms the other models at most of the deadlines, with the best performance obtained when h.

The reasons that explain why the proposed model can detect rumours effectively within the very first hours after an event starts circulating on social media are as follows. Firstly, as illustrated in Fig. 1, most of the social media posts are made during the first few hours following the publication of an article. Secondly, the variation in the number of posts is more pronounced during these first hours. The higher the variation in the number of posts, the more information it reveals about the propagation process. Alternatively, one may notice that the performance of DRRD slightly decreases when more data is available (e.g., ). This is because the propagation patterns of rumours and genuine news tend to be similar over time.

V Conclusion

Misleading information is an important issue nowadays, with serious consequences. There have been many studies addressing this problem; however, detecting this kind of disinformation effectively and in a timely manner remains a challenging task. In this work, we presented a deep neural-network-based model capable of detecting rumours via learning propagation dynamics and user representations. The proposed model was shown to achieve superior results compared to various state-of-the-art models on two benchmark datasets.