With the rapid development of the Internet, social media has become a convenient online platform for users to obtain information, express opinions and communicate with each other. As more and more people participate in discussions about hot topics and exchange opinions on social media, many rumors appear. Due to the large number of users and the easy access to social media, rumors can spread widely and quickly, causing great harm to society and considerable economic losses. Therefore, given the potential panic and threat caused by rumors, it is urgent to develop methods that identify rumors on social media efficiently and as early as possible.
Traditional rumor detection methods extract handcrafted features and feed them into classifiers such as the Support Vector Machine (SVM). Some studies apply more effective features, such as user comments, temporal-structural features, and the emotional attitude of posts. However, those methods mainly rely on feature engineering, which is time-consuming and labor-intensive. Moreover, such handcrafted features usually lack the high-level representations extracted from the propagation and dispersion of rumors.
Recent studies have exploited deep learning methods that mine high-level representations from propagation paths/trees or networks to identify rumors. Many deep learning models, such as Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Recursive Neural Networks (RvNN) [14, 17], are employed since they are capable of learning sequential features from rumor propagation over time. However, these approaches have a significant limitation: temporal-structural features only pay attention to the sequential propagation of rumors and neglect the influence of rumor dispersion. The structure of rumor dispersion also reveals spreading behaviors of rumors. Thus, some studies have tried to incorporate information from the structure of rumor dispersion using Convolutional Neural Network (CNN) based methods [26, 27]. CNN-based methods can capture correlation features within local neighborhoods but cannot handle the global structural relationships in graphs or trees. Therefore, the global structural features of rumor dispersion are ignored in these approaches. In fact, CNN is not designed to learn high-level representations from structured data, but the Graph Convolutional Network (GCN) is.
So can we simply apply GCN to rumor detection, given that it has successfully made progress in various fields, such as social networks, physical systems, and chemical drug discovery? The answer is no. As shown in Figure 1(a), GCN, also called undirected GCN (UD-GCN), only aggregates information based on the relationships among relevant posts and loses the sequential order of follows. Although UD-GCN can handle the global structural features of rumor dispersion, it does not consider the direction of rumor propagation, which has been shown to be an important clue for rumor detection. Specifically, deep propagation along a relationship chain and wide dispersion across a social community are two major characteristics of rumors, which calls for a method that serves both.
To deal with both the propagation and the dispersion of rumors, in this paper we propose a novel Bi-directional GCN (Bi-GCN), which operates on both top-down and bottom-up propagation of rumors. The proposed method obtains the features of propagation and dispersion via two components, the Top-Down Graph Convolutional Network (TD-GCN) and the Bottom-Up Graph Convolutional Network (BU-GCN), respectively. As shown in Figures 1(b) and 1(c), TD-GCN forwards information from the parent node of a node in a rumor tree to model rumor propagation, while BU-GCN aggregates information from the children nodes of a node in a rumor tree to represent rumor dispersion. Then, the representations of propagation and dispersion pooled from the embeddings of TD-GCN and BU-GCN are merged through fully-connected layers to produce the final prediction. Meanwhile, we concatenate the features of the roots of rumor trees with the hidden features at each GCN layer to enhance the influence of the roots of rumors. Moreover, we employ DropEdge in the training phase to avoid over-fitting. The main contributions of this work are as follows:
We leverage Graph Convolutional Networks to detect rumors. To the best of our knowledge, this is the first study to employ GCN for rumor detection on social media.
We propose the Bi-GCN model, which not only considers the causal features of rumor propagation along relationship chains from top to bottom but also obtains the structural features of rumor dispersion within communities through bottom-up gathering.
We concatenate the features of the source post with those of the other posts at each graph convolutional layer to make comprehensive use of the information from the root feature, and achieve excellent performance in rumor detection.
Experimental results on three real-world datasets show that our Bi-GCN method outperforms several state-of-the-art approaches; for the task of early rumor detection, which is crucial for identifying rumors in real time and preventing their spread, Bi-GCN also achieves much higher effectiveness.
In recent years, automatic rumor detection on social media has attracted a lot of attention. Most previous work on rumor detection mainly focuses on extracting rumor features from the text content, user profiles and propagation structures to learn a classifier from labeled data [3, 24, 11, 12, 28]. Ma et al. classified rumors by using time series to model the variation of handcrafted social context features. Wu et al. proposed a graph-kernel-based hybrid SVM classifier by combining the RBF kernel with a random-walk-based graph kernel. Ma et al. constructed a propagation tree kernel to detect rumors by evaluating the similarities between their propagation tree structures. These methods were not only ineffective but also relied heavily on handcrafted feature engineering to extract informative feature sets.
In order to automatically learn high-level features, a number of recent methods have been proposed to detect rumors based on deep learning models. Ma et al. utilized Recurrent Neural Networks (RNN) to capture hidden representations from temporal content features. Chen et al. improved this approach by combining attention mechanisms with RNN to focus on text features with different attentions. Yu et al. proposed a method based on the Convolutional Neural Network (CNN) to learn key features scattered among an input sequence and shape high-level interactions among significant features. Liu et al. incorporated both RNN and CNN to obtain user features based on time series. Recently, Ma et al. employed adversarial learning to improve the performance of the rumor classifier, where the discriminator serves as a classifier and the corresponding generator improves the discriminator by generating conflicting noise. In addition, Ma et al. built tree-structured Recursive Neural Networks (RvNN) to capture hidden representations from both propagation structures and text content. However, these methods are inefficient at learning the features of the propagation structure, and they also ignore the global structural features of rumor dispersion.
Compared to the deep learning models mentioned above, GCN is better able to capture global structural features from graphs or trees. Inspired by the success of CNN in the field of computer vision, GCN has demonstrated state-of-the-art performance in various tasks with graph data [1, 5, 7]. Scarselli et al. first introduced GCN as a special message-passing model for either undirected or directed graphs. Later on, Bruna et al. theoretically analyzed graph convolutional methods for undirected graphs based on spectral graph theory. Subsequently, Defferrard et al. developed a method named the Chebyshev Spectral CNN (ChebNet), which uses Chebyshev polynomials as the filter. After this work, Kipf et al. presented a first-order approximation of ChebNet (1stChebNet), where the information of each node is aggregated from the node itself and its neighboring nodes. Our rumor detection model is inspired by the GCN.
We introduce some fundamental concepts that are necessary for our method. First, the notation used in this paper is as follows.
Let $\mathcal{C} = \{c_1, c_2, \dots, c_m\}$ be the rumor detection dataset, where $c_i$ is the $i$-th event and $m$ is the number of events. $c_i = \{r_i, w_1^i, w_2^i, \dots, w_{n_i-1}^i, G_i\}$, where $n_i$ refers to the number of posts in $c_i$, $r_i$ is the source post, each $w_j^i$ represents the $j$-th relevant responsive post, and $G_i$ refers to the propagation structure. Specifically, $G_i$ is defined as a graph $\langle V_i, E_i \rangle$ with $r_i$ being the root node [23, 16], where $V_i = \{r_i, w_1^i, \dots, w_{n_i-1}^i\}$ and $E_i = \{e_{st}^i \mid s, t = 0, \dots, n_i - 1\}$ represents the set of edges from responded posts to the retweeting or responsive posts, as shown in Figure 1(b). For example, if $w_1^i$ has a response to $r_i$, there will be a directed edge $r_i \rightarrow w_1^i$, i.e., $e_{01}^i$. If $w_2^i$ has a response to $w_1^i$, there will be a directed edge $w_1^i \rightarrow w_2^i$, i.e., $e_{12}^i$. Denote $\mathbf{A}_i \in \{0, 1\}^{n_i \times n_i}$ as an adjacency matrix where

$$a_{st}^i = \begin{cases} 1, & \text{if } e_{st}^i \in E_i, \\ 0, & \text{otherwise.} \end{cases}$$
Denote $\mathbf{X}_i = \left[x_0^{i\top}, x_1^{i\top}, \dots, x_{n_i-1}^{i\top}\right]^{\top}$ as a feature matrix extracted from the posts in $c_i$, where $x_0^i$ represents the feature vector of $r_i$ and each other row $x_j^i$ represents the feature vector of $w_j^i$.
Moreover, each event $c_i$ is associated with a ground-truth label $y_i$ (i.e., False Rumor or True Rumor). In some cases, the label is one of four finer-grained classes (i.e., Non-rumor, False Rumor, True Rumor, and Unverified Rumor) [16, 29]. Given the dataset, the goal of rumor detection is to learn a classifier

$$f: \mathcal{C} \rightarrow \mathcal{Y},$$

where $\mathcal{C}$ and $\mathcal{Y}$ are the sets of events and labels, respectively, to predict the label of an event based on the text content, user information and propagation structure constructed from the related posts of that event.
Graph Convolutional Networks
Recently, there has been increasing interest in generalizing convolutions to the graph domain. Among the existing works, GCN is one of the most effective convolution models, whose convolution operation can be considered as a general "message-passing" architecture:

$$\mathbf{H}^{(k+1)} = M\left(\mathbf{A}, \mathbf{H}^{(k)}; \boldsymbol{\theta}^{(k)}\right), \quad (1)$$

where $\mathbf{H}^{(k)}$ is the hidden feature matrix computed by the $k$-th Graph Convolutional Layer (GCL) and $M$ is the message propagation function, which depends on the adjacency matrix $\mathbf{A}$, the hidden feature matrix $\mathbf{H}^{(k)}$ and the trainable parameters $\boldsymbol{\theta}^{(k)}$. In 1stChebNet, the message propagation function is defined as

$$M\left(\mathbf{A}, \mathbf{H}^{(k)}; \boldsymbol{\theta}^{(k)}\right) = \sigma\left(\hat{\mathbf{A}} \mathbf{H}^{(k)} \mathbf{W}^{(k)}\right), \quad (2)$$

where $\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}}$ is the normalized adjacency matrix, $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}_N$ (i.e., adding self-connections), $\tilde{\mathbf{D}}_{ii} = \sum_j \tilde{\mathbf{A}}_{ij}$ represents the degree of node $i$, $\sigma(\cdot)$ is the activation function, and $\mathbf{W}^{(k)}$ is a trainable weight matrix.
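As a minimal numpy sketch (not the authors' implementation; the toy graph, features and weights are illustrative assumptions), one 1stChebNet layer of the form $\sigma(\hat{\mathbf{A}} \mathbf{H} \mathbf{W})$ can be written as:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolutional layer (1stChebNet):
    H' = ReLU(D~^{-1/2} (A + I) D~^{-1/2} H W)."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                    # add self-connections
    d = A_tilde.sum(axis=1)                    # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # normalized adjacency
    return np.maximum(0.0, A_hat @ H @ W)      # ReLU activation

# Toy rumor tree: root 0 with two responses 1 and 2.
A = np.array([[0, 1, 1],
              [0, 0, 0],
              [0, 0, 0]], dtype=float)
H = np.eye(3)                                  # one-hot node features
W = np.full((3, 2), 0.5)                       # toy weight matrix
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 2)
```

Stacking two such calls gives the two-layer networks used later in the model.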
DropEdge is a novel method to reduce over-fitting in GCN-based models. In each training epoch, it randomly drops edges from the input graph at a certain rate to generate different deformed copies. As a result, this method augments the randomness and the diversity of the input data, much like randomly rotating or flipping images. Formally, suppose the total number of edges in the graph $\mathbf{A}$ is $N_e$ and the dropping rate is $p$; then the adjacency matrix after DropEdge, $\mathbf{A}_{drop}$, is computed as

$$\mathbf{A}_{drop} = \mathbf{A} - \mathbf{A}', \quad (3)$$

where $\mathbf{A}'$ is the matrix constructed using $N_e \times p$ edges randomly sampled from the original edge set.
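The sampling step can be sketched in numpy as follows (a toy illustration of the DropEdge idea; the graph and dropping rate are assumptions, not the paper's data):

```python
import numpy as np

def drop_edge(A, p, rng):
    """DropEdge: randomly remove a fraction p of the edges,
    i.e. A_drop = A - A' with A' built from Ne * p sampled edges."""
    edges = np.argwhere(A > 0)
    n_drop = int(len(edges) * p)
    idx = rng.choice(len(edges), size=n_drop, replace=False)
    A_drop = A.copy()
    for s, t in edges[idx]:
        A_drop[s, t] = 0.0
    return A_drop

rng = np.random.default_rng(0)
A = np.zeros((5, 5))
for s, t in [(0, 1), (0, 2), (1, 3), (2, 4), (3, 4)]:
    A[s, t] = 1.0                       # 5 edges in total
A_drop = drop_edge(A, 0.4, rng)         # int(5 * 0.4) = 2 edges removed
print(int(A.sum()), int(A_drop.sum()))  # 5 3
```

A fresh deformed copy would be drawn at every training epoch, so the model never sees exactly the same graph twice.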
Bi-GCN Rumor Detection Model
In this section, we propose an effective GCN-based method for rumor detection based on rumor propagation and rumor dispersion, named the Bi-directional Graph Convolutional Network (Bi-GCN). The core idea of Bi-GCN is to learn suitable high-level representations from both rumor propagation and rumor dispersion. In our Bi-GCN model, a two-layer 1stChebNet is adopted as the fundamental GCN component. As shown in Figure 2, we elaborate the rumor detection process of Bi-GCN in four steps.
We first discuss how to apply the Bi-GCN model to one event, i.e., the $i$-th event; the other events are processed in the same manner. To better present our method, we omit the subscript $i$ in the following content.
1 Construct Propagation and Dispersion Graphs
Based on the retweet and response relationships, we construct the propagation structure for a rumor event $c_i$. Then, let $\mathbf{A}$ and $\mathbf{X}$ be its corresponding adjacency matrix and feature matrix based on the spreading tree of the rumor, respectively. $\mathbf{A}$ only contains the edges from the upper nodes to the lower nodes, as illustrated in Figure 1(b). At each training epoch, $p$ percent of the edges are dropped via Eq. (3) to form $\mathbf{A}'$, which avoids potential over-fitting issues. Based on $\mathbf{A}'$ and $\mathbf{X}$, we build our Bi-GCN model. Our Bi-GCN consists of two components: a Top-Down Graph Convolutional Network (TD-GCN) and a Bottom-Up Graph Convolutional Network (BU-GCN). The adjacency matrices of the two components are different. For TD-GCN, the adjacency matrix is $\mathbf{A}^{TD} = \mathbf{A}'$, while for BU-GCN, the adjacency matrix is $\mathbf{A}^{BU} = \mathbf{A}'^{\top}$. TD-GCN and BU-GCN adopt the same feature matrix $\mathbf{X}$.
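Concretely, the two adjacency matrices are transposes of each other: the top-down matrix points from responded posts to responses, and the bottom-up matrix reverses every edge. A small sketch (the event and edge list are hypothetical):

```python
import numpy as np

# Hypothetical event: post 0 is the source; (parent, child) response edges.
edges = [(0, 1), (0, 2), (1, 3), (2, 4)]
n = 5

A = np.zeros((n, n))
for parent, child in edges:  # edge points from responded post to response
    A[parent, child] = 1.0

A_TD = A      # top-down: information flows parent -> child (propagation)
A_BU = A.T    # bottom-up: information flows child -> parent (dispersion)
print(A_BU[3, 1])  # 1.0
```

Only the edge direction differs between the two components; the node feature matrix is shared.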
2 Calculate the High-level Node Representations
After the DropEdge operation, the top-down propagation features and the bottom-up propagation features are obtained by TD-GCN and BU-GCN, respectively. By substituting $\mathbf{A}^{TD}$ and $\mathbf{X}$ into Eq. (2) over two layers, we write the equations for TD-GCN as:

$$\mathbf{H}_1^{TD} = \sigma\left(\hat{\mathbf{A}}^{TD} \mathbf{X} \mathbf{W}_0^{TD}\right), \quad (4)$$

$$\mathbf{H}_2^{TD} = \sigma\left(\hat{\mathbf{A}}^{TD} \mathbf{H}_1^{TD} \mathbf{W}_1^{TD}\right), \quad (5)$$

where $\mathbf{H}_1^{TD}$ and $\mathbf{H}_2^{TD}$ represent the hidden features of the two TD-GCN layers, and $\mathbf{W}_0^{TD}$ and $\mathbf{W}_1^{TD}$ are the filter parameter matrices of TD-GCN. Here we adopt the ReLU function as the activation $\sigma(\cdot)$. Dropout is applied to the GCLs to avoid over-fitting. Similar to Eqs. (4) and (5), we calculate the bottom-up hidden features $\mathbf{H}_1^{BU}$ and $\mathbf{H}_2^{BU}$ for BU-GCN in the same manner.
3 Root Feature Enhancement
As we know, the source post of a rumor event always contains abundant information and makes a wide impact. It is therefore necessary to make better use of the information from the source post and to learn more accurate node representations from the relationships between the nodes and the source post.
Consequently, besides the hidden features from TD-GCN and BU-GCN, we propose a root feature enhancement operation to improve the performance of rumor detection, as shown in Figure 2. Specifically, for TD-GCN at the $k$-th GCL, we concatenate the hidden feature vector of every node with the hidden feature vector of the root node from the $(k-1)$-th GCL to construct a new feature matrix

$$\tilde{\mathbf{H}}_k^{TD} = \mathrm{concat}\left(\mathbf{H}_k^{TD}, \left(\mathbf{H}_{k-1}^{TD}\right)^{root}\right), \quad k = 1, 2,$$

with $\mathbf{H}_0^{TD} = \mathbf{X}$. Therefore, we express TD-GCN with root feature enhancement by replacing $\mathbf{H}_1^{TD}$ in Eq. (5) with $\tilde{\mathbf{H}}_1^{TD}$, and then obtain $\tilde{\mathbf{H}}_2^{TD}$ as follows:

$$\tilde{\mathbf{H}}_2^{TD} = \mathrm{concat}\left(\sigma\left(\hat{\mathbf{A}}^{TD} \tilde{\mathbf{H}}_1^{TD} \mathbf{W}_1^{TD}\right), \left(\mathbf{H}_1^{TD}\right)^{root}\right).$$

BU-GCN with root feature enhancement produces $\tilde{\mathbf{H}}_2^{BU}$ in the same way.
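The concatenation step itself is simple: broadcast the root's hidden vector from the previous layer to every node and append it column-wise. A toy numpy sketch (shapes are illustrative assumptions; row 0 is taken to be the root):

```python
import numpy as np

def root_enhance(H_k, H_prev):
    """Concatenate each node's hidden vector at layer k with the root's
    hidden vector from the previous layer (row 0 is the root)."""
    n = H_k.shape[0]
    root = np.tile(H_prev[0], (n, 1))  # broadcast root feature to all nodes
    return np.concatenate([H_k, root], axis=1)

H1 = np.arange(6.0).reshape(3, 2)  # hidden features after layer 1
X = np.arange(9.0).reshape(3, 3)   # input features (layer 0)
H1_tilde = root_enhance(H1, X)
print(H1_tilde.shape)  # (3, 5)
```

Every node thus carries a copy of the source post's representation alongside its own.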
4 Representations of Propagation and Dispersion for Rumor Classification
The representations of propagation and dispersion are aggregated from the node representations of TD-GCN and BU-GCN, respectively. Here we employ mean-pooling operators to aggregate information from these two sets of node representations:

$$\mathbf{S}^{TD} = \mathrm{MEAN}\left(\tilde{\mathbf{H}}_2^{TD}\right), \quad \mathbf{S}^{BU} = \mathrm{MEAN}\left(\tilde{\mathbf{H}}_2^{BU}\right).$$

Then, we concatenate the representation of propagation and the representation of dispersion to merge the information:

$$\mathbf{S} = \mathrm{concat}\left(\mathbf{S}^{TD}, \mathbf{S}^{BU}\right).$$

Finally, the label of the event, $\hat{\mathbf{y}}$, is calculated via several fully-connected layers and a softmax layer:

$$\hat{\mathbf{y}} = \mathrm{softmax}\left(\mathrm{FC}\left(\mathbf{S}\right)\right),$$

where $\hat{\mathbf{y}}$ is a vector of probabilities over all the classes, used to predict the label of the event.
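The pooling-and-classification step can be sketched in numpy as follows (a single fully-connected layer stands in for the "several" layers of the model; all shapes and weights are toy assumptions):

```python
import numpy as np

def classify(H_TD, H_BU, W_fc, b_fc):
    """Mean-pool node representations from each direction, concatenate,
    and apply a fully-connected layer followed by softmax."""
    s_td = H_TD.mean(axis=0)          # propagation representation
    s_bu = H_BU.mean(axis=0)          # dispersion representation
    s = np.concatenate([s_td, s_bu])  # merged event representation
    logits = s @ W_fc + b_fc
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
H_TD = rng.normal(size=(5, 4))        # toy node representations (5 nodes)
H_BU = rng.normal(size=(5, 4))
W_fc = rng.normal(size=(8, 2))        # 2 classes (False/True rumor)
y_hat = classify(H_TD, H_BU, W_fc, np.zeros(2))
print(round(float(y_hat.sum()), 6))  # 1.0
```

The output is a probability vector over the classes, and the argmax gives the predicted label.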
We train all the parameters in the Bi-GCN model by minimizing the cross-entropy between the predictions $\hat{\mathbf{y}}$ and the ground-truth distributions $\mathbf{y}$ over all events in $\mathcal{C}$. An $L_2$ regularizer is applied in the loss function over all the model parameters.
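For one event, the training objective can be sketched as follows (the regularization weight `lam` is a hypothetical hyperparameter, not a value from the paper):

```python
import numpy as np

def loss(y_hat, y_true, params, lam=1e-4):
    """Cross-entropy between the predicted and ground-truth distributions,
    plus an L2 penalty over all model parameters."""
    ce = -np.sum(y_true * np.log(y_hat + 1e-12))  # cross-entropy term
    l2 = lam * sum(np.sum(p ** 2) for p in params)  # L2 regularizer
    return ce + l2

y_hat = np.array([0.7, 0.3])   # predicted class probabilities
y_true = np.array([1.0, 0.0])  # one-hot ground truth
params = [np.ones((2, 2))]     # toy parameter list
print(round(loss(y_hat, y_true, params), 4))  # 0.3571
```

The total loss is the sum of this quantity over all events in the dataset.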
In this section, we first evaluate the empirical performance of our proposed Bi-GCN method in comparison with several baseline models. Then, we investigate the effect of each variant of the proposed method. Finally, we also examine the capability of early rumor detection for both the proposed method and the compared methods.
Settings and Datasets
We evaluate our proposed method on three real-world datasets: Weibo, Twitter15, and Twitter16. Weibo and Twitter are the most popular social media sites in China and the U.S., respectively. In all three datasets, nodes refer to users, edges represent retweet or response relationships, and the features are the top-5000 words extracted in terms of TF-IDF values, as mentioned in the Bi-GCN Rumor Detection Model section. The Weibo dataset contains two binary labels: False Rumor (F) and True Rumor (T), while the Twitter15 and Twitter16 datasets contain four labels: Non-rumor (N), False Rumor (F), True Rumor (T), and Unverified Rumor (U). The label of each event in Weibo is annotated according to the Sina community management center, which reports various misinformation. The label of each event in Twitter15 and Twitter16 is annotated according to the veracity tag of the corresponding article on rumor-debunking websites (e.g., snopes.com, Emergent.info, etc.). The statistics of the three datasets are shown in Table 1.
| Statistic | Weibo | Twitter15 | Twitter16 |
|---|---|---|---|
| # of posts | 3,805,656 | 331,612 | 204,820 |
| # of users | 2,746,818 | 276,663 | 173,487 |
| # of events | 4,664 | 1,490 | 818 |
| # of True rumors | 2,351 | 374 | 205 |
| # of False rumors | 2,313 | 370 | 205 |
| # of Unverified rumors | 0 | 374 | 203 |
| # of Non-rumors | 0 | 372 | 205 |
| Avg. time length / event | 2,460.7 hours | 1,337 hours | 848 hours |
| Avg. # of posts / event | 816 | 223 | 251 |
| Max # of posts / event | 59,318 | 1,768 | 2,765 |
| Min # of posts / event | 10 | 55 | 81 |
We compare the proposed method with some state-of-the-art baselines, including:
DTC: A rumor detection method using a Decision Tree classifier based on various handcrafted features to obtain information credibility.
SVM-RBF: An SVM-based model with RBF kernel, using handcrafted features based on the overall statistics of the posts.
SVM-TS: A linear SVM classifier that leverages handcrafted features to construct a time-series model.
SVM-TK: An SVM classifier with a propagation Tree Kernel based on the propagation structures of rumors.
RvNN: A rumor detection approach based on tree-structured recursive neural networks with GRU units that learns rumor representations via the propagation structure.
PPC_RNN+CNN: A rumor detection model combining RNN and CNN, which learns rumor representations from the characteristics of users along the rumor propagation path.
Bi-GCN: Our GCN-based rumor detection model utilizing the Bi-directional propagation structure.
We implement DTC and the SVM-based models with scikit-learn (https://scikit-learn.org); PPC_RNN+CNN with Keras (https://keras.io/); and RvNN and our method with PyTorch (https://pytorch.org/). To make a fair comparison, we randomly split the datasets into five parts and conduct 5-fold cross-validation to obtain robust results. For the Weibo dataset, we evaluate the Accuracy (Acc.) over the two categories and the Precision (Prec.), Recall (Rec.) and F1 measure (F1) on each class. For the two Twitter datasets, we evaluate Acc. over the four categories and F1 on each class. The parameters of Bi-GCN are updated using stochastic gradient descent, and we optimize the model with the Adam algorithm. The dimension of each node's hidden feature vector is 64. The dropping rate in DropEdge is 0.2 and the dropout rate is 0.5. The training process is iterated over 200 epochs, and early stopping is applied when the validation loss has not decreased for 10 epochs. Note that we do not employ SVM-TK on the Weibo dataset due to its exponential complexity on large datasets.
First, among the baseline algorithms, we observe that the deep learning methods perform significantly better than those using handcrafted features. This is not surprising, since deep learning methods are able to learn high-level representations of rumors that capture valid features. It demonstrates the importance and necessity of studying deep learning for rumor detection.
Second, the proposed method outperforms the PPC_RNN+CNN method in terms of all the performance measures, which indicates the effectiveness of incorporating the dispersion structure for rumor detection. Since RNN and CNN cannot process data with a graph structure, PPC_RNN+CNN ignores important structural features of rumor dispersion. This prevents it from obtaining effective high-level representations of rumors, resulting in worse rumor detection performance.
Finally, Bi-GCN is significantly superior to the RvNN method. Since RvNN only uses the hidden feature vectors of the leaf nodes, it is heavily influenced by the information of the latest posts. However, the latest posts often lack informative content such as comments and merely repeat the former posts. Unlike RvNN, the root feature enhancement allows the proposed method to pay more attention to the information of the source posts, which further improves our model.
To analyze the effect of each variant of Bi-GCN, we compare the proposed method with TD-GCN, BU-GCN, UD-GCN and their variants without root feature enhancement. The empirical results are summarized in Figure 3. UD-GCN, TD-GCN, and BU-GCN denote our GCN-based rumor detection models utilizing the UnDirected, Top-Down and Bottom-Up structures, respectively. Meanwhile, "root" refers to a GCN-based model with root features concatenated in the networks, while "no root" refers to a GCN-based model without them. Several conclusions can be drawn from Figure 3. First, Bi-GCN, TD-GCN, BU-GCN, and UD-GCN each outperform their respective variants without root feature enhancement, which indicates that the source posts play an important role in rumor detection. Second, TD-GCN and BU-GCN cannot always achieve better results than UD-GCN, but Bi-GCN is always superior to UD-GCN, TD-GCN and BU-GCN. This implies the importance of simultaneously considering both the top-down representations from the ancestor nodes and the bottom-up representations from the children nodes. Finally, even the worst results in Figure 3 are better than those of the other baseline methods in Tables 2 and 3 by a large gap, which again verifies the effectiveness of graph convolution for rumor detection.
Early Rumor Detection
Early detection aims to detect rumors at an early stage of propagation, which is another important metric for evaluating the quality of a method. To construct an early detection task, we set up a series of detection deadlines and only use the posts released before each deadline to evaluate the accuracy of the proposed method and the baseline methods. Since it is difficult for the PPC_RNN+CNN method to process data of variable lengths, we cannot obtain accurate results for PPC_RNN+CNN at each deadline, so it is not compared in this experiment.
Figure 4 shows the performance of our Bi-GCN method versus RvNN, SVM-TS, SVM-RBF and DTC at various deadlines for the Weibo and Twitter datasets. From the figure, it can be seen that the proposed Bi-GCN method reaches relatively high accuracy at a very early period after the initial broadcast of the source post. Besides, the performance of Bi-GCN is remarkably superior to that of the other models at each deadline, which demonstrates that structural features are not only beneficial to long-term rumor detection but also helpful for the early detection of rumors.
In this paper, we propose a GCN-based model for rumor detection on social media, called Bi-GCN. Its inherent GCN structure gives the proposed method the ability to process graph/tree structures and to learn higher-level representations more conducive to rumor detection. In addition, we improve the effectiveness of the model by concatenating the features of the source post after each GCL. Meanwhile, we construct several variants of Bi-GCN to model the propagation patterns, i.e., UD-GCN, TD-GCN and BU-GCN. The experimental results on three real-world datasets demonstrate that the GCN-based approaches outperform state-of-the-art baselines by very large margins in terms of both accuracy and efficiency. In particular, the Bi-GCN model achieves the best performance by considering both the causal features of rumor propagation along relationship chains from top to bottom and the structural features of rumor dispersion within communities through bottom-up gathering.
The authors would like to thank the support of Tencent AI Lab and Tencent Rhino-Bird Elite Training Program. This work is supported by the National Natural Science Foundation of Guangdong Province (2018A030313422), National Natural Science Foundation of China (Grant No. 61773229, No. 61972219) and Overseas Cooperation Research Fund of Graduate School at Shenzhen, Tsinghua University (Grant No. HW2018002).
- (2016) Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pp. 4502–4510.
- (2014) Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR 2014).
- (2011) Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web, pp. 675–684.
- (2018) Call attention to rumors: deep attention based recurrent neural networks for early rumor detection. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 40–52.
- (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852.
- (2010) Crowdsourcing credibility: the impact of audience feedback on web page credibility. In Proceedings of the 73rd ASIS&T Annual Meeting on Navigating Streams in an Information Ecosystem, Volume 47, pp. 59.
- (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034.
- (2014) Energy model for rumor propagation on social networks. Physica A: Statistical Mechanics and its Applications 394, pp. 99–109.
- (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- (2017) Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations.
- (2013) Prominent features of rumor propagation in online social media. In 2013 IEEE 13th International Conference on Data Mining, pp. 1103–1108.
- (2015) Real-time rumor debunking on Twitter. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pp. 1867–1870.
- (2018) Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), pp. 354–361.
- (2016) Detecting rumors from microblogs with recurrent neural networks. In IJCAI, pp. 3818–3824.
- (2015) Detect rumors using time series of social context information on microblogging websites. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pp. 1751–1754.
- (2017) Detect rumors in microblog posts using propagation structure via kernel learning. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 708–717.
- (2018) Rumor detection on Twitter with tree-structured recursive neural networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1980–1989.
- (2019) Detect rumors on Twitter by promoting information campaigns with generative adversarial learning. In The World Wide Web Conference, pp. 3049–3055.
- (2019) The truly deep graph convolutional networks for node classification. arXiv preprint arXiv:1907.10903.
- (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80.
- (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958.
- (2007) Lies, damn lies, and rumors: an analysis of collective efficacy, rumors, and fear in the wake of Katrina. Sociological Spectrum 27 (6), pp. 679–703.
- (2015) False rumors detection on Sina Weibo by propagation structures. In 2015 IEEE 31st International Conference on Data Engineering, pp. 651–662.
- (2012) Automatic detection of rumor on Sina Weibo. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, pp. 13.
- (2007) On early stopping in gradient descent learning. Constructive Approximation 26 (2), pp. 289–315.
- (2017) A convolutional approach for misinformation identification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 3901–3907.
- (2019) Attention-based convolutional approach for misinformation identification from massive and noisy microblog posts. Computers & Security 83, pp. 106–121.
- (2015) Enquiring minds: early detection of rumors in social media from enquiry posts. In Proceedings of the 24th International Conference on World Wide Web, pp. 1395–1405.
- (2018) Detection and resolution of rumours in social media: a survey. ACM Computing Surveys (CSUR) 51 (2), pp. 32.