With the increasing popularity of social media, news posted by users can spread rapidly. At the same time, the openness and fast circulation of social media have also made it a vehicle for the rapid spread of rumors. Studies have shown that rumors spread more widely and quickly than real news on social media networks, with rumors spreading 6-20 times faster than non-rumors (Lazer et al, 2018). Moreover, rumors on social media can mislead the public on important social (Domm, 2013), economic (Cao et al, 2018), and healthcare (Friggeri et al, 2014) issues, and can even endanger personal safety (Dinh and Parulian, 2020; Yu et al, 2019). Accordingly, detecting and debunking rumors on social media is an urgent problem (Tan et al, 2022).
Such problems have prompted many researchers and companies to investigate automated methods for detecting rumors on social media platforms. In recent surveys (Zubiaga et al, 2018), a rumor is often defined as "an item of circulating information whose veracity status is yet to be verified at the time of posting." Early work relied on manual feature-based approaches to implement online rumor detection (Castillo et al, 2011; Wu et al, 2015; Morris et al, 2012).
With advances in deep learning, models can learn richer and deeper features from social media, such as multimodal features (Jin et al, 2017; Zhang et al, 2022), sentiment (Zhang et al, 2021), and language style (Popat, 2017). However, recent work has argued that because rumor posts are linguistically very similar to regular posts, relying on content features alone cannot guarantee effective rumor detection (Zhu et al, 2021; Naumzik and Feuerriegel, 2022). To address this gap, recent studies have modeled the propagation structure of rumors, for example by building propagation graphs, and have demonstrated its effectiveness (Roitero et al, 2020; Ma et al, 2018; Lin et al, 2021). In particular, graph neural network (GNN) approaches that model rumor propagation information have shown promising performance (Bian et al, 2020; Yang et al, 2020; Lao et al, 2021). For example, Bian et al (2020) captured structural semantics with a two-layer bidirectional graph convolutional network that models both the propagation and the diffusion directions of rumors. Building on this, Wei et al (2021) proposed an edge-enhanced Bayesian graph convolutional network that models propagation uncertainty to strengthen structural and contextual features.
Despite the success of GNNs, existing methods mainly focus on fusing the propagation structure with node content features. They neglect propagation patterns, especially the characteristics of regionalized propagation, which limits the rumor features GNNs can learn and creates a performance bottleneck for propagation-based rumor detection methods. Previous studies have shown that rumors usually spread in a scope-like pattern after a given node retweets them (Doerr et al, 2012; Vosoughi et al, 2018; Loeb et al, 2020): a rumor first reaches independent social regions through propagation nodes and then spreads from them to further regions. Figure 1 shows an example of a scope-like propagation graph. In addition, because of over-smoothing, existing GNN-based rumor detection methods that learn the propagation structure use only two layers, which makes it difficult to learn deeper hidden knowledge of the rumor structure.
The authors of the present paper believe that the model's ability to learn regionalized propagation patterns should be strengthened to improve rumor detection based on propagation structure information. In addition, to further improve the ability of GNNs to learn deeper knowledge of the propagation structure, over-smoothing must be addressed while the number of GNN layers is increased. In this regard, Li et al (2019, 2020) proposed ResGCN, which borrows residual connections from CNNs and was developed for 3D point cloud processing. However, rumor detection differs significantly from 3D point cloud tasks.
Therefore, the present paper proposes a novel region-enhanced deep graph convolutional neural network framework that strengthens the model's ability to learn propagation information through unsupervised learning of regionalized propagation patterns. In addition, a source text-enhanced residual graph convolutional neural network layer is introduced, which increases the depth of the GNN while mitigating over-smoothing, so that deeper knowledge of the rumor structure can be learned. Experiments are conducted on two real-world benchmark datasets, Twitter15 and Twitter16, and extensive results demonstrate the effectiveness of the proposed model.
The main contributions of this paper can be summarized as follows:
A novel region-enhanced deep graph convolutional neural network framework (RDGCN) is proposed to improve the model's ability to learn rumor propagation information through regional propagation learning.
For the first time, a source text-enhanced residual graph convolutional neural network layer is proposed to raise the depth limit of GNNs in rumor detection tasks and to mitigate the over-smoothing of GNNs.
A new unsupervised learning module is designed for regionalized features in propagated messages.
Experiments on two real-world benchmark datasets demonstrate the effectiveness of the proposed model on both rumor detection and early rumor detection tasks.
2 Related Work
For the rumor detection task, early studies usually adopted traditional machine learning approaches based on manual features, including semantic (Enayet and El-Beltagy, 2017), sentiment (Castillo et al, 2011), and temporal (Yang et al, 2012) features, as well as salient features based on rumor propagation. Among them, Hassan (2018) was the first to construct a network of users, messages, and events to accomplish the rumor detection task with a propagation-based approach. Ma et al (2017) modeled propagation trees and then used SVMs with different kernels to detect rumors.
A large body of recent work addresses rumor detection with deep learning and demonstrates its effectiveness (Ma et al, 2016; Jin et al, 2017; Wang et al, 2018; Ran et al, 2022; Wang et al, 2022). Ma et al (2016) learned the temporal knowledge of rumor propagation using recurrent neural networks. Many subsequent works improved on this, for example with attention mechanisms (Chen et al, 2018), LSTM (Kumar and Carley, 2019), and Transformers (Khoo et al, 2020). However, these studies focus mainly on the temporal information of propagation and ignore the topology of rumor propagation.
To fill this research gap, Ma et al (2018) learned rumor propagation structure information by building a recursive neural network (RvNN) on bottom-up and top-down propagation trees. Bian et al (2020) then learned two propagation features of rumors, namely diffusion and propagation, by building a two-layer bidirectional graph convolutional neural network. Lu and Li (2020) used graph convolutional neural networks to learn the structural features of rumors and thereby enhance the learning of their semantic context. A large body of literature has demonstrated the effectiveness of GNNs in learning the structure of rumor propagation.
Existing studies consider only the rumor propagation structure and ignore the importance of rumor propagation patterns, which limits the ability of GNNs to learn rumor propagation information. In addition, because of the over-smoothing of graph convolutional neural networks (Li et al, 2020, 2019), deep learning models for rumor detection use GNNs with no more than two layers, limiting their ability to mine deep-level propagation information. In this paper, we attempt to enhance the ability of GNNs to learn propagation information by exploiting the regionalized propagation pattern of rumors. In addition, the model is deepened with a source text-enhanced residual graph convolutional neural network layer that mitigates over-smoothing and enhances its learning ability.
3 Problem Statement
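In this work, we address the realistic rumor detection scenario faced by social media platforms. In general, rumor detection can be defined as a multi-class classification task. Formally, a rumor detection dataset is regarded as $C = \{c_1, c_2, \ldots, c_m\}$, where $c_i$ is the $i$-th claim and $m$ is the number of claims. For each claim $c_i = \{r_i, w^i_1, w^i_2, \ldots, w^i_{n_i-1}, G_i\}$, $G_i$ is the propagation graph, $r_i$ is the source tweet, $w^i_j$ is the $j$-th relevant retweet, and $n_i$ is the number of tweets in claim $c_i$. Specifically, $G_i$ is defined as a propagation graph $G_i = \langle V_i, E_i \rangle$ with root node $r_i$ (Bian et al, 2020), in which $V_i = \{r_i, w^i_1, \ldots, w^i_{n_i-1}\}$ is the node set and $E_i = \{e^i_{st} \mid s, t = 0, \ldots, n_i-1\}$ is the set of directed edges from each tweet to its corresponding retweets. $X_i = [x^i_0, x^i_1, \ldots, x^i_{n_i-1}]^{\top}$ is the feature matrix extracted from the posts in $c_i$, where $x^i_0$ is the feature vector of $r_i$ and each other row $x^i_j$ is the feature vector of $w^i_j$. $A_i \in \{0, 1\}^{n_i \times n_i}$ is the adjacency matrix, with initial values $a^i_{st} = 1$ if $e^i_{st} \in E_i$ and $a^i_{st} = 0$ otherwise.
Moreover, each claim $c_i$ has a ground-truth label $y_i \in \{NR, F, T, U\}$, where NR, F, T, and U are the fine-grained classes Non-rumor, False Rumor, True Rumor, and Unverified Rumor. Our goal is to train a classifier $f: C \rightarrow Y$ that maps claims to labels.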
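To make the notation concrete, the following minimal sketch builds the adjacency matrix of one claim from a list of (parent, child) edges. It assumes NumPy, treats index 0 as the source tweet, and uses a hypothetical toy edge list; the transposed matrix is the reversed-edge view that bidirectional models such as BiGCN also use.

```python
import numpy as np

def build_adjacency(num_tweets, edges):
    """Directed adjacency matrix A_i of one claim's propagation graph.

    num_tweets: n_i, the number of tweets in the claim; index 0 is the source tweet r_i.
    edges: (s, t) pairs meaning tweet t responds to (retweets) tweet s.
    """
    adj = np.zeros((num_tweets, num_tweets), dtype=np.int64)
    for s, t in edges:
        adj[s, t] = 1  # a_st = 1 iff the edge e_st is in E_i, 0 otherwise
    return adj

# Hypothetical toy claim: tweets 1 and 2 retweet the source, tweet 3 retweets tweet 1.
A = build_adjacency(4, [(0, 1), (0, 2), (1, 3)])
A_bu = A.T  # reversed edges: the bottom-up view used by bidirectional models such as BiGCN
LABELS = ("NR", "F", "T", "U")  # the four fine-grained classes
```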
4 The Proposed Model
This section presents RDGCN for rumor detection, as shown in Figure 2. We first describe how the linear sequence learning module extracts the textual features of rumors. Then, a source-enhanced residual graph convolution layer (SRGCL) is proposed to mitigate the over-smoothing of the GCN while increasing the depth of the model. Finally, the process of regional propagation learning is described, and a Bernoulli-Poisson module is introduced to optimize regional propagation learning in an unsupervised manner.
4.1 Linear Sequence Learning
The spread of information and rumors on social media forms heterogeneous propagation structures that also contain linear interactions. In other words, the public's responses to rumor claims are dynamic and change over time. Since social contextual information can effectively explain the information diffusion process over time through sequential interactions (Lao et al, 2021), a linear sequence learning approach is proposed in the present work to aggregate the features of contextual nodes and represent sequential propagation. Here, a long short-term memory (LSTM) network aggregates contextual and sequential information from the propagating nodes. All nodes are arranged chronologically into a linear sequence to learn temporal features, so that temporally adjacent nodes become neighbors along the sequence. The textual information of each node is fed into the LSTM aggregator, which enhances the node representations and computes a vector representation of sequential propagation. Finally, for the propagation graph of a claim rooted at its source node, the sequential features are summarized by the last hidden state of the LSTM, which serves as the linear sequence feature. The calculation is shown in Eq.1.
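The sequence step described above can be sketched as a single-layer LSTM that visits nodes in timestamp order and keeps only its last hidden state. This is an illustrative NumPy sketch, not the authors' implementation: the gate layout, the parameter names `W`, `U`, `b`, and the random toy inputs are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden(X, timestamps, params):
    """Run a one-layer LSTM over node features in chronological order and
    return the last hidden state, used here as the linear sequence feature."""
    W, U, b = params["W"], params["U"], params["b"]  # shapes (4h, d), (4h, h), (4h,)
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for idx in np.argsort(timestamps, kind="stable"):  # earliest post first
        z = W @ X[idx] + U @ h + b
        i = sigmoid(z[:hidden])                # input gate
        f = sigmoid(z[hidden:2 * hidden])      # forget gate
        o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
        g = np.tanh(z[3 * hidden:])            # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# Hypothetical toy claim: 5 posts with 8-dimensional text features.
rng = np.random.default_rng(0)
d, hdim, n = 8, 4, 5
params = {
    "W": rng.normal(size=(4 * hdim, d)) * 0.1,
    "U": rng.normal(size=(4 * hdim, hdim)) * 0.1,
    "b": np.zeros(4 * hdim),
}
X = rng.normal(size=(n, d))
t = np.array([0.0, 3.0, 1.0, 7.0, 2.0])  # posting times
seq_feature = lstm_last_hidden(X, t, params)
```

Because the nodes are sorted by time inside the function, shuffling the rows of `X` together with `t` leaves the result unchanged.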
4.2 Source-enhanced Residual Graph Convolution Layer
However, the over-smoothing problem of GNNs degrades performance once more than two graph convolution layers (GCLs) are stacked, because repeated smoothing makes node representations, and hence the predicted probability distributions, increasingly indistinguishable. Yet with so few layers, it is difficult to learn the deep knowledge implied in the rumor propagation structure. Therefore, an SRGCL is proposed that mitigates over-smoothing through a residual module and source text enhancement, deepening the network while strengthening the GNN's learning ability and raising its depth limit.
Inspired by ResNet (Verma et al, 2017) and ResGCN (Li et al, 2019), a residual module is introduced into the graph convolutional layers. Because stacking more layers causes over-smoothing, transferring residual connections to the GCN allows deeper GCNs to converge reliably during training and to achieve better performance at inference. In the original graph learning framework, the underlying mapping $\mathcal{F}$ takes the hidden node feature matrix of graph $\mathcal{G}_l$ as input and outputs a new graph representation $\mathcal{G}_{l+1}$, as shown in Eq.2:
$$\mathcal{G}_{l+1} = \mathcal{F}(\mathcal{G}_l, \mathcal{W}_l) = \mathrm{Update}\left(\mathrm{Aggregate}\left(\mathcal{G}_l, \mathcal{W}_l^{agg}\right), \mathcal{W}_l^{update}\right) \tag{2}$$
where $\mathcal{W}_l^{agg}$ and $\mathcal{W}_l^{update}$ are the learnable weights of the aggregation and update functions, respectively.
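Here, the underlying mapping $\mathcal{H}$ is learned by fitting another mapping $\mathcal{F}$ through residual graph learning: after the transformation of layer $l$, the output is summed with the hidden representation $\mathcal{G}_l$ of the input graph to obtain $\mathcal{G}_{l+1}$. With $\mathcal{F}$ acting as a residual mapping, layer $l$ takes $\mathcal{G}_l$ as input and produces the residual representation $\mathcal{G}_{l+1}^{res}$, and the sum becomes the input of the next layer, where $\mathcal{W}_l$ is the learnable parameter of layer $l$, as shown in Eq.3:
$$\mathcal{G}_{l+1} = \mathcal{H}(\mathcal{G}_l, \mathcal{W}_l) = \mathcal{F}(\mathcal{G}_l, \mathcal{W}_l) + \mathcal{G}_l = \mathcal{G}_{l+1}^{res} + \mathcal{G}_l \tag{3}$$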
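A plain graph layer and its residual version can be sketched as follows. The sketch assumes a GraphSAGE-style mean aggregator with a ReLU update (GraphSAGE is the convolution unit named in Section 5.3, but the exact aggregator and weight names `W_self`, `W_neigh` are assumptions), and it applies the identity shortcut only when input and output widths match, since this section does not say how layers of different width are connected.

```python
import numpy as np

def sage_layer(H, adj, W_self, W_neigh):
    """F(G_l, W_l): mean aggregation over the neighbours given by the rows of adj,
    followed by a linear update and ReLU (a GraphSAGE-style layer)."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ H) / np.maximum(deg, 1)  # Aggregate; zero for nodes without neighbours
    return np.maximum(H @ W_self + neigh_mean @ W_neigh, 0.0)  # Update

def residual_sage_layer(H, adj, W_self, W_neigh):
    """G_{l+1} = F(G_l, W_l) + G_l; falls back to F alone if the widths differ."""
    out = sage_layer(H, adj, W_self, W_neigh)
    return out + H if out.shape == H.shape else out
```

With all-zero weights the residual layer returns its input unchanged, which is the property that lets very deep stacks start close to the identity mapping.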
The source post of a rumor event is typically the node with the richest information and the widest influence in the whole rumor propagation graph. It is therefore meaningful to exploit the information in the source post and to learn more accurate node representations from the relationship between each node and the source node.
Rumor detection performance is improved by root feature enhancement, which uses the source node to enrich each node's hidden representation. Specifically, for the GraphSAGE graph features produced in the SRGCL, a new feature matrix is constructed by concatenating the hidden feature vector of each node with the hidden vector of the root node from the SRGCL's GraphSAGE layer (cf. Eq.4). In addition, a bidirectional SRGCL is implemented to learn structural features in the top-down propagation and bottom-up dispersion directions, respectively. The calculation formulas are given in Eq.5-Eq.8.
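Root feature enhancement and the two propagation directions can be sketched as below. This is a minimal NumPy illustration under stated assumptions: the root is node 0, the helper names are hypothetical, and the bottom-up view is taken to be the transposed adjacency matrix, as in BiGCN.

```python
import numpy as np

def root_enhance(H, root_index=0):
    """Concatenate each node's hidden vector with the root (source) node's hidden vector."""
    root = np.tile(H[root_index], (H.shape[0], 1))  # copy the root row for every node
    return np.concatenate([H, root], axis=1)

def bidirectional_adjacency(adj):
    """Top-down (propagation) and bottom-up (dispersion) adjacency matrices."""
    return adj, adj.T

H = np.arange(6, dtype=float).reshape(3, 2)  # toy hidden states for 3 nodes
H_tilde = root_enhance(H)                    # shape (3, 4): [own state | root state]
A = np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]])
A_td, A_bu = bidirectional_adjacency(A)
```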
To improve computational efficiency and reduce the number of parameters, the hidden vectors of each layer are reduced in dimension by a fully connected layer. For ease of notation, the reduced representations keep the same symbols as before. The calculation formulas are given in Eq.9-Eq.10.
After the last SRGCL layer, the model has accumulated the hidden features of the rumor propagation and diffusion structures. These are finally converted into graph-level representations for the two directions by average pooling, as shown in Eq.11-Eq.12.
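The dimension reduction and pooling steps amount to a linear projection followed by a mean over nodes. The sketch below assumes a bias-only linear layer with no activation (the text does not specify one) and uses random arrays as hypothetical last-layer outputs for the two directions.

```python
import numpy as np

def reduce_dim(H, W, b):
    """Fully connected projection to a smaller hidden size (assumed linear, no activation)."""
    return H @ W + b

def mean_pool(H):
    """Average pooling over the node axis: one graph-level vector."""
    return H.mean(axis=0)

rng = np.random.default_rng(0)
H_td, H_bu = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))  # hypothetical last-layer outputs
W, b = rng.normal(size=(8, 4)), np.zeros(4)
s_td = mean_pool(reduce_dim(H_td, W, b))  # top-down graph representation
s_bu = mean_pool(reduce_dim(H_bu, W, b))  # bottom-up graph representation
```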
4.3 Regionalized Propagation Learning
4.3.1 Regionalized Propagation Encoder
Rumors have a distinctive propagation pattern: rumor propagation often passes from one propagation region to a new one, as shown in Figure 1. Therefore, to learn this propagation pattern, regionalized propagation learning is designed in the present study to delineate the propagation region features of rumors. These features are computed by an encoder that maps the node features of the propagation graph to a regionalized high-dimensional representation, as shown in Eq.13.
In recent years, GCNs have demonstrated strong effectiveness and encoding power in learning heterogeneous structural features and in community detection (Su et al, 2021). Therefore, a GCL is used here as the encoding layer to learn rumor region propagation pattern features, as given in Eq.14.
The output layer uses an activation function that guarantees non-negative encoded features, and K is the number of propagation regions (rumor ranges) set before training.
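A one-layer region encoder can be sketched as a graph convolution whose output has K columns and is passed through ReLU, so each node receives a non-negative affiliation strength to each propagation region. The symmetric normalization with self-loops, the symmetrization of the directed graph, and the choice of ReLU are common GCN defaults assumed here, not details confirmed by this section.

```python
import numpy as np

def normalized_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2} on the symmetrised graph (an assumed, common GCN choice)."""
    a = ((adj + adj.T) > 0).astype(float) + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def region_encoder(X, adj, W):
    """One graph-convolution layer mapping node features to K non-negative region scores."""
    return np.maximum(normalized_adjacency(adj) @ X @ W, 0.0)  # ReLU keeps every entry >= 0

# Hypothetical toy graph with 4 nodes, 6 input features, and K = 15 regions.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=float)
X = rng.normal(size=(4, 6))
W = rng.normal(size=(6, 15))
F = region_encoder(X, A, W)
```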
4.3.2 Bernoulli-Poisson model
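Because rumor detection data lack labels for region patterns, a Bernoulli-Poisson (BP) model is introduced as an unsupervised graph generative model for region propagation. Each entry of the adjacency matrix is sampled from the propagation region features as in Eq.15:
$$A_{uv} \sim \mathrm{Bernoulli}\left(1 - \exp\left(-F_u F_v^{\top}\right)\right) \tag{15}$$
where $F_u$ and $F_v$ are the propagation pattern representation features of nodes $u$ and $v$.
The negative log-likelihood of the BP model, whose minimizer is the maximum-likelihood estimate, is shown in Eq.16:
$$-\log p(A \mid F) = -\sum_{(u,v) \in E} \log\left(1 - \exp\left(-F_u F_v^{\top}\right)\right) + \sum_{(u,v) \notin E} F_u F_v^{\top} \tag{16}$$
Because propagation graphs are sparse, non-edges far outnumber edges, so the second term of Eq.16 contributes disproportionately to the loss. Following standard practice for imbalanced classification, the two terms are reweighted so that edges and non-edges contribute equally (Eq.17):
$$\mathcal{L}(F) = -\mathbb{E}_{(u,v) \sim P_E}\left[\log\left(1 - \exp\left(-F_u F_v^{\top}\right)\right)\right] + \mathbb{E}_{(u,v) \sim P_N}\left[F_u F_v^{\top}\right] \tag{17}$$
where $P_E$ and $P_N$ denote uniform distributions over the edges and non-edges of the graph, respectively.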
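The Bernoulli-Poisson likelihood and its balanced variant can be sketched in a few lines, assuming the standard BP edge probability 1 - exp(-F_u·F_v). The sketch computes both expectations exactly, averaging over all edges and all non-edges, instead of sampling pairs. It also treats the propagation graph as undirected and ignores self-pairs; both of these are assumptions.

```python
import numpy as np

def edge_probability(F):
    """P(A_uv = 1) = 1 - exp(-F_u . F_v) under the Bernoulli-Poisson model."""
    return 1.0 - np.exp(-F @ F.T)

def balanced_bp_loss(F, adj, eps=1e-8):
    """Edge term averaged over edges plus non-edge term averaged over non-edges."""
    n = adj.shape[0]
    a = (adj + adj.T) > 0               # treat the propagation graph as undirected (assumption)
    off_diag = ~np.eye(n, dtype=bool)   # ignore self-pairs (assumption)
    edges = a & off_diag
    non_edges = (~a) & off_diag
    scores = F @ F.T
    edge_term = -np.log(1.0 - np.exp(-scores[edges]) + eps).mean() if edges.any() else 0.0
    non_edge_term = scores[non_edges].mean() if non_edges.any() else 0.0
    return edge_term + non_edge_term

# Toy check: nodes 0 and 1 are linked; sharing a region should cost less than not sharing one.
A = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
F_good = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
F_bad = np.array([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
```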
In addition, to make SRGCL learn the propagation pattern during training, the hidden representation of each layer is also fed into the BP model to compute a loss. The final unsupervised loss is given in Eq.18.
Finally, the graph representation features of the propagation and dispersion directions, the linear sequence features, and the rumor region propagation pattern features are combined and fed into a fully connected layer followed by a softmax function to obtain the label probabilities of all classes, as shown in Eq.19, where the weight and bias matrices of this layer are trainable.
Then, the cross-entropy between the predictions and the ground truth is computed as the supervised learning loss (cf. Eq.20).
In summary, the training objective minimizes the weighted sum of the cross-entropy loss and the unsupervised loss, as shown in Eq.21, where a weight factor controls the contribution of the unsupervised term.
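The prediction head and the combined objective can be sketched as follows. It assumes that the feature groups are simply concatenated, which the text does not state explicitly, and that Eq.21 takes the form cross-entropy + weight × unsupervised loss. The names `predict_proba`, `total_loss`, and `unsup_weight` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_proba(features, W, b):
    """Concatenate the feature groups and apply a fully connected layer + softmax."""
    x = np.concatenate(features)
    return softmax(W @ x + b)

def total_loss(probs, label, unsup_loss, unsup_weight):
    """Cross-entropy plus a weighted unsupervised term (assumed form of the objective)."""
    ce = -np.log(probs[label] + 1e-12)
    return ce + unsup_weight * unsup_loss

# Hypothetical inputs: top-down, bottom-up, sequence, and region features of size 2 each.
features = [np.ones(2)] * 4
W, b = np.zeros((4, 8)), np.zeros(4)  # 4 classes: NR, F, T, U
p = predict_proba(features, W, b)     # zero weights give a uniform distribution
```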
5 Experimental Setup
The model is evaluated on two real-world benchmark datasets (https://github.com/majingCUHK/RvNN): Twitter15 (Ma et al, 2017) and Twitter16 (Ma et al, 2018). Their statistics are shown in Table 1. Twitter15 and Twitter16 contain 1490 and 818 claims, respectively, and each claim is labeled as Non-rumor (NR), False Rumor (F), True Rumor (T), or Unverified Rumor (U). Each dataset is randomly divided into five parts for five-fold cross-validation to obtain robust results.
| Statistic | Twitter15 | Twitter16 |
|---|---|---|
| # of False rumors | 370 | 205 |
| # of Unverified rumors | 374 | 203 |
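A random five-fold split of the claims, as described above, can be sketched as follows. The split is not stratified by label. The seed and the function name are illustrative assumptions, and the sketch does not reproduce the original folds.

```python
import numpy as np

def five_fold_splits(num_claims, seed=0):
    """Yield (train_idx, test_idx) for a random, non-stratified 5-fold split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_claims)  # shuffle claim indices once
    folds = np.array_split(order, 5)     # five near-equal parts
    for k in range(5):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train_idx, test_idx

splits = list(five_fold_splits(1490))  # Twitter15 has 1490 claims
```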
5.2.1 Comparison experiment baseline
The state-of-the-art baselines involved in the comparative experiments include:
SVM-TS (Ma et al, 2015): A linear SVM-based classifier that uses hand-crafted features to construct a time series model.
GRU-RNN (Ma et al, 2016): A deep learning method that models sequence-structure features using RNNs.
SVM-TK (Ma et al, 2017): An SVM classifier with propagation kernel based on rumor propagation structure.
RvNN (Ma et al, 2018): A rumor detection method based on a tree-structured recursive neural network with GRU units that learns rumor representations from the propagation structure.
BiGCN (Bian et al, 2020): A bidirectional GCN for rumor detection that models two propagation structures: top-down propagation and bottom-up dispersion.
EBGCN (Wei et al, 2021): A rumor detection method based on an edge-enhanced Bayesian graph convolutional neural network that models the uncertainty of the propagation structure.
5.2.2 Ablation experiment baseline
To demonstrate the effectiveness of each module of the model, we design ablation experiment baselines, including:
RDGCN (2-layers): The linear sequence learning module and the regionalized propagation learning module are added, and the propagation structure learning module is the same as BiGCN with 2-layer GCL.
BiGCN (4-layers): The GCL of the BiGCN model is stacked from two layers to four layers.
DeepGCN (4-layers): The SRGCL containing four layers is used to learn the propagation structure of rumors.
DeepGCN (6-layers): A source-enhanced residual GCN with six layers is used to learn the rumor propagation structure.
RDGCN (4-layers): The same method as the one presented in the paper containing a 4-layer SRGCL.
RDGCN (6-layers): The same method as the one presented in the paper, which includes a 6-layer SRGCL.
5.3 Parameter Settings
The node features used in this paper are the top-5000 words by TF-IDF value extracted from the retweeted posts. A 6-layer SRGCL is used with GraphSAGE as the convolution unit, and the output hidden dimensions of its layers are [256, 256, 128, 128, 64, 64]. Additionally, is set to [0.7, 0.6] for Twitter15 and Twitter16, respectively. The corresponding unsupervised loss weights are [0.3, 0.4], and the propagation range number K is set to [15, 20]. The learning rates for Twitter15 and Twitter16 are set to 0.0002 and 0.0005, respectively. Training runs for at most 200 epochs and stops early when the validation loss has not decreased for 10 epochs (Yuan et al, 2019). The optimal set of hyperparameters is determined by testing performance on fold 0 of Twitter15 and Twitter16. The optimizer is AdamW (Loshchilov and Hutter, 2017), and the L2 regularization coefficient is set to 0.00001. Five-fold cross-validation is run for 50 iterations during training.
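The reported settings can be collected in a configuration dictionary, together with a patience-based early-stopping rule. Two parts of this sketch are interpretations, not quotes from the paper. First, the 10-epoch stopping criterion is read as a patience of 10 epochs on the validation loss. Second, the L2 value is recorded as AdamW weight decay. The setting whose symbol is missing from the text ([0.7, 0.6]) is deliberately omitted, and all key names are hypothetical.

```python
CONFIG = {
    "twitter15": {"lr": 0.0002, "unsup_weight": 0.3, "num_regions_K": 15},
    "twitter16": {"lr": 0.0005, "unsup_weight": 0.4, "num_regions_K": 20},
    "shared": {
        "max_epochs": 200,
        "patience": 10,         # interpretation of the 10-epoch stopping rule
        "weight_decay": 1e-5,   # the reported L2 value, assumed to be AdamW weight decay
        "hidden_dims": [256, 256, 128, 128, 64, 64],
        "tfidf_vocab": 5000,
    },
}

def early_stopping_epoch(val_losses, patience=10, max_epochs=200):
    """Return the 1-based epoch at which training stops under patience-based early stopping."""
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0  # new best validation loss resets the counter
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return min(len(val_losses), max_epochs)
```

For example, a validation loss that stops improving after epoch 5 triggers a stop at epoch 15.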
6 Results and Analysis
6.1 Performance Comparison with Baseline
Tables 2 and 3 show the rumor detection results on the Twitter15 and Twitter16 datasets. The proposed RDGCN achieves the best performance among all compared methods. Specifically, on Twitter15, RDGCN outperforms the state-of-the-art model by 3.9% in accuracy and by 3.5%, 5.3%, 2.9%, and 2% in the F1 scores of the Non-rumor, False Rumor, True Rumor, and Unverified Rumor classes, respectively. On Twitter16, the proposed model improves accuracy by 5.8% and the F1 scores of the four classes by 10.5%, 4.3%, 3.8%, and 9.9%, respectively.
Moreover, deep learning-based methods outperform traditional methods that rely on hand-crafted features, which reveals the advantage of deep learning in learning high-level representations for rumor detection.
Furthermore, compared with sequence-based models, RDGCN shows that learning the topology of the propagation structure improves on approaches that capture only temporal features and ignore topology.
On the other hand, the proposed method outperforms the state-of-the-art GNN-based BiGCN and EBGCN. The authors believe there are two reasons for this. First, the linear sequence features may enhance the model's ability to learn the textual and temporal features of rumors, which the original BiGCN and EBGCN lack. Second, because the model considers the regionalized feature of the propagation pattern, it can further assess the credibility of intraregional propagation. As the stability of intraregional propagation is further confirmed, rumors and non-rumors can be better distinguished by their propagation patterns, a capability that BiGCN and EBGCN also lack.
Finally, consider the ablation experiments. The 2-layer RDGCN uses the same 2-layer GCL structure as BiGCN, so comparing the two further demonstrates that the regionalized propagation pattern learning module enhances the model's understanding of the rumor propagation structure. Second, comparing RDGCN with 2, 4, and 6 layers shows that accuracy increases with model depth, whereas BiGCN becomes less effective when its GCL is deepened from two to four layers. This further demonstrates that the proposed SRGCL mitigates the over-smoothing of GCNs on the rumor detection task while raising the depth limit of GNNs in this domain. Further, comparing 4-layer DeepGCN with 4-layer RDGCN and 6-layer DeepGCN with 6-layer RDGCN verifies that the regionalized propagation model is beneficial for learning the propagation information of rumors.
6.2 Model Analysis
In this section, further experiments analyze the effects of key components and hyperparameters of RDGCN.
6.2.1 Effect of Layers
In this section, different depths of SRGCL in RDGCN are explored: SRGCL is implemented with 2, 4, 6, 7, and 11 layers, with the other parameters kept the same as in Section 5.3. The experimental results are presented in Figure 3. On both datasets, performance improves steadily from 2 to 6 SRGCL layers, which further illustrates the feasibility of the proposed SRGCL and shows that it relaxes the depth bottleneck of the original model. When the number of layers exceeds 6, performance declines. The authors believe that this may be due to the sparsity of the propagation graph and the limited amount of learnable propagation structure information, so that a deeper network learns a large amount of noise that obscures the original propagation features.
6.3 Effect of Weight on Unsupervised Loss
This section discusses the impact of different unsupervised loss weights on RDGCN. Experiments are conducted on both datasets with unsupervised loss weights from 0 to 0.9, and the other model parameters remain the same as in Section 5.3. The experimental results are shown in Figure 4. Among the regionalized unsupervised loss weights tested, the best performance is reached at 0.3 on Twitter15 and 0.4 on Twitter16. In contrast, a weight of 0.0, which ignores the learning of region propagation patterns, yields the worst performance on both Twitter15 and Twitter16. These results also demonstrate the effectiveness of region propagation pattern learning in RDGCN.
6.4 Effect of K
In this section, the effect of the number of regions K on RDGCN is explored. Both datasets are evaluated with K ranging from 2 to 50, keeping the other model parameters the same as in Section 5.3. The experimental results are shown in Figure 5. Performance peaks at K = 15 on Twitter15 and at K = 20 on Twitter16. Performance at these values is better than at K = 2, and it improves gradually as K approaches the optimum. This further supports the rationality of the regional propagation model for the rumor detection task.
6.5 Early Rumor Detection
Early rumor detection means detecting rumors before they spread widely on social media, so that people can take appropriate action earlier. This is important for a real-time rumor detection system. To evaluate early rumor detection performance, the detection deadline and the number of retweets of the source tweet are controlled. The earlier the detection deadline, or the lower the number of retweets, the less propagation information is available.
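The two controls described above can be sketched as a filter that keeps only the source tweet and the retweets seen before a deadline and/or the earliest N retweets, then drops edges to removed nodes. The timestamp units (for example, minutes since the source post), the function name, and the toy data are hypothetical.

```python
def truncate_claim(timestamps, edges, deadline=None, max_retweets=None):
    """Keep the source tweet (index 0) plus retweets posted no later than `deadline`
    and/or only the earliest `max_retweets` retweets; drop edges touching removed nodes."""
    retweets = sorted(range(1, len(timestamps)), key=lambda i: timestamps[i])
    if deadline is not None:
        retweets = [i for i in retweets if timestamps[i] <= deadline]
    if max_retweets is not None:
        retweets = retweets[:max_retweets]
    kept = {0, *retweets}
    kept_edges = [(s, t) for s, t in edges if s in kept and t in kept]
    return sorted(kept), kept_edges

# Hypothetical claim: posting times relative to the source tweet, and (parent, child) edges.
times = [0, 5, 1, 30, 12]
edges = [(0, 1), (0, 2), (1, 3), (2, 4)]
nodes_by_deadline, edges_by_deadline = truncate_claim(times, edges, deadline=10)
nodes_by_count, edges_by_count = truncate_claim(times, edges, max_retweets=3)
```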
Figure 6 shows the early detection performance. The performance of all models increases as the detection deadline or the number of retweets increases. RDGCN achieves higher accuracy than the other compared models, which further shows that RDGCN performs well with limited propagation information and confirms its effectiveness. Overall, the proposed model not only performs better for long-term rumor detection but also improves early rumor detection.
7 Conclusion and Future Work
This paper explores rumor propagation patterns for rumor detection, starting from regionalized propagation patterns. Specifically, we propose region-enhanced deep graph convolutional networks (RDGCN) to enhance the learning of rumor propagation knowledge through regional propagation pattern learning. This is combined with unsupervised learning that guides the optimization of regional propagation pattern features. In addition, we design a source-enhanced residual graph convolution layer (SRGCL) to mitigate the over-smoothing of the original GCN and relax the depth bottleneck of GNN-based rumor detection models. Extensive experiments on two real-world benchmark datasets demonstrate the effectiveness of learning rumor propagation patterns. RDGCN significantly outperforms the baseline models on both rumor detection and early rumor detection tasks.
Funding from the 2022 Postgraduate Research Capability Improvement Program is gratefully acknowledged.
Conflict of Interests: The authors declare that they have no conflicts of interest related to this article.
Data availability: The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Bian T, Xiao X, Xu T, et al (2020) Rumor detection on social media with bi-directional graph convolutional networks. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, pp 549–556, URL https://ojs.aaai.org/index.php/AAAI/article/view/5393
Li G, Müller M, Thabet AK, et al (2019) DeepGCNs: Can GCNs go as deep as CNNs? In: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019. IEEE, pp 9266–9275, https://doi.org/10.1109/ICCV.2019.00936
Lin H, Ma J, Cheng M, et al (2021) Rumor detection on Twitter with claim-guided hierarchical graph attention networks. In: Moens M, Huang X, Specia L, et al (eds) Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021. Association for Computational Linguistics, pp 10035–10047, https://doi.org/10.18653/v1/2021.emnlp-main.786