Social networking services (SNSs), such as Facebook and Twitter, provide many people with instant and convenient access to news. However, SNSs are also an effective platform for obtaining and sharing news that is not carefully fact-checked and may include false or uncertain information, called “fake news.” Fake news has been defined as “a news article or message published and propagated through media, carrying false information regardless the means and motives behind it.” We use the same definition in this paper.
The wide spread of fake news can harm not only social media platforms but also society in general. For example, during the 2016 US presidential election, fake news favoring different candidates was shared more than 37 million times on SNSs and strongly affected the election results [2, 1]. Consequently, the unprecedented growth of fake news reflects a strong need for detecting and mitigating fake news circulation. To confront these societal challenges, websites such as Snopes.com (https://www.snopes.com/) and PolitiFact.com (https://www.politifact.com/) track and debunk rumors and manually assess rumor credibility based on evidence. These fact-checking sites are expensive to operate legitimately and require a considerable amount of time to validate and publish the credibility of a rumor. In contrast to fact-checking websites, existing work on fake news detection mainly applies machine-learning methods based on various characteristics of SNSs, e.g., text content, user characteristics, and propagation paths/trees.
In addition to these existing features, the temporal behavior of SNS posts is assumed to be useful for detecting fake news. Recent research showed that social bots influence the initial spread of fake news. Time series of posts referring to fake news behave differently from those of real news. Nevertheless, few studies have considered the amount of attention that fake news attracts over time.
This study proposes a fake news detection model that takes advantage of how the attention to news changes over time, i.e., the temporal features. The attention is calculated using a self-exciting point process from the post publication time and the likelihood of people reading the post (determined by the number of followers). In this study, we designate the attention to the news as an “infectiousness value” because it can be measured based on the probability that each new user re-shares the information. The infectiousness value can be regarded as an index of the public interest in the news and, for real news, it normally decreases over time. In contrast, our underlying assumption is that the infectiousness value of fake news surges twice: the first upsurge results from the original news (including the false information), and the second results from news items in which people doubt or correct the false information.
The infectiousness value is more robust than existing features, which depend on fake news propagators. For example, text features of early users can be easily manipulated by providing fake comments for diffusion. User features and user-article relationships change as platforms introduce regulations and suspend accounts. Propagation paths/trees are difficult to manipulate, but they are expensive to obtain. Infectiousness values are also difficult to manipulate because they are calculated from a whole series of posts, not from early movement alone. Furthermore, the number of followers and the post publication time, which are used for calculating the infectiousness values, can be easily obtained.
The proposed fake news detection model leverages three kinds of features: it combines the existing text and user features through an attention-based mechanism and then incorporates the infectiousness value. As preliminary research, we investigate whether temporal features can distinguish real news from fake news to validate their effectiveness. Then, experiments are carried out to demonstrate that each module, such as the temporal features, is useful for detecting fake news.
The contributions of this study are as follows. (1) We elucidate the differences in infectiousness values between real and fake news and exploit these differences for fake news detection using a point process. (2) We propose a new multi-modal method that combines text and user features with infectiousness values. (3) We show the effectiveness of the proposed model for fake news detection on SNSs through experiments.
Recent studies used deep learning models to capture temporal–linguistic features. [15] used recurrent neural networks (RNNs), which capture temporal–linguistic features from a bag-of-words of user posts. [18] used recursive neural networks based on the texts of a reply tree. Further examples include convolutional neural networks [28], hierarchical attention networks [14], and neural-network models using discourse-level structures [9].
Moreover, several methods were examined for detecting fake news using the characteristics of users who post the information. For example, [3, 27, 12] used various models based on user characteristics, such as the number of followers, number of friends, and registration age. More recently, the relationship between news articles and users has been used to determine news credibility, on the assumption that if two articles are strongly related, as determined by the number of users who re-shared both, they are likely to share the same label.
Other studies employ detection methods based on propagation paths/trees or networks of posts on SNS. 
Multi-modal approaches combine features of different types to detect fake news. For example, [21] combined texts and user behavior, while [25] combined texts and visual features extracted from SNS posts. Our model combines text and user features using contextual inter-modal attention [6] to capture the relationship between a user and the content of a post.
Among methods that use temporal features, as the proposed method does, [13] demonstrated the importance of post temporal information for rumor stance classification. [10] used SpikeM [19] to mathematically capture the time-series behavior of information for long-term rumor detection, in addition to other features (e.g., linguistic, user, network). In this study, we demonstrate that temporal features are also useful for short-term fake news detection. The proposed multi-modal framework utilizes linguistic, user, and temporal features, which are easy to obtain, to capture the characteristics of fake news.
We validated the contribution of temporal features in SNS posts to judging whether news is fake or real (not fake). Using the Twitter API (https://developer.twitter.com), we obtained real and fake news items published in 2019 in the U.S. and Japan. For the U.S. news, we collected posts about fake news from the URLs and keywords extracted from Snopes.com and PolitiFact.com articles. Because Japan has no major fact-checking websites, for the Japanese news we collected posts about news that major media, public organizations, and companies in Japan had denied. We also collected real news from the URLs of news articles by major media.
Figure 1 presents the time series of two fake news examples, one in the U.S. (a) and one in Japan (b). Each news item has three time series. The upper panel shows the number of tweets in each hour, and the middle panel shows the infectiousness values, which represent the probability of re-share and are calculated using the self-exciting point process described in the “Fake News Detection Model” section. The time series of the number of tweets about real news is thought to show a large upsurge within a few hours and then decay quickly over time. In contrast, the time series of the number of tweets about fake news (see upper panels) shows a second upsurge after approximately one day, following unstable behavior in the infectiousness value of the fake news. These behaviors are observed in other fake news items and in other countries. An earlier study [10] indicated that the time series of rumors have multiple upsurges during long-term observation periods (56 days), unlike those of non-rumors. Our results demonstrate that time series of fake news also have multiple upsurges in short-term observations (4 days), unlike those of real news. The time series graphs and a description of the collected news are available at https://docs.google.com/document/d/193Xv0AqmHB1F-UuaRuXpZOeMtfjNMnNrmTBUTjkoFIw/edit?usp=sharing.
We assumed that the multiple upsurges in the time series of fake news are caused by the attention received by posts questioning or denying the news. To test this hypothesis, we examined whether the second upsurge coincides with an increase in posts expressing doubt and denial. As shown in the bottom panels of Figure 1, posts expressing doubt and denial appeared multiple times within 48 hours after the first upsurge. For example, in (a) there are few explicit words such as “fake,” but the question mark, which represents doubt, appeared many times at the same time as the second upsurge, around 20 hours. In (b), explicit words indicating that the news is fake/false appeared around 22 hours. These results support our assumption and are mostly in agreement with a previous study, which indicated a characteristic time lag between fake news and fact-checking. Additionally, we infer that the multiple upsurges related to fake news are caused by renewed public interest, because the meaning of the news changes after questioning or denial (Fig. 1). The differences between the time series of fake and real news suggest that temporal features, which are more difficult to manipulate than others, can be useful for detecting fake news.
Fake News Detection Model
Although temporal features are useful, fake news detection using temporal features alone cannot achieve sufficient performance. Consequently, we propose a novel multi-modal method to detect fake news from many SNS posts. The proposed model effectively combines linguistic and user features using an Attention module and then implements temporal features. The overall model architecture is presented in Figure 2.
Table 1: Major notations.
|Notation||Description|
| ||post of news story|
| ||linguistic feature of post|
| ||user feature of post|
| ||temporal features of news story|
| ||infectiousness values at each point|
| ||-dimensional post embedding of post|
| ||hidden state of the post through GRU and FC in each module|
| ||each hidden states through MaxPooling|
|z||the final output representing class probability|
| ||each module output consisting of a sequence|
| ||Number of sequence lengths of each module|
| ||Number of dimensions about hidden states in each module|
The task of fake news detection is to predict the label of a news story (real or fake), given the SNS posts related to it. A news story consists of a sequence of posts, and each post consists of a linguistic feature and a user feature. Each news story also has temporal features and is associated with a categorical label. We aim to learn a fake news detection function that maximizes the prediction accuracy.
The model comprises various components. The linguistic, user, and temporal modules convert inputs to latent features. The contextual inter-modal attention module combines the latent features generated by the linguistic and user modules with attention. Finally, the classification module outputs the prediction. Table 1 represents the major notations.
We first converted the raw text of each post to a linguistic feature suitable for model interpretation, using the tf-idf values of the vocabulary terms of each post. We kept the top-K vocabulary terms according to their tf-idf values. Therefore, for each post, we extracted the linguistic feature, which is a K-dimensional vector. The linguistic feature created from a post is sparse, high-dimensional data. Therefore, we convert the vector into a low-dimensional representation. Instead of using pre-trained vectors based on external collections, we learn the embedding matrix through our model, which yields a low-dimensional post embedding vector for each post.
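As a minimal sketch of the tf-idf step described above, the snippet below builds a fixed-size tf-idf vector per post. The whitespace tokenizer, the smoothed idf definition, and the ranking of vocabulary terms by their highest tf-idf weight are assumptions made for illustration; the text does not specify them. The parameter `top_k` plays the role of K.

```python
import math
from collections import Counter


def tfidf_top_k(posts, top_k):
    """Return (vocabulary, vectors): one tf-idf vector per post over the kept terms."""
    tokenized = [post.lower().split() for post in posts]  # assumed tokenizer
    n_posts = len(tokenized)

    # Document frequency: in how many posts each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (an assumed variant; other definitions are common).
    idf = {term: math.log((1 + n_posts) / (1 + count)) + 1.0 for term, count in df.items()}

    # tf-idf weight of every term in every post.
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        weights.append({term: (count / total) * idf[term] for term, count in tf.items()})

    # Keep the top_k terms ranked by their highest tf-idf weight in any post.
    best = Counter()
    for w in weights:
        for term, value in w.items():
            best[term] = max(best[term], value)
    vocabulary = [term for term, _ in best.most_common(top_k)]

    vectors = [[w.get(term, 0.0) for term in vocabulary] for w in weights]
    return vocabulary, vectors
```

Each resulting vector is sparse, which is why the model maps it to a dense embedding before the GRU.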
From the sequence of embedded posts of each news story, we extract the latent linguistic features using gated recurrent units (GRUs) [4]. GRUs, a type of RNN, can capture long-term dependencies and thus learn the temporal–linguistic features of early posts on SNS. At each step, a GRU takes the current embedded post and the previous hidden state as input and produces a new hidden state as output. The respective formulas are described below:
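For reference, the standard GRU update can be written as follows. The notation is assumed here rather than taken from the paper: $e_t$ is the embedded post at step $t$, $h_{t-1}$ the previous hidden state, $r_t$ and $z_t$ the reset and update gates (this $z_t$ is unrelated to the final output $z$ in Table 1), $W$, $U$, and $b$ the learned parameters, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product. Conventions differ on whether $z_t$ weights the previous or the candidate state.

```latex
\begin{aligned}
r_t &= \sigma\bigl(W_r e_t + U_r h_{t-1} + b_r\bigr), \\
z_t &= \sigma\bigl(W_z e_t + U_z h_{t-1} + b_z\bigr), \\
\tilde{h}_t &= \tanh\bigl(W_h e_t + U_h (r_t \odot h_{t-1}) + b_h\bigr), \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t .
\end{aligned}
```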
where the two gates are the reset gate and the update gate at each time step, each with its own parameters, and the output dimension of the GRU is a hyper-parameter. Then, the hidden state of the GRU is passed through a fully connected (FC) layer.
We used eight common characteristics extracted from SNS user profiles as the user features: the length of the user description, length of the user name, number of followers, number of follows, number of posts, registration age, and whether a verified mark and geo information are attached to the account. These are similar to those used in [12]. The eight features of each post form its user feature vector. As with the linguistic features, we use GRUs to capture long-term dependencies and FC layers for the user features.
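A minimal sketch of assembling the eight user features for one post is given below. The profile field names (modeled on Twitter-style user objects) and the choice of days as the unit of registration age are assumptions for illustration, not the preprocessing used in the paper.

```python
from datetime import datetime, timezone


def user_feature_vector(profile, posted_at):
    """Build the eight-dimensional user feature vector for one post.

    `profile` is a dict with assumed keys; `posted_at` is the post time.
    """
    registration_age_days = (posted_at - profile["created_at"]).total_seconds() / 86400.0
    return [
        float(len(profile.get("description") or "")),  # length of user description
        float(len(profile.get("name") or "")),         # length of user name
        float(profile.get("followers_count", 0)),      # number of followers
        float(profile.get("friends_count", 0)),        # number of follows
        float(profile.get("statuses_count", 0)),       # number of posts
        registration_age_days,                         # registration age (days, assumed unit)
        1.0 if profile.get("verified") else 0.0,       # verified mark attached
        1.0 if profile.get("geo_enabled") else 0.0,    # geo information attached
    ]
```

In practice the count-valued features span several orders of magnitude, so some scaling before the GRU is likely needed; the text does not say which scaling was applied.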
In the previous section, we described the differences between the appearance times of posts about real and fake news. To capture the potential components of this behavior, we convert the time series of posts to infectiousness values, which represent the re-share probability and drop as the news gets stale, via a self-exciting point process model (designated as SEISMIC) [29]. SEISMIC, based on the Hawkes process [7], calculates the infectiousness value at a given time from the number of posts up to that time and the intensity. The infectiousness values are the input of the GRUs in the temporal module.
where represents the number of people accessing the news (number of followers). Additionally,
denotes the memory kernel, which quantifies the delay between the arrival and re-share of a post by a user. These parameters are estimated by: is 5 min, is , and . This process is designated as self-exciting because each previous observation contributes to the intensity .
The estimation of the temporal variation of the infectiousness value relies on a sequence of one-sided kernels, which up-weight the most recent posts and down-weight older posts. These one-sided kernels keep the estimator close to the ever-changing real values.
Eqs. (5) and (6) are used to calculate the infectiousness values from the publication time and number of followers of each post up to a given time, measured as the time elapsed from the first post. The infectiousness values are then evaluated at regular points, e.g., every hour. As with the linguistic and user features, we apply GRUs and FC layers to the temporal features converted from the post information.
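The following is a simplified sketch of how hourly infectiousness values could be computed from post times and follower counts. It assumes a power-law memory kernel that is constant for the first `S0` seconds and decays afterwards, and it replaces the one-sided weighting kernels with a flat kernel, so it is an unweighted variant rather than the estimator used in the paper. The values of `C` and `THETA` are illustrative assumptions; only the 5-min period is stated in the text above.

```python
S0 = 300.0      # seconds; the 5-min constant period of the memory kernel
C = 6.27e-4     # assumed kernel scale (illustrative)
THETA = 0.242   # assumed power-law exponent (illustrative)


def kernel_integral(elapsed):
    """Integral of the memory kernel from 0 to `elapsed` seconds.

    The kernel is C for s <= S0 and C * (s / S0) ** -(1 + THETA) afterwards.
    """
    if elapsed <= 0.0:
        return 0.0
    if elapsed <= S0:
        return C * elapsed
    return C * S0 + C * S0 * (1.0 - (elapsed / S0) ** -THETA) / THETA


def infectiousness(post_times, followers, t):
    """Flat-kernel infectiousness estimate at time t (seconds since the first post)."""
    seen = [(ti, ni) for ti, ni in zip(post_times, followers) if ti <= t]
    reshares = max(len(seen) - 1, 0)  # the first post is the original, not a re-share
    exposure = sum(ni * kernel_integral(t - ti) for ti, ni in seen)
    return reshares / exposure if exposure > 0.0 else 0.0


def hourly_infectiousness(post_times, followers, n_points=47):
    """Evaluate the estimate at 1, 2, ..., n_points hours after the first post."""
    return [infectiousness(post_times, followers, 3600.0 * h) for h in range(1, n_points + 1)]
```

With no new re-shares, the numerator stays fixed while the accumulated exposure grows, so the estimate decays, which matches the description of infectiousness dropping as news gets stale.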
Contextual Inter-modal Attention
Each post comprises linguistic and user features, which often have mutual interdependence. However, GRUs are unable to capture this interdependence. Therefore, we used a pairwise contextual inter-modal attention mechanism (designated as CIM) [6], using the latent representations generated by the GRUs.
We compute the attention between the output of the linguistic module and that of the user module to leverage the contextual information related to each post to detect fake news. First, a pair of matching matrices is computed from the two outputs.
Furthermore, we obtained the probability distribution scores over the respective matching matrices to compute the attention weights on contextual posts using a softmax function. Then, we computed the modality-wise attentive representations.
Finally, we computed the element-wise matrix multiplication for the attention to the important components. Then, we concatenated the resulting values to obtain the attention representations between the linguistic and user features.
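The three steps above can be sketched in plain Python as follows. This follows a common formulation of pairwise inter-modal attention (matching matrices, row-wise softmax, attentive representations, element-wise product, concatenation). The function and variable names are illustrative, and both inputs are assumed to have the same shape (number of posts × hidden size).

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def inter_modal_attention(ling, user):
    """ling, user: lists of rows, both of shape (n_posts, hidden_size)."""
    m1 = matmul(ling, transpose(user))   # matching matrix, (n_posts, n_posts)
    m2 = matmul(user, transpose(ling))
    n1 = softmax_rows(m1)                # attention weights over contextual posts
    n2 = softmax_rows(m2)
    o1 = matmul(n1, user)                # modality-wise attentive representations
    o2 = matmul(n2, ling)
    a1 = [[p * q for p, q in zip(r1, r2)] for r1, r2 in zip(o1, ling)]  # element-wise product
    a2 = [[p * q for p, q in zip(r1, r2)] for r1, r2 in zip(o2, user)]
    return [r1 + r2 for r1, r2 in zip(a1, a2)]  # concatenate along the feature axis
```

The output has twice the hidden size per post, because the two attended representations are concatenated.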
After obtaining the features through the modules, we applied MaxPooling to each of them and concatenated the pooled hidden states into a single vector.
To predict the class label of each news item, we used a two-layer FC network with an activation function to identify the complex relations between the respective features. The final output represents the probability distribution over the set of classes, obtained through the softmax function.
To experimentally evaluate our model, we used three publicly available datasets: Weibo, released by [15], and Twitter15 and Twitter16, released by [17]. Each dataset of posts related to fake news was collected from the most popular social media platforms, i.e., Weibo (https://www.weibo.com) in China and Twitter (https://twitter.com) in the U.S. The Weibo dataset is annotated with one of two class labels: “true” or “fake.” The Twitter datasets are annotated with one of four class labels: “true,” “fake,” “unverified,” or “debunking of fake.” Table 2 presents a summary of the datasets. It should be noted that the datasets are smaller than at the time of release because some SNS stories and posts can no longer be acquired owing to changes in disclosure settings and post deletion.
For the experiments, we divided each dataset into training, validation, and test sets. Each dataset was first split at a ratio of 3:1 into training and test sets. Then, 15% of the training set was held out as the validation set.
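The split can be sketched as below. The shuffling seed and the choice to round set sizes up are assumptions; the text only gives the 3:1 ratio and the 15% validation share.

```python
import math
import random


def split_dataset(items, test_ratio=0.25, val_ratio=0.15, seed=0):
    """Split items 3:1 into train/test, then hold out val_ratio of train for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_test = math.ceil(len(shuffled) * test_ratio)  # rounding up is an assumption
    test, rest = shuffled[:n_test], shuffled[n_test:]

    n_val = math.ceil(len(rest) * val_ratio)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test
```

With 4,664 and 1,479 items, this rounding gives 2973/525/1166 and 942/167/370 for training/validation/test, which agrees with the Weibo and Twitter15 counts reported in Table 2.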
Table 2: Summary of the datasets.
|Statistic||Weibo||Twitter15||Twitter16|
|No. of true news||2351||371||204|
|No. of fake news||2313||363||205|
|No. of unverified news||-||373||205|
|No. of debunking||-||372||199|
|No. of training posts||2973||942||517|
|No. of validation posts||525||167||97|
|No. of test posts||1166||370||204|
We compared the proposed model with the following baseline methods of fake news detection.
- SVM-TS [16]: A linear SVM classifier that uses time series to model the variation of social context features. This model also uses diffusion-based features, such as the average number of re-shares, in addition to linguistic and user features.
- CSI [21]: A hybrid deep-learning model that uses information from user texts, responses, and behaviors. This model calculates the source characteristic based on the user behavior and classifies an article as fake or not.
- GRU-2 [15]: A model equipped with two GRU hidden layers and an embedding layer following the input layer, which learns rumor representations by modeling the sequential structure of relevant posts over time.
- PPC [12]: A time series classifier that incorporates both recurrent and convolutional networks, which respectively capture user characteristics along the propagation path.
- Proposed (w/o CIM): The proposed model without the contextual inter-modal attention module, used for validating the effectiveness of CIM.
- Proposed (w/o time): The proposed model with only linguistic and user features, used for validating the effectiveness of the temporal features.
- Proposed (freq): The proposed model with the infectiousness values replaced by the number of posts during each period, used for validating the effectiveness of the infectiousness values.
Our model has been trained to minimize the binary/categorical loss function while predicting the class label of each news item in the training set. During training, all model parameters were updated using gradient-based methods following the AdaDelta update rule. Additionally, Dropout, for which the value was set to 0.5, was applied on hidden layers, and
to avoid overfitting. The maximum number of training epochs was set to 500. Early stopping was applied when the validation loss had not improved for 10 epochs.
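The stopping rule can be sketched as follows. The callback `run_epoch` is a hypothetical stand-in for one epoch of training plus validation, and reading “saturated for 10 epochs” as “no improvement for 10 consecutive epochs” is an assumption.

```python
def train_with_early_stopping(run_epoch, max_epochs=500, patience=10):
    """run_epoch(epoch) trains one epoch and returns the validation loss."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss
```

A full training loop would also save the model parameters at `best_epoch` and restore them before testing.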
The network structure and hyper-parameters were set based on the validation set and on previous studies [15, 12]. We used the top 5,000 vocabulary terms based on the tf-idf values as input to the linguistic module. These tf-idf vectors were converted to embedding vectors with a dimension of 100. The dimension of the user features was set to eight, as described in the user module in the “Fake News Detection Model” section. The sequence lengths of the GRUs for the linguistic and user features were set to 30 for the Weibo dataset and 40 for the Twitter15 and Twitter16 datasets, based on the results of a previous study. Namely, we used the first 30 or 40 posts of a story, sorted by time in ascending order, as the input of the linguistic and user modules.
In the case study, most time series of the number of fake news posts showed a second upsurge approximately one day after post publication. Therefore, we used the infectiousness values of the first two days, a sequence of length 47, as the input of the GRUs for the temporal features. These 47 infectiousness values were calculated at hourly points using all data from the post publication time up to each point; e.g., the value at the 3-hour point is calculated from all posts published within 3 hours of the post publication.
The output size of each GRU (, , and ) is selected from (16, 32, 64, and 128) and the hidden dimension of the output FC layer is selected from (, and ) in the validation period, where is the size of , equal to .
We used the accuracy and F1-measure as metrics to evaluate the model capabilities. Classification tasks, such as fake news detection, are commonly evaluated by the accuracy, while the F1-measure complements it by addressing class imbalance. We report the accuracy over all categories and the F1-measure for each class.
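For concreteness, the two metrics can be computed as in the sketch below. The function names are illustrative, and returning 0.0 for a class with no true or predicted instances is an assumed convention.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred):
    """F1-measure for each class, computed one-vs-rest."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores
```

For example, with true labels `["true", "fake", "fake", "true"]` and predictions `["true", "fake", "true", "true"]`, the accuracy is 0.75, the F1 for “true” is 0.8, and the F1 for “fake” is 2/3.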
Results and Discussion
|Proposed (w/o CIM)||0.920||0.922||0.917||0.814||0.807||0.813||0.870||0.745||0.791||0.850||0.782||0.747||0.791|
|Proposed (w/o time)||0.912||0.913||0.910||0.814||0.857||0.806||0.868||0.677||0.791||0.864||0.829||0.717||0.776|
The experimental results are presented in Table 3 and indicate that the proposed model outperforms most baseline methods, confirming the benefits of the multi-modal method and temporal features. The baseline SVM-TS, based on hand-crafted features, performed relatively well because it combined various features, including linguistic, user, and temporal features. In contrast, CSI achieved low accuracy. The model calculates the user relation score from the training data and then detects fake news in the test data by using the scores of users who appear in both training and test data. Because few users appeared in both the training and test datasets in our experiments, CSI performed poorly. Most deep learning-based models, such as the proposed model, GRU-2, and PPC, outperformed feature engineering-based models, such as SVM-TS. Deep neural networks helped to learn better hidden representations of people’s responses to the news on SNS for fake news detection. The results show that GRU-2 and PPC, which used linguistic and user features, respectively, to capture complex hidden features indicative of the corresponding responses, achieved a high accuracy and high F1-measure.
To validate the effectiveness of each module, we also conducted experiments using models that excluded CIM and the temporal features of the proposed model. Compared to Proposed (w/o CIM), the proposed model achieved a higher accuracy and F1-measure on all datasets, except for the unverified label data. This result demonstrates that it was insufficient to learn the hidden representations of the user and linguistic features separately. Moreover, inter-dependencies between the linguistic and user features were useful to detect whether a news item was fake or not because posts consist of both features. Compared to Proposed (w/o time), the proposed model achieved higher scores, except for the unverified label data in Twitter15. In a previous study [10], the time series of rumors was useful for detecting rumors in long-term observation periods (56 days). Our results support our claim that temporal features can also be useful for short-term fake news detection (2 days). Proposed (freq) replaced the infectiousness values with the number of posts in each period to validate the conversion to infectiousness values. Its accuracy was slightly higher than that of Proposed (w/o time) for the Weibo and Twitter16 datasets when the number of posts was added. However, its gain in accuracy was smaller than that achieved by the proposed model. This result shows that the conversion to infectiousness values is useful for capturing latent information from the temporal features for fake news detection.
The proposed model performed best overall on most measures and datasets, demonstrating its effectiveness compared to the baseline methods. Specifically, our model achieved the highest accuracy for the Weibo test subset (0.937), the Twitter15 test subset (0.831), and the Twitter16 test subset (0.819). Additionally, our model achieved the highest F1 score on the True, Fake, and Debunking labels. However, similarly to the compared methods, our model did not produce good results for classifying unverified labels. Presumably, effectively classifying ambiguous labels, such as unverified, is challenging even when the temporal features are used.
Finally, we evaluated the contributions of the temporal features in more detail. To this end, we compared versions of the proposed model whose time frame for obtaining the temporal features varied from zero (w/o time) to six days (see Figure 3). The accuracy of the proposed model improves gradually as the time frame lengthens. However, the performance remains more or less unchanged for time frames longer than three days. Specifically, the model using five days of the Weibo dataset achieved an accuracy of 0.939. When using 4 days of the Twitter15 and Twitter16 datasets, our model achieved an accuracy of 0.867 and 0.830, respectively. Although we set the time frame to the first 2 days in the experimental settings, the results show that time frames of approximately 4 or 5 days would be more appropriate for obtaining the temporal features for fake news detection.
Although our model demonstrates that incorporating temporal features, which are difficult to manipulate, is useful in fake news detection, it also has a limitation: it is difficult to detect “early” fake news. The comparative method PPC claims to achieve fake news detection within an hour. However, it is difficult to accurately estimate the infectiousness values of the information within an hour, so our model is not suitable for detecting early fake news. Therefore, our results suggest the use of different models depending on the circumstances: models without temporal features are better for early detection, while the proposed model with temporal features is better for robust and high-precision detection.
We conclude this paper by highlighting the key points of our study: (1) We ascertained the differences in time series behaviors between real and fake news from short-term observations. (2) We proposed a novel multi-modal method for fake news detection, combining text and user features with infectiousness values. (3) The experimental results empirically showed the effectiveness of the proposed model for fake news detection. However, it remains unclear whether the temporal features are useful for ambiguously labeled data (e.g., debunking label). Future studies must examine how temporal features can be used flexibly to classify ambiguous data labels effectively.
[1] (2019) Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications 10 (1), pp. 7.
[2] (2019) What happened? The spread of fake news publisher content during the 2016 US presidential election. In The World Wide Web Conference, pp. 139–150.
[3] (2011) Information credibility on Twitter. In Proc. of the WWW, pp. 675–684.
[4] On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259.
[5] (2017) SemEval-2017 task 8: RumourEval: determining rumour veracity and support for rumours. In Proc. of the SemEval-2017, pp. 69–76.
[6] Contextual inter-modal attention for multi-modal sentiment analysis. In Proc. of the EMNLP, pp. 3454–3466.
[7] (1971) Spectra of some self-exciting and mutually exciting point processes. Biometrika 58 (1), pp. 83–90.
[8] (2020) Hierarchical propagation networks for fake news detection: investigation and exploitation. In Proc. of the ICWSM.
[9] (2019) Learning hierarchical discourse-level structure for fake news detection. In Proc. of the NAACL, pp. 3432–3442.
[10] (2017) Rumor detection over varying time windows. PLOS ONE 12 (1), pp. 1–19.
[11] (2018) The science of fake news. Science 359 (6380), pp. 1094–1096.
[12] Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. AAAI Conference on Artificial Intelligence, pp. 354–361.
[13] (2016) Hawkes processes for continuous time sequence classification: an application to rumour stance classification in Twitter. In Proc. of the ACL, Vol. 2, pp. 393–398.
[14] (2019) Sentence-level evidence embedding for claim verification with hierarchical attention networks. In Proc. of the NAACL, pp. 1391–1400.
[15] (2016) Detecting rumors from microblogs with recurrent neural networks. In Proc. of the IJCAI, pp. 3818–3824.
[16] (2015) Detect rumors using time series of social context information on microblogging websites. In Proc. of the CIKM, pp. 1751–1754.
[17] (2017) Detect rumors in microblog posts using propagation structure via kernel learning. In Proc. of the ACL, pp. 708–717.
[18] (2018) Rumor detection on Twitter with tree-structured recursive neural networks. In Proc. of the ACL, pp. 1980–1989.
[19] (2012) Rise and fall patterns of information diffusion: model and implications. In Proc. of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 6–14.
[20] (2019) Fake news detection using deep Markov random fields. In Proc. of the NAACL, pp. 1391–1400.
[21] (2017) CSI: a hybrid deep model for fake news detection. In Proc. of the CIKM, pp. 797–806.
[22] (2016) Hoaxy: a platform for tracking online misinformation. In Proc. of the WWW, pp. 745–750.
[23] (2018) The spread of low-credibility content by social bots. Nature Communications 9 (1), pp. 4787.
[24] (2019) Combating fake news: a survey on identification and mitigation techniques. ACM TIST 10 (3), pp. 21.
[25] (2018) EANN: event adversarial neural networks for multi-modal fake news detection. In Proc. of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 849–857.
[26] (2015-05) False rumors detection on Sina Weibo by propagation structures. Proc. of the ICDM 2015, pp. 651–662.
[27] (2012) Automatic detection of rumor on Sina Weibo. In Proc. of the ACM SIGKDD Workshop on Mining Data Semantics, MDS ’12, pp. 13:1–13:7.
[28] (2017) A convolutional approach for misinformation identification. In Proc. of the IJCAI, pp. 3901–3907.
[29] (2015) SEISMIC: a self-exciting point process model for predicting tweet popularity. In Proc. of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1513–1522.