QuickStop: A Markov Optimal Stopping Approach for Quickest Misinformation Detection

03/04/2019 · by Honghao Wei et al.

This paper combines data-driven and model-driven methods for real-time misinformation detection. Our algorithm, named QuickStop, is an optimal stopping algorithm based on a probabilistic information spreading model obtained from labeled data. The algorithm consists of an offline machine learning algorithm for learning the probabilistic information spreading model and an online optimal stopping algorithm to detect misinformation. The online detection algorithm has both low computational and memory complexities. Our numerical evaluations with a real-world dataset show that QuickStop outperforms existing misinformation detection algorithms in terms of both accuracy and detection time (number of observations needed for detection). Our evaluations with synthetic data further show that QuickStop is robust to (offline) learning errors.


1. Introduction

The proliferation of misinformation (lazer2018science) (colloquially known as “fake news”) on online social networks has become one of the greatest threats to national security, has eroded public trust in news media, and is an imminent threat to the ecosystem of online social platforms such as Facebook, Twitter, and Sina Weibo. For example, in 2013, a fake tweet sent from a hacked Associated Press Twitter account, claiming that then US President Barack Obama had been injured by explosives, caused a 150-point drop of the Dow Jones in just two minutes (https://www.washingtonpost.com/news/worldviews/wp/2013/04/23/syrian-hackers-claim-ap-hack-that-tipped-stock-market-by-136-billion-is-it-terrorism/); and fake news in the 2016 US Presidential Election increased political and social polarization and posed a great threat to democracy. Social media companies such as Facebook and Twitter are now taking multiple countermeasures to combat misinformation, as its proliferation is driving users away from their platforms.

Despite the enormous attention it receives and tremendous efforts from both public and private institutions to counter it, misinformation detection remains a daunting task. Online platforms and news organizations have experimented with different methods. Facebook launched a fact-checking project in Spring 2018 to work with third-party publishers to validate the facts and accuracy of news articles (https://www.facebook.com/help/1952307158131536?helpref=faq_content). The New York Times recently published a tip form so that its readers can report misinformation and fake news (https://www.nytimes.com/2018/09/17/technology/disinformation-tipsheet.html). Third-party fact-checking is often very effective for determining whether a specific news article is fake, but it is clearly not a scalable solution and cannot cover even a tiny fraction of news articles and tweets (there are about 500 million tweets per day on Twitter). The crowdsourcing approach used by The New York Times is more scalable, but the reports are not always trustworthy because anyone can send a tip. In light of these challenges, machine-learning and data-mining approaches have emerged to tackle misinformation detection in a systematic way (see (shu2017fake) for a comprehensive review). It has been shown in (castillo2011information) that the features extracted from the content of a news article, the features of the users who spread the news, and the connections among these users can be effectively utilized for misinformation detection. These are exciting discoveries because “machine-based” methods are much more scalable than “human-based” methods and can handle a vast number of news articles in a short period of time.

While machine-learning approaches address the scalability issue, another important aspect of misinformation detection, speed or sample complexity (the amount of time or the number of observations needed to detect misinformation), has yet to be tackled. Speed matters because of the disruptive nature of misinformation, which often causes significant damage in a very short period of time. For example, it took less than two minutes for one single fake tweet to tip the Dow Jones by 150 points. A fact-checking approach, in contrast, may take a few hours because the fact checker needs to gather facts and evidence to validate or invalidate a news article. It is therefore imperative to detect misinformation at the earliest time so that proper countermeasures can be taken to suppress it: speed is as important as accuracy and scalability in the design of misinformation detection algorithms.

Motivated by the discussions above, this paper focuses on quickest detection of misinformation. The goal is to develop an algorithm that addresses the three important considerations in misinformation detection: scalability, accuracy, and speed. Existing machine-learning-based approaches have demonstrated a strong correlation between user features and the spreading patterns of different types of information (real or fake); we will demonstrate this correlation in Section 2 using the Sina Weibo dataset. The signal of a single retweet is often very weak and usually not sufficient for classifying a news article with reasonable accuracy, but the accuracy improves as more and more weak signals are collected. This paper therefore views misinformation detection as a sequential hypothesis testing problem. As the platform receives a sequence of weak signals in real time, it determines whether it has collected enough signals to declare the type of the news (real or fake). The more signals collected, the more accurate the detection result will be, but the longer the misinformation is allowed to spread. Enlightened by these observations, we propose QuickStop, a scalable algorithm that performs accurate and quick detection of misinformation. QuickStop combines a data-driven approach with a model-driven approach in the following way.


  • Data-based probabilistic modeling: Since each retweet is a weak signal for the hypothesis test (whether the news article is real or fake), extracting the statistics of these weak signals is important for establishing an effective probabilistic model for hypothesis testing. QuickStop first uses an SVM (Support Vector Machine) algorithm to extract an edge-based probabilistic information spreading model. Section 2 explains the rationale behind the edge-based model (compared with a node-based model) and shows its effectiveness using the Sina Weibo dataset.

  • Model-based quickest detection: After establishing the probabilistic model, we formulate the quickest misinformation detection problem as an optimal stopping problem. Specifically, we propose a cost model that includes both the cost due to detection error and the cost due to the propagation of misinformation. Note that the propagation cost occurs only in the case of misinformation. With this formulation, the goal is to discover a stopping policy, i.e., a policy that determines when to stop collecting observations and what type to declare after stopping, that minimizes the overall cost. As more observations are collected, the error cost decreases but the propagation cost could increase in the case of misinformation. Therefore, the optimal stopping policy needs to balance the detection accuracy and detection time so that misinformation can be detected confidently at the earliest possible time.

The main contributions of this paper are summarized below.


  • Problem Formulation: We formulate the quickest misinformation detection problem as an optimal stopping problem based on a probabilistic information spreading model, and prove that the problem is a Markov optimal stopping time problem. The probabilistic model can be extracted from training datasets with standard classifiers. An important feature of our formulation is the asymmetric cost function between real news and misinformation: spreading misinformation causes far more damage than spreading real news, so we need to act quickly only in the case of misinformation. This asymmetry distinguishes our formulation from the traditional sequential hypothesis testing problem, and the existing solution techniques therefore do not directly apply.

  • Algorithm and Analysis: We propose an algorithm named QuickStop that detects misinformation based on edge types, where an edge is a connection between two individuals along which a piece of information spreads from one individual to the other. QuickStop consists of two parts: (i) QuickStop-Training, an offline algorithm that classifies edges into four types and then calculates the transition probabilities between edge types, where the transition probabilities in the case of real news may differ from those for misinformation; and (ii) QuickStop-Detection, an online detection algorithm with low computational and memory complexities. We emphasize that the main computational load is in the offline part. Once offline training is completed, online detection is very efficient: QuickStop-Detection maintains a scalar variable that describes the current state and updates it for each new observation. The update follows a simple formula whose complexity does not depend on how many observations have been collected. The algorithm then compares the state with several thresholds calculated offline and, based on the comparison, either keeps collecting observations or declares the type of the information; in the latter case, the declared type is also determined by the comparison. Therefore, QuickStop-Detection has very low computational and memory complexities and is ideal for real-time large-scale misinformation detection.

  • Evaluations: We evaluated the performance of QuickStop using both a real-world social network dataset and synthetic data. The evaluations on the real-world dataset demonstrate the effectiveness of our algorithm in terms of both accuracy and speed compared with state-of-the-art real-time misinformation detection algorithms. Under QuickStop with a low propagation cost, it took 12 observations on average in the Weibo dataset to detect misinformation, but more to declare real news, which is consistent with the asymmetric cost model. Furthermore, the false negative rate (misinformation classified as real news) is much lower than the false positive rate (real news classified as misinformation), which is also desirable in practice. In contrast, the accuracy of the state-of-the-art early detection algorithms is lower than ours even when they use more observations. From the evaluations on synthetic data, we further observed that QuickStop is robust to classification errors.

We finally comment that while several early misinformation detection algorithms have been developed (zhao2015enquiring; ma2015detect; ma2016detecting; ma2017detect; ma2018rumor; chen2018call), these algorithms use either a fixed number of observations (ma2018rumor; chen2018call) or observations over a fixed time period (ma2015detect; ma2016detecting; zhao2015enquiring; ma2017detect) as input. Therefore, these early detection algorithms do not minimize the detection time (or the number of observations) in real time. Our approach, on the other hand, tackles the problem using the optimal stopping method and optimizes the number of observations needed in real time for quickest detection. Our numerical evaluations show that QuickStop achieves higher accuracy with fewer observations due to the dynamic nature of the algorithm. A detailed review of other related work is presented in Section 7.

2. Model and Problem Statement

We model an online social network as a directed graph $G = (V, E)$, where $V$ is the set of vertices representing users and $E$ is the set of directed edges representing the connections between users. Information (real news or misinformation) can spread from one user to another via the edge connecting them, e.g., a Twitter user can retweet a post from one of her/his followees. In this paper, we adopt the terminology of Twitter. Given a directed edge $(u, v)$, user $v$ is called the follower of user $u$, and user $u$ is called the followee of user $v$. Information can spread from user $u$ to user $v$ via this directed edge.

We assume two types of information may spread in the network: real news articles (simply called news in the remainder of the paper) and misinformation. A user (say user $v$) decides whether to post (retweet) the information based on the following three factors: (i) the type of the information, (ii) the features of user $v$, denoted by $z_v$, and (iii) the set of user $v$'s neighbors who have posted (retweeted) the information before user $v$.

As information spreads in the network, the platform obtains sequential observations (weak signals) for misinformation detection. In this paper, a retweet is considered to be an observation, which is represented by the edge over which the retweet occurs. Specifically, we define the $n$th observation to be $Y_n = (z_{u_n}, z_{v_n})$, where $z_{v_n}$ is the feature vector of the $n$th user who retweets the information and $z_{u_n}$ is the feature vector of the followee from whom the $n$th user retweets it. We remark that when the complete network and information diffusion history are known, the information spreading trace is likely a tree or a forest (with multiple information sources). In practice, however, this is often not the case because of missing information and partial observations, as in the Weibo dataset. Therefore, the observations we have are a sequence of retweets that do not necessarily form a tree; in particular, the followee $u_{n+1}$ of the $(n+1)$th retweet is not necessarily the $n$th retweeter $v_n$. To model these retweets as weak signals, we consider the following two approaches.

  • User-based Model: In the user-based model, given the type of an article, the probability that a user retweets the article depends on the features of the user. Intuitively, an honest user is less likely to retweet misinformation than a malicious user (e.g., a bot). The user-based model classifies the users based on the user features with a labelled training dataset.

  • Edge-based Model: In the edge-based model, we view each edge as a communication channel and classify edges into different groups. For example, misinformation is more likely to spread over an edge between two malicious social bots than over an edge between two honest users. The edge-based model classifies the edges based on the edge features (the feature vectors $(z_u, z_v)$ of the two end users) with a labelled training dataset.

Figure 1 presents the distributions of SVM classification scores of the user-based model and the edge-based model on the Weibo dataset released in (ma2016detecting), where the $x$-axis is the classification score of the SVM classifier and the $y$-axis is the score distribution (frequency). A user or an edge with a higher score is considered more likely to spread misinformation. From the figure, we first observe that the scores of users (or edges) involved in spreading misinformation concentrate around one, while the scores of users (or edges) involved in spreading news concentrate around zero. This demonstrates the strong correlation between article types and user (or edge) features. Furthermore, the score distributions of edges exhibit a stronger correlation with article types than the score distributions of users; for example, for misinformation, the score distribution of edges has a higher frequency around one than that of users (60% versus 45%). Because of this observation, we use the edge-based model in this paper.

Figure 1. Classification distribution
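
To make the edge-based model concrete, the following sketch (not from the paper; scikit-learn and the placeholder data are our assumptions) shows how an edge feature vector can be formed from the two end users' features and scored with an SVM.

```python
import numpy as np
from sklearn.svm import SVC

def edge_feature(z_followee, z_retweeter):
    # An edge (u, v) is represented by concatenating both end users' features.
    return np.concatenate([z_followee, z_retweeter])

# Placeholder training data: edge feature vectors and labels
# (0: edge carried news, 1: edge carried misinformation).
X_train = np.random.rand(1000, 12)
y_train = np.random.randint(0, 2, size=1000)

clf = SVC(probability=True).fit(X_train, y_train)

# A score close to 1 suggests the edge tends to carry misinformation.
score = clf.predict_proba(X_train[:1])[0, 1]
```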

We assume the sequential observations form a Markov chain, as shown in Figure 2. To simplify the model, we further assume the edge feature vectors can be classified into four types, where type 0 edges are the most likely to be used for spreading news and type 3 edges are the most likely to be used for spreading misinformation. Under this Markov chain model, besides the edge types, the additional parameters to be learned are the transition probabilities, denoted by $P^{(0)}$ and $P^{(1)}$, where the superscript $(0)$ indicates the transition probabilities when spreading news and $(1)$ indicates the transition probabilities when spreading misinformation. When $P^{(0)}$ and $P^{(1)}$ are different, we can detect misinformation using sequential hypothesis testing.

Figure 2. A Markov chain model for sequential observations

Tables 1 and 2 show the empirical transition probability matrices under news and misinformation obtained from the Weibo dataset. The rows of each matrix differ substantially, so the distribution of the next edge type depends on the current one; i.e., the observations are Markovian rather than i.i.d., which justifies our edge-based Markovian model.

From \ To      0      1      2      3
0            0.828  0.120  0.039  0.013
1            0.651  0.224  0.084  0.041
2            0.500  0.193  0.191  0.116
3            0.279  0.181  0.211  0.329
Table 1. Edge Transition Probability Matrix under News from the Weibo Dataset

From \ To      0      1      2      3
0            0.163  0.167  0.249  0.421
1            0.105  0.194  0.239  0.462
2            0.080  0.119  0.277  0.524
3            0.052  0.088  0.203  0.657
Table 2. Edge Transition Probability Matrix under Misinformation from the Weibo Dataset
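
For illustration, matrices like Tables 1 and 2 can be estimated by counting consecutive edge-type pairs in labeled traces and row-normalizing. The sketch below is our own; the trace format (a list of edge-type sequences) is an assumption.

```python
import numpy as np

def estimate_transition_matrix(traces):
    """Estimate a 4x4 edge-type transition matrix from traces, where each
    trace is a sequence of edge types in {0, 1, 2, 3}."""
    counts = np.zeros((4, 4))
    for trace in traces:
        for a, b in zip(trace[:-1], trace[1:]):
            counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    # Row-normalize; rows never observed fall back to uniform.
    return np.where(rows > 0, counts / np.maximum(rows, 1), 0.25)

# e.g. P0 = estimate_transition_matrix(news_traces)
#      P1 = estimate_transition_matrix(misinfo_traces)
```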

For the edge classifier, we leverage existing research, in particular (kwon2013prominent), which shows that SVM performs the best among several popular machine-learning algorithms, including decision trees and random forests, for classifying misinformation. We adopt SVM and the method used in (castillo2011information) to obtain an edge classifier; the details can be found in Section 4. After classifying the edges in the training data, we further obtain the transition probabilities from the training data to build a probabilistic information spreading model (details can be found in Section 3).

Our focus is on the quickest detection formulation after training the edge classifier and learning the transition probabilities $P^{(0)}$ and $P^{(1)}$. In the next section, we formulate the quickest misinformation detection problem, prove that it is a Markov optimal stopping time problem, and show that its solution is a time-invariant threshold policy. Furthermore, the thresholds can be efficiently calculated offline based on the probabilistic model. The online algorithm has constant computational and memory complexities and is very easy to implement.

3. Optimal Stopping Approach for Quickest Misinformation Detection

Consider an online social network platform that is monitoring the spread of some information in the network. We say that an event occurs when a user retweets or posts the information. When the $n$th event occurs, we obtain an observation $X_n \in \{0, 1, 2, 3\}$ by using the trained classifier to determine the type of the edge over which the event occurred. Furthermore, we assume that the transition probabilities $P^{(0)}$ and $P^{(1)}$ have been learned from training data.

With the model introduced above, the detection of misinformation can be formulated as a hypothesis testing problem with the following two hypotheses:

  • $H_0$: The information is news. In this case, $\{X_n\}$ is a four-state Markov process with transition probabilities $P^{(0)}$.

  • $H_1$: The information is misinformation. Then $\{X_n\}$ is a four-state Markov process with transition probabilities $P^{(1)}$.

Given observations $X_1, X_2, \ldots$, the misinformation detection problem is to determine whether $H_0$ or $H_1$ is true. We assume that, in terms of the prior distribution, hypothesis $H_1$ occurs with probability $\theta$ and $H_0$ occurs with probability $1 - \theta$. We assume the first observation $X_1$ is uniformly distributed over $\{0, 1, 2, 3\}$ regardless of the hypothesis.

Now define

$$p_n = \Pr(H_1 \mid X_1, \ldots, X_n),$$

so $p_1 = \theta$, since the first observation is uniformly distributed under both hypotheses. According to Bayes' rule, we have

$$p_{n+1} = \frac{\Pr(H_1)\,\Pr(X_1, \ldots, X_{n+1} \mid H_1)}{\Pr(X_1, \ldots, X_{n+1})},$$

which, by the Markov property of the observations under each hypothesis, implies that the posterior can be updated one transition at a time. Therefore, we have the following iterative equation

$$p_{n+1} = \frac{p_n\, P^{(1)}_{X_n X_{n+1}}}{p_n\, P^{(1)}_{X_n X_{n+1}} + (1 - p_n)\, P^{(0)}_{X_n X_{n+1}}} \qquad (1)$$

for updating our belief on $H_1$.

It is not hard to check that $\{(X_n, p_n)\}$ forms a Markov chain. We will refer to this Markov chain as the underlying Markov chain; it plays a central role when we derive our algorithm for misinformation detection.
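
In code, the recursion (1) is a one-line Bayesian update. The following sketch uses our notation, with `P0` and `P1` the two transition matrices; it is reused in the detection loop sketched in Section 4.

```python
def update_belief(p, x_prev, x_new, P0, P1):
    """One step of the Bayesian update (1): p is the current belief that the
    information is misinformation; x_prev, x_new are consecutive edge types."""
    num = p * P1[x_prev][x_new]
    return num / (num + (1 - p) * P0[x_prev][x_new])
```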

Note that given the observation sequence $X_1, X_2, \ldots$, we can calculate $p_n$ in real time. The question is when to declare the type of the information. The more observations we have, the more accurate the decision would be, but the more widely the information would have spread. Therefore, we need to balance the accuracy and the potential damage of spreading misinformation. Let $N$ denote the random time at which the type of information is declared; $N$ is a stopping time with respect to $\{X_n\}$, i.e., the event $\{N = n\}$ is determined by $X_1, \ldots, X_n$. Let $\delta \in \{0, 1\}$ denote the type of information declared by a detection algorithm ($0$ for news, $1$ for misinformation). We consider the following two types of costs in the misinformation detection problem.

Error Cost

The first type of cost comes from mis-detection. Let $c_1$ denote the cost of a type-I error (also called a false positive, where news is declared as misinformation) and $c_2$ denote the cost of a type-II error (also called a false negative, where misinformation is declared as news). The expected cost of mis-detection is

$$c_1 \Pr(\delta = 1, H_0) + c_2 \Pr(\delta = 0, H_1) = c_1 (1 - \theta) \Pr(\delta = 1 \mid H_0) + c_2\, \theta \Pr(\delta = 0 \mid H_1),$$

where $\theta$ is the prior probability of $H_1$.

Propagation Cost

The other type of cost is the propagation cost. Information becomes more influential as more people share it, so we need to detect misinformation as quickly as we can to limit its potential damage, while spreading news does not incur any cost. Consequently, the propagation cost in our model is asymmetric and comes only from misinformation. In particular, we assume that a cost $c$ is associated with each time slot of propagation if the information is misinformation. Thus, at the stopping time $N$, the propagation cost is

$$c\, N\, \mathbb{1}_{H_1},$$

where $\mathbb{1}_{H_1}$ is the indicator function, which is equal to $1$ when $H_1$ is true and equal to $0$ when $H_0$ is true.

A Markov Optimal Stopping Approach

The goal of the misinformation detection algorithm is to minimize the overall cost. Formally, we aim to find a stopping time $N$ and a decision rule $\delta$, both depending on $\{X_n\}$, that solve the following problem:

$$\min_{N, \delta}\ \mathbb{E}\left[c_1 \mathbb{1}_{\{\delta = 1\}} \mathbb{1}_{H_0} + c_2 \mathbb{1}_{\{\delta = 0\}} \mathbb{1}_{H_1} + c\, N\, \mathbb{1}_{H_1}\right]. \qquad (2)$$

The key to solving this problem is to properly handle the propagation cost term $c N \mathbb{1}_{H_1}$. Note that if this term were $c N$, the problem would be the same as the renowned sequential testing problem (see, e.g., (poor2009quickest)). Specifically, the sequential testing problem solves

$$\min_{N, \delta}\ \mathbb{E}\left[c_1 \mathbb{1}_{\{\delta = 1\}} \mathbb{1}_{H_0} + c_2 \mathbb{1}_{\{\delta = 0\}} \mathbb{1}_{H_1} + c\, N\right]. \qquad (3)$$

However, the dependence of the term $c N \mathbb{1}_{H_1}$ on the hypothesis distinguishes our problem from the sequential testing problem in the following two aspects.

1) The formulation (2) is a general optimal stopping problem without a Markovian representation with respect to the underlying Markov chain $\{(X_n, p_n)\}$; i.e., the cost of stopping after the $n$th event is not determined by $(X_n, p_n)$, since it also depends on whether $H_0$ or $H_1$ is true. It is highly desirable to convert the problem to a Markov optimal stopping problem, since directly solving a general optimal stopping problem is very hard. To do this, we need to show that minimizing the overall cost is equivalent to minimizing a function of $\{(X_n, p_n)\}$.

For the sequential testing problem, only the error cost term needs this conversion, since the cost in the second term, $c N$, is already a function of the stopping time. In contrast, for our misinformation detection problem, the propagation cost term, $c N \mathbb{1}_{H_1}$, also needs a conversion due to its dependence on $\mathbb{1}_{H_1}$. We establish such a conversion in Theorem 3.1.

2) After we convert the misinformation detection problem to a Markov optimal stopping problem, the propagation cost term does not have a linear form as in the sequential testing problem, so the solution techniques of sequential testing do not directly apply. We would then potentially need to resort to a general Markov optimal stopping problem, whose solution may not have an efficient-to-compute form. To overcome this difficulty, we utilize the essential observation that the process $\{p_n\}$ is a martingale with respect to $\{X_n\}$. With this, we derive the solution and show that the optimal policy is a time-invariant threshold policy with a similar form to the solution of the sequential testing problem. These results are presented in Theorem 3.2.

Theorem 3.1.

The optimal stopping problem (2) is equivalent to a Markov optimal stopping problem. Formally,

$$\min_{N, \delta}\ \mathbb{E}\left[c_1 \mathbb{1}_{\{\delta=1\}}\mathbb{1}_{H_0} + c_2 \mathbb{1}_{\{\delta=0\}}\mathbb{1}_{H_1} + c N \mathbb{1}_{H_1}\right] = \min_{N \in \mathcal{T}}\ \mathbb{E}\left[\min\{c_1 (1 - p_N),\ c_2\, p_N\} + c \sum_{n=0}^{N-1} p_n\right], \qquad (4)$$

where $\mathcal{T}$ is the set of stopping times with respect to $\{(X_n, p_n)\}$ and $p_0 = \theta$ by convention.

Note that the variable in the Markov stopping problem (4) is just the stopping time $N$, instead of both $N$ and $\delta$. Therefore, we can find the optimal stopping policy in two steps: first find the optimal stopping time $N^*$ by solving (4), and then find the optimal decision rule $\delta^*$ based on $p_{N^*}$. The proof of Theorem 3.1 can be found in Appendix A.

Clearly, the accuracy of the hypothesis test increases with an increasing number of samples. However, the cost of spreading possible misinformation also increases. We establish the optimal stopping policy in Theorem 3.2. Note that because of the term $c \sum_{n=0}^{N-1} p_n$, one would expect the optimal stopping policy to be a function of both time and the state $(X_n, p_n)$. Interestingly, we will see that the optimal stopping policy is time-invariant; i.e., it depends only on the value of $(X_n, p_n)$ and not on time. In other words, we have stopping rules with time-invariant thresholds. The proof of Theorem 3.2 is presented in Appendix B.

Theorem 3.2.

The optimal stopping time is

$$N^* = \min\{n \geq 1 : p_n \leq l_{X_n}\ \text{or}\ p_n \geq u_{X_n}\}. \qquad (5)$$

In other words, there exist positive values $l_i$ and $u_i$ ($i = 0, 1, 2, 3$), independent of $n$, such that the algorithm declares the information to be news when $X_n = i$ and $p_n \leq l_i$, and declares the information to be misinformation when $X_n = i$ and $p_n \geq u_i$. The thresholds $l_i$ and $u_i$ for $i = 0, 1, 2, 3$ are determined by solving the following equations:

$$c_2\, l_i = c\, l_i + G(i, l_i), \qquad (6)$$
$$c_1 (1 - u_i) = c\, u_i + G(i, u_i), \qquad (7)$$

where $V$ is the solution of the Bellman equation

$$V(i, p) = \min\left\{c_1 (1 - p),\ c_2\, p,\ c\, p + G(i, p)\right\} \qquad (8)$$

and

$$G(i, p) = \sum_{j} \left(p\, P^{(1)}_{ij} + (1 - p) P^{(0)}_{ij}\right) V\!\left(j,\ \frac{p\, P^{(1)}_{ij}}{p\, P^{(1)}_{ij} + (1 - p) P^{(0)}_{ij}}\right).$$

Remark 1.

Note that $G(i, p)$ in (8) is understood as the expected cost to go starting from the next time step, given that the state in the current time step is $(X_n, p_n) = (i, p)$. In other words,

$$G(i, p) = \mathbb{E}\left[V(X_{n+1}, p_{n+1}) \mid X_n = i,\ p_n = p\right],$$

so it does not depend on $n$.
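
The thresholds can be computed offline by value iteration on a quantized belief grid. The sketch below follows our reconstruction of (6)-(8); the grid size and tolerance are assumed parameters, and the threshold read-off compares the two stopping costs against the converged continuation cost.

```python
import numpy as np

def solve_thresholds(P0, P1, c1, c2, c, grid=2001, tol=1e-9):
    """Value iteration for the Bellman equation (8) on a quantized belief
    grid, followed by a threshold read-off in the spirit of (6)-(7).
    A sketch under our reconstruction, not the paper's exact procedure."""
    P0, P1 = np.asarray(P0, float), np.asarray(P1, float)
    p = np.linspace(0.0, 1.0, grid)                 # belief grid
    stop = np.minimum(c1 * (1.0 - p), c2 * p)       # best stopping cost at p
    V = np.tile(stop, (4, 1))                       # V[i, :] over the grid

    def continuation(V, i):
        # c*p + E[V(X_{n+1}, p_{n+1}) | X_n = i, p_n = p], using (1).
        cost = c * p
        for j in range(4):
            mix = p * P1[i, j] + (1.0 - p) * P0[i, j]   # Pr(next type = j)
            post = np.divide(p * P1[i, j], mix,
                             out=np.zeros_like(p), where=mix > 0)
            idx = np.clip(np.rint(post * (grid - 1)).astype(int), 0, grid - 1)
            cost = cost + mix * V[j, idx]
        return cost

    while True:
        V_new = np.array([np.minimum(stop, continuation(V, i)) for i in range(4)])
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new

    l, u = np.zeros(4), np.ones(4)
    for i in range(4):
        cont = continuation(V, i)
        news = np.flatnonzero((c2 * p <= cont) & (c2 * p <= c1 * (1.0 - p)))
        mis = np.flatnonzero((c1 * (1.0 - p) <= cont) & (c1 * (1.0 - p) <= c2 * p))
        l[i] = p[news.max()] if news.size else 0.0   # end of the 'news' region
        u[i] = p[mis.min()] if mis.size else 1.0     # start of the 'misinformation' region
    return l, u
```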

4. QuickStop: The Quickest Misinformation Detection Algorithm

From the results presented in the previous sections, we propose QuickStop, which includes the following components.


  • Training data: Our algorithm first needs labeled training data. The dataset should include a set of information spreading traces which are labeled as news or misinformation. Each user involved in the information trace has a feature vector. The information should also include the followee from whom a user retweeted the information.

  • Learning the information spreading model via the SVM classifier: Given the labeled data, we first train an SVM classifier that separates news from misinformation; the input to the SVM is the average edge feature vector of a trace, where the feature vector of an edge $(u, v)$ is $(z_u, z_v)$. After training, we use the classifier to assign each edge to one of four types based on its edge feature vector. Note that the SVM outputs a score between 0 and 1; in our experiments, we quantize the score into the four edge types (e.g., four equal-width bins over $[0, 1]$). From the transition probabilities learned in the previous step, we then calculate the thresholds $l_i$ and $u_i$ according to Theorem 3.2.

  • Quickest detection: When monitoring information spreading, the algorithm updates $p_n$ according to (1) each time an event occurs. The information is declared to be news when $p_n \leq l_{X_n}$ and misinformation when $p_n \geq u_{X_n}$.

We remark that this algorithm combines a data-driven approach, which learns the underlying probabilistic model of information spreading in networks, and a model-driven approach, which identifies misinformation in a timely manner with the quickest detection formulation.

QuickStop consists of two parts: QuickStop-Training and QuickStop-Detection, whose pseudo-code can be found in Algorithms 1 and 2, respectively.

Input: a set of information traces, where trace $k$ is a sequence of users $(v_{k,1}, v_{k,2}, \ldots)$ in posting order, together with a set of labels $\{L_k\}$, where $L_k$ is the label of trace $k$ (0: news, 1: misinformation).
1: For the $n$th user $v_{k,n}$ who posts the information in trace $k$, obtain the feature vector of the edge over which the retweet occurred: $e_{k,n} = (z_{u}, z_{v_{k,n}})$, where $u$ is the followee from whom $v_{k,n}$ retweeted.
2: Compute the average edge feature vector of each trace, $\bar{e}_k = \frac{1}{|k|}\sum_n e_{k,n}$, where $|k|$ is the cardinality of trace $k$.
3: Train the edge classifier $f$ using SVM with the training dataset $\{(\bar{e}_k, L_k)\}$.
4: Classify the edges in the traces: $X_{k,n} = f(e_{k,n})$.
5: Calculate the empirical transition probabilities $P^{(0)}$ and $P^{(1)}$.
6: Initialize $V(i, p) \leftarrow \min\{c_1(1-p),\ c_2 p\}$ on a belief grid; specify the quantization step size $\kappa$ and the convergence tolerance $\epsilon$.
7: repeat ▷ Solve the Bellman equation (8) using value iteration
8:     for each edge type $i$ and each grid point $p$ do
9:         $V'(i, p) \leftarrow \min\{c_1(1-p),\ c_2 p,\ c p + G(i, p)\}$, where $G$ is computed from the current $V$
10:    end for
11:    $\Delta \leftarrow \max_{i,p} |V'(i, p) - V(i, p)|$; $V \leftarrow V'$
12: until $\Delta < \epsilon$
13: Compute the thresholds $l_i$ and $u_i$ from (6) and (7).
Output: edge classifier $f$; transition probabilities $P^{(0)}, P^{(1)}$; thresholds $\{l_i, u_i\}$.
Algorithm 1 QuickStop-Training (Offline)
Input: an information trace, where $v_n$ is the $n$th user in the trace; edge classifier $f$; transition probabilities $P^{(0)}, P^{(1)}$; thresholds $\{l_i, u_i\}$.
1: Initialize $p \leftarrow \theta$, the prior of $H_1$ (misinformation), and $n \leftarrow 0$; $u_n$ denotes the followee from whom $v_n$ retweets.
2: repeat
3:     $n \leftarrow n + 1$
4:     Obtain the feature vector of the edge: $e_n = (z_{u_n}, z_{v_n})$
5:     $X_n \leftarrow f(e_n)$
6:     if $n > 1$ then update $p$ according to (1)
7: until $p \leq l_{X_n}$ or $p \geq u_{X_n}$
8: if $p \leq l_{X_n}$ then $\delta \leftarrow 0$ (news)
9: else $\delta \leftarrow 1$ (misinformation)
10: end if
Output: stopping time $N = n$; type of information $\delta$.
Algorithm 2 QuickStop-Detection (Online)
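
For concreteness, a Python rendering of QuickStop-Detection, assuming the `update_belief` helper sketched in Section 3 and a trained classifier, might look as follows; the fallback decision for an exhausted trace is our own addition.

```python
def quickstop_detect(edge_features, classify, P0, P1, l, u, theta):
    """Online QuickStop-Detection sketch: iterate over retweet-edge feature
    vectors, classify each into a type, update the belief via (1), and stop
    at the first threshold crossing. Returns (stopping time, declared type)."""
    p, x_prev, n = theta, None, 0
    for n, e in enumerate(edge_features, start=1):
        x = classify(e)
        if x_prev is not None:       # the first edge type has no predecessor
            p = update_belief(p, x_prev, x, P0, P1)
        x_prev = x
        if p <= l[x]:
            return n, 0              # declare news
        if p >= u[x]:
            return n, 1              # declare misinformation
    return n, int(p >= 0.5)          # trace exhausted: MAP fallback (our choice)
```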

4.1. Computational and Memory Complexities

In the training part, we use an SVM classifier on the information traces. In SVM, the feature space is obtained by using a mapping function and the hyperplane is determined by a set of support vectors; the dimension of the feature space depends on the mapping function. The computational complexity of training an SVM is at least quadratic in the number of training samples and may reach cubic.

The thresholds are calculated using the value iteration method. Let $\kappa$ be the quantization step size of the state $p$. During the value iteration, the number of iterations depends on the quantization precision. The computational complexity of each iteration is $O(1/\kappa)$, and the memory complexity is also $O(1/\kappa)$. This step is done offline.

For the online misinformation detection part, the computational complexity per observation and the memory complexity are both $O(1)$: the algorithm needs to store only 8 threshold values and 32 transition probabilities, and each update of the state requires only a few elementary operations.

5. Performance Evaluation with Real-World Datasets

We first evaluate the performance of QuickStop using the following real-world dataset.

The Weibo Dataset: Sina Weibo is a Chinese microblogging website similar to Twitter. The Weibo dataset we use is the one released in (ma2016detecting), which includes 4,664 labeled information traces from Sina's community management center (https://service.account.weibo.com). The dataset also includes user information, such as the number of followees, the number of followers, and the number of registration days, which are used as user features in our algorithm. We remove information traces whose sizes are small; in particular, we keep the traces in which the information was retweeted by the followers of at least 50 distinct users. We further balance the dataset by selecting 488 news traces and 488 misinformation traces. The average number of retweets per trace is 2,031; the largest trace includes 55,155 retweets and the smallest has 105 retweets.

We compared QuickStop with the following misinformation detection algorithms aimed at early detection: (i) a decision-tree-based method (castillo2011information), (ii) an SVM-based method with RBF kernel (yang2012automatic), and (iii) a linear SVM-based model for time-series data (ma2015detect). All three methods are feature-based classification algorithms that can take both user features and news content features as input, whereas QuickStop uses only user features. In the comparison, we implemented two versions of each algorithm: one with only user features (the same set of user features used in QuickStop) and the other with both user and content features (so more features than QuickStop). The six resulting algorithms are summarized below.


  • DTC-U: The Twitter information credibility method (castillo2011information) based on decision trees, with only user features.

  • DTC-UC: The Twitter information credibility method (castillo2011information) based on decision trees, with both user and content features.

  • SVM-RBF-U: The SVM-based method with RBF kernel (yang2012automatic), with only user features.

  • SVM-RBF-UC: The SVM-based method with RBF kernel (yang2012automatic), with both user and content features.

  • SVM-TS-U: The linear SVM-based method for time series (ma2015detect), with only user features.

  • SVM-TS-UC: The linear SVM-based method for time series (ma2015detect), with both user and content features.

We note that, except for QuickStop, all algorithms mentioned require a pre-determined number of observations as input; QuickStop, being an optimal stopping algorithm, decides the number of observations it needs in real time.

Performance Metrics: We considered the following performance metrics.


  • Accuracy: the fraction of traces that are correctly identified.

  • False positive rate: the fraction of news classified as misinformation.

  • False negative rate: the fraction of misinformation classified as news.

  • Detection time of news: the average number of events required to declare news.

  • Detection time of misinformation: the average number of events required to declare misinformation.

5.1. Numerical Results

Evolution of $p_n$ under QuickStop: Figure 3 illustrates the evolution of $p_n$ on two traces chosen from the Weibo dataset: one misinformation trace and one news trace. We can see that the upper threshold becomes smaller and the lower threshold becomes larger when we increase the propagation cost $c$ from 0.1 to 0.8, so the algorithm stops earlier with the larger propagation cost. It also takes fewer observations to declare misinformation than news: in the examples shown, 15 observations suffice to declare the misinformation while 23 observations are needed to declare the news. Similar trends can be observed on most of the traces.

Figure 3. Examples of the evolution of $p_n$ and the stopping time under QuickStop

Figures 4 and 5 summarize the performance of QuickStop and the other six algorithms. In Figure 4, QuickStop uses a fixed set of parameters, and the $x$-axis is the number of tweets used by the other six algorithms (the decision deadline), varying from 10 to 500; when a trace contains fewer observations than the decision deadline, the full trace is used as the input. In Figure 5, we varied the propagation cost $c$ of QuickStop from 0.05 to 1.2 with step size 0.05, while all six other algorithms used the full Weibo traces as input. The key observations are summarized below.


  • High Accuracy: Figure 5(a) shows that the accuracy of QuickStop, which uses only user features, is substantially higher than that of the other algorithms even when they use both user and content features. In particular, QuickStop reaches a higher accuracy than the other algorithms achieve with 500 observations, while using far fewer observations on average. Under QuickStop, as $c$ increases, the accuracy decreases but the number of observations used decreases as well, which is the trade-off between accuracy and speed.

  • Quick Detection: Quickest misinformation detection is the key objective of our algorithm. Figure 4 compares the accuracy of QuickStop with that of the other algorithms: QuickStop achieves an accuracy of 0.93 with a small average number of observations, while the accuracies of all six other algorithms remain below 0.90 even with 500 observations. Note that three of the six algorithms use content features, which QuickStop does not.

  • Low False Negative: In almost all cases, the false negative rate of QuickStop is lower than the false positive rate. This is because, with the asymmetric propagation cost, QuickStop is more aggressive in declaring misinformation than news in order to minimize the propagation cost.

Figure 4. Performance of Early Misinformation Detection under Different Decision Deadlines (based on the Weibo Data)
Figure 5. Performance of QuickStop under Different Choices of Parameter (based on the Weibo Data)

6. Evaluation with Synthetic Data

We further evaluate the algorithm with a synthetic network and synthetic information spreading data. We construct a network using the preferential attachment model (simon1955class). Our network includes two types of nodes, gossipers and messengers, where gossipers are more likely to spread misinformation than messengers. When a new node joins the network, it is assigned a type uniformly at random and then connects to three existing nodes, i.e., it forms three edges. For each edge, the new node first decides whether to connect to a node of the same type (with probability 0.7) or of the other type (with probability 0.3). After deciding the type, say a gossiper, the new node selects a node among all existing gossipers with probability proportional to their degrees. We define the edge types as follows: 0 - (messenger, messenger), 1 - (gossiper, messenger), 2 - (messenger, gossiper), and 3 - (gossiper, gossiper). We simulated the information spreading using the continuous-time SI model, creating a set of labeled traces for each set of parameters. The probabilities that an article is retweeted over a given edge type under the SI model are summarized in Table 3. For example, news spreads from a messenger to another messenger with probability 0.9 and from a gossiper to a messenger with probability 0.7, while misinformation spreads from a messenger to another messenger with probability 0.1 and from a gossiper to another gossiper with probability 0.9.

Edge type        0    1    2    3
News            0.9  0.7  0.3  0.1
Misinformation  0.1  0.2  0.7  0.9
Table 3. Probability of Information Spreading over Different Edge Types
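
A sketch of the trace generator described above, as a discrete-step approximation of the SI spreading; the adjacency-list format and helper names are our assumptions.

```python
import random

# Retweet probabilities from Table 3, indexed as SPREAD_PROB[label][edge_type];
# label 0 = news, label 1 = misinformation.
SPREAD_PROB = {0: [0.9, 0.7, 0.3, 0.1], 1: [0.1, 0.2, 0.7, 0.9]}

def edge_type(t_followee, t_follower):
    # Node type 0 = messenger, 1 = gossiper. Edge types per the text:
    # 0: (messenger, messenger), 1: (gossiper, messenger),
    # 2: (messenger, gossiper),  3: (gossiper, gossiper).
    return 2 * t_follower + t_followee

def simulate_trace(node_type, followers, label, source):
    """Discrete-step SI-style spread: each newly infected node tries once to
    infect each of its followers, with the edge-type probabilities above.
    Returns the sequence of observed edge types."""
    infected, frontier, trace = {source}, [source], []
    while frontier:
        u = frontier.pop()
        for v in followers.get(u, []):
            if v in infected:
                continue
            et = edge_type(node_type[u], node_type[v])
            if random.random() < SPREAD_PROB[label][et]:
                infected.add(v)
                frontier.append(v)
                trace.append(et)
    return trace
```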

The objective of this evaluation with the synthetic data is to assess the robustness of the online QuickStop-Detection to classification errors. With the synthetic data, the edge types are known, so we can control the edge classification errors by randomly flipping the edge types and then evaluate the performance of QuickStop-Detection as a function of the error rate.

Figure 6 shows the performance of QuickStop under different classification error rates. We introduced edge classification errors such that the type of an edge is correctly classified with probability $1 - \epsilon$ and misclassified with probability $\epsilon$, varying $\epsilon$ up to 0.5. In Figure 6, we used a fixed set of cost parameters for QuickStop. A sketch of the error-injection step follows the observation below.


  • Robust to Learning Errors: We can observe that even when 50% of the edges are misclassified, QuickStop still has an accuracy close to 91%, which demonstrates the robustness of the detection to modeling errors.
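
The error-injection step referenced above can be sketched as follows; how a misclassified edge type is redistributed among the other three types is our assumption.

```python
import random

def corrupt(trace, eps):
    """Flip each observed edge type with probability eps to one of the other
    three types, chosen uniformly at random (our reading of 'misclassified')."""
    return [random.choice([t for t in range(4) if t != x])
            if random.random() < eps else x for x in trace]
```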

Figure 6. Performance of QuickStop on synthetic data with classification errors

7. Related Work

As we pointed out at the beginning of the introduction, government, industry and academia have made great efforts to combat misinformation. This section focuses on new developments on misinformation detection with machine-learning and data-mining methods in the research community.

We have discussed several early detection algorithms and compared their performance with QuickStop; we now focus on other related work. The algorithm developed in (qazvinian2011rumor) detects whether a post is similar to one of the posts (topics) already known to be misinformation, and declares it misinformation if so. A line of work (takahashi2012rumor; hamidian2016rumor; hamidian2015rumor) analyzes similar models and knowledge/content-based detection algorithms. These approaches are effective for detecting whether a post is associated with misinformation that has already been identified, but they are not suitable for detecting new misinformation. (banko2007open; magdy2010web; ciampaglia2015computational; shi2016fact) exploit open fact-checking sources (such as DBpedia and Wikipedia) to validate the truthfulness of news articles. Viewpoints of users towards news articles, such as “like” and “dislike”, have also been used to infer the veracity of a news article. For example, (tacchini2017some) classifies Facebook posts as hoaxes or non-hoaxes based on the set of users who “liked” them, and (jin2016news) uses a topic model to discover viewpoint values from tweets and evaluates the credibility of relevant posts based on these viewpoints.

In (castillo2011information), a comprehensive data-mining approach was proposed for determining the veracity of social media content. The authors considered four categories of features (message-based, user-based, topic-based, and propagation-based) to study information credibility, and proposed a PageRank-like credibility analysis method to verify the credibility of Twitter events. The features used in (castillo2011information) have later been used in other papers (yang2012automatic; liu2015real; gupta2012evaluating). In (kwon2017rumor), the authors argued that features vary over time and reported that linguistic features are effective for detecting rumors even at the early stage of information spreading. A model for time-varying features was proposed in (ma2015detect). (wu2015false) explores the use of features of message propagation trees for detecting misinformation. (chua2016linguistic) analyzes six categories of features: comprehensibility, sentiment, time-orientation, quantitative details, writing style, and topic. (derczynski2017semeval) analyzes users' stance in their tweets to evaluate the credibility of information. (chang2016extreme) studies the characteristics of users who often post misinformation and proposes that, after identifying these users, a news article is likely to be misinformation if it spreads among them. (vosoughi2015automatic) proposes a misinformation detection algorithm based on dynamic time warping and hidden Markov models using three categories of features (linguistic, user identity, and temporal propagation features).

Users play the central role in information diffusion in social networks, and their social engagements, such as sharing, forwarding, and commenting, are considered auxiliary information for improving fake news detection. (tschiatschek2018fake) uses users' flags of fake news as signals and leverages the community for misinformation detection by learning the users' flagging accuracy. Online social network users who intentionally spread misinformation can be divided into three categories: (1) bots, software apps that run automated scripts (https://en.wikipedia.org/wiki/Internet_bot); (2) trolls, persons who like to provoke others (https://en.wikipedia.org/wiki/Internet_troll); and (3) cyborgs, accounts registered to run automated programs that mimic human behaviors (shu2017fake). (chen2018call; shao2017spread) analyze the behavior patterns of bots and trolls in misinformation propagation. In (chu2012detecting), an automated method is proposed for classifying users into the three categories mentioned above, and (morstatter2016new) studies bot detection. (shu2018understanding) analyzes the users' role in spreading information and concludes that (1) some specific users are more likely to believe misinformation than real news, and (2) these users have different features from other users; these two observations motivated the edge-based model considered in this paper. (abbasi2013measuring) proposes a method for measuring user credibility in information spreading for misinformation detection. The spread of rumors and misinformation has also been studied in (vosoughi2018spread; jin2013epidemiological; friggeri2014rumor), where it has been shown that misinformation and news have different spreading patterns and structures. In this paper, we consider both edge profiles (the edge classification) and spreading patterns (the Markovian spreading model) in QuickStop to design a highly efficient misinformation detection algorithm. Different from existing work, QuickStop is an optimal stopping algorithm that optimizes the number of observations in real time and makes the quickest decision on misinformation detection.

8. Conclusions

In this paper, we proposed a quickest misinformation detection algorithm named QuickStop. We formulated the problem as an optimal stopping problem with an asymmetric cost function towards misinformation, proved that the problem is a Markov optimal stopping problem, and showed, using martingale theory, that the solution is a threshold-based stopping rule. Our numerical results with real-world data demonstrated that QuickStop outperforms existing algorithms even though the latter use 10 times (sometimes 50 times) more observations and more features. Our numerical evaluation with synthetic data showed that the algorithm is robust to edge classification errors.

References

  • (1) Abbasi, M.-A., and Liu, H. Measuring user credibility in social media. In International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction (2013), Springer, pp. 441–448.
  • (2) Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., and Etzioni, O. Open information extraction from the web. In IJCAI (2007), vol. 7, pp. 2670–2676.
  • (3) Castillo, C., Mendoza, M., and Poblete, B. Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web (2011), ACM, pp. 675–684.
  • (4) Chang, C., Zhang, Y., Szabo, C., and Sheng, Q. Z. Extreme user and political rumor detection on Twitter. In International Conference on Advanced Data Mining and Applications (2016), Springer, pp. 751–763.
  • (5) Chen, T., Li, X., Yin, H., and Zhang, J. Call attention to rumors: Deep attention based recurrent neural networks for early rumor detection. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (2018), Springer, pp. 40–52.
  • (6) Chu, Z., Gianvecchio, S., Wang, H., and Jajodia, S. Detecting automation of twitter accounts: Are you a human, bot, or cyborg? IEEE Transactions on Dependable and Secure Computing 9, 6 (2012), 811–824.
  • (7) Chua, A. Y. K., and Banerjee, S. Linguistic predictors of rumor veracity on the Internet. In Proceedings of the International MultiConference of Engineers and Computer Scientists (IMECS) (2016), pp. 387–391.
  • (8) Ciampaglia, G. L., Shiralkar, P., Rocha, L. M., Bollen, J., Menczer, F., and Flammini, A. Computational fact checking from knowledge networks. PloS one 10, 6 (2015), e0128193.
  • (9) Derczynski, L., Bontcheva, K., Liakata, M., Procter, R., Hoi, G. W. S., and Zubiaga, A. Semeval-2017 task 8: Rumoureval: Determining rumour veracity and support for rumours. arXiv preprint arXiv:1704.05972 (2017).
  • (10) Friggeri, A., Adamic, L. A., Eckles, D., and Cheng, J. Rumor cascades. In ICWSM (2014).
  • (11) Gupta, M., Zhao, P., and Han, J. Evaluating event credibility on Twitter. In Proceedings of the 2012 SIAM International Conference on Data Mining (2012), SIAM, pp. 153–164.
  • (12) Hamidian, S., and Diab, M. Rumor identification and belief investigation on Twitter. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (2016), pp. 3–8.
  • (13) Hamidian, S., and Diab, M. T. Rumor detection and classification for Twitter data. In Proceedings of the Fifth International Conference on Social Media Technologies, Communication, and Informatics (SOTICS) (2015), pp. 71–77.
  • (14) Jin, F., Dougherty, E., Saraf, P., Cao, Y., and Ramakrishnan, N. Epidemiological modeling of news and rumors on twitter. In Proceedings of the 7th Workshop on Social Network Mining and Analysis (2013), ACM, p. 8.
  • (15) Jin, Z., Cao, J., Zhang, Y., and Luo, J. News verification by exploiting conflicting social viewpoints in microblogs. In AAAI (2016), pp. 2972–2978.
  • (16) Kwon, S., Cha, M., and Jung, K. Rumor detection over varying time windows. PloS ONE 12, 1 (2017), e0168344.
  • (17) Kwon, S., Cha, M., Jung, K., Chen, W., and Wang, Y. Prominent features of rumor propagation in online social media. In International Conference on Data Mining (2013), IEEE.
  • (18) Lazer, D. M., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., Metzger, M. J., Nyhan, B., Pennycook, G., Rothschild, D., et al. The science of fake news. Science 359, 6380 (2018), 1094–1096.
  • (19) Liu, X., Nourbakhsh, A., Li, Q., Fang, R., and Shah, S. Real-time rumor debunking on Twitter. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (2015), ACM, pp. 1867–1870.
  • (20) Ma, J., Gao, W., Mitra, P., Kwon, S., Jansen, B. J., Wong, K.-F., and Cha, M. Detecting rumors from microblogs with recurrent neural networks. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (2016), IJCAI'16, AAAI Press, pp. 3818–3824.
  • (21) Ma, J., Gao, W., Wei, Z., Lu, Y., and Wong, K.-F. Detect rumors using time series of social context information on microblogging websites. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (2015), ACM, pp. 1751–1754.
  • (22) Ma, J., Gao, W., and Wong, K.-F. Detect rumors in microblog posts using propagation structure via kernel learning. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2017), vol. 1, pp. 708–717.
  • (23) Ma, J., Gao, W., and Wong, K.-F. Rumor detection on twitter with tree-structured recursive neural networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2018), vol. 1, pp. 1980–1989.
  • (24) Magdy, A., and Wanas, N. Web-based statistical fact checking of textual documents. In Proceedings of the 2nd international workshop on Search and mining user-generated contents (2010), ACM, pp. 103–110.
  • (25) Morstatter, F., Wu, L., Nazer, T. H., Carley, K. M., and Liu, H. A new approach to bot detection: striking the balance between precision and recall. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2016), IEEE Press, pp. 533–540.
  • (26) Poor, H. V., and Hadjiliadis, O. Quickest detection, vol. 40. Cambridge University Press Cambridge, 2009.
  • (27) Qazvinian, V., Rosengren, E., Radev, D. R., and Mei, Q. Rumor has it: Identifying misinformation in microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (2011), Association for Computational Linguistics, pp. 1589–1599.
  • (28) Shao, C., Ciampaglia, G. L., Varol, O., Flammini, A., and Menczer, F. The spread of fake news by social bots. arXiv preprint arXiv:1707.07592 (2017), 96–104.
  • (29) Shi, B., and Weninger, T. Fact checking in heterogeneous information networks. In Proceedings of the 25th International Conference Companion on World Wide Web (2016), International World Wide Web Conferences Steering Committee, pp. 101–102.
  • (30) Shu, K., Sliva, A., Wang, S., Tang, J., and Liu, H. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter 19, 1 (2017), 22–36.
  • (31) Shu, K., Wang, S., and Liu, H. Understanding user profiles on social media for fake news detection. In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) (2018), IEEE, pp. 430–435.
  • (32) Simon, H. A. On a class of skew distribution functions. Biometrika 42, 3/4 (1955), 425–440.
  • (33) Tacchini, E., Ballarin, G., Della Vedova, M. L., Moret, S., and de Alfaro, L. Some like it hoax: Automated fake news detection in social networks. arXiv preprint arXiv:1704.07506 (2017).
  • (34) Takahashi, T., and Igata, N. Rumor detection on Twitter. In Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS), 2012 Joint 6th International Conference on (2012), IEEE, pp. 452–457.
  • (35) Tschiatschek, S., Singla, A., Gomez Rodriguez, M., Merchant, A., and Krause, A. Fake news detection in social networks via crowd signals. In Companion of the The Web Conference 2018 on The Web Conference 2018 (2018), International World Wide Web Conferences Steering Committee, pp. 517–524.
  • (36) Vosoughi, S. Automatic detection and verification of rumors on Twitter. PhD thesis, Massachusetts Institute of Technology, 2015.
  • (37) Vosoughi, S., Roy, D., and Aral, S. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151.
  • (38) Wu, K., Yang, S., and Zhu, K. Q. False rumors detection on Sina Weibo by propagation structures. In Data Engineering (ICDE), 2015 IEEE 31st International Conference on (2015), IEEE, pp. 651–662.
  • (39) Yang, F., Liu, Y., Yu, X., and Yang, M. Automatic detection of rumor on Sina Weibo. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics (2012), ACM, p. 13.
  • (40) Zhao, Z., Resnick, P., and Mei, Q. Enquiring minds: Early detection of rumors in social media from enquiry posts. In Proceedings of the 24th International Conference on World Wide Web (2015), International World Wide Web Conferences Steering Committee, pp. 1395–1405.

Appendix A Proof of Theorem 3.1

We first show that $\mathbb{E}[N \mathbb{1}_{H_1}] = \mathbb{E}[\sum_{n=0}^{N-1} p_n]$ when $N$ is a stopping time. Writing $N \mathbb{1}_{H_1} = \sum_{n=0}^{\infty} \mathbb{1}_{\{n < N\}} \mathbb{1}_{H_1}$ and noting that $\{n < N\}$ is determined by $X_1, \ldots, X_n$ since $N$ is a stopping time with respect to $\{X_n\}$, we further have

$$\mathbb{E}\left[\mathbb{1}_{\{n < N\}}\, \mathbb{1}_{H_1}\right] = \mathbb{E}\left[\mathbb{1}_{\{n < N\}}\, \mathbb{E}[\mathbb{1}_{H_1} \mid X_1, \ldots, X_n]\right] = \mathbb{E}\left[\mathbb{1}_{\{n < N\}}\, p_n\right].$$

Therefore, we have

$$\mathbb{E}[N \mathbb{1}_{H_1}] = \sum_{n=0}^{\infty} \mathbb{E}\left[\mathbb{1}_{\{n < N\}}\, p_n\right] = \mathbb{E}\left[\sum_{n=0}^{N-1} p_n\right].$$

For any stopping time $N$, it is well known (see, for example, (poor2009quickest)) that

$$\min_{\delta}\ \mathbb{E}\left[c_1 \mathbb{1}_{\{\delta=1\}}\mathbb{1}_{H_0} + c_2 \mathbb{1}_{\{\delta=0\}}\mathbb{1}_{H_1}\right] = \mathbb{E}\left[\min\{c_1(1 - p_N),\ c_2\, p_N\}\right].$$

We next present the proof tailored to our problem for the completeness of the paper. Note that the equation is obvious when $p_N = 0$ or $p_N = 1$, so we only consider the case $0 < p_N < 1$. Recall that

$$\mathbb{E}\left[c_1 \mathbb{1}_{\{\delta=1\}}\mathbb{1}_{H_0} + c_2 \mathbb{1}_{\{\delta=0\}}\mathbb{1}_{H_1}\right] = \mathbb{E}\left[c_1 \mathbb{1}_{\{\delta=1\}}(1 - p_N) + c_2 \mathbb{1}_{\{\delta=0\}}\, p_N\right] \geq \mathbb{E}\left[\min\{c_1(1 - p_N),\ c_2\, p_N\}\right],$$

where the inequality becomes an equality when the algorithm declares $\delta = 1$ when $c_1(1 - p_N) \leq c_2\, p_N$ and declares $\delta = 0$ otherwise.

Appendix B Proof of Theorem 3.2

We define the following value function for $n \geq 1$:

$$V_n(x, p) = \min_{N \in \mathcal{T},\, N \geq n}\ \mathbb{E}\left[\min\{c_1(1 - p_N),\ c_2\, p_N\} + c \sum_{m=n}^{N-1} p_m \ \Big|\ X_n = x,\ p_n = p\right].$$

Then $V_n(x, p)$ is the minimum expected total cost if one is only allowed to stop at or after time step $n$, given the state at $n$. Note that $V_n(x, p) \geq 0$. The minimum expected total cost over the prior is then

$$c\,\theta + \frac{1}{4}\sum_{x=0}^{3} V_1(x, \theta),$$

where we use the fact that $p_1 = \theta$, as the first observation does not provide any information about the type of the information.

Now, according to the optimality principle of dynamic programming,

$$V_n(x, p) = \min\left\{c_1(1-p),\ c_2\, p,\ c\, p + \mathbb{E}\left[V_{n+1}(X_{n+1}, p_{n+1}) \mid X_n = x,\ p_n = p\right]\right\},$$

where $\{p_n\}$ is the random process defined by the recursion in equation (1).

We next show that $\{p_n\}$ is a martingale with respect to $\{X_n\}$. Define $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$, the $\sigma$-algebra generated by $X_1, \ldots, X_n$. We have

$$\mathbb{E}[p_{n+1} \mid \mathcal{F}_n] = \sum_{j} \Pr(X_{n+1} = j \mid \mathcal{F}_n)\ \frac{p_n\, P^{(1)}_{X_n j}}{p_n\, P^{(1)}_{X_n j} + (1 - p_n)\, P^{(0)}_{X_n j}}.$$

Since

$$\Pr(X_{n+1} = j \mid \mathcal{F}_n) = p_n\, P^{(1)}_{X_n j} + (1 - p_n)\, P^{(0)}_{X_n j},$$

we have

$$\mathbb{E}[p_{n+1} \mid \mathcal{F}_n] = \sum_{j} p_n\, P^{(1)}_{X_n j} = p_n.$$

Iterating this identity gives $\mathbb{E}[p_m \mid \mathcal{F}_n] = p_n$ for all $m \geq n$. In other words, because the posterior probability $\{p_n\}$ is a martingale with respect to the observations, every time step passed before the time when one is allowed to stop and make a decision incurs a constant expected additive cost of $c\, p_n$ to the minimum expected total cost.

Now define

$$V(x, p) = \min\left\{c_1(1-p),\ c_2\, p,\ c\, p + \mathbb{E}\left[V(X_{n+1}, p_{n+1}) \mid X_n = x,\ p_n = p\right]\right\}. \qquad (9)$$

Then, for any $n$, $V_n(x, p) = V(x, p)$; that is, the value function, and hence the optimal stopping rule, is time-invariant, which yields the threshold structure stated in Theorem 3.2.