1 Introduction
In recent years, users of social media and microblogging platforms have played major roles in disseminating online content and news stories [67, 22]. While some stories are based on factual information, a non-negligible portion of the news contains false information that often stirs unnecessary disputes, sways people's opinions, gives rise to political polarization, and even causes financial losses in the stock market [28, 5, 46, 6, 52].
In reaction to this, meaningful academic progress has been made to uncover the spreading patterns of true and false news in social networks [59, 71, 32], detect misinformation from content, network, and temporal information [31, 38, 39, 40, 73, 53], fact-check early and curtail the spread of falsity [27, 19, 65, 4], build fact-checking systems [47, 66], and investigate the potential intervention of user biases in the process of fact-checking [3]. Also, recent work by Vosoughi et al. conducts an in-depth investigation into the differences between true and false information in the aspects of topics, cascade patterns and sizes, diffusion speeds, emotion, and sentiment [67]. Another recent paper by Del Vicario et al. suggests that homogeneity among users is the main driving factor of the dissemination of both true and false news [12].
Motivated by such progress, we propose the Homogeneity-Based Transmissive Process (HBTP), a Bayesian nonparametric model that captures the complex interplay between the textual content and topics of true and false news and the topical interests of users who share the news in social networks. Specifically, our model is operationalized as follows:

Each user is assigned a user interest, which is computed from the collective content of the news stories that the user shares.

Each news story is assigned a homogeneity index. We define this document-centric (rather than user-centric) homogeneity index to be the degree of uniformity of user interests among the users who share the story.
Under our modeling assumption, a user's topical interest is transmitted to another user through the process of co-sharing a news story in a social network, and the degree of transmission between users depends on the homogeneity index of the news story that the users co-share. Thus, suppose there are two news stories, one with a high homogeneity index and the other with a low index. The news story with the high index will propagate through a highly biased subgroup within a social network, while the propagation of the news story with the low homogeneity index will look more like a random walk.
To formulate such modeling assumptions we design our HBTP model as follows:

Within the nonparametric topic modeling framework [60], we model the content of news stories from the topical interests of users who participate in diffusing the news story.

We formulate each user's topical interest as a single probability measure in two-layer Dirichlet processes [18] and allow it to be drawn not only from the single, upper-level probability measure but also from multiple probability measures of the preceding users. Also, we adopt the Gamma process construction of the HDP [49], with a modification to incorporate the homogeneity index and regulate the homogeneity among user interests.
We combine nonparametric topic models with Bayesian Gaussian process latent variable models (Bayesian GPLVMs) [63] to infer the homogeneity index of a news story from the users’ and the stories’ topics in a highly nonlinear fashion.
Since both the content modeling and the homogeneity discovery parts of our model fall within the Bayesian nonparametric framework, the posterior inference of our model can be done using variational Bayes methods [23]. We train our model on a real-world Twitter dataset [36, 37, 39] that consists of users' posting and sharing of true and false news stories. Our main contributions and findings through the experiments are as follows:

We develop HBTP, which jointly models the contents of news stories and user sharing events. Our model discovers latent topics, users' topical interests, and the homogeneity indices of news stories; these mutually reshape one another in the joint modeling.

We find that the homogeneity indices discovered by our model vary significantly according to the genuineness labels of the stories.

We develop a supervised extension of HBTP (sHBTP) that incorporates the genuineness of news stories in the training of the model, and conduct a classification task of predicting the genuineness of news stories in the test set. We compare our predictions with state-of-the-art fake news detection models and with other recently proposed supervised topic models, and show that our model outperforms the comparison models on most evaluation metrics.

We make our code and data (Twitter data augmented with the news articles that the tweets refer to) publicly available for future research (https://github.com/todoaskit/HBTP).
Related work.
By incorporating the label information of the news stories, the supervised formulation of our model performs the classification task of predicting the labels of news stories. There are feature-based methods that leverage text, social networks, temporal traces, and propagation models to classify true and false news in a supervised fashion [39, 50, 20, 29, 72, 70, 31, 32]. Also, another line of research focuses on devising algorithms that mitigate false news and its diffusion in social networks [9, 64, 48, 27, 17]. From the modeling aspect, there are upstream models in the parametric topic modeling literature that generate the topic indices of a document from the topic distributions of a set of entities [54, 45, 14, 43, 44]. HBTP falls into this category in the sense that the set of users who share a news story participates in modeling its content. Also, to generate both the homogeneity values and the labels of news stories, we incorporate the Bayesian GPLVM [63, 11] in our model within the nonparametric topic modeling framework. Bayesian GPLVM extends GPLVM [33, 34] by incorporating additional priors and conducting posterior inference using variational methods [23, 68]. Kandemir et al. recently proposed GPSTM, a joint model between LDA and Bayesian GPLVM, and demonstrated the performance gain of jointly modeling LDA and GPLVM over linear classifiers [24]. Lastly, while our model focuses on the joint modeling of text and users' sharing patterns [26], textual data has also been aligned with users' temporal traces to model diffusion patterns and text [16, 21] and to cluster documents in both discrete [1] and continuous [15, 69, 41] time domains. These models can be complementary to ours and could be co-trained to increase predictive power in detecting false news.
2 Background on the Gamma process construction of HDP
The hierarchical Dirichlet process (HDP) for modeling document collections is a two-level Dirichlet process (DP) [18] that functions as a nonparametric Bayesian prior for mixed-membership models [60]. Being nonparametric, the model does not require the number of topics to be specified a priori; instead, it can be inferred from the data [60].
In the HDP topic model, the random probability measure drawn from the first-level Dirichlet process becomes the base measure for the second-level Dirichlet processes:

$$G_0 \sim \mathrm{DP}(\gamma, H), \qquad G_d \mid G_0 \sim \mathrm{DP}(\alpha, G_0),$$

where $d$ is the document index, $H$ is the base measure, and $\gamma$ and $\alpha$ are the first- and the second-level DP concentration parameters, respectively. $H$ is a Dirichlet distribution on the vocabulary simplex, and the atoms of $G_0$ are an infinite set of word-topic distributions $\{\phi_k\}_{k=1}^{\infty}$ drawn from $H$.

The first- and the second-level Dirichlet processes yield random discrete probability measures $G_0 = \sum_{k=1}^{\infty} p_k \delta_{\phi_k}$ and $G_d = \sum_{k=1}^{\infty} \pi_{dk} \delta_{\phi_k}$. Here, the weights $p_k$ and $\pi_{dk}$ depend on the corresponding concentration parameters $\gamma$ and $\alpha$, and the atoms are drawn from the base probability measure: $\phi_k \sim H$. Finally, each second-level $G_d$ generates its associated topic-indicator variables, and each of the $N_d$ words of document $d$ is drawn as follows:

$$z_{dn} \sim \mathrm{Discrete}(\pi_d), \qquad w_{dn} \sim \mathrm{Discrete}(\phi_{z_{dn}}), \qquad n = 1, \dots, N_d.$$
Following the work of Paisley et al. [49], we constructively define the first- and the second-level Dirichlet processes using the stick-breaking process [57] and normalized Gamma processes [49]. For the first-level $G_0$, we represent the weights as

$$p_k = V_k \prod_{j=1}^{k-1} (1 - V_j), \qquad V_k \sim \mathrm{Beta}(1, \gamma).$$

The second-level probability measures are conditionally distributed on the first-level random probability measure $G_0$, and the corresponding weights can be represented as

$$\pi_{dk} = \frac{z_{dk}}{\sum_{j=1}^{\infty} z_{dj}}, \qquad z_{dk} \sim \mathrm{Gamma}(\alpha p_k, 1).$$
3 Homogeneity-Based Transmissive Process
In this section, we introduce the modeling scheme of the homogeneity-based transmissive process. We start by specifying the kinds of data structures the model works on. Next, we describe the generative process of the first- and second-level Dirichlet processes, the transmissive property used when generating the second-level probability measures, and how the homogeneity indices of news stories regulate those probability measures. We then explain how we generate the content of a news story as a mixture of users' probability measures and illustrate how we draw the homogeneity indices of news stories. Lastly, we discuss the parametric and supervised formulations of our model.
3.1 Event representation
In social networks, users introduce news stories, and these stories are propagated throughout the network by means of sharing (e.g., tweets and retweets). We first represent each such tweet or retweet event in a social network as a triplet

$$e = (u, u', s).$$

Here, we denote the set of events as $\mathcal{E}$. The first item $u$ is the user who created the tweet/retweet, and the second item $u'$ is the user who created the preceding tweet if the event is a retweet (and is empty otherwise). We assume that there are $U$ users and denote the set of users as $\mathcal{U}$. The last item $s$ is a news story. We represent each news story as a triplet

$$s = (\mathbf{w}, \eta, y),$$

where $\mathbf{w}$ is a bag-of-words representation of the content of the story, $\eta$ is its homogeneity index, and $y$ is the label of the story, e.g., true or false. We remark that the homogeneity index is hidden and will be uncovered by our model. There are $S$ stories, and the set of stories is denoted as $\mathcal{S}$.
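The event and story triplets above can be sketched as plain data structures; the class and field names below are illustrative, not taken from the released code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Story:
    words: List[str]                      # bag-of-words content of the story
    homogeneity: Optional[float] = None   # hidden index, inferred by the model
    label: Optional[str] = None           # e.g., "true", "false", "non-rumor", "unverified"

@dataclass
class Event:
    user: str                   # user who created this tweet/retweet
    predecessor: Optional[str]  # user whose tweet was retweeted; None for an original tweet
    story: Story

def preceding_users(events: List[Event], user: str) -> set:
    """The set of users preceding `user` over all of her (re)tweet events."""
    return {e.predecessor for e in events
            if e.user == user and e.predecessor is not None}
```

The `preceding_users` helper mirrors the predecessor sets used in the next subsection when transmitting probability measures between users.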
3.2 Modeling the first- and the second-level DPs
Our model inherits the two-level DP construction of the HDP, and the first-level DP is formulated identically to that of the HDP: we draw a probability measure $G_0$ from a DP with base measure $H$ and concentration parameter $\gamma$. Using the stick-breaking construction [57], we represent the first-level probability measure as $G_0 = \sum_{k=1}^{\infty} p_k \delta_{\phi_k}$, with $p_k = V_k \prod_{j<k} (1 - V_j)$ and $V_k \sim \mathrm{Beta}(1, \gamma)$.
Transmissive modeling of user DPs. For the second level, we endow each user in a social network, instead of each news story, with a single DP. The key idea is that each user's probability measure is transmitted to the users who retweeted her (re)tweet, and the degree of transmission, i.e., the similarity between the two users' probability measures, depends on the homogeneity index of the news story the users are sharing. To formalize this, for each user $u$, we first construct $\mathcal{E}_u$, the set of events whose first element is $u$. From $\mathcal{E}_u$ we retrieve $\mathcal{P}_u$, the set of users preceding $u$, and denote the set of news stories shared by both $u$ and a preceding user $u'$ as $\mathcal{S}_{u,u'}$.

When a user has no predecessor, i.e., $\mathcal{P}_u = \emptyset$, the user's probability measure $G_u$ is drawn from the DP with base measure $G_0$ via the normalized Gamma construction with shape parameters $\alpha p_k$. Recall that $p_k$ is the weight of the $k$-th atom of the first-level probability measure.

If the user has preceding users, for each $u' \in \mathcal{P}_u$ we draw $G_{u \leftarrow u'}$, which is user $u$'s probability measure transmitted by the preceding user $u'$. Denoting the homogeneity index of story $s$ as $\eta_s$, $G_{u \leftarrow u'}$ is drawn from the DP with base measure $G_{u'}$ and a concentration modulated by the homogeneity indices of the shared stories, as follows:

$$\pi_{u \leftarrow u', k} = \frac{z_{u \leftarrow u', k}}{\sum_j z_{u \leftarrow u', j}}, \qquad z_{u \leftarrow u', k} \sim \mathrm{Gamma}\Big(\beta\, \pi_{u'k} \prod_{s \in \mathcal{S}_{u,u'}} \eta_s,\ 1\Big), \qquad (1)$$

where $\pi_{u'k}$ is the weight of the $k$-th atom of the preceding user $u'$'s probability measure. Here, we posit that there are no circular retweets among users. Finally, denoting the weight of the $k$-th atom of $G_{u \leftarrow u'}$ as $\pi_{u \leftarrow u', k}$, we constructively define user $u$'s probability measure $G_u$ as a mixture of the transmitted measures $G_{u \leftarrow u'}$. Note that $G_u$, derived from multiple $G_{u \leftarrow u'}$s, is still a probability measure, as it meets the countable additivity property and the weights of the atoms lie in the unit interval [55].
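The role of the homogeneity index in this construction can be checked numerically: normalizing independent Gamma draws whose shape parameters are a common multiple of a base weight vector is equivalent to sampling from a Dirichlet distribution, so scaling the shapes up leaves the mean unchanged while concentrating the samples around the preceding user's weights. A toy sketch (not the model's actual inference):

```python
import numpy as np

rng = np.random.default_rng(0)
base = np.array([0.5, 0.3, 0.2])  # a preceding user's atom weights (toy)

def transmitted_weights(scale, n=20000):
    """Normalize independent Gamma draws with shapes scale * base_k;
    equivalently, samples from Dirichlet(scale * base)."""
    z = rng.gamma(scale * base, 1.0, size=(n, base.size))
    return z / z.sum(axis=1, keepdims=True)

low = transmitted_weights(1.0)    # low homogeneity: diffuse around the base weights
high = transmitted_weights(50.0)  # high homogeneity: tightly concentrated on them
```

Both sample sets have (empirical) means matching `base`, but the high-homogeneity samples have a much smaller variance, mirroring the transmission behavior described above.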
Remark 1. We modify the shape parameter of the Gamma draws in Equation (1) to incorporate the effect of the homogeneity of news stories on the transmission of the probability measure. This approach differs from past approaches, which modify the rate parameter to achieve topic-wise rescaling [49, 25]. In our model, changing the rate parameter would be ineffective: the expected impact of the homogeneity index is identical for all topics, and scaling the rate parameter by the same multiplier for all topics yields the same outcome as omitting the multiplier, since it cancels in the normalization step of the Gamma process. For this particular reason, the rate parameters in Gamma processes disappear when used to represent a DP [13]. Conversely, by scaling the shape parameter by a common factor greater than one, the mean of the normalized Gamma process remains unchanged while its variance is reduced. Therefore, when a probability measure of a user is transmitted through a news story with a high homogeneity index, the probability measure of the user will be more similar to the preceding user's probability measure. Finally, note that since the rate parameter remains unchanged, our formulation of the second-level probability measures can be represented using the stick-breaking process. We use the normalized Gamma process representation to enjoy the merit of generating the Gamma variables independently, which is useful for approximate posterior inference.

3.3 Modeling news stories using mixtures of user DPs
We model the contents of a news story using the probability measures of the users who participated in propagating the story. Multiple previous parametric topic models adopt this upstream approach of modeling a document using probability measures from multiple sources [54, 45, 14, 43, 44]. Here, using the event set, we define $\mathcal{U}_s$ as the set of users who propagated story $s$. Then, following the work of Rosen-Zvi et al. [54], for each word, we sample a user $u$ uniformly from $\mathcal{U}_s$ and draw the topic indicator variable and the word from user $u$'s measure as follows:
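This per-word sampling step can be sketched with toy user measures and word-topic distributions; the helper and its inputs below are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_story_words(user_measures, word_topic, n_words):
    """For each word: pick a sharing user uniformly, draw a topic from her
    measure, then draw the word from that topic's word distribution.

    user_measures: dict user -> topic proportions (length K)
    word_topic:    (K, V) array of word-topic distributions
    """
    users = list(user_measures)
    K, V = word_topic.shape
    words = []
    for _ in range(n_words):
        u = users[rng.integers(len(users))]          # user drawn uniformly from the sharers
        z = rng.choice(K, p=user_measures[u])        # topic indicator from the user's measure
        words.append(int(rng.choice(V, p=word_topic[z])))  # word from the topic's distribution
    return words

word_topic = np.array([[0.9, 0.1],
                       [0.1, 0.9]])                  # K=2 topics over V=2 word types
measures = {"A": np.array([1.0, 0.0]),
            "B": np.array([0.5, 0.5])}
story_words = generate_story_words(measures, word_topic, 200)
```

With these toy inputs, word 0 dominates because both users place most of their mass on topic 0, which in turn favors word 0.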
Generating homogeneity indices of stories. In Equation (1), we assume that a user's probability measure is generated from her preceding user's probability measure, and that the similarity of the two probability measures is calibrated by the homogeneity index of a news story that bridges the two users. Here, the homogeneity index of a news story is generated from $\bar{z}_s$, a normalization of the story's one-hot topic-indicator vectors in the topic space. Following the Bayesian Gaussian process latent variable model (Bayesian GPLVM) [63, 11], we first generate a hidden input vector $x_s$ from a Gaussian distribution centered at $\bar{z}_s$, and draw latent function values $f$ from the Gaussian distribution centered at zero with the covariance matrix $K_{XX}$ constructed from the pairings of the news stories' hidden inputs. Finally, the homogeneity index $\eta_s$ for a news story is a noisy observation of $f_s$ with Gaussian noise. The formal constructions are as follows:

$$x_s \sim \mathcal{N}(\bar{z}_s, \lambda^{-1} I), \qquad f \mid X \sim \mathcal{N}(\mathbf{0}, K_{XX}), \qquad \eta_s \sim \mathcal{N}(f_s, \sigma^2). \qquad (2)$$
We use the squared exponential kernel for the covariance matrix.
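A small sketch of this GP construction with a squared exponential kernel follows; the dimensions are toy values, and the jitter term added to the covariance is a standard numerical safeguard, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_kernel(X, Y, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel matrix between the rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

X = rng.normal(size=(5, 3))           # hidden inputs for 5 stories in a 3-d latent space
K = se_kernel(X, X)                   # story-by-story covariance
f = rng.multivariate_normal(np.zeros(5), K + 1e-8 * np.eye(5))  # latent values f ~ N(0, K)
h = f + rng.normal(scale=0.1, size=5) # noisy homogeneity observations
```

Stories whose hidden inputs are close in the latent space receive highly correlated latent function values, and hence similar homogeneity indices.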
Supervised HBTP. HBTP can be used for supervised learning, e.g., to predict the genuineness of news stories. Similar to drawing the homogeneity indices, we draw the labels of the stories from the same hidden input vector. The difference is that the homogeneity indices are hidden in the model and have to be inferred, while the labels are given in the training data.

Parametric counterpart of HBTP.
There is much overlap in the modeling scheme when we develop a parametric version of HBTP. The major differences are that, in the parametric model, (1) we predefine the number of topics, and (2) we use the logistic normal prior [2] for the user-topic distributions used to capture the transmissive property among users. With the parametric HBTP we sacrifice model flexibility, but we gain an algorithmic benefit: the homogeneity variables of news stories can be inferred with closed-form updates involving the Lambert W function. Refer to the Appendix for the generative process of parametric HBTP.

Remark 2. In our model, we jointly model the Bayesian GPLVM with a nonparametric topic model, which provides both computational and modeling benefits. First, incorporating the GPLVM with hidden inputs reduces the time complexity of drawing function values from $O(S^3)$ to $O(SM^2)$, where $S$ is the number of stories and $M$ is the number of auxiliary inducing variables [10, 58, 62, 56]. On the modeling side, we can model multi-dimensional or categorical labels from the topic indicator variables without directly conditioning on the document-topic simplex, which has been pointed out as one of the weaknesses of the supervised topic model [42] compared to labeled LDA [51]. Finally, we remark that when computing vector-wise norms, we resort to a truncated approximation method [7, 30, 61] to predefine the upper bound on the number of topics used.
4 Posterior Inference of HBTP
We derive a joint variational inference algorithm to approximate the posterior of HBTP. Since both the text modeling and the homogeneity generation of HBTP fall within the Bayesian nonparametric framework, we draw on the well-established literature on variational inference for nonparametric topic models [7, 30, 61] and Bayesian GPLVMs [11]. We derive the log joint distribution of HBTP as:
The log variational distribution of HBTP can be factorized into
with the truncation value $T$. We specify the variational distributions for the topic-related latent variables with their variational parameters as
To specify the variational distributions for the homogeneity-related variables, we introduce auxiliary inducing variables with corresponding inducing input locations, and express the variational distributions for the hidden input vectors and noise-free GP latent functions as
where we assume independence among topics. Here we set the number of inducing variables to be drastically smaller than the number of stories to increase the inference speed. Finally, we specify the variational distributions for the topic variables as follows:
where we use the delta function for both simplicity and tractability in the inference steps, as demonstrated in the work of Liang et al. [35]. Given the evidence, the task of approximating the variational distribution with respect to the original posterior is equivalent to minimizing the KL divergence between the joint distribution and the factorized variational distribution, or maximizing the evidence lower bound. The main derivations come from the expectations of the log-probability of the latent variables with respect to the variational distribution, and the optimization is done using the coordinate ascent algorithm. For the remainder of the section, we report the updating formulas for the latent variables that are unique to HBTP compared to previous nonparametric topic models and GPLVMs. Refer to the Appendix for the full updates.
Story-level updates. For the story-level latent variables, the hidden input vector sits at the intersection of the topic modeling part and the GP part, and its update is
Also, the update for the homogeneity variable is
where the full expression for the matrix is specified in the Appendix.
User-level updates. Updating the user-level variables is done as follows:
where and we simplify to .
Computational complexity. For text modeling, HBTP has time complexity $O(E\bar{N}T)$, where $E$ is the number of events, $\bar{N}$ the average length of a story (in words), and $T$ the truncation size. The time complexity for the homogeneity variables is $O(SM^2)$.
5 Experiments
We first investigate the interplay among news stories' content, homogeneity indices, and users' topical interests modeled in nonparametric HBTP, and examine how the homogeneity indices of news stories regulate the alignment of topical interests among the users who tweet/retweet the stories. Through this process, we find an interesting relationship between the homogeneity indices and the labels (genuineness) of news stories even though the labels are not observed during training. Motivated by this finding, we demonstrate that the supervised version of our model (sHBTP) detects the labels of news stories better than state-of-the-art comparison models.
Dataset. We use an extension of the Twitter rumor dataset previously used in rumor detection research [36, 37, 39, 40]. The original dataset contains, for each tweet, (1) the tweet ID, (2) the URL of the news story, (3) users' tweet and retweet logs, and (4) a label, which is either true (T), false (F), non-rumor (NR), or unverified (U). The "true", "false", and "unverified" stories are potential "rumors" that need to be fact-checked by debunking websites such as snopes.com to inspect their genuineness. "True" stories are the ones that turned out to be genuine after fact-checking, "false" stories are the ones that contain misinformation, and "unverified" stories are the ones that cannot be conclusively judged even after human evaluation. Finally, "non-rumor" stories are the ones that do not need fact-checking because, e.g., they came from reliable sources.
For preprocessing, we exclude the news stories whose content cannot be retrieved with the Twitter API. Then, we remove "leaf" users, those who shared a story once and sit at the leaves of the diffusion cascades, since they do not affect the calculation of the homogeneity indices and thus the outcome of our modeling. Also, we confirm that there are no "circular" retweets among users regarding news stories, so that our graphical model remains a directed acyclic graph (DAG). After the preprocessing, there are 79,416 users and 175,389 tweets and retweets. There are 1,107 news stories in total, containing 12,515 unique tokens; they divide into 289 true, 262 false, 371 non-rumor, and 185 unverified stories. Finally, for the prediction task, we conduct 5-fold cross-validation on the Twitter dataset.
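The fold construction for the evaluation can be sketched as follows, using the label counts reported above; this is a plain random split for illustration, and the actual folds used in the experiments may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Label counts for the 1,107 preprocessed news stories.
labels = (["true"] * 289 + ["false"] * 262
          + ["non-rumor"] * 371 + ["unverified"] * 185)

def five_fold_indices(n_stories, k=5):
    """Shuffle story indices and split them into k near-equal folds."""
    return np.array_split(rng.permutation(n_stories), k)

folds = five_fold_indices(len(labels))
```

Each fold serves once as the test set while the remaining four are used for training.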
Parameter settings. For HBTP and sHBTP, we set the topic Dirichlet prior to 0.1. For the GPLVM in the model, we use 50 inducing points for the homogeneity variable and the labels, set the noise precision parameters to 10, and set the variational precision parameter to 0.1. For the comparison methods, we set overlapping parameters to values consistent with ours; otherwise, we follow the parameter settings disclosed in the original papers or optimize the values through exhaustive search.
Analyses of the homogeneity of news stories and their topics and labels using unsupervised HBTP. We analyze the interplay between the homogeneity indices of news stories uncovered by HBTP and the alignment of topical interests among the users linked to the stories, and examine how news stories with different topics and labels have different homogeneity indices.
First, figure 1 lists 6 topics chosen from HBTP and their corresponding homogeneity indices. We observe that topics related to nature, urban culture, and science in general have low homogeneity indices, suggesting that users who share news stories on these topics are, in general, less topically aligned than those who share other news stories. On the other hand, the homogeneity indices of economy- and industry-related topics are in the middle range, and topics regarding domestic and international politics have high homogeneity indices. Note that our topic-word probability measures are attached to homogeneity indices in a manner similar to that of supervised topic models [42], since both HBTP and supervised topic models leverage labels conditioned on topics, unlike models that incorporate label information with the conditioning in the opposite direction [51]. However, the modeling assumption of HBTP diverges from that of supervised topic models: in supervised topic models the labels are "given", whereas in our model both topics and homogeneity indices are "latent", the homogeneity indices must be inferred from users' tweet/retweet patterns, and topics and homogeneity indices mutually reshape one another during the joint modeling.
Second, from figure 1(a), we observe that the homogeneity indices of stories vary significantly according to their labels. We note that HBTP computes the homogeneity indices of news stories in an unsupervised fashion, without observing the labels of the stories during training. Interestingly, "non-rumors" have the highest homogeneity indices, while the other types of news stories, which went through the fact-checking process, have much lower values. Among the fact-checked stories, the stories verified to be "true" have significantly lower homogeneity indices than "false" and "unverified" stories. Figure 1(b) shows the distribution of news stories with different labels over homogeneity indices. From the figure, we confirm that the majority of "non-rumors" have high homogeneity indices (top 40%) and the majority of "true" stories have low homogeneity indices (bottom 40%), while "false" and "unverified" stories are distributed in the middle range. Overall, the difference in the homogeneity indices of news stories across labels suggests that we can leverage homogeneity indices and topics to classify news labels in prediction tasks.
Method                          Accuracy        True   False  Non-Rumor  Unverified
HDP+SVM: Linear                 0.484 ± 0.019   0.593  0.490  0.514      0.357
HDP+SVM: RBF                    0.605 ± 0.024   0.644  0.522  0.629      0.514
HBTP+SVM: Linear                0.533 ± 0.014   0.621  0.464  0.641      0.593
HBTP+SVM: RBF                   0.717 ± 0.006   0.855  0.550  0.682      0.653
GPSTM (Kandemir et al., 2018)   0.664 ± 0.013   0.662  0.686  0.687      0.435
BU-RvNN (Ma et al., 2018)       0.622 ± 0.010   0.616  0.571  0.687      0.584
TD-RvNN (Ma et al., 2018)       0.698 ± 0.004   0.664  0.668  0.723      0.693
Supervised HBTP                 0.781 ± 0.012   0.891  0.740  0.812      0.622

Table 1: Results on the label classification task using supervised HBTP and other methods. The plus-minus sign indicates one standard error.
Lastly, we highlight the transmissive nature of our model and inspect how the homogeneity of a news story affects the degree of transmission between the topical interests of user pairs who tweet and retweet the story. In figure 3, we visualize three news stories tweeted by a user (user A). All three stories are labeled "false", but their contents differ, leading to different homogeneity indices. Here, the topic proportions of user A are transmitted to three other users, B, C, and D, each of whom retweeted a different tweet of user A with a different story. Among users B, C, and D, since user D retweeted a tweet containing a story with a high homogeneity index, her topic distribution is more similar to user A's than, say, those of users B and C. Generalizing this observation to multiple tweeting and retweeting users, we can presume that the topical interests of users who share a story with a high homogeneity index will be more uniform than those of users who share news stories with low homogeneity indices.
Story label prediction using supervised HBTP. We validate our modeling scheme by showing the predictive power of our model compared with other competitive models. For this classification task, we use sHBTP, which observes the labels of the news stories in the training set. Our comparison models are as follows:

HDP + SVM: This method infers document-topic probability measures for news stories using HDP and uses a support vector machine (SVM) as the classifier. For the SVM, we use linear and radial basis function (RBF) kernels.

HBTP + SVM: Unlike HDP, document-topic probability measures are drawn per user, and the topics of news stories are represented by aggregating the probability measures of the users who shared the story. Also, the homogeneity indices of news stories are incorporated in the model and affect users' probability measures.

BU-RvNN and TD-RvNN: Bottom-up and top-down recursive neural networks [40] are state-of-the-art models designed for classifying true and false news. The models operate on bottom-up and top-down propagation trees in which each node contains tweet content. Thus, our model and the other topic-based comparison models exploit the content information of news sources, whereas BU-RvNN and TD-RvNN use the content information of tweets.
Note that there are trade-offs between using the content information of news stories and that of tweets. On one hand, news stories offer richer descriptions of the actual event from which the models judge genuineness. On the other hand, as stated in the work of Ma et al. [39], tweets often contain user judgments or debates regarding the stories' genuineness that can serve as critical hints for the models.
In table 1, we report the overall classification accuracy and the per-label scores for the four labels. We confirm that our model outperforms the comparison methods in overall accuracy and in the scores for all labels except "unverified". Specifically, in figure 4, we confirm that our model excels at differentiating "true" and "false" news, the two controversial groups of news stories that needed to be fact-checked by human experts. On the flip side, our model is relatively poor at detecting "unverified" stories. One possible reason for the performance edge of our model is the inclusion of the homogeneity variable: as observed in figure 2, the log homogeneity indices of stories with the four labels differ significantly, which helps the model to classify confusing label pairs, e.g., the high-valued non-diagonal entries in the confusion matrices in figure 4. Also, the performance edge over GPSTM highlights the efficacy of our model's nonparametric joint modeling of user interests, news stories' contents, and the homogeneity variables that encode users' sharing patterns.
6 Conclusion and Future Work
We developed HBTP, a novel Bayesian nonparametric model that jointly models the topics and homogeneity indices of news stories and user interests. For modeling the content of news stories, HBTP extends the hierarchical Dirichlet process so that, instead of a single upper-level probability measure governing the lower-level probability measures, the probability measures of the bottom layer can be transmitted from one another. To discover the homogeneity values of the news stories, we incorporated the Bayesian GPLVM in the model. Through quantitative and qualitative analyses, we found interesting relationships between the homogeneity index and the genuineness of news stories, and between the homogeneity index and the extracted topics. Finally, we showed how HBTP can easily be extended to predict the genuineness of news stories better than state-of-the-art rumor detection methods.
Our model can be extended to leverage different types of data to increase its flexibility and boost its predictive power. One direction is to explicitly incorporate users' diffusion cascades and the contents of user tweets in the model. For example, our model and the RvNN-based models of Ma et al. [40] can work in a complementary fashion because the input spaces these models operate on are almost orthogonal to each other. Also, while the RvNN-based models require users' diffusion cascades and thus cannot handle completely new news stories, we can explore the predictive power of our model on newly created stories with no user sharing by directly inferring the topic probability measures of these held-out stories. In this setting, we can only leverage the homogeneity indices of the training stories.
Appendix A Parametric HBTP
For parametric HBTP, we sample a user's topic distribution from a multivariate normal distribution whose mean is the preceding user's topic distribution. For the variance, we incorporate the homogeneity value to regulate the similarity between the two users' topic distributions. Then, we apply a sigmoid function to map it onto the simplex, and draw topic labels for each word of the news stories. Note that the logistic normal distribution has been previously used in correlated topic models [8], but with a different purpose: to capture correlations among topics. Finally, the generative process of the parametric counterpart of HBTP is as follows:
For each topic , draw wordtopic distribution .

For user , draw with
if has preceding users . If not, draw . 
For news story :

For each word index :

Draw user who propagated the story .

Draw topic indicator variable and draw word .


Draw homogeneity value using equation (9).

Appendix B Posterior Inference
Expectations w.r.t. the variational distribution. By taking expectations of the latent variables in HBTP with respect to the variational distribution, we get the following results:
For stories and user variables,
For the corpus-level variables,
Finally, for GPLVM latent variables,
Corpus-level updates. Updating can be done in closed form:
To update , we use the steepest ascent algorithm:
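As a generic illustration of a steepest-ascent update of the kind used here (the actual objective, gradient, and parameters are given by the equations above; the toy quadratic objective and fixed step size below are placeholders only):

```python
import numpy as np

# Steepest (gradient) ascent on a toy concave objective f(x) = -||x - target||^2.
# The objective, its gradient, and the step size stand in for the model's own.
target = np.array([1.0, -2.0, 0.5])

def grad(x):
    return -2.0 * (x - target)   # gradient of the concave objective

x = np.zeros(3)
step = 0.1
for _ in range(200):
    x = x + step * grad(x)       # move along the direction of steepest ascent
```

Each iteration shrinks the distance to the maximizer by a constant factor, so the iterate converges to the target.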
GPLVM updates. Updating formula for , with and , is:
where
To update and , we calculate
Finally, for , we have:
References
 [1] A. Ahmed and E. Xing. Timeline: A dynamic hierarchical Dirichlet process model for recovering birth/death and evolution of topics in text stream. In UAI, 2010.
 [2] J. Aitchison. The statistical analysis of compositional data. Journal of the Royal Statistical Society. Series B, 44(2):139–177, 1982.
 [3] M. Babaei, A. Chakraborty, J. Kulshrestha, E. Redmiles, M. Cha, and K. Gummadi. Analysing biases in perception of truth in news stories and their implications for fact checking. 2018.
 [4] O. Balmau, R. Guerraoui, A.-M. Kermarrec, A. Maurer, M. Pavlovic, and W. Zwaenepoel. Limiting the spread of fake news on social media platforms by evaluating users’ trustworthiness. arXiv preprint arXiv:1808.09922, 2018.
 [5] A. Bessi, M. Coletto, G. A. Davidescu, A. Scala, G. Caldarelli, and W. Quattrociocchi. Science vs conspiracy: Collective narratives in the age of misinformation. PloS one, 10(2):e0118093, 2015.
 [6] A. Bessi, A. Scala, L. Rossi, Q. Zhang, and W. Quattrociocchi. The economy of attention in the age of (mis) information. Journal of Trust Management, 1(1):12, 2014.
 [7] D. Blei and M. Jordan. Variational inference for Dirichlet process mixtures. Bayesian analysis, 1(1):121–143, 2006.
 [8] D. Blei and J. Lafferty. Correlated topic models. In NIPS, 2005.
 [9] C. Budak, D. Agrawal, and A. El Abbadi. Limiting the spread of misinformation in social networks. In WWW, 2011.
 [10] L. Csató and M. Opper. Sparse online Gaussian processes. Neural computation, 14(3):641–668, 2002.
 [11] A. Damianou, M. Titsias, and N. Lawrence. Variational inference for latent variables and uncertain inputs in Gaussian processes. JMLR, 17(1):1425–1486, 2016.
 [12] M. Del Vicario, A. Bessi, F. Zollo, F. Petroni, A. Scala, G. Caldarelli, E. Stanley, and W. Quattrociocchi. The spreading of misinformation online. PNAS, 113(3):554–559, 2016.
 [13] L. Devroye. Sample-based non-uniform random variate generation. In Winter Simulation Conference, 1986.
 [14] L. Dietz, S. Bickel, and T. Scheffer. Unsupervised prediction of citation influences. In ICML, 2007.
 [15] N. Du, M. Farajtabar, A. Ahmed, A. Smola, and L. Song. Dirichlet-Hawkes processes with applications to clustering continuous-time document streams. In KDD, 2015.
 [16] N. Du, L. Song, H. Woo, and H. Zha. Uncover topic-sensitive information diffusion networks. In AISTATS, 2013.
 [17] M. Farajtabar, J. Yang, X. Ye, H. Xu, R. Trivedi, E. Khalil, S. Li, L. Song, and H. Zha. Fake news mitigation via point process based intervention. In ICML, 2017.
 [18] T. Ferguson. A Bayesian analysis of some nonparametric problems. The annals of statistics, pages 209–230, 1973.
 [19] A. Friggeri, L. Adamic, D. Eckles, and J. Cheng. Rumor cascades. In ICWSM, 2014.
 [20] A. Gupta, H. Lamba, P. Kumaraguru, and A. Joshi. Faking Sandy: Characterizing and identifying fake images on Twitter during Hurricane Sandy. In WWW, 2013.
 [21] X. He, T. Rekatsinas, J. Foulds, L. Getoor, and Y. Liu. HawkesTopic: A joint model for network inference and topic modeling from text-based cascades. In ICML, 2015.
 [22] J. Holcomb, J. Gottfried, A. Mitchell, and J. Schillinger. News use across social media platforms. Pew Research Journalism Project, 2013.
 [23] M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. An introduction to variational methods for graphical models. Machine learning, 37(2):183–233, 1999.
 [24] M. Kandemir, T. Kekeç, and R. Yeniterzi. Supervising topic models with Gaussian processes. Pattern Recognition, 77:226–236, 2018.
 [25] D. Kim and A. Oh. Hierarchical Dirichlet scaling process. Machine Learning, 106(3):387–418, 2017.
 [26] J. Kim, D. Kim, and A. Oh. Joint modeling of topics, citations, and topical authority in academic corpora. TACL, 5:191–204, 2017.
 [27] J. Kim, B. Tabibian, A. Oh, B. Schölkopf, and M. Gomez Rodriguez. Leveraging the crowd to detect and reduce the spread of fake news and misinformation. In WSDM, 2018.
 [28] S. Kumar and N. Shah. False information on web and social media: A survey. arXiv preprint arXiv:1804.08559, 2018.
 [29] S. Kumar, R. West, and J. Leskovec. Disinformation on the web: Impact, characteristics, and detection of Wikipedia hoaxes. In WWW, 2016.
 [30] K. Kurihara, M. Welling, and Y. W. Teh. Collapsed variational Dirichlet process mixture models. In IJCAI, 2007.
 [31] S. Kwon, M. Cha, and K. Jung. Rumor detection over varying time windows. PloS one, 12(1):e0168344, 2017.
 [32] S. Kwon, M. Cha, K. Jung, W. Chen, et al. Prominent features of rumor propagation in online social media. In ICDM, 2013.

 [33] N. Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. In NIPS, 2004.
 [34] N. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. JMLR, 6(Nov):1783–1816, 2005.
 [35] P. Liang, S. Petrov, M. Jordan, and D. Klein. The infinite PCFG using hierarchical Dirichlet processes. In EMNLP-CoNLL, 2007.
 [36] X. Liu, A. Nourbakhsh, Q. Li, R. Fang, and S. Shah. Real-time rumor debunking on Twitter. In CIKM, 2015.

 [37] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K.-F. Wong, and M. Cha. Detecting rumors from microblogs with recurrent neural networks. In IJCAI, 2016.
 [38] J. Ma, W. Gao, Z. Wei, Y. Lu, and K.-F. Wong. Detect rumors using time series of social context information on microblogging websites. In CIKM, 2015.
 [39] J. Ma, W. Gao, and K.-F. Wong. Detect rumors in microblog posts using propagation structure via kernel learning. In ACL, 2017.
 [40] J. Ma, W. Gao, and K.-F. Wong. Rumor detection on Twitter with tree-structured recursive neural networks. In ACL, 2018.
 [41] C. Mavroforakis, I. Valera, and M. Gomez Rodriguez. Modeling the dynamics of online learning activity. In WWW, 2017.
 [42] J. Mcauliffe and D. Blei. Supervised topic models. In NIPS, 2008.
 [43] A. McCallum, A. Corrada-Emmanuel, and X. Wang. Topic and role discovery in social networks. In IJCAI, 2005.
 [44] D. Mimno and A. McCallum. Expertise modeling for matching papers with reviewers. In KDD, 2007.
 [45] D. Mimno and A. McCallum. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. In UAI, 2008.
 [46] D. Mocanu, L. Rossi, Q. Zhang, M. Karsai, and W. Quattrociocchi. Collective attention in the age of (mis) information. Computers in Human Behavior, 51:1198–1204, 2015.
 [47] A. Nguyen, A. Kharosekar, S. Krishnan, S. Krishnan, E. Tate, B. Wallace, and M. Lease. Believe it or not: Designing a human-AI partnership for mixed-initiative fact-checking. In UIST, 2018.
 [48] N. P. Nguyen, G. Yan, M. T. Thai, and S. Eidenbenz. Containment of misinformation spread in online social networks. In WebSci, 2012.
 [49] J. Paisley, C. Wang, and D. Blei. The discrete infinite logistic normal distribution. Bayesian Analysis, 7(4):997–1034, 2012.
 [50] V. Qazvinian, E. Rosengren, D. R. Radev, and Q. Mei. Rumor has it: Identifying misinformation in microblogs. In EMNLP, 2011.
 [51] D. Ramage, D. Hall, R. Nallapati, and C. Manning. Labeled LDA: A supervised topic model for credit attribution in multilabeled corpora. In EMNLP, 2009.
 [52] K. Rapoza. Can ‘fake news’ impact the stock market? Feb 2017.
 [53] H. Rashkin, E. Choi, J. Y. Jang, S. Volkova, and Y. Choi. Truth of varying shades: Analyzing language in fake news and political fact-checking. In EMNLP, 2017.
 [54] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In UAI, 2004.
 [55] G. Roussas. An introduction to measure-theoretic probability. Academic Press, 2014.
 [56] M. Seeger, C. Williams, and N. Lawrence. Fast forward selection to speed up sparse Gaussian process regression. In AISTATS, 2003.
 [57] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.
 [58] E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In NIPS, 2006.
 [59] M. Tambuscio, G. Ruffo, A. Flammini, and F. Menczer. Fact-checking effect on viral hoaxes: A model of misinformation spread in social networks. In WWW, 2015.
 [60] Y. W. Teh, M. Jordan, M. Beal, and D. Blei. Sharing clusters among related groups: Hierarchical Dirichlet processes. In NIPS, 2005.
 [61] Y. W. Teh, K. Kurihara, and M. Welling. Collapsed variational inference for HDP. In NIPS, 2008.
 [62] M. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In AISTATS, 2009.
 [63] M. Titsias and N. Lawrence. Bayesian Gaussian process latent variable model. In AISTATS, 2010.
 [64] R. M. Tripathy, A. Bagchi, and S. Mehta. A study of rumor control strategies on social networks. In CIKM, 2010.
 [65] S. Tschiatschek, A. Singla, M. Gomez Rodriguez, A. Merchant, and A. Krause. Fake news detection in social networks via crowd signals. In WWW, 2018.
 [66] N. Vo and K. Lee. The rise of guardians: Fact-checking URL recommendation to combat fake news. In SIGIR, 2018.
 [67] S. Vosoughi, D. Roy, and S. Aral. The spread of true and false news online. Science, 359(6380):1146–1151, 2018.
 [68] M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2):1–305, 2008.
 [69] S. Wang, X. Hu, P. Yu, and Z. Li. MMRate: Inferring multi-aspect diffusion networks with multi-pattern cascades. In KDD, 2014.
 [70] K. Wu, S. Yang, and K. Q. Zhu. False rumors detection on Sina Weibo by propagation structures. In ICDE, 2015.
 [71] L. Wu and H. Liu. Tracing fake-news footprints: Characterizing social media messages by how they propagate. In WSDM, 2018.
 [72] F. Yang, Y. Liu, X. Yu, and M. Yang. Automatic detection of rumor on Sina Weibo. In KDD, 2012.
 [73] Z. Zhao, P. Resnick, and Q. Mei. Enquiring minds: Early detection of rumors in social media from enquiry posts. In WWW, 2015.