This paper presents an unsupervised, language-agnostic solution that separates collections of social media posts that reflect different stances (which we call beliefs) on a given polarizing topic. We focus on scenarios where the underlying social groups disagree on some issues but agree on others. Hence, we say that their beliefs overlap.
By unsupervised, we mean that our approach does not need prior training, labeling, or remote supervision (in contrast, for example, to deep-learning solutions [24, 33, 45] that usually require labeled data). By language-agnostic, we mean that the approach does not use language-specific prior knowledge [22, 32], distant supervision [43, 47], or pretrained embeddings [33, 24]. While we test the solution only with English text, we conjecture that the unsupervised nature of the work will facilitate its application to other languages (except languages that do not have spaces between words, such as Chinese and Japanese, because we do use tokenization and expect spaces as token separators), offering an out-of-the-box system that does not need to be retrained for new domains, jargon, or hashtags. In that sense, to the authors’ knowledge, ours is the first unsupervised, language-agnostic solution to the problem of belief separation in the case when the beliefs in question partially overlap. The work is a significant generalization of approaches for polarization detection (e.g., [2, 5, 10, 12]) that identify opposing positions in a debate but treat neutral text as “noise” or “irrelevant”. In contrast, in this paper, we think of text agreed upon by multiple parties (e.g., neutral text) as an inherent part of their belief structure, and optimize the formulation of our belief discovery problem accordingly. More specifically, we postulate a general hierarchical belief structure (that allows for overlap in beliefs), and explicitly look for social media posts that fall into each of its different belief (overlap) components.
For example, during the mysterious disappearance of Malaysia Airlines flight MH370 in 2014, narratives such as “plotted suicide” versus “northern landing” emerged as potential hypotheses explaining what people believed. The underlying communities agreed on many circumstances of the flight’s disappearance, but disagreed on other details and conclusions. Some narratives implied that passengers might still be alive, while others posited their death. Our novel non-negative matrix factorization algorithm automatically identifies narratives specific to each belief as well as points of agreement among them.
The unsupervised problem addressed in this paper is different from unsupervised techniques for topic modeling [23, 31] and polarity detection [8, 2]. Prior solutions to these problems aim to find orthogonal topic components or conflicting stances. In contrast, we aim to find components that adhere to a given (generic) overlap structure. Moreover, unlike solutions for hierarchical topic decomposition [47, 52], we consider not only message content but also user attitudes towards it, thus allowing for better separation (because posts that share a specific stance will likely appeal to, and be retweeted/liked by, the same people).
In solving our problem, there are two main reasons for opting for an unsupervised approach. First and foremost, labeling is expensive for social networks. A successful supervised model is usually label-intensive, requiring substantial human effort. Second, labeled data from one scenario might not help with another because the domains are different. For example, it is not clear that labeling the different theories on the reason for the MH370 flight disappearance will help automatically separate different attitudes regarding, say, the causes of global warming. Even in the same domain, transfer might be inexact. For example, in modern discussions of gender identity and related actual/proposed legislation, “liberal beliefs” in one environment might be viewed as “conservative” in another. In contrast, our unsupervised approach does not hold any such preconceptions regarding labels, allowing it to better group latent beliefs of the population at hand, exploiting only lexical similarity between posts, as well as similarity in human responses to them.
The work was evaluated using both synthetic and real-life data sets, where it was compared to approaches that detect polarity by considering only who posted which claim, approaches that separate messages by content or sentiment analysis [41, 54], and approaches that base belief separation on the social interaction graph by identifying different communities [15, 50]. The results of this comparison show that our algorithm significantly outperforms the state of the art. An ablation study further illustrates the impact of individual design decisions on accomplishing this improvement.
The rest of the paper is organized as follows. Section II highlights our motivation, intuitions, and solution approach. Section III introduces notation and formulates the problem. Section IV proposes our new belief-structured matrix factorization model and analyzes its properties. Section V presents a two-fold experimental evaluation, comparing the proposed model to six baselines and three model variants. We review the related literature on belief mining and matrix factorization in Section VI. The paper concludes with key observations and a statement on future directions in Section VII.
II Motivation, Intuition, and Solution Approach
Our work is motivated, in part, by the increasing polarization on social media. Individuals tend to connect with like-minded sources. Search algorithms further exploit such (potentially subconsciously expressed) preferences to recommend similar content, thereby reconfirming consumers’ biases and contributing to selective exposure to information. This cycle was shown to produce echo chambers and filter bubbles. Tools that could automatically extract social beliefs, and distinguish points of agreement and disagreement among them, might help generate future technologies (e.g., less biased search engines) that summarize information for consumption in a manner that gives individuals more control over (and better visibility into) the degree of bias in the information they consume.
Those more cynical about human nature might recognize that our proposed solution is dual-use. A better understanding of key points of contention versus agreement in a remote conflict can, in principle, significantly improve a nation’s ability to conduct what has recently been termed (in military circles) information environment operations [38, 16].
To describe polarized systems, in this paper we make frequent use of the terms beliefs and narratives. Beliefs (describing group positions) may overlap in agreement regions. In a conflict between two sides, $A$ and $B$, one can speak of beliefs of side $A$, beliefs of side $B$, and beliefs at the overlap, $A \cap B$. Narratives are collections of posts that fit in each belief overlap region. There are three regions to consider in the above example: narratives specific to $A$ (i.e., region $A \setminus B$), narratives specific to $B$ (i.e., region $B \setminus A$), and narratives that reflect the shared agreement (i.e., region $A \cap B$).
The closest work to ours is the polarization detection approach of Al-Amin et al. It uses non-negative matrix factorization (NMF) to separate biased posts. It starts with a matrix, $X$, whose dimensions are sources and posts. The element at coordinate $(i, j)$ is a binary value that represents whether source $i$ endorsed (e.g., retweeted, liked, etc.) post $j$ or not. Sources in this matrix are seen as vectors of latent beliefs, where each component of the vector represents the degree to which the individual adopts the corresponding belief. Each post is related to the latent beliefs by another vector, where each component specifies how strongly an adopter of the corresponding belief might endorse such a post. Hence, the “who said what” matrix, $X$, is decomposed into (i) one that relates sources (the “who” dimension) to latent beliefs and (ii) one that relates those latent beliefs to posts (the “what” dimension). Our approach offers three key innovations over the above baseline treatment:
Accounting for belief overlap in matrix factorization: Since we assume that beliefs overlap, different beliefs, by definition, may lead to the emission of some of the same posts. Hence, components of matrix factorization should not be orthogonal. We develop a novel non-negative matrix factorization algorithm that uncovers components that follow a specified belief overlap structure. It relies on a new latent belief mixture matrix that represents the generic belief overlap structure (without specifying what posts fall into each overlap region). Our factorization algorithm then determines the latent beliefs of each source and the specific allocation of posts to different regions of belief overlap.
Interpreting silence: A key challenge in belief separation algorithms that derive beliefs from posts is to properly interpret silence. Say source $i$ did not retweet/like post $j$; in other words, $x_{ij} = 0$. Does that imply that they do not agree with its content? Or is it that they simply did not see the post or did not have a chance to respond? In general, we do not know how to interpret silence. For that reason, our matrix factorization algorithm does not start with a matrix of binary values. Rather, we check if source $i$ posted or endorsed statements similar to $j$. If so, we consider it likely that source $i$ would have endorsed claim $j$ as well. The resulting probability matrix, $X_M$, is what gets factorized. We show that factorizing $X_M$ yields much better results than factorizing the binary matrix $X$ (because the former interprets silence better).
Accounting for proximity in the social graph: We further conjecture that the beliefs of individuals who are close in some social graph are probably close. An example of such a graph is a retweet graph, $G$. We therefore define a new matrix, $X_{MS}$, where each element represents the degree of endorsement of source $i$ for the content of post $j$, based not only on $i$’s posts but also on the posts of $i$’s neighbors in graph $G$. We show that factorizing $X_{MS}$ yields even better results than factorizing $X_M$.
Accordingly, our matrix factorization algorithm is new and different in that it (i) explicitly accounts for structured, overlapping beliefs, (ii) better interprets silence, and (iii) better accounts for the proximity of users in the social graph. Next, we formulate the new problem and describe the solution more formally.
III Problem Formulation
Let $S$ be the set of sources in our data set, and let $C$ be the set of claims made by those sources. We say that a claim is made by a source if the source posts it, retweets it, or “likes” it, which suggests that the source endorses the content. Let matrix $X$, of dimension $|S| \times |C|$, be the matrix of binary entries denoting who endorsed what. If source $i$ posted, retweeted, or liked claim $j$, then $x_{ij} = 1$; otherwise $x_{ij} = 0$.
Let matrix $W$, of dimension $|C| \times |C|$, denote the similarity structure between claims, by some similarity measure. Hence, each element, $w_{jk}$, denotes how similar claims $j$ and $k$ are. To keep our approach simple and language-agnostic, as we detail later, we choose a similarity metric that considers only lexical (i.e., bag-of-words) overlap between the respective claims. No training or natural language processing is used. We defer the use of other similarity measures, such as proximity in some semantic embedding space, to future work.
Finally, let matrix $G$, of dimension $|S| \times |S|$, denote the social graph. Each entry, $g_{ii'}$, denotes the influence of user $i'$ on user $i$. $G$ is thus the adjacency matrix of a social network graph, either directed or undirected. In this paper, we construct $G$ by calculating the frequency with which each source $i$ retweets original posts of source $i'$. We call it the retweet graph.
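As a concrete sketch of how the two observation matrices above can be assembled (the record format, identifiers, and sizes are hypothetical, not from the paper's datasets):

```python
import numpy as np

# Hypothetical endorsement records (source_id, claim_id): posts, retweets, likes.
endorsements = [(0, 0), (0, 1), (1, 1), (2, 2)]
# Hypothetical retweet records (retweeter_id, original_author_id).
retweets = [(0, 1), (0, 1), (2, 1)]

m, n = 3, 3   # number of sources and claims

# Binary source-claim matrix X: x_ij = 1 iff source i endorsed claim j.
X = np.zeros((m, n))
for i, j in endorsements:
    X[i, j] = 1.0

# Social dependency matrix G: g_ii' = how often source i retweeted
# original posts of source i' (the retweet graph).
G = np.zeros((m, m))
for i, i2 in retweets:
    G[i, i2] += 1.0
```

Both matrices are typically very sparse in practice, which later sections exploit.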
We further assume that sources are divided by latent belief. For example, in a polarized scenario, there may be two different beliefs, $A$ and $B$, leading to up to three regions to which a narrative may belong: narratives exclusive to $A$, narratives exclusive to $B$, and narratives that fall in the overlap shared by both, $A \cap B$, as shown in Figure 1. Figure 2 shows an actual example of tweets in such a scenario. The tweets describe Jamala, the winner of Eurovision 2016 (a song competition), after a controversial competition outcome. Posts 1 and 5 are agreed upon by both sides of the controversy; they belong in the overlap region. Other posts include components that are either distinctly pro (green) or distinctly against (blue) the winner/song, representing the different attitudes of the conflicting groups regarding the competition outcome.
The problem of belief separation addressed in this paper is the following. Given the set of claims $C$ and their sources $S$, related by the source-claim matrix $X$, and given the corresponding claim similarity matrix, $W$, and social dependency matrix, $G$, allocate the claims to one of the belief overlap regions specified by a belief mixture matrix, $B$ (which can be thought of as encoding a Venn diagram of the beliefs in question). Next, we describe our solution to this problem.
IV Methodology: Matrix Factorization with Belief Mixture
Consider a scenario with two beliefs, $A$ and $B$, that share a region of overlap, $A \cap B$. Let $A'$ denote the region of $A$ outside the overlap (and similarly $B'$ for $B$). We assume that sources are polarized, forming overlapping belief regions. Some adopt belief $A$, denoted $S_A$, some adopt $B$, denoted $S_B$, and some are neutral with respect to the conflict between these two, denoted $S_N$. We assume that claims, in contrast, can be divided into disjoint belief regions $C_{A'}$, $C_{A \cap B}$, and $C_{B'}$, depending on the belief they espouse. The probability that source $s$ endorses claim $c$ is denoted by $p_{sc}$. By endorsement, we mean that the source finds the content agreeable with their belief. This probability depends on the belief of the source and the belief espoused by the claim. Note that, while source actions such as posting, retweeting, or liking a claim represent endorsement, absence of such actions does not necessarily mean disagreement. Indeed, a key challenge is to estimate endorsement in the case of silence.
IV-A A Generative Model of Mixed Beliefs
Let $u_{sk}$ denote the probability that source $s$ holds belief $k$ (i.e., $s$ is of belief $k$). Similarly, let $v_{cl}$ denote the probability that claim $c$ espouses belief $l$. Following the law of total probability, $p_{sc}$ is:

$$p_{sc} = \sum_{k} \sum_{l} u_{sk}\, v_{cl}\, \Pr(\text{endorse} \mid k, l). \tag{1}$$
In our generative model, we assume that sources of a given belief endorse claims espousing that belief. We also assume that sources neutral to the conflict between $A$ and $B$ endorse claims in the overlap, $C_{A \cap B}$, only. Finally, we assume that when a source holds a belief that is different from that espoused by a claim, the source does not endorse that claim. Hence, $\Pr(\text{endorse} \mid k, l) = 1$ if source belief region $k$ contains claim region $l$ (the source and claim are from the same belief, or the claim belongs to the overlap and so is endorsed by all). Otherwise, $\Pr(\text{endorse} \mid k, l) = 0$.
Let $b_{kl} = \Pr(\text{endorse} \mid k, l)$ and $B = [b_{kl}]$. Let $\mathbf{u}_s$ and $\mathbf{v}_c$ be the corresponding probability vectors, with elements ranging over values of $k$ and $l$, respectively. Thus, we get:

$$p_{sc} = \mathbf{u}_s^\top B\, \mathbf{v}_c. \tag{2}$$

For the two-belief scenario above, ordering source regions as $(S_A, S_N, S_B)$ and claim regions as $(C_{A'}, C_{A \cap B}, C_{B'})$, our generative assumptions yield:

$$B = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}. \tag{3}$$

We call $B$ the belief mixture matrix. Different generative model assumptions will lead to a different mixture matrix $B$. In general, a 1 at coordinate $(k, l)$ in $B$ means that sources that hold belief $k$ generally endorse claims that espouse belief $l$. Our algorithm is not limited to a specific $B$. Equation (3) simply represents an important common case and the running example of this paper.
Let $X^*$ be the matrix of probabilities expressed by our generative model, such that element $x^*_{sc} = p_{sc}$. Thus, $X^* = U B V^\top$, where $U$ is a matrix whose elements are $u_{sk}$ and $V$ is a matrix whose elements are $v_{cl}$.
Factorizing $X^*$, given $B$, would directly yield $U$ and $V$, whose elements are the probabilities we want: matrix $U$ yields the probabilities that a given source belongs to each belief region, whereas matrix $V$ yields the probabilities that a claim belongs to each belief region. These probabilities trivially serve as the basis for source and claim classification by belief (just take the highest-probability belief). Unfortunately, we do not have matrix $X^*$. We instead have the observed source-claim matrix $X$, which is merely a sampling of what the sources actually endorse. Next, we describe how we estimate $X^*$ from $X$, given the claim similarity matrix $W$ and the social graph $G$.
IV-B Message Similarity Interpolation (M-module)
Our first attempt to estimate matrix $X^*$ is to compute an approximation, denoted $X_M$, developed as follows. First, if a source $i$ posted, retweeted, or liked claim $j$ in our data set (i.e., $x_{ij} = 1$ in matrix $X$), then we know that the source endorses that claim (i.e., $x^M_{ij} = 1$ in matrix $X_M$ too). The question is, what to do when $x_{ij} = 0$? In other words, we need to estimate the likelihood that a source endorses a claim when no explicit observation of such endorsement was made. We do so by considering the claim similarity matrix $W$. If source $i$ was observed to endorse claims similar to $j$, then it will likely endorse $j$ with a probability that depends on the degree of similarity between them. Thus, when $x_{ij} = 0$, we estimate $x^M_{ij}$ from:

$$x^M_{ij} = \max_{k:\, x_{ik} = 1} \mathrm{sim}(j, k). \tag{4}$$
To compute matrix $X_M$, in this work, we first compute a bag-of-words (BOW) vector $\mathbf{b}_j$ for each claim $j$. We then normalize it using the vector L2-norm, $\bar{\mathbf{b}}_j = \mathbf{b}_j / \|\mathbf{b}_j\|_2$. For each source $i$, we select the claims with non-zero entries in row $i$ of $X$ as medoids. We assume that claims close to any of the medoids could be endorsed by $i$ as well. Based on that, we use a Gaussian radius function,

$$\mathrm{sim}(j, k) = \exp\big( -\|\bar{\mathbf{b}}_j - \bar{\mathbf{b}}_k\|_2^2 \,/\, \sigma^2 \big), \tag{5}$$

in Equation (4). If the resulting value of $x^M_{ij}$ is less than 0.2, we regard claim $j$ as far from all of the medoids and set $x^M_{ij}$ back to 0.
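A minimal sketch of the M-module under these choices (the Gaussian width `sigma` and the toy data are illustrative assumptions; the 0.2 cutoff is from the text):

```python
import numpy as np

def similarity_matrix(bow, sigma=0.5):
    """Gaussian similarity between L2-normalized bag-of-words rows (Eq. 5)."""
    norm = bow / np.linalg.norm(bow, axis=1, keepdims=True)
    sq_dist = np.sum((norm[:, None, :] - norm[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / sigma ** 2)

def interpolate(X, W, threshold=0.2):
    """M-module: X_M[i, j] = max similarity of claim j to any claim i endorsed."""
    X_M = np.zeros_like(X, dtype=float)
    for i in range(X.shape[0]):
        medoids = np.nonzero(X[i])[0]      # claims source i was seen to endorse
        if len(medoids) == 0:
            continue
        est = W[medoids].max(axis=0)       # closeness to the nearest medoid
        est[est < threshold] = 0.0         # too far from all medoids: back to 0
        X_M[i] = np.maximum(X[i], est)     # observed endorsements stay at 1
    return X_M

# Toy data: claims 0 and 1 use the same words; claim 2 uses disjoint words.
bow = np.array([[1., 0.], [2., 0.], [0., 3.]])
X = np.array([[1., 0., 0.]])               # source 0 endorsed claim 0 only
X_M = interpolate(X, similarity_matrix(bow))
```

Here claim 1, being lexically identical to the endorsed claim 0 after normalization, is interpolated to full endorsement, while the dissimilar claim 2 stays at 0.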
IV-C Social Graph Convolution (S-module)
To further improve our estimation of matrix $X^*$, we consider the social dependency matrix $G$. This results in an improved estimate, which we call matrix $X_{MS}$.
The fundamental insight we would like to leverage is that users who are close in the social graph are likely to endorse the same claims, even if an explicit endorsement was not observed in the data set. Thus, we consider the social dependency matrix $G$ (user-user retweet frequency), compute the degree matrix $D$ by summing each row of $G$, and denote the random-walk normalized adjacency as $D^{-1} G$. We define our propagation operator based on $D^{-1} G$ with a self-loop re-normalization, $P = \frac{1}{2}(I + D^{-1} G)$. Thus, the new source-claim matrix is given by

$$X_{MS} = P\, X_M, \tag{6}$$

where each row of $P$ adds up to 1. The effect of the propagation operator is to convolve in the information from 1-hop neighbors, while preserving half of the information from the source itself. Note that we deem dependency beyond 1 hop too weak to capture, so we do not consider powers $P^k$ for $k > 1$. From a macroscopic perspective, this social graph convolution recovers some of the possible source-claim connections and also enforces the smoothness of matrix $X_{MS}$.
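The S-module can be sketched as follows; the toy matrices and the handling of sources with no retweets (kept unchanged) are illustrative assumptions:

```python
import numpy as np

# Toy retweet graph G (2 sources) and similarity-interpolated matrix X_M.
G = np.array([[0., 2.],
              [0., 0.]])
X_M = np.array([[1., 0.],
                [0., 1.]])

m = G.shape[0]
deg = G.sum(axis=1, keepdims=True)                              # degree matrix D
A_rw = np.divide(G, deg, out=np.zeros_like(G), where=deg > 0)   # D^{-1} G

# Propagation operator with self-loop re-normalization: each source keeps
# half of its own signal and averages in half of its neighbors'.
P = 0.5 * (np.eye(m) + A_rw)
P[deg[:, 0] == 0] = np.eye(m)[deg[:, 0] == 0]   # isolated sources: keep self

X_MS = P @ X_M   # one-hop social convolution; rows of P sum to 1
```

Source 0, which retweets only source 1, ends up with half of its own endorsements and half of its neighbor's, which is exactly the smoothing the text describes.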
IV-D Overall Factorization Loss and Optimization
Given a belief mixture matrix, $B$, we now factorize $X_{MS}$ to estimate the matrices $U$ and $V$ that decide the belief regions associated with sources and claims, respectively (e.g., the estimated belief for claim $c$ is given by the index of the maximum entry in row $c$ of $V$).
Regularization. To avoid model overfitting, we include the widely used L2-regularization. We also enforce sparsity of $U$ and $V$ by introducing an L1-norm penalty. The overall objective function becomes (defined via the Frobenius norm):

$$L = \| X_{MS} - U B V^\top \|_F^2 + \lambda_2 \big( \|U\|_F^2 + \|V\|_F^2 \big) + \lambda_1 \big( \|U\|_1 + \|V\|_1 \big). \tag{7}$$
With the help of the matrix trace function $\mathrm{tr}(\cdot)$, we can rewrite $L$ in the following form:

$$L = \mathrm{tr}\big( X_{MS} X_{MS}^\top \big) - 2\, \mathrm{tr}\big( X_{MS} V B^\top U^\top \big) + \mathrm{tr}\big( U B V^\top V B^\top U^\top \big) + \lambda_2\, \mathrm{tr}\big( U U^\top \big) + \lambda_2\, \mathrm{tr}\big( V V^\top \big) + \lambda_1 \|U\|_1 + \lambda_1 \|V\|_1. \tag{8}$$
We minimize $L$ by gradient descent. Since only the non-negative region is of interest, the derivatives of the L1-norm terms are well defined in this setting. Referring to the gradients of traces of products with constant matrices, $\nabla_X \mathrm{tr}(X A) = A^\top$ and $\nabla_X \mathrm{tr}(X A X^\top) = X (A + A^\top)$, the partial derivatives of $L$ with respect to $U$ and $V$ are calculated as:

$$\frac{\partial L}{\partial U} = -2\, X_{MS} V B^\top + 2\, U B V^\top V B^\top + 2 \lambda_2 U + \lambda_1 \mathbf{1}, \qquad \frac{\partial L}{\partial V} = -2\, X_{MS}^\top U B + 2\, V B^\top U^\top U B + 2 \lambda_2 V + \lambda_1 \mathbf{1}, \tag{9}$$

where $\mathbf{1}$ is the all-ones matrix of matching dimension.
The gradient matrix $\partial L / \partial U$ is of dimension $|S| \times K$, and $\partial L / \partial V$ is of dimension $|C| \times K$, where $K$ is the number of belief regions. The estimation step updates $U \leftarrow U - \eta\, \partial L / \partial U$ and $V \leftarrow V - \eta\, \partial L / \partial V$, where $\eta$ is a constant step size. Negative values might appear during the learning process, and they are physically meaningless in this problem. Thus, we also impose non-negativity constraints on $U$ and $V$ during the update: a modified ReLU-like function is utilized, such that when any entry of $U$ or $V$ becomes negative, it is reset to a small non-negative constant $\epsilon$. The initial entry values of $U$ and $V$ are randomized uniformly. The overall BSMF algorithm is presented in Algorithm 1.
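The update loop can be sketched in NumPy as follows. The mixture matrix follows Equation (3); the step size, regularization weights, iteration count, and the clamp at exactly zero (in place of a small $\epsilon$) are illustrative assumptions, not the paper's tuned settings:

```python
import numpy as np

# Two-belief mixture matrix of Eq. (3): rows are source regions (A, neutral, B),
# columns are claim regions (A-only, overlap, B-only).
B = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 1., 1.]])

def bsmf(X, B, lr=1e-3, lam1=0.01, lam2=0.01, iters=300, seed=0):
    """Gradient descent on the regularized loss, with non-negativity clamps."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    K = B.shape[0]
    U = rng.uniform(0, 1, (m, K))
    V = rng.uniform(0, 1, (n, K))
    losses = []
    for _ in range(iters):
        R = X - U @ B @ V.T                           # residual
        losses.append((R ** 2).sum())                 # reconstruction term of L
        grad_U = -2 * R @ V @ B.T + 2 * lam2 * U + lam1
        grad_V = -2 * R.T @ U @ B + 2 * lam2 * V + lam1
        U = np.maximum(U - lr * grad_U, 0.0)          # ReLU-like clamp
        V = np.maximum(V - lr * grad_V, 0.0)
    return U, V, losses

# Sanity check on data generated by the model itself.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (30, 3)) @ B @ rng.uniform(0, 1, (40, 3)).T
U, V, losses = bsmf(X, B)
claim_labels = V.argmax(axis=1)     # estimated belief region per claim
```

The final `argmax` over rows of `V` is the claim classification rule described above.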
The basic idea of our proposed method is to decompose $X_{MS} \approx (U B) V^\top$, where $U B$ can be viewed as a set of new bases (in the terminology of non-negative matrix factorization) and $V$ is the learned parts-based representation. In this section, we analyze our model, especially the effect of the belief mixture matrix $B$, from a geometric perspective, and show its superiority in terms of the search space. We also briefly report the computational complexity of BSMF.
In our problem setting, if we leave out the belief mixture matrix (i.e., let $B$ be the identity matrix of dimension $K$), then the factorization seeks to decompose the non-negative matrix $X_{MS}$ into:

$$X_{MS} \approx U V^\top,$$

where $U$ is $|S| \times K$ and $V$ is $|C| \times K$, and both $U$ and $V$ are non-negative. We call this the standard NMF.
Geometric Interpretation of Belief Space. For standard NMF, the columns of $U$, denoted $\mathbf{u}^{(1)}, \dots, \mathbf{u}^{(K)}$, are belief bases in $\mathbb{R}^{|S|}$. The rows of $V$ can be viewed as coordinates in that belief space. Consider the column $\mathbf{x}_c$ of $X_{MS}$ for one of the claims $c$. It takes the form:

$$\mathbf{x}_c \approx \sum_{k=1}^{K} v_{ck}\, \mathbf{u}^{(k)}. \tag{10}$$

Equation (10) can be interpreted as saying that each claim consists of different belief “parts” $\mathbf{u}^{(k)}$, and the coordinates, $v_{ck}$, allocate the intensity of these belief “parts” accordingly.
Let us now cast this into algebraic geometry. The belief “parts” of standard NMF geometrically generate a simplicial cone, defined by:

$$\Gamma_U = \Big\{ \mathbf{x} \in \mathbb{R}^{|S|} \;\Big|\; \mathbf{x} = \sum_{k=1}^{K} \alpha_k\, \mathbf{u}^{(k)},\ \alpha_k \ge 0 \Big\}. \tag{11}$$

Similarly, note that the form of BSMF is $X_{MS} \approx (U B) V^\top$; thus the generated simplicial cone is, accordingly:

$$\Gamma_{UB} = \Big\{ \mathbf{x} \in \mathbb{R}^{|S|} \;\Big|\; \mathbf{x} = \sum_{k=1}^{K} \alpha_k\, (U B)^{(k)},\ \alpha_k \ge 0 \Big\}, \tag{12}$$

where $(U B)^{(k)}$ denotes the $k$-th column of $U B$.
In such $K$-dimensional simplices, claims can be imagined as points whose coordinates are given by the intensity values aligned with the bases. The non-negativity constraints mean that everything lies in the positive orthant. All message columns of $X_{MS}$ form a cloud of data points in $\mathbb{R}^{|S|}$. The belief separation problem can thus be abstracted as: find a good enough simplicial cone in that orthant that contains the whole data cloud, while simultaneously estimating the coordinates of the data points, so as to find an optimal separation. Let us now analyze the search space of factorization in a scenario with two beliefs $A$ and $B$ that share a region of overlap, $A \cap B$.
Analysis of Search Space. Note that the space spanned by $U$ represents latent reflections of the user belief regions $S_A$, $S_N$, $S_B$, which overlap, while data points (i.e., claims) are samples from the latent claim belief regions $C_{A'}$, $C_{A \cap B}$, $C_{B'}$. Standard NMF separates data purely based on numerical statistics and treats all bases equivalently, which leads to a separation in a misaligned search space. By imposing a mixture matrix, $B$, our BSMF transforms $U$ into a new space, $U B$, where $U$ learns independent positive bases and $B$ encodes the overlap structure on top. As shown in Figure 3, the transformation, due to the structure of $B$, leads to: (i) a smaller search space, and (ii) better prior knowledge adoption. Thus, the effect of $B$ is to disentangle the latent data manifold and limit the search space, based on the constrained (i.e., specified) structure of the belief mixture matrix.
Complexity. Though we use post similarity (Section IV-B) and social convolution (Section IV-C) to estimate $X_{MS}$, the non-zero entries in the estimated matrix are still far fewer than $|S| \times |C|$. We therefore use sparse matrix multiplications and avoid dense intermediate matrices, which makes the computation efficient. Note that $K$ is picked empirically according to the dataset, and it typically satisfies $K \ll \min(|S|, |C|)$. During the estimation, we generalize the standard NMF multiplicative update rules for our tri-factorization:

$$U \leftarrow U \odot \frac{X_{MS}\, V B^\top}{U B V^\top V B^\top}, \qquad V \leftarrow V \odot \frac{X_{MS}^\top\, U B}{V B^\top U^\top U B},$$

where $\odot$ and the fraction bar denote element-wise multiplication and division, respectively.
Theoretically, updating $U$ and $V$ takes $O(K\,|S|\,|C|)$ time per iteration. We can also take advantage of the structure of $B$ and the sparsity of $X_{MS}$ to reduce the complexity to that of typical NMF. The number of iterations before empirical convergence is usually no more than 200 for random initialization, and thus we claim that our model is scalable and efficient.
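The generalized multiplicative rules above can be sketched as follows. Fixing $B V^\top$, the $U$ step is exactly the standard NMF multiplicative update (and symmetrically for $V$); the toy dimensions and the small `eps` guarding the denominators are implementation assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-belief mixture matrix; X stands in for X_MS and is generated
# from the model itself so a good fit exists.
B = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 1., 1.]])
X = rng.uniform(0, 1, (20, 3)) @ B @ rng.uniform(0, 1, (30, 3)).T

U = rng.uniform(0.1, 1.0, (20, 3))
V = rng.uniform(0.1, 1.0, (30, 3))
eps = 1e-12                                  # avoids division by zero

err_init = np.linalg.norm(X - U @ B @ V.T)
for _ in range(200):
    # Element-wise multiplicative updates: non-negativity is preserved
    # automatically, with no explicit clamping step.
    U *= (X @ V @ B.T) / (U @ B @ V.T @ V @ B.T + eps)
    V *= (X.T @ U @ B) / (V @ B.T @ U.T @ U @ B + eps)
err_final = np.linalg.norm(X - U @ B @ V.T)
```

Because the updates are multiplicative, entries initialized positive stay non-negative throughout, which is why this variant needs no ReLU-like projection.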
V Evaluation

In this section, we first evaluate the approach on a synthetic dataset, where we compare our model to standard NMF and NMTF algorithms. In NMF, the mixture matrix $B$ becomes the identity matrix, whereas in NMTF it is treated as an unknown to be learned during the decomposition.

We also apply our BSMF to real-world Twitter datasets with multiple overlapping beliefs. Our model is compared to six different baselines and three model variants. The empirical results show that BSMF outperforms the baselines substantially.
V-A Synthetic Data
V-A1 Dataset Construction
In order to evaluate our model in a controlled setting, we build a synthetic dataset (https://github.com/ycqcpsup/narrative-detection) for a four-region case, in which three different beliefs exist that intersect in a common overlap region. For each belief, we build: (i) a non-overlapping word-level corpus, and (ii) a group of users adopting that belief. Users in the overlap region pick words from the overlap corpus only. Users in each of the three belief groups generate messages using vocabulary specific to their belief corpus, including the overlap corpus. In sum, 400 users and 4,000 messages were created. The labels of posts are annotated according to the group the user belongs to. To determine claim similarity, the keyword-level similarity measures from Section IV-B are used. We do not impose any social relations; instead, we use the identity matrix for $G$.
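A generator in this spirit can be sketched as follows; the vocabulary sizes, post lengths, and counts are illustrative placeholders, smaller than the paper's dataset:

```python
import numpy as np

rng = np.random.default_rng(1)

# Disjoint integer-token vocabularies per belief, plus a shared overlap
# vocabulary (sizes are illustrative, not the paper's).
vocab = {
    "A": list(range(0, 30)),
    "B": list(range(30, 60)),
    "C": list(range(60, 90)),
    "overlap": list(range(90, 120)),
}

def make_post(belief, length=8):
    """Belief-group users mix their own vocabulary with the overlap's;
    users in the overlap region draw from the overlap vocabulary only."""
    pool = vocab["overlap"] if belief == "overlap" \
        else vocab[belief] + vocab["overlap"]
    return rng.choice(pool, size=length)

posts, labels = [], []
for belief in ("A", "B", "C", "overlap"):
    for _ in range(100):
        posts.append(make_post(belief))
        labels.append(belief)    # ground truth follows the user's group
```

From here, a bag-of-words vector per post feeds the same similarity pipeline used for the real data.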
V-A2 Method Comparison
For this experiment, the factorization starts from matrix $X$ and uses the four-region belief structure for $B$. Two simpler variants are introduced: (i) the first variant substitutes $B$ with an identity matrix and takes the standard NMF formulation, $X \approx U V^\top$; (ii) the second variant substitutes $B$ with a learnable matrix $B'$, which takes the standard non-negative matrix tri-factorization (NMTF) form, $X \approx U B' V^\top$. Compared to NMF, NMTF obviously possesses more freedom by introducing the learnable matrix $B'$. We use this variant to investigate whether the latent belief structure can be learned from pure data statistics (i.e., whether $B' \approx B$ after estimation). In NMTF, we use a gradient descent algorithm for $B'$ as well:

$$\frac{\partial L}{\partial B'} = -2\, U^\top \big( X - U B' V^\top \big) V.$$
We use the same settings, with L1- and L2-regularization, for NMF, NMTF, and our BSMF, to make sure that each method gives sparse results without overfitting. Empirically, all three methods reach convergence. The predicted label for each message is then given by the index of the maximum value in the corresponding row of the final representation $V$.
V-A3 Results of 200 Rounds
We run each model 200 times and report the clustering accuracy in Figure 4. The figure shows that all of these matrix factorization methods achieve good accuracy and that BSMF consistently outperforms NMF and NMTF; the average results over the 200 rounds confirm the same ordering. From this result, we can tell that, under certain constraints, the additional freedom endowed by the learnable $B'$ makes NMTF more flexible and powerful than NMF in this scenario. However, NMTF cannot precisely capture the belief structure from data. Hence, it remains inferior to our proposed solution.
V-A4 Visualization of Results
To compare these methods in more depth, we visualize the final $V$ and color messages based on ground-truth labels. In Figures 5 and 6, we project the estimated $V$ onto a 3-D space. Each data point represents a message colored by belief. In each figure, all of the data points seem to lie in a regular tetrahedron (a regular $K$-polyhedron for more general $K$-belief cases). Interestingly, for NMF, most of the data points cluster around the upper corner. It is obviously difficult to draw a boundary through the crowded mass. NMTF works a little better: data points with different beliefs stretch apart, making their beliefs more separable. We also visualize the learned $B'$, and it turns out to be an SVD-like diagonal matrix, which means that pure NMTF only learns the independent variances aligned with the basis of each belief.
Overall, NMF and NMTF cannot reveal the latent data structure; the manifold remains entangled. The projection result of BSMF is striking: data points are evenly located and grouped by color, approximately forming a regular tetrahedron. We hypothesize that in the four-dimensional space, data points are aligned with one of the belief bases/parts, and these four bases are conceivably orthogonal in that space. In a word, our model disentangles the latent manifold in the data and leads to a clean separation of multiple beliefs.
V-B Real-world Twitter Datasets
We also evaluate our algorithm in the context of social events on Twitter. Three datasets are investigated: (i) Eurovision2016, borrowed from prior work, and (ii) Brexit & May and (iii) Global Warming, crawled in real time with the Apollo Social Sensing Toolkit (http://apollo2.cs.illinois.edu/). Statistics of the whole datasets are shown in Table I. Users on Twitter are regarded as sources and tweets as claims. Thus, the source-claim matrix $X$ is constructed to reflect who posts/retweets which tweet. The social dependency matrix $G$ is generated as the retweet graph.
|Dataset|# sources|# claims|# all tweets|# retweets|
|Brexit & May|8,430|4,468|10,254|7,318|
Baselines. We carefully select six baseline methods that encompass different perspectives on belief separation:
Random labelling is a trivial baseline. It annotates posts randomly by belief, giving equal probability to each label. Other baselines should at least beat this.
Sentiment140 and SANN are content-aware solutions based on language or sentiment models. In our implementation, each claim is sent as a query to the Sentiment140 API, which responds with a polarity score of 0, 2, or 4. SANN outputs three polarity labels, which we regard as three clusters.
H-NCut [57, 29], where the bipartite structure of the source-claim network is formulated as a hypergraph. Claims are nodes and sources are hyperedges. Our problem is thus viewed as a hypergraph/overlapping community detection problem, where community nodes represent posts, and the detected overlapping communities are interpreted as a grouping of posts by (overlapping) beliefs. We implement H-NCut, a hypergraph normalized-cut algorithm from the spectral clustering perspective.
Polarization is a relevant baseline that uses an NMF-based solution for social network belief extraction to separate biased and neutral claims.
NMTF is a baseline with a learnable mixture matrix. We compare our model with it to demonstrate that pure learning without a prior is not enough to unveil the true belief overlap structure in real-world applications.
Different variants of BSMF are further evaluated to verify the effectiveness of message similarity interpolation (the M-module) and social graph convolution (the S-module). The final version is called BSMF. It incorporates both modules. Models without the M-module or the S-module are named BSMF-M and BSMF-S, respectively, while BSMF-MS denotes running our algorithm without either module.
Evaluation Metrics. In this work, we aim to automatically extract overlapping beliefs, so multi-class classification metrics are employed. We use the Python scikit-learn package to help with the evaluation. Macro-averaging simply calculates the mean of the per-class metrics, giving equal weight to each class; it is used to highlight model performance on infrequent classes. Weighted metrics account for class imbalance by computing an average in which each class score is weighted by its presence in the true data sample. Standard precision, recall, and F-score are considered in both schemes. Note that weighted averaging may produce an F-score that is not between precision and recall.
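To make the macro versus weighted distinction concrete, here is a pure-Python version of what scikit-learn's averaged precision/recall/F-score computes (the labels and predictions are toy values for illustration):

```python
from collections import Counter

def prf(y_true, y_pred, cls):
    """Precision, recall, and F-score for one class."""
    tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
    fp = sum(p == cls and t != cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def averaged(y_true, y_pred, scheme="macro"):
    """Combine per-class scores with equal ('macro') or support weights."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    scores = [prf(y_true, y_pred, c) for c in classes]
    if scheme == "macro":                       # equal weight per class
        w = [1 / len(classes)] * len(classes)
    else:                                       # "weighted": by class frequency
        w = [support[c] / len(y_true) for c in classes]
    return tuple(sum(wi * s[k] for wi, s in zip(w, scores)) for k in range(3))

y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1]
macro = averaged(y_true, y_pred, "macro")
weighted = averaged(y_true, y_pred, "weighted")
```

Because the weighted F-score averages per-class F-scores directly, it need not lie between the weighted precision and recall, which is the caveat noted above.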
V-B1 Eurovision2016

The first real-world dataset concerns a Ukrainian singer, Susana Jamaladinova (Jamala), who won the annual European song contest in 2016. Her success was a surprise to many, as the expected winner, according to pre-competition polls, had been from Russia. The winning song was also controversial: it told the story of the deportation of Crimean Tatars by Soviet Union forces in the 1940s, and political songs are not permitted in Eurovision. Tweets related to Jamala were collected within five days after the contest. Basic statistics are reported in Table I. The most popular 1,000 claims were manually annotated and separated into 600 pro-Jamala, 239 anti-Jamala, and 161 neutral claims.
[Table II: belief-separation results on Eurovision 2016 and Brexit & May]
Result of Eurovision2016. In this scenario, we use the belief mixture matrix from Equation (3). We consider the top 100 claims predicted for each belief. The results are shown in Table II. It is not surprising that all baselines beat Random. Overall, matrix factorization methods work well for this problem, especially ours. When used with both the M-module and the S-module, our BSMF algorithm ranks first on all metrics. Among the other baselines, Sentiment140 and SANN work poorly for this problem, because (i) they use background language models that are pre-trained on other corpora; and (ii) they do not use dependency information, which matters in real-world data. H-NCut also yields weak performance: it first flattens the underlying hypergraph structure into a weighted single graph, then conducts a normalized graph cut. We strongly suspect that much information is lost during the flattening process. We also notice that the NMF-based Polarization algorithm actually outperforms NMTF. The reason might be that, for real-world data, the latent belief structure is harder to capture, and NMTF can be trapped in poor local minima.
Table III shows the top 3 tweets from each belief estimated by our model. Note that, due to an update of the Twitter API, the crawled text field is truncated to 140 characters; our algorithm runs on the text within that range only. For human readability and interpretability, however, we manually fill in the rest of each tweet, showing the additional text in yellow (likewise for Tables IV and V). It can be seen that the algorithm does a good job at belief separation. Note that the labels shown in the first column, called Beliefs, were inserted manually after the fact (not by our algorithm); our algorithm merely does the separation.
|Neutral||BBC News - Eurovision Song Contest: Ukraine’s Jamala wins competition https://t.co/kL8SYOPOYL|
|Parents of ”#Ukrainian” Susana #Jamaludinova - @Jamala are #Russian citizens and prosper in the Russian #Crimea|
|A politically charged ballad by the Ukrainian singer Jamala won the @Eurovision Song Contest http://nyti.ms/1qlmmNs|
|Pro-Jamala||@jamala congratulations! FORZA UKRAINE!|
|@DKAMBinUkraine: Congratulations @jamala and #Ukraine!!! You deserved all the 12 points from #Denmark and the victory, #workingforDK|
|@NickyByrne: Well done to Ukraine and @jamala|
|Anti-Jamala||jamala The song was political and agaisnt The song contest rules shows NATO had influence on jury decision.|
|@VictoriaLIVE @BBCNews @jamala Before voting we rated it worst song in the contest. Not changed my mind.|
|@JohnDelacour So @jamala has violated TWO ESC rules - the song is not new, and it includes political content. Result MUST be annulled|
V-B2 Brexit & May
This dataset concerns the withdrawal of the United Kingdom from the European Union and the discussion on Twitter about the former UK prime minister, Theresa May. May had been working on the UK withdrawal for about two years. In January 2019, a vote was held in the UK House of Commons on her specific withdrawal agreement; it was defeated by 432 votes to 202. We collected tweets related to Brexit and Theresa May for five consecutive days after that vote, for a total of 109,010 crawled tweets. The beliefs are generally summarized as neutral, pro-May, and anti-May. The most popular 400 tweets were read and manually labeled for evaluation.
|Neutral||UK PM May’s Political Spokesman: Our objective to have an independent trade policy post-Brexit and that is not compatible with being in a customs union with EU|
|May cancelled her Brexit talks with Scotland and Wales. Can this get any more farcical? Probably. #DissolveTheUnion #ipv6|
|Fine but let’s just hope Mrs May sticks to her word. https://t.co/3CDhLkwCi7.|
|Pro-May||May on her feet in Commons shortly - she ’ll say more work to do on backstop, but hearing also she might scrap the fee for EU nationals who want to stay after Brexit, and promise select committees and other parties a bigger role in second phase of the negotiations|
|May is RIGHT to reject “Will of the House“ in favour of “Will of the People” Taking No Deal off table will tie us to EU for ever: it’ll then have no INCENTIVE to agree FAIR deal Stop Brexit & ensuing civil unrest will make French Yellow Vest protests look like Teddy Bears’ Picnic|
|May is planning to whip Tory MPs to vote against the ’no deal’ amendment, to keep ’no deal’ on the table. Let’s see if those 40 ministers who said they would resign will do so|
|Anti-May||It seems Theresa May is only interested trying to hold her own party together - at virtually any cost to the whole country.|
|@jeremycorbyn says May is wasting £171,000 an hour of taxpayers’ money on dangerous #nodeal brinkmanship https://t.co/7zzx8DBrnk|
|@Olgachristie It must be obvious to #TheresaMay that she is never going to unite the country. The only way she can possibly save what’s left of her career is to deliver the #Brexit we voted for by leaving the #EU without a trade deal and saving ourselves £39billion.|
Results of Brexit & May. In this scenario, we continue to use the belief mixture matrix from Equation (3). Measurements are based on the top 25 claims predicted for each belief. Results are reported in Table II, demonstrating again that our approach outperforms the state of the art. Sample tweets from each narrative/belief are presented in Table IV. As before, observe the separation of claims into a neutral (overlap) subset and more biased (pro/anti) belief groups.
V-C Generalizing the Belief Structure
|Australia’s top scientists urge government to do more on global warming https://t.co/NclFqGKXE1|
|Global Warming / urge response||Australia’s most prestigious scientific organisation has added to growing pressure on Prime Minister Scott Morrison over climate change policy, calling on the government to ”take stronger action” in response to the bushfire crisis|
|”Have we now reached the point where at last our response to global warming will be driven by engineering and economics rather than ideology and idiocy?” #auspol|
|As long as the ALP keep accepting ‘donations’ (bribes) from the climate change deniers the fossil fuel industry, who spent millions and millions spreading lies about global warming @AlboMP, they have zero creditibility when they talk about phasing out fossil fuel #auspol|
|Global Warming / fossil fuel||To mitigate the effects of climate change, we must do away with fossil fuel burning as they are the major contributors of global warming.|
|Turnbull: “The world must, and I believe will, stop burning coal if we are to avoid the worst consequences of global warming. And the sooner the better.” Malcom Turnbull, The Guardian 12 January #ScottyfromMarketing|
|That time when The Australian misrepresented @JohnChurchOcean to say sea level rise wasn’t linked to global warming. After I wrote about it, they pulled the story.|
|Global Warming / sea level||Brave global warming researchers are studying sea level rise in the Maldives this morning. https://t.co/aqGtgXAj2t|
|CO2 is a magical gas which causes Lake Michigan water levels to both rise and fall https://t.co/8FrC1Cx2Rm|
|CLIMATE’S FATAL FLAW : ‘Greenhouse Gases Simply Do Not Absorb Enough Heat To Cause Global Warming’ – “New data and improved understanding now show that there is a fatal flaw in greenhouse-warming theory.”|
|No Global Warming||“Three new research studies confirm that geothermal heat flow, not man-made global warming, is the dominant cause of West Antarctic Ice Sheet (WAIS) melting,” writes geologist James Edward Kamis.|
|Left Media talks about.. Climate Change, Global Warming, But… Jihad is reason for recent Forest Fires in Australia !|
In this section, we offer preliminary evidence that the approach generalizes to more complex belief structures, beyond the simple case of the belief mixture matrix given by Equation (3). Consider a scenario with a majority belief and a minority belief that do not overlap. Since more data is expected on the majority belief (by definition of majority), we opt to classify it further, hierarchically. Hence, we postulate that supporters of the majority belief hold two sub-opinions about what to do next, with a region of overlap between them. The corresponding belief structure is reflected by a belief mixture matrix whose rows represent the source beliefs and whose columns represent the corresponding post stances. For example, the first column means that posts of the overlap stance may be generated by sources holding either of the majority sub-opinions, but not by sources holding the minority belief. We apply this belief structure to a Twitter discussion of global warming in the wake of the Australian wildfires that had ravaged the continent since September 2019, burning at least 17.9 million acres of forest in one of the worst fire seasons on record. Our goal is to identify and separate posts according to the above abstract belief structure.
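For concreteness, the hierarchical structure described above can be encoded as a small binary mixture matrix. The encoding below is our own illustrative sketch; the row/column ordering and the "undecided majority" source row are our assumptions, not the paper's exact matrix:

```python
# Rows: source beliefs; columns: post stances.
# An entry of 1 means sources holding the row belief may generate
# posts of the column stance; 0 means they do not.
STANCES = ["overlap", "sub-opinion 1", "sub-opinion 2", "minority"]
SOURCES = ["majority (undecided)", "sub-opinion 1", "sub-opinion 2", "minority"]

B = [
    [1, 0, 0, 0],  # undecided majority sources post only overlap content
    [1, 1, 0, 0],  # sub-opinion-1 sources also post sub-opinion-1 content
    [1, 0, 1, 0],  # sub-opinion-2 sources also post sub-opinion-2 content
    [0, 0, 0, 1],  # minority sources post only minority content
]

# The first column encodes the property quoted in the text: overlap posts
# may come from any majority source, but never from a minority source.
overlap_generators = [SOURCES[i] for i in range(4) if B[i][0] == 1]
```

Any other hierarchy (deeper nesting, more sub-opinions) would be encoded the same way: one row per source belief, one column per post stance, with ones marking which stances each source may emit.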
Results are very interesting. The best fit of online discussions to the aforementioned belief matrix, as determined automatically by our unsupervised algorithm, is shown in Table V. The first column shows the abstract belief categories corresponding to the belief mixture matrix. While the algorithm allocates posts to categories based on the structure of the matrix, we manually inspect, after the fact, the posts assigned to each category and give that category a human-readable name, accordingly (also shown in the first column). For each belief category, the table also shows statements that fall into the given category.
The table reveals that sources in our data set are polarized between a group that believes in global warming and a group that does not. Within the believer group, there are three subcategories. The first (by construction of our belief matrix) is shared by believers of both sub-opinions and generally includes statements that urge a serious response to global warming. The latter two categories are adopted by different subsets of believers who focus on specific, more concrete concerns regarding global warming: one blames the fossil fuel industry, whereas the other is concerned with rising sea levels. The last category is against the theory of global warming. Remember that our algorithm fits the data set to these abstract categories automatically, without human supervision. While we do not claim to have reached conclusions on global warming, the table shows a potential use of our belief structured matrix factorization solution. Namely, it can fit data sets automatically to arbitrary belief structures, thereby offering visibility into what individuals are concerned with, what actions they agree on, and what they disagree about. In future work, we shall further explore the application of our model to the disentangling of hierarchical belief structures.
VI Related Work
VI-A Belief Separation
The problem of belief mining has been a subject of study for decades [34, 32]. Solutions include such diverse approaches as detecting social polarization [2, 10], opinion extraction [24, 33, 43], and sentiment analysis [21, 22], to name a few.
Pioneers, like Leman et al. and Bishan et al., used Bayesian models and other basic classifiers to separate social beliefs. On the linguistic side, many efforts extracted user opinions based on domain-specific phrase chunks and temporal expressions. With the help of pre-trained embeddings, like GloVe or word2vec, deep neural networks (e.g., variants of RNNs [24, 33]) emerged as powerful tools (usually equipped with an attention module) for understanding the polarity or sentiment of user-generated messages. In contrast to these supervised or language-specific solutions, we consider the challenge of developing an unsupervised and language-agnostic approach.
In the domain of unsupervised algorithms, our problem is different from the related problems of unsupervised topic detection [23, 31], sentiment analysis [21, 22], and unsupervised community detection [15, 55]. Topic modeling assigns posts to polarities or topic mixtures independently of the actions of users on this content. Hence, it often misses content nuances or context that help better interpret the stance of the source. For instance, in the Eurovision 2016 example, a tweet that says “Jamala won with a song that first aired in 2015” might be interpreted as pro-Jamala, unless one knows that songs in this competition need to be original; airing the song a year earlier should have disqualified it (i.e., the tweet is against the winner). Clustering content in part by user attitudes towards it (e.g., who likes/retweets it and who does not) leverages user behavior to properly classify such cases. Community detection [37, 19], on the other hand, groups nodes by their general interactions, maximizing intra-class links while minimizing inter-class links [50, 15], or partitioning (hyper)graphs [57, 29]. While different communities may adopt different beliefs, this formulation fails to distinguish regions of belief overlap from regions of disagreement.
The above suggests that belief mining must consider both sources (and forwarding patterns) and content. Prior solutions used a source-claim bipartite graph and determined disjoint polarities by iterative factorization [2, 1]. Our work is novel in postulating a more generic and realistic view: social beliefs may overlap. In this context, we developed a new matrix factorization scheme that combines (i) the source-claim graph; (ii) message word similarity [47, 14]; and (iii) user social dependency [53, 30] in a new class of non-negative matrix factorization techniques.
VI-B Non-negative Matrix Factorization
The work also contributes to non-negative matrix factorization. NMF was first introduced by Paatero and Tapper as positive matrix factorization and was popularized by the work of Lee and Seung, who gave an interesting interpretation based on parts-based representation. Since then, NMF has been widely used in applications such as pattern recognition, signal processing, bioinformatics, geophysics, and economics.
Two main issues of NMF have been intensively discussed during the development of its theoretical properties: solution uniqueness [13, 25] and decomposition sparsity [36, 26]. Considering only the standard formula $X \approx WH$, it is usually not difficult to find a non-negative, non-singular matrix $Q$ such that $WQ$ and $Q^{-1}H$ also constitute a valid solution. Uniqueness is achieved if $W$ and $H$ are sufficiently sparse or if additional constraints are included. Special constraints have been proposed in [35, 20] to improve the sparseness of the final representation. The non-negativity condition in NMF was also proven to naturally ensure a form of sparseness.
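A minimal numeric illustration of this non-uniqueness (a toy example of ours, with $Q$ chosen as a positive diagonal scaling):

```python
def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# A toy factorization X = W H with all factors non-negative.
W = [[1.0, 0.0],
     [0.0, 1.0]]
H = [[2.0, 4.0],
     [6.0, 8.0]]
X = matmul(W, H)

# A non-singular, non-negative Q (a positive diagonal scaling) and its
# inverse transfer mass between the factors without changing X.
Q     = [[2.0, 0.0], [0.0, 0.5]]
Q_inv = [[0.5, 0.0], [0.0, 2.0]]

W2 = matmul(W, Q)      # still non-negative
H2 = matmul(Q_inv, H)  # still non-negative
X2 = matmul(W2, H2)    # identical reconstruction: (WQ)(Q^-1 H) = WH
```

Diagonal scalings and permutations always preserve non-negativity, which is why NMF solutions are at best unique only up to such transformations; more general $Q$ can also work unless sparsity or other constraints rule them out.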
Non-negative matrix tri-factorization (NMTF) is an extension of conventional NMF that factorizes the data into three non-negative factors. Unconstrained NMTF is theoretically identical to unconstrained NMF; however, when constrained, NMTF possesses more degrees of freedom. NMF on a manifold emerges when the data lies in a nonlinear low-dimensional submanifold. Discriminant Sparse NMF, Manifold-respecting Discriminant NMF, and Manifold Regularized Discriminative NMF were proposed with special constraints to preserve local invariance, so as to reflect the multilateral characteristics of the data.
In this work, instead of including constraints to impose structural properties, we adopt a novel belief structured matrix factorization by introducing the belief mixture matrix. Its structure reflects the latent belief structure and thus narrows the search space to a good region.
VII Conclusion
In this paper, we proposed a new class of NMF in which the structure of the parts is known in advance (or assumed to follow some generic form). Specifically, we introduced a belief mixture matrix and proposed a novel Belief Structured Matrix Factorization algorithm, called BSMF, to separate overlapping beliefs in large volumes of user-generated messages. The factorization fixes its middle factor to the known belief mixture matrix while learning the two outer non-negative factors. Results on synthetic datasets and three real-world Twitter events show that our algorithm consistently outperforms baselines by a large margin. We believe this paper can seed a research direction on automatically separating data sets according to arbitrary belief structures, enabling a more in-depth understanding of social groups, attitudes, and narratives on social media.
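The core idea of fixing the middle factor can be sketched with plain Lee-Seung style multiplicative updates on the two outer factors. This is our own simplified sketch under a Frobenius loss; it omits BSMF's M- and S-modules, and the matrices below are toy assumptions, not the paper's data:

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_err(X, Y):
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def update(F, numer, denom, eps=1e-9):
    """One multiplicative update: F <- F * numer / denom (elementwise)."""
    return [[F[i][j] * numer[i][j] / (denom[i][j] + eps)
             for j in range(len(F[0]))] for i in range(len(F))]

def bsmf_like(X, B, p, q, iters=200, seed=0):
    """Fit X ~= U B V with a fixed, known mixture matrix B (p x q)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.random() for _ in range(p)] for _ in range(n)]
    V = [[rng.random() for _ in range(m)] for _ in range(q)]
    for _ in range(iters):
        BV = matmul(B, V)  # treat BV as the fixed right factor, update U
        U = update(U, matmul(X, transpose(BV)),
                   matmul(U, matmul(BV, transpose(BV))))
        UB = matmul(U, B)  # treat UB as the fixed left factor, update V
        V = update(V, matmul(transpose(UB), X),
                   matmul(matmul(transpose(UB), UB), V))
    return U, V

B = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]            # known belief structure
X = [[2, 2, 0], [4, 3, 0], [0, 0, 5], [1, 1, 0]]  # toy source-claim matrix
U, V = bsmf_like(X, B, p=3, q=3)
approx = matmul(matmul(U, B), V)
# frob_err(X, approx) should shrink steadily as the updates proceed.
```

Because the middle factor never changes, every candidate solution is forced to respect the postulated belief structure; the alternating updates only redistribute non-negative mass over the rows and columns that the mixture matrix permits.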
References
- [1] (2014) Quantifying political polarity based on bipartite opinion networks. In ICWSM.
- [2] (2017) Unveiling polarization in social networks: a matrix factorization approach. In INFOCOM.
- [3] (2011) Manifold-respecting discriminant nonnegative matrix factorization. Pattern Recognition Letters.
- [4] (2015) Exposure to ideologically diverse news and opinion on Facebook. Science.
- [5] (2016) Users polarization on Facebook and YouTube. PLoS ONE.
- [6] Non-negative matrix factorization, a new tool for feature extraction: theory and applications. International Journal of Computers, Communications and Control.
- [7] (2008) Non-negative matrix factorization on manifold. In ICDM.
- [8] (2017) Unsupervised sentiment analysis with signed social networks. In AAAI.
- [9] Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. John Wiley & Sons.
- [10] (2011) Political polarization on Twitter. In ICWSM.
- [11] (2008) Analysis of financial data using non-negative matrix factorization. In International Mathematical Forum.
- [12] (2011) Analyzing political trends in the blogosphere. In ICWSM.
- [13] (2004) When does non-negative matrix factorization give a correct decomposition into parts? In NIPS.
- [14] (2018) Octopus: an online topic-aware influence analysis system for social networks. In ICDE.
- [15] (2016) Community detection in networks: a user guide. Physics Reports 659.
- [16] (2013, Feb.) Value of science is in the foresight. Military-Industrial Kurier.
- [17] (2011) Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent. IEEE Transactions on Image Processing.
- [18] (2007) Frequent pattern mining: current status and future directions. TKDD.
- [19] (2017) Influence maximization by probing partial communities in dynamic online social networks. Transactions on Emerging Telecommunications Technologies.
- [20] (2004) Non-negative matrix factorization with sparseness constraints. JMLR.
- [21] (2013) Unsupervised sentiment analysis with emotional signals. In WWW.
- [22] (2013) Listening to the crowd: automated analysis of events via aggregated Twitter sentiment. In IJCAI.
- [23] (2018) Tools and approaches for topic detection from Twitter streams: survey. Knowledge and Information Systems.
- [24] Opinion mining with deep recurrent neural networks. In EMNLP.
- [25] (2009) Non-negative matrix factorization: ill-posedness and a geometric algorithm. Pattern Recognition.
- [26] (2008) Theorems on positive data: on the uniqueness of NMF. Computational Intelligence and Neuroscience.
- [27] (1999) Learning the parts of objects by non-negative matrix factorization. Nature.
- [28] (2001) Algorithms for non-negative matrix factorization. In NIPS.
- [29] (2017) Inhomogeneous hypergraph clustering with applications. In NIPS.
- [30] (2018) Influence maximization on social graphs: a survey. TKDE.
- [31] (2017) Pythia: a system for online topic discovery of social media posts. In ICDCS.
- [32] (2012) Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies.
- [33] (2015) Fine-grained opinion mining with recurrent neural networks and word embeddings. In EMNLP.
- [34] (2017) Measuring and moderating opinion polarization in social networks. TKDD.
- [35] (2009) Nonnegative matrix factorization using projected gradient algorithms with sparseness constraints. In ISSPIT.
- [36] (2005) Non-negative source separation: range of admissible solutions and conditions for the uniqueness of the solution. In ICASSP.
- [37] (2011) Overlapping communities in dynamic networks: their detection and mobile applications. In MobiCom.
- [38] (2018, July) Joint concept for operating in the information environment (JCOIE).
- [39] (1994) Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics.
- [40] (2015) Small-scale incident detection based on microposts. In ACM Hypertext.
- [41] (2016, Jul.) Sentiment140 - a Twitter sentiment analysis tool.
- [42] (2006) Nonnegative matrix approximation: algorithms and applications. Technical report, Computer Science Department, University of Texas at Austin.
- [43] (2012) Mining diverse opinions. In MILCOM.
- [44] (2009) Interpretation of organic components from positive matrix factorization of aerosol mass spectrometric data. Atmospheric Chemistry and Physics.
- [45] (2017) Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In AAAI.
- [46] (2012) Nonnegative matrix factorization: a comprehensive review. TKDE.
- [47] (2012) Document-topic hierarchies from document graphs. In CIKM.
- [48] (2018) A hybrid unsupervised method for aspect term and opinion target extraction. Knowledge-Based Systems.
- [49] (2012) Extracting opinion expressions with semi-Markov conditional random fields. In EMNLP.
- [50] (2013) Overlapping community detection at scale: a nonnegative matrix factorization approach. In WSDM.
- [51] (2010) Orthogonal nonnegative matrix tri-factorization for co-clustering: multiplicative updates on Stiefel manifolds. Information Processing & Management.
- [52] (2018) TaxoGen: unsupervised topic taxonomy construction by adaptive term embedding and clustering. In SIGKDD.
- [53] (2013) Maximizing the spread of positive influence in online social networks. In ICDCS.
- [54] (2014) Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In SIGIR.
- [55] (2019) Privacy-preserved community discovery in online social networks. Future Generation Computer Systems.
- [56] (2009) Discriminant sparse nonnegative matrix factorization. In IEEE ICME.
- [57] (2007) Learning with hypergraphs: clustering, classification, and embedding. In NIPS.