1. Introduction
The widespread adoption of social media platforms has altered the landscape of societal debates in unprecedented ways: an immense amount of content is delivered to social media users in their timelines, allowing them to quickly access information and participate in political discourse. However, with their algorithmically curated and virally propagating content, social media platforms are suspected of contributing to the polarization of opinions by means of the so-called “echo chamber” effect, due to which users tend to interact with like-minded individuals, reinforcing their own ideological viewpoint (bakshy15exposure; garimella2018quantifying). Hence, understanding the interplay between the ideological leanings of social media users and the information they consume and propagate is of crucial importance for devising techniques to limit misinformation and echo chambers, and for designing public information campaigns (garimella17balancing; aslay2018maximizing).
Therefore, modeling opinions and their dynamics in social media has attracted considerable interest in recent years. The bulk of the literature on opinion modeling (del2017modeling; monti2020learning) and opinion mining (barbera2015birds; garimella2018quantifying; wong2016quantifying) is, however, limited to analyzing ideological leanings along a one-dimensional political spectrum, i.e., the traditional left-right spectrum or a country-specific axis (e.g., the United States’ Democrat-Republican axis). Yet political scientists have long noted that a single left-right axis is inadequate for describing the existing variation in ideological profiles, and risks ignoring important distinctions between ideological groups (davis1970expository; eysenck1975structure). Only recently has opinion dynamics research started to unveil the complex multidimensional nature of opinion formation (baumann2020emergence).
[Figure 1. Toy example of input and output. Input: users sharing items with known topic distributions (e.g., Items 1–2 with 0.9 economy / 0.1 minorities; Items 3–4 with 0.1 economy / 0.9 minorities). Output: each user’s polarity on the “economy” axis and on the “minorities” axis.]
[Figure 2. Polarities of some popular subreddits related to the 2016 elections, estimated by our method on the Reddit dataset (further details in Sec. 6). Each of the four dimensions represents one axis of our ideological space.]

In this paper, combining ideas from different areas (network embeddings, information propagation, and opinion mining), we propose to learn ideological embeddings of social media users in a multidimensional ideological space from information cascades. To this end, we propose a stochastic propagation model, dubbed the Multidimensional Ideology-aware Propagation (MIP) model, formalizing how politically salient content spreads in a social network as a function of the ideological embeddings of users under homophily-driven interactions. Our model assumes that information propagates from one node in a social network to another if both are interested in the topic and ideologically aligned with each other. Such a scenario can describe, for instance, information propagation on Twitter, where clusters of retweets have been found to correlate with the political leanings of users (garimella2018quantifying; barbera2015birds; conover2011political; vaccines). Accordingly, the ideological embedding of a node is characterized by a topic distribution and a polarity for each topic, in a multidimensional space defined by the input topics. As a result, our embeddings have the desirable property of being interpretable, and can therefore be used to make explainable predictions about unobserved behavior.
Figure 1 provides an abstraction of our proposal: the input to our model is the propagation of four items, each with a known topic distribution. These observations can be explained by the latent polarities of the users over each topic: in the right part of the figure, the spectrum of polarities on the axis defined by the first topic is indicated with colors going from red to blue, and on the second axis from green to purple. These polarities are our output.
As a concrete example, consider the topics “economy”, “Emailgate” (the term coined by the media to refer to the controversy around Hillary Clinton’s use of a private email server for official public communications, used as an argument against her by her opponents in the 2016 U.S. elections), “foreign policy”, and “minority rights”. Assume that u and v are neighbors in a given network (e.g., they are “friends” or “follow” each other) but disagree on each of these topics. In this case, regardless of how interested they are in these topics, any content coming from u will hardly be propagated further by v, and vice versa. If instead u and v are aligned on “economy”, then propagation of content about this topic from, e.g., u to v is likely, as long as v also exhibits an interest in the topic.
Notably, the joint explicit modeling of interest and ideological leaning eases the interpretability of nodes’ behavior and, ultimately, the explainability of the propagation process. Figure 2 previews some results of our method on real-world Reddit data. In particular, it reports the polarities learned for some popular subreddits related to the 2016 election, along the four axes “economy”, “minorities”, “foreign policy”, and “emailgate”. The polarities estimated by our method match the expected ones for these subreddits: for instance, along the “emailgate” axis, the subreddit sandersforpresident is much closer to the subreddits republican and the_donald than to hillaryclinton; while on “economy”, hillaryclinton is closer to republican than to sandersforpresident.
Paper contributions and roadmap. The contributions of this paper can be summarized as follows:
Next section discusses the most relevant related literature.
2. Related work
Learning ideological leaning. Early studies treated this problem as a classification task and devised methods for predicting the binary political alignment of Twitter users (conover2011predicting; pennacchiotti2011democrats). Apart from the inherent limitation of the one-dimensional political spectrum assumption common to all existing approaches, cohen2013classifying showed that a further limitation arose from their need for large amounts of manually annotated data and their dependence on politically active users. Some studies looked into content-aware analysis of users’ social interactions (lu2015biaswatch; lahoti2018joint), as we do in our work; none of these approaches, however, exploits information propagation. In particular, with respect to lahoti2018joint, our model is (i) multidimensional, since we jointly estimate ideological scores for multiple topics; (ii) based on information propagation rather than content; and (iii) model-driven: our estimation algorithm stems from a principled model of how content propagates in a network.
A related research direction, topical stance detection (DeySK18; ijcai2017557; Lin+19; sunetal2018stance), aims at assigning a polarity label to a post with respect to a specific topic. These models are only concerned with extracting topics and sentiments from text corpora; instead, we characterize information propagation by looking at ideological embeddings.
Influence-driven propagation models. barbieri2013topic introduced topic-aware social influence propagation models that take into account the topical interest profiles of the users and the topic distribution of the content that propagates in a social network. Topic-aware social influence modeling is crucial to accurately capture the propagation patterns of content in many applications, ranging from viral marketing and social advertising (AslayBBB14; AslayLB0L15; AslayBLL17) to who-to-follow recommendations (BarbieriBM14). However, for politically salient content, these models are not sufficient to correctly capture the propagation probabilities, as ideological alignment also plays a role.
Homophily-based models. Homophily is a well-known phenomenon in social networks: users tend to interact with users from the same social group (mcpherson2001birds; himelboim2013birds). As such, it is a common assumption in machine learning on social networks (BarbieriBM13; garimella2018quantifying) and in propagation models (del2017modeling). In this work, we assume that information is more likely to propagate between nodes that, on a specific topic, share the same ideological views. While the validity of this assumption depends on the specific context and action (replies, follows, etc.), it is in general well supported by the current literature. In the case of Twitter, retweets (the act of re-sharing a piece of content) have been shown to follow this pattern very closely: when a user retweets, it is likely that they share the ideological background of the source regarding that specific topic (garimella2018quantifying). For example, conover2011political found that retweets about U.S. politics had odds ratios higher than one for intra-partisan interactions and lower than one for cross-partisan interactions. lai2019stance analyzed users involved in the Twitter debate about the 2016 referendum on the reform of the Italian Constitution, and found that 98.6% of retweets happen between users with the same stance on the topic. This assumption is quite general, and its validity has been shown in many other contexts, from blogs (adamic2005political) to Facebook (schmidtPolarizationVaccinationDebate2018).

Interpretable embeddings. We leverage this idea to produce interpretable ideological embeddings. The literature on graph embeddings is vast; we refer to goyal2018graph for a review. However, most techniques from this area produce embeddings that are sub-symbolic, lacking a precise, natural-language explanation (vskrlj2020tax2vec). For instance, fengInf2vecLatentRepresentation2018 devised a technique to learn embeddings from information cascades; however, their embeddings are not directly interpretable, and they do not attempt to model polarities, topics, or opinions. Some works have tackled the problem of interpretability in graph embeddings: for instance, idahl2019finding use an external knowledge base to find interpretable subspaces in a given node embedding space, and mehta2019stochastic implement the stochastic block model as a graph neural network to learn an interpretable embedding for community detection.
To the best of our knowledge, our work is the first to devise an inference algorithm able to extract interpretable embeddings from information cascades; it is also the first to provide a data-driven model of the interplay between topic-aware opinions and information propagation on social media.
3. Model
In this section, we present the Multidimensional Ideology-aware Propagation model (MIP for short). The model embodies the following assumptions:

Nodes share items, and each item spans a variety of topics.

Each topic corresponds to an ideological axis, where each node could be placed.

Nodes adopt the same behavior as the nodes they follow on each item, as long as they agree with each other on the topics of that item.
These assumptions are general: they combine aspects from previous works, and are easily applicable to real-world data, as discussed in Sec. 2. We next formalize them in a probabilistic framework.
Information cascades. We consider a directed social graph G = (V, E), where V and E denote the set of nodes and edges, respectively, and a directed edge (u, v) represents u being a follower of v. As such, u receives in their timeline the pieces of content (we call them items) shared by v, and can be influenced by v to share them further, thus allowing the propagation of information. We use the term “sharing” in a loose sense, i.e., to represent any action which is likely to express an alignment with the original node, following the assumptions outlined at the beginning of this section: for example, a retweet on Twitter or a like on Facebook. We denote the set of all items as I. We represent the action of node u sharing an item i at time t with a triplet (u, i, t), and call such a triplet an activation. We assume that no node adopts the same item more than once: if a node shares an item i, it becomes active on that item and stays so forever. In this way, we can focus on observing whether u supports item i or not, and ignore the subtleties of multiple-sharing behavior; moreover, in some contexts (e.g., a like on Facebook) an activation can happen at most once by definition.
We use D to denote the set of observed activations, and D_i to refer to the set of nodes that became active on i (i.e., the cascade of item i).
Likelihood of a set of cascades. Our approach models the set of observed activations as the result of a stochastic process governed by a set of latent factors that span a multidimensional space, representing the ideological tendencies of each node. By assuming that each cascade is independent of the others, the likelihood of D can be expressed as P(D) = ∏_{i ∈ I} P(D_i), where in turn P(D_i) can be devised as

(1)  P(D_i) = ∏_{u ∈ D_i} P(u active on i | Φ_{u,i}) · ∏_{u ∉ D_i} (1 − P(u active on i | Φ_{u,i})).

Here, Φ_{u,i} is the ordered set of all in-neighbors of u that are active in cascade i. In other words, the likelihood of a cascade is given by the probability of obtaining the observed successful and failed activations. The dependency on Φ_{u,i} for modeling the activation probability is fundamental: re-sharing an item depends on those who have already shared it, that is, node u is exposed to i by the nodes followed by u that shared i before. We consider as activations only those (u, i, t) such that Φ_{u,i} ≠ ∅; that is, we ignore the initiator of a cascade, as we are only interested in modeling propagation.
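As a minimal sketch, Eq. (1) can be computed in log-space once a per-node activation probability is available; here it is passed in as an arbitrary function, an assumption for illustration only:

```python
import math

def cascade_log_likelihood(nodes, cascade, act_prob):
    """Log of Eq. (1): sum of log activation probabilities over active
    nodes, plus log complements over inactive ones.
    act_prob(u) is assumed to return P(u active on i | its predecessors)."""
    ll = 0.0
    for u in nodes:
        pr = act_prob(u)
        ll += math.log(pr) if u in cascade else math.log(1.0 - pr)
    return ll
```

Summing such terms over all items then gives log P(D), by the independence assumption above.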
Multidimensional ideological space. The core of our approach is the proper definition of P(u active on i | Φ_{u,i}), so that it reflects the intuition that activations only happen within the context of ideological alignment. Since items have different topics which span a low-dimensional space, we assume that

an individual only adopts an item if it matches a topic of interest for them,

besides interest, an action is only likely to happen if there is an ideological alignment on that topic between the individual and the other individuals who shared the item before.
These concepts are reflected in the specification of the model parameters and, consequently, in the definition of the activation probability, as follows. First, we assume that each item i exhibits a multinomial distribution γ_i over the K topics (dimensions), where γ_{iz} represents the relevance of item i to topic z. A node u can exhibit an interest in topic z with probability η_{uz}. The parameter set η_u represents how much u is interested in each of the K topics. Note that, in principle, a node might be interested in all the considered topics, as well as in none of them.
Finally, each of the K topics defines an ideological axis: given a topic z, we assume that u can exhibit either positive or negative leaning within the context of that topic. The terms “positive” and “negative” only indicate that the leanings oppose each other. We model the probability of node u exhibiting positive leaning on topic z as p_{uz}, and the probability of negative leaning as 1 − p_{uz}.
The parameter set p_u represents the set of all polarities, i.e., the alignment probabilities for each topic relative to u, and therefore defines the position of u in the multidimensional ideological space defined by the topics. We can then express the probability that node u is ideologically aligned with v on topic z as

(2)  a_{uv}^z = p_{uz} · p_{vz} + (1 − p_{uz}) · (1 − p_{vz}),

where p_{uz} · p_{vz} is the probability that both nodes exhibit positive leaning on dimension z and, by contrast, (1 − p_{uz}) · (1 − p_{vz}) is the probability that they both exhibit negative leaning. Using this topic-specific alignment probability a_{uv}^z, we define

(3)  P(u active on i | v) = Σ_z γ_{iz} · η_{uz} · a_{uv}^z

to be the probability that u will activate on item i given that their predecessor v already did. In other words, there must be a topic z, picked from the topic distribution of the item, such that (i) z is of interest to u, and (ii) u is ideologically aligned on z with v.
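For concreteness, Eqs. (2) and (3) amount to a few lines of code; this is a sketch, where the array names mirror the notation above and are otherwise arbitrary:

```python
import numpy as np

def alignment(p_u, p_v):
    """Eq. (2): per-topic probability that u and v lean the same way."""
    p_u, p_v = np.asarray(p_u), np.asarray(p_v)
    return p_u * p_v + (1.0 - p_u) * (1.0 - p_v)

def activation_prob(gamma_i, eta_u, p_u, p_v):
    """Eq. (3): probability that u activates on item i given that its
    predecessor v did, summing over the item's topic distribution."""
    return float(np.sum(np.asarray(gamma_i) * np.asarray(eta_u)
                        * alignment(p_u, p_v)))
```

For instance, two nodes with opposite polarities on every topic (p_u = [1, 0], p_v = [0, 1]) yield an activation probability of zero regardless of interests.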
Now, we express the probability that a node u becomes active on i as a mixture over all possible activators:

(4)  P(u active on i | Φ_{u,i}) = Σ_{v ∈ Φ_{u,i}} π_v · P(u active on i | v).

The term π_v here represents a prior probability that v causes an activation in u. As such, it must respect Σ_{v ∈ Φ_{u,i}} π_v = 1; for instance, it can be the uninformative prior π_v = 1 / |Φ_{u,i}|. In general, this term adds flexibility to our model, by allowing for a predefined relevance for each possible activator: for example, it can be used to model bias towards popular nodes, or towards the initiators of the cascade.

MIP generative model. We can devise a simple generative stochastic process for data generation that adheres to the aforementioned assumptions. First, we introduce the hyperparameters that govern such a process. We define ρ to control the polarization of the nodes’ opinions: high values of ρ lead to polarities closer to the extremes (i.e., p_{uz} closer to 0 or 1), while lower values lead to polarities closer to neutrality (i.e., p_{uz} closer to 0.5). We also define α and β to control the generated distribution of the interests η_{uz}. Finally, we define μ as the prior distribution of the topics of items. Then, given an underlying graph G, we take the following steps to generate a propagation dataset.

Draw interests η_{uz} ~ Beta(α, β) for each node u ∈ V and topic z.

Draw polarities p_{uz} ~ Beta(1/ρ, 1/ρ) for each node u ∈ V and topic z.

Generate an item i:

Draw its topic distribution γ_i ~ Dirichlet(μ).

Draw an initial activated seed uniformly at random from V.

For each arc (u, v) ∈ E s.t. v activated and u has not yet seen the item:

Node u sees the item from v.

Draw the item topic z according to γ_i.

u is interested in the topic with probability η_{uz}. If it is not interested, the propagation fails.

If it is interested, draw the attitudes of u and v on z as Bernoulli variables with probabilities p_{uz} and p_{vz}.

If the attitudes are equal, u activates on the item.


Within the main loop, each active node tries to activate its neighbors, and the process stops when no further nodes become active. For simplicity, we presented the model assuming that each active node has an equal chance of activating its neighbors: this corresponds to assuming a uniform prior π_v in Eq. 4.
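The propagation loop of the generative process can be sketched as follows; this is a simplified simulation under the uniform-prior assumption, and parameter names such as `eta` and `p` mirror the interests and polarities purely for illustration:

```python
import numpy as np

def simulate_cascade(followers, eta, p, gamma, seed, rng):
    """Simulate one MIP cascade starting from `seed`.
    followers[v]: nodes that follow v (and thus see what v shares);
    eta[u][z]: interest of node u in topic z; p[u][z]: its polarity;
    gamma: topic distribution of the item being propagated."""
    active, frontier, seen = {seed}, [seed], {seed}
    while frontier:
        v = frontier.pop()
        for u in followers.get(v, []):
            if u in seen:
                continue
            seen.add(u)                          # each node sees the item once
            z = rng.choice(len(gamma), p=gamma)  # draw one topic of the item
            if rng.random() >= eta[u][z]:
                continue                         # u not interested: failure
            att_u = rng.random() < p[u][z]       # Bernoulli attitude of u
            att_v = rng.random() < p[v][z]       # Bernoulli attitude of v
            if att_u == att_v:                   # agreement -> activation
                active.add(u)
                frontier.append(u)
    return active
```

With all interests and polarities at 1, a chain of followers fully activates; flipping one node's polarity to 0 blocks the propagation at that node.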
4. Inference and Learning
Our goal is an algorithm that, given a set of items represented in a K-dimensional topic space, estimates the interests and the polarities of each node, that is, our ideological embeddings. We do so by maximizing the likelihood of our model w.r.t. the parameters η and p, given all the topic distributions γ_i and the observed activations.
We can naturally see the problem of maximizing the likelihood defined in the previous section as a learning problem, in which we predict whether a certain node will activate or not. To obtain a scalable learning algorithm, we would like to treat each possible propagation of an item from v to u as an independent example. To do so, we resort to an approximation of the likelihood in Eq. 1. First, for the active users, we notice that, by Jensen’s inequality:

log Σ_{v ∈ Φ_{u,i}} π_v · P(u active on i | v) ≥ Σ_{v ∈ Φ_{u,i}} π_v · log P(u active on i | v).
By virtue of this inequality, we can treat the prior π as a predefined sampler that selects users from Φ_{u,i}, and approximate the activation probability according to them. In practice, we can focus on a subset of predefined potential activators, and measure the probability of alignment to them.
Different choices of which nodes to sample suit different contexts. For example, considering only the first activator is appropriate in those real cases where ideological alignment is likely to happen mostly between the creator of the item and whoever adopts it, rather than between followers. In our algorithm, however, we wish to treat as equal every pair (u, v) among the nodes that re-shared an item i. This choice comes from the empirical observation that political communities tend to form groups with high homophily, often described as echo chambers (del2016echo); these appear in retweet networks as clusters with a homogeneous opinion (garimella2018quantifying; barbera2015birds; conover2011political; vaccines). Therefore, we expect aligned nodes to activate collectively, i.e., if u is aligned with a neighbor who shared i, they are probably also aligned with another neighbor who shared the same item.
Based on this practical consideration, we further choose to approximate the negative terms as

log (1 − P(u active on i | Φ_{u,i})) ≈ Σ_{v ∈ Φ_{u,i}} π_v · log (1 − P(u active on i | v)).

As a matter of fact, since ideological communities tend to be homogeneous, if u is not aligned with a neighbor who shared i, we expect them in practice to be unaligned with all their neighbors who shared the same item. This approximation first allows us to better learn from real-world propagations, clustering together users who re-shared an item and separating them from those who did not; second, it allows us to factorize the likelihood, and hence to obtain a scalable algorithm. We can therefore rewrite the overall log-likelihood as

log P(D) ≈ Σ_{i ∈ I} Σ_{(u,v): v ∈ Φ_{u,i}} π_v · [ y · log P(u active on i | v) + (1 − y) · log (1 − P(u active on i | v)) ],

where y = 1 if u activated on i and y = 0 otherwise.
Thanks to this approximation, we can employ stochastic gradient descent in the following way. The basic instance of the learning problem is an example (γ_i, v, u, y), where v ∈ D_i, and y = 1 iff u ∈ D_i and y = 0 otherwise, given that (u, v) ∈ E. In other words, from the topic distribution of an item and one of the nodes active on it, we try to predict whether one of its followers will activate or not. Then, given a sequence of examples sampled from the set of observed activations D, maximizing the likelihood is equivalent to minimizing the log loss of each example:

ℓ = − y · log ŷ − (1 − y) · log (1 − ŷ),

where ŷ = P(u active on i | v) is the likelihood of a single example given the latent variables η and p.
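A single stochastic step then just evaluates this loss on one example. A minimal sketch in pure Python, with an assumed clipping constant to avoid log(0):

```python
import math

def example_log_loss(gamma_i, eta_u, p_u, p_v, y):
    """Log loss of one example (gamma_i, v, u, y), where y = 1 iff the
    follower u activated. The inner sum is the predicted activation
    probability of Eq. (3)."""
    y_hat = sum(g * e * (a * b + (1 - a) * (1 - b))
                for g, e, a, b in zip(gamma_i, eta_u, p_u, p_v))
    eps = 1e-12                           # clip for numerical safety
    y_hat = min(max(y_hat, eps), 1 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
```

Gradients of this loss with respect to the entries of eta_u, p_u, and p_v then drive the stochastic updates.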
Input: graph G; items I with topic distributions γ_i; activations D. Output: polarities p_u and interests η_u for all u ∈ V.
The learning procedure is outlined in Algorithm 1. In real datasets, non-activated nodes largely outnumber the activated ones for a given item. For this reason, we use negative undersampling (line 8) to reduce the number of negatives. In practice, we undersample negative examples at random so that they are twice the number of positives. In experiments, this implies an expected average precision of 1/3 when predictions are made at random.
We also found that considering a random sample of Φ_{u,i} of fixed size (line 4) works well in practice. For gradient descent, we employ AdaGrad (duchi2011adaptive) with a linearly decreasing learning rate (stopped at a fixed minimum value) to avoid undesirable local minima.




5. Experimental Assessment
We conduct an extensive empirical evaluation of the proposed model and inference algorithm on synthetic and real-world datasets. Our goal is to answer the following research questions:

RQ1. Is our inference algorithm, despite its approximations, able to accurately estimate interests and polarities when activations follow the proposed model? (Section 5.1)

RQ2. Are the predictions from our algorithm explainable? More precisely: are the embeddings inferred by our algorithm from a real-world scenario interpretable and meaningful, when compared to well-known facts about the ideological positioning of political groups? (Section 5.2)

RQ3. In real-world scenarios, can ideological embeddings be used to predict unobserved behavior? How does their predictive power compare to less interpretable state-of-the-art embeddings? (Section 5.3)
To foster reproducibility, we publicly release all the data and code necessary to reproduce our experiments (https://github.com/corradomonti/ideologicalembeddings).
5.1. Experiments on synthetic data
To answer the first question, we verify that our inference algorithm makes correct predictions whenever activations follow the assumptions outlined in Section 3. To this end, we conduct experiments on multiple synthetic datasets in which activations are generated according to our model. We show that Algorithm 1, despite the approximations we introduced to make it scalable, is consistent with the generative process proposed in Section 3. Furthermore, we study under which conditions the predictions made by the algorithm are accurate: it turns out that polarization greatly affects its accuracy, as does the number of items; the topology of the underlying graph, instead, seems to have less impact.
Experimental setting. In all our experiments, we split the items into a training and a test set. We run our inference algorithm on the training set to estimate the values of the variables η and p, which are then used to compute the likelihood of activations in the test set. In particular, we consider the following prediction task: given that a node v has activated on an item i with topic distribution γ_i, what is the probability that node u will also activate on item i?
To generate the synthetic datasets, we follow the procedure described in Section 3. We fix the number of topics K, and set the interest hyperparameters α and β so as to have a high probability of users exhibiting topic-aware interests. As a prior for items, we use a uniform μ over all topics. We use two different graphs G with the same number of nodes: (i) a complete graph, and (ii) a graph generated by the Barabási-Albert network model.
Results. We first compare the accuracy of the inference process when varying the number of items propagated through the network. We report the results in Fig. 2(a) and 2(b). As expected, the inference accuracy increases with the number of items, since every new item introduces a new cascade to the sample, improving the accuracy of the parameter estimates. Interestingly, the results are similar for the complete graph and the Barabási-Albert network model, suggesting that the topology does not substantially affect the performance of our algorithm.
We then measure the inference accuracy with respect to the distribution of the polarization of the individuals. To this end, we perform inference on datasets generated with varying values of the polarization hyperparameter ρ. We report the results in Fig. 2(c) and 2(d). Our inference algorithm clearly performs better in highly polarized scenarios (i.e., higher values of ρ), when items induce great controversy. This shows that more polarized nodes lead to more distinct propagation behavior, making it easier for the inference algorithm to distinguish the leaning of the nodes. As a result, polarization has a profound effect on the quality of the estimates. Overall, we conclude that our inference algorithm is effective in estimating polarities and interests in order to make predictions about new, unseen activations.
5.2. Interpretable Embeddings
In this section, we answer RQ2: are the predictions made by our algorithm explainable? In other words, is our algorithm able to extract interpretable embeddings when applied to a real-world dataset, and is the interpretation of those embeddings in line with known facts about it? We operationalize these questions by employing a dataset based on well-studied (massachs2020roots; soliman2019characterization; mills2018pop) communities from Reddit. We show how the embeddings we find correspond to the known political positions of these communities. We refer to Section 5.3 for an analysis of the predictive power of such embeddings.
Reddit dataset. The real-world dataset we consider here is extracted by crawling the social news aggregation website Reddit. We aim at capturing how different Reddit communities (subreddits) share political news items. We focus on subreddits as nodes, while we ignore individual users: in this way, we can use their known political positions to validate the interpretability of our embeddings.
To this end, we interpret an item to be a specific URL posted on Reddit. Each subreddit community is a node that can propagate (i.e., post) an item. We consider subreddits as nodes of a complete graph (i.e., each subreddit can re-share news from all the others). We say that a node u is activated on an item i when the URL that corresponds to i is posted on subreddit u.
We identified the subreddit communities most similar to r/politics (the main community for U.S. politics) in terms of their cosine similarity over users. We then collected the URLs that have been shared by several of these subreddits in the years around the 2016 U.S. elections, obtaining our set of items. We are therefore assuming that this dataset fits the assumptions outlined at the beginning of Section 3: each posted item spans a variety of topics, and subreddits share an item already shared by another subreddit if they are interested in it and if the two communities are aligned on the topics of that specific item.
Ideological axes. To define the ideological axes of our model, we explored the dataset using doc2vec (le2014distributed). We trained the doc2vec algorithm on the titles of the posts associated with each considered URL in our dataset. Then, we used the soft K-means clustering algorithm (kim2007soft) to group the obtained doc2vec vectors into clusters, and defined the topic distribution γ_i as the soft assignment of item i to the clusters. We found meaningful results with this method for K = 5. A textual representation of these topics is shown in Table 1. The identified topics are well differentiated, allowing us to provide intuitive names: economy, Emailgate, foreign policy, campaigning, and minorities’ rights. These five topics are evenly distributed in the data.

To evaluate how explainable the predictions made by our algorithm are, and therefore how interpretable the embeddings we find, we investigate how well the polarities estimated from the content that the nodes propagate reflect the known political leanings of each subreddit (to reach the best possible representation, we used the full Reddit dataset, and we re-ran our model several times, picking the outcome with the largest likelihood). Given that the topics we identified (Table 1) fairly represent different axes of the U.S. political debate and its divisions, we expect these divisions to be reflected in the estimated embeddings. We focus in this analysis on the most active subreddits hosting discussions in support of a U.S. politician or political party. The political positions typically expressed in these subreddits are well-known and have been analyzed in the literature (massachs2020roots; soliman2019characterization; roozenbeek2017read; hendricks2017social). These include, among others:

democrats, a subreddit affiliated with the U.S. Democratic Party.

the_donald, a subreddit (now banned) dedicated to supporters of Republican president Donald Trump.

sandersforpresident and wayofthebern are both communities of supporters of Bernie Sanders, Democratic candidate in the 2016 and 2020 primary elections. While the former subreddit is official, the latter was created by supporters after Sanders lost the 2016 primary election, in opposition to Hillary Clinton and the Democratic Party establishment.

libertarian is a community focused on libertarianism, close to, but not affiliated with, the U.S. Libertarian Party.
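The soft-assignment step described above (turning clustered doc2vec vectors into topic distributions γ_i) can be sketched as a softmax over negative squared distances to the cluster centers; the stiffness parameter `beta` is an assumed knob, not taken from the original pipeline:

```python
import numpy as np

def soft_assignments(vectors, centers, beta=1.0):
    """Soft K-means responsibilities: one row per item, one column per
    cluster (topic); each row is a valid topic distribution gamma_i."""
    v = np.asarray(vectors, dtype=float)
    c = np.asarray(centers, dtype=float)
    d2 = ((v[:, None, :] - c[None, :, :]) ** 2).sum(-1)  # squared distances
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    g = np.exp(logits)
    return g / g.sum(axis=1, keepdims=True)
```

An item whose vector sits near one center receives nearly all of its topic mass from that cluster, while items between centers get smoother distributions.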
Results. We first consider the topics economy, minorities’ rights, and Emailgate, due to the ease of their interpretation and their similarity to the axes identified as divisive in the 2016 elections, i.e., economy, identity issues, and anti-establishment disaffection (drutman2017political). We report the estimated polarities for each of these topics in both the plots and the tables of Figure 4.
On the “economy” ideological axis, the most extreme positions among the aforementioned subreddits are occupied by libertarian on one side and Bernie Sanders’ communities on the other. This accurately represents the deep division between the pro-business libertarian ideology and the more pro-welfare ideas of democrats (drutman2017political), and even more so the socialist ideas of Bernie Sanders’ supporters (mills2018pop). the_donald correctly appears on the same side of the axis as libertarian. On the minorities’ rights ideological axis, instead, we find that the_donald lies at one extreme and democrats at the other, while libertarian holds more moderate views. This reflects common readings of the positions of these groups on this axis (drutman2017political). On the “Emailgate” axis, the divisions reflect how different communities perceived the controversy: in particular, the_donald (together with conspiracy and hillaryforprison) lies at the very extreme end of the spectrum, while democrats lies at the other; libertarian on this axis is well separated from the_donald. It is known that Donald Trump made extensive use of the so-called Emailgate controversy to attack Hillary Clinton, while the Democratic Party defended her; Libertarians did not attack Clinton on the topic (https://www.washingtonpost.com/news/postpolitics/wp/2016/07/07/libertarianrivalswontattackclintonoveremails/). These positions are all well represented in our ideological embeddings. Moreover, we find a separation between wayofthebern, created in opposition to Hillary Clinton’s victory in the 2016 primaries, and democrats: the former lies in the middle on this issue, while the latter occupies the most extreme spot.
Their embeddings on this axis, again, echo known facts from the literature that studied these communities: the_donald and conspiracy pushed these accusations, while wayofthebern occasionally "[tried] to leverage the dual valence of the deep state frame – appealing to the older, more general anti-national-security-establishment frame – even as the frame continues to do work on the right" (benkler2018network). Moreover, these findings reflect survey measurements of anti-establishment disaffection between Clinton and Sanders supporters (drutman2017political). Finally, besides these three axes, we find that the foreign policy axis is highly correlated with the minorities' rights one. This matches previous observations from political surveys (baumann2020emergence).
We summarize some of these examples in Figure 2; there, we visualize the position of a subreddit on an axis by indicating its rank among the considered subreddits on that axis. These examples show how our representation of nodes in a multidimensional ideological space can provide an intuitive and interpretable account of the opinions and beliefs of each node.
5.3. Prediction accuracy on real-world data
Our final goal is to check whether our inference algorithm is able to capture real-world activation dynamics and to make accurate predictions on unseen cascades. In order to evaluate the trade-off between explainability and predictive power, we compare our model with the following popular, representative baselines:
node2vec (grover2016node2vec): we define a weighted graph by considering an arc for each instance of "node u activates on an item after node v". Then, we embed this graph using the node2vec algorithm into a representation with the same number of dimensions as our ideological space (corresponding to the interests and the polarities). The idea is to check whether the resulting embedding is capable of summarizing the same information provided by our embeddings. We also perform the same experiments with the embedding dimension usually employed in the literature (goyal2018graph). After computing these embeddings, we train a logistic regression classifier on the Hadamard product of the embeddings, as suggested by grover2016node2vec. We remark that, despite node2vec being one of the most popular and best-performing graph embeddings (goyal2018graph), the results it provides are not directly interpretable (vskrlj2020tax2vec).

Barbera's model (barbera2015birds): the goal of this algorithm is to embed the nodes of a given graph into a one-dimensional bipolar ideological spectrum (e.g., left–right) which models homophily and divergence. We apply it to the same activation graph defined for node2vec. This model is usually applied to social networks, and it is not specific to information cascades. It also provides weaker representation capabilities, since its output falls within the single-axis opinion-mining line of research discussed in Section 2.
Original information: for this approach, we do not use any embedding. We train a logistic classifier directly on a one-hot encoding of the two nodes (a much larger space than the ones we use for embeddings). The classifier is given the concatenated one-hot representations of a pair of nodes and is trained to predict whether that pair corresponds to a successful propagation. The approach is therefore not scalable to very large graphs, and we use it as a naive baseline.
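As a concrete illustration, the feature construction for this baseline can be sketched as follows (a minimal sketch: the function name and data layout are ours, not the paper's):

```python
def one_hot_pair(u, v, n_nodes):
    # Concatenated one-hot encoding of a (source, follower) pair of node
    # indices: a vector of 2 * n_nodes dimensions, with a 1.0 in the slot
    # of each node. A logistic classifier is then trained on such vectors.
    x = [0.0] * (2 * n_nodes)
    x[u] = 1.0
    x[n_nodes + v] = 1.0
    return x
```

Since the dimensionality grows linearly with the number of nodes, this representation quickly becomes impractical on large graphs, which is why it serves only as a naive baseline.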
Since our model leverages the topic distribution of each item as input, we also tested baselines using the same information, by concatenating the topic distribution of each item to each input vector of the logistic regression models (node2vec, original information). Barbera’s model, instead, does not offer an immediate way to use this kind of information.
To train and evaluate each algorithm, we divided the dataset into 10 folds by splitting independent items into equally sized groups. We used the first two folds as a validation set to tune the parameters p and q of node2vec, choosing the best parameters according to the average AUC ROC on these two folds. Then, we used the remaining folds in cross-validation to assess the results. The evaluation is performed on two datasets, namely the Reddit dataset previously described and a Twitter dataset we describe in detail next.
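The splitting scheme can be sketched as follows (illustrative only; the validation/cross-validation split follows the description above, while the helper name is ours):

```python
import random

def item_folds(item_ids, n_folds=10, seed=0):
    # Shuffle the independent items and split them into n_folds groups of
    # (nearly) equal size; the first two folds form the validation set used
    # for hyper-parameter tuning, the remaining ones are used in
    # cross-validation.
    rng = random.Random(seed)
    ids = list(item_ids)
    rng.shuffle(ids)
    folds = [ids[i::n_folds] for i in range(n_folds)]
    return folds[:2], folds[2:]
```

Splitting by items (rather than by single activations) keeps all activations of a cascade in the same fold, so no test cascade leaks into training.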
Prediction results on Reddit. Figure 5 reports a comparison of the results obtained on the Reddit dataset. On this dataset, our approach is substantially in line with the state of the art. Moreover, on each single fold it consistently outperforms the baselines (Difference column in Figure 5). In practice, the richness and interpretability of our embeddings, illustrated in the previous section, do not come at the expense of predictive power: the explicit modeling of polarities on ideological axes still enables useful predictions of unobserved behavior. Notably, on this network node2vec falls significantly below all the other baselines.
Prediction results on Twitter. We then proceed to evaluate our predictive performance on a larger dataset representing Twitter resharing behavior in a polarized setting. For the construction of this dataset, we considered the accounts of the Twitter users extracted by vaccines, along with all their posted tweets. These users have mainly posted about Italian politics, and hence belong to the same weakly connected component of a retweet network. We consider as potential propagations all the instances of a user u retweeting a tweet after a user v. We manually selected hashtags related to public political opinions to be used as topics: three politicians (#Salvini, #DiMaio, #Renzi) and three debates (#Vaccini, #Migranti, #Tav). We constructed the dataset from the tweets that contain these hashtags. From this subset, we iteratively removed the users and tweets involved in fewer cascades, so that every Twitter user in the dataset has at least 100 retweets and every tweet has been retweeted by at least 10 users. This way, we obtained the final dataset of Twitter users and retweets.
We define the topic distribution of each item simply by counting the appearances of hashtags, as follows. Denoting by H the set of selected hashtags and by H_t ⊆ H the set of selected hashtags mentioned by tweet t, we define the topic distribution of tweet t as uniform over the mentioned hashtags, i.e., equal to 1/|H_t| for each hashtag in H_t and 0 otherwise.
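Under one plausible reading of this counting scheme (uniform weight over the selected hashtags a tweet mentions; the hashtag ordering and function name are ours), the topic distribution can be computed as:

```python
HASHTAGS = ["#Salvini", "#DiMaio", "#Renzi", "#Vaccini", "#Migranti", "#Tav"]

def topic_distribution(tweet_text):
    # Uniform distribution over the selected hashtags mentioned by the
    # tweet, zero weight on the others. (A naive substring check is used
    # here for brevity.)
    mentioned = [h for h in HASHTAGS if h in tweet_text]
    if not mentioned:
        return [0.0] * len(HASHTAGS)
    w = 1.0 / len(mentioned)
    return [w if h in mentioned else 0.0 for h in HASHTAGS]
```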
The retweet graph, built according to the procedure illustrated in the previous section and needed by node2vec, exhibits an extremely high density. In order to make it tractable by node2vec, we have to filter out all edges representing a number of retweets lower than the average, thus obtaining a much sparser final graph.
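This sparsification step amounts to the following (a sketch under an assumed edge-weight layout; the function name is ours):

```python
def filter_by_average_weight(edges):
    # edges: dict mapping (u, v) -> number of retweets on that arc.
    # Keep only the edges whose weight is at least the average weight,
    # discarding the long tail of weak interactions.
    if not edges:
        return {}
    avg = sum(edges.values()) / len(edges)
    return {e: w for e, w in edges.items() if w >= avg}
```

Because retweet counts are heavy-tailed, the average sits well above the median, so this single threshold removes the large majority of edges.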
Figure 6 reports the results of the evaluation. This dataset is noisy and difficult to predict, but our model is still able to recover some information: we measure an average AUC ROC significantly higher than the simpler baselines, "original information" and Barbera's model. We conjecture that considering different ideological axes in a meaningful way is essential for predicting behavior in such a dataset, confirming that it respects the general assumptions of our model. In sharp contrast with the previous dataset, node2vec is here the best-performing model, essentially tied with our approach (their AUC ROCs are within one standard deviation of each other). We highlight, however, the clear advantage of our model in terms of computing time, besides its interpretability.
These two datasets show that our model is able to combine the interpretability of simpler models on smaller datasets with the prediction accuracy of state-of-the-art embeddings on larger ones.
6. Conclusions and Future Work
In this paper, taking an information-propagation standpoint, we introduce ideological embeddings, i.e., the mapping of the ideological leanings of social media users into a multidimensional ideological space. To this end, we propose a stochastic propagation model, which formalizes how politically salient content spreads in a social network as a function of the ideological embeddings of users under homophily-driven interactions, and we devise a scalable gradient-based optimization procedure to learn the ideological embeddings that maximize an approximation of the likelihood of a set of information cascades.
We show, through experiments on real-world as well as synthetic datasets, that the learnt ideological embeddings are coherent with our model, that they are interpretable, and that they can offer reliable and explainable predictions of unseen behavior.
In general, our framework can aid studies of how polarizing content spreads on social networks: since controversy and confirmation bias are important elements of misinformation spread, our model could help develop a better understanding of the topics that are prone to misinformation. However, we do not claim that our model is in general the best way to predict unobserved behavior in information cascades: its performance relies on the validity of its assumptions in a particular context. Rather, it provides a novel angle on the problem, able to turn raw information cascades into an interpretable embedding with user-defined axes.
From this perspective, it could be extended to adapt to other contexts, or to test different assumptions. For instance, modeling replies on Twitter instead of retweets would require a different set of assumptions because of their different characteristics in terms of homophily (lai2019stance): since replies can be antagonizing, propagation can be driven by either alignment or misalignment. Another direction is to extend our framework to jointly infer the ideological leanings of social media users and content (by directly modeling items' polarities as latent variables), as well as the axes of the ideological space and the underlying topic distribution.
Acknowledgments
CM and FB acknowledge support from Intesa Sanpaolo Innovation Center. GM acknowledges support from the EU H2020 ICT48 project "HumaneAI-Net" under contract #952026. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
2. Related work
Learning ideological leaning. Early studies tackling this problem treated it as a classification task and devised methods for predicting the binary political alignment of Twitter users (conover2011predicting; pennacchiotti2011democrats). Apart from the inherent limitation of the one-dimensional political-spectrum assumption common to all the existing approaches, cohen2013classifying showed that another limitation arose from their need for large amounts of manually annotated data, and their dependence on politically active users. Some studies looked into content-aware analysis of users' social interactions (lu2015biaswatch; lahoti2018joint), as we do in our work: none of these approaches, however, exploits information propagation. In particular, with respect to lahoti2018joint, our model is (i) multidimensional, since we jointly estimate ideological scores for multiple topics; (ii) based on information propagation rather than content; (iii) model-driven: our estimation algorithm stems from a principled model of how content propagates in a network.
A related research direction, topical stance detection (DeySK18; ijcai2017557; Lin+19; sunetal2018stance), aims at assigning a polarity label to a post towards a specific topic. We note that these models are only concerned with extracting topics and sentiments from text corpora; instead, we characterize information propagation by looking at ideological embeddings.
Influence-driven propagation models. barbieri2013topic introduced topic-aware social influence propagation models that take into account the topical interest profiles of the users and the topic distribution of the content that propagates in a social network. Topic-aware social influence modeling is crucial to accurately capture the propagation pattern of content in many applications, ranging from viral marketing and social advertising (AslayBBB14; AslayLB0L15; AslayBLL17) to who-to-follow recommendations (BarbieriBM14). However, for politically salient content, these models are not sufficient to correctly capture the propagation probabilities, as ideological alignment also plays a role.
Homophily-based models. Homophily is a well-known phenomenon in social networks: users tend to interact with users from the same social group (mcpherson2001birds; himelboim2013birds). As such, it is a common assumption in machine learning on social networks (BarbieriBM13; garimella2018quantifying) and in propagation models (del2017modeling). In this work, we assume that information is more likely to propagate between nodes that, on a specific topic, share the same ideological views. While the validity of this assumption depends on the specific context and action (replies, follows, etc.), it is in general well supported by the current literature. In the case of Twitter, retweets – the act of resharing a piece of content – have been shown to follow this pattern very closely: when a user retweets, it is likely that they share the same ideological background with the source regarding that specific topic (garimella2018quantifying). For example, conover2011political found that retweets about U.S. politics had odds ratios higher than one for intra-partisan interactions and lower than one for cross-partisan interactions. lai2019stance analyzed users involved in the Twitter debate about the 2016 referendum on the reform of the Italian Constitution, and found that 98.6% of retweets happen between users with the same stance on the topic. This assumption is very general, and it has shown its validity in many other contexts, from blogs (adamic2005political) to Facebook (schmidtPolarizationVaccinationDebate2018).

Interpretable embeddings. We leverage this idea to produce interpretable ideological embeddings. The literature on graph embeddings is vast; we refer to goyal2018graph for a review. However, most of the techniques from this area offer embeddings that are subsymbolic, lacking a precise, natural-language explanation (vskrlj2020tax2vec). For instance, fengInf2vecLatentRepresentation2018 devised a technique to learn embeddings from information cascades; however, their embeddings are not directly interpretable, and they do not attempt to model polarities, topics, or opinions. Some works have tried to deal with the problem of interpretability in graph embeddings. For instance, idahl2019finding use an external knowledge base to find interpretable subspaces in a given node embedding space. mehta2019stochastic implement the stochastic block model as a graph neural network to learn an interpretable embedding for community detection.
To the best of our knowledge, our work is the first to devise an inference algorithm able to extract interpretable embeddings from information cascades; it is also the first to provide a data-driven model of the interplay between topic-aware opinions and information propagation on social media.
3. Model
In this section, we present the Multidimensional Ideology-aware Propagation model (MIP for short). The model embodies the following set of assumptions:

- Nodes share items, and each item spans a variety of topics.

- Each topic corresponds to an ideological axis, on which each node can be placed.

- Nodes adopt the same behavior as the nodes they follow on each item, as long as they agree with each other on the topics of that item.

These assumptions are general, they combine aspects from previous works, and they are easily applicable to real-world data, as seen in Sec. 2. We next formalize these assumptions in a probabilistic framework.
Information cascades. We consider a directed social graph G = (V, E), where V and E denote the set of nodes and edges, respectively, and a directed edge (u, v) represents u being a follower of v. As such, u receives in their timeline the pieces of content (we call them items) shared by v, and can be influenced by v to share them further, thus allowing the propagation of information. We use the term "sharing" here in a loose sense, i.e., to represent any action which is likely to express an alignment with the original node, following the assumptions outlined at the beginning of this section – for example, a retweet on Twitter or a like on Facebook. We denote the set of all items as I. We represent the action of node u sharing an item i at time t with a triplet (u, i, t), and call such a triplet an activation. We assume that no node adopts the same item more than once: if a node shares an item i, it becomes active on that item, and it will stay so forever. In this way, we can focus on observing whether u supports item i or not, and ignore the subtleties of multiple-sharing behavior; moreover, in some contexts (e.g., a like on Facebook) an activation can happen at most once by definition.
We use A to denote the set of observed activations, and A_i to refer to the set of nodes that became active on item i (i.e., the cascade of item i).
Likelihood of a set of cascades. Our approach relies on modeling the set of observed activations A as the result of a stochastic process governed by a set of latent factors that span a multidimensional space representing the ideological tendencies of each node. By assuming that each cascade is independent of the others, the likelihood of A can be expressed as Pr(A) = ∏_{i ∈ I} Pr(A_i), where in turn we can devise Pr(A_i) as

Pr(A_i) = ∏_{u ∈ A_i} Pr(u active on i | Φ_{u,i}) · ∏_{u ∉ A_i} (1 − Pr(u active on i | Φ_{u,i})).   (1)

Here, Φ_{u,i} is the ordered set of all in-neighbors of u that are active in cascade i. In other words, the likelihood of a cascade is given by the probability of obtaining the successful and failed activations. The dependency on Φ_{u,i} for modeling the activation probability is fundamental here: resharing an item depends on those who have already shared that item. In other words, node u is exposed to item i by the nodes followed by u that shared i before. We consider as activations only those (u, i, t) such that Φ_{u,i} ≠ ∅; that is, we ignore the initiator of a cascade, as we are only interested in modeling propagation.
Multidimensional ideological space. The core of our approach is then the proper definition of Pr(u active on i | Φ_{u,i}), so that it reflects the intuition that activations only happen within the context of ideological alignment. Since items have different topics which span a low-dimensional space, we assume that

- an individual only adopts an item if it matches a topic of interest for them;

- besides interest, an action is only likely to happen if there is an ideological alignment on that topic between the individual and the other individuals who shared the item before.

These concepts are reflected in the specification of the model parameters, and consequently in the definition of the activation probability, as follows. First, we assume that each item i exhibits a multinomial distribution γ_i over the K topics (dimensions), where γ_{i,k} represents the relevance of item i to topic k. A node u can exhibit an interest in topic k with probability η_{u,k}. The parameter set η_u represents how much u is interested in each of the K topics. Note that, in principle, a node might even be interested in all the considered topics, as well as in none of them.

Finally, each of the K topics defines an ideological axis: given a topic k, we assume that u can exhibit either positive or negative leaning within the context of that topic. The terms "positive" and "negative" only indicate that the leanings oppose each other. We model the probability of node u exhibiting positive leaning on topic k as p_{u,k}, and the probability of negative leaning as 1 − p_{u,k}. The parameter set p_u represents the set of all polarities, i.e., alignment probabilities for each topic, relative to u, and therefore it defines the position of u in the multidimensional ideological space defined by the K topics. We can then express the probability that node u is ideologically aligned with node v on topic k as

α_{u,v,k} = p_{u,k} p_{v,k} + (1 − p_{u,k})(1 − p_{v,k}),   (2)

where the first term is the probability that both nodes exhibit positive leaning on dimension k and, by contrast, the second is the probability that they both exhibit negative leaning. Using this topic-specific alignment probability, we define

q_{u,v,i} = Σ_k γ_{i,k} η_{u,k} α_{u,v,k}   (3)

to be the probability that u will activate on item i given that their predecessor v already did. In other words, there must be a topic k, picked from the topic distribution of the item, such that (i) k is of interest to u, and (ii) u is ideologically aligned on k with v.
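In code, the alignment and per-predecessor activation probabilities can be sketched as follows (variable names are illustrative: gamma_i is the item's topic distribution, eta_u the follower's interests, p_u and p_v the two nodes' per-topic positive-leaning probabilities):

```python
def alignment_prob(p_u_k, p_v_k):
    # Probability that u and v draw the same leaning on a topic: either
    # both positive or both negative (Eq. 2 in spirit).
    return p_u_k * p_v_k + (1.0 - p_u_k) * (1.0 - p_v_k)

def activation_prob(gamma_i, eta_u, p_u, p_v):
    # A topic k is drawn from the item's topic distribution; u must be
    # interested in k and ideologically aligned with v on k (Eq. 3 in spirit).
    return sum(g * e * alignment_prob(pu, pv)
               for g, e, pu, pv in zip(gamma_i, eta_u, p_u, p_v))

# An item mostly about topic 0, shared by a node v that agrees with u on
# topic 0 but disagrees with u on topic 1:
prob = activation_prob([0.9, 0.1], [1.0, 1.0], [0.9, 0.1], [0.9, 0.9])
```

With these toy values the propagation is likely (prob ≈ 0.76), driven almost entirely by the agreement on the dominant topic.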
Now, we express the probability that a node u becomes active on i as a mixture over all possible activators:

Pr(u active on i | Φ_{u,i}) = Σ_{v ∈ Φ_{u,i}} π_v q_{u,v,i}.   (4)

The term π_v here represents a prior probability that v causes an activation in u. As such, it must respect Σ_{v ∈ Φ_{u,i}} π_v = 1; for instance, it can be the uninformative prior π_v = 1 / |Φ_{u,i}|. In general, this term adds flexibility to our model, by allowing a predefined relevance for each v. For example, it can be used to model a bias towards popular nodes, or towards the initiators of the cascade.

MIP generative model. We can devise a simple generative stochastic process for data generation that adheres to the aforementioned assumptions. First, we introduce the hyperparameters that govern such a generative process. We define ρ to control the polarization of the nodes' opinions: high values of ρ lead to polarities closer to the extremes (i.e., p_{u,k} closer to 0 or 1); lower values lead to polarities closer to neutrality (i.e., p_{u,k} closer to 1/2). We also define two hyperparameters to control the generated distribution of the interests η, and a prior distribution over the topics of items. Then, given an underlying graph G, we take the following steps to generate a propagation dataset.

1. Draw interests η_{u,k} for each node u ∈ V and topic k.

2. Draw polarities p_{u,k} for each node u ∈ V and topic k, with polarization controlled by ρ.

3. Generate an item i:

   (a) Draw its topic distribution γ_i.

   (b) Draw an initial activated seed uniformly at random from V.

   (c) For each arc (u, v) ∈ E such that v activated and u has not yet seen the item:

       i. Node u sees the item from v.

       ii. Draw the item topic k according to γ_i.

       iii. u is interested in the topic with probability η_{u,k}. If it is not interested, then the propagation fails.

       iv. If it is interested, draw the attitudes of u and v on k as Bernoulli variables with probabilities p_{u,k} and p_{v,k}.

       v. If the attitudes are equal, u activates on the item.
Within the main loop, each active node tries to activate its neighbors, and the process stops when no further nodes become active. For simplicity, we presented the model assuming that each active node has an equal chance of activating its neighbors: this corresponds to assuming a uniform prior in Eq. 4.
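The steps above can be sketched as a small simulator (a hypothetical implementation with a uniform prior; the data layout — follower lists, per-node interest and polarity vectors — is our own choice):

```python
import random

def generate_cascade(followers, seed_node, gamma, eta, p, rng):
    # followers[v] lists the nodes that follow v; gamma is the item's topic
    # distribution; eta[u][k] is u's interest in topic k; p[u][k] is u's
    # probability of a positive leaning on topic k.
    active, frontier, seen = {seed_node}, [seed_node], {seed_node}
    while frontier:
        v = frontier.pop()
        for u in followers.get(v, []):
            if u in seen:
                continue
            seen.add(u)                            # u sees the item only once
            k = rng.choices(range(len(gamma)), weights=gamma)[0]
            if rng.random() >= eta[u][k]:
                continue                           # not interested: propagation fails
            att_u = rng.random() < p[u][k]         # draw both attitudes on topic k
            att_v = rng.random() < p[v][k]
            if att_u == att_v:                     # attitudes match: u activates
                active.add(u)
                frontier.append(u)
    return active
```

With degenerate parameters (all interests and polarities equal to 1) every reachable node activates, which is a handy sanity check on the simulator.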
4. Inference and Learning
Our goal is an algorithm that, given a set of items represented in a K-dimensional topic space, is able to estimate the interests η and the polarities p of each node – that is, our ideological embeddings. We do so by maximizing the likelihood of our model w.r.t. η and p, given the topic distributions γ_i and the observed activations A.
We can easily see the problem of maximizing the likelihood defined in the previous section as a learning problem, in which we predict whether a certain node will activate or not. To obtain a scalable learning algorithm, we would like to treat each possible propagation of an item from v to u as an independent example. To do so, we resort to an approximation of the likelihood in Eq. 1. First, for the active users, we notice that, by Jensen's inequality:

log Σ_{v ∈ Φ_{u,i}} π_v q_{u,v,i} ≥ Σ_{v ∈ Φ_{u,i}} π_v log q_{u,v,i}.

By virtue of this inequality, we can consider the prior π as a predefined sampler that selects users from Φ_{u,i} and approximates the activation probability according to them. In practice, we can focus on a subset of predefined potential activators, and measure the probability of alignment to them.
Different choices of which nodes to sample suit different contexts. For example, considering only the first activator is appropriate in some real cases, where ideological alignment is likely to happen mostly between the creator of the item and whoever adopts it, rather than between followers. In our algorithm, however, we wish to treat as equal every pair of nodes among those that reshared an item i. This choice comes from the empirical observation that political communities tend to form highly homophilous groups, often described as echo chambers (del2016echo); these appear in retweet networks as clusters with a homogeneous opinion (garimella2018quantifying; barbera2015birds; conover2011political; vaccines). Therefore, we expect aligned nodes to activate collectively – i.e., if u is aligned with a neighbor who shared i, they are probably also aligned with another neighbor who shared the same item.
Based on this practical consideration, we further choose to approximate the negative terms as

log (1 − Σ_{v ∈ Φ_{u,i}} π_v q_{u,v,i}) ≈ Σ_{v ∈ Φ_{u,i}} π_v log (1 − q_{u,v,i}).

As a matter of fact, since ideological communities tend to be homogeneous, if u is not aligned with a neighbor who shared i, we expect them in practice to be unaligned with all their neighbors who shared the same item. This approximation first allows us to better learn from real-world propagations, clustering together users who reshared an item and separating those who did not. Secondly, it allows us to factorize the likelihood, and hence to obtain a scalable algorithm: the overall log-likelihood can be rewritten as a sum, over items i and pairs with v ∈ Φ_{u,i}, of weighted per-pair terms π_v log q_{u,v,i} for activated nodes u and π_v log (1 − q_{u,v,i}) for non-activated ones.
Thanks to this approximation, we can employ stochastic gradient descent in the following way. The basic instance of the learning problem is an example (γ_i, v, u, y), where v ∈ Φ_{u,i}, and y = 1 iff u ∈ A_i and y = 0 otherwise. In other words, from the topic distribution of an item and one of the nodes active on it, we try to predict whether one of its followers will activate or not. Then, given a sequence of examples sampled from the set of observed activations A, maximizing the likelihood is equivalent to minimizing the log loss of each example:

ℓ = − y log q_{u,v,i} − (1 − y) log (1 − q_{u,v,i}),

where q_{u,v,i} is the likelihood of a single example given the latent variables η and p.
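The per-example loss can be written directly from this definition (a sketch; q below recomputes the per-pair activation probability of Section 3, with illustrative variable names):

```python
import math

def example_log_loss(gamma_i, eta_u, p_u, p_v, y):
    # q: probability that v's share propagates to u, mixing topic relevance,
    # u's interest, and the ideological alignment of u and v on each topic.
    q = sum(g * e * (pu * pv + (1.0 - pu) * (1.0 - pv))
            for g, e, pu, pv in zip(gamma_i, eta_u, p_u, p_v))
    # Log loss of the binary outcome y (1 = u activated, 0 = u did not).
    return -math.log(q) if y == 1 else -math.log(1.0 - q)
```

A gradient step on this loss nudges the interests and polarities of u (and the polarities of v) so that observed propagations become more likely and failed ones less likely.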
Input: graph G; items with topic distributions γ_i; activations A. Output: polarities p_u and interests η_u for all u ∈ V.
The learning procedure is outlined in Algorithm 1. In real datasets, the non-activated nodes largely outnumber the activated ones for a given item. For this reason, we use negative undersampling (line 8) to reduce the number of negatives. In practice, we undersample negative examples at random so that they are twice the number of positives. In experiments, this implies an average precision of 1/3 in expectation when predictions are performed randomly.
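With twice as many negatives as positives, a random ranking has an expected average precision close to the positive prevalence of 1/3; a quick simulation (ours, not from the paper) illustrates this baseline:

```python
import random

def average_precision(labels_in_rank_order):
    # AP = mean of precision@rank over the ranks where a positive occurs.
    hits, total = 0, 0.0
    for rank, y in enumerate(labels_in_rank_order, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits

rng = random.Random(42)
labels = [1] * 500 + [0] * 1000       # negatives undersampled to 2x positives
aps = []
for _ in range(200):
    rng.shuffle(labels)               # a random predictor = a random ranking
    aps.append(average_precision(labels))
mean_ap = sum(aps) / len(aps)         # close to the 1/3 prevalence baseline
```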
We also found that considering a random sample of Φ_{u,i} of fixed size (line 4) works well in practice. For gradient descent, we employed AdaGrad (duchi2011adaptive) with a linearly decreasing learning rate (down to a fixed minimum) to avoid undesirable local minima.




5. Experimental Assessment
We conduct an extensive empirical evaluation of the proposed model and the inference algorithm on synthetic and real-world datasets. Our goal is to answer the following research questions:

RQ1. Is our inference algorithm consistent with the proposed generative model? That is, can it make accurate predictions when activations follow the assumptions outlined in Section 3? (Section 5.1)

RQ2. Are the predictions from our algorithm explainable? More precisely: are the embeddings inferred by our algorithm from a real-world scenario interpretable and meaningful, compared to well-known facts about the ideological positioning of political groups? (Section 5.2)

RQ3. In real-world scenarios, can ideological embeddings be used to predict unobserved behavior? How does their predictive power compare to less interpretable state-of-the-art embeddings? (Section 5.3)
In order to foster reproducibility, we publicly release all the data and code necessary to reproduce our experiments (https://github.com/corradomonti/ideologicalembeddings).
5.1. Experiments on synthetic data
To answer the first question, we wish to verify that our inference algorithm can make correct predictions whenever activations follow the assumptions outlined in Section 3. To this end, we conduct experiments on multiple synthetic datasets in which activations are generated according to our model. We show that Algorithm 1, despite the approximations we introduced to make it scalable, is consistent with the generative process proposed in Section 3. Furthermore, we study under which conditions the predictions made by the algorithm are accurate: it turns out that polarization greatly affects its accuracy, as does the number of items; the topology of the underlying graph, instead, seems to have less impact.
Experimental setting. In all our experiments, we divide the items into training and test sets. We run our inference algorithm on the training set to estimate the values of the variables η and p, which are then used to compute the likelihood of the activations in the test set. In particular, we consider the following prediction task: given that a node v has activated on an item i with topic distribution γ_i, what is the probability that node u will also activate on i?
To generate the synthetic datasets, we follow the procedure described in Section 3 with the following hyperparameters. We fix the number K of topics, and we choose the interest hyperparameters so as to obtain a high probability of users having topic-aware interests above a minimum threshold. As a prior for items, we use a symmetric prior over all the topics. We use two different graphs G with the same number of nodes: (i) a complete graph, and (ii) a much sparser graph generated by the Barabási–Albert network model.
Results. We first compare the accuracy of the inference process when varying the number of items propagated through the network. We report the results in Fig. 2(a) and 2(b). As expected, the inference accuracy increases with the number of items, since every new item introduces a new cascade to the sample, improving the accuracy of the parameter estimates. Surprisingly, the results are similar for both the complete graph and the Barabási–Albert network model, suggesting that the topology does not seem to affect the performance of our algorithm.
We then measure the inference accuracy with regard to the distribution of the polarization of the individuals. To this end, we perform inference on datasets generated with different values of the polarization hyperparameter ρ. We report the results in Fig. 2(c) and 2(d). We can clearly see that our inference algorithm performs better in highly polarized scenarios (i.e., higher values of ρ), when items induce great controversy. This result shows that more polarized nodes lead to more distinct behavior in the propagation of content, making it easier for the inference algorithm to distinguish the leanings of the nodes. As a result, we see that polarization has a profound effect on the quality of the estimates. Overall, we can conclude that our inference algorithm is effective in estimating polarities and interests in order to make predictions about new, unseen activations.
5.2. Interpretable Embeddings
In this section, we answer RQ2, that is: are the predictions made by our algorithm explainable? In other words, is our algorithm able to extract interpretable embeddings when applied to a real-world dataset? And is the interpretation of those embeddings in line with known facts about that dataset? We operationalize these questions by employing a dataset based on well-studied (massachs2020roots; soliman2019characterization; mills2018pop) communities from Reddit. We show how the embeddings we find correspond to known political positions of these communities. We refer to Section 5.3 for an analysis of the predictive power of such embeddings.
Reddit dataset. The real-world dataset we consider in this part is extracted by crawling the social news aggregation website Reddit. Here, we aim at capturing how different Reddit communities (subreddits) share political news items. We focus on subreddits as nodes, while we ignore individual users: in this way, we can use their known political positions as a validation of the interpretability of our embeddings.
To this end, we interpret an item to be a specific URL posted on Reddit. Each subreddit community is a node that can propagate (i.e., post) an item. We consider the subreddits as nodes of a complete graph (i.e., each subreddit can reshare news from all the others). We say that a node is activated on an item when the URL that corresponds to that item is posted on that subreddit.
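Constructing activations from raw posts then reduces to keeping, for each (subreddit, URL) pair, only the earliest post (a sketch under an assumed record layout; the helper name is ours):

```python
def build_activations(posts):
    # posts: iterable of (subreddit, url, timestamp) records. A subreddit's
    # first post of a URL is its activation on that item; later duplicates
    # are ignored, matching the assumption that an active node stays active.
    first = {}
    for sub, url, t in sorted(posts, key=lambda r: r[2]):
        first.setdefault((sub, url), t)
    return [(sub, url, t) for (sub, url), t in first.items()]
```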
We identified the subreddit communities most similar to r/politics (the main community for U.S. politics) in terms of their cosine similarity over users. We then collected the URLs that have been shared by a minimum number of these subreddits within the considered time span, obtaining our set of items. We are therefore assuming that this dataset fits the assumptions we outlined at the beginning of Section 3: i.e., each posted item spans a variety of topics, and subreddits share an item already shared on another subreddit if they are interested in it and if the two communities are aligned on the topics of that specific item.
Ideological axes. To define the ideological axes of our model, we explored the dataset using doc2vec (le2014distributed). We trained the doc2vec algorithm on the titles of the posts associated with each considered URL in our dataset. Then, we used the soft K-means clustering algorithm (kim2007soft) to group the obtained doc2vec vectors into clusters, and defined the topic distribution of each item as its soft assignment to the clusters. We found meaningful results with five clusters. A textual representation of these topics is shown in Table 1. We observed that the identified topics are well differentiated, allowing us to name them intuitively as follows: economy, Emailgate, foreign policy, campaigning, and minorities' rights. These five topics are evenly distributed in the data.

To evaluate how explainable the predictions made by our algorithm are, and therefore how interpretable the embeddings we find are, we investigate how well the polarities estimated from the content that the nodes propagate reflect the known political leanings of each subreddit (to reach the best possible representation, we used the full Reddit dataset, ran our model multiple times, and picked the outcome that obtained the largest likelihood). Given that the topics we identified (Table 1) fairly represent different axes of the U.S. political debate and its divisions, we expect these divisions to be reflected in the estimated embeddings. In this analysis, we focus on the most active subreddits hosting discussions in support of a U.S. politician or political party. The political positions typically expressed in these subreddits are well known and have been analyzed in the literature (massachs2020roots; soliman2019characterization; roozenbeek2017read; hendricks2017social). These include, among others:

democrats, a subreddit affiliated with the U.S. Democratic Party.

the_donald, a subreddit (now banned) dedicated to supporters of Republican president Donald Trump.

sandersforpresident and wayofthebern are both communities of supporters of Bernie Sanders, a Democratic candidate in the 2016 and 2020 primary elections. While the former subreddit is official, the latter was created by supporters after Sanders lost the 2016 primary election, in opposition to Hillary Clinton and the Democratic Party establishment.

libertarian is a community focused on libertarianism, close to, but not affiliated with, the U.S. Libertarian Party.
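The topic-extraction step described earlier (clustering doc2vec vectors with soft K-means, and using each item's soft cluster assignment as its topic distribution) can be sketched as follows. This is an illustrative numpy implementation under our own assumptions: the synthetic matrix `X` stands in for the doc2vec title vectors, and `beta` is a hypothetical sharpness parameter for the soft assignment.

```python
import numpy as np

def soft_kmeans(X, k, beta=2.0, n_iter=50, seed=0):
    """Soft K-means: returns (centroids, responsibilities).

    Each row of the responsibility matrix is a soft assignment of one
    vector to the k clusters (rows sum to 1) and can serve as the topic
    distribution of the corresponding item."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance of every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # soft assignment: softmax of the (scaled) negative distances
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # update centroids as responsibility-weighted means
        centroids = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return centroids, resp

# stand-in for doc2vec title vectors: two well-separated synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (30, 8)), rng.normal(3, 0.1, (30, 8))])
centroids, resp = soft_kmeans(X, k=2)
topic_distribution = resp  # one soft topic assignment per item
```

In the paper's pipeline the input vectors come from a trained doc2vec model and the number of clusters is five; the toy dimensions here are chosen only to keep the sketch self-contained.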
Results. We first consider the topics economy, minorities' rights, and Emailgate, due to the ease of their interpretation and their similarity to the axes identified as divisive in the 2016 elections (i.e., economy, identity issues, and anti-establishment disaffection (drutman2017political)). We report the estimated polarities for each of these topics in both the plots and the tables of Figure 4.
On the “economy” ideological axis, we see that the most extreme positions among the aforementioned subreddits are occupied by libertarian on one side and Bernie Sanders' communities on the other. This accurately represents the deep division between the pro-business libertarian ideology and the more pro-welfare ideas of democrats (drutman2017political), and even more so the socialist ideas of Bernie Sanders supporters (mills2018pop). the_donald correctly appears on the same side of the axis as libertarian. On the minorities' rights ideological axis, instead, we found that the_donald lies at one extreme and democrats at the other, while libertarian holds more moderate views. This reflects common readings of the positions of these groups on this axis (drutman2017political). On the “Emailgate” axis, the divisions reflect how different communities perceived the controversy: in particular, the_donald (together with conspiracy and hillaryforprison) lies at the very extreme end of the spectrum, while democrats lies at the other. On this axis, libertarian is well separated from the_donald. It is known that Donald Trump extensively used the so-called Emailgate controversy to attack Hillary Clinton, while the Democratic Party defended her; Libertarians did not attack Clinton on the topic (https://www.washingtonpost.com/news/post-politics/wp/2016/07/07/libertarian-rivals-wont-attack-clinton-over-emails/). These positions are all well represented in our ideological embeddings. Moreover, we find a separation between wayofthebern, created in opposition to Hillary Clinton's victory in the 2016 primaries, and democrats: the former lies in the middle on this issue, while the latter occupies the most extreme spot.
Their embeddings on this axis, again, echo known facts from the literature that studied these communities: the_donald and conspiracy pushed these accusations, while wayofthebern occasionally “[tried] to leverage the dual valence of the deep state frame – appealing to the older, more general anti-national-security-establishment frame – even as the frame continues to do work on the right” (benkler2018network). Moreover, these findings reflect survey measurements of anti-establishment disaffection among Clinton and Sanders supporters (drutman2017political). Finally, besides these three axes, we find that the foreign policy axis is highly correlated with the minorities' rights one. This matches previous observations from political surveys (baumann2020emergence).
We summarize some of these examples in Figure 2; there, we visualize the position of a subreddit on an axis by indicating its rank among the considered subreddits on that axis. Overall, this evaluation shows how our representation of nodes in a multidimensional ideological space provides an intuitive and interpretable picture of the opinions and beliefs of each node.
5.3. Prediction accuracy on real-world data
Our final goal is to check whether our inference algorithm is able to capture real-world activation dynamics and to make accurate predictions on unseen cascades. To evaluate the trade-off between explainability and predictive power, we compare our model with the following representative and popular baselines:
node2vec (grover2016node2vec). We define a weighted graph by considering an arc for each instance of “a node activates on an item after another node”. Then, we embed this graph using the node2vec algorithm into a representation of the same dimension as our ideological space (i.e., the dimensions corresponding to interests and polarities). The idea is to check whether the resulting embedding is capable of summarizing the same information provided by our embeddings. We also perform the same experiments with the embedding dimension usually employed in the literature (goyal2018graph). After computing these embeddings, we train a logistic regression classifier on the Hadamard product of the embeddings, as suggested by grover2016node2vec. We remark that, despite node2vec being one of the most popular and best-performing graph embeddings (goyal2018graph), the results it provides are not directly interpretable (vskrlj2020tax2vec).

Barbera's model (barbera2015birds). The goal of this algorithm is to embed the nodes of a given graph into a one-dimensional bipolar ideological spectrum (e.g., left-right) which models homophily and divergence. We apply it to the same activation graph defined for node2vec. This model is usually applied to networks, and it is not specific to information cascades. It also provides weaker representation capabilities, since its output falls in the mono-axis opinion-mining line of research discussed in Section 2.

Original information. For this approach, we do not use any embedding. We train a logistic classifier directly on a one-hot encoding of the nodes (a much larger space than the ones we use for embeddings). The classifier is given the concatenated one-hot representations of a pair of nodes and is trained to predict whether that pair corresponds to a successful propagation. The approach is therefore not scalable to very large graphs, and we use it as a naive baseline.
Since our model leverages the topic distribution of each item as input, we also tested baselines using the same information, by concatenating the topic distribution of each item to each input vector of the logistic regression models (node2vec, original information). Barbera’s model, instead, does not offer an immediate way to use this kind of information.
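The edge-level setup used for the node2vec baseline (Hadamard product of the two node embeddings, fed to a logistic regression classifier) can be sketched as follows. Everything here is a toy stand-in: the embedding matrix replaces an actual node2vec output, the positive/negative pairs replace observed and sampled propagations, and a minimal gradient-descent loop replaces an off-the-shelf logistic regression implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy node embeddings (as node2vec would produce): 20 nodes in 8 dimensions
emb = rng.normal(size=(20, 8))

def pair_features(u, v):
    # Hadamard product of the two node embeddings, the edge-feature
    # operator suggested by grover2016node2vec
    return emb[u] * emb[v]

# toy training pairs: positives = observed propagations, negatives sampled
pos = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
neg = [(0, 9), (2, 7), (4, 1), (6, 3), (8, 5)]
X = np.array([pair_features(u, v) for u, v in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

# minimal logistic regression trained by gradient descent
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    g = p - y                               # gradient of the log-loss
    w -= 0.1 * X.T @ g / len(y)
    b -= 0.1 * g.mean()

scores = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # propagation probabilities
```

The same classifier structure applies to the "original information" baseline, with concatenated one-hot node vectors in place of the Hadamard features.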
To train and evaluate each algorithm, we divided the dataset into 10 folds by splitting independent items into equally sized groups. We used the first two folds as a validation set to tune the parameters of node2vec, and chose the best parameters according to the average AUC ROC on these two folds. Then, we used the remaining folds in cross-validation to assess the results. The evaluation is performed on two datasets, namely the Reddit dataset previously described and a Twitter dataset we describe in detail next.
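The evaluation protocol above (ten equally sized folds of independent items, two reserved for validation, the rest used in cross-validation) can be sketched as follows; the function name and fold count are ours, and only the splitting logic is taken from the text.

```python
import random

def item_folds(item_ids, n_folds=10, seed=0):
    """Split items into equally sized folds. Splitting by item (rather
    than by activation) keeps all cascades of the same item in the same
    fold, so folds are independent."""
    ids = list(item_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::n_folds] for k in range(n_folds)]

folds = item_folds(range(100))
validation = folds[0] + folds[1]   # folds used to tune node2vec parameters
evaluation = folds[2:]             # remaining folds, used in cross-validation
```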
Prediction results on Reddit. Figure 5 reports a comparison of the results obtained on the Reddit dataset. On this dataset, our approach is substantially in line with the state of the art. Moreover, on each single fold it consistently outperforms the baselines (Difference column in Figure 5). In practice, the richness and interpretability of our embeddings, illustrated in the previous section, do not come at the expense of predictive power: the explicit modeling of polarities on ideological axes still enables useful predictions of unobserved behavior. Notably, on this network node2vec falls significantly below all the other baselines.
Prediction results on Twitter. We then proceed to evaluate our predictive performance on a larger dataset representing Twitter resharing behavior in a polarized setting. For the construction of this dataset, we considered the accounts of the Twitter users extracted by vaccines, along with all their posted tweets. These users have mainly posted about Italian politics and hence belong to the same weakly connected component of a retweet network. We consider as potential propagations all the instances of a user retweeting a tweet after another user. We manually selected hashtags related to public political opinions to be used as topics: three politicians (#Salvini, #DiMaio, #Renzi) and three debates (#Vaccini, #Migranti, #Tav). We constructed the dataset from the tweets that contain these hashtags. From this subset, we iteratively removed the users and tweets with fewer cascades, so that every Twitter user in the dataset has at least 100 retweets and every retweet has been shared by at least 10 users. This way, we obtained the final dataset.
We define the topic distribution of each item simply by counting the appearances of hashtags, as follows. Denoting by $H$ the set of selected hashtags and by $H_i \subseteq H$ the set of selected hashtags mentioned by tweet $i$, we define the topic distribution of tweet $i$ as $z_i^k = 1/|H_i|$ for each hashtag $k \in H_i$, and $z_i^k = 0$ otherwise.
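The hashtag-counting definition above amounts to a uniform distribution over the selected hashtags each tweet mentions. A minimal sketch (the function name and example tweet are ours):

```python
# the six manually selected hashtags used as topics
HASHTAGS = ["#Salvini", "#DiMaio", "#Renzi", "#Vaccini", "#Migranti", "#Tav"]

def topic_distribution(tweet_text):
    """Uniform weight 1/|H_i| on each selected hashtag the tweet
    mentions, zero on the others; None if no selected hashtag appears
    (such tweets are excluded from the dataset)."""
    mentioned = [h for h in HASHTAGS if h in tweet_text]
    if not mentioned:
        return None
    z = [0.0] * len(HASHTAGS)
    for h in mentioned:
        z[HASHTAGS.index(h)] = 1.0 / len(mentioned)
    return z

z = topic_distribution("Nuovo dibattito su #Vaccini e #Migranti oggi")
```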
The retweet graph, built according to the procedure illustrated in the previous section and needed by node2vec, exhibits an extremely high density. In order to make it tractable by node2vec, we had to filter out all the edges representing a number of retweets below the average, thus obtaining a much sparser final graph.
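The filtering step above (dropping every edge whose retweet count falls below the mean count) can be sketched as follows; the function name and toy edge weights are illustrative.

```python
def filter_graph(edge_weights):
    """Keep only edges whose retweet count is at least the mean count,
    the pruning used to make the dense retweet graph tractable for
    node2vec."""
    mean_w = sum(edge_weights.values()) / len(edge_weights)
    return {e: w for e, w in edge_weights.items() if w >= mean_w}

# toy weighted edges: (retweeter, original poster) -> retweet count
edges = {("a", "b"): 1, ("a", "c"): 5, ("b", "c"): 12, ("c", "d"): 2}
kept = filter_graph(edges)  # mean weight is 5, so two edges survive
```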
Figure 6 reports the results of the evaluation. This dataset is noisy and difficult to predict, but our model is still able to recover some information: we measure an average AUC ROC significantly higher than those of the simpler baselines, “original information” and Barbera's model. We conjecture that considering different ideological axes in a meaningful way is essential for predicting behavior in such a dataset, confirming that it respects the general assumptions of our model. In sharp contrast with the previous dataset, node2vec is here the best-performing model, essentially tied with our approach (their AUC ROCs are within one standard deviation of each other). We highlight, however, the clear advantage of our model in terms of computing time, besides its interpretability.
These two datasets show that our model is able to combine the interpretability of simpler models on smaller datasets with the prediction accuracy of stateoftheart embeddings on larger ones.
6. Conclusions and Future Work
In this paper, taking an information-propagation standpoint, we introduce ideological embeddings, i.e., the mapping of the ideological leanings of social media users into a multidimensional ideological space. To this end, we propose a stochastic propagation model, which formalizes how politically salient content spreads in a social network as a function of the ideological embeddings of users under homophily-driven interactions, and we devise a scalable gradient-based optimization procedure to learn the ideological embeddings that maximize an approximation of the likelihood of a set of information cascades.
We show, through experiments on real-world as well as synthetic datasets, that the learnt ideological embeddings are coherent with our model, that they are interpretable, and that they can offer reliable and explainable predictions of unseen behavior.
In general, our framework can aid studies on how polarizing content spreads on social networks: since controversy and confirmation bias are important elements of misinformation spread, our model could help gain a better understanding of the topics that are prone to misinformation. However, we do not claim that our model is in general the best way to predict unobserved behavior in information cascades: its performance relies on the validity of its assumptions in a particular context. Instead, it provides a novel angle on the problem, able to turn raw information cascades into an interpretable embedding with user-defined axes.
Under this perspective, it could be extended to adapt to other contexts, or to test different assumptions. For instance, modeling replies on Twitter instead of retweets would need a different set of assumptions because of their different characteristics in terms of homophily (lai2019stance). Since replies can be antagonizing, propagation can be driven by either alignment or misalignment. Another direction is to extend our framework to jointly infer the ideological leanings of social media users and content (by directly modeling items' polarities as latent variables), as well as the axes of the ideological space and the underlying topic distribution.
Acknowledgments
CM and FB acknowledge support from Intesa Sanpaolo Innovation Center. GM acknowledges support from the EU H2020 ICT-48 project “HumanE-AI-Net” under contract #952026. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
3. Model
In this section, we present the Multidimensional Ideology-aware Propagation Model (MIP for short). The model embodies the following set of assumptions:

Nodes share items, and each item spans a variety of topics.

Each topic corresponds to an ideological axis, on which each node can be placed.

Nodes adopt the same behavior as the nodes they follow on each item, as long as they agree with each other on the topics of that item.
These assumptions are general, they combine aspects from previous works, and they are easily applicable to real-world data, as seen in Sec. 2. We next formalize them in a probabilistic framework.
Information cascades. We consider a directed social graph $G = (V, E)$, where $V$ and $E$ denote the set of nodes and edges, respectively, and a directed edge $(u, v) \in E$ represents $u$ being a follower of $v$. As such, $u$ receives in their timeline the pieces of content (we call them items) shared by $v$ and can be influenced by $v$ to share them further, thus allowing the propagation of information. We use the term “sharing” here in a loose sense, i.e., to represent any action which is likely to express an alignment with the original node, following the assumptions outlined at the beginning of this section: for example, a retweet on Twitter or a like on Facebook. We denote the set of all items as $I$. We represent the action of node $u$ sharing an item $i$ at time $t$ with a triplet $(u, i, t)$, and call such a triplet an activation. We assume that no node adopts the same item more than once: if a node shares an item, it becomes active on that item and stays so forever. In this way, we can focus on observing whether $u$ supports item $i$ or not, and ignore the subtleties of multiple-sharing behavior; moreover, in some contexts (e.g., a like on Facebook) an activation can happen at most once by definition.
We use $A$ to denote the set of observed activations and $A_i$ to refer to the set of nodes that became active on item $i$ (i.e., the cascade of item $i$).

Likelihood of a set of cascades. Our approach relies on modeling the set of observed activations as the result of a stochastic process governed by a set of latent factors that span a multidimensional space, representing the ideological tendencies of each node. By assuming that each cascade is independent of the others, the likelihood of $A$ can be expressed as $L(A) = \prod_{i \in I} L(A_i)$, where in turn we can devise $L(A_i)$ as

(1) $L(A_i) = \prod_{u \in A_i} P(u \mid \Phi_{u,i}) \cdot \prod_{u \notin A_i,\; \Phi_{u,i} \neq \emptyset} \left(1 - P(u \mid \Phi_{u,i})\right)$.

Here, $\Phi_{u,i}$ is the ordered set of all in-neighbors of $u$ that are active in cascade $A_i$. In other words, the likelihood of a cascade is given by the probability of observing both the successful and the failed activations. The dependency on $\Phi_{u,i}$ for modeling the activation probability is fundamental here: resharing an item depends on those who have already shared that item. In other words, node $u$ is exposed to item $i$ by the nodes followed by $u$ that shared $i$ before. We consider as activations only those with $\Phi_{u,i} \neq \emptyset$; that is, we ignore the initiator of a cascade, as we are only interested in modeling propagation.
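The cascade likelihood of Eq. 1 (a product of activation probabilities for the nodes that shared the item, and of failure probabilities for the nodes that were exposed but did not share) can be sketched in log-space as follows. The `toy_prob` model is a placeholder, not the paper's activation probability, which is defined in the next paragraphs.

```python
import math

def cascade_log_likelihood(active, exposed_inactive, activation_prob):
    """Log-likelihood of one cascade (Eq. 1): sum of log activation
    probabilities for active nodes and of log failure probabilities for
    exposed-but-inactive nodes. Each node is paired with the list of
    its already-active in-neighbors."""
    ll = 0.0
    for u, predecessors in active:
        ll += math.log(activation_prob(u, predecessors))
    for u, predecessors in exposed_inactive:
        ll += math.log(1.0 - activation_prob(u, predecessors))
    return ll

# placeholder model: each exposure independently succeeds w.p. 0.3
def toy_prob(u, predecessors):
    return 1.0 - 0.7 ** len(predecessors)

ll = cascade_log_likelihood(
    active=[("b", ["a"]), ("c", ["a", "b"])],
    exposed_inactive=[("d", ["b", "c"])],
    activation_prob=toy_prob,
)
```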
Multidimensional ideological space. The core of our approach is then the proper definition of the activation probability so that it reflects the intuition that activations only happen within a context of ideological alignment. Since items have different topics which span a low-dimensional space, we assume that

an individual only adopts an item if it matches a topic of interest for them;

besides interest, an adoption is only likely to happen if there is an ideological alignment on that topic between the individual and the other individuals who shared the item before.
These concepts are reflected in the specification of the model parameters and, consequently, in the definition of the activation probability, as follows. First, we assume that each item $i$ exhibits a multinomial distribution $z_i$ over the $K$ topics (dimensions), where $z_i^k$ represents the relevance of item $i$ to topic $k$. A node $u$ can exhibit an interest in topic $k$ with probability $\varphi_u^k$. The parameter set $\varphi_u = (\varphi_u^1, \ldots, \varphi_u^K)$ represents how much $u$ is interested in each of the $K$ topics. Note that, in principle, a node might even be interested in all the considered topics, as well as in none of them.

Finally, each of the $K$ topics defines an ideological axis: given a topic $k$, we assume that $u$ can exhibit either a positive or a negative leaning within the context of that topic. The terms “positive” and “negative” only indicate that the two leanings oppose each other. We model the probability of node $u$ exhibiting a positive leaning on topic $k$ as $p_u^k$, and the probability of a negative leaning as $1 - p_u^k$.
The parameter set $p_u = (p_u^1, \ldots, p_u^K)$ represents the set of all polarities, i.e., alignment probabilities for each topic relative to $u$, and therefore it defines the position of $u$ in the multidimensional ideological space defined by the topics. We can then express the probability that node $u$ is ideologically aligned with node $v$ on topic $k$ as

(2) $a_{uv}^k = p_u^k p_v^k + (1 - p_u^k)(1 - p_v^k)$,

where $p_u^k p_v^k$ is the probability that both nodes exhibit a positive leaning on dimension $k$ and, by contrast, $(1 - p_u^k)(1 - p_v^k)$ is the probability that they both exhibit a negative leaning. Using this topic-specific alignment probability $a_{uv}^k$, we define

(3) $P(u \mid v, i) = \sum_{k=1}^{K} z_i^k \, \varphi_u^k \, a_{uv}^k$

to be the probability that $u$ will activate on item $i$ given that their predecessor $v$ already did. In other words, there must be a topic $k$, picked from the topic distribution of the item, such that (i) $k$ is of interest to $u$, and (ii) $u$ is ideologically aligned on $k$ with $v$.
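The alignment probability of Eq. 2 and the pairwise activation probability of Eq. 3 are short enough to compute directly; a minimal sketch, with polarities, interests, and the item topic distribution passed as plain lists:

```python
def alignment(p_u_k, p_v_k):
    """Eq. 2: probability that u and v lean the same way on topic k,
    i.e., both positive or both negative."""
    return p_u_k * p_v_k + (1.0 - p_u_k) * (1.0 - p_v_k)

def activation_prob(z_i, phi_u, p_u, p_v):
    """Eq. 3: probability that u activates on item i given that
    predecessor v did. A topic is drawn from the item's distribution
    z_i; u must be interested in it (phi_u) and aligned with v on it."""
    return sum(
        z * phi * alignment(pu, pv)
        for z, phi, pu, pv in zip(z_i, phi_u, p_u, p_v)
    )

# toy case: two topics; u and v agree on topic 0 and disagree on topic 1
z_i = [0.5, 0.5]            # item equally about both topics
phi_u = [1.0, 1.0]          # u is interested in both topics
p_u, p_v = [0.9, 0.9], [0.9, 0.1]
prob = activation_prob(z_i, phi_u, p_u, p_v)
```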
Now, we express the probability that a node $u$ becomes active on item $i$ as a mixture over all possible activators, i.e., over the set $\Phi_{u,i}$ of in-neighbors of $u$ that are already active on $i$:

(4) $P(u \mid \Phi_{u,i}) = \sum_{v \in \Phi_{u,i}} \pi_v \, P(u \mid v, i)$,

where $P(u \mid v, i)$ is the pairwise probability of Eq. 3. The term $\pi_v$ here represents a prior probability that $v$ causes an activation in $u$. As such, it must respect $\sum_{v \in \Phi_{u,i}} \pi_v = 1$; for instance, it can be the uninformative prior $\pi_v = 1/|\Phi_{u,i}|$. In general, this term adds flexibility to our model by allowing a predefined relevance for each potential activator. For example, it can be used to model a bias towards popular nodes, or towards the initiators of the cascade.

MIP generative model. We can devise a simple generative stochastic process for data generation that adheres to the aforementioned assumptions. First, we introduce the hyperparameters that govern such a generative process. We define a polarization hyperparameter that controls the polarization of the nodes' opinions: high values lead to polarities closer to the extremes (i.e., closer to 0 or 1), while lower values lead to polarities closer to neutrality (i.e., closer to 1/2). We also define hyperparameters that control the generated distribution of the interests. Finally, we define a prior distribution over the topics of the items. Then, given an underlying graph $G$, we take the following steps to generate a propagation dataset.
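The mixture of Eq. 4 reduces to a prior-weighted sum of the pairwise activation probabilities; a minimal sketch with the uninformative uniform prior (names and toy values are ours):

```python
def overall_activation_prob(predecessors, prior, pairwise_prob):
    """Eq. 4: mixture over all possible activators v. `prior[v]` is the
    prior probability that v is the one causing u's activation; it must
    sum to 1 over the predecessors."""
    return sum(prior[v] * pairwise_prob(v) for v in predecessors)

preds = ["a", "b"]
uniform_prior = {v: 1.0 / len(preds) for v in preds}  # uninformative prior
pairwise = {"a": 0.8, "b": 0.2}  # toy pairwise probabilities (Eq. 3)
prob = overall_activation_prob(preds, uniform_prior, pairwise.get)
```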
1. Draw interests $\varphi_u^k$ for each node $u \in V$ and topic $k$.
2. Draw polarities $p_u^k$ for each node $u \in V$ and topic $k$.
3. Generate an item $i$:
   (a) Draw its topic distribution $z_i$.
   (b) Draw an initial activated seed uniformly at random from $V$.
   (c) For each arc $(u, v)$ such that $v$ activated and $u$ has not yet seen the item:
       - Node $u$ sees the item from $v$.
       - Draw the item topic $k$ according to $z_i$.
       - $u$ is interested in topic $k$ with probability $\varphi_u^k$. If it is not interested, the propagation fails.
       - If it is interested, draw the attitudes of $u$ and $v$ on $k$ as Bernoulli variables with probabilities $p_u^k$ and $p_v^k$.
       - If the attitudes are equal, $u$ activates on the item.
Within the main loop, each active node tries to activate its neighbors, and the process stops when no further nodes become active. For simplicity, we presented the model assuming that each active node has an equal chance of activating its neighbors: this corresponds to assuming a uniform prior in Eq. 4.
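The generative steps above can be simulated directly; the sketch below is an illustrative implementation under our own simplifications (a single topic, and deterministic interests and polarities so the toy cascade is predictable), not the paper's simulator.

```python
import random

def simulate_cascade(graph, interests, polarities, z_i, seed_node, rng):
    """One cascade of the MIP generative process: active nodes try to
    activate their followers along out-arcs, following step 3(c).
    `graph[v]` lists the followers of v; each node gets one exposure."""
    active, frontier, seen = {seed_node}, [seed_node], {seed_node}
    while frontier:
        v = frontier.pop()
        for u in graph.get(v, []):
            if u in seen:
                continue
            seen.add(u)  # u sees the item (a single exposure, for simplicity)
            # draw the item topic according to z_i
            k = rng.choices(range(len(z_i)), weights=z_i)[0]
            if rng.random() >= interests[u][k]:
                continue  # u is not interested: the propagation fails
            # draw the attitudes of u and v on topic k
            att_u = rng.random() < polarities[u][k]
            att_v = rng.random() < polarities[v][k]
            if att_u == att_v:  # equal attitudes: u activates
                active.add(u)
                frontier.append(u)
    return active

rng = random.Random(0)
graph = {"s": ["a", "b"], "a": ["c"], "b": ["c"]}
ones = {n: [1.0] for n in "sabc"}      # everyone interested (one topic)
aligned = {n: [1.0] for n in "sabc"}   # everyone positively polarized
cascade = simulate_cascade(graph, ones, aligned, [1.0], "s", rng)
```

With full interest and identical polarities, every reachable node activates; with heterogeneous polarities, the cascade stops wherever attitudes disagree.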
4. Inference and Learning
Our goal is to design an algorithm that, given a set of items represented in a $K$-dimensional topic space, is able to estimate the interests and the polarities of each node, that is, our ideological embeddings. We do so by maximizing the likelihood of our model with respect to the interests and polarities, given all the topic distributions and the observed activations.
We can easily see the problem of maximizing the likelihood defined in the previous section as a learning problem, in which we predict whether a certain node will activate or not. To obtain a scalable learning algorithm, we would like to treat each possible propagation of an item from $v$ to $u$ as an independent example. To do so, we resort to an approximation of the likelihood in Eq. 1. First, for the active users, we notice that, by Jensen's inequality:
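Under the mixture form of Eq. 4, with the prior weights $\pi_v$ as the mixing distribution, the standard Jensen step gives a lower bound that decomposes over individual propagations; a sketch of this step, with the same symbols as Eq. 4:

```latex
\log P(u \mid \Phi_{u,i})
  = \log \sum_{v \in \Phi_{u,i}} \pi_v \, P(u \mid v, i)
  \;\geq\; \sum_{v \in \Phi_{u,i}} \pi_v \log P(u \mid v, i)
```

since $\log$ is concave and the $\pi_v$ sum to one. Each term on the right involves a single pair $(v, u)$, which is what allows treating propagations as independent training examples.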