Polarization in Geometric Opinion Dynamics

In light of increasing recent attention to political polarization, understanding how polarization can arise poses an important theoretical question. While more classical models of opinion dynamics seem poorly equipped to study this phenomenon, a recent novel approach by Hązła, Jin, Mossel, and Ramnarayan (HJMR) proposes a simple geometric model of opinion evolution that provably exhibits strong polarization in specialized cases. Moreover, polarization arises quite organically in their model: in each time step, each agent updates opinions according to their correlation/response with an issue drawn at random. However, their techniques do not seem to extend beyond a set of special cases they identify, which benefit from fragile symmetry or contractiveness assumptions, leaving open how general this phenomenon really is. In this paper, we further the study of polarization in related geometric models. We show that the exact form of polarization in such models is quite nuanced: even when strong polarization does not hold, weaker notions of polarization may nonetheless hold. We provide a concrete example where weak polarization holds, but strong polarization provably fails. However, we show that strong polarization provably holds in many variants of the HJMR model, which are also robust to a wider array of distributions of random issues; this indicates that the form of polarization introduced by HJMR is more universal than suggested by their special cases. We also show that the weaker notions connect more readily to the theory of Markov chains on general state spaces.

1 Introduction

People’s opinions naturally evolve in response to a variety of external factors, like interactions with other individuals, campaign messaging, media reports, political identities, and random events. Understanding how opinions evolve and the emergent qualitative features of these dynamics remains a subject of immense interest in the computer science, economics, and social science communities.

In recent years, a crucial phenomenon of interest is that of polarization, where agents roughly partition into groups holding diametric views. That is, rather than agents holding a rich spectrum of beliefs, individuals instead typically belong to opposite clusters even in cases where beliefs on separate topics ostensibly ought not be correlated. This phenomenon is somewhat widely observed to have accelerated in the last couple of decades; for instance, Gentzkow, Shapiro, and Taddy show that political partisanship is more easily inferable in recent years than previously from textual analysis using machine learning methods [gentzkow2016measuring]. While polarization and issue realignment are perhaps most familiarly observed along political dimensions (for instance, the correlation between beliefs on climate change and gun rights), these effects are not limited to just these facets [dellaposta2015liberals].

A prototypical way to model the evolution of the beliefs of agents, and thereby address phenomena like polarization, is to assume that the opinion of each agent is scalar- or vector-valued and evolves according to some prescribed update rule. Many classical models, like the DeGroot and Friedkin-Johnsen models [degroot1974reaching, friedkin1990social], study opinion evolution as a form of social learning where agents update beliefs based on the graph-weighted opinions of their neighbors within an ambient network. However, the original forms of these models do not appear to provide a satisfactory mechanism to explain polarization, as the dynamics provably lead to less polarization than in the initial configuration (see Section 1.2 for a larger discussion).

To understand how polarization can arise in a more intrinsic way, Hązła, Jin, Mossel, and Ramnarayan (HJMR) recently introduced a novel and simple geometric model of opinion dynamics [DBLP:journals/corr/abs-1910-05274]. In their model, each agent’s opinion is a unit length vector where each dimension represents a particular relevant political axis. Their dynamics work as follows: at each time, a new random direction representing a new issue, political figure, or influencer is drawn from some fixed distribution. In response, each agent evaluates the correlation between their current opinions and this new issue and moves either toward or away from this new direction depending on the strength and sign of this correlation. The resulting vector is then renormalized to unit length (see Section 2 for the exact model). These random directions model the natural intuition that opinion evolution is often driven by concrete, possibly random events. For instance, opinions potentially change dramatically in response to campaign messages or candidates during election seasons, when new political coalitions form and ideologies can themselves prove malleable to accommodate the new compositions of these groups. Because each vector is normalized, polarization has a very clean geometric interpretation: a set of opinion vectors is polarized if they are all equal up to sign.

One of the key findings of HJMR is a proof that these simple dynamics intrinsically and strongly polarize when restricted to a set of specialized settings. Concretely, when opinions are two-dimensional (so lie on the unit circle) and the new issues are drawn uniformly at random from the circle, or when vectors have arbitrary dimension but new issues correspond to one of two “duelling influencers” that are sufficiently close and drawn with equal probability, then almost surely the distance between any two starting opinions will converge to 0 or 2 via these random dynamics; in other words, almost surely any two agents will have opinions either converging to each other or to the negative of each other so that they are equal up to sign.

However, it is not at all clear whether or not this almost sure polarization is specific to their exact model formulation: for instance, their technique to show strong polarization in the two-dimensional case with uniform random directions does not obviously generalize to higher dimensions, nor to non-uniform distributions. Without this high degree of symmetry, it is not clear whether the dynamics still model polarization (see Section 1.1 for more discussion). Moreover, their dynamics are completely oblivious in the sense that each agent’s opinions evolve randomly in response to (the same) new issues but otherwise the agents have no social influence over each other. A natural and important question in this framework is thus whether, or to what extent, polarization can occur more generally in related variants that do not satisfy these stringent constraints.

To address this, we can abstractly define geometric opinion dynamics more generally as follows: the key components of the HJMR model are that (i) the dynamics are random to model the impact of new political issues that arise, and in fact form a Markov chain on the hypersphere, and (ii) the set of polarized vectors is invariant under the dynamics. This latter property is of course necessary to prove that polarization occurs, while the former provides a convenient mathematical framework to analyze these dynamics. This motivates the following generalization of their dynamics:

Definition 1.1.

Let $n$ be an arbitrary, but otherwise fixed parameter denoting the number of agents, and let $d$ denote the dimension of opinions. In a general model of geometric opinion dynamics, the evolution of each agent’s opinion follows a discrete-time Markov chain in $S^{d-1}$ given by a recurrence of the following form:

$$X^{(i)}_{t+1} \propto X^{(i)}_t + f_i\big(X^{(1)}_t, \ldots, X^{(n)}_t, \xi_t\big), \tag{1}$$

where $X_0 = (X^{(1)}_0, \ldots, X^{(n)}_0)$ is a given starting configuration of opinions, $\xi_t$ is drawn i.i.d. over time (and independent of all other random variables) from some fixed distribution, and each $f_i$ is an explicit, fixed updating function.

The implicit normalization in this formulation ensures that each opinion returns to the unit sphere; in particular, we assume that the unnormalized quantity above is nonzero so that the projection onto the sphere is well-defined. The joint dynamics of this system thus take place as a Markov chain in $(S^{d-1})^n$. In such models, the random vector $\xi_t$ again denotes a common, random stimulus with respect to which each agent in the system updates opinions, but now possibly also as a function of the opinions of the other agents in the system. For example, $\xi_t$ may model a particular issue or figure during election season that splits society into two camps, “for” and “against,” and opinions update according to these coalitions. Crucially, we also assume that the distribution that these vectors are drawn from remains fixed throughout time so that these dynamics form a time-homogeneous Markov chain.

With this definition, we extend the results of HJMR in this paper by pursuing a more systematic study of polarization in these general geometric opinion dynamics models that take the form of Equation (1). The motivating questions we consider are: does strong polarization hold more generally, or is it specific to the HJMR model formulation? Can strong polarization be shown with nontrivial network interactions, thereby incorporating a key feature of models like the DeGroot and Friedkin-Johnsen dynamics? Are there other interesting notions of polarization that hold in these settings, even if the strong form does not hold?

We show that whether polarization holds in such models is rather surprisingly nuanced and requires extending the notion of polarization beyond what is proven in HJMR. In particular, we show that there exist models that weakly polarize in a formal sense, but provably do not satisfy the stronger form in HJMR. We show that these weaker forms of polarization are connected to the existence of nontrivial invariant distributions on the induced Markov chain. Nonetheless, we also prove that strong polarization holds for nontrivial variants of the HJMR model which are also robust to more general distributions of update vectors—in fact, we show that this holds in a model that has network interactions in addition to the random updates. We hope that some of our techniques will prove useful in studying polarization in such models more generally.

1.1 Overview of Results and Techniques

We contribute to the theoretical understanding of polarization in geometric opinion models initiated in [DBLP:journals/corr/abs-1910-05274]. In Section 2, we begin by identifying, though not necessarily requiring, several natural properties that such dynamics might satisfy and their relation to the problem of polarization. These will be convenient in our later arguments. We also formally define several distinct geometric opinion models, including the HJMR model, that will be the subject of our analysis later on.

Our main results in Section 3 show that strong polarization is more universal in such models than perhaps it would appear from the specialized cases studied by HJMR. In particular, we show that the strong form of polarization exhibited by HJMR in restricted versions of their model holds in two other variants we consider. The first model we study, which we term the signed HJMR model, considers a variant of the HJMR dynamics that replaces an inner product with a sign. With these altered dynamics, we show that strong polarization holds in any dimension and is robust to the choice of distribution on update vectors so long as it is sufficiently close to the uniform measure on the sphere. We also show that the same holds in the party model, which notably incorporates network effects where agents exert influence over each other; such effects are not present in HJMR and it is not clear how to apply their methods to this setting.

To prove our results in Section 3, we resort to conceptually different techniques from those of HJMR. Their work shows strong polarization primarily by appealing to either martingale convergence (in the case of $d = 2$ with a uniform distribution), which relies on rather delicate symmetries, or deterministic contraction (in two-point models representing “duelling influencers”, where the challenge of proving contraction is geometric rather than probabilistic). To establish strong polarization in these variants, we appeal to more general zero-one laws to first reduce the problem to showing just that there is some nonzero probability of strongly polarizing. We obtain our main results from this quite general reduction by employing clean probabilistic and geometric estimates to give the uniform bounds required to prove strong polarization.

Our next main contribution is to connect the polarization phenomenon to the general theory of Markov chains in uncountable state spaces in Section 4. In general, Markov chains in such spaces have considerably more delicate behavior than with countable or finite state spaces. To do so, we define weaker notions of polarization than the strong form considered previously; these notions are motivated by standard notions of convergence from probability theory. We first show that in most cases, whether these weaker forms of polarization hold in geometric opinion dynamics is in fact equivalent to the existence of nontrivial invariant distributions of the Markov chain defined by the dynamics. We then turn to showing the utility of these more nuanced notions of polarization with a concrete example: we conclude by proving that the original HJMR model with orthonormal updates satisfies our notion of weak polarization, but provably does not satisfy strong polarization; in fact, we show that almost surely, almost every starting configuration will not strongly polarize. We do so by connecting these dynamics to pathwise properties of infinite balls-in-bins processes, which we analyze by applying yet another zero-one law of Hewitt and Savage.

1.2 Related Work

Our work broadly relates to the vast body of theoretical work in opinion dynamics that studies concrete models wherein agents update numerical beliefs and the resulting properties of these dynamics; for a comprehensive survey of this style of work, see [castellano2009statistical]. The particular geometric model of opinion dynamics considered by Hązła, Jin, Mossel, and Ramnarayan [DBLP:journals/corr/abs-1910-05274] motivates many of the considerations in this work. To our knowledge, their work is the first to introduce the dynamics in Equation (1) in the specific form given below in Equation (2). We are not aware of related work exploring polarization in variants of their model.

Polarization more generally has been studied in other models of opinion dynamics, typically in the more well-understood DeGroot and Friedkin-Johnsen dynamics mentioned above. Both models assume that agents update opinions according to the graph-weighted average of their neighbors, and both of these models have been extensively studied for their analytical elegance, tractability, and deep connections to well-studied topics like finite-state Markov chains and random walks [golub2010naive, DBLP:conf/sdm/GionisTT13]. However, an idiosyncratic feature of these models is that they are often inherently contractive: opinions get closer due to the dynamics—for instance, in the DeGroot model, agents will necessarily converge to a common limiting opinion under very mild connectedness assumptions. Therefore, the price of the analytic simplicity of this model is that no nontrivial polarization can occur. The Friedkin-Johnsen model typically will not exhibit perfect consensus, but the network effects of the process nonetheless cause initial opinions to contract closer to a common consensus. This suggests that these models simply are not well-equipped to offer a mechanism for this phenomenon.

Recent theoretical work has attempted to combine the analytically desirable features of these models with questions of polarization by incorporating new elements to the model. For instance, behavioral biases like biased assimilation, which underlies the intuition behind the HJMR model, have also been fruitfully studied in the context of DeGroot dynamics [DBLP:journals/pnas/DandekarGL13]. Motivated by recent events, a related line of work has also attempted to understand polarization by incorporating the influence of external actors or platforms into the dynamics. Polarization then arises either explicitly or implicitly because of the (possibly orthogonal) objectives of these external parties [DBLP:conf/www/MuscoMT18, DBLP:conf/wsdm/ChitraM20, DBLP:conf/sigecom/GaitondeKT20, immorlica]. A related model by Hegselmann and Krause [hegselmann2002opinion] circumvents near-consensus by allowing agents to filter out overly dissimilar opinions before averaging at each step, but already this modest extension incurs a high cost: many basic questions about convergence remain open in this model, greatly complicating a more extensive understanding of how polarization arises [DBLP:conf/innovations/BhattacharyyaBCN13]. Moreover, the polarization in this model is somewhat built-in by requiring that far opinions stop responding to each other. While these lines of work have shed significant light on how polarization can arise in such models, and certainly such forces can help explain an empirical increase in polarization in the last few decades, the polarization that arises does not arise organically via the original dynamics.

As we discuss in Section 4, geometric models of opinion dynamics can be viewed as a special case of the more general theory of Markov chains in non-discrete state spaces. The results in this setting are significantly more involved than corresponding results on discrete state spaces; an accessible reference to some important results in this area can be found in the surveys of Hairer [hairer2006ergodic, hairer2010convergence]. As we show, polarization is related to the set of invariant distributions on the Markov chain induced by the dynamics. Numerous techniques have been developed to determine the uniqueness of an invariant distribution in these settings, as well as the rate and mode of convergence to it if possible (see for instance, [butkovsky2014subgeometric]). It would be interesting to find new ways to apply results from this area to provide quantitative bounds on the convergence to polarization when one can prove strong polarization holds.

2 Preliminaries

As stated above, we are primarily interested in the polarization properties of these random processes given by Equation (1). Throughout this paper, we reserve $n$ to denote the number of agents and $d$ to denote the dimensionality of opinions. We write $\|\cdot\|$ for the standard $\ell_2$-norm and $S^{d-1}$ for the unit sphere in $\mathbb{R}^d$. We will write $P_{S^{d-1}}$ for the projection onto the unit sphere, i.e. $P_{S^{d-1}}(x) = x/\|x\|$. We also write $e_i$ for the $i$th standard basis vector in $\mathbb{R}^d$, where $d$ will be clear from context. For a set of vectors , we define }. We write for the angle between and .

For given , we then define to be the set of diagonal opinion vectors, i.e. the set of elements of the form for a single vector . For a sign vector , define . Finally, we define the set of polarized vectors by In other words, is the set of tuples of vectors such that each vector is equal to the rest up to sign. This definition allows for consensus as a special case, but in many settings, consensus or near-consensus is exponentially unlikely (see Remark 3.3).

For a subset in some Euclidean space and a point , we define to denote the distance from a point to a set with respect to the Euclidean metric. With this in mind, we define strong polarization of a geometric opinion dynamics model as follows:

Definition 2.1.

Let $(X_t)_{t \ge 0}$ be a discrete-time Markov chain as given by Equation (1). Then $(X_t)_{t \ge 0}$ strongly polarizes (from $X_0$) if, almost surely, the distance between $X_t$ and the set of polarized vectors converges to zero as $t \to \infty$.

There are several natural properties that one might desire in these dynamics. Below, we consider, though do not require, the following properties, which abstract those of the original HJMR model:

Definition 2.2.

If each function $f_i$ is continuous from $(S^{d-1})^n$ to $\mathbb{R}^d$ for each fixed $\xi$, then we say the dynamics are continuous.

We will show in Section 4 that continuity implies various desirable properties for the dynamics. Note that if this is the case, combined with the fact that we assume the unnormalized update rule is always nonzero, it follows that the dynamics are jointly continuous as transitions from $(S^{d-1})^n$ to itself. This follows because the map taking the joint opinion vector to the unnormalized opinion vectors is continuous and nonzero in every coordinate, which is then composed coordinatewise with a continuous map on $\mathbb{R}^d \setminus \{0\}$, and so is continuous.

Definition 2.3.

The dynamics are sign-invariant if, for all $i$, the function $f_i$ is odd with respect to $X^{(i)}$, but even with respect to the other arguments (i.e. with respect to $X^{(j)}$ for $j \neq i$ and $\xi$).

Sign-invariance implies that each agent reacts to the random update vector and to the other agents in the same way regardless of whether any of them are negated. This feature will be present in all of the models we consider below, including the original HJMR model. The intuition behind sign-invariance is that from the perspective of each agent, if they were to react “positively” to the new issue or a different opinion, they would react “negatively” to the negative of that issue or different opinion; sign-invariance thus asserts that these reactions are balanced.

Definition 2.4.

If when and , we say that the dynamics are symmetric, as the updates do not depend on the identities of the agents.

Definition 2.5.

If each function $f_i$ does not depend on $X^{(j)}$ for each $j \neq i$ (so that $f_i$ depends only on $X^{(i)}$ and $\xi$), then we say the dynamics are oblivious.

In this case, each component of the above process follows a Markov chain, and the joint dynamics form a particular coupling where each component responds to the same update vector. However, any polarization that arises happens indirectly because agents do not influence each other.

In this work, we treat such models in relatively full generality and also specialize to particular models where more specific techniques can establish various desirable properties. The concrete examples we will consider are listed below:

Definition 2.6 (HJMR Model).

In the HJMR Model [DBLP:journals/corr/abs-1910-05274], the update for each agent takes the following form for some fixed scalar $\eta$:

$$X^{(i)}_{t+1} \propto X^{(i)}_t + \eta \cdot \langle X^{(i)}_t, \xi_t \rangle \, \xi_t, \tag{2}$$

where $\xi_t$ is drawn i.i.d. over time from some distribution on $S^{d-1}$. In words, each agent moves in the (signed) direction of the random update vector proportionally to the correlation with their current opinion, and then renormalizes.

Note that this model satisfies continuity, sign-invariance, obliviousness, and symmetry (assuming $\eta$ is a constant over all agents).
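As a minimal illustrative sketch (not code from the paper), one step of the update in Equation (2) can be written in plain Python. The helper names `normalize`, `dot`, `random_unit_vector`, and `hjmr_step`, the value of `eta`, and the choice of a uniform issue distribution are all assumptions made here for illustration.

```python
import math
import random


def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [c / norm for c in v]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def random_unit_vector(d, rng):
    """Uniform direction on the sphere (normalized Gaussian); an assumed choice."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(d)])


def hjmr_step(x, xi, eta):
    """One update of Equation (2): x + eta * <x, xi> * xi, then renormalize."""
    c = dot(x, xi)
    return normalize([xc + eta * c * xic for xc, xic in zip(x, xi)])


if __name__ == "__main__":
    rng = random.Random(0)
    d, eta = 3, 0.5  # hypothetical parameters
    x = random_unit_vector(d, rng)
    y = random_unit_vector(d, rng)
    for _ in range(2000):
        xi = random_unit_vector(d, rng)  # both agents see the same issue
        x, y = hjmr_step(x, xi, eta), hjmr_step(y, xi, eta)
    # |<x, y>| equals 1 exactly when the two opinions agree up to sign.
    print(abs(dot(x, y)))
```

Because the unnormalized update has inner product $1 + \eta \langle x, \xi \rangle^2$ with $x$, it is never zero when $\eta \ge 0$, so the normalization in this sketch is always well-defined.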

Definition 2.7 (Signed HJMR Model).

In the signed HJMR model, the update rule in Equation (2) is amended to

$$X^{(i)}_{t+1} \propto X^{(i)}_t + \eta \cdot \operatorname{sgn}\big(\langle X^{(i)}_t, \xi_t \rangle\big) \, \xi_t. \tag{3}$$

Here, the value of $\operatorname{sgn}(0)$ is fixed by convention, but in our applications below, we will assume $\xi_t$ is drawn from a continuous distribution so that almost surely $\langle X^{(i)}_t, \xi_t \rangle \neq 0$. In this case, the amount the vector updates does not depend on the correlation. This choice is intended to model elections, where one is either for or against a particular candidate and is drawn “all-or-nothing” towards or away from the views of this candidate. This model is not continuous due to the sign function, but is still sign-invariant, oblivious, and symmetric (assuming $\eta$ is a constant over all agents).
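The signed update can be sketched the same way. This is an illustration under assumptions rather than the authors' code: the convention `sgn(0) = 0`, the parameter values, and the function `polarization_gap` (a simple two-agent proxy that vanishes exactly when two opinions agree up to sign) are introduced here and are not taken from the text.

```python
import math
import random


def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [c / norm for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgn(a):
    """Sign function; sgn(0) = 0 is an assumed convention."""
    return 1.0 if a > 0 else (-1.0 if a < 0 else 0.0)


def signed_hjmr_step(x, xi, eta):
    """One update of the signed rule: x + eta * sgn(<x, xi>) * xi, renormalized."""
    s = sgn(dot(x, xi))
    return normalize([xc + eta * s * xic for xc, xic in zip(x, xi)])


def polarization_gap(x, y):
    """min(||x - y||, ||x + y||): zero exactly when x and y are equal up to sign."""
    minus = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    plus = math.sqrt(sum((a + b) ** 2 for a, b in zip(x, y)))
    return min(minus, plus)


if __name__ == "__main__":
    rng = random.Random(1)
    d, eta = 3, 0.5  # hypothetical parameters
    x = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    y = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    for _ in range(5000):
        xi = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
        x, y = signed_hjmr_step(x, xi, eta), signed_hjmr_step(y, xi, eta)
    print(polarization_gap(x, y))
```

Unlike the update in Equation (2), the displacement added before renormalizing always has length `eta` here (whenever the inner product is nonzero), which is the “all-or-nothing” behavior described above.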

Definition 2.8 (Party Model).

Suppose each agent $i$ has multipliers $\eta^{(i)}_j$, where $\eta^{(i)}_j$ measures the influence of agent $j$ on agent $i$. The party model is defined by:

$$X^{(i)}_{t+1} \propto X^{(i)}_t + \Bigg( \sum_{\substack{j \in [n]: \\ \operatorname{sgn}(\langle X^{(j)}_t, \xi_t \rangle) = \operatorname{sgn}(\langle X^{(i)}_t, \xi_t \rangle)}} \eta^{(i)}_j X^{(j)}_t \;-\; \sum_{\substack{j \in [n]: \\ \operatorname{sgn}(\langle X^{(j)}_t, \xi_t \rangle) \neq \operatorname{sgn}(\langle X^{(i)}_t, \xi_t \rangle)}} \eta^{(i)}_j X^{(j)}_t \Bigg). \tag{4}$$

That is, each agent moves towards the vectors that came on the same “side” of the random issue $\xi_t$, and away from those on the opposite “side” of the issue. While this latter assumption may appear non-obvious, there is considerable empirical evidence for the sociological principle that “out-group conflict builds in-group solidarity” [mccallion2007groups]: when a binary issue creates disagreement within a collection of people, the people on each side of the disagreement move toward those they agree with and away from those they disagree with [fisher2016towards, sherif-harvey-white-hood-sherif61]. Once again, this model is not continuous due to the sign function, and also is not oblivious as clearly the update rule depends on the values of the other agents. However, it remains sign-invariant, as negating one’s own opinions interchanges the sums, and flipping either $\xi_t$ or any other agent’s opinions only permutes summands.
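The party update in Equation (4) can be sketched as follows. This is a hedged illustration: the nested-list layout `eta[i][j]` for the influence of agent `j` on agent `i`, the convention `sgn(0) = 0`, and the function names are assumptions made here, and the sketch raises an error if an unnormalized update happens to be zero (a case the model assumes away).

```python
import math


def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("unnormalized update is zero; the model assumes it is not")
    return [c / norm for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgn(a):
    """Sign function; sgn(0) = 0 is an assumed convention."""
    return 1.0 if a > 0 else (-1.0 if a < 0 else 0.0)


def party_step(opinions, issue, eta):
    """One joint update of Equation (4) for all agents.

    opinions: list of n unit vectors; issue: the common random vector;
    eta[i][j]: influence of agent j on agent i (hypothetical layout).
    """
    sides = [sgn(dot(x, issue)) for x in opinions]
    updated = []
    for i, x_i in enumerate(opinions):
        v = list(x_i)
        for j, x_j in enumerate(opinions):
            # Attract to agents on the same side of the issue, repel otherwise.
            weight = eta[i][j] if sides[j] == sides[i] else -eta[i][j]
            v = [vc + weight * c for vc, c in zip(v, x_j)]
        updated.append(normalize(v))
    return updated


if __name__ == "__main__":
    opinions = [[1.0, 0.0], [0.0, 1.0]]
    eta = [[0.5, 0.5], [0.5, 0.5]]
    # An issue both agents agree with pulls them together ...
    together = party_step(opinions, [1.0, 1.0], eta)
    # ... while an issue that splits them pushes them apart.
    apart = party_step(opinions, [1.0, -1.0], eta)
    print(dot(*together), dot(*apart))
```

All agents are updated from the same snapshot of `opinions`, matching the simultaneous update in Equation (4).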

For any sort of polarization to arise, it is natural to ensure that the dynamics are such that if the vectors are completely polarized, then they will surely remain so. We provide one simple condition that is easily verified (note that the converse of Lemma 2.9 trivially fails: one can simply ensure the dynamics are invariant on each such set and otherwise define them arbitrarily):

Lemma 2.9.

Suppose that some geometric opinion dynamics satisfy symmetry and sign-invariance. Then is invariant under the transitions for every .

Finally, note that if the dynamics are oblivious and strongly polarize for $n = 2$ agents, then they must do so for any finite $n$ by a simple union bound. Note that oblivious and symmetric dynamics ensure that the process is well-defined for any number of agents.

Lemma 2.10.

Suppose that the opinion dynamics are symmetric and oblivious. Then if any form of convergence holds for $n = 2$ agents, the same holds for any finite number of agents.

3 Models with Strong Polarization

In this section, we prove the strong polarization of the signed HJMR model and the party model. To do this, we first establish a simple but powerful general principle that will significantly simplify the analysis. The intuition is as follows: suppose momentarily that geometric opinion dynamics were a finite-state Markov chain and that “polarization” is an absorbing state of the Markov chain. Then from standard and simple Markov estimates, so long as it is possible to reach the state “polarized” from any starting point in some fixed finite number of steps, an easy calculation shows that almost surely the Markov chain will become “polarized.” In that case, it would suffice to show that from any starting state, there is some nonzero probability of reaching “polarized” in some finite number of steps.
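For the finite-state intuition, the “easy calculation” can be sketched as follows. The symbols $k$, $p$, and $\tau$ are introduced here for illustration and are not notation from the text: assume that from every state the chain reaches the absorbing state within $k$ steps with probability at least $p > 0$, and let $\tau$ denote the absorption time. Applying the Markov property at times $k, 2k, 3k, \ldots$ gives

$$\Pr[\tau > mk] \;\le\; (1-p)^m \quad \text{for every integer } m \ge 1, \qquad \text{so} \qquad \Pr[\tau = \infty] \;=\; \lim_{m \to \infty} \Pr[\tau > mk] \;=\; 0.$$

Since a finite-state chain has only finitely many starting states, a uniform $p$ and $k$ exist as soon as absorption is reachable from each state, which is why a nonzero reaching probability suffices in that setting.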

In general, this idea is not so simple to formalize because the geometric opinion dynamics lie in a non-discrete state space, and moreover, it is often only possible to reach the set of polarized vectors asymptotically, not in any finite number of steps. However, by appealing to more general zero-one laws, we show that this intuition nonetheless holds:

Theorem 3.1.

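For any geometric model of opinion dynamics that satisfies the Markov property, the following are equivalent:

1. For every choice of starting vector $X_0$, the dynamics strongly polarize almost surely.

2. For every choice of starting vector $X_0$, the dynamics strongly polarize with probability at least some strictly positive constant that does not depend on $X_0$.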

Proof.

One direction is trivial, so we assume the second condition. Consider the dynamics started at any choice of starting vector , and consider the event Let be the filtration generated by the random updates up to time and let be the filtration generated by all of them. By standard arguments, .

For each , define . As is an indicator random variable, Lévy’s upward theorem (Theorem 4.2.11 of [durrett2019probability]) implies that almost surely. But observe that by the Markov property, , where gives the dynamics started at . In particular, pointwise. Because almost surely and is bounded from below almost surely by a strictly positive quantity, the only way this can happen is if almost surely (over the realizations of the ), so that holds almost surely. As was arbitrary, this completes the harder direction. ∎

3.1 Strong Polarization in Signed HJMR Model

Our first main result is that strong polarization holds in the signed HJMR model with a common value of $\eta$, and that this holds for a general class of distributions:

Theorem 3.2.

Suppose there are $n$ agents in the signed HJMR model given by Equation (3) where each $\xi_t$ is drawn i.i.d. from a distribution that is equivalent to the uniform (Haar) measure , i.e. there exists such that for every measurable set , . Then this system strongly polarizes from any choice of starting vector $X_0$.

Remark 3.3.

It can be shown that for any sign-invariant dynamics that strongly polarizes, if is drawn uniformly from , then each possible clustering is equally likely even conditioned on the sequence almost surely. In particular, the probability over starting configurations and the random updates of consensus is exponentially small in .

To set up this result, we establish a sequence of lemmas that will prove useful. We begin with a simple geometric fact:

Lemma 3.4.

Let and suppose is such that for some . Then

Proof.

We may assume , as otherwise the claim is trivial. Consider the arrangement of vectors in the plane spanned by forming a triangle with a vertex at the origin and adjacent sides . By the assumption that these vectors have length at least , scaling this triangle by a factor of ensures that and continue to have at least unit norm, and the distance between them is at most . As projection onto is a contraction in Euclidean distance for vectors of length at least , it follows that

$$\big\| P_{S^{d-1}}(x+z) - P_{S^{d-1}}(y+z) \big\|_2 = \big\| P_{S^{d-1}}(r(x+z)) - P_{S^{d-1}}(r(y+z)) \big\|_2 \le r \, \| x - y \|_2.$$
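The contraction step used in this proof, that radial projection onto the unit sphere does not increase Euclidean distances between vectors of norm at least one, can be sanity-checked numerically. The sketch below is an illustration rather than a proof; the function names, the dimension, and the range of lengths sampled are arbitrary assumptions.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def project_to_sphere(v):
    """Radial projection v / ||v|| of a nonzero vector onto the unit sphere."""
    length = norm(v)
    if length == 0.0:
        raise ValueError("cannot project the zero vector")
    return [c / length for c in v]


def distance(u, v):
    return norm([a - b for a, b in zip(u, v)])


def random_long_vector(d, rng):
    """Random direction scaled to a length in [1, 4] (an arbitrary test range)."""
    direction = project_to_sphere([rng.gauss(0.0, 1.0) for _ in range(d)])
    length = 1.0 + 3.0 * rng.random()
    return [length * c for c in direction]


def worst_contraction_ratio(trials=10000, d=3, seed=0):
    """Largest observed ||P(u) - P(v)|| / ||u - v|| over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        u = random_long_vector(d, rng)
        v = random_long_vector(d, rng)
        gap = distance(u, v)
        if gap > 1e-9:  # skip numerically degenerate pairs
            ratio = distance(project_to_sphere(u), project_to_sphere(v)) / gap
            worst = max(worst, ratio)
    return worst


if __name__ == "__main__":
    # The ratio should never exceed 1, up to floating-point error.
    print(worst_contraction_ratio())
```

When both vectors have length at least 2, the same reasoning with a scaling factor of one half bounds the ratio by one half; for instance, the pair `(2, 0)` and `(0, 2)` attains it exactly.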

Next, we show that with some constant probability, a random vector drawn from will not split two vectors that form an acute angle, and that the probability of splitting at all tends to zero as the distance tends to zero.

Lemma 3.5.

There exist constants depending on and the measure of equivalence such that the following holds: suppose that satisfies . Then the probability that satisfies:

1. , and

2. for ,

is at least .

Proof.

We first show this when $\xi$ is drawn uniformly from the sphere and then simply change constants when moving to any measure that is equivalent to Haar measure. But this is clear: the probability that $\xi$ satisfies the first property is at least under these assumptions, as the direction of $\xi$ in the plane spanned by is itself uniform and the two vectors form an acute angle. Moreover, the distribution of does not depend on by uniformity, so we may choose small enough so that the probability of the second property is at least for any fixed . By a union bound, it follows that the probability that $\xi$ has the desired properties is at least under Haar measure. Under the true, equivalent distribution, the probability is thus at least . ∎

Lemma 3.6.

Under the assumptions and notation of Lemma 3.5, the probability that is at most and , where the implicit constant depends only on and the measure of equivalence.

Proof.

We first show this for Haar measure. By an analogous argument, the vectors with the desired property have directions lying in a band of width in the plane spanned by , and therefore this set has probability at most and by uniformity. For the true distribution, this can increase by a factor of at most , which we may absorb into the implicit constant. ∎
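For the uniform measure, the splitting probability discussed in these lemmas has a simple closed form that can be checked by simulation: a uniformly random direction assigns different signs to $\langle x, \xi \rangle$ and $\langle y, \xi \rangle$ with probability equal to the angle between $x$ and $y$ divided by $\pi$, a standard fact about random hyperplanes. The sketch below is a Monte Carlo illustration; the function name, sample size, and seed are arbitrary assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def split_probability(theta, d=3, samples=20000, seed=0):
    """Estimate Pr[sgn<x, xi> != sgn<y, xi>] for unit vectors at angle theta.

    xi is a standard Gaussian vector; only its direction matters for the
    signs, and that direction is uniform on the sphere.
    """
    rng = random.Random(seed)
    x = [1.0, 0.0] + [0.0] * (d - 2)
    y = [math.cos(theta), math.sin(theta)] + [0.0] * (d - 2)
    splits = 0
    for _ in range(samples):
        xi = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if dot(x, xi) * dot(y, xi) < 0:
            splits += 1
    return splits / samples


if __name__ == "__main__":
    for theta in (0.1, 0.5, math.pi / 2):
        print(theta, split_probability(theta), theta / math.pi)
```

In particular, the estimate shrinks linearly with the angle, consistent with the claim that the probability of splitting tends to zero as the distance between the two opinions tends to zero.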

With this result in hand, we turn to the proof of the main theorem:

Proof of Theorem 3.2.

We begin with a series of reductions that simplifies the problem. First, because these dynamics are oblivious, we observe that by Lemma 2.10 it suffices to consider the case with an arbitrary starting vector . Next, by sign-invariance of these dynamics, we may assume that by possibly flipping the sign of one of the vectors and noting that both the dynamics and the set are invariant under these sign changes. Note that this implies that the two starting vectors form an acute angle. Finally, by Theorem 3.1, it suffices to show that the probability that is bounded below by some nonzero constant , uniformly over the choice of starting vector (though assuming nonnegative inner product).

We use the notation of Lemma 3.5. Note that the good event of Lemma 3.5 and the bad event of Lemma 3.6 are disjoint, though not mutually exhaustive. We claim that by the craps principle, the probability of encountering a random update satisfying the good event before the bad event is at least . Indeed, while an update need not satisfy either event, on the complement of the bad event, is nonincreasing so long as the bad event does not occur as the unnormalized lengths increase by sign-invariance with respect to (so that we may assume both signs are positive) and contractions decrease distances. The claim then follows from the craps principle, Lemma 3.5, and Lemma 3.6.
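For readers unfamiliar with the craps principle: in repeated independent trials with two disjoint events, the first occurs before the second with probability P(good) / (P(good) + P(bad)). The sketch below checks this by simulation with fixed, hypothetical probabilities. In the proof the probability of the bad event depends on the current distance, so the principle is applied there as a bound rather than as an identity.

```python
import random

def good_before_bad(p_good, p_bad):
    """Craps principle for i.i.d. trials with disjoint events."""
    return p_good / (p_good + p_bad)

def simulate(p_good, p_bad, runs=100_000, seed=0):
    """Fraction of runs in which the good event occurs before the bad one."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(runs):
        while True:
            u = rng.random()
            if u < p_good:            # good event occurs first
                wins += 1
                break
            if u < p_good + p_bad:    # bad event occurs first
                break
            # otherwise neither event occurred; draw again
    return wins / runs

p_good, p_bad = 0.2, 0.05             # hypothetical values for illustration
exact = good_before_bad(p_good, p_bad)
estimate = simulate(p_good, p_bad)
print(exact, estimate)
```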

Moreover, on this event, the distance between and decreases by a factor of where by Lemma 3.4 (using ) and the fact that the inner products are bounded below on the good event. It follows that on this event, the new distance between vectors is at most . By the strong Markov property, we may iterate this argument to show that the probability of the good event occurring before the bad event is now at least where we absorb the constant . It follows that the probability that the good event occurs infinitely often without the bad event is at least

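$$\prod_{k=0}^{\infty}\left(1 - O\!\left(\frac{\|X^{(1)}_0 - X^{(2)}_0\|_2}{(1+\epsilon)^k}\right)\right) \ge \prod_{k=0}^{\infty}\left(1 - O\!\left(\frac{1}{(1+\epsilon)^k}\right)\right),$$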

where we simply upper bound in the inequality. Note that if this occurs, then Lemma 3.4 implies that as the distance is nonincreasing and geometrically decreases infinitely often. From standard analysis,

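$$\prod_{k=0}^{\infty}\left(1 - O\!\left(\frac{1}{(1+\epsilon)^k}\right)\right) > 0 \iff \sum_{k=0}^{\infty} O\!\left(\frac{1}{(1+\epsilon)^k}\right) < \infty,$$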

and the latter is clearly true as a geometric series. Moreover, these lower bounds are uniform over the value of . By the reductions above, this completes the proof. ∎
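The positivity of the infinite product can be seen numerically. The sketch below replaces the O(·) term by a concrete constant c < 1 (an assumption made so that every factor is positive) and compares the partial products with the elementary lower bound exp(−2c(1+ε)/ε), which follows from 1 − x ≥ exp(−2x) for 0 ≤ x ≤ 1/2.

```python
import math

def partial_product(c, eps, terms):
    """Product of (1 - c / (1 + eps)**k) over k = 0, ..., terms - 1."""
    prod = 1.0
    for k in range(terms):
        prod *= 1.0 - c / (1.0 + eps) ** k
    return prod

c, eps = 0.5, 0.25                    # hypothetical constants, c <= 1/2
p_200 = partial_product(c, eps, 200)
p_400 = partial_product(c, eps, 400)

# Sum of the geometric series 1/(1+eps)^k over k >= 0 equals (1+eps)/eps.
lower_bound = math.exp(-2.0 * c * (1.0 + eps) / eps)
print(p_200, p_400, lower_bound)
```

The partial products stabilize at a strictly positive value, as the summability criterion predicts.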

3.2 Strong Polarization in the Party Model

We now turn to proving strong polarization in the party model. One complicating factor is that these dynamics are not oblivious, unlike the other models where strong polarization is known. Therefore, we have to reason about multiple vectors acting on each other at the same time.

To set up the formal statement of the result, we need the following definition: for any given set of nonnegative coefficients , let be the (directed) adjacency matrix defined by

$$A_{ij} = \begin{cases} 1 & \text{if } \eta^{(i)}_j > 0, \\ 0 & \text{otherwise.} \end{cases}$$

In other words, we consider the directed graph with a directed edge from to if influences . We say that is irreducible if componentwise. This is equivalent to the existence of a directed path between any two agents and . Our main result of this section is as follows:
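Irreducibility is easy to test computationally through the path characterization. The following sketch builds the adjacency matrix from a coefficient array and checks that every agent can reach every other agent in at most n − 1 steps. The array layout `eta[i][j]` for the coefficient written η^{(i)}_j above and both function names are assumptions made for illustration.

```python
import numpy as np

def adjacency(eta):
    """A[i, j] = 1 if eta[i][j] > 0 and 0 otherwise."""
    return (np.asarray(eta) > 0).astype(int)

def is_irreducible(A):
    """True if a directed path joins every ordered pair of agents."""
    n = A.shape[0]
    # Adding self-loops lets `reach` record paths of length at most k.
    step = ((A + np.eye(n, dtype=int)) > 0).astype(int)
    reach = step.copy()
    for _ in range(n - 2):
        reach = ((reach @ step) > 0).astype(int)
    return bool(reach.all())

# Directed 3-cycle: every agent reaches every other agent.
cycle = [[0.0, 0.5, 0.0],
         [0.0, 0.0, 0.5],
         [0.5, 0.0, 0.0]]
# Directed chain: the last agent has no outgoing edge.
chain = [[0.0, 0.5, 0.0],
         [0.0, 0.0, 0.5],
         [0.0, 0.0, 0.0]]

print(is_irreducible(adjacency(cycle)), is_irreducible(adjacency(chain)))
```

Thresholding after each multiplication keeps the entries in {0, 1}, which avoids integer overflow for larger n.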

Theorem 3.7.

Suppose there are agents in the party model where each is drawn i.i.d. from a distribution that is equivalent to the uniform (Haar) measure , i.e. there exists such that for every measurable set , . Moreover, suppose that is irreducible. Then this system strongly polarizes from any choice of starting vector .

To prove this, we follow a similar high-level plan as that of Theorem 3.2. By sign-invariance, we will assume via Lemma 3.8 that each component of lies on one side of a hyperplane with margin strictly bounded below by zero regardless of the individual configuration. We then construct a potential function that is equivalent to the maximum angle between any two agents with the property that, assuming the dynamics do not split up the vectors on the next iterations, it is guaranteed to decay by some factor strictly bounded above by . Since, as we will again see, the probability that a random vector splits up any two components is essentially bounded by the maximum angle between any two components and this quantity is decreasing geometrically, it will follow that the probability of strong polarization is bounded below uniformly. We may then again conclude via Theorem 3.1 that the system strongly polarizes from any starting configuration.

We now carry out this plan. First, we show that there always exists a signing of the starting configuration such that all vectors lie on one side of some hyperplane with nontrivial margin:

Lemma 3.8.

For all , there exists a constant such that for any , there exists such that for all .

Proof.

Simply choose such that the probability that a random unit vector drawn from Haar measure does not satisfy the condition for a given is at most and then apply a union bound to conclude there exists such a vector. Note that this choice of indeed depends on , but not on the choice of vectors as the distribution of the inner product of a uniformly random vector on the sphere with a fixed vector does not depend on the identity of this fixed vector. ∎
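The probabilistic argument also suggests a simple randomized search. The sketch below reads the lemma as asserting a unit vector v with |⟨v, x_i⟩| ≥ δ for every i, so that flipping signs places all vectors on one side of the hyperplane orthogonal to v with margin δ. This reading, the function name, and the value of δ are assumptions made for illustration.

```python
import numpy as np

def find_signing(X, delta, max_tries=10_000, seed=0):
    """Search for a unit vector v and signs s_i in {-1, +1} such that
    s_i * <v, x_i> >= delta for every row x_i of X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    for _ in range(max_tries):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)            # uniform direction on the sphere
        inner = X @ v
        if np.all(np.abs(inner) >= delta):
            signs = np.where(inner >= 0, 1.0, -1.0)
            return v, signs
    return None                            # no suitable direction found

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
delta = 0.05                               # hypothetical margin
result = find_signing(X, delta)
```

Each draw fails for a given vector with small probability when δ is small, so by the union bound a suitable direction is typically found within a few draws.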

Next, we proceed with several purely geometric results that will enable us to show contraction of a suitable potential function.

Lemma 3.9.

Let all lie strictly on one side of a hyperplane in . Define by

$$y \in \operatorname*{argmin}_{v \in S^{d-1}} \max_{j \in [n]} \angle(v, x_j).$$

Note exists as the objective function is continuous and the constraint set is compact. Let be the orthogonal subspace to and let be the orthogonal projection onto . Then .

Proof.

Observe that the desired claim is implied by . To see this, suppose that this holds: suppose there exists such that . Note that not all are zero as has unit norm by construction. Then applying to both sides, we deduce that

$$0 = P_H(y) = \sum_{i=1}^{n} \alpha_i P_H(x_i).$$

As , we may renormalize these coefficients so that their sum is one to deduce that . Moreover, note that can equivalently be defined via . This holds as the inner product is a monotonically decreasing function of angle on and so the optimization problems are equivalent.

Therefore, suppose that . By the separating hyperplane theorem, there exists a unit vector such that for all but . Write in an orthogonal basis that includes . Then the coordinate with respect to for each of the is strictly positive, while the coordinate for is strictly negative. Therefore, by reflecting about , we obtain unit such that for all , contradicting the optimality of . ∎

Lemma 3.10.

Let satisfy for all and . Then

$$\left\| \frac{1}{n} \sum_{i=1}^{n} z_i \right\|_2 \le 1 - \frac{1}{n}.$$

Proof.

We consider two cases:

1. First, suppose that ; without loss of generality, suppose . Then

$$\left\| \frac{1}{n} \sum_{i=1}^{n} z_i \right\|_2 = \left\| \frac{1}{n} \sum_{i=2}^{n} z_i \right\|_2.$$

As is trivially in the convex hull no matter the choice of , the convexity of the Euclidean norm implies that this latter quantity is optimized for for some unit vector . This has norm as needed.

2. Now suppose that . We claim that there exists some such that setting does not decrease the desired quantity, thereby reducing to the previous case.

To see this, suppose that this is false. In particular, for every ,

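$$\left\| \frac{1}{n} \sum_{i=1}^{n} z_i \right\|_2 > \left\| \frac{1}{n} \sum_{i \ne j} z_i \right\|_2 \iff \Big\langle z_j, \sum_{i \ne j} z_i \Big\rangle > 0.$$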

This in turn clearly implies that

$$\Big\langle z_j, \sum_{i=1}^{n} z_i \Big\rangle > 0. \tag{5}$$

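By assumption, we may write for some nonnegative scalars summing to . Multiplying Equation (5) by for each and summing, we obtain

$$0 = \Big\langle \sum_{j=1}^{n} \alpha_j z_j, \sum_{i=1}^{n} z_i \Big\rangle = \sum_{j=1}^{n} \alpha_j \Big\langle z_j, \sum_{i=1}^{n} z_i \Big\rangle > 0,$$

which is a contradiction. ∎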

Next, we show that the irreducibility of implies that there is geometric decay in the minimal angle in each steps that the dynamics do not split the vectors. We start with the following crude, but intuitive lemma:

Lemma 3.11.

Let be any set of vectors all lying strictly on one side of a hyperplane with margin (i.e. there exists unit satisfying for all ), and consider the update rule given by Equation (4) where the second summand is empty.

Suppose that we apply this update rule times to obtain . Then, there exists such that for some row-stochastic matrix depending on satisfying

$$M = \epsilon \Pi + (1-\epsilon) Q, \tag{6}$$

where every entry of is equal to and is some arbitrary stochastic matrix.

Remark 3.12.

In the above matrix equation, we interpret as a matrix where the th row is . Moreover, the point of the lemma is that depends on , but not on the starting configuration.

Proof.

We show the following claim by induction: for each , there exists such that where is a row-stochastic matrix such that if , where is the set of nodes reachable from in steps in the directed graph induced by above. This clearly implies the claim by setting noting that by the fact that this matrix has ones on the diagonal.

For the base case , by definition (and absorbing the term into the multiplier): By dividing by , we clearly obtain the claim with . Note that this is independent of .

Now suppose it holds for some so that . By applying the base case to (noting that still lies on the same side of the hyperplane by convexity with same margin),

$$x_{k+1} \propto M'_1 x_k \propto (M'_1 M_k)\, x_0.$$

By the induction hypothesis, for all , while for all . For any , there exists some such that and . From the definition of matrix multiplication, if , it follows that . Letting and noting the product of stochastic matrices is stochastic, the claim follows. ∎
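The mechanism behind this induction can be illustrated numerically: multiplying n row-stochastic matrices that have positive diagonals and are supported on an irreducible graph yields a matrix with all entries positive, which then admits a decomposition of the form in Equation (6). The sketch below assumes that Π is the matrix with every entry equal to 1/n and uses uniform one-step weights on a directed cycle; both choices and the function names are hypothetical.

```python
import numpy as np

def one_step_matrix(A):
    """Row-stochastic matrix with equal weight on the self-loop and on
    each out-neighbor in A (uniform weights are an illustrative choice)."""
    n = A.shape[0]
    W = A + np.eye(n)
    return W / W.sum(axis=1, keepdims=True)

def uniform_mixing_split(M):
    """Write M = eps * Pi + (1 - eps) * Q, where Pi has all entries 1/n
    and Q is row-stochastic. Requires M to have positive entries, M != Pi."""
    n = M.shape[0]
    eps = n * M.min()                  # largest eps with M >= eps * Pi
    Pi = np.full((n, n), 1.0 / n)
    Q = (M - eps * Pi) / (1.0 - eps)
    return eps, Pi, Q

n = 4
A = np.roll(np.eye(n), 1, axis=1)      # directed cycle i -> i + 1 (mod n)
M = np.linalg.matrix_power(one_step_matrix(A), n)
eps, Pi, Q = uniform_mixing_split(M)
print("eps =", eps)
```

The value of eps produced this way depends only on the weights and the graph, not on any starting configuration, which is the point emphasized in Remark 3.12.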

Finally, we show that one can get the geometric rate of convergence with respect to a natural potential function.

Lemma 3.13.

Let all lie strictly on one side of a hyperplane in with margin at least . Define by

$$\Phi(x_1,\dots,x_n) = \min_{v \in S^{d-1}} \max_{i \in [n]} \angle(v, x_i). \tag{7}$$

Let be the updated vectors after iterations of the update rule in Equation (4) when the second summand is empty. Then there exists such that

$$\Phi(x'_1,\dots,x'_n) \le (1-c)\,\Phi(x_1,\dots,x_n).$$

Proof.

Let be an optimizer in Equation (7) and let be the orthogonal complement. Let be the corresponding orthogonal projection onto and be the orthogonal projection onto . Note that for any vector , . By Lemma 3.11, we have for some stochastic matrix such that .

Now, observe that by the definition of and from elementary geometry, while for all . Now by definition,

$$\Phi(x'_1,\dots,x'_n) \le \max_{i \in [n]} \angle(y, x'_i) = \max_{i \in [n]} \angle(y, (Mx)_i),$$

as angles do not change under positive scaling.

Therefore, we consider the (unnormalized) set of vectors . By linearity and the fact is row-stochastic, we still have . On the other hand,

$$\|P_H (Mx)_i\| = \|\epsilon P_H(\bar{x}) + (1-\epsilon) P_H \tilde{x}_i\|, \tag{8}$$

where and is some arbitrary convex combination. We thus have by linearity

$$\|P_H \tilde{x}_i\| \le \sin(\Phi(x_1,\dots,x_n)). \tag{9}$$

Moreover, again by linearity, . Recall that by Lemma 3.9, and therefore by scaling and applying Lemma 3.10 with ,

$$\|P_H \bar{x}\|_2 \le \left(1 - \frac{1}{n}\right) \sin(\Phi(x_1,\dots,x_n)). \tag{10}$$

Putting it all together, we now have by monotonicity of the function on and the triangle inequality with Equations (8), (9), and (10) that

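$$\begin{aligned} \tan(\Phi(x'_1,\dots,x'_n)) &\le \max_{i \in [n]} \frac{\|P_H (Mx)_i\|_2}{\|P_y (Mx)_i\|} \\ &\le \frac{\epsilon\left(1 - \frac{1}{n}\right)\sin(\Phi(x_1,\dots,x_n)) + (1-\epsilon)\sin(\Phi(x_1,\dots,x_n))}{\cos(\Phi(x_1,\dots,x_n))} \\ &= \left(1 - \frac{\epsilon}{n}\right)\tan(\Phi(x_1,\dots,x_n)). \end{aligned}$$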

We may assume that for some small enough by the assumption that lie on one side of a hyperplane with margin . The function has derivative bounded between and some constant depending on on the interval and thus by Lemma 3.14, the above geometric decay of implies that for some constant . ∎
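The tangent contraction at the heart of this proof can be checked numerically in the plane, where the potential has a closed form: for unit vectors inside an open half-plane, Φ is half of their angular spread. The sketch below does not implement Equation (4), which is not restated here; it instead applies a matrix of the form M = εΠ + (1 − ε)Q directly, as provided by Lemma 3.11, with Π assumed to have all entries 1/n and with hypothetical values of n and ε.

```python
import numpy as np

def phi_2d(X):
    """Potential for 2-D vectors in an open half-plane: half the spread
    of their angles, measured relative to the direction of their sum."""
    center = X.sum(axis=0)
    base = np.arctan2(center[1], center[0])
    rel = np.arctan2(X[:, 1], X[:, 0]) - base
    rel = (rel + np.pi) % (2.0 * np.pi) - np.pi   # wrap into [-pi, pi)
    return (rel.max() - rel.min()) / 2.0

rng = np.random.default_rng(1)
n, eps = 5, 0.3                                    # hypothetical values
angles = rng.uniform(-1.2, 1.2, size=n)            # spread below pi
X = np.column_stack([np.cos(angles), np.sin(angles)])

Pi = np.full((n, n), 1.0 / n)
Q = rng.dirichlet(np.ones(n), size=n)              # arbitrary row-stochastic matrix
M = eps * Pi + (1.0 - eps) * Q

X_new = M @ X
X_new /= np.linalg.norm(X_new, axis=1, keepdims=True)   # renormalize rows

phi_old, phi_new = phi_2d(X), phi_2d(X_new)
bound = (1.0 - eps / n) * np.tan(phi_old)
print(np.tan(phi_new), bound)
```

In this experiment the tangent of the new potential stays below the bound from the display above, and the potential itself does not increase.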

Lemma 3.14.

Let be a differentiable function such that and for some constant . If , then for .

Proof.

Observe from the Mean Value Theorem, the assumption, and the fact from the derivative condition that

$$y - x \ge \frac{f(y) - f(x)}{K} \ge \frac{c\,f(y)}{K} \ge \frac{c\,x}{K}.$$

It immediately follows that . As for , we obtain the claim. ∎

Proof of Theorem 3.7.

By sign-invariance and Lemma 3.8, we may assume that each component of lies strictly on one side of a hyperplane in with margin at least ; this follows because we may sign the starting configuration arbitrarily, run the dynamics, and then undo the signing by sign-invariance without affecting the polarization properties of the dynamics. We now show that with some constant probability (independent of ), the dynamics monotonically decrease to zero. Moreover, it is easy to see by the triangle inequality that for any vectors ,

$$\Phi(x_1,\dots,x_n) \le \max_{i,j} \angle(x_i, x_j) \le 2\,\Phi(x_1,\dots,x_n). \tag{11}$$

Therefore, implies the same of the maximum angle between components and thus implies polarization.

Observe that if the dynamics do not split the vectors on any of the next iterations, Lemma 3.13 implies that for some constant independent of . This occurs with at least some nonzero constant probability (again, depending on , but not on ) by the margin condition, Lemma 3.6, and the equivalence with the standard Haar measure.

Now, recall from Lemma 3.6 and a union bound that the probability that the dynamics split up any given that all lie strictly on one side of a hyperplane with is by Equation (11), where the implicit constant depends on and the measure of equivalence. The probability that the sequence is at least the probability that for each . As we have shown that the probability that this fails decays geometrically, it follows that this latter probability is at least

 (12)

for some constant that does not depend on , where the inequality follows from the same standard analysis argument as in Theorem 3.2. This immediately implies that strong polarization holds with constant probability on the restriction of the sequence to each steps. To extend this to the whole sequence, by inspecting the proof of Lemma 3.13, it is easy to see that if the vectors are not split at some time , then for every , not just on the subsequence (though we may not have strict contraction). In particular, the event that the dynamics never split up the vectors implies , and this holds with constant nonzero probability depending on just . By Equation (11), Equation (12), and Theorem 3.1, we conclude the result. ∎

4 Weak Polarization and Markov Chains

In the previous section, we showed that the strong polarization observed by HJMR extends to nontrivial variants of geometric opinion dynamics. However, we caution that it is not generally true that just any geometric opinion model, even one that is continuous and -invariant, will strongly polarize from an arbitrary configuration of starting opinions. As a trivial example, suppose that the updates are such that they simply apply a common orthogonal transformation to each vector at each time. This is clearly continuous and -invariant, but obviously does not lead to polarization except in very pathological examples.
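The trivial example is easy to verify: a common orthogonal transformation preserves all pairwise inner products, so the angles between agents never change. The sketch below uses a fixed random orthogonal matrix obtained from a QR factorization; the dimensions and number of steps are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 4
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # rows are unit opinions

# The Q factor of a QR factorization of a square matrix is orthogonal.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))

gram_start = X @ X.T                               # pairwise inner products
Y = X.copy()
for _ in range(50):
    Y = Y @ U.T                                    # apply U to every opinion
gram_end = Y @ Y.T

print(np.max(np.abs(gram_start - gram_end)))
```

Since the Gram matrix is unchanged up to rounding error, the configuration polarizes only if it was already polarized at the start.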

While these trivial examples show that strong polarization does not necessarily hold in such models, it is natural to wonder if it is possible for other dynamics to satisfy other forms of polarization even if they do not satisfy strong polarization. In this section, we consider weaker forms of polarization than the strong form from Definition 2.1 (restated as part (1) in the definition below):

Let