1. Introduction
With the rise of social networks like Twitter and Facebook, people increasingly receive news through nontraditional sources. One recent study shows that two-thirds of American adults have gotten news through social media (Shearer and Gottfried, 2017). Such news sources are fundamentally different from traditional ones like print media and television, in the sense that social media users read and discuss news on the same platform. As a consequence, users turning to these platforms for news receive information not only from major publications but from other users as well; in the words of Allcott and Gentzkow (2017), a user “with no track record or reputation can in some cases reach as many readers as Fox News, CNN, or the New York Times.” This phenomenon famously reared its head during the 2016 United States presidential election, when fake news stories were shared tens of millions of times (Allcott and Gentzkow, 2017).
In this paper, we study a mathematical model describing this situation. The model includes a large number of agents attempting to learn an underlying true state of the world (e.g. which of two candidates is better suited for office) using information from three sources. First, each agent receives noisy observations of the true state, modeling e.g. news stories from major publications. Second, each agent observes the opinions of a subset of other agents, modeling e.g. discussions with other social media users. Third, each agent may observe the opinions of stubborn agents or bots who aim to persuade others of an erroneous true state, modeling e.g. users spreading fake news. (Footnote 1: The term stubborn agents has been used in the social learning and consensus literature to describe such agents; the term bots is used in reference to automated social media accounts spreading fake news while masquerading as real users (Shao et al., 2017).) Based on this information, agents iteratively update their beliefs about the true state in a manner similar to the non-Bayesian social learning model of Jadbabaie et al. (2012). This iterative process continues for a finite number of iterations that we refer to as the learning horizon.
Under this model, two competing forces emerge as the learning horizon grows. On the one hand, agents receive more observations of the true state, suggesting that they become more likely to learn. On the other hand, the influence of bots gradually propagates through the system, suggesting that agents become increasingly susceptible to this influence and thus less likely to learn. Hence, while the horizon clearly affects the learning outcome, the nature of this effect – namely, whether learning becomes more or less likely as the horizon grows – is less clear.
This effect of the learning horizon has often been ignored in works with models similar to ours. For example, our model is nearly identical to that in the empirical work (Azzimonti and Fernandes, 2018), in which the authors show that polarized beliefs can arise when there are two types of bots with diametrically opposed viewpoints. However, the experiments in (Azzimonti and Fernandes, 2018) simply fix a large learning horizon and do not consider the effect of varying it. Related models have also been treated analytically; for example, (Jadbabaie et al., 2012; Golub and Jackson, 2010; Acemoglu et al., 2010) study non-Bayesian learning models similar to ours. However, these works consider a fixed number of agents and an infinite learning horizon and thus also ignore timescale effects.
The main message of this work is that the learning horizon plays a prominent role in the learning outcome and therefore should not be ignored. In particular, we show that the learning outcome depends on the relationship between the horizon and a quantity that describes the number of bots in the system, where both quantities may depend on the number of agents . Mathematically, letting denote the true state and denote the belief about the true state for a uniformly random agent at the horizon , we show (see Theorem 3.1)
(1) 
Here is smaller when more bots are present and 0 is the erroneous true state promoted by the bots. Hence, in words, (1) says the following: if there are sufficiently few bots, in the sense that , learns the true state; if there are sufficiently many bots, in the sense that , adopts the extreme belief 0 promoted by the bots. An interesting consequence arises from taking and , so that
(2) 
In words, a typical agent initially (at time ) learns the true state , then later (at time ) “forgets” the true state and adopts the extreme opinion ! Hence, as illustrated by this example, the chosen learning horizon can lead to drastically different outcomes.
In addition to (1), we prove two other results. First, we consider a regime that falls between the two in (1); namely, the case (note larger implies more bots). In this case, the learning outcome depends on another parameter, which we denote by and which dictates the weight agents place on other agents’ opinions in their belief updates. Here we show
(3) 
The limit in (3) is depicted graphically as a function of in Figure 1, which offers an intuitive interpretation: if an adversary deploys bots in hopes of driving agent opinions to 0, the marginal benefit of deploying additional bots is smaller when is larger (i.e. when more bots have been deployed). In other words, the adversary experiences “diminishing returns”. It is also interesting to fix and consider the limit in (3) as a function of . As , agents place less weight on others’ opinions, and this limit tends to ; in other words, when agents ignore the network (and thus the harmful effects of bots), they learn. On the other hand, as , agents have increased exposure to bots and the limit tends to (interestingly, there is a discontinuity here, i.e. the limiting belief does not fully reach zero as ). Finally, in addition to (1) and (3), we show in Theorem 3.2 that all but agents adopt opinion 0 in a certain subcase of , for some . Hence, Theorem 3.2 is stronger than Theorem 3.1 and applies to fewer cases; we also argue empirically that this stronger result likely fails in other cases.
Before proceeding, we note our particular choice of guarantees that at the learning horizon, an agent’s belief is only influenced by a vanishing fraction of other agents and bots; specifically, those within the agent’s neighborhood in the graph connecting agents and bots (see Section 3.4). This is why the title of our work refers to the learning as “local”. More specifically, our choice of is asymptotically dominated by the mixing time of the random walk on this graph. From an analysis perspective, this means we cannot leverage global properties like the stationary distribution of this walk, in contrast to many works on social learning (see Section 4). In fact, as shown in (Bordenave et al., 2018), this random walk exhibits cutoff, meaning that at our learning horizon, the distribution of this walk can be maximally far from stationarity. Hence, we cannot even use an approximation of the stationary distribution. Instead, we assume the graph is randomly generated in a manner that guarantees a well-behaved local structure; we then show that analyzing beliefs amounts to analyzing hitting probabilities of random walks on this random graph. Fundamentally, it is from three regimes of these hitting probabilities that the three regimes in (1) and (3) arise (see Section 3.2).
The remainder of the paper is organized as follows. In Section 2, we define the model studied throughout the paper. We then state and discuss our theoretical results and provide some empirical results in Section 3. Finally, we discuss related work in Section 4 and conclude in Section 5.
Notational conventions: Most notation is standard or defined as needed, but we note here that the following conventions are used frequently. For , we let , and for we let . All vectors are treated as row vectors. We let denote the vector with 1 in the th position and 0 elsewhere. We denote the set of nonnegative integers by . We use for the indicator function, i.e. if is true and 0 otherwise. All random variables are defined on a common probability space , with denoting expectation, denoting convergence in probability, and meaning almost surely.
2. Model
2.1. Learning model
We begin by defining the model of social learning studied throughout the paper. The main ingredients are (1) a true state of the world, represented as a scalar, (2) a social network connecting two sets of nodes, some who aim to learn the true state and some who wish to persuade others of an erroneous true state, and (3) a learning horizon. We discuss each of these ingredients in turn.
The true state of the world is a constant . For example, in an election between candidates representing two political parties (say, Party 1 and Party 2), can be interpreted as the Party 1 candidate being far superior, means the Party 2 candidate is far superior, and implies the candidates are roughly equal. We emphasize that is a deterministic constant and does not depend on time, nor on the number of nodes in the system (both of which will vary).
A directed graph connects disjoint sets of nodes and (details regarding the graph structure are discussed in Section 2.2). We refer to elements of as regular agents, or simply agents, and elements of as stubborn agents or bots. While agents attempt to learn the true state , bots aim to disrupt this learning and convince agents that the true state is instead 0. In the election example, agents represent voters who study the two candidates to learn which is superior, while bots are loyal to Party 1 and aim to convince agents that the corresponding candidate is superior (despite possible evidence to the contrary). Edges in the graph represent connections in a social network over which nodes share opinions in a manner that will be described shortly. An edge from node to node , denoted , will be interpreted to mean that influences ’s opinion.
Agents and bots share opinions until a learning horizon . We will allow the horizon to depend on the number of agents and will thus denote it by at times. In the election example, represents the duration of the election season, i.e. the number of time units that agents can learn about the candidates and that bots can attempt to convince agents of the superiority of the Party 1 candidate. Here will be finite for each finite , and we will let tend to infinity with . In particular, we will choose such that an agent’s opinion at time only depends on the opinions of a vanishing fraction of all agents and bots; namely, those within the agent’s step incoming neighborhood in the social network (see Section 3.4 for details).
It remains to specify how agents attempt to learn and how bots aim to disrupt this learning. We begin with the agents. Initially, believes the state to be , where and for some that do not depend on (if , we let by convention). We refer to as the prior parameters and will not specify them beyond assuming they lie in the aforementioned intervals. (Footnote 2: Appendix A.1 shows the effect of the prior parameters vanishes when with , so specifying them is unnecessary.) In our running example, the initial belief can encode ’s past opinions regarding the political parties, e.g. means historically prefers Party 1 and is predisposed towards the corresponding candidate before the election season begins. At each time , receives a noisy observation of the true state (e.g. reads a news story regarding the candidates) and modifies its opinion based on this observation and on the opinions of its incoming neighbors in (e.g. discusses the election with its social connections). Mathematically, updates its belief as , where
(4)  
(5) 
Here is the noisy observation of the true state, is ’s incoming neighbor set in , , and is a constant (independent of agent and time ). We note that, as grows, the effect of the network becomes stronger (i.e. the opinions of agent ’s neighbors have a stronger effect on ’s own opinion); this will be reflected in our results. Also, as discussed in Section 2.2, we will assume , so (4) is well-defined.
Before discussing the bots, we comment further on the belief update (4). First, assuming temporarily, we simply have , which is an unbiased estimate of the true state . Next, if we drop the assumption (but still assume ), is no longer an unbiased estimate. Instead, we can view as the mean of a beta distribution with parameters ; in this case, (4) is simply a Bayesian update of the prior distribution with a signal. Finally, dropping the assumption to obtain the model we actually consider, (4) is no longer a Bayesian update, as alluded to by the title of our work. This non-Bayesian model is closely related to others in the literature; see Section 4.
Having specified the behavior of agents, we turn to the bots. For , we simply set
(6) 
Hence, the opinion of is , e.g. bots believe the candidate from Party 1 is far superior. To explain the precise form of (6), consider a system composed of only agents (i.e. ). Since , it is easy to show via (4) that and . Hence, not only are bots biased towards state 0, but their bias is maximal, in the sense that their parameters are as extreme as an agent’s can be.
Note that we can define the bot behavior in an alternative way that will be more convenient for our analysis. Specifically, for , we can set (i.e. has a self-loop and no other incoming edges in ), , , and . Then, assuming updates its parameters via (4), it is straightforward to show (6) holds. This alternative definition will be used for the remainder of the paper. Finally, since all bots have the same behavior, we assume (without loss of generality) that the outgoing neighbor set of is for some , i.e. in addition to its self-loop, each bot has a single outgoing neighbor from the agent set.
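The learning dynamics of this section can be sketched in code. The displayed updates (4)-(6) are not reproduced in this text, so the specific rule below is an assumption chosen to match the surrounding description: Bernoulli signals whose mean is the true state, beta-style parameters, a constant weight on the average of incoming neighbors' parameters, and beliefs given by the ratio of the first parameter to the sum of both. All names (`step`, `belief`, `simulate`, `in_nbrs`, `eta`, `theta`) are hypothetical, as is the convention used when both parameters are zero.

```python
import random

def step(alpha, beta, in_nbrs, signals, eta):
    """One synchronous update of every node's parameters (assumed form)."""
    new_alpha, new_beta = {}, {}
    for i, nbrs in in_nbrs.items():
        # Average incoming neighbors' parameters; repeated entries in nbrs
        # act as multi-edges and are weighted accordingly.
        avg_a = sum(alpha[j] for j in nbrs) / len(nbrs)
        avg_b = sum(beta[j] for j in nbrs) / len(nbrs)
        new_alpha[i] = (1 - eta) * (alpha[i] + signals[i]) + eta * avg_a
        new_beta[i] = (1 - eta) * (beta[i] + 1 - signals[i]) + eta * avg_b
    return new_alpha, new_beta

def belief(alpha, beta, i):
    total = alpha[i] + beta[i]
    # The value returned for 0/0 is an assumed convention.
    return alpha[i] / total if total > 0 else 0.0

def simulate(in_nbrs, bots, theta, eta, horizon, seed=0):
    """Run the update for `horizon` steps and return agent beliefs.

    Bots follow the alternative definition: a self-loop as their only
    incoming edge, zero prior parameters, and signals that are always 0.
    """
    rng = random.Random(seed)
    alpha = {i: 0.0 for i in in_nbrs}
    beta = {i: 0.0 for i in in_nbrs}
    for _ in range(horizon):
        signals = {
            i: 0 if i in bots else int(rng.random() < theta)
            for i in in_nbrs
        }
        alpha, beta = step(alpha, beta, in_nbrs, signals, eta)
    return {i: belief(alpha, beta, i) for i in in_nbrs if i not in bots}
```

For instance, `simulate({0: [2], 2: [2]}, {2}, theta=1.0, eta=0.5, horizon=50)` models a single agent whose only incoming neighbor is a bot; under these assumptions the agent's belief drifts toward 0 as the horizon grows, even though every signal it receives equals 1.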
2.2. Graph model
Having defined our learning model, we next specify how the social network is constructed. For this, we use a modification of a well-studied random graph model called the directed configuration model (DCM) (Chen and Olvera-Cravioto, 2013). The DCM is a means of constructing a graph with a specified degree sequence; our modification is needed to account for the distinct node types at hand (agents and bots).
To begin, we realize a random sequence called the degree sequence from some distribution; here we let . In the construction described next, will have outgoing neighbors ( will influence other agents), incoming neighbors from the ( will be influenced by agents), and incoming neighbors from ( will be influenced by bots). Here the total indegree of is (as used in (6)). We assume
(7) 
In words, the first condition says influences and is influenced by at least one agent, and may be influenced by one or more bots. The second condition says sum outdegree must equal sum indegree in the agent subgraph; this will be necessary to construct a graph with the given degrees.
After realizing the degree sequence, we begin the graph construction. (Footnote 3: This construction is presented more formally as Algorithm 1 in Appendix A.1.) First, we attach outgoing half-edges, incoming half-edges labeled , and incoming half-edges labeled , to each ; we will refer to these half-edges as outstubs, instubs, and instubs, respectively. We let denote the set of these outstubs. We then pair each outstub in with an instub to form edges between agents in an iterative, breadth-first-search fashion that proceeds as follows:

Sample from uniformly. For each of the instubs attached to , sample an outstub uniformly from (resampling if the sampled outstub has already been paired), and connect the instub and outstub to form an edge from some agent to .

Let . For each , pair the instubs attached to in the same manner the instubs of were paired.

Continue iteratively until all instubs have been paired. In particular, during the th iteration, we pair all instubs attached to , the set of agents at distance from (by distance , we mean a path of length exists, but no shorter path exists).
At the conclusion of this procedure, we obtain a graph with edges between agents, along with unpaired instubs attached to some agents. It remains to attach these instubs to bots. For this, we define to be the set of bots (hence, the node set is ). To each we add a single self-loop and a single unpaired outstub (as described at the end of Section 2.1). This yields unpaired outstubs attached to bots. Finally, we pair these outstubs arbitrarily with the unpaired instubs from above to form edges from bots to agents (note the exact pairing can be arbitrary since all bots behave the same, per Section 2.1).
Before proceeding, we note that the pairing of instubs with outstubs from did not prohibit us from forming agent self-loops (i.e. edges for ), nor did it prohibit multiple edges from to . This second observation means the set of edges formed will in general be a multiset. For this reason, we redefine the parameter update equations (4) as
(8)  
(9) 
i.e. we weigh the opinions of ’s incoming neighbors proportional to the number of edges pointing to . We also note that, instead of attaching bots to instubs after pairing all instubs as described above, we can pair instubs iteratively along with the pairing of instubs. Finally, we note that in the case , the construction described above reduces to the standard DCM.
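The construction of this section can be sketched as follows. This is a simplified illustration rather than the paper's Algorithm 1: it pairs agent instubs with outstubs through a single uniform shuffle instead of the breadth-first order with resampling, and it gives each bot-labeled instub its own bot with a self-loop. The function name `build_graph`, its argument names, and the integer node labels are all hypothetical.

```python
import random

def build_graph(d_out, d_in_agents, d_in_bots, seed=0):
    """Return (in_nbrs, bots) for a configuration-model-style multigraph.

    in_nbrs maps each node to a list of incoming neighbors; repeated
    entries represent multiple edges, and self-loops are allowed.
    Agents are labeled 0..n-1; bots get labels n, n+1, ...
    """
    n = len(d_out)
    if sum(d_out) != sum(d_in_agents):
        raise ValueError("total out-degree must equal total agent in-degree")
    rng = random.Random(seed)

    # One list entry per outstub, labeled by the agent it is attached to.
    outstubs = [i for i in range(n) for _ in range(d_out[i])]
    rng.shuffle(outstubs)  # uniform pairing (simplification of the BFS order)

    in_nbrs = {i: [] for i in range(n)}
    pos = 0
    for i in range(n):
        for _ in range(d_in_agents[i]):
            in_nbrs[i].append(outstubs[pos])
            pos += 1

    # Attach one bot to each bot-labeled instub; each bot has a self-loop
    # as its only incoming edge and a single outgoing edge to an agent.
    bot = n
    for i in range(n):
        for _ in range(d_in_bots[i]):
            in_nbrs[bot] = [bot]
            in_nbrs[i].append(bot)
            bot += 1
    return in_nbrs, set(range(n, bot))
```

When every entry of `d_in_bots` is zero, no bots are created and the sketch reduces to a plain uniform pairing of outstubs with instubs.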
3. Results
Having defined our model, we now turn to our results. We begin by defining the required assumptions in Section 3.1. We then state and discuss two theorems, one each in Sections 3.2 and 3.3. Finally, in Section 3.4, we return to comment on our assumptions.
3.1. Assumptions
To define the assumptions needed to prove our results, we require some notation. First, from the given degree sequence , we define
(10)  
(11) 
Here and are the degree distributions for an agent sampled uniformly and for an agent sampled proportional to outdegree, respectively. Note that, since the first agent added to the graph is sampled uniformly from , the degrees of are distributed as . Furthermore, recall that, to pair instubs, we sample outstubs uniformly from , resampling if the sampled outstub is already paired. It follows that, each time we add a new agent to the graph (besides ), its degrees are distributed as . We also note that, because the degree sequence is random, these distributions are random as well. Using these random distributions, we also define the random variables
(12)  
(13) 
Following the discussion above, is the expected value (conditioned on the degree sequence) of the ratio of instubs to total instubs for ; is the expected value of this same ratio, but for new agents added to the graph (besides ). The interpretation of is similar, i.e. the expected ratio of instubs to the square of total instubs for new agents (besides ). At the end of Section 3.2, we discuss in more detail why these random variables arise in our analysis.
We now define the following four assumptions, which are needed to establish our results. Two of these statements require the degree sequence to be well-behaved (with high probability) – specifically, 1 requires certain moments of the degree sequence to be finite, while 3 requires to be close to a deterministic sequence . The other two statements, 2 and 4, impose maximum and minimum rates of growth for the learning horizon . In particular, must be finite for each finite by 2 and then grow to infinity with by 4, as mentioned in Section 2.1. We defer further discussion of these assumptions to Section 3.4.

, where, for some independent of with ,
(14) 
and independent of s.t. .

, where, for some s.t. , some , and some independent of ,
(15) 
.
3.2. First result
We can now present our first result, Theorem 3.1. The theorem states that the belief at time of a uniformly random agent converges in probability as . Interestingly, the limit depends on the relative asymptotics of the time horizon and the quantity defined in 3. For example, this limit is when ; note that requires to quickly approach 1 (since by 4), which by 3 and (12) suggests the number of bots is small. Hence, learns the true state when there are sufficiently few bots. (The other cases can be interpreted similarly.)
Before discussing the proof of the theorem, we make several observations:

Suppose is fixed and consider varying . To be concrete, let and define and (note satisfy 2, 4). Then and , so by Theorem 3.1, the belief of converges to at time and to 0 at time . In words, initially (at time ) learns the state of the world, then later (at time ) forgets it and adopts the bot opinions!

Alternatively, suppose is fixed and consider varying . For example, let for some . Here smaller implies fewer bots, and Theorem 3.1 says the limiting belief of is a decreasing convex function of (see Figure 1). One interpretation is that, if an adversary deploys bots in hopes of driving agent beliefs to 0, the marginal benefit of deploying additional bots is smaller when is larger, i.e. the adversary experiences “diminishing returns”. It is also worth noting that, since as and as , the limiting belief of is continuous as a function of .

If , consider the limiting belief of as a function of . By Theorem 3.1, this belief tends to as and tends to as . This is expected from (8): when , agents ignore the network (and thus avoid exposure to biased bot opinions) and form opinions based only on unbiased signals; when , the opposite is true. Interestingly, though, there is an asymmetry here: when , the belief approaches the case, but when , it does not approach the case (since ).
The proof of Theorem 3.1 is lengthy; for readability, we outline it in Appendix A and defer computational details to Appendix B. However, we next present a short argument to illustrate the fundamental reason why the three cases of the limiting belief arise in Theorem 3.1. (As a disclaimer, this argument is not entirely precise; we refer the reader to the appendices for a rigorous proof.)
At a high level, these three cases arise as follows. First, when , the “density” of bots within the step incoming neighborhood of is small. As a consequence, is not exposed to the biased opinions of bots by time and is able to learn the true state (i.e. in ). On the other hand, when , this “density” is large; is exposed to bot opinions and thus adopts them (i.e. in ). Finally, when , the “density” is moderate; does not fully learn, nor does fully adopt bot opinions (i.e. in ).
The explanation of the previous paragraph is not at all surprising; what is more subtle is precisely what the “density” of bots within the step incoming neighborhood of means. It turns out that the relevant quantity (and what we mean by this “density”) is the probability that a random walker exploring this neighborhood reaches the set of bots.
To illustrate this, we consider a random walk that begins at and, for , chooses uniformly from all incoming neighbors of (agents and bots); note here that the walk follows edges in the direction opposite their polarity in the graph. For this walk, it is easy to see that, conditioned on the event , the event occurs with probability
(17) 
Importantly, we can sample this walk and construct the graph simultaneously, by choosing which instub of to follow before actually pairing these instubs. Assuming they are later paired with uniform agent outstubs, and hence connected to agents chosen proportional to outdegree, we can average (17) over the outdegree distribution to obtain that occurs with probability
(18) 
Now since bots have a self-loop and no other incoming edges, they are absorbing states on this walk. It follows that if and only if ; by the argument above, this latter event occurs with probability . Since by 3, we thus obtain that with probability
(19) 
From this final expression, the three regimes of Theorem 3.1 emerge: when , the random walker remains in the agent set with probability ; this corresponds to avoiding exposure to bot opinions and learning the true state. Similarly, means the walker is absorbed into the bot set with probability , corresponding to adopting bot opinions. Finally, means the walker stays in the agent set with probability , corresponding to not fully learning nor fully adopting bot opinions.
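The three regimes can be seen numerically. The sketch below assumes, following the heuristic argument above, that the walker stays in the agent set for `T` consecutive steps with probability `p ** T`, where `p` is the per-step probability of moving to an agent. The names `p`, `T`, and `c`, and the particular scalings of `1 - p` with `T`, are illustrative assumptions, since the corresponding expressions are not reproduced in this text.

```python
import math

def stay_probability(p, T):
    """Probability of T independent steps that each stay among agents w.p. p."""
    return p ** T

def regime_examples(T, c=2.0):
    """Evaluate p ** T under three assumed scalings of 1 - p with T."""
    few = stay_probability(1 - 1 / T**2, T)            # T * (1 - p) = 1/T -> 0
    moderate = stay_probability(1 - c / T, T)          # T * (1 - p) = c
    many = stay_probability(1 - 1 / math.sqrt(T), T)   # T * (1 - p) = sqrt(T) -> infinity
    return few, moderate, many

for T in (10, 100, 10_000):
    print(T, regime_examples(T))
```

As `T` grows, the first value approaches 1 (the walker avoids bots), the third approaches 0 (the walker is absorbed by a bot), and the second approaches `exp(-c)`, an intermediate value.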
We note that the actual proof of Theorem 3.1 does not precisely follow the foregoing argument. Instead, we locally approximate the graph construction with a certain branching process; we then study random walks on the tree resulting from this branching process. (Footnote 4: This is necessary because the argument leading to (18) assumes instubs are paired with uniform outstubs, which is not true if resampling of outstubs occurs in the construction from Section 2.2.) However, the foregoing argument illustrates the basic reason why the three distinct cases of Theorem 3.1 arise.
Finally, we observe that the argument leading to (18) shows why enters into our analysis. The other random variables defined in (12) enter similarly. Specifically, arises in almost the same manner, but pertains only to the first step of the walk; this distinction arises since the walk starts at , the degrees of which relate to . On the other hand, arises when we analyze the variance of agent beliefs. This is because analyzing the variance involves studying two random walks; by an argument similar to (18), the probability of both walks visiting the same agent is
(20) 
3.3. Second result
While Theorem 3.1 establishes convergence for the belief of a typical agent, a natural question to ask is how many agents have convergent beliefs. Our second result, Theorem 3.2, provides a partial answer to this question. To prove the result, we require slightly stronger assumptions than those required for Theorem 3.1 (we will return shortly to comment on why these are needed). First, we strengthen 1 and 3 to include particular rates of convergence for the probabilities . Second, we strengthen 4 with a minimum rate at which (specifically, ). Third, and perhaps most restrictively, we require in 1. As a result, Theorem 3.2 only applies to the case , for which Theorem 3.1 states the belief of a uniform agent converges to zero. In this setting, Theorem 3.2 provides an upper bound on how many agents’ beliefs do not converge to zero. In particular, this bound is for some .
Theorem 3.2 ().
We reiterate that by 2 and by the theorem statement. Hence, , so one can choose in Theorem 3.2 to show that the size of the nonconvergent set of agents vanishes relative to . We suspect that such a result is the best one could hope for; in particular, we suspect that showing all agent beliefs converge to zero is impossible. This is in part because our assumptions do not preclude the graph from being disconnected. Hence, there may be small connected components composed of agents but no bots; in such components, agent beliefs will converge to (not zero). Additionally, while the lower bound for in Theorem 3.2 is somewhat unwieldy, certain terms are easily interpretable: the bound sharpens as grows (i.e. as agents place less weight on their unbiased signals), as decays (i.e. as the number of bots grows), and as decays (i.e. as signals are more likely to be zero, pushing beliefs to zero).
As for Theorem 3.1, the proof of Theorem 3.2 is outlined in Appendix A with details provided in Appendix B. The crux of the proof involves obtaining a sufficiently fast rate for the convergence in Theorem 3.1; namely, we show that for some , . (Footnote 5: One may wonder why we derive a separate bound for Theorem 3.2, since we have already bounded to prove Theorem 3.1. The reason for this is that the bound for Theorem 3.1 does not decay quickly enough as to prove Theorem 3.2; on the other hand, the bound for Theorem 3.2 does not decay at all as for the case and therefore cannot be used for all cases of Theorem 3.1. See Appendix A.4.2 for details.) At a high level, obtaining such a bound requires bounding three probabilities by , which also helps explain the stronger assumptions of Theorem 3.2:

To analyze the belief process on a tree, we first condition on the random tree structure and treat the belief as a weighted sum of i.i.d. signals using an approach similar to Hoeffding’s inequality. Namely, we obtain the Hoeffding-like tail ; strengthening 4 with is necessary to show this tail is .

Finally, after conditioning on the tree structure, we show this structure is close to its mean. More specifically, letting denote the expected belief for the root node in the tree conditioned on the random tree structure (see Appendix A for details), we show
(22) 
Note the only source of randomness in is the random tree; because this tree is recursively generated, it has a martingale-like structure that can be analyzed using an approach similar to the Azuma-Hoeffding inequality for bounded-difference martingales. Here we require to ensure the degree sequence is ill-behaved with probability at most ; we also require in this step (and only in this step).
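For reference, the classical Hoeffding inequality that the first step adapts can be checked numerically for a weighted sum of i.i.d. Bernoulli signals. The sketch below uses the textbook two-sided bound for independent bounded summands, not the paper's Hoeffding-like tail (which is not reproduced in this text); the weights, the signal mean `theta`, and the function names are hypothetical.

```python
import math
import random

def hoeffding_bound(weights, eps):
    """Textbook two-sided Hoeffding bound for sum_k w_k * s_k with s_k in {0, 1}.

    Each summand lies in an interval of length |w_k|, so
    P(|S - E[S]| >= eps) <= 2 * exp(-2 * eps**2 / sum_k w_k**2).
    """
    return min(1.0, 2 * math.exp(-2 * eps**2 / sum(w * w for w in weights)))

def empirical_tail(weights, theta, eps, trials=20000, seed=0):
    """Monte Carlo estimate of P(|S - E[S]| >= eps) with Bernoulli(theta) signals."""
    rng = random.Random(seed)
    mean = theta * sum(weights)
    hits = 0
    for _ in range(trials):
        # Each weight contributes only when its signal equals 1.
        s = sum(w for w in weights if rng.random() < theta)
        hits += abs(s - mean) >= eps
    return hits / trials

weights = [1 / 50] * 50  # equal weights, standing in for a fixed tree structure
print(hoeffding_bound(weights, 0.2), empirical_tail(weights, 0.5, 0.2))
```

The bound depends on the weights only through their sum of squares, so it tightens as the weight is spread over more signals.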
We now address the most notable difference between Theorems 3.1 and 3.2; namely, that the latter only applies when . We believe this reflects a fundamental distinction between the cases and and is not an artifact of our analysis. An intuitive reason for this is that more bots are present in the former case, so fewer random signals are present (recall we model bot signals as being deterministically zero). As a result, is “less random”, so its concentration around its mean is stronger. Towards a more rigorous explanation, we first note that Appendix A.4.1 provides the following condition for extending Theorem 3.2 to other cases of :
(23) 
where is the limit from Theorem 3.1 based on the relative asymptotics of and , i.e.
(24) 
It is the convergence of in (23) that we suspect is fundamentally different in the cases and . To illustrate this, we provide empirical results in Figure 2. In the leftmost plot, we show versus ; here the plot is on a log-log scale, so a line with slope means . Hence, we are comparing four cases: , so that (blue circles); , so that and (orange squares); , so that (yellow diamonds); and , so that (purple triangles). The second plot reflects the corresponding cases of : decays to zero in the first two cases, grows towards in the fourth case, and approaches an intermediate limit in the third case. The final two plots illustrate the convergence (or lack thereof) in (23). Here the empirical mean of the error term decays quickly for the first case but decays more slowly (or is even nonmonotonic) in the other cases. More strikingly, the empirical variance of this error term is several orders of magnitude smaller in the first case. This suggests that decays much more rapidly in the case , which is why we believe this is the only case for which (23) is satisfied. (We point the reader to Appendix C for further details on this experiment.)
3.4. Comments on assumptions
We now return to comment on the assumptions needed to prove our results. First, 1 states that certain empirical moments of the degree distribution – namely, for uniformly, the first two moments of and the correlation between and – converge to finite limits. Roughly speaking, this says our graph lies in a sparse regime, where typical node degrees do not grow with the number of nodes. (Footnote 6: This is analogous to an Erdős-Rényi model with edge formation probability for some independent of , in which degrees converge in distribution to random variables that have finite mean and variance.) We also note in 1 is minor and simply eliminates an uninteresting case. To see this, first note that when holds, we have (roughly)
(25) 
where we have used the assumed inequality . Hence,