1 Introduction
During the last decades, considerable research effort has been devoted to modeling the underlying process of opinion formation among agents that interact through a social network. In this respect, DeGroot's [1] and Friedkin-Johnsen's [2] models are classic references. Another classic model is the voter model [3, 4], in which each agent holds a binary opinion, 0 or 1, and at each time step each agent chooses one of its neighbors at random and adopts that neighbor's opinion as its own. Other works based on the voter model incorporate stubborn agents [5] and biased agents [6]. Moreover, in the last few years there has been a growing literature on opinion manipulation in social networks [7, 8, 9].
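The voter-model update described above can be sketched in a few lines of Python. This is a minimal illustration, assuming synchronous updates (all agents copy simultaneously from the opinions of the previous step) and neighbor lists without self-loops; the function name `voter_step` is hypothetical and not taken from the paper.

```python
import random


def voter_step(opinions, neighbors, rng=random):
    """One synchronous voter-model update (illustrative sketch).

    opinions:  list of 0/1 opinions, opinions[i] is the opinion of agent i
    neighbors: neighbors[i] is a non-empty list of the neighbors of agent i
    rng:       anything with a choice() method, e.g. random.Random(seed)
    Every agent adopts the opinion of one neighbor chosen uniformly at random,
    reading the opinions of the previous step.
    """
    return [opinions[rng.choice(neighbors[i])] for i in range(len(opinions))]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4  # hypothetical example: complete graph on 4 agents, no self-loops
    nbrs = [[j for j in range(n) if j != i] for i in range(n)]
    x = [1, 0, 0, 1]
    for t in range(1, 6):
        x = voter_step(x, nbrs, rng)
        print(t, x)
```

Passing a seeded `random.Random` instance makes runs reproducible; consensus configurations (all 0 or all 1) are left unchanged by this update.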
In this work, we are interested in finding the most efficient use of a budget over time in order to manipulate a social network. The idea is to promote an opinion by paying agents to supplant their true opinions. We model opinions as two values, 0 or 1, with 1 (0) representing a supportive (nonsupportive) opinion.
We frame the problem of designing sequential payment strategies as a discounted Markov decision process (DMDP). DMDPs have been widely used to formulate many decision-making problems in science and engineering (see, e.g., [10, 11, 12]). One of the main applications of DMDP models is the computation of optimal decisions (i.e., actions) over time to maximize the expected reward (analogously, to minimize the expected cost).
First, we focus on a fully connected network where agents change their opinions following a voter model. We provide the corresponding Bellman equations for this problem and show through an example how to solve it in practice. We provide a structural characterization of the associated value function and the optimal payment strategy. Then, we compute the optimal payment strategy using backward dynamic programming.
2 Model definition
In order to find the optimal budget allocation for binary opinion dynamics, we make extensive use of the theory of DMDPs. First, we adopt the voter model of opinion formation over a social network. Then, we define the DMDP and the corresponding Bellman equations to obtain the optimal budget allocation strategy.
Consider an undirected social network $G = (V, E)$, where $V$ stands for the set of agents, indexed from $1$ to $N$, and $E$ is the set of edges. Each agent has a binary initial opinion. Opinions take values 0 or 1. If agent $i$ has opinion 0 (analogously, 1) we label it as nonsupporter (supporter). For example, if agents are discussing politics, 1 could mean supporting a particular party and 0 not supporting it.
Moreover, we distinguish two cases, depending on whether the network is fully connected or not.
We start by studying the fully connected case. In each decision epoch, agents update their opinions following a voter model sensitive to external payments. Let $\beta \in (0, 1)$ be the discount factor, which represents the loss of reward in the future with respect to the current reward. A discounted Markov decision process (DMDP) is a 5-tuple $(T, S, A, P, R)$, where $T$ is the set of decision epochs, $S$ is a finite set of states, $A$ is a finite set of actions, $P$ is the set of transition probabilities and $R$ is the set of rewards. The model is thus defined as follows:

Decision epochs: the set of decision epochs is defined as $T = \{0, 1, \dots, \tau\}$. We consider a finite discrete-time system. At each decision epoch $t \in T$ we observe the state $s \in S$ and choose an action $a \in A_s$.

States: the state space of the DMDP consists of the possible numbers of supporters, $S = \{0, 1, \dots, N\}$, with $s \in S$.

Actions: $A_s$ is the set of actions available in state $s$ (without loss of generality, actions are state independent). We consider that the actions are the possible numbers of payments, $a = (k, l)$, where $k$ ($l$) is the number of nonsupporters (supporters) that receive a payment. Nonsupporter agents have a cost $c_{NS}$ for changing their opinion from nonsupporter to supporter, and supporter agents have a cost $c_S$ for holding their supporter opinion. We assume that the cost of changing an opinion is higher than the cost of holding it, i.e., $c_{NS} > c_S$. We consider a finite budget $B$. Notice that, because the actions are constrained by the budget, they are stationary.

Transition probabilities: if the DMDP in decision epoch $t$ is at state $s$, the probability that it transitions to state $s'$ taking action $a$ is denoted $p_t(s' \mid s, a)$. Due to the natural independence of the agents' transitions, we compute those probabilities as the product of the transition probabilities of the individual agents. The evolution of one agent is described by the voter model. Starting from arbitrary initial labels, supporter (S) or nonsupporter (NS), we consider two labeling functions $S_t$ and $NS_t$, where $S_t(i) = 1$ ($NS_t(i) = 1$) means that agent $i$ is a supporter (nonsupporter). At each decision epoch $t$, each node selects one of its neighbors uniformly at random and adopts that neighbor's opinion. For each node $i$, the set of its neighbors is defined as $\mathcal{N}_i = \{ j \in V : (i, j) \in E \}$. Therefore, for an agent $i$ with zero payment in decision epoch $t$, we define the labeling functions
$$\Pr\big(S_{t+1}(i) = 1\big) = \frac{1}{|\mathcal{N}_i|} \sum_{j \in \mathcal{N}_i} S_t(j), \qquad NS_{t+1}(i) = 1 - S_{t+1}(i).$$
Analogously, for an agent that receives a payment in decision epoch $t$ we define:
As stated above, we assume that the graph is fully connected; therefore, each agent can communicate with every other agent. We denote the set of nonsupporter agents that receive a payment in epoch $t$ as $P_t^{NS}$ and its cardinality as $k = |P_t^{NS}|$. Respectively, we denote the set of supporter agents that receive a payment as $P_t^{S}$ and its cardinality as $l = |P_t^{S}|$. Notice that $k \le N - s$ and $l \le s$.
Therefore, the transition probabilities can be computed as:





Reward: the instant reward at time $t$ in state $s$ is defined as $r_t(s) = r\, s$, where $r$ denotes the reward provided by one agent.
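The Python sketch below shows one way to compute the transition row $p(s' \mid s, (k, l))$ on the complete graph, together with the reward $r\,s$. It rests on assumptions that are ours, not the paper's: (i) the $k + l$ paid agents are supporters at the next epoch, (ii) each unpaid agent copies one of the other $N - 1$ agents chosen uniformly at random, observing the opinions of the current epoch, and (iii) $N \ge 2$. Under these assumptions, the number of supporters at $t + 1$ is $k + l + X + Y$ with independent $X \sim \mathrm{Bin}\big(s - l, \tfrac{s-1}{N-1}\big)$ and $Y \sim \mathrm{Bin}\big(N - s - k, \tfrac{s}{N-1}\big)$. The function names are hypothetical.

```python
from math import comb


def transition_row(N, s, k=0, l=0):
    """Return [p(s' | s, (k, l)) for s' = 0..N] on a complete graph of N >= 2 agents.

    Hypothetical model (assumptions, not taken from the paper):
      * the k paid nonsupporters and l paid supporters are supporters at t + 1;
      * every unpaid agent copies the opinion of one of the other N - 1 agents,
        chosen uniformly at random, as observed at epoch t (state s).
    """
    if not (0 <= k <= N - s and 0 <= l <= s):
        raise ValueError("infeasible action")
    p_keep = (s - 1) / (N - 1) if s >= 1 else 0.0  # unpaid supporter stays a supporter
    p_flip = s / (N - 1)                           # unpaid nonsupporter becomes a supporter
    n_s, n_ns = s - l, N - s - k                   # unpaid supporters / nonsupporters
    row = [0.0] * (N + 1)
    for x in range(n_s + 1):
        px = comb(n_s, x) * p_keep**x * (1 - p_keep) ** (n_s - x)
        for y in range(n_ns + 1):
            py = comb(n_ns, y) * p_flip**y * (1 - p_flip) ** (n_ns - y)
            row[k + l + x + y] += px * py  # X and Y are independent given s
    return row


def reward(s, r):
    """Instant reward r_t(s) = r * s."""
    return r * s
```

For example, `transition_row(3, 1, 2, 0)` returns `[0.0, 0.0, 1.0, 0.0]`: both nonsupporters are paid, and the lone unpaid supporter necessarily copies one of them, so exactly two supporters remain. This shows how strongly the result depends on the timing assumption (ii).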
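Let $V_t(s, b)$ be the value function of the above DMDP, i.e., the supremum, over all possible budget allocation strategies, of the expected discounted reward starting from state $s$ at decision epoch $t$ with budget $b$. Under these assumptions, the Bellman equations for all $s \in S$ and initial budget $b \le B$ are
$$V_t(s, b) = \max_{(k, l):\ k \le N - s,\ l \le s,\ k c_{NS} + l c_S \le b} \Big\{ r_t(s) + \beta \sum_{s' \in S} p_t\big(s' \mid s, (k, l)\big)\, V_{t+1}\big(s', b - k c_{NS} - l c_S\big) \Big\},$$
where $k$ and $l$ are the numbers of paid nonsupporters and supporters, $c_{NS}$ and $c_S$ the corresponding costs, and the budget evolves as $b_{t+1} = b_t - k c_{NS} - l c_S$.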
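A minimal backward-induction sketch of these Bellman equations follows. It uses an illustrative complete-graph transition model of our own (paid agents are supporters at the next epoch; unpaid agents copy one of the other $N - 1$ agents uniformly at random) and further assumes integer costs `c_ns`, `c_s` and budget `B`, a horizon $\tau$ with terminal condition $V_{\tau+1} \equiv 0$, and reward $r\,s$ at every epoch. All function and parameter names are hypothetical.

```python
from math import comb


def transition_row(N, s, k=0, l=0):
    """p(s' | s, (k, l)) on a complete graph (hypothetical model, N >= 2).

    Paid agents are supporters at t + 1; each unpaid agent copies one of the
    other N - 1 agents chosen uniformly at random (opinions at epoch t).
    """
    p_keep = (s - 1) / (N - 1) if s >= 1 else 0.0
    p_flip = s / (N - 1)
    n_s, n_ns = s - l, N - s - k
    row = [0.0] * (N + 1)
    for x in range(n_s + 1):
        px = comb(n_s, x) * p_keep**x * (1 - p_keep) ** (n_s - x)
        for y in range(n_ns + 1):
            py = comb(n_ns, y) * p_flip**y * (1 - p_flip) ** (n_ns - y)
            row[k + l + x + y] += px * py
    return row


def solve(N, tau, beta, r, c_ns, c_s, B):
    """Backward induction over epochs tau, ..., 0.

    Returns V[t][s][b] and policy[t][s][b] = (k, l), with V[tau + 1] = 0.
    Feasible actions: 0 <= k <= N - s, 0 <= l <= s, k * c_ns + l * c_s <= b.
    """
    V = [[[0.0] * (B + 1) for _ in range(N + 1)] for _ in range(tau + 2)]
    policy = [[[None] * (B + 1) for _ in range(N + 1)] for _ in range(tau + 1)]
    for t in range(tau, -1, -1):
        for s in range(N + 1):
            for b in range(B + 1):
                best, best_a = float("-inf"), None
                for k in range(N - s + 1):
                    for l in range(s + 1):
                        cost = k * c_ns + l * c_s
                        if cost > b:
                            continue
                        row = transition_row(N, s, k, l)
                        # Budget evolves as b' = b - k * c_ns - l * c_s.
                        cont = sum(p * V[t + 1][s2][b - cost] for s2, p in enumerate(row))
                        q = r * s + beta * cont
                        if q > best:
                            best, best_a = q, (k, l)
                V[t][s][b] = best
                policy[t][s][b] = best_a
    return V, policy
```

Calling, e.g., `solve(N=7, tau=5, beta=0.9, r=1.0, c_ns=2, c_s=1, B=6)` and inspecting `policy[0][s][6]` gives the allocation at $t = 0$ for each initial state under these assumptions; the parameter values are illustrative, not those used in the paper. Because the zero action is always feasible, $V_t(s, b)$ is nondecreasing in $b$.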
Next, we present the second case, in which agents update their opinions in each decision epoch following a voter model on a network (not necessarily fully connected) that can be affected by external payments. As before, we formulate this problem as a DMDP whose actions are the possible external payments.
Let $\beta \in (0, 1)$ be the discount factor. By a slight abuse of notation, we consider the 5-tuple $(T, S, A, P, R)$ as before. Concretely, the elements that change with respect to the previous model are:

Decision epochs: the set of decision epochs is defined as .

States: the state space of the DMDP consists of all possible combinations of the agents' labels, nonsupporter (0) or supporter (1), i.e., $S = \{0, 1\}^N$.

Actions: $A_s$ is the set of actions available in state $s$ (without loss of generality, actions are state independent). An action indicates whether or not we give a payment to each of the agents, $a \in \{0, 1\}^N$, where a 0 (respectively 1) in position $i$ means we give no payment (a payment) to agent $i$. We also define a vector of costs $c \in \mathbb{R}_{+}^{N}$ whose $i$-th element $c_i$ is the cost of changing by one unit the opinion of agent $i$, in case agent $i$ is a nonsupporter, or the cost of holding the opinion of agent $i$, in case agent $i$ is a supporter.
Transition probabilities: as before, if the DMDP in decision epoch $t$ is at state $s$, the probability that it transitions to state $s'$ taking action $a$ is expressed as $p_t(s' \mid s, a)$ and can be computed as:





Reward: the instant reward at time $t$ in state $s$ is defined as $r_t(s) = \mathbf{r}^{\top} s$, where $\mathbf{r} \in \mathbb{R}^N$ is the vector of rewards whose $i$-th element is the reward that agent $i$ provides.
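For a general graph, a minimal Python sketch of the product-form transition probability and of the reward $\mathbf{r}^{\top} s$ is given below. The payment rule is an assumption of ours rather than the paper's: a paid agent ($a_i = 1$) is a supporter at the next epoch with probability 1, and an unpaid agent copies the label of a uniformly chosen neighbor. The helper names are hypothetical.

```python
from itertools import product


def supporter_prob(x, a, neighbors):
    """Probability that each agent is a supporter at t + 1 (hypothetical model).

    x: tuple of 0/1 labels at t; a: tuple of 0/1 payments at t;
    neighbors[i]: non-empty list of the neighbors of agent i.
    A paid agent is a supporter at t + 1; an unpaid agent copies a random neighbor.
    """
    return [1.0 if a[i] else sum(x[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(x))]


def transition_prob(y, x, a, neighbors):
    """p(y | x, a) as the product of independent per-agent transitions."""
    prob = 1.0
    for yi, q in zip(y, supporter_prob(x, a, neighbors)):
        prob *= q if yi == 1 else 1.0 - q
    return prob


def next_state_distribution(x, a, neighbors):
    """Distribution over {0,1}^N; exponential in N, so only for small networks."""
    return {y: transition_prob(y, x, a, neighbors)
            for y in product((0, 1), repeat=len(x))}


def reward(x, r):
    """Instant reward r^T x for a reward vector r."""
    return sum(ri * xi for ri, xi in zip(r, x))
```

On the path graph `[[1], [0, 2], [1]]` with labels `(1, 0, 0)` and no payments, only the middle agent can become a supporter (with probability 0.5), so `transition_prob((0, 1, 0), (1, 0, 0), (0, 0, 0), ...)` equals 0.5.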
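Let $V_t(s, b)$ be the value function of the above DMDP, i.e., the supremum, over all possible budget allocation strategies, of the expected discounted reward starting from state $s$ at decision epoch $t$ with budget $b$. Under these assumptions, the Bellman equations for all $s \in S$ and initial budget $b \le B$ are
$$V_t(s, b) = \max_{a \in \{0,1\}^N:\ c^{\top} a \le b} \Big\{ r_t(s) + \beta \sum_{s' \in S} p_t(s' \mid s, a)\, V_{t+1}\big(s', b - c^{\top} a\big) \Big\},$$
where the budget evolves as $b_{t+1} = b_t - c^{\top} a$ and $c^{\top}$ denotes the transpose of the cost vector $c$.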
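The general-graph Bellman equations can be solved by brute force on very small networks. The Python sketch below assumes integer costs $c_i$ and budgets, the terminal condition $V_{\tau+1} \equiv 0$, and an illustrative payment rule of our own (paid agents are supporters at the next epoch, unpaid agents copy a uniformly chosen neighbor). It enumerates all $2^N$ actions and next states, so it is only meant to illustrate the recursion and the budget update $b_{t+1} = b_t - c^{\top} a$; `solve_general` is a hypothetical name.

```python
from functools import lru_cache
from itertools import product


def solve_general(neighbors, c, r, tau, beta):
    """Return a memoized V(t, x, b) -> (value, best action) for epochs 0..tau.

    x and actions a are 0/1 tuples of length N; a is feasible if c^T a <= b,
    and the budget evolves as b - c^T a. V(tau + 1, ., .) = 0 (assumption).
    """
    N = len(neighbors)
    vectors = list(product((0, 1), repeat=N))  # used for states and actions

    def step_prob(y, x, a):
        # Hypothetical rule: paid agents are supporters at t + 1,
        # unpaid agents copy a uniformly chosen neighbor.
        prob = 1.0
        for i in range(N):
            q = 1.0 if a[i] else sum(x[j] for j in neighbors[i]) / len(neighbors[i])
            prob *= q if y[i] else 1.0 - q
        return prob

    @lru_cache(maxsize=None)
    def V(t, x, b):
        if t > tau:
            return 0.0, None
        best, best_a = float("-inf"), None
        for a in vectors:
            cost = sum(ci * ai for ci, ai in zip(c, a))
            if cost > b:
                continue
            cont = sum(step_prob(y, x, a) * V(t + 1, y, b - cost)[0] for y in vectors)
            q = sum(ri * xi for ri, xi in zip(r, x)) + beta * cont
            if q > best:
                best, best_a = q, a
        return best, best_a

    return V
```

For two connected agents with labels `(1, 0)`, unit costs and rewards, $\tau = 1$ and $\beta = 0.9$, a budget of 1 is best spent holding the current supporter: `V(0, (1, 0), 1)` returns the action `(1, 0)` with value 2.8, against 1.9 without payments.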
3 Simulation results
We suppose that after a time $\tau$ we will not obtain rewards for the supporter agents, so we are interested in the evolution of the DMDP in the time interval $[0, \tau]$. We consider an undirected, fully connected social network with $N$ agents that form opinions according to a voter model. At time $t = 0$, we assign an initial label to each agent at random. Solving the Bellman (fixed-point) equations, given the model and the set of states and feasible actions, gives us the best strategy to follow from each state in the time interval. We take , , , , and .
Some conclusions can be drawn from the simulations. The optimal payment strategy is to invest the whole budget at time $t = 0$, paying as many agents as possible. If the whole network is nonsupporter (respectively supporter), the budget is allocated to changing opinions (to holding opinions). However, the distribution of the budget differs for the rest of the possible initial states, as shown in Table 1.
Initial state  Budget Allocation  
Payments to NS  Payments to S  
2  1  
2  2  
1  3  
1  4  
0  5  
0  6 
Given the budget allocation at time $t = 0$, Figure 1 shows the expected reward obtained at time $\tau$ for each initial state $s$.
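Under an illustrative complete-graph model of our own (paid agents are supporters at $t = 1$; afterwards, with no further payments, each agent copies one of the other $N - 1$ agents uniformly at random), the expected reward at time $\tau$ after an allocation $(k, l)$ at $t = 0$ can be computed by propagating the distribution of the number of supporters forward. Under these assumptions the payment-free dynamics preserve the expected number of supporters, since $E[s_{t+1} \mid s_t] = s_t \tfrac{s_t - 1}{N - 1} + (N - s_t) \tfrac{s_t}{N - 1} = s_t$. The Python sketch below, with hypothetical names, is not claimed to reproduce Figure 1.

```python
from math import comb


def transition_row(N, s, k=0, l=0):
    """p(s' | s, (k, l)) on a complete graph of N >= 2 agents (hypothetical model)."""
    p_keep = (s - 1) / (N - 1) if s >= 1 else 0.0  # unpaid supporter stays
    p_flip = s / (N - 1)                           # unpaid nonsupporter switches
    n_s, n_ns = s - l, N - s - k
    row = [0.0] * (N + 1)
    for x in range(n_s + 1):
        px = comb(n_s, x) * p_keep**x * (1 - p_keep) ** (n_s - x)
        for y in range(n_ns + 1):
            py = comb(n_ns, y) * p_flip**y * (1 - p_flip) ** (n_ns - y)
            row[k + l + x + y] += px * py
    return row


def expected_reward_at(tau, N, s0, k, l, r):
    """E[r * s_tau] when (k, l) payments are made at t = 0 only (tau >= 1)."""
    dist = transition_row(N, s0, k, l)  # distribution of s_1
    for _ in range(tau - 1):            # payment-free voter dynamics afterwards
        new = [0.0] * (N + 1)
        for s, p in enumerate(dist):
            if p > 0.0:
                for s2, q in enumerate(transition_row(N, s)):
                    new[s2] += p * q
        dist = new
    return r * sum(s * p for s, p in enumerate(dist))
```

For example, with $N = 5$, $s_0 = 2$ and $(k, l) = (1, 1)$, the expected number of supporters at $t = 1$ is $2 + \tfrac{1 \cdot 1}{4} + \tfrac{2 \cdot 2}{4} = 3.25$, and `expected_reward_at(tau, 5, 2, 1, 1, 1.0)` returns approximately 3.25 for every `tau >= 1`.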
4 Conclusions
In this work, we have introduced the problem of budget allocation over time to manipulate a social network. We have developed a discounted Markov decision process formulation, together with the corresponding Bellman equations, for the voter model of opinion formation. Using backward dynamic programming, we have obtained the optimal payment strategy for a small example of agents interacting through a fully connected network. Many questions remain open. Future work will be devoted to improving the performance of our simulations in order to obtain the optimal strategy for larger networks and different topologies. Moreover, we intend to construct the DMDP model and the Bellman equations for other models of opinion dynamics. This will lead to the mathematical characterization of the optimal policy for different network structures and opinion formation models. It will also lead to the characterization of the most important agents (the agents with the highest benefit-cost ratio), which should be related to their centrality, as shown in previous results [9, 7].
Acknowledgements
The work of A. Silva and S. Iglesias Rey was partially carried out at LINCS (www.lincs.fr).
References
 [1] M. H. DeGroot, “Reaching a consensus,” Journal of the American Statistical Association, vol. 69, no. 345, pp. 118–121, 1974.
 [2] N. E. Friedkin and E. C. Johnsen, “Social influence networks and opinion change,” Advances in Group Processes, vol. 16, pp. 1–29, 1999.
 [3] P. Clifford and A. Sudbury, “A model for spatial conflict,” Biometrika, vol. 60, no. 3, pp. 581–588, 1973.
 [4] R. A. Holley and T. M. Liggett, “Ergodic theorems for weakly interacting infinite systems and the voter model,” The Annals of Probability, vol. 3, no. 4, pp. 643–663, 1975.
 [5] E. Yildiz, A. Ozdaglar, D. Acemoglu, A. Saberi, and A. Scaglione, “Binary opinion dynamics with stubborn agents,” ACM Trans. Econ. Comput., vol. 1, pp. 19:1–19:30, Dec. 2013.
 [6] A. Mukhopadhyay, R. R. Mazumdar, and R. Roy, “Binary opinion dynamics with biased agents and agents with different degrees of stubbornness,” in 2016 28th International Teletraffic Congress (ITC 28), vol. 1, pp. 261–269, Sept. 2016.
 [7] S. Dhamal, W. Ben-Ameur, T. Chahed, and E. Altman, “Good versus Evil: A Framework for Optimal Investment Strategies for Competing Camps in a Social Network,” ArXiv e-prints, June 2017, 1706.09297.
 [8] M. Förster, A. Mauleon, and V. Vannetelbosch, “Trust and manipulation in Social networks,” Documents de travail du Centre d’Economie de la Sorbonne 2013.65, ISSN: 1955-611X, Sept. 2013.
 [9] A. Silva, “Opinion manipulation in social networks,” in Network Games, Control, and Optimization: Proceedings of NETGCOOP 2016, Avignon, France (S. Lasaulce, T. Jimenez, and E. Solan, eds.), pp. 187–198, Springer International Publishing, 2017.
 [10] E. Altman, Applications of Markov Decision Processes in Communication Networks, pp. 489–536. Boston, MA: Springer US, 2002.
 [11] N. Archak, V. Mirrokni, and S. Muthukrishnan, Budget Optimization for Online Campaigns with Positive Carryover Effects, pp. 86–99. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012.
 [12] C. Boutilier and T. Lu, “Budget allocation using weakly coupled, constrained Markov decision processes,” in Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI’16, (Arlington, Virginia, United States), pp. 52–61, AUAI Press, 2016.