Playing Stackelberg Opinion Optimization with Randomized Algorithms for Combinatorial Strategies

03/05/2018 · Po-An Chen et al. · Academia Sinica

From the perspective of designing or engineering opinion formation games in social networks, the "opinion maximization (or minimization)" problem has been studied mainly through subset-selection algorithms. We further define a two-player zero-sum Stackelberg game of competitive opinion optimization by letting the player under study, as the first mover, minimize the sum of expressed opinions through so-called "internal opinion design", knowing that the other, adversarial player, as the follower, will maximize the same objective by also conducting her own internal opinion design. We propose for the min player to play the "follow-the-perturbed-leader" algorithm in such a Stackelberg game, obtaining losses depending on the other adversarial player's play. Since our strategies of subset selection are combinatorial in nature, the probabilities in a distribution over all the strategies would be too many to be enumerated one by one. Thus, we design a randomized algorithm to produce a (randomized) pure strategy. We show that the strategy output by the randomized algorithm for the min player is essentially an approximate equilibrium strategy against the other adversarial player.


1 Introduction

The opinion forming process in a social network can be naturally thought of as opinion influencing and updating dynamics. This attracted researchers' interest a while ago in mathematical sociology, and recently in theoretical computer science. DeGroot [9] modeled the opinion formation process by associating each individual with a numeric-valued opinion and letting the opinion be updated by a weighted average of the opinions of her friends and her own, where the weights represent how much she is influenced by her friends. This update dynamics converges to a consensus where all individuals hold the same opinion. However, we can easily observe that in the real world, consensus is difficult to reach. Friedkin and Johnsen [11] differentiated an expressed opinion, which each individual in the network updates over time, from an internal opinion, which each individual is born with and which stays unchanged. Thus, an individual is always influenced by her inherent belief, and the dynamics converges to a unique equilibrium, which may not be a consensus.

Bindel et al. [5] viewed the updating rule mentioned above equivalently as each player updating her expressed opinion to minimize her quadratic individual cost function, which consists of the disagreement between her expressed opinion and those of her friends, and the difference between her expressed and internal opinions. They analyzed how socially good or bad the system can be at equilibrium compared to the optimal solution in terms of the price of anarchy [16]. The price of anarchy is at most 9/8 (and this bound is tight) in undirected graphs and is unbounded in directed graphs. Nevertheless, a bounded price of anarchy can be obtained for weighted Eulerian graphs in [5], where the total incoming weight equals the total outgoing weight at each node, while the price of anarchy is bounded for opinion formation games on directed graphs more general than weighted Eulerian graphs in [6].

From a perspective of designing or engineering, opinion maximization (or minimization) has been studied for seeding algorithms in [12, 2]; we build on this setting to define the Stackelberg opinion optimization games introduced and analyzed in this paper. With a linear objective given by the sum of expressed opinions, opinion maximization seeks to find a $k$-subset (for a fixed size $k$) of nodes to have their expressed opinions fixed to 1 so as to maximize the objective. Opinion minimization can be defined similarly to minimize the objective. A seeding algorithm chooses a subset of nodes whose expressed opinions are fixed (to 1 if the goal is to maximize the objective), and it turns out that opinion optimization is NP-hard [12], so greedy algorithms [12, 2] have been designed to approximate the optimum with the help of the submodularity of such social cost.

It is obvious that controlling the expressed opinions is not the only way to optimize the objective. It is natural to consider changing the intrinsic (or equivalently, internal) opinions of some subset of nodes to optimize the objective. Notice that setting a chosen subset of nodes to have certain assigned intrinsic opinions does not prevent their expressed opinions from later being decided by the influence and update dynamics, whereas controlling the expressed opinions of the chosen subset is definitive. In this sense, such an "internal opinion design" approach is relatively more relaxed, compared with the previously studied expressed opinion control [12]. Note also that internal opinion design enjoys computational tractability.

One can think of a scenario with two players, one with the goal to minimize (or maximize) the objective and the other, adversarial player trying to do the opposite. In such a scenario of competitive opinion optimization, a zero-sum game is formed by these two players, with subsets of nodes as the strategies and each player optimizing the same objective in the opposite direction. We can furthermore define a Stackelberg game by letting the player under study, as the first mover, minimize the sum of expressed opinions by doing the internal opinion design discussed above, knowing that the other, adversarial player, as the follower, will maximize the same objective by also conducting her own internal opinion design. Even if a node is selected by the first mover for internal opinion design, its internal opinion will still be overwritten if it is later selected by the adversarial follower. Thus, a node's expressed opinion is decided by its designed internal opinion (possibly set first by the min player and then by the max player) and the update dynamics.

We view our problem of coming up with the min player's strategy against the max player's as an online optimization problem, specifically an online linear optimization problem. We propose for the min player to repeatedly play some "no-regret" learning algorithm in such a two-player Stackelberg game of competitive opinion optimization, obtaining rewards or losses depending on the other adversarial player's play. Using generic or specific no-regret algorithms as strategies is a common approach to reach certain equilibria (on average) in repeated games [17, Chapter 4][8]. However, the previous results are established when players only have finitely many strategies to play. Since our strategies are combinatorial in nature, i.e., any subset of size $k$, the probabilities in the distribution over all the strategies (all $k$-subsets) would be too many to be enumerated one by one in a vector, which is how they are treated in the previous works. Therefore, the general result of playing no-regret algorithms in a two-player zero-sum matrix game such as in [8] is not directly applicable here. Also, due to the problem structure, namely how the strategies of the two players are related to each other and to what corresponds to the payoff matrix (which will be made clear in Section 2.1), we do not have the symmetry between the two players that the previous results rely on. This justifies why we design algorithms for computing the min player's strategy against the adversarial player's play (which can be efficiently computed), and, instead of characterizing an equilibrium (which requires equilibrium strategies for both players), settle for showing that the computed strategy is indeed essentially the best thing for the min player to do.

The probability distribution for a mixed strategy needs to be expressed implicitly instead of explicitly as a long vector. We resort to randomizing over such probability distributions (at different time steps), and follow this "average" distribution to produce a $k$-subset for the min player. Thus, we design a randomized algorithm that outputs the pure strategy of a uniformly chosen time step. Technically, such strategy computation has to be modeled as an online linear optimization problem, and the adversary's strategy has to be shown efficiently computable. Finally, we show that the strategy output by the randomized algorithm for the min player converges to an approximate min strategy against the other adversarial player (the max player), mainly using the no-regret property. In other words, in our particular setting (opinion optimization games) with large strategy sets, using the randomized algorithm to play such a Stackelberg game with the max player playing adversarially guarantees an approximate minmax equilibrium.

1.1 Related Work

Using the sum of expressed opinions as the objective, opinion maximization seeks to find a $k$-subset of nodes to have their expressed opinions fixed to 1 to maximize the objective. Greedy algorithms have been designed to approximate the optimum with the help of the submodularity of such social cost [12, 2].

There are works on competitive versions of various (combinatorial) optimization problems other than the competitive opinion optimization that we define in this paper. The most well-known one is probably competitive influence maximization and its variations [3, 13, 14].

It has been studied how two players playing no-regret algorithms reach a mixed Nash equilibrium (minmax equilibrium) in general zero-sum matrix-form games where the strategy set is finite [8]. On the other hand, here we apply specific no-regret algorithms in Stackelberg opinion optimization games and randomize the output strategy over a large strategy set to guarantee convergence to equilibria in expectation.

Another work closely related to that of Bindel et al. on opinion formation games is by Bhawalkar et al. [4]. The individual cost functions are assumed to be "locally smooth" in the sense of [18] and may be more general than quadratic functions, for example, convex ones. The price of anarchy for undirected graphs with convex cost functions is shown to be at most 2. They also allowed social networks to change by letting players choose the $k$ nearest neighbors throughout opinion updates and bounded the price of anarchy.

When graphs are directed, a bounded price of anarchy was only known for weighted Eulerian graphs [5], which may not be the most general class of directed graphs that gives a bounded price of anarchy. Thus, we bounded the price of anarchy for games with directed graphs more general than weighted Eulerian graphs in [6]. We gave bounds on the price of anarchy for a more general class of directed graphs under conditions intuitively meaning that each node does not influence the others more than she is influenced by herself and the others, where the bounds depend on such influence differences (as a ratio). This generalizes the previous results on directed graphs, and recovers and matches the previous bounds for some specific classes of (directed) Eulerian graphs. We also showed that there exists an example that only slightly violates the conditions and has an unbounded price of anarchy, so the conditions are indeed necessary for a bounded price of anarchy. Chierichetti et al. [7] considered games with discrete preferences, where expressed and internal opinions are chosen from a discrete set and distances measuring "similarity" between opinions correspond to costs.

2 Preliminaries

We introduce fundamentals of opinion formation games first and then proceed with preliminaries about our Stackelberg games of competitive opinion optimization in Section 2.1.

We describe a social network as a weighted graph $G = (V, E, W)$ for a directed graph $(V, E)$ and weight matrix $W$. The node set $V$ of size $n$ is the set of selfish players, and the edge set $E$ captures the relationships between pairs of nodes. The edge weight $w_{ij}$ is a real number and represents how much player $i$ is influenced by player $j$; note that the weight $w_{ii}$ can be seen as a self-loop weight, i.e., how much player $i$ influences (or is influenced by) herself. Each (node) player $i$ has an internal opinion $s_i$, which is unchanged and not affected by opinion updates. An opinion formation game can be expressed as an instance $(G, s)$ that combines the weighted graph $G$ and the internal opinion vector $s$. Each player's strategy is an expressed opinion $z_i$, which may be different from her $s_i$ and gets updated. Both $s_i$ and $z_i$ are real numbers. The individual cost function of player $i$ is

$$\mathrm{cost}_i(z) = w_{ii}(z_i - s_i)^2 + \sum_{j \in N(i)} w_{ij}(z_i - z_j)^2, \qquad (1)$$

where $z$ is the strategy profile/vector and $N(i)$ is the set of the neighbors of $i$, i.e., $N(i) = \{j : (i, j) \in E\}$. Each node minimizes her cost by choosing her expressed opinion $z_i$. We analyze the game when it stabilizes, i.e., at equilibrium.

In a (pure) Nash equilibrium $z$, each player $i$'s strategy $z_i$ is such that, given $z_{-i}$ (i.e., the opinion vector of all players except $i$), for any other $z_i'$,

$$\mathrm{cost}_i(z_i, z_{-i}) \le \mathrm{cost}_i(z_i', z_{-i}). \qquad (3)$$

Equivalently, each player $i$ updates her expressed opinion by the following rule [5, 4]:

$$z_i = \frac{w_{ii} s_i + \sum_{j \in N(i)} w_{ij} z_j}{w_{ii} + \sum_{j \in N(i)} w_{ij}}. \qquad (4)$$

This is obtained by taking the derivative of $\mathrm{cost}_i$ w.r.t. $z_i$, setting it to $0$ for each $i$, and solving the resulting system of equations, since every player $i$ minimizes $\mathrm{cost}_i$. Note that $\mathrm{cost}_i$ is continuously differentiable. We consider an objective that is linear in the $z_i$'s in this paper.

Absorbing Random Walks

In an opinion formation game, computing a Nash equilibrium can be done by using absorbing random walks [10]. In a random walk on a directed graph $G = (V, E)$ with weight matrix $W$, a node in $V$ is an absorbing node if the random walk can only enter this node but not exit from it, and each entry $w_{ij}$ is the weight on edge $(i, j)$ in $E$. Let $B \subseteq V$ be the set of all absorbing nodes, and let $U = V \setminus B$ be the set of the remaining, transient nodes. Given the transition matrix $P$ (obtained from the weight matrix $W$) whose entry $P_{ij}$ represents the probability of transiting from node $i$ to node $j$ in this random walk, a matrix $Q_{UB}$ can be computed where each entry $(Q_{UB})_{ij}$ is the probability that a random walk starting at transient state $i \in U$ is absorbed at state $j \in B$ (see Appendix A for details). If a random walk starting from transient node $i$ gets absorbed at an absorbing node $j$, we assign to node $i$ the value $b_j$ that is associated with absorbing node $j$. With $Q_{UB}$, the expected value assigned to node $i$ is then $\sum_{j \in B} (Q_{UB})_{ij}\, b_j$. Let $z_U$ be the vector of the expected values for all $i \in U$ and $b$ the vector of values for all $j \in B$. We have that

$$z_U = Q_{UB}\, b. \qquad (5)$$

Thus, computing the expressed opinion vector at Nash equilibrium for an opinion formation game can be done by applying Equation (5) to a graph constructed for our purpose as follows. The weighted graph $G$ with original internal opinions $s$ gives $U = V$ and $B = \{\sigma(i) : i \in V\}$ for the random walk, where each $i \in V$ has a distinct absorbing copy $\sigma(i)$ connected by an edge $(i, \sigma(i))$ with weight $w_{ii}$ and value $b_{\sigma(i)} = s_i$, so $z = Q_{UB}\, s$.

In the case of expressed opinion control for opinion maximization as in [12, Section 3.3], controlling a set $T \subseteq V$ of nodes gives $U = V \setminus T$ and makes the nodes of $T$ absorbing with value $1$ (in addition to the absorbing copies $\sigma(i)$ with values $s_i$ for $i \in U$), i.e., $b = (s_U, \mathbf{1})$, where $\mathbf{1}$ is an all-$1$ vector of size $|T|$, since the nodes in $T$ cannot change their expressed opinions but stick to the value $1$. The so-called internal opinion design will be explained in Section 2.1. There, when using absorbing random walks for arriving at stable states, $B$ and $U$ along with the weighted edges remain as mentioned in the last paragraph, yet with $b$ being the internal opinion vector after manipulation.
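To make Equation (5) concrete, the following is a minimal sketch of computing the equilibrium expressed opinions under the construction above (every original node is transient and its absorbing copy holds its internal opinion). It assumes numpy; the function name and the toy instance are ours for illustration, not taken from [12].

```python
import numpy as np

def equilibrium_opinions(W, s):
    """Equilibrium expressed opinions z = Q_UB s of the opinion formation game,
    via the absorbing-random-walk construction (Equation (5))."""
    n = len(s)
    off = W - np.diag(np.diag(W))        # influence weights among the (transient) nodes
    d = off.sum(axis=1) + np.diag(W)     # total outgoing weight, including the self-loop
    P_UU = off / d[:, None]              # transient -> transient transition probabilities
    P_UB = np.diag(np.diag(W) / d)       # transient -> own absorbing copy
    F = np.linalg.inv(np.eye(n) - P_UU)  # fundamental matrix (see Appendix A)
    return F @ P_UB @ s                  # z_U = Q_UB b with b = s

# toy 3-node instance
W = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
s = np.array([0.0, 0.5, 1.0])
print(equilibrium_opinions(W, s))
```

Each output value lies between the smallest and largest internal opinions, as expected from the averaging update rule (4).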

2.1 Stackelberg Opinion Optimization Games and Online Linear Optimization

A two-player Stackelberg opinion optimization game can be described as an instance $(G, s, k)$. We will elaborate each component one by one. Let the min player's strategy be a vector $x$ with $x_i \in \{0, 1\}$ and $\sum_i x_i = k$. Denote the modified internal opinion vector by $s'$ with $s'_i = (1 - x_i) s_i$ for all $i$ after the min player makes her decision first. So, for a node $i$, if it is selected by the min player, its internal opinion becomes $0$, or stays as $s_i$ if not selected by the min player. Knowing $x$, the adversarial player's strategy is a vector $y$ with $y_i \in \{0, 1\}$ and $\sum_i y_i = k$. Denote the final internal opinion vector by $\hat{s}$ with $\hat{s}_i = (1 - y_i) s'_i + y_i$ for all $i$ after the adversarial player also makes her decision. If a node $i$ is selected by the adversarial player, its final internal opinion immediately becomes $1$, or stays as $s'_i$ if not selected by the adversarial player. Let $X$ and $Y$ denote the strategy sets $X = \{x \in \{0,1\}^n : \sum_i x_i = k\}$ and $Y = \{y \in \{0,1\}^n : \sum_i y_i = k\}$. Note that the expressed opinions are still influenced by $\hat{s}$ and get updated to the values at the stable state by the dynamics, using absorbing random walks (applying Equation (5)).

The min player minimizes her cost function $C(x, y)$ over all $x \in X$, which the adversarial player maximizes over all $y \in Y$,

$$C(x, y) = \sum_{i \in V} z_i = \langle a, \hat{s}(x, y) \rangle \qquad (6)$$

for $x \in X$ and $y \in Y$ and a vector $a = Q_{UB}^\top \mathbf{1}$, since the equilibrium expressed opinions satisfy $z = Q_{UB}\, \hat{s}(x, y)$.

2.1.1 No-Regret Algorithms for Online Linear Optimization

In the setting of online convex optimization, we describe an online game between a player and the environment. The player is given a convex set $\mathcal{K} \subseteq \mathbb{R}^n$ and has to make a sequence of decisions $x^1, x^2, \ldots \in \mathcal{K}$. After deciding $x^t$, the environment reveals a convex reward (or loss) function $f_t$ and the player obtains $f_t(x^t)$. The performance of the player is measured by the regret defined in the following. In this paper, what is closely related to our problem is the more specific problem of online linear optimization, where the functions are linear, i.e., $f_t(x) = \langle f^t, x \rangle$ for some $f^t \in \mathbb{R}^n$.

We define the player's adaptive strategy $\mathcal{A}$ as a function taking as input a subsequence of loss vectors $\ell^1, \ldots, \ell^{t-1}$ and returning a point $x^t = \mathcal{A}(\ell^1, \ldots, \ell^{t-1})$ where $x^t \in \mathcal{K}$.

Definition 1

Given an online linear optimization algorithm $\mathcal{A}$ and a sequence of loss vectors $\ell^1, \ldots, \ell^T$, let the regret be defined as
$$R_T(\mathcal{A}; \ell^1, \ldots, \ell^T) = \sum_{t=1}^{T} \langle \ell^t, x^t \rangle - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} \langle \ell^t, x \rangle.$$

A desirable property that one would want an online linear optimization algorithm to have is a regret that scales sublinearly in $T$. For example, the online gradient descent algorithm [19] guarantees a regret of $O(\sqrt{T})$. This property can be formally captured as follows.
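As a concrete instance of such an algorithm, here is a minimal online-gradient-descent sketch for linear losses over an L2 ball; the names and step-size schedule are illustrative choices, not taken from [19] or [1].

```python
import numpy as np

def ogd_step(x, loss_vec, t, radius=1.0):
    """One projected online-gradient-descent step for the linear loss <loss_vec, x>.
    A ~1/sqrt(t) step size plus projection onto a bounded set is the standard way
    to obtain O(sqrt(T)) regret against bounded loss sequences."""
    eta = radius / np.sqrt(t + 1.0)
    x = x - eta * loss_vec                   # gradient of <loss_vec, x> w.r.t. x is loss_vec
    norm = np.linalg.norm(x)
    if norm > radius:
        x = x * (radius / norm)              # project back onto the decision set
    return x

# play against an arbitrary (here random) loss sequence
rng = np.random.default_rng(0)
x, cumulative_loss = np.zeros(5), 0.0
for t in range(1000):
    loss_vec = rng.uniform(-1.0, 1.0, size=5)
    cumulative_loss += loss_vec @ x
    x = ogd_step(x, loss_vec, t)
```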

Theorem 1 (e.g., Theorem 10 of [1])

For any bounded decision set $\mathcal{K} \subseteq \mathbb{R}^n$, there exists an algorithm $\mathcal{A}$ such that $R_T(\mathcal{A}; \ell^1, \ldots, \ell^T) = O(\sqrt{T})$ for any sequence of loss vectors $\ell^1, \ldots, \ell^T$ with bounded norm.

The no-regret property is useful in a variety of contexts. For example, it is known (e.g., [1, Section 3]) that two players playing $\epsilon_X(T)$-regret and $\epsilon_Y(T)$-regret algorithms $\mathcal{A}_X$ and $\mathcal{A}_Y$, respectively, in a zero-sum game with a cost function of the form $g(x, y)$ for some biaffine $g$ give a version of minmax equilibrium.

Theorem 2 (Corollary 3 of [1])

For compact convex sets $X$ and $Y$ and any biaffine function² $g : X \times Y \to \mathbb{R}$, we have

$$\min_{x \in X} \max_{y \in Y} g(x, y) = \max_{y \in Y} \min_{x \in X} g(x, y). \qquad (7)$$

²A biaffine function $g$ satisfies $g(\alpha x + (1 - \alpha) x', y) = \alpha g(x, y) + (1 - \alpha) g(x', y)$ and $g(x, \beta y + (1 - \beta) y') = \beta g(x, y) + (1 - \beta) g(x, y')$ for every $x, x' \in X$, $y, y' \in Y$, and $\alpha, \beta \in [0, 1]$.

We restate the argument here to provide a reference to the standard technique and result for playing no-regret algorithms in a zero-sum matrix game. One can view the following argument as what we would like to do in Section 3, yet with different technical details for coping with our more challenging games with combinatorial strategies. For every $t$, we have $x^t = \mathcal{A}_X(g(\cdot, y^1), \ldots, g(\cdot, y^{t-1}))$ and $y^t = \mathcal{A}_Y(-g(x^1, \cdot), \ldots, -g(x^{t-1}, \cdot))$ for the $\epsilon_X(T)$-regret and $\epsilon_Y(T)$-regret algorithms $\mathcal{A}_X$ and $\mathcal{A}_Y$. By applying the definition of regret twice, we have

$$\frac{1}{T} \sum_{t=1}^{T} g(x^t, y^t) \le \min_{x \in X} \frac{1}{T} \sum_{t=1}^{T} g(x, y^t) + \frac{\epsilon_X(T)}{T} \le \min_{x \in X} \max_{y \in Y} g(x, y) + \frac{\epsilon_X(T)}{T}, \qquad (8)$$
$$\frac{1}{T} \sum_{t=1}^{T} g(x^t, y^t) \ge \max_{y \in Y} \frac{1}{T} \sum_{t=1}^{T} g(x^t, y) - \frac{\epsilon_Y(T)}{T} \ge \max_{y \in Y} \min_{x \in X} g(x, y) - \frac{\epsilon_Y(T)}{T}, \qquad (9)$$

where the second inequality in each line uses the biaffinity of $g$ to move the average over $t$ inside the argument. We obtain $\max_{y \in Y} \min_{x \in X} g(x, y) \ge \min_{x \in X} \max_{y \in Y} g(x, y) - (\epsilon_X(T) + \epsilon_Y(T))/T$ by combining the inequalities above and letting $T \to \infty$ (so the sublinear regret terms vanish), and the reverse inequality holds by weak duality, which gives Equation (7).
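As a small numerical illustration of this argument (not part of the paper), two players running projected gradient descent/ascent on a toy bilinear game drive the average payoff to the game value and the average strategies to an equilibrium:

```python
import numpy as np

# Matching pennies with mixed strategies (p, 1-p) and (q, 1-q):
# g(p, q) = 4pq - 2p - 2q + 1, with minmax value 0 at p = q = 1/2.
p, q = 0.9, 0.1
avg_payoff, avg_p, avg_q, T = 0.0, 0.0, 0.0, 20000
for t in range(1, T + 1):
    avg_payoff += 4 * p * q - 2 * p - 2 * q + 1
    avg_p, avg_q = avg_p + p, avg_q + q
    eta = 0.5 / np.sqrt(t)
    p = min(max(p - eta * (4 * q - 2), 0.0), 1.0)   # min player: projected gradient descent step
    q = min(max(q + eta * (4 * p - 2), 0.0), 1.0)   # max player: projected gradient ascent step
print(avg_payoff / T, avg_p / T, avg_q / T)          # approaches 0, 1/2, 1/2
```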

3 Randomized Algorithms for Combinatorial Strategies

For our opinion optimization game, one can first notice that the strategies of the two players interact with each other and with the matrix $Q_{UB}$, which plays the role of the cost matrix $A$, in a very different way from the standard result discussed in Section 2.1.1. For example, we have here $C(x, y) = \langle a, \hat{s}(x, y) \rangle$ instead of the bilinear form $x^\top A y$. Because of these differences, we design algorithms for computing an approximate equilibrium strategy of the min player (against the adversarial player), and focus on efficient computation of the adversary's strategy and the equilibrium strategy analysis only for the min player, instead of characterizing an equilibrium, i.e., equilibrium strategies for both players (since the adversarial player can overwrite the min player's selection, we do not have a symmetric structure such as $x^\top A y$ in our problem).

Given the strategy sets $X$ and $Y$ defined in Section 2.1, our Stackelberg opinion optimization game takes the cost function $C(x, y)$ defined in Section 2.1. Such a game is played repeatedly, where at time $t$ the min player chooses $x^t \in X$ and the adversarial (max) player chooses $y^t \in Y$. Here, the min player chooses her strategies according to a no-regret algorithm, and the adversarial player is assumed to maximize the value of the objective.

3.1 Follow-the-Perturbed-Leader Algorithms

If we want to use the follow-the-perturbed-leader algorithm [15] to compute a sequence of "mixed" strategies in our game, then, since our strategies are combinatorial in nature, i.e., all $k$-subsets of $V$, the probabilities in a distribution over all the strategies would be too many to be enumerated one by one. The probability distribution needs to be expressed implicitly instead of explicitly as a long vector. Let $p^t$ denote this long vector of the probability distribution at time step $t$.

For every time step $t$, we have that the min player's strategy $x^t$ is distributed according to $p^t$, where the algorithm producing $p^t$ is now the follow-the-perturbed-leader algorithm, and estimating the expected strategy $\bar{x}^t = \mathbb{E}_{x \sim p^t}[x]$ will be explained in Section 3.2.³ ³We use the bar notation to denote an expected vector throughout this paper.

As in online linear optimization, we need to define the loss function for the min player's strategy first. Let $y^t \in Y$ be (the indicator vector of) the $k$-subset that the adversarial player selects at time step $t$. Fixing the adversarial player's strategy $y^t$ and thereby determining the final internal opinions, we can write the loss function as an affine function of the min player's strategy $x$ for $t$ from 1 to $T$:

$$\ell^t(x) = C(x, y^t) = \sum_{i : y^t_i = 1} a_i + \sum_{i : y^t_i = 0} a_i s_i (1 - x_i). \qquad (10)$$

We are now ready to specify the follow-the-perturbed-leader algorithm for "large strategy sets" to get a randomized pure strategy at each time step $t$. The min player's strategy at time step $t$ is

$$x^t = \arg\min_{x \in X} \left( \sum_{\tau=1}^{t-1} \ell^\tau(x) + \langle r, x \rangle \right)$$

for a random vector $r$ uniformly distributed in each dimension. The objective can be simplified to $\sum_i \alpha_i x_i + c$ with each

$$\alpha_i = -\sum_{\tau < t:\, y^\tau_i = 0} a_i s_i + r_i,$$

where $r_i$ is the realized random value at dimension $i$ for each $i$ and $c$ is a constant. Thus, $x^t$ can be efficiently computed by considering the $\alpha_i$'s and $r$'s range for all $i$ and selecting the top $k$ nodes that contribute to the perturbed cumulative loss the least. Note that $x^t$ does not represent a mixed strategy in our game, but a randomized (combinatorial) pure strategy following the distribution $p^t$. Actually, $\bar{x}^t$ can be estimated by sampling enough times. We explain this in Section 3.2.
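A minimal sketch of this per-round computation, a top-$k$ selection under an additive uniform perturbation (the function name, parameters, and perturbation range are illustrative choices in the Kalai-Vempala style, not the paper's exact tuning):

```python
import numpy as np

def ftpl_topk_step(cum_loss, k, eps, rng):
    """Follow-the-perturbed-leader over k-subsets.

    cum_loss[i] is the cumulative linear-loss coefficient of node i over the
    previous rounds; a fresh uniform perturbation in [0, 1/eps] is added to each
    coordinate, and the k nodes with the smallest perturbed totals are selected.
    Returns the 0/1 indicator vector of the chosen k-subset."""
    perturbed = cum_loss + rng.uniform(0.0, 1.0 / eps, size=len(cum_loss))
    x = np.zeros(len(cum_loss))
    x[np.argsort(perturbed)[:k]] = 1.0
    return x
```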

Thus, we conclude that our randomized algorithm outputs a (randomized) pure strategy $x^t$ at a uniformly random time step $t$ for the min player against the adversarial player, who ideally is to play $\arg\max_{y \in Y} C(\bar{x}^t, y)$ at each time step $t$. We can estimate this strategy of the adversary as accurately as desired with high probability. That is, with high probability the estimated strategy $\tilde{y}^t$ satisfies

$$C(\bar{x}^t, \tilde{y}^t) \ge \max_{y \in Y} C(\bar{x}^t, y) - \varepsilon,$$

where $\varepsilon$ is an error from estimation, which can be made as small as desired. We show that such a strategy of the adversary can be found efficiently in Section 3.2. The randomized algorithm runs the follow-the-perturbed-leader update up to the chosen time step $t$. Our main result is to show that the randomized pure strategy indeed approaches an approximate minmax equilibrium (see Section 3.3). The randomized algorithm is summarized in the following.

1:  Choose $t$ uniformly at random from $\{1, \ldots, T\}$
2:  for $\tau = 1$ to $t$ do
3:      $x^\tau = \arg\min_{x \in X} \left( \sum_{\sigma < \tau} \ell^\sigma(x) + \langle r, x \rangle \right)$ for a uniformly random (in each dimension) vector $r$, where the adversary's $\tilde{y}^\sigma$ that determines $\ell^\sigma$ can be efficiently computed using the procedure described in Section 3.2.
4:  end for
Algorithm 1 Randomized algorithm for combinatorial strategies
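Putting the pieces together, a sketch of Algorithm 1 might look as follows; it reuses ftpl_topk_step from above, adversary_response is a hypothetical helper standing in for the estimation procedure of Section 3.2, and the loss-coefficient update follows the affine form (10).

```python
import numpy as np

def run_algorithm_1(a, s, k, T, eps, rng):
    """Sketch of Algorithm 1: run FTPL up to a uniformly random round t and
    output the (randomized) pure strategy of that round."""
    n = len(s)
    t_star = rng.integers(1, T + 1)                # line 1: uniformly random stopping round
    cum_loss = np.zeros(n)
    x = np.zeros(n)
    for tau in range(1, t_star + 1):               # lines 2-4
        x = ftpl_topk_step(cum_loss, k, eps, rng)
        y = adversary_response(cum_loss, k, a, s, 2000, eps, rng)  # hypothetical helper, Section 3.2
        cum_loss += np.where(y == 1, 0.0, -a * s)  # coefficients of the affine loss (10)
    return x                                       # the min player's output strategy
```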

3.2 Computing the Adversary’s Strategy

We now show that the adversarial player's strategy $\tilde{y}^t$ satisfying, with high probability,

$$C(\bar{x}^t, \tilde{y}^t) \ge \max_{y \in Y} C(\bar{x}^t, y) - \varepsilon,$$

where $\varepsilon$ is an error from estimation, can be efficiently computed (thereby also determining the loss function $\ell^t$) by estimating each probability $q^t_i$ that the min player selects node $i$.

Recall that the cost value can be computed as

$$C(\bar{x}^t, y) = \langle a, \hat{s}(\bar{x}^t, y) \rangle,$$

where $s'(\bar{x}^t)$, with $s'_i(\bar{x}^t) = (1 - \bar{x}^t_i) s_i$, is the expected internal opinion vector after modification by the min player. The randomized pure strategy produced by the follow-the-perturbed-leader algorithm at that time step provides a randomized way to modify $k$ entries of the vector $s$ to the value $0$.

Let $q^t_i$ denote the probability that it chooses to modify node $i$ at time step $t$. Note that $\sum_i q^t_i = k$. For each $i$, we can draw $N$ samples (each of which is a $k$-subset of nodes) from the distribution $p^t$. We let $\hat{q}^t_i$ denote the ratio of the number of samples in which node $i$ is chosen to the number $N$. By applying Hoeffding's inequality, we have

$$\Pr\left[\,\left|\hat{q}^t_i - q^t_i\right| \ge \varepsilon'\,\right] \le 2 e^{-2 N \varepsilon'^2}.$$

That is, by choosing $N = O(\ln(n/\delta)/\varepsilon'^2)$, we can use estimated probabilities that are each within an estimation error $\varepsilon'$ of the actual ones with probability at least $1 - \delta$.

Then the expected cost of the min player (before the adversarial player's intervention) is

$$\sum_{i} a_i (1 - q^t_i)\, s_i.$$

Now the adversarial player would like to increase the min player's cost as much as possible. For the max player, by compromising node $i$, the expected cost can be increased by

$$\Delta_i = a_i - a_i (1 - q^t_i)\, s_i.$$

Thus, the adversarial (max) player simply chooses the $k$ nodes with the largest $\Delta_i$'s. Note that using $\hat{q}^t_i$ for each node $i$ incurs an estimation error that jointly guarantees computing the adversary's $\tilde{y}^t$ efficiently such that, with probability at least $1 - \delta$,

$$C(\bar{x}^t, \tilde{y}^t) \ge \max_{y \in Y} C(\bar{x}^t, y) - \varepsilon. \qquad (11)$$
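A sketch of this estimation-and-selection procedure (reusing ftpl_topk_step from Section 3.1; the function name, argument list, and sample size are illustrative, with the sample size in practice set by the Hoeffding bound above):

```python
import numpy as np

def adversary_response(cum_loss, k, a, s, num_samples, eps, rng):
    """Estimate the max player's (near-)best response against the min player's
    current FTPL distribution by sampling, then pick the k nodes whose
    compromise raises the expected cost the most."""
    n = len(cum_loss)
    counts = np.zeros(n)
    for _ in range(num_samples):                     # num_samples chosen via the Hoeffding bound
        counts += ftpl_topk_step(cum_loss, k, eps, rng)
    q_hat = counts / num_samples                     # estimated selection probabilities q_i
    # expected contribution of node i before the adversary acts: a_i * (1 - q_i) * s_i;
    # setting node i's internal opinion to 1 raises it to a_i, so the estimated gain is:
    gain = a - a * (1.0 - q_hat) * s
    y = np.zeros(n)
    y[np.argsort(-gain)[:k]] = 1.0                   # compromise the k largest gains
    return y
```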

3.3 Equilibrium Strategy Analysis

Let the min player play the strategy output by the randomized algorithm, and let the adversarial player's strategy be the one maximizing the loss given the min player's chosen strategy. First, it can be shown that the play output by the randomized algorithm has $O(\sqrt{T})$ regret. This is achieved naturally in the sense of the min player's expected losses, since there is a random vector $r$ as a random source that produces the distribution $p^t$.

Lemma 3

For the min player, follow-the-perturbed-leader algorithms are $O(\sqrt{T})$-regret w.r.t. her respective loss functions depending on the adversary's strategies $\tilde{y}^t$, i.e.,

$$\mathbb{E}\left[\sum_{t=1}^{T} \ell^t(x^t)\right] \le \min_{x \in X} \sum_{t=1}^{T} \ell^t(x) + O(\sqrt{T}).$$

Proof 4

We apply Theorem 1.1(a) of [15] in our context with the random vector $r$ chosen uniformly at random in each dimension.

Since the randomized algorithm chooses the time step $t$ uniformly at random from $\{1, \ldots, T\}$, we let $\hat{x}$ denote the strategy it outputs, i.e., $\hat{x} = x^t$ with probability $1/T$ for each $t \in \{1, \ldots, T\}$.

Then, we are ready to state the main result.

Theorem 5

The strategy $\hat{x}$ output by the randomized algorithm for the min player (against the adversarial player), which has the $O(\sqrt{T})/T$-average regret property, is an $(O(1/\sqrt{T}) + \varepsilon)$-approximate equilibrium strategy with high probability.

Proof 6

For the min player, we have that

$$\mathbb{E}[C(\hat{x}, y)] = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[C(x^t, y)]$$

by the linearity of expectation and

$$\mathbb{E}[C(x^t, y)] = C(\bar{x}^t, y)$$

for any $y \in Y$, since $C(\cdot, y)$ is affine in the min player's strategy.

For each $t$, applying Inequality (11) that accounts for estimation, we obtain with probability at least $1 - \delta$ (by a union bound, choosing the per-step failure probability to be $\delta/T$)

$$\max_{y \in Y} \frac{1}{T} \sum_{t=1}^{T} C(\bar{x}^t, y) \le \frac{1}{T} \sum_{t=1}^{T} \max_{y \in Y} C(\bar{x}^t, y) \le \frac{1}{T} \sum_{t=1}^{T} C(\bar{x}^t, \tilde{y}^t) + \varepsilon.$$

Due to the fact that $C(\cdot, y)$ is affine in $x$, the right-hand side of the inequality is equivalent to

$$\frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[\ell^t(x^t)] + \varepsilon.$$

By the $O(\sqrt{T})/T$-average regret property from Lemma 3, we finally have

$$\max_{y \in Y} \mathbb{E}[C(\hat{x}, y)] \le \min_{x \in X} \frac{1}{T} \sum_{t=1}^{T} C(x, \tilde{y}^t) + O\!\left(\frac{1}{\sqrt{T}}\right) + \varepsilon \le \min_{x \in X} \max_{y \in Y} C(x, y) + O\!\left(\frac{1}{\sqrt{T}}\right) + \varepsilon.$$

4 Discussions and Future Work

One does not necessarily have to use linear objectives. For example, the objective given by the sum of the node players' costs is not a linear one. Although we focus on computing the min player's equilibrium strategy due to our model structure in this paper, this does not preclude the possibility of exploring other suitable models for competitive opinion optimization that might allow computing or learning equilibrium-inducing strategies for all players.

As future directions, we can generalize competitive opinion optimization to multi-player non-zero-sum games with different (linear) objectives in terms of expressed opinions, where each player optimizes her own objective. Playing certain no-regret algorithms, the average strategy of each player might then converge to some more permissive equilibrium (Nash equilibrium, correlated equilibrium, etc.). It does not really make sense in a zero-sum game to ask about the price of anarchy. Nevertheless, the price-of-anarchy type of question becomes interesting and meaningful again in a non-zero-sum setting.

Appendix A Computing Matrix $Q_{UB}$

We restate the computation from Section 3.3 of [12]. The transition matrix $P$ is constructed by normalizing each row vector of the weight matrix $W$. Given the set of absorbing nodes $B$ and the set of transient nodes $U$, $P$ can be partitioned into submatrices $P_{UU}$, $P_{UB}$, the identity matrix $I$ (for transitions within $B$), and the all-zero matrix $\mathbf{0}$ (for transitions from $B$ to $U$):

$$P = \begin{pmatrix} P_{UU} & P_{UB} \\ \mathbf{0} & I \end{pmatrix},$$

where $P_{UB}$ is the submatrix with the transition probabilities from transient nodes to absorbing nodes and $P_{UU}$ is the submatrix with the transition probabilities between transient nodes.

The probability of transitioning from $i \in U$ to $j \in U$ in exactly $\ell$ steps is the $(i, j)$ entry of the matrix $P_{UU}^{\ell}$. We can construct the fundamental matrix $F = \sum_{\ell \ge 0} P_{UU}^{\ell} = (I - P_{UU})^{-1}$ of the absorbing random walk, where the entry $F_{ij}$ is the expected number of visits to transient node $j$ by a random walk starting from $i$ before being absorbed.

Finally, we have that

$$Q_{UB} = F\, P_{UB},$$

where each entry $(Q_{UB})_{ij}$ of this matrix is the probability that a random walk starting at transient node $i$ gets absorbed at absorbing node $j$.

References

  • [1] J. Abernethy, P. L. Bartlett, and E. Hazan. Blackwell approachability and no-regret learning are equivalent. In Proc. of Conference on Learning Theory, 2011.
  • [2] A. Ahmadinejad and H. Mahini. How effectively can we form opinions? In Proc. of International World Wide Web Conference, 2014.
  • [3] S. Bharathi, D. Kempe, and M. Salek. Competitive influence maximization in social networks. In Proc. of 3rd Workshop on Web and Internet Economics, 2007.
  • [4] K. Bhawalkar, S. Gollapudi, and K. Munagala. Coevolutionary opinion formation games. In Proc. 45th ACM Symposium on Theory of Computing, 2013.
  • [5] D. Bindel, J. Kleinberg, and S. Oren. How bad is forming your own opinion? In Proc. of 52nd Annual IEEE Symposium on Foundations of Computer Science, 2011.
  • [6] P.-A. Chen, Y.-L. Chen, and C.-J. Lu. Bounds on the price of anarchy for a more general class of directed graphs in opinion formation games. Operations Research Letters, 44, 2016.
  • [7] F. Chierichetti, J. Kleinberg, and S. Oren. On discrete preferences and coordination. In Proc. 14th ACM Conference on Electronic Commerce, 2013.
  • [8] C. Daskalakis, A. Deckelbaum, and A. Kim. Near-optimal no-regret algorithms for zero-sum games. Games and Economic Behavior, 92, 2015.
  • [9] M. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345), 1974.
  • [10] P. Doyle and J. Snell, editors. Random walks and electric networks. Mathematical Association of America, 1984.
  • [11] N. E. Friedkin and E. C. Johnsen. Social influence and opinions. The Journal of Mathematical Sociology, 15(3-4), 1990.
  • [12] A. Gionis, E. Terzi, and P. Tsaparas. Opinion maximization in social networks. In Proc. 13th SIAM International Conference on Data Mining, 2013.
  • [13] S. Goyal, H. Heidari, and M. Kearns. Competitive contagion in networks. Games and Economic Behavior, 2014.
  • [14] X. He and D. Kempe. Price of anarchy for the n-player competitive cascade game with submodular activation functions. In Proc. 9th Workshop on Internet and Network Economics, 2013.
  • [15] A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71, 2005.
  • [16] E. Koutsoupias and C. Papadimitriou. Worst-case equilibria. In Proc. 17th Annual Symposium on Theoretical Aspects of Computer Science, 1999.
  • [17] N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani, editors. Algorithmic Game Theory. Cambridge University Press, 2007.
  • [18] T. Roughgarden and F. Schoppmann. Local smoothness and the price of anarchy in atomic splittable congestion games. In Proc. 22nd ACM-SIAM Symposium on Discrete Algorithms, 2011.
  • [19] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proc. 20th International Conference on Machine Learning, 2003.