Distributed optimization and game theory provide powerful frameworks for optimization problems arising in multi-agent systems. In generic distributed optimization problems, the cost functions of agents are distributed across the network, meaning that each agent has only partial information about the overall optimization problem to be solved. Game-theoretic problems arise in such networks when the agents do not cooperate with each other and the cost functions of these non-cooperative agents are coupled through the decisions of all agents in the system. Applications of game-theoretic and distributed optimization approaches include, for example, electricity markets, power systems, flow control problems, and communication networks [11, 12, 6].
On the other hand, cooperation and competition coexist in many practical situations, such as cloud computing, hierarchical optimization in the Smart Grid, and adversarial networks [3, 4, 8]. A body of recent work has been devoted to the analysis of non-cooperative games and distributed optimization problems in terms of a single model called n-cluster games [16, 19, 17, 18, 20, 5]. In such n-cluster games, each cluster corresponds to a player whose goal is to minimize her own cost function. However, the clusters in this game are not the actual decision-makers, as the optimization of the cluster's objective is controlled by the agents belonging to the corresponding cluster. Each such agent has her own local cost function, which is available only to this agent but depends on the joint actions of agents in all clusters. The cluster's objective, in turn, is the sum of the local cost functions of the agents within the cluster. Therefore, in such models, each agent seeks a strategy that achieves a Nash equilibrium in the resulting n-cluster game, which is a stable state in which each cluster's cost function is minimized in response to the actions of the agents from the other clusters.
Continuous-time algorithms for the distributed Nash equilibrium seeking problem in multi-cluster games were proposed in [19, 17, 18]. The paper  solves an unconstrained multi-cluster game by using gradient-based algorithms, whereas the works  and  propose gradient-free algorithms, based on zero-order information, for seeking Nash and generalized Nash equilibria, respectively. In the discrete-time domain, the work  presents a leader-follower based algorithm, which can solve unconstrained multi-cluster games in linear time. The authors in  extend this result to the case of a leaderless architecture. Both papers [5, 20] prove linear convergence in games with strongly monotone mappings and first-order information, meaning that agents can calculate gradients of their cost functions and use this information to update their states. In contrast, the work  deals with a gradient-free approach to cluster games. However, the gradient estimations are constructed in such a way that only convergence to a neighborhood of the equilibrium can be guaranteed. Moreover, these estimations are obtained by using two query points, which requires extra coordination between the agents.
Motivated by the relevance of n-cluster game models in many engineering applications, we present a discrete-time distributed procedure to seek Nash equilibria in n-cluster games with zero-order information. We consider settings in which agents can communicate with their direct neighbors within the corresponding cluster over some undirected graph. However, in many practical situations the agents do not know the functional form of their objectives and can only access the current values of their objective functions at some query point. Such situations arise, for example, in electricity markets with unknown price functions . In such cases, the information structure is referred to as a zero-order oracle. Our work focuses on zero-order oracle information settings and, thus, assumes that agents have no access to the analytical form of their cost functions and gradients. The agents instead construct their local query points and get the corresponding cost values from the oracle. Based on these values, the agents estimate their local gradients in order to perform the step of the gradient play procedure. We formulate sufficient conditions and provide a concrete example of how to estimate the gradients so as to guarantee almost sure convergence of the resulting algorithm to Nash equilibria in n-cluster games with strictly monotone game mappings. To the best of our knowledge, we present the first algorithm solving n-cluster games with a zero-order oracle and the corresponding one-point gradient estimations.
The paper is organized as follows. In Section II we formulate the n-cluster game with an undirected communication topology in each cluster and zero-order oracle information. Section III introduces the gradient play algorithm, which is based on one-point gradient estimations. The convergence result is presented in Section III as well. Section IV provides an example of query points and gradient estimations that guarantee convergence of the algorithm discussed in Section III. Section V presents some simulation results. Finally, Section VI concludes the paper.
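Notations. The set $\{1, \ldots, n\}$ is denoted by $[n]$. For any function $f: K \to \mathbb{R}$, $K \subseteq \mathbb{R}^n$, $\nabla_i f(x) = \partial f(x) / \partial x_i$ is the partial derivative taken with respect to the $i$th coordinate of the vector variable $x \in \mathbb{R}^n$. We consider the real normed space $E$, which is the space of real vectors, i.e. $E = \mathbb{R}^n$. We use $(u, v)$ to denote the inner product in $E$. We use $\|\cdot\|$ to denote the Euclidean norm induced by the standard dot product in $E$. Any mapping $g: E \to E$ is said to be strictly monotone on $Q \subseteq E$ if $(g(u) - g(v), u - v) > 0$ for any $u, v \in Q$, where $u \neq v$. We use $B_r(p)$ to denote the ball of radius $r \geq 0$ centered at $p \in E$ and $S$ to denote the unit sphere centered at $0 \in E$. We use $P_{\Omega}\{v\}$ to denote the projection of $v \in E$ onto a set $\Omega \subseteq E$. The mathematical expectation of a random value $\xi$ is denoted by $\mathbb{E}\{\xi\}$. We use the big-$O$ notation, that is, the function $f(x)$ is $O(g(x))$ as $x \to a$, written $f(x) = O(g(x))$ as $x \to a$, if $\lim_{x \to a} |f(x)| / |g(x)| \leq K$ for some positive constant $K$.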
II Nash Equilibrium Seeking
II-A Problem Formulation
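We consider a non-cooperative game between $n$ clusters. Each cluster $i \in [n]$ itself consists of $n_i$ agents. Let $J_i^j$ and $\Omega_i^j \subseteq \mathbb{R}$ denote, respectively, the cost function and the feasible action set of the agent $j$ in the cluster $i$. (Footnote 1: All results below are applicable to games with different dimensions of the action sets $\Omega_i^j$. The one-dimensional case is considered for the sake of notational simplicity.) We denote the joint action set of the agents in the cluster $i$ by $\Omega_i = \Omega_i^1 \times \ldots \times \Omega_i^{n_i}$. Each function $J_i^j(x_i, x_{-i})$, $i \in [n]$, depends on $x_i = (x_i^1, \ldots, x_i^{n_i}) \in \Omega_i$, which represents the joint action of the agents within the cluster $i$, and $x_{-i} \in \Omega_{-i} = \Omega_1 \times \ldots \times \Omega_{i-1} \times \Omega_{i+1} \times \ldots \times \Omega_n$, denoting the joint action of the agents from all clusters except for the cluster $i$.
The cooperative cost function in the cluster $i$ is, thus, $J_i(x_i, x_{-i}) = \sum_{j=1}^{n_i} J_i^j(x_i, x_{-i})$.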
We assume that the agents within each cluster $i \in [n]$ can interact over an undirected communication graph $G_i([n_i], A_i)$. The set of nodes is the set of the agents $[n_i]$, and the set of undirected arcs $A_i$ is such that $(k, j) \in A_i$ if and only if $(j, k) \in A_i$, i.e. there is a bidirectional communication link between $k$ and $j$, over which information in the form of a message can be sent from the agent $k$ to the agent $j$ and vice versa in the cluster $i$.
However, there is no explicit communication between the clusters. Instead, we consider the following zero-order information structure in the system: no agent has access to the analytical form of any cost function, including its own. Each agent can only observe the value of its local cost function given any joint action of all agents in the system. Formally, given a joint action $x \in \Omega = \Omega_1 \times \ldots \times \Omega_n$, each agent $j \in [n_i]$, $i \in [n]$, receives the value $J_i^j(x)$ from a zero-order oracle. In particular, no agent has or receives any information about the gradient.
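The information structure described above can be illustrated by a minimal Python sketch. The class name `ZeroOrderOracle`, the helper `make_toy_costs`, and the quadratic toy costs for two clusters of two agents each are assumptions made purely for illustration; none of them come from the paper.

```python
from typing import Callable, List, Sequence

# joint_action[i][j] is the action of agent j in cluster i
JointAction = Sequence[Sequence[float]]


class ZeroOrderOracle:
    """Hypothetical oracle: exposes cost values only, never formulas or gradients."""

    def __init__(self, costs: List[List[Callable[[JointAction], float]]]) -> None:
        self._costs = costs   # costs[i][j]: hidden local cost of agent j in cluster i
        self.num_queries = 0  # bookkeeping only

    def value(self, cluster: int, agent: int, joint_action: JointAction) -> float:
        """Return the local cost of one agent at the queried joint action."""
        self.num_queries += 1
        return float(self._costs[cluster][agent](joint_action))


def make_toy_costs() -> List[List[Callable[[JointAction], float]]]:
    """Illustrative costs for 2 clusters with 2 agents each (an assumption)."""

    def local_cost(i: int, j: int) -> Callable[[JointAction], float]:
        other = 1 - i  # index of the opposing cluster
        return lambda x: (x[i][j] - 1.0) ** 2 + 0.5 * x[i][j] * sum(x[other])

    return [[local_cost(i, j) for j in range(2)] for i in range(2)]


oracle = ZeroOrderOracle(make_toy_costs())
x = [[1.0, 0.0], [2.0, 0.0]]
print(oracle.value(0, 0, x))  # agent 0 of cluster 0 sees a number, nothing else
```

An agent interacting with such an object can build gradient estimates only from the returned numbers, which is the setting the rest of the section works in.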
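Let us denote the game between the clusters introduced above by $\Gamma(n, \{J_i\}, \{\Omega_i\}, \{G_i\})$. We make the following assumptions regarding the game $\Gamma$:
Assumption 1. The $n$-cluster game under consideration is strictly convex. Namely, for all $i \in [n]$, the set $\Omega_i$ is convex, and the cost function $J_i(x_i, x_{-i})$ is continuously differentiable in $x_i$ for each fixed $x_{-i}$. Moreover, the game mapping, which is defined as
$$M(x) = [\nabla_{x_1} J_1(x_1, x_{-1}), \ldots, \nabla_{x_n} J_n(x_n, x_{-n})]^T, \quad (1)$$
where $\nabla_{x_i} J_i$ denotes the gradient of $J_i$ with respect to the cluster's joint action $x_i$, is strictly monotone on $\Omega = \Omega_1 \times \ldots \times \Omega_n$.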
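As an illustration of the strict monotonicity requirement, the following sketch evaluates the game mapping of a hypothetical quadratic game with two clusters of two agents each and samples the monotonicity inequality at random pairs of points. The toy costs, the function names, and the sampling range are assumptions. A sampled check of this kind can only refute strict monotonicity, never prove it.

```python
import random
from typing import List

Vector = List[float]


def game_mapping(x: Vector) -> Vector:
    """Game mapping of a toy game, x = [x_1^1, x_1^2, x_2^1, x_2^2].

    Assumed local cost of agent j in cluster i:
        (x_i^j - 1)^2 + 0.5 * x_i^j * (sum of the other cluster's actions).
    Each entry is the partial derivative of the cluster cost (sum of its
    agents' local costs) with respect to one action of that cluster.
    """
    s1, s2 = x[0] + x[1], x[2] + x[3]
    return [
        2.0 * (x[0] - 1.0) + 0.5 * s2,
        2.0 * (x[1] - 1.0) + 0.5 * s2,
        2.0 * (x[2] - 1.0) + 0.5 * s1,
        2.0 * (x[3] - 1.0) + 0.5 * s1,
    ]


def monotonicity_gap(u: Vector, v: Vector) -> float:
    """Inner product (M(u) - M(v), u - v); must be positive for u != v."""
    mu, mv = game_mapping(u), game_mapping(v)
    return sum((a - b) * (p - q) for a, b, p, q in zip(mu, mv, u, v))


def looks_strictly_monotone(num_pairs: int = 1000, seed: int = 0) -> bool:
    """Return False as soon as a sampled pair violates the inequality."""
    rng = random.Random(seed)
    for _ in range(num_pairs):
        u = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        v = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        if u != v and monotonicity_gap(u, v) <= 0.0:
            return False
    return True


print(looks_strictly_monotone())
```

For this particular toy game the mapping is affine with a symmetric positive definite matrix, so the sampled inequality holds for every pair; for a general game the check is only a sanity test.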
Assumption 2. Each function is Lipschitz continuous on .
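Assumption 3. The action sets $\Omega_i^j$, $j \in [n_i]$, $i \in [n]$, are compact. Moreover, for each $i \in [n]$ there exists a so-called safety ball $B_{r_i}(p_i) \subseteq \Omega_i$ with $r_i > 0$ and $p_i \in \Omega_i$. (Footnote 2: Existence of the safety ball is required to construct feasible points for the costs' gradient estimations in the zero-order settings under consideration (see ).)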
The assumptions above are standard in the literature on both game-theoretic and zero-order optimization . Finally, we make the following assumption on the communication graph, which guarantees sufficient information "mixing" in the network within each cluster.
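Assumption 4. The underlying undirected communication graph $G_i([n_i], A_i)$ is connected for all $i \in [n]$. The associated non-negative mixing matrix $W_i = [w_i^{kj}] \in \mathbb{R}^{n_i \times n_i}$ defines the weights on the undirected arcs such that $w_i^{kj} > 0$ if and only if $(k, j) \in A_i$ and $\sum_{k=1}^{n_i} w_i^{kj} = 1$, $\forall j \in [n_i]$.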
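One common way to obtain a mixing matrix of this kind on a connected undirected graph is the Metropolis weighting rule. The sketch below is an assumption about how such a matrix could be built; the paper does not prescribe this particular construction, and the function name `metropolis_weights` is hypothetical. Note that the rule also places positive weight on the diagonal, i.e. each agent keeps part of its own estimate.

```python
from typing import List, Sequence, Set, Tuple


def metropolis_weights(num_agents: int, edges: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Symmetric mixing matrix whose rows and columns each sum to one."""
    neighbors: List[Set[int]] = [set() for _ in range(num_agents)]
    for k, j in edges:
        if k != j:  # self-loops are handled through the diagonal below
            neighbors[k].add(j)
            neighbors[j].add(k)

    weights = [[0.0] * num_agents for _ in range(num_agents)]
    for k in range(num_agents):
        for j in neighbors[k]:
            # Off-diagonal weight depends on the larger of the two degrees
            weights[k][j] = 1.0 / (1 + max(len(neighbors[k]), len(neighbors[j])))
        # Diagonal entry absorbs the remaining mass so the row sums to one
        weights[k][k] = 1.0 - sum(weights[k])
    return weights


# Path graph 0 - 1 - 2 inside a single cluster
W = metropolis_weights(3, [(0, 1), (1, 2)])
for row in W:
    print([round(w, 3) for w in row])
```

Because the matrix is symmetric and its rows sum to one, its columns sum to one as well, so repeated mixing preserves the average of the agents' estimates.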
One of the stable solutions in any game corresponds to a Nash equilibrium defined below.
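Definition 1. A vector $x^* = [x_1^*, \ldots, x_n^*]^T \in \Omega$ is called a Nash equilibrium if for any $i \in [n]$ and $x_i \in \Omega_i$
$$J_i(x_i^*, x_{-i}^*) \leq J_i(x_i, x_{-i}^*).$$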
II-B Existence and Uniqueness of the Nash Equilibrium
In this subsection, we demonstrate the existence of the Nash equilibrium for the game $\Gamma$ under Assumptions 1 and 3. For this purpose, we recall the results connecting Nash equilibria and solutions of variational inequalities from .
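Definition 2. Consider a set $Q \subseteq \mathbb{R}^d$ and a mapping $g: Q \to \mathbb{R}^d$. A solution $SOL(Q, g)$ to the variational inequality problem $VI(Q, g)$ is a set of vectors $q^* \in Q$ such that $(g(q^*), q - q^*) \geq 0$ for any $q \in Q$.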
The following theorem is a well-known result on the connection between Nash equilibria in games and solutions of a definite variational inequality (see Corollary 1.4.2 in ).
Theorem 1. Consider a non-cooperative game $\Gamma$. Suppose that the action sets of the players are closed and convex, and the cost functions $J_i(x_i, x_{-i})$ are continuously differentiable and convex in $x_i$ for every fixed $x_{-i}$ on the interior of the joint action set $\Omega$. Then, some vector $x^* \in \Omega$ is a Nash equilibrium in $\Gamma$ if and only if $x^* \in SOL(\Omega, M)$, where $M$ is the game mapping defined by (1).
Next, we formulate the result guaranteeing existence and uniqueness of $SOL(Q, g)$ in the case of a strictly monotone map (see Corollary 2.2.5 and Proposition 2.3.3 in ).
Theorem 2. Given the variational inequality problem $VI(Q, g)$, suppose that $Q$ is compact and the mapping $g$ is strictly monotone. Then, the solution $SOL(Q, g)$ exists and is a singleton.
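A standard characterization states that, for a closed convex set, a point solves the variational inequality exactly when it is a fixed point of the projected map "step against the mapping, then project back". The sketch below uses this fact with full first-order information on a hypothetical toy mapping over a box, only to make the link between Nash equilibria and variational inequalities tangible. The mapping, the box bounds, the step size, and the function names are assumptions, and this is not the zero-order procedure studied in the paper.

```python
from typing import Callable, List

Vector = List[float]


def project_box(v: Vector, lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^m."""
    return [min(max(c, lo), hi) for c in v]


def toy_mapping(x: Vector) -> Vector:
    """Assumed affine, strongly monotone mapping (2 clusters x 2 agents)."""
    s1, s2 = x[0] + x[1], x[2] + x[3]
    return [
        2.0 * (x[0] - 1.0) + 0.5 * s2,
        2.0 * (x[1] - 1.0) + 0.5 * s2,
        2.0 * (x[2] - 1.0) + 0.5 * s1,
        2.0 * (x[3] - 1.0) + 0.5 * s1,
    ]


def solve_vi_by_projection(g: Callable[[Vector], Vector], x0: Vector, lo: float,
                           hi: float, step: float = 0.2, iters: int = 200) -> Vector:
    """Iterate x <- P(x - step * g(x)); its fixed points solve the VI on the box."""
    x = list(x0)
    for _ in range(iters):
        gx = g(x)
        x = project_box([xi - step * gi for xi, gi in zip(x, gx)], lo, hi)
    return x


def min_vi_gap(g: Callable[[Vector], Vector], x_star: Vector, lo: float, hi: float) -> float:
    """Minimum over the box of (g(x*), q - x*); non-negative iff x* solves the VI.

    The expression is linear in q, so the minimum separates over coordinates
    and is attained at one of the two endpoints of each coordinate.
    """
    gx = g(x_star)
    return sum(min(gc * (lo - xc), gc * (hi - xc)) for gc, xc in zip(gx, x_star))


x_star = solve_vi_by_projection(toy_mapping, [0.0] * 4, lo=0.0, hi=0.5)
print(x_star, min_vi_gap(toy_mapping, x_star, 0.0, 0.5))
```

With the box chosen here the unconstrained zero of the mapping lies outside the feasible set, so the solution sits on the boundary and the projection is active.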
III Main Results
III-A Zero-order gradient play between clusters
To deal with the zero-order information available to the agents and local state exchanges within the clusters, we assume each agent from the cluster maintains a local variable
which is her estimation of the joint action of the agents from her cluster . Here, is player ’s estimate of and is the action of agent from cluster . The goal of the agents within each cluster is to update their local variables in such a way that the joint action with converges to the Nash equilibrium in the game between the clusters as time runs. To let the agents achieve this goal, we aim to adapt the standard projected gradient play approach to the cluster game with the zero-order information.
At this point we assume each agent , , based on its local estimation , constructs a feasible query point and sends it to the oracle. As a reply from the oracle, the agent receives the value . The vector here corresponds to the point obtained by some combination of the query vectors sent by the agents from the other clusters. Formally,
where denotes some agent from the cluster , . Further each agent , , uses the received value to obtain the random estimation of her local cost’s gradient at the point , where
corresponds to the local estimations of other agents (one for each cluster different from ) based on which query points are obtained. Thus, . As is an estimation of , we represent this vector by the following decomposition:
where is a random vector reflecting inaccuracy of the obtained estimation, i.e. the estimation error vector. Note that for the joint query point the oracle is free to choose any combination of the local queries defined in (3).
Now we are ready to formulate the gradient play between the clusters. Starting with an arbitrary , each agent updates the local estimation vector , , , as follows:
where the time-dependent parameter corresponds to the step size.
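To make the structure of such an iteration concrete, the following sketch combines three ingredients mentioned in the text: mixing of the local estimates with the cluster's weights, a one-point gradient estimate built from a single oracle value, and a projection onto the feasible set. The exact form of the update, the sphere-sampling estimator, the box-shaped action set with its inscribed safety ball, and all names are assumptions made for illustration; they are not taken from the paper's equations, and the parameter values are not claimed to satisfy the paper's conditions.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def project_box(v: Vector, lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^m (stand-in for the action set)."""
    return [min(max(c, lo), hi) for c in v]


def unit_sphere_sample(dim: int, rng: random.Random) -> Vector:
    """Uniform sample from the unit sphere via a normalized Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


def one_point_estimate(cost_value: Callable[[Vector], float], x: Vector, delta: float,
                       lo: float, hi: float, rng: random.Random) -> Tuple[Vector, Vector]:
    """Return (gradient estimate, query point) using one cost evaluation.

    The estimate is shrunk toward the center of the safety ball before the
    perturbation is added, so the query stays feasible (needs 0 < delta < radius).
    """
    dim = len(x)
    center, radius = 0.5 * (lo + hi), 0.5 * (hi - lo)  # ball inscribed in the box
    u = unit_sphere_sample(dim, rng)
    shrink = 1.0 - delta / radius
    query = [center + shrink * (c - center) + delta * uc for c, uc in zip(x, u)]
    value = cost_value(query)  # the only oracle call
    return [dim / delta * value * uc for uc in u], query


def gradient_play_step(estimates: List[Vector], weights: List[List[float]],
                       cost_values: List[Callable[[Vector], float]], step: float,
                       delta: float, lo: float, hi: float, rng: random.Random) -> List[Vector]:
    """One assumed iteration for all agents of a single cluster."""
    new_estimates = []
    for j, row in enumerate(weights):
        dim = len(estimates[j])
        # Consensus part: weighted average of the neighbors' estimates
        mixed = [sum(w * est[c] for w, est in zip(row, estimates)) for c in range(dim)]
        grad, _ = one_point_estimate(cost_values[j], estimates[j], delta, lo, hi, rng)
        new_estimates.append(project_box([m - step * g for m, g in zip(mixed, grad)], lo, hi))
    return new_estimates


rng = random.Random(0)
# Hypothetical local costs; the other clusters' queries are hidden inside the oracle
costs = [lambda q: (q[0] - 0.3) ** 2, lambda q: (q[1] - 0.7) ** 2]
estimates = [[0.0, 1.0], [1.0, 0.0]]
weights = [[0.5, 0.5], [0.5, 0.5]]
for _ in range(10):
    estimates = gradient_play_step(estimates, weights, costs, 0.05, 0.1, 0.0, 1.0, rng)
print(estimates)
```

The estimator used here is the classical single-evaluation sphere-smoothing estimate, which targets the gradient of a smoothed version of the cost at the shrunk point. The difference between such an estimate and the true gradient plays the role of the estimation error discussed in the text.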
Let be the $\sigma$-algebra generated by the estimations up to time , , . Let be the running average of the agents' estimation vectors within the cluster . The following proposition describes the behavior of in the long run.
In view of the proposition above and to be able to analyze behavior of the algorithm by means of the running averages , we make the following assumption on the balance between the step size and the error term .
Assumption 5. The step size and the error term are such that
In Section IV we shed light on how the gradients can be sampled to guarantee fulfillment of Assumption 5. With Proposition 1 in place, we are ready to prove the main result formulated in the theorem below.
Let be the unique Nash equilibrium in the game , see Theorem 3. We proceed with estimating the distance between and . Let . As , we can use the non-expansiveness of the projection operator to conclude that almost surely (a.s.) (Footnote 4: In the following discussion, the big-$O$ notation is defined under the limit $t \to \infty$; see Notations.)
where in the last equality we used (5), which implies that a.s.
And, as , we obtain that a.s.
Thus, we get from (7)
Analogously to (16)
Therefore, by averaging both sides of (33) over and taking the conditional expectation with respect to (below we use the notation ), we obtain that a.s.
Moreover, due to Assumption 5,
almost surely. Thus,
Next, let us introduce the vector where . Therefore, summing (38) over implies