1 Introduction
Controlling a robot in an environment where it interacts with other agents is a complex task. Traditional approaches in the literature adopt a predict-then-plan architecture. First, predictions of other agents’ trajectories are computed; these are then fed into a planner that treats them as immutable obstacles. This approach is limiting because the effect of the robot’s trajectory on the other agents is ignored. Moreover, it can lead to the “frozen robot” problem, which arises when the planner finds that all paths to the goal are unsafe Trautman2010. It is, therefore, crucial for a robot to simultaneously predict the trajectories of other vehicles on the road while planning its own trajectory, in order to capture the reactive nature of all the agents in the scene. ALGAMES provides such a joint trajectory predictor and planner by considering all agents as players in a Nash-style dynamic game. We envision ALGAMES being run online by a robot in a receding-horizon loop, at each iteration planning a trajectory for the robot that explicitly accounts for the reactive nature of all agents in its vicinity.
Joint trajectory prediction and planning in scenarios with multiple interacting agents is well-described by a dynamic game. Dealing with the game-theoretic aspect of multi-agent planning problems is a critical issue with a broad range of applications. For instance, in autonomous driving, ramp merging, lane changing, intersection crossing, and overtaking maneuvers all comprise some degree of game-theoretic interaction Sadigh2016, Sadigh2016a, FridovichKeil2020a, Dreves2018, Fisac2019, Schmerling2018. Other potential applications include mobile robots navigating in crowds, like package-delivery robots, tour guides, or domestic robots; robots interacting with people in factories, such as mobile robots or fixed-base multi-link manipulators; and competitive settings like drone and car racing Spica2018, Liniger2019.
In this work, we seek solutions to constrained multi-player dynamic games. In dynamic games, the players’ strategies are sequences of decisions. It is important to notice that, unlike traditional optimization problems, non-cooperative games have no “optimal” solution. Depending on the structure of the game, asymmetry between players, etc., different solution concepts are possible. In this work, we search for Nash-equilibrium solutions. This type of equilibrium models symmetry between the players: all players are treated equally. At such equilibria, no player can reduce its cost by unilaterally changing its strategy. For extensive details about the game-theory concepts addressed in this paper, we refer readers to the work of Basar et al. Basar1999.

Our solver is aimed at finding a Nash equilibrium for multi-player dynamic games, and can handle general nonlinear state and input constraints. This is particularly important for robotic applications, where the agents often interact through their desire to avoid collisions with one another or with the environment. Such interactions are most naturally represented as (typically nonlinear) state constraints. This is a crucial feature that sets game-theoretic methods for robotics apart from game-theoretic methods in other domains, such as economics, behavioral sciences, and robust control. In those domains, agent interactions are traditionally represented in the objective functions themselves, and the games typically have no state or input constraints. In the mathematics literature, Nash equilibria with constraints are referred to as Generalized Nash Equilibria Facchinei2007. Hence, in this paper we present an augmented Lagrangian solver for finding Generalized Nash Equilibria, specifically tailored to robotics applications.
Our solver assumes that players are rational agents acting to minimize their costs. This rational behavior is formulated using the first-order necessary conditions for Nash equilibria, analogous to the Karush-Kuhn-Tucker (KKT) conditions in optimization. By relying on an augmented Lagrangian approach to handle constraints, the solver is able to solve multi-player games with several agents and a high level of interaction at real-time speeds. Finding a Nash equilibrium for 3 autonomous cars in a freeway merging scenario takes milliseconds. Our primary contributions are:

A general solver for dynamic games aimed at identifying Generalized Nash Equilibrium strategies.

A real-time MPC implementation of the solver that mitigates the “frozen robot” problem arising, in complex driving scenarios, with non-game-theoretic MPC approaches (Fig. 1).

An analysis of the nonuniqueness of Nash equilibria in driving scenarios with constraints and an assessment of the practical impact it has on players’ coordination.

A comparison with iLQGames FridovichKeil2020a. ALGAMES finds Nash equilibria 3 times faster than iLQGames for a fixed constraint satisfaction criterion.
2 Related Work
2.1 Equilibrium Selection
Recent work focused on solving multi-player dynamic games can be categorized by the type of equilibrium it selects. Several works Sadigh2016, Sadigh2016a, Liniger2019, Yoo2012 have opted to search for Stackelberg equilibria, which model an asymmetry of information between players. These approaches are usually formulated for games with two players: a leader and a follower. The leader chooses its strategy first; the follower then selects the best response to the leader’s strategy. Alternatively, a Nash equilibrium does not introduce a hierarchy between players; each player’s strategy is the best response to the other players’ strategies. As pointed out in Fisac2019, searching for open-loop Stackelberg equilibrium strategies can fail on simple examples. In the context of autonomous driving, for instance, when players’ cost functions depend only on their own state and control trajectories, the solution becomes trivial. The leader ignores mutual collision constraints, and the follower has to adapt to this strategy. This behavior can be overly aggressive for the leader (or overly passive for the follower) and does not capture the game-theoretic nature of the problem.
Nash equilibria have been investigated in FridovichKeil2020a, Dreves2018, Spica2018, Britzelmeier2019, Di2018, Di2020, Di2020a. We also take the approach of searching for Nash equilibria, as this type of equilibrium seems better suited to symmetric, multi-robot interaction scenarios. Indeed, we have observed more natural behaviors emerging from Nash equilibria than from Stackelberg equilibria when solving for open-loop strategies.
2.2 GameTheoretic Trajectory Optimization
Most of the algorithms proposed in the robotics literature to solve for game-theoretic equilibria can be grouped into four types. First are algorithms aimed at finding Nash equilibria that rely on decomposition, such as Jacobi or Gauss-Seidel methods Spica2018, Britzelmeier2019, Wang2019a. These algorithms are based on an iterative best-response scheme in which players take turns improving their strategies while considering the other agents’ strategies as immutable. This type of approach is easy to interpret and scales reasonably well with the number of players. However, the convergence of these algorithms is not well understood Facchinei2007, and special care is required to capture the game-theoretic nature of the problem Spica2018. Moreover, solving for a Nash equilibrium until convergence can require many iterations, each of which is a (possibly expensive) trajectory-optimization problem. This can lead to prohibitively long solution times.
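As an illustration of the best-response idea (not taken from any of the cited works), the following sketch runs Gauss-Seidel best-response iterations on a two-player quadratic game with scalar decisions; the costs and the coupling weight are invented for this example:

```python
# Gauss-Seidel iterative best response (IBR) on a toy two-player game:
#   J1(u1, u2) = (u1 - a)^2 + c*(u1 - u2)^2
#   J2(u1, u2) = (u2 - b)^2 + c*(u2 - u1)^2
# Each update is the closed-form minimizer of one player's cost with the
# other player's decision held fixed.

def ibr(a, b, c, iters=100):
    u1, u2 = 0.0, 0.0
    for _ in range(iters):
        u1 = (a + c * u2) / (1.0 + c)   # player 1 best-responds to u2
        u2 = (b + c * u1) / (1.0 + c)   # player 2 best-responds to u1
    return u1, u2

u1, u2 = ibr(a=1.0, b=-1.0, c=0.5)
# At a Nash equilibrium, each player's first-order condition vanishes:
grad1 = 2 * (u1 - 1.0) + 2 * 0.5 * (u1 - u2)
grad2 = 2 * (u2 + 1.0) + 2 * 0.5 * (u2 - u1)
```

For this strongly coupled but convex game the sweep is a contraction and converges; in general, as noted above, convergence of such schemes is not guaranteed.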
Second, there are a variety of algorithms based on dynamic programming. In Fisac2019, a Markovian Stackelberg strategy is computed via dynamic programming. This approach seems to capture the game-theoretic nature of autonomous driving. However, dynamic programming suffers from the curse of dimensionality; practical implementations therefore rely on simplified dynamics models coupled with coarse discretizations of the state and input spaces. To counterbalance these approximations, a lower-level planner informed by the state values under the Markovian Stackelberg strategy is run. This approach, which scales exponentially with the state dimension, has been demonstrated in a two-player setting. Adding more players is likely to prevent real-time application of this algorithm. In contrast, our proposed approach scales polynomially with the number of players (see Section 4.5).

Third, algorithms akin to differential dynamic programming have been developed for robust control Morimoto2003 and later applied to game-theoretic problems FridovichKeil2020a, Di2018. This approach scales polynomially with the number of players and is fast enough to run in real time in an MPC fashion FridovichKeil2020a. However, contrary to ALGAMES, this type of approach does not natively handle constraints. Collision-avoidance constraints are typically handled using large penalties that can result in numerical ill-conditioning which, in turn, can impact the robustness or the convergence rate of the solver. Moreover, it leads to a trade-off between trajectory efficiency and avoiding collisions with other players.

Finally, algorithms analogous to direct methods in trajectory optimization have also been developed Di2020, Di2020a. An algorithm based on a first-order splitting method, known to have a linear convergence rate, was proposed by Di et al. Di2020a; their experiments show convergence of the algorithm within a moderate number of iterations. A different approach based on Newton’s method has been proposed Di2020, but it is restricted to unconstrained dynamic games. Our solver belongs to this family of approaches. It also relies on a second-order Newton-type method, but it is able to handle general state and control-input constraints. In addition, we demonstrate convergence on relatively complex problems, typically in a small number of iterations.
2.3 Generalized Nash Equilibrium Problems
We focus on finding Nash equilibria for multi-player games in which players are coupled through shared state constraints (such as collision-avoidance constraints). Therefore, these problems are instances of Generalized Nash Equilibrium Problems (GNEPs). The operations research field has a rich literature on GNEPs Pang2005, Facchinei2006, Facchinei2009, Facchinei2010, Fukushima2011. Exact penalty methods have been proposed to solve GNEPs Facchinei2006, Facchinei2009. Complex constraints, such as those that couple players’ strategies, are handled using penalties, allowing multi-player games to be solved jointly for all the players. However, these exact penalty methods require the minimization of nonsmooth objective functions, which leads to slow convergence rates in practice.
In the same vein, a penalty approach relying on an augmented Lagrangian formulation of the problem has been advanced by Pang et al. Pang2005. This work, however, converts the augmented Lagrangian formulation into a set of KKT conditions, including complementarity constraints. The resulting constraint-satisfaction problem is solved with an off-the-shelf linear complementarity problem (LCP) solver that exploits the linearity of a specific problem. Our solver, in contrast, is not tailored to a specific example and can solve general GNEPs. It draws inspiration from the augmented Lagrangian formulation, which does not introduce nonsmooth terms into the objective function, enabling fast convergence. Moreover, this formulation avoids ill-conditioning, which improves the numerical robustness of our solver.
3 Problem Statement
In the discretized trajectory-optimization setting with $N$ time steps, we denote by $n$ the state size, $m$ the control-input size, $x_k$ the state, and $u_k^\nu$ the control input of player $\nu$ at time step $k$. In formulating the game, we do not distinguish between the robot carrying out the computation and the other agents whose trajectories it is predicting. All agents are modeled equivalently, as is typical in the case of Nash-style games.
Following the formalism of Facchinei Facchinei2007, we consider the GNEP with $M$ players. Each player $\nu \in \{1, \ldots, M\}$ decides over its control input variables $U^\nu = [(u_1^\nu)^\top, \ldots, (u_{N-1}^\nu)^\top]^\top \in \mathbb{R}^{\bar m^\nu}$. This is player $\nu$’s strategy, where $m^\nu$ denotes the dimension of the control inputs controlled by player $\nu$ and $\bar m^\nu = (N-1)\, m^\nu$ is the dimension of the whole trajectory of player $\nu$’s control inputs. By $U^{-\nu}$, we denote the vector of all the players’ strategies except that of player $\nu$. Additionally, we define the trajectory of state variables $X = [x_2^\top, \ldots, x_N^\top]^\top \in \mathbb{R}^{\bar n}$, where $\bar n = (N-1)\, n$, which results from applying all the control inputs decided by the players to a joint dynamical system,

$$x_{k+1} = f(x_k, u_k^1, \ldots, u_k^M), \qquad (1)$$

with $k$ denoting the time-step index. The kinodynamic constraints over the whole trajectory can be expressed with the concatenated set of equality constraints

$$D(X, U) = 0, \qquad (2)$$

where $U = [(U^1)^\top, \ldots, (U^M)^\top]^\top$ denotes the concatenated strategies of all players.
The cost function of each player is noted $J^\nu(X, U^\nu)$. It depends on player $\nu$’s control inputs $U^\nu$ as well as on the state trajectory $X$, which is shared with all the other players. The goal of player $\nu$ is to select a strategy $U^\nu$ and a state trajectory $X$ that minimize the cost function $J^\nu$. Naturally, the choice of state trajectory is constrained by the other players’ strategies $U^{-\nu}$ and the dynamics of the system via Equation 2. In addition, the strategy $U^\nu$ must respect a set of constraints that depends on the state trajectory as well as on the other players’ strategies (e.g., collision-avoidance constraints). We express this with a concatenated set of inequality constraints $C(X, U) \leq 0$. Formally, each player $\nu$ solves

$$\min_{X, \, U^\nu} \; J^\nu(X, U^\nu) \quad \text{s.t.} \quad D(X, U) = 0, \quad C(X, U) \leq 0.$$

The set of Problems (3) forms a GNEP because of the constraints that couple the strategies of all the players. A solution of this GNEP (a generalized Nash equilibrium) is a vector $\hat U = [(\hat U^1)^\top, \ldots, (\hat U^M)^\top]^\top$ such that, for all $\nu$, $\hat U^\nu$ is a solution to (3) with the other players’ strategies fixed to $\hat U^{-\nu}$. This means that at an equilibrium point $\hat U$, no player can decrease their cost by unilaterally changing their strategy $\hat U^\nu$ to any other feasible point.
When solving for a generalized Nash equilibrium of the game, we identify open-loop Nash equilibrium trajectories, in the sense that the whole trajectory $U^\nu$ is the best response to the other players’ strategies given the initial state of the system. Thus, the control signal is a function of time, not of the current state of the system.¹ However, one can repeatedly re-solve the open-loop game as new information is obtained over time to obtain a policy that is closed-loop in the model-predictive control sense, as demonstrated in Section 7. This formulation is general enough to comprise multi-player dynamic games with nonlinear constraints on the states and control inputs. Practically, in the context of autonomous driving, the cost function $J^\nu$ encodes the objective of player $\nu$, while the concatenated set of constraints, $C$, includes collision constraints coupled between players. We assume differentiability of the constraints and twice-differentiability of the cost functions.

¹One might also explore solving for feedback Nash equilibria, where the strategies are functions of the state of all agents. This is an interesting direction for future work.
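The receding-horizon use of the open-loop solver described above can be sketched as follows. This is a minimal, runnable illustration in which `solve_game` is a stand-in for the actual game solver (here it just steers scalar states toward goals); the dynamics, goals, and gains are all invented:

```python
# Sketch of the receding-horizon (MPC) loop around an open-loop game
# solver. `solve_game` is a placeholder: the real solver would return
# Nash-equilibrium control trajectories for all players.

def solve_game(x0, goals, horizon):
    # Dummy "planner": proportional control toward each goal, held
    # constant over the horizon (stand-in for the game solver).
    return [[0.5 * (g - x) for _ in range(horizon)] for x, g in zip(x0, goals)]

def step(x, u):
    return [xi + ui for xi, ui in zip(x, u)]   # trivial integrator dynamics

x = [0.0, 4.0]            # current states of two agents
goals = [2.0, 1.0]
for _ in range(20):       # receding-horizon loop
    U = solve_game(x, goals, horizon=10)
    u_first = [traj[0] for traj in U]   # apply only the first control
    x = step(x, u_first)                # state evolves; then re-plan
```

Only the first control of each open-loop plan is executed before re-solving, which is what makes the resulting policy closed-loop in the MPC sense.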
4 Augmented Lagrangian Formulation
We propose an algorithm to solve the previously defined GNEP in the context of trajectory optimization. We express the condition that players are acting optimally to minimize their cost functions subject to constraints as an equality constraint. To do so, we first derive the augmented Lagrangian associated with Problem (3) solved by each player. Then, we use the fact that, at an optimal point, the gradient of the augmented Lagrangian is null Bertsekas2014. Therefore, at a generalized Nash equilibrium point, the gradients of the augmented Lagrangians of all players must be null. Concatenating this set of equality constraints with the dynamics equality constraints, we obtain a set of equations that we solve using a quasi-Newton root-finding algorithm.
4.1 Individual Optimality
First, without loss of generality, we suppose that the vector $C$ is the concatenated set of inequality and equality constraints, i.e., $C = [C_i^\top, C_e^\top]^\top$, where $C_i$ is the vector of inequality constraints and $C_e$ is the vector of equality constraints. To embed the notion that each player is acting optimally, we formulate the augmented Lagrangian associated with Problem (3) for player $\nu$. The dynamics constraints are handled with the Lagrange multiplier term $\mu^\nu$, while the other constraints are dealt with using both a multiplier term and a quadratic penalty term specific to the augmented Lagrangian formulation. As a motivation for this differential treatment: one typically handles inequality and highly nonlinear equality constraints with an augmented Lagrangian formulation for its improved robustness. We denote by $\lambda$ the Lagrange multipliers associated with the vector of constraints $C$ and by $\rho$ the penalty weights,

$$L^\nu(X, U) = J^\nu(X, U^\nu) + (\mu^\nu)^\top D(X, U) + \lambda^\top C(X, U) + \frac{1}{2}\, C(X, U)^\top I_\rho\, C(X, U), \qquad (3)$$

where $I_\rho$ is a diagonal matrix defined as

$$I_{\rho, kk} = \begin{cases} 0 & \text{if } C_k(X, U) < 0 \;\wedge\; \lambda_k = 0, \\ \rho_k & \text{otherwise}, \end{cases} \qquad (4)$$

where $k$ indicates the $k$-th constraint. It is important to notice that the Lagrange multipliers $\mu^\nu$ associated with the dynamics constraints are specific to each player $\nu$, but the Lagrange multipliers $\lambda$ and penalties $\rho$ are common to all players. Given the appropriate Lagrange multipliers $\mu^\nu$ and $\lambda$, the gradient of the augmented Lagrangian with respect to the individual decision variables $X$ and $U^\nu$ is null at an optimal point of Problem (3). The fact that player $\nu$ is acting optimally to minimize $J^\nu$ under the constraints $D$ and $C$ can therefore be expressed as follows,

$$\nabla_{X, U^\nu} L^\nu(X, U) = 0. \qquad (5)$$

It is important to note that this equality constraint preserves the coupling between players, since the gradient depends on the other players’ strategies $U^{-\nu}$.
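To make the penalty-switching logic of Equation (4) concrete, here is a minimal single-player, scalar sketch; the cost $J(u) = u^2$, the constraint $u \geq 1$, and all step sizes are invented for illustration:

```python
# Single-player sketch of the Eq. (4)-style penalty switch and the
# augmented-Lagrangian gradient, with scalar decision u, cost J(u) = u^2,
# and one inequality constraint C(u) = 1 - u <= 0 (i.e. u >= 1).

def penalty_weight(c, lam, rho):
    # No quadratic penalty while the inequality is strictly satisfied
    # and its multiplier is zero; otherwise the penalty is active.
    return 0.0 if (c < 0.0 and lam == 0.0) else rho

def al_gradient(u, lam, rho):
    c = 1.0 - u          # C(u)
    dc = -1.0            # dC/du
    w = penalty_weight(c, lam, rho)
    # d/du [ u^2 + lam*C(u) + 0.5*w*C(u)^2 ]
    return 2.0 * u + lam * dc + w * c * dc

def stationary_u(lam, rho, step, iters=2000):
    # Crude gradient descent to the stationary point of the AL.
    u = 0.0
    for _ in range(iters):
        u -= step * al_gradient(u, lam, rho)
    return u

# With lam = 0, the AL minimizer sits short of the boundary u = 1 at
# rho/(2 + rho); increasing rho pushes it toward the constraint.
u_lo = stationary_u(lam=0.0, rho=10.0, step=0.05)
u_hi = stationary_u(lam=0.0, rho=1000.0, step=0.001)
```

This illustrates why a pure penalty method needs ever-larger $\rho$ for feasibility, and why the dual updates of Section 4.3 are used instead to reach the boundary with moderate penalties.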
4.2 RootFinding Problem
At a generalized Nash equilibrium, all players are acting optimally and the dynamics constraints are respected. Therefore, to find an equilibrium point, we have to solve the following root-finding problem: find $X$, $U$, and $\mu = [(\mu^1)^\top, \ldots, (\mu^M)^\top]^\top$ such that

$$G^\nu(X, U, \mu^\nu) = \nabla_{X, U^\nu} L^\nu(X, U) = 0, \quad \forall\, \nu \in \{1, \ldots, M\}, \qquad D(X, U) = 0.$$

We use Newton’s method to solve this root-finding problem. We denote by $G$ the concatenation of the augmented Lagrangian gradients of all players and the dynamics constraints, $G = [(G^1)^\top, \ldots, (G^M)^\top, D^\top]^\top$. We compute the first-order derivative of $G$ with respect to the primal variables $X, U$ and the dual variables $\mu$, which we concatenate in a single vector $y = [X^\top, U^\top, \mu^\top]^\top$,

$$H = \nabla_y G(y). \qquad (6)$$

Newton’s method allows us to identify a search direction $\delta y$ in the primal-dual space,

$$\delta y = -H^{-1} G(y). \qquad (7)$$

We couple this search direction with a backtracking line search Nocedal2006, given in Algorithm 1, to ensure local convergence to a solution using Newton’s method Nocedal2006, detailed in Algorithm 2.
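A generic damped-Newton root finder of the kind described above (Newton steps plus a backtracking line search on the residual norm) can be sketched as follows; the 2-D test function is invented for illustration, and the Jacobian is built by finite differences rather than analytically as in the paper:

```python
# Damped Newton's method on G(y) = 0 with backtracking line search.
import numpy as np

def newton_root(G, y, tol=1e-10, max_iter=50, beta=0.5):
    for _ in range(max_iter):
        g = G(y)
        if np.linalg.norm(g) < tol:
            break
        eps = 1e-7
        # Finite-difference Jacobian, one column per variable.
        J = np.column_stack([(G(y + eps * e) - g) / eps
                             for e in np.eye(len(y))])
        dy = np.linalg.solve(J, -g)          # Newton search direction
        alpha = 1.0                          # backtracking line search
        while np.linalg.norm(G(y + alpha * dy)) >= np.linalg.norm(g):
            alpha *= beta
            if alpha < 1e-8:
                break
        y = y + alpha * dy
    return y

# Example: solve x^2 + y^2 = 4 and x = y, whose positive root is
# (sqrt(2), sqrt(2)).
G = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] - v[1]])
sol = newton_root(G, np.array([1.0, 0.5]))
```

The line search only accepts steps that reduce the residual norm, which is what provides the local-convergence safeguard referenced above.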
4.3 Augmented Lagrangian Updates
To obtain convergence of the Lagrange multipliers $\lambda$, we update them with a dual-ascent step. This update can be seen as shifting the value of the penalty terms into the Lagrange multiplier terms: for the $k$-th inequality constraint,

$$\lambda_k \leftarrow \max\big(0,\; \lambda_k + \rho_k\, C_k(X, U)\big), \qquad (8)$$

(for equality constraints, the projection onto the nonnegative orthant is omitted). We also update the penalty weights according to an increasing schedule with $\gamma > 1$:

$$\rho_k \leftarrow \gamma\, \rho_k. \qquad (9)$$
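The outer updates of Equations (8) and (9) can be illustrated on a toy constrained problem; the cost, constraint, and schedule values below are our own choices. We minimize $J(u) = u^2$ subject to $C(u) = 1 - u \leq 0$, whose exact solution is $u = 1$ with multiplier $\lambda = 2$:

```python
# Outer augmented-Lagrangian loop on: minimize u^2 subject to 1 - u <= 0.

def inner_min(lam, rho):
    # Stationary point of u^2 + lam*(1-u) + 0.5*rho*max(0, 1-u)^2,
    # assuming the constraint stays active: 2u - lam - rho*(1-u) = 0.
    return (lam + rho) / (2.0 + rho)

lam, rho, gamma = 0.0, 1.0, 10.0
for _ in range(6):
    u = inner_min(lam, rho)
    c = 1.0 - u                         # constraint value C(u)
    lam = max(0.0, lam + rho * c)       # dual-ascent update, Eq. (8)
    rho = gamma * rho                   # penalty schedule, Eq. (9)
```

The multiplier error contracts by a factor of roughly $2/(2+\rho)$ per outer iteration, so a handful of updates recovers $u \approx 1$ and $\lambda \approx 2$ without driving $\rho$ to extreme values.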
4.4 ALGAMES
By combining Newton’s method for finding a point where the dynamics are respected and the gradients of the augmented Lagrangians are null with the Lagrange multiplier and penalty updates, we obtain our solver, ALGAMES (Augmented Lagrangian GAME-theoretic Solver), presented in Algorithm 3. The algorithm, which iteratively solves the GNEP, requires as inputs an initial guess for the primal-dual variables and initial penalty weights. It outputs the open-loop strategies of all players and the Lagrange multipliers associated with the dynamics constraints.
4.5 Algorithm Complexity
Following a quasi-Newton approximation of the matrix $H$ Nocedal2006, we neglect some of the second-order derivative terms associated with the constraints. Therefore, the most expensive part of the algorithm is the Newton step defined by Equation 7. By exploiting the sparsity pattern of the matrix $H$, we can solve Equation 7 using a back-substitution scheme akin to solving a Riccati equation, with complexity $O(N(n + m)^3)$. The complexity is cubic in the state size $n$ and the control-input size $m$, which are typically linear in the number of players $M$. Therefore, the overall complexity of the algorithm is $O(N M^3)$.
4.6 Algorithm Discussion
Here we discuss the inherent difficulty in solving for Nash equilibria in large problems, and explain some of the limitations of our approach. First of all, finding a Nash equilibrium is a nonconvex problem in general. Indeed, it is known that even for single-shot discrete games, solving for exact Nash equilibria is computationally intractable for a large number of players DaskalakisEtAlSIAMJournalonComputing08ComplexityOfNash. It is, therefore, not surprising that, in our more difficult setting of a dynamic game in continuous space, no guarantees can be provided about finding an exact Nash equilibrium. Furthermore, in complex interaction spaces, constraints can be highly nonlinear and nonconvex. This is the case in the autonomous driving context with collision-avoidance constraints. In this setting, one cannot expect to find an algorithm working in polynomial time with guaranteed convergence to a Nash equilibrium respecting constraints. On the other hand, local convergence of Newton’s method to open-loop Nash equilibria has been established in the unconstrained case (that is, starting sufficiently close to the equilibrium, the algorithm will converge to it) Di2020. Our approach solves a sequence of unconstrained problems via the augmented Lagrangian formulation. Each of these problems, therefore, has guaranteed local convergence. However, as expected, the overall method has no guarantee of global convergence to a generalized Nash equilibrium.
Second, our algorithm requires an initial guess for the state and control-input trajectories and for the dynamics multipliers. Empirically, we observe that initializing the control inputs and multipliers to zero, and simply rolling out the dynamics starting from the initial state without any control, was a sufficiently good initial guess to obtain convergence to a local optimum that respects both the constraints and the first-order optimality conditions. For a detailed empirical study of the convergence of ALGAMES and its failure cases, we refer to Sections 5.5 and 5.6.
Finally, even for simple linear-quadratic games, the Nash equilibrium solution is not necessarily unique. In general, an entire subspace of equilibria exists. In this case, the matrix $H$ in Equation 7 will be singular. In practice, we regularize this matrix so that large steps $\delta y$ are penalized, resulting in an invertible matrix.

5 Simulations: Design and Setup
We choose to apply our algorithm in the autonomous driving context. Indeed, many maneuvers like lane changing, ramp merging, overtaking, and intersection crossing involve a high level of interaction between vehicles. We assume a single car is computing the trajectories for all cars in its neighborhood, so as to find its own trajectory to act safely among the group. We assume that this car has access to a relatively good estimate of the surrounding cars’ objective functions. Such an estimate could, in principle, be obtained by applying inverse optimal control on observed trajectories of the surrounding cars.
In a real application, the car would compute its strategy as frequently as possible in a receding-horizon loop to adapt to unforeseen changes in the environment. We demonstrate the feasibility of this approach on complex driving scenarios where a classical predict-then-plan architecture fails to overcome the “frozen robot” problem.
5.1 Autonomous Driving Problem
Constraints
Each vehicle in the scene is an agent of the game. Our objective is to find a generalized Nash equilibrium trajectory for all of the vehicles. These trajectories have to be dynamically feasible. The dynamics constraints at time step $k$ are expressed as follows,

$$x_{k+1} = f(x_k, u_k^1, \ldots, u_k^M). \qquad (10)$$
We consider a nonlinear unicycle model for the dynamics of each vehicle. A vehicle’s state, $x_k^\nu$, is composed of a 2D position $p_k^\nu$, a heading angle, and a scalar velocity. The control input is composed of an angular velocity and a scalar acceleration. In addition, it is critical that the trajectories respect collision-avoidance constraints. We model the collision zone of each vehicle as a circle of radius $r$. The collision constraints between vehicles are then simply expressed in terms of the positions of each pair of vehicles,

$$2r - \| p_k^\nu - p_k^{\nu'} \|_2 \leq 0, \quad \forall\, \nu \neq \nu'. \qquad (11)$$
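A single step of a discrete-time unicycle model like the one described above might look as follows; the explicit Euler discretization and the step size are our choices, not necessarily the paper's:

```python
# Discrete-time unicycle step. State: (px, py, heading, v);
# control: (angular rate, scalar acceleration).
import math

def unicycle_step(x, u, dt):
    px, py, th, v = x
    om, a = u
    return (px + dt * v * math.cos(th),
            py + dt * v * math.sin(th),
            th + dt * om,
            v + dt * a)

# Driving straight along +x at constant speed leaves y, heading, and
# velocity unchanged.
x = (0.0, 0.0, 0.0, 1.0)
for _ in range(10):
    x = unicycle_step(x, (0.0, 0.0), dt=0.1)
```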
We also model the boundaries of the road to force the vehicles to remain on the roadway. This means that the distance between a vehicle and the closest point, $q$, on each boundary has to remain larger than the collision-circle radius, $r$,

$$r - \| p_k^\nu - q \|_2 \leq 0. \qquad (12)$$
In summary, based on reasonable simplifying assumptions, we have expressed the driving problem in terms of nonconvex and nonlinear coupled constraints.
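The coupled inequality constraints above, expressed in the $\leq 0$ convention, can be sketched as simple helper functions; the function names and sample values are our own:

```python
# Collision and road-boundary constraints in the <= 0 convention:
# two collision circles of radius r must keep their centers at least
# 2r apart, and a vehicle must stay at least r away from the closest
# boundary point q.
import math

def collision_constraint(p1, p2, r):
    # Negative when the two vehicles' collision circles do not overlap.
    return 2.0 * r - math.dist(p1, p2)

def boundary_constraint(p, q, r):
    # Negative when the vehicle's collision circle is clear of the
    # closest boundary point q.
    return r - math.dist(p, q)

c_val = collision_constraint((0.0, 0.0), (3.0, 0.0), r=1.0)  # -1.0: safe
b_val = boundary_constraint((0.0, 1.0), (0.0, 0.0), r=0.5)   # -0.5: safe
```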
Cost Function
We use a quadratic cost function penalizing the use of control inputs and the distance between the current state and the desired final state of the trajectory. We also add a quadratic penalty on being close to the other cars,

$$J^\nu(X, U^\nu) = \sum_{k} \frac{1}{2}(x_k - x_f)^\top Q\, (x_k - x_f) + \frac{1}{2}(u_k^\nu)^\top R\, u_k^\nu + \beta \sum_{\nu' \neq \nu} \max\big(0,\; \bar d - \| p_k^\nu - p_k^{\nu'} \|_2\big)^2. \qquad (13)$$

Here, $\bar d$ controls the distance at which this penalty is “activated”, and $\beta$ controls its magnitude.
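An illustrative version of such a per-player cost (quadratic tracking and control terms plus an activated proximity penalty) might look as follows; the weights, signature, and symbol names are our choices, not the paper's values:

```python
# Per-player cost over a trajectory: quadratic tracking and control
# effort, plus a proximity penalty active within distance d_act of one
# other car, scaled by beta.
import math

def player_cost(states, controls, other_positions, x_goal,
                Q=1.0, R=0.1, d_act=1.0, beta=10.0):
    J = 0.0
    for x, u, p_other in zip(states, controls, other_positions):
        J += 0.5 * Q * sum((xi - gi) ** 2 for xi, gi in zip(x, x_goal))
        J += 0.5 * R * sum(ui ** 2 for ui in u)
        d = math.dist((x[0], x[1]), p_other)     # distance to other car
        J += beta * max(0.0, d_act - d) ** 2     # "activated" penalty
    return J

goal = (0.0, 0.0, 0.0, 0.0)
J_far  = player_cost([goal], [(0.0, 0.0)], [(10.0, 10.0)], goal)  # 0.0
J_near = player_cost([goal], [(0.0, 0.0)], [(0.0, 0.0)], goal)    # 10.0
```

The penalty is smooth and zero beyond `d_act`, so it shapes behavior near other cars without acting as a hard constraint; hard safety is enforced separately through the collision constraints.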
5.2 Comparison to iLQGames
In order to evaluate the merits of ALGAMES, we compare it to iLQGames FridovichKeil2020a, a DDP-based algorithm for solving general dynamic games. Both algorithms solve the problem by iteratively solving linear-quadratic approximations that have an analytical solution Basar1999. For iLQGames, the augmented objective function $\tilde J^\nu$ differs from the objective function, $J^\nu$, by a quadratic term penalizing constraint violations,

$$\tilde J^\nu(X, U) = J^\nu(X, U^\nu) + \rho \sum_k \hat C_k(X, U)^2, \qquad (14)$$

where $\hat C_k$ is defined by

$$\hat C_k(X, U) = \max\big(0,\; C_k(X, U)\big). \qquad (15)$$

Here, $\rho$ is an optimization hyperparameter that we can tune to satisfy constraints. For ALGAMES, the augmented objective function, $L^\nu$, is actually an augmented Lagrangian; see Equation 3. The hyperparameters for ALGAMES are the initial value of $\rho$ and its increase rate $\gamma$ defined in Equation 9.

5.3 Timing Experiments
We evaluate the performance of both algorithms in two scenarios (see Figure 2), with the number of players varying from two to four. To compare the speed of both algorithms, we set the termination criterion as a threshold on constraint violations. The timing results, averaged over 100 samples, are presented in Table 3.a. First, we notice that both algorithms achieve real-time or near-real-time performance on complex autonomous driving scenarios (the horizon of the solvers is fixed to the same value for both algorithms).

We observe that the speed performance of ALGAMES and iLQGames is comparable in the ramp-merging scenario. For this scenario, we tuned the value of the penalty $\rho$ for iLQGames. Notice that for all scenarios the dimensions of the problem are scaled so that the velocities and displacements are all of the same order of magnitude. For the intersection scenario, we observe that the two-player and four-player cases both have much higher solve times for iLQGames compared to the three-player case. Indeed, in those two cases, we had to increase the penalty $\rho$; otherwise, iLQGames would plateau and never reach the constraint-satisfaction criterion. This, in turn, slowed the algorithm down by decreasing the constraint-violation convergence rate.
5.4 Discussion
The main takeaway from these experiments is that, for a given scenario, it is generally possible to find a suitable value of $\rho$ that ensures convergence of iLQGames to constraint satisfaction. With higher values of $\rho$, we can reach better constraint satisfaction at the expense of a slower convergence rate. In the context of a receding-horizon implementation (MPC), finding a choice of $\rho$ that suits the whole sequence of scenarios encountered by a vehicle could be difficult. In contrast, the same hyperparameters $\rho$ and $\gamma$ were used in ALGAMES for all the experiments in this paper. This supports the idea that, thanks to its adaptive penalty scheme, ALGAMES requires little tuning.
While performing the timing experiments, we also noticed several instances of oscillatory behavior for iLQGames. The solution would oscillate, preventing it from converging. This happened even after an adaptive regularization scheme was implemented to regularize iLQGames’ Riccati backward passes. Oscillatory behavior was not seen with ALGAMES. We hypothesize that this is due to the dual ascent update coupled with the penalty logic detailed in Equations 8 and 4, which add hysteresis to the solver.
5.5 Monte Carlo Analysis
To evaluate the robustness of ALGAMES, we performed a Monte Carlo analysis of its performance on a ramp-merging problem. First, we set up a roadway with hard boundaries, as pictured in Fig. 2.a. We position two vehicles on the roadway and one on the ramp in a collision-free initial configuration. We choose a desired final state where the incoming vehicle has merged into the traffic. Our objective is to generate generalized Nash equilibrium trajectories for the three vehicles. These trajectories are collision-free and cannot be improved unilaterally by any player. To introduce randomness in the solving process, we apply a random perturbation to the initial state of the problem. Specifically, we perturb the initial state by adding uniformly sampled noise, which displaces the initial positions of the vehicles and changes their initial velocities and headings by small amounts.
We observe in Figure 3.b that ALGAMES consistently finds a satisfactory solution to the problem using the same hyperparameters $\rho$ and $\gamma$. Out of the 1000 samples, 995 converged to constraint satisfaction while respecting the optimality criterion. By definition, the root-finding residual $G$ is a merit function for satisfying optimality and dynamics constraints. We also observe that the solver converges to a solution quickly for the large majority of the samples. These empirical data support the claim that ALGAMES is able to solve this class of ramp-merging problems quickly and reliably.
For comparison, we present in Figure 3.b the results obtained with iLQGames. We apply the same constraint-satisfaction criterion. We fixed the value of the penalty hyperparameter $\rho$ for all the samples, as it would not be a fair comparison to tune it for each sample individually. Only 3 samples did not converge with iLQGames, a performance comparable to ALGAMES, for which 5 samples failed to converge. However, we observe that iLQGames is on average 3 times slower than ALGAMES and requires on average more than 4 times as many iterations (41 against 9).
5.6 Solver Failure Cases
The Monte Carlo analysis allows us to identify the typical failure cases of our solver, i.e., the cases where the solver does not satisfy the constraints or the optimality criterion. Typically, in such cases, the initial guess, which consists of rolling out the dynamics with no control, is far from a reasonable solution. Since the constraints are ignored during this initial rollout, the car at the back can overtake the car at the front by driving through it. This creates an initial guess in which the constraints are strongly violated. Moreover, we hypothesize that the tight roadway-boundary constraints tend to strongly penalize solutions that would “disentangle” the car trajectories, because doing so would initially require large boundary violations. The solver therefore gets stuck in a local optimum where cars overlap one another. Sampling several initial guesses with random initial control inputs and solving in parallel could reduce the occurrence of these failure cases. Being able to detect, reject, and resample initial guesses when the initial car trajectories are strongly entangled could also improve the robustness of the solver.
6 NonUniqueness of Nash Equilibria
A Nash equilibrium corresponds to a situation where all players are acting optimally given the other players’ strategies. This is a way for players to compete in a coordinated fashion without communication. However, if the Nash equilibrium is nonunique, the coordination is ambiguous and players have to decide individually which Nash equilibrium to follow. This can lead to inconsistencies. The nonuniqueness of Nash equilibrium solutions has been observed in practical robotics applications such as autonomous driving Peters2020. Peters et al. identified isolated clusters of solutions in unconstrained Nash equilibrium problems and proposed an estimation method to improve players’ coordination. In this section, we detail several underlying causes of nonuniqueness that arise in practical robotics scenarios. Additionally, we present the behavior of ALGAMES in such circumstances.
6.1 LinearQuadratic Dynamic Games
Linear-quadratic (LQ) dynamic games are an important building block for optimization algorithms relying on sequential quadratic approximations, such as ALGAMES or iLQGames FridovichKeil2020a. The conditions for the existence and uniqueness of a Nash equilibrium have been extensively studied Basar1976, Abraham2019. In the continuous-time setting, Eisele characterized the different solution regimes for the LQ game, including nonexistence and nonuniqueness Eisele1982. In the discrete-time setting, the open-loop Nash equilibrium problem is equivalent to a static quadratic game (i.e., a one-step game). For such problems, the set of Nash equilibrium solutions can be empty, can form an affine subspace, or can be a single point in the case of a unique solution.
Proof Sketch: We focus on the two-player case; the result easily extends to the N-player case. We denote by $z^i$ and $J^i$ the strategy and the quadratic cost function of player $i \in \{1, 2\}$, with $z = (z^1, z^2)$:

$$J^i(z) = \tfrac{1}{2} z^\top Q^i z + (q^i)^\top z. \qquad (16)$$
The first-order necessary conditions for optimality of a Nash equilibrium, $\nabla_{z^i} J^i(z) = 0$ for $i \in \{1, 2\}$, can be written as an affine equation,

$$\begin{bmatrix} Q^1_{11} & Q^1_{12} \\ Q^2_{21} & Q^2_{22} \end{bmatrix} z + \begin{bmatrix} q^1_1 \\ q^2_2 \end{bmatrix} = 0, \qquad (17)$$

where $Q^i_{jk}$ and $q^i_j$ denote the blocks of $Q^i$ and $q^i$ associated with the variables $z^j$ and $z^k$.
The second-order necessary conditions are independent of the Nash equilibrium point considered. They require positive semidefiniteness of the matrices $Q^i_{ii}$ (the block of $Q^i$ associated with player $i$’s own strategy) for all $i$. Therefore, any point $z$ in the affine subspace defined by Equation 17 respects both the first-order and second-order necessary conditions for optimality.
In the case of a unique Nash equilibrium, ALGAMES converges to the solution in one Newton iteration. When the Nash equilibrium solutions form an affine subspace, ALGAMES converges to the point in the subspace closest to the initial guess. This is due to the regularization added to the Jacobian of the KKT conditions defined in Equation 6.
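The proof sketch can be checked numerically on a small instance. The sketch below uses illustrative cost matrices (not taken from the paper): it stacks each player's stationarity condition and solves the resulting affine system; a rank-deficient instance yields an affine subspace of equilibria, from which a least-squares solve picks the minimum-norm point, mirroring how the regularization selects the equilibrium closest to the initial guess:

```python
import numpy as np

# Two-player static quadratic game: player i controls the scalar z_i
# and minimizes J_i(z) = 0.5 z^T Q_i z + q_i^T z over its own variable.
Q1 = np.array([[2.0, 1.0], [1.0, 0.0]])
q1 = np.array([1.0, 0.0])
Q2 = np.array([[0.0, 1.0], [1.0, 2.0]])
q2 = np.array([0.0, -1.0])

# Stationarity of each player w.r.t. its own variable gives one affine
# equation per player: row i of (Q_i z + q_i) = 0.
M = np.vstack([Q1[0], Q2[1]])     # [dJ1/dz1 ; dJ2/dz2]
m = np.array([q1[0], q2[1]])

z = np.linalg.solve(M, -m)        # unique Nash equilibrium: [-1, 1]

# A rank-deficient instance of the stacked conditions: the equilibria
# form an affine subspace (all z with z[0] + z[1] = -1), and lstsq
# returns the minimum-norm point of that subspace.
M_sing = np.array([[1.0, 1.0], [1.0, 1.0]])
m_sing = np.array([1.0, 1.0])
z_star, *_ = np.linalg.lstsq(M_sing, -m_sing, rcond=None)
```

Any point of the subspace satisfies both necessary conditions; the least-squares solve simply selects one of them, as the regularized Newton step does.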
6.2 Isolated Nash Equilibria
We have seen that an LQ game can generate an affine Nash equilibrium subspace; thus, it cannot lead to multiple isolated Nash equilibria. However, in general, a dynamic game can admit multiple isolated Nash equilibria, as highlighted by Peters et al. Peters2020. In unconstrained autonomous driving scenarios, they generally appear when collision-avoidance costs are introduced. These costs are non-convex and introduce a coupling between the players’ strategies. Typically, these isolated Nash equilibria correspond to “topologically” different driving strategies. For instance, in a ramp-merging scenario, the merging vehicle can merge in front of or behind an incoming vehicle (Figure 4.a). The equilibrium point to which ALGAMES converges is typically the one closest to the initial guess, thanks to the regularization scheme. This is a desirable property, especially in the MPC setting, because it prevents the replanned trajectory from oscillating between different Nash equilibria.
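A toy illustration of such isolated equilibria, using an illustrative non-convex collision-avoidance cost (not the one used in the paper): two players choose lateral offsets via iterated best response, and different initial guesses converge to mirror-image, “topologically” different equilibria, much like merging in front of or behind another vehicle:

```python
import numpy as np

def best_response(y_other, grid):
    """Best lateral offset given the other player's, by grid search.
    Cost = quadratic deviation penalty + a non-convex collision-
    avoidance term that couples the two players' decisions."""
    cost = grid**2 + 5.0 / (0.1 + (grid - y_other)**2)
    return grid[np.argmin(cost)]

def iterated_best_response(y1, y2, iters=50):
    """Alternate best responses until (numerically) at a fixed point,
    i.e., a Nash equilibrium of the two-player static game."""
    grid = np.linspace(-3.0, 3.0, 601)
    for _ in range(iters):
        y1 = best_response(y2, grid)
        y2 = best_response(y1, grid)
    return y1, y2

# Different initial guesses land on different isolated equilibria:
eq_a = iterated_best_response(+1.0, -1.0)   # player 1 passes above
eq_b = iterated_best_response(-1.0, +1.0)   # player 1 passes below
```

Both fixed points satisfy the mutual best-response condition, yet they are distinct and isolated; which one is reached depends entirely on the initialization, which is why converging to the equilibrium nearest the initial guess matters in MPC.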
6.3 Generalized Nash Equilibrium
Identifying the solution set of a GNEP remains a major challenge, as pointed out by Fischer et al. in an extensive survey Fischer2014. In general, the solution set of a GNEP can consist of one or many isolated points, or even of non-isolated points. Theoretical results in this domain often rely on strong assumptions, such as convexity of the feasible set, absence of shared constraints, or decoupled cost functions Dreves2018. All of these assumptions can be violated in a typical robotic scenario: collision-avoidance constraints are shared and non-convex, and collision-avoidance costs or congestion terms introduce coupling between the players’ costs.
We explore the structure of the generalized Nash equilibrium (GNE) solutions in the presence of shared collision-avoidance constraints. We denote by $c^{ij}_k$ the collision-avoidance constraint between player $i$ and player $j$ at time step $k$. For each collision-avoidance constraint $c^{ij}_k$, we introduce two Lagrange multipliers, $\lambda^{ij}_k$ and $\lambda^{ji}_k$; one for each player. We remark that, for a single constraint, we add two Lagrange multipliers. We denote by $n_c$ the number of collision-avoidance constraints. By concatenating these constraints with the residual vector $G$, we add $n_c$ entries to $G$ and $n_c$ rows to its Jacobian. We denote by $\bar{G}$ and $\nabla \bar{G}$ the “augmented” residual vector and Jacobian matrix. We also need to differentiate the residual $\bar{G}$ with respect to the $2 n_c$ Lagrange multipliers associated with the collision constraints. This adds $2 n_c$ columns to the Jacobian $\nabla \bar{G}$. Thus, the Jacobian is an underdetermined linear system with $n_c$ more columns than rows (Figure 4.b). However, only active collision-avoidance constraints should be included in the Jacobian. Therefore, the Jacobian only has $n_a$ more columns than rows, where $n_a$ denotes the number of active collision-avoidance constraints. Thus, the nullspace of the underdetermined linear system is at least of dimension $n_a$ (Figure 4.b). Consequently, the solution set of a GNEP can potentially be composed of non-isolated points and can span multiple dimensions locally around a known equilibrium point.
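The row/column counting argument can be illustrated numerically. In this sketch a random dense matrix stands in for the actual KKT Jacobian (an assumption for illustration only); the dimensions follow the counting above, with each active shared constraint contributing one row and two multiplier columns:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 30        # rows = columns of the KKT system before augmentation
n_a = 3       # number of active shared collision-avoidance constraints

# Each active shared constraint adds ONE row (the constraint itself)
# but TWO columns (one Lagrange multiplier per player), so the
# augmented Jacobian ends up with n_a more columns than rows.
jac = rng.normal(size=(n + n_a, n + 2 * n_a))

rank = np.linalg.matrix_rank(jac)
nullity = jac.shape[1] - rank      # dimension of the nullspace
```

For a generic full-row-rank Jacobian, the nullity equals the number of active shared constraints, matching the dimension of the continuum of GNE observed in Figure 5.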
We explore this nullspace at a Nash equilibrium point by slightly disturbing the current generalized Nash equilibrium in one of the nullspace’s directions (Figure 5.a). We obtain a continuum of GNE. Additionally, Figures 5.b and 5.c present the two main directions in which the solution can drift while remaining a GNE. The dimension of the nullspace corresponds to the number of active constraints at the equilibrium point. Yet, we notice that most of the trajectory variability is captured by a limited number of eigenvectors. We remark that the two principal eigenvectors have an elegant interpretation: each favors one vehicle over the others. Additionally, they nicely show how disturbing the trajectory of one player influences the trajectories of the other players through the collision constraints. Finally, by combining these two eigenvectors and stepping in the resulting direction, one could favor any of the three vehicles.
6.4 ALGAMES’ Convergence to Normalized Nash Equilibrium
We observe that the two multipliers $\lambda^{ij}$ and $\lambda^{ji}$ associated with a shared constraint $c^{ij}$ are equal at every iteration of the solver. Indeed, we can assume that the multipliers are initialized with the same value (typically zero). Moreover, these multipliers are updated with identical dual ascent updates, defined in Equation 8,

$$\lambda^{ij,(k+1)} = \max(0, \; \lambda^{ij,(k)} + \rho^{(k)} c^{ij}), \qquad (18)$$
$$\lambda^{ji,(k+1)} = \max(0, \; \lambda^{ji,(k)} + \rho^{(k)} c^{ij}), \qquad (19)$$
$$\lambda^{ij,(0)} = \lambda^{ji,(0)} = 0, \qquad (20)$$
$$\Rightarrow \quad \lambda^{ij,(k)} = \lambda^{ji,(k)}, \quad \forall k. \qquad (21)$$
Here, $k$ denotes the iteration index. A consequence of this trivial recursion is that the multipliers associated with the same shared constraint are equal at the solution. Therefore, if ALGAMES converges, it converges to a normalized Nash equilibrium (NNE) in the sense of Rosen Rosen1965. An NNE is a GNE with the additional requirement that the multipliers associated with shared constraints are equal. This reasoning was applied by Dreves to a potential reduction method Dreves2011; we transcribe it in the augmented Lagrangian context. At an NNE, because the multipliers are equal, the price paid for violating a collision-avoidance constraint is the same for both players. This can be interpreted as enforcing a notion of “fairness” between the players in addition to optimality. One interesting characteristic of NNEs, compared to GNEs, is that they are not subject to the nullspace issue described in Section 6.3. Indeed, thanks to the additional constraints enforcing equality between multipliers, active constraints no longer introduce more columns than rows in the KKT system.
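The recursion can be checked with a few lines of code. The sketch below assumes the schematic augmented-Lagrangian update $\lambda \leftarrow \max(0, \lambda + \rho c)$ for an inequality constraint (our reading of the update in Equation 8) and verifies that the two multipliers attached to one shared constraint remain equal at every iteration:

```python
def dual_ascent_updates(c_vals, rho=1.0):
    """Run the dual ascent update for the two multipliers attached to
    one shared constraint c, given a sequence of constraint values
    (one per solver iteration)."""
    lam_i = lam_j = 0.0                      # identical initialization
    history = []
    for c in c_vals:
        lam_i = max(0.0, lam_i + rho * c)    # player i's update
        lam_j = max(0.0, lam_j + rho * c)    # player j's update
        history.append((lam_i, lam_j))
    return history

# Both multipliers see the same constraint value at every iteration,
# so they remain equal; at convergence this is a normalized Nash
# equilibrium in Rosen's sense.
hist = dual_ascent_updates([0.5, 0.2, -0.1, -0.4])
```

Because the two updates are byte-for-byte identical, the equality holds exactly at every iteration, not just in the limit.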
7 MPC Implementation of ALGAMES
In this section, we propose an MPC implementation of the algorithm that provides a feedback policy instead of an open-loop strategy and demonstrates real-time performance. We compare this MPC to a non-game-theoretic baseline on a crowded ramp-merging scenario, which is known to be conducive to the “frozen robot” problem.
7.1 MPC Feedback Policy
The strategies identified by ALGAMES are open-loop Nash equilibrium strategies: they are sequences of control inputs. In contrast, DDP-based approaches like iLQGames solve for feedback Nash equilibrium strategies that provide a sequence of control gains. In the MPC setting, we can obtain a feedback policy with ALGAMES by updating the strategy as fast as possible and executing only the beginning of the strategy. This assumes a fast update rate of the solution. To support the feasibility of this approach, we implemented an MPC on the ramp-merging scenario described in Figure 2.a. Three players constantly maintain a 40-time-step strategy spanning a 3-second horizon. We simulate 3 seconds of operation of the MPC by constantly updating the strategies and propagating noisy unicycle dynamics for each vehicle. We compile the results from 100 MPC trajectories in Table 3.c. We obtain a Hz update frequency for the planner on average. We observe similar performance on the intersection problem defined in Figure 2.b, with an update frequency of Hz.
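The receding-horizon mechanics can be sketched generically. The `solve` and `dynamics` callables below are toy stand-ins (a scalar regulator with single-integrator dynamics), not the ALGAMES solver; shifting the previous plan to warm-start the next solve is a common choice that we assume here rather than one stated in the text:

```python
import numpy as np

def mpc_loop(solve, dynamics, x0, horizon, sim_steps, noise_std=0.01, seed=0):
    """Generic receding-horizon wrapper (a sketch, not the ALGAMES API).

    `solve(x, u_warm)` returns a control sequence of length `horizon`;
    `dynamics(x, u)` propagates the state one step. Only the first
    control of each plan is executed; the previous plan, shifted by
    one step, warm-starts the next solve."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    u_warm = np.zeros((horizon, 1))
    traj = [x.copy()]
    for _ in range(sim_steps):
        plan = solve(x, u_warm)
        # execute only the first control, under process noise
        x = dynamics(x, plan[0]) + rng.normal(scale=noise_std, size=x.shape)
        u_warm = np.vstack([plan[1:], plan[-1:]])   # shift warm start
        traj.append(x.copy())
    return np.array(traj)

# Toy stand-ins: a "solver" that drives a scalar state toward zero,
# and single-integrator dynamics.
solve = lambda x, u_warm: np.full((40, 1), -0.5 * x[0])
dynamics = lambda x, u: x + 0.1 * u
traj = mpc_loop(solve, dynamics, x0=[1.0], horizon=40, sim_steps=30)
```

Replanning from the latest (noisy) state at every step is what turns the open-loop strategy into an effective feedback policy.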
7.2 “Unfreezing” the Robot
To illustrate the benefits of using ALGAMES in a receding-horizon loop, we compare it to a non-game-theoretic baseline MPC. With this baseline, the prediction step and the planning step are decoupled. Specifically, each agent predicts the trajectories of the surrounding vehicles by propagating straight, constant-velocity trajectories. Then, each agent plans for itself, assuming these predicted trajectories are immutable obstacles. We test the two controllers on a challenging scenario in which a vehicle has to merge onto a crowded highway, as presented in Figure 1. We perform a Monte Carlo analysis by uniformly sampling the initial state around a nominal state, with perturbations in longitudinal, lateral, and angular displacement for each car. Given the initial state, the vehicle on the ramp should be able to merge between the blue and orange cars or between the orange and green cars. However, waiting for all cars to pass before merging is not a desirable behavior: with such a policy, the merging vehicle has to slow down significantly and could get stuck on the ramp if the highway does not clear. We run ALGAMES in a receding-horizon loop and the baseline MPC to generate trajectories for 100 different initial states. We record the position of the merging vehicle at the end of the simulation and compile the results in Figure 6.a.
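The baseline's prediction step can be made concrete. The following sketch assumes a simplified per-vehicle state of position, heading, and speed (an illustrative layout, not the paper's exact model) and propagates straight, constant-velocity trajectories:

```python
import numpy as np

def constant_velocity_prediction(state, horizon, dt):
    """Baseline prediction: propagate each surrounding vehicle along a
    straight, constant-velocity trajectory. Each row of `state` is
    (x, y, heading, speed), a simplified state layout."""
    preds = []
    for x, y, theta, v in state:
        t = dt * np.arange(1, horizon + 1)
        xs = x + v * np.cos(theta) * t
        ys = y + v * np.sin(theta) * t
        preds.append(np.stack([xs, ys], axis=1))
    return np.array(preds)        # shape (n_vehicles, horizon, 2)

# Two highway vehicles driving straight at different speeds:
state = np.array([[0.0, 0.0, 0.0, 10.0],
                  [5.0, 0.0, 0.0,  8.0]])
preds = constant_velocity_prediction(state, horizon=40, dt=0.075)
```

The planner then treats `preds` as immutable obstacles; because these predictions never react to the ego plan, every candidate merge gap can appear blocked, which is exactly the failure mode examined next.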
We observe that the “frozen robot” problem occurs with the baseline MPC in a significant fraction of the simulations. An interpretation of this result is that the vehicle on the ramp cannot find a merging maneuver that does not collide with its constant-velocity trajectory predictions. Since there is no feasible merging maneuver, the only option left is to wait for the other vehicles to pass before merging.
In contrast, by running ALGAMES in a receding-horizon loop, the vehicle merges into traffic in most of the simulations (Figure 6.a). ALGAMES avoids the “frozen robot” pitfall in most cases by gradually adjusting its velocity to merge with minimal disruption to the traffic (Figure 1).
7.3 Non-Uniqueness of Nash Equilibria in Practice
We assess the effect of the non-uniqueness of Nash equilibria in the MPC context. We focus on the coordination issue that players may face when there exist multiple Nash equilibria. In our experiment, each car independently runs ALGAMES as an MPC policy: each car plans for itself and predicts the other vehicles’ trajectories. We purposefully provide each player with a very different initial guess in order to generate a mismatch between the Nash equilibrium solutions that the players converge to. We simulate this on a ramp-merging scenario with two players. In this scenario, an example of Nash equilibrium mismatch is that both players think they are letting the other player go first (Figure 6.b). The results, presented in Figure 6.c, suggest that most of the mismatches disappear rapidly after the initialization, i.e., both players converge to the same Nash equilibrium. This can happen, for instance, when one Nash equilibrium is no longer feasible because it violates the bounds on the control inputs or the boundaries of the road. This positive result mitigates the concern caused by the potential occurrence of non-unique Nash equilibria. Nevertheless, it is also important to analyze the failure cases, in which the two Nash equilibrium solutions found by the two players do not coincide. Typically, in these cases, each player’s solution remains fairly constant and does not oscillate between multiple equilibria. In such circumstances, it would be appropriate to estimate the equilibrium that the other player is following and switch to it. Peters et al. demonstrated the feasibility of this approach in similar scenarios using a particle filter Peters2020.
8 Conclusions
We have introduced a new algorithm for finding constrained Nash equilibrium trajectories in multi-player dynamic games. We demonstrated the performance and robustness of the solver through a Monte Carlo analysis on complex autonomous driving scenarios, including nonlinear and non-convex constraints. We have shown real-time performance for up to 4 players and implemented ALGAMES in a receding-horizon framework to provide a feedback policy.
We empirically demonstrated the ability of ALGAMES to mitigate the “frozen robot” problem in comparison to a non-game-theoretic receding-horizon planner.
The results we obtained from ALGAMES are promising, as they seem to let the vehicles share the responsibility for avoiding collisions, leading to natural-looking trajectories where players are able to negotiate complex, interactive traffic scenarios that are challenging for traditional, non-game-theoretic trajectory planners. For this reason, we believe that ALGAMES could be a very efficient tool to generate trajectories in situations where the level of interaction between players is strong. Our implementation of ALGAMES is available at https://github.com/RoboticExplorationLab/ALGAMES.jl.