I Introduction
Controlling a robot in an environment where it interacts with other actors is a complex task. Traditional approaches in the literature are partitioned: first, predictions of the other actors’ trajectories are computed, then these predictions are fed to a trajectory optimizer that treats them as immutable obstacles. This approach is limiting because it ignores the effect of the robot’s trajectory on the other actors. Moreover, it can lead to the “freezing robot” problem, which arises when the planner finds that all paths to the goal are unsafe [Trautman2010]. The consequence is that the robot stops moving or executes unnecessary anti-collision maneuvers.
For this reason, dealing with the game-theoretic aspect of the planning problem is a critical issue with a broad range of applications. For instance, in autonomous driving, ramp merging, lane changing, intersection crossing, and overtaking maneuvers all involve some degree of game-theoretic interaction [Sadigh2016, Sadigh2016a, FridovichKeil2019a, Dreves2018, Fisac2019, Schmerling2018]. Other potential applications include mobile robots navigating in crowds, such as package delivery robots, tour guides, or domestic robots; robots interacting with people in factories, such as mobile robots or fixed-base multi-link manipulators; and competitive settings, e.g. drone and car racing [Spica2018, Liniger2019].
In this work, we propose to solve constrained multiplayer general-sum dynamic games. In dynamic games, the players’ strategies are sequences of decisions. It is important to note that, unlike in traditional optimization problems, noncooperative games have no “optimal” solution. Depending on the structure of the game, the asymmetry between players, etc., one obtains different solution concepts. In this work, we search for Nash equilibrium solutions. This type of equilibrium models symmetry between the players; all players are treated equally. At such equilibria, no player can reduce his cost by unilaterally changing his strategy. For extensive details about the game-theoretic concepts addressed in this paper, we refer readers to the work of Bressan [Bressan2010] and Basar et al. [Basar1999].

Our solver is aimed at finding a Nash equilibrium for multiplayer dynamic games and can handle general nonlinear state and input constraints. This is particularly important for robotic applications, where the agents often interact through their desire to avoid collisions with one another or with the environment. Such interaction is most naturally, and most correctly, represented as (typically nonlinear) state constraints. This is a crucial point that sets game-theoretic methods for robotics apart from game-theoretic methods in other domains, such as economics, behavioral sciences, and robust control. In those domains, agent interactions are traditionally represented in the objective functions themselves, and the games typically have no state or input constraints. In mathematical game theory, Nash equilibria with constraints are referred to as Generalized Nash Equilibria [Facchinei2007]. Hence, in this paper we present an augmented Lagrangian solver for finding Generalized Nash Equilibria, specifically tailored to robotics applications.
Our solver assumes that players are rational agents acting to minimize their costs. This rational behavior is formulated using the first-order necessary conditions for Nash equilibria, analogous to the Karush-Kuhn-Tucker (KKT) conditions in optimization. By relying on an augmented Lagrangian approach to robustly handle constraints, the solver is able to solve multiplayer games with numerous agents and a high level of interaction at speeds approaching real-time. Our primary contributions are:

1. A general solver for dynamic games aimed at identifying Generalized Nash Equilibrium strategies.
2. Demonstration of the solver’s speed, robustness, and scalability in autonomous driving applications (Fig. 1).
II Related Work
II-A Equilibrium Selection
Recent work on solving multiplayer dynamic games can be categorized by the type of equilibrium it selects. Several works [Sadigh2016, Sadigh2016a, Liniger2019, Yoo2012] opted to search for Stackelberg equilibria, which model an asymmetry of information between players. The Stackelberg formulation is usually applied to games with two players, a leader and a follower. The leader chooses his strategy first, then the follower selects the best response to the leader’s strategy. Alternatively, a Nash equilibrium does not introduce a hierarchy between players; each player’s strategy is the best response to the other players’ strategies. As pointed out in [Fisac2019], searching for open-loop Stackelberg equilibrium strategies can fall flat on simple examples. In the context of autonomous driving, for instance, when players’ cost functions only depend on their own state and control trajectories, the solution becomes trivial: the leader can ignore mutual collision constraints, and the follower has to adapt to this strategy. This behavior can be overly aggressive for the leader (or overly passive for the follower) and does not capture the game-theoretic nature of the problem.
The search for Nash equilibria has been investigated in [FridovichKeil2019a, Dreves2018, Spica2018, Britzelmeier2019]. We also take this approach, as Nash equilibria seem better suited to symmetric, multi-robot interaction scenarios. Indeed, we have observed more natural behavior emerging from Nash equilibria than from Stackelberg equilibria when solving for open-loop strategies.
II-B Game-Theoretic Trajectory Optimization
Most of the algorithms proposed in the robotics literature to solve for game-theoretic equilibria can be grouped into three types. First, there are algorithms relying on decomposition, such as Jacobi or Gauss-Seidel methods [Spica2018, Britzelmeier2019, Wang]. These algorithms are based on an iterative best-response scheme: all the players take turns improving their strategies while considering the other agents’ strategies as immutable. This scheme is aimed at finding Nash equilibria. This type of approach is easy to interpret and handles games with numerous players well. However, the convergence of these algorithms is not well understood [Facchinei2007], and special care is required to capture the game-theoretic nature of the problem [Spica2018]. Moreover, solving for a Nash equilibrium to convergence can require many iterations, each of which is a (possibly expensive) trajectory optimization problem. This can lead to prohibitively long solution times.
Second, there are a variety of algorithms based on dynamic programming. In [Fisac2019], a Markovian Stackelberg strategy is computed via dynamic programming. This approach seems to capture the game-theoretic nature of autonomous driving. However, dynamic programming suffers from the curse of dimensionality and therefore relies on a simplified dynamics model coupled with a coarse discretization of the state and input spaces. To counterbalance these approximations, a lower-level planner informed by the state values under the Markovian Stackelberg strategy is run. This approach, which scales exponentially with the state dimension, has only been demonstrated in a two-player setting. Adding more players would prevent real-time application of this algorithm. Our proposed approach, on the contrary, scales reasonably with the number of players (see Fig. 4).

Finally, algorithms akin to differential dynamic programming have been developed for robust control [Morimoto2003] and later applied to game-theoretic problems [FridovichKeil2019a]. This approach scales polynomially with the number of players and is potentially fast enough to run in real time in a model predictive control (MPC) fashion. However, it does not handle constraints. Collision-avoidance constraints are typically handled with large penalties that can result in numerical ill-conditioning and a brittle solver. Moreover, this leads to a trade-off between trajectory efficiency and avoiding collisions with other players, which seems questionable in the autonomous driving context. Our approach, however, can enforce nonlinear state and input constraints in a rigorous way.
II-C Generalized Nash Equilibrium Problems
As mentioned above, we focus on finding Nash equilibria for multiplayer games in which players are coupled through shared state constraints (such as collision-avoidance constraints). These problems are therefore instances of Generalized Nash Equilibrium Problems (GNEPs). The operations research field has a rich literature on GNEPs [Pang2005, Facchinei2006, Facchinei2009, Facchinei2009a, Fukushima2011]. Exact penalty methods have been proposed to solve GNEPs [Facchinei2006, Facchinei2009]: complex constraints, such as those coupling the players’ strategies, are handled using penalties, which allows the game to be solved jointly for all players while still reasoning about complex constraints. However, these exact penalty methods require the minimization of nonsmooth objective functions, which turns out to be slow in practice. In the same vein, a penalty approach relying on an augmented Lagrangian formulation of the problem has been advanced by Pang et al. [Pang2005]. This work, however, converts the augmented Lagrangian formulation to a set of KKT conditions, including complementarity constraints, and solves the resulting constraint-satisfaction problem with an off-the-shelf linear complementarity problem (LCP) solver that exploits the linearity of a specific problem. Our solver, on the contrary, is not tailored to a specific example and can solve general GNEPs. It draws inspiration from this augmented Lagrangian formulation, which does not introduce nonsmooth terms in the objective function, so the solution can be found quickly. Moreover, this formulation avoids ill-conditioning, which makes our solver numerically robust.
III Problem Statement
Following the formalism of Facchinei [Facchinei2007], we consider the GNEP with M players. Each player ν ∈ {1, …, M} controls the variables p^ν ∈ ℝ^{n_ν}. We denote by p the concatenated vector of the individual decision variables,

(1)  p = [(p^1)^T, …, (p^M)^T]^T,

with dimension n = n_1 + … + n_M. By p^{-ν}, we denote the vector of all the players’ decision variables except those of player ν. The cost function of each player is denoted J^ν(p^ν, p^{-ν}). It depends on player ν’s variables p^ν as well as on all the other players’ variables p^{-ν}. The goal of player ν is to select a strategy p^ν that minimizes his cost function J^ν, given the other players’ strategies p^{-ν}. In addition, the strategy p^ν must belong to a set P^ν, and we express this constraint with a concatenated set of inequality constraints C^ν(p^ν, p^{-ν}) ≤ 0. Formally, player ν solves

(III)  min_{p^ν} J^ν(p^ν, p^{-ν})  s.t.  C^ν(p^ν, p^{-ν}) ≤ 0.

A solution of the GNEP (a generalized Nash equilibrium) is a vector p̄ such that, for all ν, p̄^ν is a solution to (III) with the other players’ strategies fixed to p̄^{-ν}. This means that at an equilibrium point p̄, no player can decrease his cost by unilaterally changing his strategy p^ν to any other feasible point.
In the discretized trajectory optimization setting with N time steps, we denote by n the state size, m the control input size, x_k^ν the state, and u_k^ν the control input of player ν at time step k. In this context, the decision variables p^ν of each player designate the primal variables associated with this player. They are the sequences of states and control inputs of player ν, i.e.

(2)  p^ν = [(x_1^ν)^T, (u_1^ν)^T, …, (u_{N-1}^ν)^T, (x_N^ν)^T]^T.

Thus, when solving for a generalized Nash equilibrium of the game, we identify open-loop Nash equilibrium trajectories, in the sense that the control signal is a function of time, not of the state variables of the players. However, one can repeatedly re-solve the open-loop game as new information is obtained over time to obtain a policy that is closed-loop in the model-predictive control sense. The cost function J^ν encodes the objective of player ν. The concatenated set of constraints C^ν includes dynamics constraints and, in the context of autonomous driving, collision constraints coupled between players. This formulation is general enough to comprise multiplayer general-sum dynamic games with nonlinear constraints on the states and control inputs.
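To make the primal-variable bookkeeping above concrete, here is a minimal sketch of stacking each player's state/control trajectory into one primal vector p = (p^1, …, p^M); the function name and array layout are ours, not the paper's.

```python
import numpy as np

def stack_primals(trajectories):
    """Stack per-player trajectories into one primal vector.

    trajectories: list of M arrays, one per player, each of shape
    (N, n + m) holding [state, control] at every time step.
    """
    return np.concatenate([traj.ravel() for traj in trajectories])

# Two players, N = 3 time steps, state size n = 4, control size m = 2.
p1 = np.zeros((3, 6))
p2 = np.ones((3, 6))
p = stack_primals([p1, p2])
assert p.shape == (36,)  # total dimension n_1 + n_2
```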
IV Augmented Lagrangian Formulation
We propose an algorithm to solve the previously defined GNEP in the context of trajectory optimization. We express the fact that players act optimally to minimize their cost functions under constraints as a set of equalities. To do so, we first derive the augmented Lagrangian associated with the problem (III) solved by each player. Then, we use the fact that, at an optimal point, the gradient of the augmented Lagrangian is null [Bertsekas2014]. Therefore, at a generalized Nash equilibrium point, the gradients of the augmented Lagrangians of all players must be null. This yields a set of equality constraints that we solve using a quasi-Newton root-finding algorithm.
IV-A Individual Optimality
First, without loss of generality, we suppose that the vector C^ν is actually the concatenated set of inequality and equality constraints, i.e. C^ν = [(C^ν_i)^T, (C^ν_e)^T]^T, where C^ν_i is the vector of inequality constraints and C^ν_e is the vector of equality constraints. To embed the notion that each player is acting optimally, we formulate the augmented Lagrangian associated with (III) for player ν. We denote by λ^ν the Lagrange multipliers associated with the vector of constraints C^ν; ρ^ν is a penalty weight vector:

(3)  L^ν(p^ν, p^{-ν}) = J^ν(p^ν, p^{-ν}) + (λ^ν)^T C^ν(p^ν, p^{-ν}) + (1/2) C^ν(p^ν, p^{-ν})^T I_{ρ^ν} C^ν(p^ν, p^{-ν}),

where I_{ρ^ν} is a diagonal matrix defined as,

(4)  (I_{ρ^ν})_{kk} = 0 if C^ν_k(p^ν, p^{-ν}) < 0 and λ^ν_k = 0, and (I_{ρ^ν})_{kk} = ρ^ν_k otherwise,

where k indicates the k-th constraint. Given the appropriate Lagrange multipliers λ^ν, the gradient of the augmented Lagrangian with respect to the individual primal variables p^ν is null at an optimal point of (III). The fact that player ν is acting optimally to minimize J^ν under the constraints C^ν can therefore be expressed as follows,

(5)  G^ν(p^ν, p^{-ν}) := ∇_{p^ν} L^ν(p^ν, p^{-ν}) = 0.

It is important to note that this equality constraint preserves coupling between players, since the gradient G^ν depends on the other players’ strategies p^{-ν}.
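The augmented Lagrangian (3)-(4) and its gradient can be sketched as follows. This is an illustrative Python version with a finite-difference gradient; an actual solver would use analytic derivatives, and the function signatures are our own.

```python
import numpy as np

def augmented_lagrangian(J, C, p_nu, p_other, lam, rho):
    """Augmented Lagrangian L^nu = J^nu + lam^T C + 0.5 * C^T I_rho C,
    where I_rho zeroes out inactive inequality constraints
    (those with C_k < 0 and lam_k = 0), per Eq. (4)."""
    c = C(p_nu, p_other)
    active = ~((c < 0) & (lam == 0))   # switching rule of Eq. (4)
    i_rho = rho * active               # diagonal of I_rho
    return J(p_nu, p_other) + lam @ c + 0.5 * c @ (i_rho * c)

def gradient(J, C, p_nu, p_other, lam, rho, eps=1e-6):
    """Central finite-difference gradient w.r.t. the player's own
    variables p^nu (illustration only)."""
    g = np.zeros_like(p_nu)
    for i in range(p_nu.size):
        d = np.zeros_like(p_nu)
        d[i] = eps
        g[i] = (augmented_lagrangian(J, C, p_nu + d, p_other, lam, rho)
                - augmented_lagrangian(J, C, p_nu - d, p_other, lam, rho)) / (2 * eps)
    return g
```

With an inactive constraint (C < 0, λ = 0), the penalty term vanishes and the gradient reduces to the cost gradient, as Eq. (4) prescribes.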
IV-B Root-Finding Problem
At a generalized Nash equilibrium, all players are acting optimally. Therefore, to find an equilibrium point, we have to solve the following root-finding problem: find p such that

G^ν(p^ν, p^{-ν}) = 0,  ∀ ν ∈ {1, …, M}.

We use Newton’s method to solve this root-finding problem. We denote by G(p) the concatenation of the augmented Lagrangian gradients of all players, G = [(G^1)^T, …, (G^M)^T]^T. We compute the first-order derivative of G with respect to all primal variables p, i.e. ∇_p G. Newton’s method allows us to identify a search direction δp in the primal variable space,

(6)  δp = −(∇_p G)^{-1} G(p).
We couple this search direction with the backtracking line search [Nocedal2006] detailed in Algorithm 1 to ensure local convergence of Newton’s method [Nocedal2006], presented in Algorithm 2, to a solution.
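The Newton iteration (6) combined with a backtracking line search can be sketched as below; the tolerances and the shrink factor are illustrative choices of ours, not the paper's Algorithms 1-2 verbatim.

```python
import numpy as np

def newton_root(G, dG, p, tol=1e-8, max_iter=50, alpha=0.5):
    """Find p with G(p) = 0 via damped Newton steps.

    G: residual function, dG: its Jacobian. Each step is shrunk by
    `alpha` until the L1 residual norm decreases (backtracking)."""
    for _ in range(max_iter):
        g = G(p)
        if np.linalg.norm(g, 1) < tol:
            break
        step = np.linalg.solve(dG(p), -g)   # search direction, Eq. (6)
        t = 1.0
        while np.linalg.norm(G(p + t * step), 1) >= np.linalg.norm(g, 1) and t > 1e-8:
            t *= alpha                       # backtrack until progress
        p = p + t * step
    return p

# Toy residual: G(p) = p^2 - 2, root at sqrt(2).
G = lambda p: np.array([p[0] ** 2 - 2.0])
dG = lambda p: np.array([[2.0 * p[0]]])
root = newton_root(G, dG, np.array([1.0]))
```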
IV-C Augmented Lagrangian Updates
To obtain convergence of the Lagrange multipliers λ^ν, we update them with a dual-ascent step. This update can be seen as shifting the value of the penalty terms into the Lagrange multiplier terms,

(7)  λ^ν_k ← max(0, λ^ν_k + ρ^ν_k C^ν_k(p^ν, p^{-ν}))  for inequality constraints,
     λ^ν_k ← λ^ν_k + ρ^ν_k C^ν_k(p^ν, p^{-ν})  for equality constraints.

We also update the penalty weights ρ^ν according to an increasing schedule, with γ > 1:

(8)  ρ^ν_k ← γ ρ^ν_k.
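A minimal sketch of the updates (7)-(8) for the inequality case; the value of γ is an illustrative tuning choice of ours.

```python
import numpy as np

def dual_update(lam, rho, c, gamma=10.0):
    """One dual-ascent step on the multipliers (clamped at zero for
    inequality constraints, Eq. (7)) followed by a geometric increase
    of the penalty weights (Eq. (8), gamma > 1)."""
    lam_new = np.maximum(0.0, lam + rho * c)
    rho_new = gamma * rho
    return lam_new, rho_new

lam, rho = dual_update(np.array([0.5]), np.array([1.0]), np.array([0.2]))
assert np.isclose(lam[0], 0.7) and np.isclose(rho[0], 10.0)
```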
IV-D ALGAMES
By combining Newton’s method for finding the point where the gradients of the augmented Lagrangians are null with the Lagrange multiplier and penalty updates, we obtain our solver ALGAMES (Augmented Lagrangian GAME-theoretic Solver), presented in Algorithm 3. The algorithm, which iteratively solves the GNEP, requires as inputs an initial guess for the primal variables p and initial penalty weights ρ^0. It outputs the primal variables p containing the open-loop strategies of all players. Finding a Nash equilibrium is, in general, a nonconvex problem. There is therefore no guarantee of convergence to a global optimum, and our algorithm requires a reasonable initial guess to converge.
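The overall structure of Algorithm 3 can be illustrated with a self-contained toy: a single "player" minimizing J(p) = p² subject to C(p) = 1 − p ≤ 0, whose constrained optimum is p = 1 with multiplier λ = 2. All tolerances and iteration counts are our choices, and the inner Newton solve is written out analytically for this scalar case.

```python
def algames_toy(p=0.0, lam=0.0, rho=1.0, gamma=10.0, outer=10):
    """Toy outer loop: Newton on the AL gradient G(p) = 0, then the
    dual-ascent and penalty updates of Eqs. (7)-(8)."""
    for _ in range(outer):
        for _ in range(20):                      # inner Newton iterations
            c = 1.0 - p                          # constraint C(p) = 1 - p
            active = (c >= 0.0) or (lam > 0.0)   # switching rule of Eq. (4)
            g = 2.0 * p - (lam + rho * c * active)   # AL gradient
            dg = 2.0 + rho * active                  # its derivative
            if abs(g) < 1e-10:
                break
            p -= g / dg
        lam = max(0.0, lam + rho * (1.0 - p))    # dual ascent, Eq. (7)
        rho *= gamma                             # penalty schedule, Eq. (8)
    return p, lam

p_star, lam_star = algames_toy()
assert abs(p_star - 1.0) < 1e-6 and abs(lam_star - 2.0) < 1e-4
```

Even from the infeasible start p = 0, the iterates converge to the constrained optimum as the multiplier absorbs the penalty terms, which is the behavior the augmented Lagrangian updates are designed to produce.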
V Simulations: Design and Setup
We choose to apply our algorithm in the autonomous driving context. Indeed, many maneuvers like lane changing, ramp merging, overtaking, and intersection crossing involve a high level of interaction between vehicles. Our gametheoretic planner could improve performance in these interactive tasks compared to traditional planners that do not consider coupled interactions among all the vehicles. We assume a single car is computing the trajectories for all cars in its neighborhood, so as to find its own trajectory to act safely among the group. In a real application, this computation would be repeated as frequently as possible in an MPC fashion.
V-A Autonomous Driving Problem
V-A.1 Constraints
Each vehicle in the scene is an actor in the game. Our objective is to find a generalized Nash equilibrium trajectory for each vehicle. These trajectories have to be dynamically feasible. The dynamics constraints at time step k are expressed as follows,

(9)  x_{k+1}^ν = f(x_k^ν, u_k^ν),  ∀ k ∈ {1, …, N−1}.
Although the solver is able to deal with nonlinear constraints arising from complex dynamics models, we consider only double-integrator dynamics. A vehicle state x^ν is composed of a 2D position q^ν and a 2D velocity v^ν; the control input u^ν is the 2D acceleration. With a time step Δt, the dynamics constraints can be expressed as,

(10)  q_{k+1}^ν = q_k^ν + Δt v_k^ν,  v_{k+1}^ν = v_k^ν + Δt u_k^ν.
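A hedged sketch of one step of the discrete double-integrator dynamics (10); the state layout and the explicit-Euler step size h are illustrative choices.

```python
import numpy as np

def double_integrator_step(x, u, h=0.1):
    """One step of Eq. (10): x = [position (2D), velocity (2D)],
    u = 2D acceleration, explicit Euler with step h."""
    pos, vel = x[:2], x[2:]
    return np.concatenate([pos + h * vel, vel + h * u])

x0 = np.array([0.0, 0.0, 1.0, 0.0])   # at origin, moving along +x at 1 m/s
x1 = double_integrator_step(x0, np.array([0.0, 2.0]))
assert np.allclose(x1, [0.1, 0.0, 1.0, 0.2])
```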
In addition, it is critical that the trajectories respect collision-avoidance constraints. We model the collision zone of each vehicle as a circle of radius r. The collision constraints between vehicles ν and μ are then simply expressed in terms of the positions of the vehicles,

(11)  (2r)² − ‖q_k^ν − q_k^μ‖² ≤ 0,  ∀ k, ∀ ν ≠ μ.
We also model the boundaries of the road to force the vehicles to remain on the roadway. This means that the distance between the vehicle and the closest point b on each boundary B has to remain larger than the collision-circle radius r,

(12)  r − ‖q_k^ν − b‖ ≤ 0,  b = argmin_{b' ∈ B} ‖q_k^ν − b'‖.
Finally, we enforce a final state constraint on a subset of the state dimensions. With this constraint we can enforce, for instance, a final velocity or a final position of the vehicle along a particular direction. In summary, based on reasonable simplifying assumptions, we have expressed the driving problem in terms of both linear individual constraints and nonconvex coupled constraints.
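The coupled collision constraint (11) and a simplified boundary constraint (12) can be sketched in the C ≤ 0 convention; the shared radius and the straight-line boundary y = y_b are illustrative assumptions of ours.

```python
import numpy as np

def collision_constraint(pos_a, pos_b, r=1.0):
    """Eq. (11): satisfied (<= 0) when the two collision circles of
    radius r do not overlap."""
    return (2 * r) ** 2 - np.sum((pos_a - pos_b) ** 2)

def boundary_constraint(pos, y_boundary, r=1.0):
    """Simplified Eq. (12) for a horizontal boundary line y = y_boundary:
    satisfied (<= 0) when the center keeps distance at least r from it."""
    return r - abs(pos[1] - y_boundary)

assert collision_constraint(np.array([0.0, 0.0]), np.array([3.0, 0.0])) < 0  # safe
assert boundary_constraint(np.array([0.0, 0.0]), y_boundary=0.5) > 0         # violated
```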
V-A.2 Cost Function
We use a quadratic cost function penalizing the use of control inputs and the distance between the current state and the desired final state x_f^ν of the trajectory,

(13)  J^ν(p^ν) = Σ_{k=1}^{N−1} [ (x_k^ν − x_f^ν)^T Q (x_k^ν − x_f^ν) + (u_k^ν)^T R u_k^ν ] + (x_N^ν − x_f^ν)^T Q_f (x_N^ν − x_f^ν).
This cost function only depends on the decision variables p^ν of vehicle ν. Players’ behaviors are coupled only through collision constraints. We could also add terms depending on other vehicles’ strategies, such as a congestion penalty.
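A minimal sketch of the quadratic cost (13), with scalar weights q and r standing in for the weight matrices (an illustrative simplification of ours).

```python
import numpy as np

def quadratic_cost(xs, us, xf, q=1.0, r=0.1):
    """Quadratic cost of Eq. (13) with scalar weights: penalize the
    deviation of each state from the desired final state xf and the
    control effort at every time step."""
    state_term = sum(q * np.sum((x - xf) ** 2) for x in xs)
    control_term = sum(r * np.sum(u ** 2) for u in us)
    return state_term + control_term

xs = [np.zeros(4), np.ones(4)]   # two states; the second already equals xf
us = [np.ones(2)]                # one control input
cost = quadratic_cost(xs, us, xf=np.ones(4))
assert np.isclose(cost, 4.2)     # 4.0 from the first state + 0.2 from control
```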
V-B Driving Scenarios
We test our solver on three different driving scenarios involving strong interactions between vehicles:
V-B.1 Ramp Merging
First, we set up a roadway with hard boundaries, as pictured in Fig. 1, to demonstrate a ramp-merging maneuver. We position multiple vehicles on the roadway in a collision-free initial configuration and choose a reasonable desired final state in which the incoming vehicle has merged into the traffic. We purposefully place numerous players in a relatively confined space to maximize the level of interaction between players. Our objective is to generate generalized Nash equilibrium trajectories for all the vehicles: trajectories that are collision-free and cannot be improved unilaterally by any player.
V-B.2 Lane Changing
The objective for each vehicle is to change lanes while avoiding collisions (Fig. 6). This situation is challenging because, in a real-world setting, it involves a high level of negotiation between drivers [Schmerling2018], which results in strongly coupled trajectories.
V-B.3 Overtaking
A fast vehicle is placed behind a slower one (Fig. 7). The faster vehicle performs an overtaking maneuver to maintain its desired speed.
VI Simulations: Results
VI-A Robustness, Speed, and Scalability
VI-A.1 Robustness
To get a better understanding of the algorithm, we plot in Fig. 2 the L1-norm of the concatenated gradients G of the individual augmented Lagrangians and the condition number of the second-order derivative matrix ∇_p G. The plot corresponds to the ramp-merging experiment presented in Fig. 5. The gradient curve, similar to a sawtooth wave, surges in value after the dual-ascent updates, which change the value of the Lagrangians. The root-finding algorithm then drives the gradient back towards zero in a few iterations. We observe that the condition number of ∇_p G incrementally increases after each penalty update. However, it remains in a reasonable range during the solve. This reasonable conditioning of the numerical problem, combined with the consistent behavior of the root-finding method, demonstrates the robustness of the solver. We also plot constraint satisfaction in Fig. 3, where we observe linear convergence of the maximum constraint violation.
VI-A.2 Scalability
Scalability of the algorithm to scenarios with more than two players is highly desirable. Indeed, driving problems like lane merging often involve 3 or 4 players. We solved for five-second trajectories while increasing the number of actors from 2 to 8 on one core of an AMD Ryzen 2950X processor. Fig. 4 demonstrates near-real-time performance on scenarios with 2 and 3 players (2.5 s and 6.1 s, respectively). The solution time increases reasonably with the number of players. Compared to other approaches that scale exponentially with the number of players [Fisac2019], our method remains tractable for up to 8 players (a one-minute solve time for a 5 s trajectory).
VI-B Results From Driving Scenarios
VI-B.1 Ramp Merging
As pictured in Fig. 5, we observe that the merging vehicle, in blue, squeezes between the other two vehicles, which adapt their trajectories to let it merge smoothly. We also see that the blue vehicle takes a trajectory that accommodates the other vehicles by squeezing against the ramp boundary, represented by a black line in Fig. 5. The algorithm demonstrates near-real-time performance, taking 6.1 s to solve for the 5 s trajectory on one core of an AMD Ryzen 2950X processor.
We then test the solver on a more complex problem with 5 vehicles. In addition to the 3 ramp-merging vehicles, we add 2 vehicles, one of which is performing a lane change. Fig. 1 presents the highly coupled trajectories that we obtain. We observe that the red and green vehicles slightly nudge to their right to accommodate the yellow vehicle overtaking them. This example demonstrates the robustness of the solver to complex multiplayer interactions as well as its scalability. This 5 s trajectory is solved in 25.1 s.
VI-B.2 Lane Changing
Fig. 6 shows the 5 s trajectory, computed in 2.3 s, for the lane-changing scenario. The orange vehicle starts behind the blue one but with a higher desired speed. To respect its desired speed while achieving the lane change, it passes the blue vehicle before changing lanes.
VI-B.3 Overtaking
The five-second overtaking trajectory, computed in 4.2 s, is presented in Fig. 7. We simply place a fast vehicle behind a slower one. The cost function strongly penalizes deviation from the desired speed along the roadway axis and encourages the vehicles to end their trajectories in the right lane. This is sufficient to trigger an overtaking maneuver by the faster vehicle. The slower vehicle slightly nudges towards the boundary of the road to accommodate the overtaking vehicle’s trajectory.
VI-B.4 Rich Autonomous Behavior
Our solver can handle cost functions that depend on the decision variables of all the players. However, we tested our solver with cost functions that depend only on the individual decision variables p^ν of each player. In this simple setting, the open-loop Stackelberg equilibrium is trivial [Sadigh2016, Sadigh2016a]: the leader chooses his strategy first, ignoring collision constraints, and the follower then selects a strategy that must account for the collision constraints while considering the leader’s trajectory as immutable. On the contrary, our approach converges to trajectories that account for the individual objectives of all players while sharing the responsibility of avoiding collisions, even with “egoistic” cost functions. The conclusion from these experiments is that solving for a Nash equilibrium apparently produces natural-looking trajectories.
VII Conclusions
We have introduced a new algorithm for finding Nash equilibrium trajectories. We demonstrated the speed and robustness of the solver on complex autonomous driving scenarios including nonlinear and nonconvex constraints, and we have shown near-real-time performance for up to 3 players. In a real-world driving application, replanning would be performed as frequently as possible to give a feedback policy in the MPC sense. Parallelizing the computation of both the sparse matrix ∇_p G and the gradient G should lead to large reductions in solution time, enabling true real-time performance in many realistic scenarios. Indeed, these computations are decomposable across the number of players and the number of time steps. We intend to exploit this enticing property in future work.
The results we obtained from ALGAMES are promising as they seem to let the vehicles share the responsibility for avoiding collisions, leading to seemingly natural trajectories where players are able to negotiate complex, interactive traffic scenarios that are challenging for traditional, nongametheoretic trajectory planners. For this reason, we believe that this solver could be a very efficient tool to generate trajectories in situations where the level of interaction between players is strong.
Acknowledgments
This work was supported in part by DARPA YFA award D18AP00064 and NSF NRI award 1830402.