ALGAMES: A Fast Solver for Constrained Dynamic Games

10/22/2019
by Simon Le Cleac'h, et al.

Dynamic games are an effective paradigm for dealing with the control of multiple interacting actors. Current algorithms for solving these problems rely on Hamilton-Jacobi-Isaacs (HJI) methods, dynamic programming (DP), differential dynamic programming (DDP), or iterative best response (IBR). The first two approaches have strong theoretical guarantees; however, they become intractable in high-dimensional real-world applications. The third approach is grounded in the success of iLQR; it is scalable, but it cannot handle constraints. Finally, the iterative best response algorithm is a heuristic approach with unknown convergence properties, and it can suffer from stability and tractability issues. This paper introduces ALGAMES (Augmented Lagrangian GAME-theoretic Solver), a solver that handles trajectory optimization problems with multiple actors and general nonlinear state and input constraints. We evaluate our solver in the context of autonomous driving on scenarios involving numerous vehicles, such as ramp merging, overtaking, and lane changing. We present simulation and timing results demonstrating the speed of the solver and its ability to produce efficient, safe, and natural autonomous behaviors.


I Introduction

Controlling a robot in an environment where it interacts with other actors is a complex task. Traditional approaches in the literature are partitioned: predictions of the other actors' trajectories are computed first, then fed into a trajectory optimizer that treats them as immutable obstacles. This approach is limiting because the effect of the robot's trajectory on the other actors is ignored. Moreover, it can lead to the “freezing robot” problem, which arises when the planner finds that all paths to the goal are unsafe [Trautman2010]: the robot stops moving or executes unnecessary anti-collision maneuvers.

For this reason, dealing with the game-theoretic aspect of the planning problem is a critical issue with a broad range of applications. For instance, in autonomous driving, ramp merging, lane changing, intersection crossing, and overtaking maneuvers all involve some degree of game-theoretic interaction [Sadigh2016, Sadigh2016a, Fridovich-Keil2019a, Dreves2018, Fisac2019, Schmerling2018]. Other potential applications include mobile robots navigating in crowds, such as package-delivery robots, tour guides, and domestic robots; robots interacting with people in factories, such as mobile robots and fixed-base multi-link manipulators; and competitive settings such as drone and car racing [Spica2018, Liniger2019].

In this work, we propose to solve constrained multi-player general-sum dynamic games. In dynamic games, the players' strategies are sequences of decisions. It is important to note that, unlike traditional optimization problems, non-cooperative games have no “optimal” solution. Depending on the structure of the game, asymmetry between players, etc., different solution concepts arise. In this work, we search for Nash equilibrium solutions. This type of equilibrium models symmetry between the players; all players are treated equally. At such equilibria, no player can reduce his cost by unilaterally changing his strategy. For extensive details about the game-theoretic concepts addressed in this paper, we refer readers to the work of Bressan [Bressan2010] and Basar et al. [Basar1999].

Fig. 1: Sequence of images depicting the trajectories obtained by solving for Nash equilibrium strategies. The images appear in chronological order from left to right. The vehicles achieve a ramp merging and a lane change while avoiding collision with other vehicles and respecting road boundaries.

Our solver is aimed at finding a Nash equilibrium for multi-player dynamic games, and can handle general nonlinear state and input constraints. This is particularly important for robotic applications, where the agents often interact through their desire to avoid collisions with one another or with the environment. Such interaction is most naturally, and most correctly, represented as (typically nonlinear) state constraints. This is a crucial point that sets game-theoretic methods for robotics apart from game-theoretic methods in other domains, such as economics, behavioral sciences, and robust control. In these domains, the agent interactions are traditionally represented in the objective functions themselves, and these games typically have no state or input constraints. In mathematical game theory, Nash equilibria with constraints are referred to as Generalized Nash Equilibria [Facchinei2007]. Hence, in this paper we present an augmented Lagrangian solver for finding Generalized Nash Equilibria specifically tailored to robotics applications.

Our solver assumes that players are rational agents acting to minimize their costs. This rational behavior is formulated using the first-order necessary condition for Nash equilibria, analogous to the Karush-Kuhn-Tucker (KKT) conditions in optimization. By relying on an augmented Lagrangian approach to robustly handle constraints, the solver is able to solve multi-player games with numerous agents and a high level of interactions at speeds approaching real-time. Our primary contributions are:

  1. A general solver for dynamic games aimed at identifying Generalized Nash Equilibrium strategies.

  2. Demonstration of the solver’s speed, robustness, and scalability in autonomous driving applications (Fig. 1).

II Related Work

II-A Equilibrium Selection

Recent work on solving multi-player dynamic games can be categorized by the type of equilibrium sought. Several works [Sadigh2016, Sadigh2016a, Liniger2019, Yoo2012] search for Stackelberg equilibria, which model an asymmetry of information between players. This formulation is usually applied to games with two players, a leader and a follower. The leader chooses his strategy first, then the follower selects the best response to the leader's strategy. Alternatively, a Nash equilibrium does not introduce hierarchy between players; each player's strategy is the best response to the other players' strategies. As pointed out in [Fisac2019], searching for open-loop Stackelberg equilibrium strategies can fall flat on simple examples. In the context of autonomous driving, for instance, when players' cost functions depend only on their own state and control trajectories, the solution becomes trivial: the leader can ignore mutual collision constraints, and the follower has to adapt to this strategy. This behavior can be overly aggressive for the leader (or overly passive for the follower) and does not capture the game-theoretic nature of the problem.

Search for Nash equilibria has been investigated in [Fridovich-Keil2019a, Dreves2018, Spica2018, Britzelmeier2019]. We also take the approach of searching for Nash equilibria, as this type of equilibrium seems better suited to symmetric, multi-robot interaction scenarios. Indeed, we have observed more natural behavior emerging from Nash equilibria compared to Stackelberg when solving for open-loop strategies.

II-B Game-Theoretic Trajectory Optimization

Most of the algorithms proposed in the robotics literature to solve for game-theoretic equilibria can be grouped into three types. First, there are algorithms relying on decomposition methods such as Jacobi or Gauss-Seidel iteration [Spica2018, Britzelmeier2019, Wang]. These algorithms are based on an iterative best response scheme, in which all the players take turns improving their strategies while considering the other agents' strategies as immutable, and are aimed at finding Nash equilibria. This type of approach is easy to interpret and handles games with numerous players well. However, convergence of these algorithms is not well understood [Facchinei2007], and special care is required to capture the game-theoretic nature of the problem [Spica2018]. Moreover, solving for a Nash equilibrium until convergence can require many iterations, each of which is a (possibly expensive) trajectory optimization problem. This can lead to prohibitively long solution times.

Second, there are a variety of algorithms based on dynamic programming. In [Fisac2019], a Markovian Stackelberg strategy is computed via dynamic programming. This approach seems to capture the game-theoretic nature of autonomous driving. However, dynamic programming suffers from the curse of dimensionality, and therefore relies on a simplified dynamics model coupled with a coarse discretization of the state and input space. To counterbalance these approximations, a lower-level planner informed by the state values under the Markovian Stackelberg strategy is run. This approach, which scales exponentially with the state dimension, has only been demonstrated in a two-player setting. Adding more players would prevent real-time application of this algorithm. Our proposed approach, on the contrary, scales reasonably with the number of players (see Fig. 4).

Finally, algorithms akin to differential dynamic programming have been developed for robust control [Morimoto2003] and later applied to game-theoretic problems [Fridovich-Keil2019a]. This approach scales polynomially with the number of players and is potentially fast enough to run in real time in a model predictive control (MPC) fashion. However, it does not handle constraints: collision-avoidance constraints are typically handled with large penalties, which can result in numerical ill-conditioning and a brittle solver, and which force a trade-off between trajectory efficiency and avoiding collisions with other players. This seems questionable in the autonomous driving context. Our approach, in contrast, can enforce nonlinear state and input constraints in a rigorous way.

II-C Generalized Nash Equilibrium Problems

As mentioned above, we focus on finding Nash equilibria for multi-player games in which players are coupled through shared state constraints (such as collision-avoidance constraints). Therefore, these problems are instances of Generalized Nash Equilibrium Problems (GNEPs). The operations research field has a rich literature on GNEPs [Pang2005, Facchinei2006, Facchinei2009, Facchinei2009a, Fukushima2011]. Exact penalty methods have been proposed to solve GNEPs [Facchinei2006, Facchinei2009]. Complex constraints, such as those that couple players' strategies, are handled using penalties. This allows multi-player games to be solved jointly for all the players, while still being able to reason about complex constraints. However, these exact penalty methods require minimization of nonsmooth objective functions, which turns out to be slow in practice. In the same vein, a penalty approach relying on an augmented Lagrangian formulation of the problem has been advanced by Pang et al. [Pang2005]. This work, however, converts the augmented Lagrangian formulation to a set of KKT conditions, including complementarity constraints. The resulting constraint-satisfaction problem is solved with an off-the-shelf linear complementarity problem (LCP) solver that exploits the linearity of the specific problem considered. Our solver, on the contrary, is not tailored to a specific example and can solve general GNEPs. It draws inspiration from this augmented Lagrangian formulation, which does not introduce nonsmooth terms in the objective function, so that solutions can be found quickly. Moreover, this formulation avoids ill-conditioning, which makes our solver numerically robust.

III Problem Statement

Following the formalism of Facchinei [Facchinei2007], we consider the GNEP with M players. Each player ν ∈ {1, …, M} controls the variables p^ν. We denote by p the concatenated vector of the individual decision variables,

p = [(p^1)^T, …, (p^M)^T]^T, (1)

with dimension n = n_1 + … + n_M. By p^{-ν}, we denote the vector of all the players' decision variables except those of player ν. The cost function of each player is denoted J^ν. It depends on player ν's variables p^ν as well as on all the other players' variables p^{-ν}. The goal of player ν is to select a strategy p^ν that minimizes his cost function J^ν, given the other players' strategies p^{-ν}. In addition, the strategy p^ν must belong to a feasible set, and we express this constraint with a concatenated set of inequality constraints C^ν. Formally, each player ν solves

min_{p^ν} J^ν(p^ν, p^{-ν})  s.t.  C^ν(p^ν, p^{-ν}) ≤ 0. (III)

A solution of the GNEP (a generalized Nash equilibrium) is a vector p̄ such that, for all ν, p̄^ν is a solution to (III) with the other players' strategies fixed to p̄^{-ν}. This means that at an equilibrium point p̄, no player can decrease his cost by unilaterally changing his strategy p^ν to any other feasible point.

In the discretized trajectory optimization setting with N time steps, we denote by n the state size, m the control input size, x^ν_k the state, and u^ν_k the control input of player ν at time step k. In this context, the decision variables p^ν of each player designate the primal variables associated with this player: the sequences of states and control inputs of player ν, i.e.

p^ν = [(x^ν_1)^T, …, (x^ν_N)^T, (u^ν_1)^T, …, (u^ν_{N-1})^T]^T. (2)

Thus, when solving for a generalized Nash equilibrium of the game, we identify open-loop Nash equilibrium trajectories, in the sense that the control signal is a function of time, not of the state variables of the players. However, one can repeatedly resolve the open-loop game as new information is obtained over time to obtain a policy that is closed-loop in the model-predictive control sense. The cost function J^ν encodes the objective of player ν. The concatenated set of constraints C^ν includes dynamics constraints and, in the context of autonomous driving, collision constraints coupled between players. This formulation is general enough to comprise multi-player general-sum dynamic games with nonlinear constraints on the states and control inputs.
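The layout below is a minimal sketch of the stacking in (1)-(2), assuming per-player state size n and control size m; the helper names are illustrative, not from the paper.

```python
import numpy as np

def stack_primals(xs, us):
    """Stack one player's trajectories into p^nu, as in (2).

    xs: (N, n) array of states, us: (N-1, m) array of controls.
    """
    return np.concatenate([xs.ravel(), us.ravel()])

def split_primals(p_nu, N, n, m):
    """Recover (xs, us) from the stacked vector p^nu."""
    xs = p_nu[: N * n].reshape(N, n)
    us = p_nu[N * n:].reshape(N - 1, m)
    return xs, us
```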

IV Augmented Lagrangian Formulation

We propose an algorithm to solve the previously defined GNEP in the context of trajectory optimization. We express the fact that players are acting optimally to minimize their cost functions under constraints as an equality. To do so, we first derive the augmented Lagrangian associated with (III) solved by each player. Then, we use the fact that, at an optimal point, the gradient of the augmented Lagrangian is null [Bertsekas2014]. Therefore, at a generalized Nash equilibrium point, the gradients of the augmented Lagrangians of all players must be null. This is a set of equality constraints that we solve using a quasi-Newton root-finding algorithm.

IV-A Individual Optimality

First, without loss of generality, we suppose that the vector C^ν is actually the concatenated set of inequality and equality constraints, i.e. C^ν = [(C^ν_i)^T, (C^ν_e)^T]^T, where C^ν_i is the vector of inequality constraints and C^ν_e is the vector of equality constraints. To embed the notion that each player is acting optimally, we formulate the augmented Lagrangian associated with (III) for player ν. We denote by λ^ν the Lagrange multipliers associated with the vector of constraints C^ν; ρ^ν is a penalty weight vector.

L^ν(p^ν, p^{-ν}) = J^ν(p^ν, p^{-ν}) + (λ^ν)^T C^ν(p^ν, p^{-ν}) + (1/2) C^ν(p^ν, p^{-ν})^T I_{ρ^ν} C^ν(p^ν, p^{-ν}), (3)

where I_{ρ^ν} is a diagonal matrix defined as,

I_{ρ^ν,kk} = 0 if C^ν_k is an inequality constraint with C^ν_k(p^ν, p^{-ν}) < 0 and λ^ν_k = 0, and I_{ρ^ν,kk} = ρ^ν_k otherwise, (4)

where k indicates the k-th constraint. Given the appropriate Lagrange multipliers λ^ν, the gradient of the augmented Lagrangian with respect to the individual primal variables p^ν is null at an optimal point of (III). The fact that player ν is acting optimally to minimize J^ν under the constraints C^ν can therefore be expressed as follows,

G^ν(p^ν, p^{-ν}) = ∇_{p^ν} L^ν(p^ν, p^{-ν}) = 0. (5)

It is important to note that this equality constraint preserves coupling between players, since the gradient G^ν depends on the other players' strategies p^{-ν}.
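As a concrete illustration, here is a minimal sketch of (3)-(5) for one player with inequality constraints only; cost_grad, con, and con_jac are assumed user-supplied callables for ∇_{p^ν} J^ν, C^ν, and the Jacobian ∂C^ν/∂p^ν, and the function names are illustrative.

```python
import numpy as np

def active_penalty(c, lam, rho):
    """Diagonal of I_rho in (4): zero for inactive inequality constraints."""
    return np.where((c < 0.0) & (lam == 0.0), 0.0, rho)

def residual(p_nu, p_other, lam, rho, cost_grad, con, con_jac):
    """G^nu in (5): gradient of the augmented Lagrangian w.r.t. p^nu."""
    c = con(p_nu, p_other)            # constraint values C^nu(p)
    Jc = con_jac(p_nu, p_other)       # Jacobian of C^nu w.r.t. p^nu
    I_rho = active_penalty(c, lam, rho)
    # grad L = grad J + Jc^T (lambda + I_rho * C)
    return cost_grad(p_nu, p_other) + Jc.T @ (lam + I_rho * c)
```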

IV-B Root-Finding Problem

At a generalized Nash equilibrium, all players are acting optimally. Therefore, to find an equilibrium point, we have to solve the following root-finding problem: find p such that

G^ν(p^ν, p^{-ν}) = 0, ∀ ν ∈ {1, …, M}.

We use Newton's method to solve this root-finding problem. We denote by G(p) = [(G^1)^T, …, (G^M)^T]^T the concatenation of the augmented Lagrangian gradients of all players. We compute the first-order derivative of G with respect to all primal variables, H = ∇_p G. Newton's method allows us to identify a search direction δp in the primal variable space,

δp = -H^{-1} G(p). (6)

We couple this search direction with the backtracking line-search [Nocedal2006] detailed in Algorithm 1 to ensure local convergence to a solution using Newton's method [Nocedal2006], presented in Algorithm 2.

1:procedure LineSearch(p, δp)
2:     Parameters: β ∈ (0, 1), τ ∈ (0, 1)
3:     α ← 1
4:     Until ‖G(p + αδp)‖₁ < (1 − τα)‖G(p)‖₁ do
5:          α ← βα
6:     return α
Algorithm 1 Backtracking line-search
1:procedure Newton'sMethod(p, λ, ρ)
2:     Until convergence do
3:          G ← [(G^1)^T, …, (G^M)^T]^T
4:          H ← ∇_p G
5:          δp ← −H^{-1} G
6:          α ← LineSearch(p, δp)
7:          p ← p + αδp
8:     return p
Algorithm 2 Newton's method for the root-finding problem
1:procedure ALGAMES(p₀, ρ₀)
2:     Initialization:
3:     p ← p₀
4:     λ ← 0
5:     ρ ← ρ₀
6:     Until convergence do
7:          p ← Newton'sMethod(p, λ, ρ)
8:          λ ← dual-ascent update (7)
9:          ρ ← penalty update (8)
10:    return p
Algorithm 3 ALGAMES solver
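The following is a minimal runnable sketch of Algorithms 1 and 2 combined, under the assumption that residual(p) returns the stacked G(p) and jacobian(p) returns H = ∇_p G (e.g., assembled from per-player residuals like the sketch above); the parameter values are illustrative defaults, not the paper's.

```python
import numpy as np

def newton_solve(p, residual, jacobian, tol=1e-6, max_iter=50,
                 beta=0.5, tau=0.1):
    """Newton root-finding on G(p) = 0 with a backtracking line search."""
    for _ in range(max_iter):
        g = residual(p)
        if np.linalg.norm(g, 1) < tol:
            break
        dp = np.linalg.solve(jacobian(p), -g)     # Newton step (6)
        alpha = 1.0                               # backtracking (Algorithm 1)
        while (np.linalg.norm(residual(p + alpha * dp), 1)
               >= (1.0 - tau * alpha) * np.linalg.norm(g, 1)):
            alpha *= beta
            if alpha < 1e-8:                      # accept a tiny step and move on
                break
        p = p + alpha * dp
    return p
```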

IV-C Augmented Lagrangian Updates

To obtain convergence of the Lagrange multipliers λ^ν, we update them with a dual-ascent step. This update can be seen as shifting the value of the penalty terms into the Lagrange multiplier terms,

λ^ν_k ← max(0, λ^ν_k + ρ^ν_k C^ν_k(p^ν, p^{-ν})) for inequality constraints,
λ^ν_k ← λ^ν_k + ρ^ν_k C^ν_k(p^ν, p^{-ν}) for equality constraints. (7)

We also update the penalty weights according to an increasing schedule, with γ > 1:

ρ^ν_k ← γ ρ^ν_k. (8)
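In code, the two updates are one-liners; the sketch below assumes a boolean mask is_ineq marking which entries of C^ν are inequality constraints.

```python
import numpy as np

def dual_ascent(lam, rho, c, is_ineq):
    """Update (7): shift the penalty terms into the multipliers."""
    lam_new = lam + rho * c
    return np.where(is_ineq, np.maximum(0.0, lam_new), lam_new)

def penalty_schedule(rho, gamma=10.0):
    """Update (8): increasing schedule with gamma > 1 (value illustrative)."""
    return gamma * rho
```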

IV-D ALGAMES

By combining Newton's method for finding the point where the gradients of the augmented Lagrangians are null with the Lagrange multiplier and penalty updates, we obtain our solver ALGAMES (Augmented Lagrangian GAME-theoretic Solver), presented in Algorithm 3. The algorithm, which iteratively solves the GNEP, requires as inputs an initial guess for the primal variables p₀ and initial penalty weights ρ₀. It outputs the primal variables p containing the open-loop strategies of all players. Finding a Nash equilibrium is a non-convex problem in general. There is, therefore, no guarantee of convergence to a global optimum, and our algorithm requires a reasonable initial guess to converge.
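Putting the pieces together, here is a minimal sketch of the outer loop of Algorithm 3, reusing the hypothetical helpers above (newton_solve, dual_ascent, penalty_schedule); residual_fn, jacobian_fn, and constraints are assumed callables, and the tolerances are illustrative.

```python
import numpy as np

def algames(p0, rho0, residual_fn, jacobian_fn, constraints, is_ineq,
            max_outer=20, con_tol=1e-3):
    p, lam, rho = p0, np.zeros_like(rho0), rho0
    for _ in range(max_outer):
        # Inner Newton solve with multipliers and penalties held fixed.
        p = newton_solve(p,
                         lambda q: residual_fn(q, lam, rho),
                         lambda q: jacobian_fn(q, lam, rho))
        c = constraints(p)
        viol = np.where(is_ineq, np.maximum(c, 0.0), np.abs(c))
        if np.max(viol) < con_tol:                # all constraints satisfied
            break
        lam = dual_ascent(lam, rho, c, is_ineq)   # update (7)
        rho = penalty_schedule(rho)               # update (8)
    return p
```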

V Simulations: Design and Setup

We choose to apply our algorithm in the autonomous driving context. Indeed, many maneuvers like lane changing, ramp merging, overtaking, and intersection crossing involve a high level of interaction between vehicles. Our game-theoretic planner could improve performance in these interactive tasks compared to traditional planners that do not consider coupled interactions among all the vehicles. We assume a single car is computing the trajectories for all cars in its neighborhood, so as to find its own trajectory to act safely among the group. In a real application, this computation would be repeated as frequently as possible in an MPC fashion.

V-A Autonomous Driving Problem

V-A1 Constraints

Each vehicle in the scene is an actor of the game. Our objective is to find a generalized Nash equilibrium trajectory for each vehicle. These trajectories have to be dynamically feasible. The dynamics constraints at time step k are expressed as follows,

x^ν_{k+1} = f(x^ν_k, u^ν_k). (9)

Although the solver is able to deal with nonlinear constraints arising from complex dynamics models, we consider only double-integrator dynamics. A vehicle state x^ν_k is composed of a 2D position and a 2D velocity; the control input u^ν_k is the 2D acceleration. The dynamics constraints can then be expressed as,

x^ν_{k+1} = A x^ν_k + B u^ν_k, (10)

where A and B are the discrete-time double-integrator matrices.
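A minimal sketch of (9)-(10), assuming a forward-Euler discretization with time step dt (the exact discretization is an assumption); the state is [px, py, vx, vy] and the control is the 2D acceleration.

```python
import numpy as np

def double_integrator_step(x, u, dt=0.1):
    """Discrete double-integrator dynamics (10)."""
    A = np.block([[np.eye(2), dt * np.eye(2)],
                  [np.zeros((2, 2)), np.eye(2)]])
    B = np.vstack([np.zeros((2, 2)), dt * np.eye(2)])
    return A @ x + B @ u

def dynamics_constraint(x_next, x, u, dt=0.1):
    """Equality-constraint form of (9): zero iff the step is feasible."""
    return x_next - double_integrator_step(x, u, dt)
```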

In addition, it is critical that the trajectories respect collision-avoidance constraints. We model the collision zone of each vehicle as a circle of radius r. The collision constraints between vehicles ν and θ are then simply expressed in terms of the positions of the vehicles,

(2r)² − ‖pos^ν_k − pos^θ_k‖² ≤ 0, ∀ ν ≠ θ. (11)

We also model the boundaries of the road to force the vehicles to remain on the roadway. This means that the distance between the vehicle and the closest point, q^b, on each boundary, b, has to remain larger than the collision-circle radius, r,

r² − ‖pos^ν_k − q^b‖² ≤ 0. (12)
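The sketch below illustrates (11)-(12) in the ≤ 0 convention used throughout; boundary_point is an assumed callable returning the closest point on a road boundary.

```python
import numpy as np

def collision_constraint(pos_a, pos_b, r):
    """(11): <= 0 when the two collision circles of radius r do not overlap."""
    return (2.0 * r) ** 2 - np.sum((pos_a - pos_b) ** 2)

def boundary_constraint(pos, boundary_point, r):
    """(12): <= 0 when the vehicle stays at least r from the boundary."""
    q = boundary_point(pos)          # closest point on the boundary
    return r ** 2 - np.sum((pos - q) ** 2)
```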

Finally, we enforce a final state constraint on a subset of the state dimensions. With this constraint we can enforce, for instance, a final velocity or a final position of the vehicle along a particular direction. In summary, based on reasonable simplifying assumptions, we have expressed the driving problem in terms of both linear individual constraints and non-convex coupled constraints.

V-A2 Cost Function

We use a quadratic cost function penalizing the use of control inputs and the distance between the current state and the desired final state x^ν_f of the trajectory,

J^ν(p^ν) = Σ_{k=1}^{N-1} [ (1/2)(x^ν_k − x^ν_f)^T Q (x^ν_k − x^ν_f) + (1/2)(u^ν_k)^T R u^ν_k ] + (1/2)(x^ν_N − x^ν_f)^T Q_f (x^ν_N − x^ν_f). (13)

This cost function only depends on the decision variables p^ν of vehicle ν. Players' behaviors are coupled only through the collision constraints. We could also add terms depending on other vehicles' strategies, such as a congestion penalty.
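A minimal sketch of (13); the weight matrices Q, R, Q_f and the goal state xf are assumed per-player parameters.

```python
import numpy as np

def quadratic_cost(xs, us, xf, Q, R, Qf):
    """Stage cost over k = 1..N-1 plus a terminal cost, as in (13)."""
    dx = xs[:-1] - xf                     # state error at each stage
    stage = 0.5 * np.einsum('ki,ij,kj->', dx, Q, dx) \
          + 0.5 * np.einsum('ki,ij,kj->', us, R, us)
    dN = xs[-1] - xf                      # terminal state error
    return stage + 0.5 * dN @ Qf @ dN
```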

V-B Driving Scenarios

We test our solver on three different driving scenarios involving strong interactions between vehicles:

V-B1 Ramp Merging

First, we set up a roadway with hard boundaries as pictured in Fig. 1 to demonstrate a ramp-merging maneuver. We position multiple vehicles on the roadway in a collision-free initial configuration. We choose a reasonable desired final state where the incoming vehicle has merged into the traffic. We purposefully place numerous players in a relatively confined space to maximize the level of interaction between players. Our objective is to generate generalized Nash equilibrium trajectories for all the vehicles. These trajectories are collision-free and cannot be improved unilaterally by any player.

V-B2 Lane Changing

The objective for each vehicle is to change lanes while avoiding collisions (Fig. 6). This situation is challenging because it involves a high level of negotiation [Schmerling2018] between drivers in a real-world setting, which results in strongly coupled trajectories.

V-B3 Overtaking

A fast vehicle is placed behind a slower one (Fig. 7). The faster vehicle performs an overtaking maneuver to maintain its desired speed.

VI Simulations: Results

VI-A Robustness, Speed, and Scalability

VI-A1 Robustness

To get a better understanding of the algorithm, we plot the L1-norm of the concatenated gradients G of the individual augmented Lagrangians and the condition number of the second-order derivative matrix H in Fig. 2, for the ramp-merging experiment presented in Fig. 5. The gradient curve, similar to a sawtooth wave, surges in value due to the dual-ascent updates that change the value of the Lagrangians. The root-finding algorithm drives the gradient back towards zero in a few iterations. We observe that the condition number of H incrementally increases after each penalty update; however, it remains in a reasonable range during the solve. This reasonable conditioning of the numerical problem, combined with the consistent behavior of the root-finding method, exhibits the robustness of the solver. We also plot constraint satisfaction in Fig. 3, where we observe linear convergence of the maximum constraint violations.

Fig. 2: Behavior of the L1-norm of the gradient G and the condition number of the matrix H during the solve.
Fig. 3: Convergence of the constraint satisfaction during the solve for the 3-vehicle ramp merging example. The plotted constraints are collision avoidance between vehicles, collision avoidance with road boundaries, and dynamic feasibility. The end of the solve is triggered when all the constraints are satisfied up to a threshold.

VI-A2 Scalability

Scalability of the algorithm to scenarios with more than two players is highly desirable; driving problems like lane merging often involve 3 or 4 players. We solved for five-second trajectories while increasing the number of actors from 2 to 8 on one core of an AMD Ryzen 2950x processor. Fig. 4 demonstrates near-real-time performance on scenarios with 2 and 3 players (2.5 s and 6.1 s, respectively). The solving time increases reasonably with the number of players. Compared to other approaches that scale exponentially with the number of players [Fisac2019], our method remains tractable for up to 8 players (one-minute solve time for a 5 s trajectory).

Fig. 4: Solving time and number of iterations required to reach the constraint satisfaction threshold vs. the number of players. All scenarios from 2 to 8 players are of comparable complexity. They include one ramp merging and multiple lane changes when the number of players is sufficient.

VI-B Results From Driving Scenarios

VI-B1 Ramp Merging

As pictured in Fig. 5, we observe that the merging vehicle in blue squeezes between the other two vehicles, which adapt their trajectories to let it merge smoothly. We also see that the blue vehicle takes a trajectory that accommodates the other vehicles by squeezing against the ramp boundary, represented by a black line in Fig. 5. The algorithm demonstrates near-real-time performance, taking 6.1 s to solve for the 5 s trajectory on one core of an AMD Ryzen 2950x processor.

We then test the solver on a more complex problem with 5 vehicles. In addition to the 3 ramp-merging vehicles, we add two vehicles, one of which performs a lane change. Fig. 1 presents the highly coupled trajectories that we obtain. We observe that the red and green vehicles nudge slightly to their right to accommodate the yellow vehicle overtaking them. This example demonstrates the robustness of the solver to complex multi-player interactions as well as its scalability. This 5 s trajectory is solved in 25.1 s.

Fig. 5: Ramp merging scenario: the blue vehicle successfully merges on the highway by squeezing between the other two vehicles.

VI-B2 Lane Changing

In Fig. 6, we see the 5 s trajectory, computed in 2.3 s, for the lane changing scenario. The orange vehicle starts behind the blue one, but with a higher desired speed. To respect its desired speed while achieving the lane change, it passes the blue vehicle before changing lanes.

Fig. 6: Lane changing scenario: the two vehicles successfully change lanes.

VI-B3 Overtaking

The five-second overtaking trajectory, computed in 4.2 s, is presented in Fig. 7. We simply place a vehicle with high speed behind a slower vehicle. The cost function strongly penalizes deviation from the desired speed along the roadway axis and encourages the vehicles to end their trajectories in the right lane. This is sufficient to trigger an overtaking maneuver by the faster vehicle. The slower vehicle slightly nudges towards the boundary of the road to accommodate the overtaking vehicle's trajectory.

Fig. 7: Overtaking scenario: successful overtaking maneuver.

VI-B4 Rich Autonomous Behavior

Our solver can handle cost functions that depend on the decision variables of all the players. However, we tested our solver with a cost function dependent only on the individual decision variables p^ν of each player. In this simple setting, the open-loop Stackelberg equilibrium is trivial [Sadigh2016, Sadigh2016a]: the leader chooses his strategy first, ignoring collision constraints, and the follower then selects a strategy that accounts for the collision constraints, considering the leader's trajectory as immutable. In contrast, our approach converges to trajectories that account for the individual objectives of all players while sharing the responsibility of avoiding collisions, even with “egoistic” cost functions. The conclusion from these experiments is that solving for Nash equilibria apparently produces natural-looking trajectories.

VII Conclusions

We have introduced a new algorithm for finding Nash equilibrium trajectories. We demonstrated the speed and robustness of the solver on complex autonomous driving scenarios including nonlinear and non-convex constraints. We have shown near-real-time performance for up to 3 players. In a real-world driving application, replanning would be performed as frequently as possible to give a feedback policy in the MPC sense. Parallelizing the computation of the sparse matrix H and the gradient G should lead to large reductions in solution time, enabling true real-time performance in many realistic scenarios. Indeed, these computations are decomposable across players and time steps. We intend to exploit this enticing property in future work.

The results we obtained from ALGAMES are promising as they seem to let the vehicles share the responsibility for avoiding collisions, leading to seemingly natural trajectories where players are able to negotiate complex, interactive traffic scenarios that are challenging for traditional, non-game-theoretic trajectory planners. For this reason, we believe that this solver could be a very efficient tool to generate trajectories in situations where the level of interaction between players is strong.

Acknowledgments

This work was supported in part by DARPA YFA award D18AP00064 and NSF NRI award 1830402.

References