ALGAMES: A Fast Augmented Lagrangian Solver for Constrained Dynamic Games

Dynamic games are an effective paradigm for dealing with the control of multiple interacting actors. This paper introduces ALGAMES (Augmented Lagrangian GAME-theoretic Solver), a solver that handles trajectory-optimization problems with multiple actors and general nonlinear state and input constraints. Its novelty resides in satisfying the first-order optimality conditions with a quasi-Newton root-finding algorithm and rigorously enforcing constraints using an augmented Lagrangian method. We evaluate our solver in the context of autonomous driving on scenarios with a strong level of interactions between the vehicles. We assess the robustness of the solver using Monte Carlo simulations. It is able to reliably solve complex problems like ramp merging with three vehicles three times faster than a state-of-the-art DDP-based approach. A model-predictive control (MPC) implementation of the algorithm, running at more than 60 Hz, demonstrates ALGAMES' ability to mitigate the "frozen robot" problem on complex autonomous driving scenarios like merging onto a crowded highway.

1 Introduction

Controlling a robot in an environment where it interacts with other agents is a complex task. Traditional approaches in the literature adopt a predict-then-plan architecture. First, predictions of other agents’ trajectories are computed, then they are fed into a planner that considers them as immutable obstacles. This approach is limiting because the effect of the robot’s trajectory on the other agents is ignored. Moreover, it can lead to the “frozen robot” problem that arises when the planner finds that all paths to the goal are unsafe Trautman2010. It is, therefore, crucial for a robot to simultaneously predict the trajectories of other vehicles on the road while planning its own trajectory, in order to capture the reactive nature of all the agents in the scene. ALGAMES provides such a joint trajectory predictor and planner by considering all agents as players in a Nash-style dynamic game. We envision ALGAMES being run on-line by a robot in a receding-horizon loop, at each iteration planning a trajectory for the robot by explicitly accounting for the reactive nature of all agents in its vicinity.

Joint trajectory prediction and planning in scenarios with multiple interacting agents is well-described by a dynamic game. Dealing with the game-theoretic aspect of multi-agent planning problems is a critical issue that has a broad range of applications. For instance, in autonomous driving, ramp merging, lane changing, intersection crossing, and overtaking maneuvers all comprise some degree of game-theoretic interactions Sadigh2016, Sadigh2016a, Fridovich-Keil2020a, Dreves2018, Fisac2019, Schmerling2018. Other potential applications include mobile robots navigating in crowds, like package-delivery robots, tour guides, or domestic robots; robots interacting with people in factories, such as mobile robots or fixed-base multi-link manipulators; and competitive settings like drone and car racing Spica2018, Liniger2019.

In this work, we seek solutions to constrained multi-player dynamic games. In dynamic games, the players’ strategies are sequences of decisions. It is important to notice that, unlike traditional optimization problems, non-cooperative games have no “optimal” solution. Depending on the structure of the game, asymmetry between players, etc., different concepts of solutions are possible. In this work, we search for Nash-equilibrium solutions. This type of equilibrium models symmetry between the players: all players are treated equally. At such equilibria, no player can reduce its cost by unilaterally changing its strategy. For extensive details about the game-theory concepts addressed in this paper, we refer readers to the work of Basar et al. Basar1999.

Our solver is aimed at finding a Nash equilibrium for multi-player dynamic games, and can handle general nonlinear state and input constraints. This is particularly important for robotic applications, where the agents often interact through their desire to avoid collisions with one another or with the environment. Such interactions are most naturally represented as (typically nonlinear) state constraints. This is a crucial feature that sets game-theoretic methods for robotics apart from game-theoretic methods in other domains, such as economics, behavioral sciences, and robust control. In these domains, the agent interactions are traditionally represented in the objective functions themselves, and these games typically have no state or input constraints. In mathematics literature, Nash equilibria with constraints are referred to as Generalized Nash Equilibria Facchinei2007. Hence, in this paper we present an augmented Lagrangian solver for finding Generalized Nash Equilibria specifically tailored to robotics applications.

Our solver assumes that players are rational agents acting to minimize their costs. This rational behavior is formulated using the first-order necessary conditions for Nash equilibria, analogous to the Karush-Kuhn-Tucker (KKT) conditions in optimization. By relying on an augmented Lagrangian approach to handle constraints, the solver is able to solve multi-player games with several agents and a high level of interactions at real-time speeds. Finding a Nash equilibrium for 3 autonomous cars in a freeway merging scenario takes ms. Our primary contributions are:

  1. A general solver for dynamic games aimed at identifying Generalized Nash Equilibrium strategies.

  2. A real-time MPC implementation of the solver mitigating the “frozen robot” problem that arises in complex driving scenarios for non-game-theoretic MPC approaches (Fig. 1).

  3. An analysis of the non-uniqueness of Nash equilibria in driving scenarios with constraints and an assessment of the practical impact it has on players’ coordination.

  4. A comparison with iLQGames Fridovich-Keil2020a. ALGAMES finds Nash equilibria 3 times faster than iLQGames for a fixed constraint satisfaction criterion.

2 Related Work

2.1 Equilibrium Selection

Recent work focused on solving multi-player dynamic games can be categorized by the type of equilibrium they select. Several works Sadigh2016, Sadigh2016a, Liniger2019, Yoo2012 have opted to search for Stackelberg equilibria, which model an asymmetry of information between players. These approaches are usually formulated for games with two players: a leader and a follower. The leader chooses its strategy first, then the follower selects the best response to the leader’s strategy. Alternatively, a Nash equilibrium does not introduce hierarchy between players; each player’s strategy is the best response to the other players’ strategies. As pointed out in Fisac2019, searching for open-loop Stackelberg equilibrium strategies can fail on simple examples. In the context of autonomous driving, for instance, when players’ cost functions only depend on their own state and control trajectories, the solution becomes trivial. The leader ignores mutual collision constraints and the follower has to adapt to this strategy. This behavior can be overly aggressive for the leader (or overly passive for the follower) and does not capture the game-theoretic nature of the problem.

Nash equilibria have been investigated in Fridovich-Keil2020a, Dreves2018, Spica2018, Britzelmeier2019, Di2018, Di2020, Di2020a. We also take the approach of searching for Nash equilibria, as this type of equilibrium seems better suited to symmetric, multi-robot interaction scenarios. Indeed, we have observed more natural behavior emerging from Nash equilibria compared to Stackelberg when solving for open-loop strategies.

2.2 Game-Theoretic Trajectory Optimization

Most of the algorithms proposed in the robotics literature to solve for game-theoretic equilibria can be grouped into four types. First are algorithms aimed at finding Nash equilibria that rely on decomposition, such as Jacobi or Gauss-Seidel methods Spica2018, Britzelmeier2019, Wang2019a. These algorithms are based on an iterative best-response scheme in which players take turns at improving their strategies considering the other agents’ strategies as immutable. This type of approach is easy to interpret and scales reasonably well with the number of players. However, convergence of these algorithms is not well understood Facchinei2007, and special care is required to capture the game-theoretic nature of the problem Spica2018. Moreover, solving for a Nash equilibrium until convergence can require many iterations, each of which is a (possibly expensive) trajectory-optimization problem. This can lead to prohibitively long solution times.

Second, there are a variety of algorithms based on dynamic programming. In Fisac2019, a Markovian Stackelberg strategy is computed via dynamic programming. This approach seems to capture the game-theoretic nature of autonomous driving. However, dynamic programming suffers from the curse of dimensionality and, therefore, practical implementations rely on simplified dynamics models coupled with coarse discretization of the state and input spaces. To counterbalance these approximations, a lower-level planner informed by the state values under the Markovian Stackelberg strategy is run. This approach, which scales exponentially with the state dimension, has been demonstrated in a two-player setting. Adding more players is likely to prevent real-time application of this algorithm. In contrast, our proposed approach scales polynomially with the number of players (see Section 4.5).

Third, algorithms akin to differential dynamic programming have been developed for robust control Morimoto2003 and later applied to game-theoretic problems Fridovich-Keil2020a, Di2018. This approach scales polynomially with the number of players and is fast enough to run in real time in an MPC fashion Fridovich-Keil2020a. However, contrary to ALGAMES, this type of approach does not natively handle constraints. Collision-avoidance constraints are typically handled using large penalties that can result in numerical ill-conditioning, which, in turn, can impact the robustness or the convergence rate of the solver. Moreover, it leads to a trade-off between trajectory efficiency and avoiding collisions with other players.

Finally, algorithms that are analogous to direct methods in trajectory optimization have also been developed Di2020, Di2020a. An algorithm based on a first-order splitting method that is known to have a linear convergence rate was proposed by Di et al. Di2020a. Di’s experiments show convergence of the algorithm after typically to iterations. A different approach based on Newton’s method has been proposed Di2020, but it is restricted to unconstrained dynamic games. Our solver belongs to this family of approaches. It also relies on a second-order Newton-type method, but it is able to handle general state and control input constraints. In addition, we demonstrate convergence on relatively complex problems in typically less than iterations.

2.3 Generalized Nash Equilibrium Problems

We focus on finding Nash equilibria for multi-player games in which players are coupled through shared state constraints (such as collision-avoidance). Therefore, these problems are instances of Generalized Nash Equilibrium Problems (GNEPs). The operations research field has a rich literature on GNEPs Pang2005, Facchinei2006, Facchinei2009, Facchinei2010, Fukushima2011. Exact penalty methods have been proposed to solve GNEPs Facchinei2006, Facchinei2009. Complex constraints such as those that couple players’ strategies are handled using penalties, allowing solution of multi-player games jointly for all the players. However, these exact penalty methods require minimization of nonsmooth objective functions, which leads to slow convergence rates in practice.

In the same vein, a penalty approach relying on an augmented Lagrangian formulation of the problem has been advanced by Pang et al. Pang2005. This work, however, converts the augmented Lagrangian formulation to a set of KKT conditions, including complementarity constraints. The resulting constraint-satisfaction problem is solved with an off-the-shelf linear complementarity problem (LCP) solver that exploits the linearity of a specific problem. Our solver, in contrast, is not tailored for a specific example and can solve general GNEPs. It draws inspiration from the augmented Lagrangian formulation, which does not introduce nonsmooth terms in the objective function, enabling fast convergence. Moreover, this formulation avoids ill-conditioning, which improves the numerical robustness of our solver.

3 Problem Statement

In the discretized trajectory-optimization setting with N time steps, we denote by n the state size, m the control-input size, x_k the state, and u_k^ν the control input of player ν at time step k. In formulating the game, we do not distinguish between the robot carrying out the computation and the other agents whose trajectories it is predicting. All agents are modeled equivalently, as is typical in the case of Nash-style games.

Following the formalism of Facchinei Facchinei2007, we consider the GNEP with M players. Each player ν decides over its control input variables U^ν = [u_1^ν; …; u_{N−1}^ν]. This is player ν’s strategy, where m^ν denotes the dimension of the control inputs controlled by player ν and m^ν(N−1) is the dimension of the whole trajectory of player ν’s control inputs. By U^{−ν}, we denote the vector of all the players’ strategies except that of player ν. Additionally, we define the trajectory of state variables X = [x_2; …; x_N], which results from applying all the control inputs decided by the players to a joint dynamical system,

x_{k+1} = f(x_k, u_k^1, …, u_k^M),    (1)

with k denoting the time-step index. The kinodynamic constraints over the whole trajectory can be expressed with equality constraints,

D(X, U) = 0.    (2)

The cost function of each player is noted J^ν(X, U^ν). It depends on player ν’s control inputs U^ν as well as on the state trajectory X, which is shared with all the other players. The goal of player ν is to select a strategy U^ν and a state trajectory X that minimize the cost function J^ν. Naturally, the choice of state trajectory is constrained by the other players’ strategies and the dynamics of the system via Equation 2. In addition, the strategy U^ν must respect a set of constraints that depends on the state trajectory X as well as on the other players’ strategies U^{−ν} (e.g., collision-avoidance constraints). We express this with a concatenated set of inequality constraints C(X, U) ≤ 0. Formally, each player ν solves

    min_{X, U^ν}  J^ν(X, U^ν)   s.t.   D(X, U) = 0,   C(X, U) ≤ 0.

The set of Problems (3) forms a GNEP because of the constraints that couple the strategies of all the players. A solution of this GNEP (a generalized Nash equilibrium) is a vector U = [U^1; …; U^M] such that, for all ν, U^ν is a solution to (3) with the other players’ strategies fixed to U^{−ν}. This means that at an equilibrium point, no player can decrease their cost by unilaterally changing their strategy U^ν to any other feasible point.

When solving for a generalized Nash equilibrium of the game, we identify open-loop Nash equilibrium trajectories, in the sense that the whole trajectory U^ν is the best response to the other players’ strategies given the initial state of the system. Thus, the control signal is a function of time, not of the current state of the system. (One might also explore solving for feedback Nash equilibria, where the strategies are functions of the state of all agents; this is an interesting direction for future work.) However, one can repeatedly resolve the open-loop game as new information is obtained over time to obtain a policy that is closed-loop in the model-predictive control sense, as demonstrated in Section 7. This formulation is general enough to comprise multi-player dynamic games with nonlinear constraints on the states and control inputs. Practically, in the context of autonomous driving, the cost function J^ν encodes the objective of player ν, while the concatenated set of constraints, C, includes collision constraints coupled between the players. We assume differentiability of the constraints and twice differentiability of the cost functions.
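
To make this formulation concrete, the following sketch sets up the ingredients of such a GNEP in code. The sizes, the placeholder linear dynamics, and the names f, D, J, and C mirror the notation above but are purely illustrative assumptions; they are not the paper's implementation (which is written in Julia and uses the vehicle models of Section 5).

```python
import numpy as np

# Illustrative sizes (not the paper's): M players, N knot points,
# n joint-state size, m_nu control inputs per player.
M, N, n, m_nu = 3, 40, 12, 2

def f(x, u_joint):
    """Joint discrete dynamics x_{k+1} = f(x_k, u_k^1, ..., u_k^M) (Eq. 1).
    A placeholder linear model stands in for the true nonlinear dynamics."""
    A = np.eye(n)
    B = 0.01 * np.ones((n, M * m_nu))
    return A @ x + B @ u_joint

def D(X, U, x1):
    """Stacked kinodynamic equality constraints D(X, U) = 0 (Eq. 2).
    X has shape (N-1, n) and U has shape (N-1, M * m_nu)."""
    res = [X[0] - f(x1, U[0])]
    for k in range(1, N - 1):
        res.append(X[k] - f(X[k - 1], U[k]))
    return np.concatenate(res)

def J(nu, X, U):
    """Cost of player nu: depends on the shared trajectory X and on U^nu only."""
    U_nu = U[:, nu * m_nu:(nu + 1) * m_nu]
    return 0.5 * np.sum(X ** 2) + 0.5 * np.sum(U_nu ** 2)

def C(X, U):
    """Concatenated inequality constraints C(X, U) <= 0 (e.g., collision
    avoidance); left empty in this toy setup."""
    return np.zeros(0)
```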

4 Augmented Lagrangian Formulation

We propose an algorithm to solve the previously defined GNEP in the context of trajectory optimization. We express the condition that players are acting optimally to minimize their cost functions subject to constraints as an equality constraint. To do so, we first derive the augmented Lagrangian associated with Problem (3) solved by each player. Then, we use the fact that, at an optimal point, the gradient of the augmented Lagrangian is null Bertsekas2014. Therefore, at a generalized Nash equilibrium point, the gradients of the augmented Lagrangians of all players must be null. Concatenating this set of equality constraints with the dynamics equality constraints, we obtain a set of equations that we solve using a quasi-Newton root-finding algorithm.

4.1 Individual Optimality

First, without loss of generality, we suppose that the vector C is actually the concatenated set of inequality and equality constraints, i.e., C = [C_i; C_e], where C_i is the vector of inequality constraints and C_e is the vector of equality constraints. To embed the notion that each player is acting optimally, we formulate the augmented Lagrangian associated with Problem (3) for player ν. The dynamics constraints are handled with the Lagrange multiplier term μ^ν, while the other constraints are dealt with using both a multiplier and a quadratic penalty term specific to the augmented Lagrangian formulation. As motivation for this differential treatment: one typically handles inequality and highly nonlinear equality constraints with an augmented Lagrangian formulation for its improved robustness. We denote by λ the Lagrange multipliers associated with the vector of constraints C; ρ is a penalty weight;

L^ν(X, U, μ^ν) = J^ν(X, U^ν) + μ^ν⊤ D(X, U) + λ⊤ C(X, U) + (1/2) C(X, U)⊤ I_ρ C(X, U),    (3)

where I_ρ is a diagonal matrix defined as,

I_ρ,ii = 0 if C_i(X, U) < 0 and λ_i = 0,   I_ρ,ii = ρ_i otherwise,    (4)

where i indicates the i-th constraint. It is important to notice that the Lagrange multipliers μ^ν associated with the dynamics constraints are specific to each player ν, but the Lagrange multipliers λ and penalties ρ are common to all players. Given the appropriate Lagrange multipliers μ^ν and λ, the gradient of the augmented Lagrangian with respect to the individual decision variables X and U^ν is null at an optimal point of Problem (3). The fact that player ν is acting optimally to minimize J^ν under the constraints D and C can therefore be expressed as follows,

G^ν(X, U, μ^ν) := ∇_{X, U^ν} L^ν(X, U, μ^ν) = 0.    (5)

It is important to note that this equality constraint preserves coupling between players since the gradient depends on the other players’ strategies U^{−ν}.
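
As a concrete illustration of Equations 3–5, the sketch below evaluates one player's augmented Lagrangian and approximates its gradient G^ν over the player's decision variables z = (X, U^ν). The function names and the central finite-difference gradient are illustrative simplifications only; they are not how a practical implementation would compute G^ν.

```python
import numpy as np

def augmented_lagrangian(J_nu, D, C, z, mu_nu, lam, rho):
    """L^nu = J^nu + mu^nu' D + lam' C + 1/2 C' I_rho C  (Eq. 3), where I_rho
    zeroes the penalty on inactive inequality constraints (Eq. 4)."""
    c, d = C(z), D(z)
    i_rho = np.where((c < 0.0) & (lam == 0.0), 0.0, rho)   # active-set logic of Eq. 4
    return J_nu(z) + mu_nu @ d + lam @ c + 0.5 * c @ (i_rho * c)

def G_nu(J_nu, D, C, z, mu_nu, lam, rho, eps=1e-6):
    """Gradient of the augmented Lagrangian with respect to player nu's decision
    variables z = (X, U^nu) (Eq. 5), approximated by central finite differences."""
    g = np.zeros_like(z)
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        g[i] = (augmented_lagrangian(J_nu, D, C, z + dz, mu_nu, lam, rho)
                - augmented_lagrangian(J_nu, D, C, z - dz, mu_nu, lam, rho)) / (2 * eps)
    return g
```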

4.2 Root-Finding Problem

At a generalized Nash equilibrium, all players are acting optimally and the dynamics constraints are respected. Therefore, to find an equilibrium point, we have to solve the following root-finding problem,

G^ν(X, U, μ^ν) = 0,  ∀ ν ∈ {1, …, M},   D(X, U) = 0.

We use Newton’s method to solve the root-finding problem. We denote by G the concatenation of the augmented Lagrangian gradients of all players and the dynamics constraints, G(X, U, μ) = [G^1; …; G^M; D], where μ = [μ^1; …; μ^M]. We compute the first-order derivative of G with respect to the primal variables X, U and the dual variables μ, which we concatenate in a single vector y = [X; U; μ],

H = ∂G/∂y.    (6)

Newton’s method allows us to identify a search direction δy in the primal-dual space,

H δy = −G(y).    (7)

We couple this search direction with a backtracking line search Nocedal2006, given in Algorithm 1, to ensure local convergence to a solution using Newton’s method Nocedal2006, detailed in Algorithm 2.

Algorithm 1: Backtracking line-search (procedure LineSearch).
Algorithm 2: Newton’s method for the root-finding problem (procedure NewtonsMethod).
Algorithm 3: ALGAMES solver (procedure ALGAMES), which alternates Newton solves (Algorithm 2) with the dual-ascent update of Eq. 8 and the penalty update of Eq. 9 until convergence.

4.3 Augmented Lagrangian Updates

To obtain convergence of the Lagrange multipliers λ, we update them with a dual-ascent step. This update can be seen as shifting the value of the penalty terms into the Lagrange multiplier terms,

λ_i ← max(0, λ_i + ρ_i C_i(X, U))  for inequality constraints,    λ_i ← λ_i + ρ_i C_i(X, U)  for equality constraints.    (8)

We also update the penalty weights according to an increasing schedule, with γ > 1:

ρ_i ← γ ρ_i.    (9)

4.4 ALGAMES

By combining Newton’s method for finding the point where the dynamics are respected and the gradients of the augmented Lagrangians are null with the Lagrange multiplier and penalty updates, we obtain our solver ALGAMES (Augmented Lagrangian GAME-theoretic Solver), presented in Algorithm 3. The algorithm, which iteratively solves the GNEP, requires as inputs an initial guess for the primal-dual variables y and initial penalty weights ρ. The algorithm outputs the open-loop strategies U^ν of all players and the Lagrange multipliers μ^ν associated with the dynamics constraints.
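
A minimal sketch of this structure is given below, assuming user-supplied callables G(y, λ, ρ) for the stacked residual of Equations 5 and 2, dG(y, λ, ρ) for its Jacobian, and C(y) for the inequality constraints. The tolerances, the regularization weight, and the default γ = 10 are illustrative choices, not the values used in the paper.

```python
import numpy as np

def newton_root_find(G, dG, y, reg=1e-6, tol=1e-8, max_iter=50):
    """Inner loop (Algorithms 1 and 2): quasi-Newton root finding on G(y) = 0
    with a backtracking line search on the residual norm."""
    for _ in range(max_iter):
        g = G(y)
        if np.linalg.norm(g, np.inf) < tol:
            break
        H = dG(y) + reg * np.eye(y.size)          # regularized Jacobian (Section 4.6)
        dy = np.linalg.solve(H, -g)               # Newton step, Eq. 7
        alpha = 1.0                               # backtracking line search
        while np.linalg.norm(G(y + alpha * dy)) >= np.linalg.norm(g) and alpha > 1e-8:
            alpha *= 0.5
        y = y + alpha * dy
    return y

def algames(G, dG, C, y0, lam0, rho0, gamma=10.0, outer_iter=20, viol_tol=1e-3):
    """Outer loop (Algorithm 3): alternate Newton solves with the dual-ascent
    update of Eq. 8 and the monotone penalty schedule of Eq. 9."""
    y, lam, rho = y0, lam0, rho0
    for _ in range(outer_iter):
        y = newton_root_find(lambda z: G(z, lam, rho), lambda z: dG(z, lam, rho), y)
        c = C(y)
        if c.size == 0 or np.max(c) < viol_tol:   # constraint-violation criterion
            break
        lam = np.maximum(0.0, lam + rho * c)      # dual ascent, Eq. 8 (inequalities)
        rho = gamma * rho                         # penalty increase, Eq. 9
    return y, lam
```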

4.5 Algorithm Complexity

Following a quasi-Newton approximation of the matrix H Nocedal2006, we neglect some of the second-order derivative terms associated with the constraints. Therefore, the most expensive part of the algorithm is the Newton step defined by Equation 7. By exploiting the sparsity pattern of the matrix H, we can solve Equation 7 using a back-substitution scheme akin to solving a Riccati equation, with complexity O(N(n+m)^3). The complexity is cubic in the number of states n and the number of control inputs m, which are typically linear in the number of players M. Therefore, the overall complexity of the algorithm is cubic in the number of players.
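
For intuition about this scaling, a back-of-the-envelope count of the primal-dual vector size as a function of the number of players, under assumed per-player dimensions, looks like this:

```python
def primal_dual_size(M, N=40, n_per=4, m_per=2):
    """Rough size of the primal-dual vector y = (X, U, mu) for M players,
    assuming 4 states and 2 inputs per player and N knot points (illustrative
    numbers, not the paper's exact bookkeeping)."""
    n, m = n_per * M, m_per * M      # joint state and input sizes grow linearly with M
    primal = (N - 1) * (n + m)       # stacked states X and controls U
    dual = M * (N - 1) * n           # one dynamics multiplier vector mu^nu per player
    return primal + dual

for M in (2, 3, 4):
    # The Newton solve itself scales as N * (n + m)^3, i.e., cubically in M.
    print(M, primal_dual_size(M))
```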

4.6 Algorithm Discussion

Here we discuss the inherent difficulty in solving for Nash equilibria in large problems, and explain some of the limitations of our approach. First of all, finding a Nash equilibrium is a non-convex problem in general. Indeed, it is known that even for single-shot discrete games, solving for exact Nash equilibria is computationally intractable for a large number of players DaskalakisEtAlSIAMJournalonComputing08ComplexityOfNash. It is, therefore, not surprising that, in our more difficult setting of a dynamic game in continuous space, no guarantees can be provided about finding an exact Nash equilibrium. Furthermore, in complex interaction spaces, constraints can be highly nonlinear and nonconvex. This is the case in the autonomous driving context with collision-avoidance constraints. In this setting, one cannot expect to find an algorithm working in polynomial time with guaranteed convergence to a Nash equilibrium respecting constraints. On the other hand, local convergence of Newton’s method to open-loop Nash equilibria has been established in the unconstrained case (that is, starting sufficiently close to the equilibrium, the algorithm will converge to it) Di2020. Our approach solves a sequence of unconstrained problems via the augmented Lagrangian formulation. Each of these problems, therefore, has guaranteed local convergence. However, as expected, the overall method has no guarantee of global convergence to a generalized Nash equilibrium.

Second, our algorithm requires an initial guess for the state and control input trajectories X, U, and for the dynamics multipliers μ^ν. Empirically, we observe that choosing μ^ν = 0 and simply rolling out the dynamics starting from the initial state without any control was a sufficiently good initial guess to get convergence to a local optimum that respects both the constraints and the first-order optimality conditions. For a detailed empirical study of the convergence of ALGAMES and its failure cases, we refer to Sections 5.5 and 5.6.

Finally, even for simple linear-quadratic games, the Nash equilibrium solution is not necessarily unique. In general, an entire subspace of equilibria exists. In this case, the matrix H in Equation 7 will be singular. In practice, we regularize this matrix so that large steps δy are penalized, resulting in an invertible matrix.
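
In code, this regularization amounts to adding a small damping term to the Jacobian before solving Equation 7; the matrix and regularization weight below are illustrative.

```python
import numpy as np

# A rank-deficient "Jacobian", as arises when the Nash equilibrium set is an
# affine subspace rather than an isolated point.
H = np.array([[1.0, 1.0],
              [1.0, 1.0]])
g = np.array([1.0, 1.0])

reg = 1e-6                                        # illustrative damping weight
dy = np.linalg.solve(H + reg * np.eye(2), -g)     # damped Newton step for Eq. 7
print(dy)  # a finite, small-norm step instead of a failed solve on singular H
```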

5 Simulations: Design and Setup

We choose to apply our algorithm in the autonomous driving context. Indeed, many maneuvers like lane changing, ramp merging, overtaking, and intersection crossing involve a high level of interaction between vehicles. We assume a single car is computing the trajectories for all cars in its neighborhood, so as to find its own trajectory to act safely among the group. We assume that this car has access to a relatively good estimate of the surrounding cars’ objective functions. Such an estimate could, in principle, be obtained by applying inverse optimal control on observed trajectories of the surrounding cars.

In a real application, the car would compute its strategy as frequently as possible in a receding-horizon loop to adapt to unforeseen changes in the environment. We demonstrate the feasibility of this approach on complex driving scenarios where a classical predict-then-plan architecture fails to overcome the “frozen robot” problem.

5.1 Autonomous Driving Problem

Constraints

Each vehicle in the scene is an agent of the game. Our objective is to find a generalized Nash equilibrium trajectory for all of the vehicles. These trajectories have to be dynamically feasible. The dynamics constraints at time step k are expressed as follows,

x_{k+1} − f(x_k, u_k^1, …, u_k^M) = 0.    (10)

We consider a nonlinear unicycle model for the dynamics of each vehicle. A vehicle’s state is composed of a 2D position, a heading angle, and a scalar velocity. The control input is composed of an angular velocity and a scalar acceleration. In addition, it is critical that the trajectories respect collision-avoidance constraints. We model the collision zone of the vehicles as circles of radius r. The collision constraints between vehicles are then simply expressed in terms of the positions p_k^ν of the vehicles,

‖p_k^ν − p_k^{ν′}‖_2 ≥ 2r,   ∀ ν ≠ ν′.    (11)

We also model boundaries of the road to force the vehicles to remain on the roadway. This means that the distance between the vehicle and the closest point, q, on each boundary, b, has to remain larger than the collision-circle radius, r,

‖p_k^ν − q‖_2 ≥ r.    (12)

In summary, based on reasonable simplifying assumptions, we have expressed the driving problem in terms of non-convex and non-linear coupled constraints.
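
A sketch of these ingredients is given below. The explicit-Euler discretization, the time step, the radius value, and the helper names are assumptions made for illustration, not the paper's exact models.

```python
import numpy as np

def unicycle_step(x, u, h=0.1):
    """Explicit-Euler unicycle model: state (px, py, heading, speed),
    input (yaw rate, acceleration); h is the assumed time step."""
    px, py, theta, v = x
    omega, a = u
    return np.array([px + h * v * np.cos(theta),
                     py + h * v * np.sin(theta),
                     theta + h * omega,
                     v + h * a])

def collision_constraints(p, r=1.0):
    """Pairwise collision avoidance (Eq. 11): vehicle centers at least 2r apart.
    p is an (M, 2) array of vehicle positions at one time step; entries are <= 0
    when the constraint is satisfied."""
    M = p.shape[0]
    return np.array([2 * r - np.linalg.norm(p[i] - p[j])
                     for i in range(M) for j in range(i + 1, M)])

def boundary_constraints(p, q_closest, r=1.0):
    """Road-boundary constraint (Eq. 12): stay at least r away from the closest
    boundary point q_closest of each vehicle; entries are <= 0 when satisfied."""
    return r - np.linalg.norm(p - q_closest, axis=1)
```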

Cost Function

We use a quadratic cost function penalizing the use of control inputs and the distance between the current state and the desired final state of the trajectory. We also add a quadratic penalty on being close to other cars,

J^ν(X, U^ν) = Σ_k [ (1/2)(x_k − x_f)⊤ Q (x_k − x_f) + (1/2) u_k^ν⊤ R u_k^ν + Σ_{ν′≠ν} β max(0, δ − ‖p_k^ν − p_k^{ν′}‖_2)^2 ].    (13)

δ controls the distance at which this penalty is “activated”, and β controls its magnitude.
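
The corresponding per-player stage cost could be sketched as follows, with hypothetical weights Q, R and the parameters d_active and w_prox standing in for δ and β above:

```python
import numpy as np

def stage_cost(x, u, x_goal, others_p, Q, R, d_active=2.0, w_prox=1.0):
    """Quadratic tracking and control-effort terms plus a quadratic proximity
    penalty on being closer than d_active to any other vehicle (Eq. 13).
    x follows the (px, py, heading, speed) layout of the unicycle model above."""
    dx = x - x_goal
    cost = 0.5 * dx @ Q @ dx + 0.5 * u @ R @ u
    for p_other in others_p:
        gap = d_active - np.linalg.norm(x[:2] - p_other)
        cost += w_prox * max(0.0, gap) ** 2       # "activated" only within d_active
    return cost
```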

5.2 Comparison to iLQGames

In order to evaluate the merits of ALGAMES, we compare it to iLQGames Fridovich-Keil2020a, which is a DDP-based algorithm for solving general dynamic games. Both algorithms solve the problem by iteratively solving linear-quadratic approximations that have an analytical solution Basar1999. For iLQGames, the augmented objective function differs from the objective function J^ν by a quadratic term penalizing constraint violations,

Ĵ^ν(X, U^ν) = J^ν(X, U^ν) + ρ Σ_i I_i C_i(X, U)^2,    (14)

where I_i is defined by,

I_i = 1 if C_i(X, U) > 0,   I_i = 0 otherwise.    (15)

Here ρ is an optimization hyperparameter that we can tune to satisfy constraints. For ALGAMES, the augmented objective function, L^ν, is actually an augmented Lagrangian; see Equation 3. The hyperparameters for ALGAMES are the initial value of ρ and its increase rate γ defined in Equation 9.

Figure 2: Two driving environments are considered: a ramp merging scenario (top) and an intersection crossing scenario (bottom).
Figure 3: We compare ALGAMES and iLQGames, both in terms of solve time (top) and ability to reliably enforce constraints (middle). Additionally, we evaluate the update frequency of ALGAMES in the MPC setting (bottom).

5.3 Timing Experiments

We evaluate the performance of both algorithms in two scenarios (see Figure 2) with the number of players varying from two to four. To compare the speed of both algorithms, we set the termination criterion as a threshold on the maximum constraint violation. The timing results averaged over 100 samples are presented in Table 3.a. First, we notice that both algorithms achieve real-time or near-real-time performance on complex autonomous driving scenarios (the horizon of the solvers is fixed across scenarios).

We observe that the speed performance of ALGAMES and iLQGames is comparable in the ramp merging scenario. For this scenario, we tuned the value of the penalty ρ for iLQGames. Notice that for all scenarios the dimensions of the problem are scaled so that the velocities and displacements are all of the same order of magnitude. For the intersection scenario, we observe that the two-player and four-player cases both have much higher solve times for iLQGames compared to the 3-player case. Indeed, in those two cases, we had to increase the penalty ρ; otherwise, iLQGames would plateau and never reach the constraint satisfaction criterion. This, in turn, slowed the algorithm down by decreasing the constraint violation convergence rate.

5.4 Discussion

The main takeaway from these experiments is that, for a given scenario, it is generally possible to find a suitable value of the penalty ρ that will ensure the convergence of iLQGames to constraint satisfaction. With higher values of ρ, we can reach better constraint satisfaction at the expense of a slower convergence rate. In the context of a receding-horizon implementation (MPC), finding a good choice of ρ that would suit the whole sequence of scenarios encountered by a vehicle could be difficult. In contrast, the same hyperparameters (the initial penalty ρ and the increase rate γ) were used in ALGAMES for all the experiments across this paper. This supports the idea that, thanks to its adaptive penalty scheme, ALGAMES requires little tuning.

While performing the timing experiments, we also noticed several instances of oscillatory behavior for iLQGames. The solution would oscillate, preventing it from converging. This happened even after an adaptive regularization scheme was implemented to regularize iLQGames’ Riccati backward passes. Oscillatory behavior was not seen with ALGAMES. We hypothesize that this is due to the dual ascent update coupled with the penalty logic detailed in Equations 8 and 4, which add hysteresis to the solver.

5.5 Monte Carlo Analysis

To evaluate the robustness of ALGAMES, we performed a Monte Carlo analysis of its performance on a ramp merging problem. First, we set up a roadway with hard boundaries as pictured in Fig. 2.a. We position two vehicles on the roadway and one on the ramp in a collision-free initial configuration. We choose a desired final state where the incoming vehicle has merged into the traffic. Our objective is to generate generalized Nash equilibrium trajectories for the three vehicles. These trajectories are collision-free and cannot be improved unilaterally by any player. To introduce randomness in the solving process, we apply a random perturbation to the initial state of the problem. Specifically, we perturb this initial state by adding uniformly sampled noise, which typically corresponds to slightly displacing the initial positions of the vehicles and slightly changing their initial velocities and headings.

We observe in Figure 3.b that ALGAMES consistently finds a satisfactory solution to the problem using the same hyperparameters ρ and γ. Out of the 1000 samples, most converged to constraint satisfaction while respecting the optimality criterion. By definition, this criterion is a merit function for satisfying optimality and dynamics constraints. We also observe that the solver converges to a solution quickly for the large majority of the samples. These empirical data tend to support the fact that ALGAMES is able to solve this class of ramp merging problems quickly and reliably.

For comparison, we present in Figure 3.b the results obtained with iLQGames. We apply the same constraint satisfaction criterion. We fixed the value of the penalty hyperparameter ρ for all the samples, as it would not be a fair comparison to tune it for each sample. Only 3 samples did not converge with iLQGames; this performance is comparable to ALGAMES, for which 5 samples failed to converge. However, we observe that iLQGames is on average 3 times slower than ALGAMES and requires on average 4 times more iterations (9 against 41).

5.6 Solver Failure Cases

The Monte Carlo analysis allows us to identify the typical failure cases of our solver, i.e., the cases where the solver does not satisfy the constraints or the optimality criterion. Typically, in such cases, the initial guess, which consists of rolling out the dynamics with no control, is far from a reasonable solution. Since the constraints are ignored during this initial rollout, the car at the back can overtake the car at the front by driving through it. This creates an initial guess where constraints are strongly violated. Moreover, we hypothesize that the tight roadway-boundary constraints tend to strongly penalize solutions that would “disentangle” the car trajectories, because they would require large boundary violations at first. Therefore, the solver gets stuck in a local optimum where cars overlap each other. Sampling several initial guesses with random initial control inputs and solving in parallel could reduce the occurrence of these failure cases. Being able to detect, reject, and re-sample initial guesses when the initial car trajectories are strongly entangled could also improve the robustness of the solver.

6 Non-Uniqueness of Nash Equilibria

A Nash equilibrium corresponds to a situation where all players are acting optimally given the other players’ strategies. This is a way for players to compete in a coordinated fashion without communication. However, if the Nash equilibrium is non-unique, the coordination is ambiguous and players have to decide individually which Nash equilibrium to follow. This can lead to inconsistencies. The non-uniqueness of Nash equilibrium solutions has been observed in practical robotics applications such as autonomous driving Peters2020. Peters et al. identified isolated clusters of solutions in unconstrained Nash equilibrium problems and proposed an estimation method to improve players’ coordination. In this section, we detail several underlying causes of non-uniqueness that arise in practical robotics scenarios. Additionally, we present the behavior of ALGAMES in such circumstances.

6.1 Linear-Quadratic Dynamic Games

Linear-quadratic (LQ) dynamic games are an important building block for optimization algorithms relying on sequential-quadratic approximations, such as ALGAMES or iLQGames Fridovich-Keil2020a. The conditions for the existence and uniqueness of a Nash equilibrium have been extensively studied Basar1976, Abraham2019. In the continuous-time setting, Eisele characterized the different solution regimes for the LQ game, including non-existence and non-uniqueness Eisele1982. In the discrete-time setting, the open-loop Nash equilibrium problem is equivalent to a static quadratic game (i.e., a one-step game). For such problems, the Nash equilibrium solutions can either be non-existent, form an affine subspace, or be a single point in the case of a unique solution.

Proof Sketch: We focus on the two-player case; the result can easily be extended to the M-player case. We denote by U^ν and J^ν the strategy and quadratic cost function of player ν,

J^ν(U^1, U^2) = (1/2) [U^1; U^2]⊤ Q^ν [U^1; U^2] + q^ν⊤ [U^1; U^2],   ν ∈ {1, 2}.    (16)

The first-order necessary conditions for optimality of a Nash equilibrium, ∇_{U^1} J^1 = 0 and ∇_{U^2} J^2 = 0, can be written as an affine equation,

[∇_{U^1} J^1; ∇_{U^2} J^2] = A [U^1; U^2] + b = 0,    (17)

where A and b stack the corresponding blocks of Q^1, Q^2 and q^1, q^2.

The second-order necessary conditions are independent of the Nash equilibrium point considered. They require positive semi-definiteness of the matrices ∇²_{U^ν U^ν} J^ν, for all ν. Therefore, any point in the affine subspace defined by Equation 17 will respect both the first-order and second-order necessary conditions for optimality.

In the case of a unique Nash equilibrium, ALGAMES converges in one Newton iteration to the solution. When the Nash equilibrium solutions form an affine subspace, ALGAMES converges to the point in the subspace closest to the initial guess. This is due to the regularization added to the Jacobian of the KKT conditions, H, defined in Equation 6.
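
A tiny numerical instance, with two players, scalar strategies, and assumed quadratic costs, makes this concrete: stacking each player's first-order condition yields the affine system of Equation 17, whose solution set is a single point when the stacked matrix is invertible and an affine subspace when it is singular.

```python
import numpy as np

# Two players with scalar strategies U1, U2 and costs
# J^nu = 1/2 z' Q^nu z + q^nu' z, where z = [U1, U2] (assumed numbers).
Q1 = np.array([[2.0, 1.0], [1.0, 0.0]]); q1 = np.array([1.0, 0.0])
Q2 = np.array([[0.0, 1.0], [1.0, 2.0]]); q2 = np.array([0.0, 1.0])

# Stack dJ^1/dU1 = 0 and dJ^2/dU2 = 0 (the analogue of Eq. 17).
A = np.vstack([Q1[0], Q2[1]])
b = np.array([q1[0], q2[1]])
z_star = np.linalg.solve(A, -b)    # unique open-loop Nash equilibrium here
print(z_star)

# If A were singular, z_star + Null(A) would satisfy the same first- and
# second-order conditions, giving an affine subspace of equilibria.
```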

6.2 Isolated Nash Equilibria

We have seen that an LQ game can generate an affine Nash equilibrium subspace. Thus, it cannot lead to multiple isolated Nash equilibria. However, in general, a dynamic game can admit multiple isolated Nash equilibria as highlighted by Peters et al. Peters2020. In unconstrained autonomous driving scenarios, they generally appear when collision-avoidance costs are introduced. These costs are non-convex and introduce a coupling between the players’ strategies. Typically, these isolated Nash equilibria correspond to “topologically” different driving strategies. For instance, in a ramp merging scenario, the merging vehicle can merge in front of or behind an incoming vehicle (Figure 4.a). The equilibrium point to which ALGAMES converges is typically the closest to the initial guess, thanks to the regularization scheme. This is a desirable property, especially in the MPC setting , because it prevents the re-planned trajectory from oscillating between different Nash equilibria.

Figure 4: We illustrate isolated Nash equilibria (top), and non-isolated Nash equilibrium solutions stemming from an underdetermined KKT system (bottom).

6.3 Generalized Nash Equilibrium

Identifying the solution set of a GNEP remains a major challenge, as pointed out by Fischer et al. in an extensive survey Fischer2014. In general, the solution set of a GNEP can consist of one or many isolated points, or even of non-isolated points. Theoretical results in this domain often rely on strong assumptions, such as convexity of the feasible set, absence of shared constraints, or decoupled cost functions Dreves2018. All of these assumptions could be violated in a typical robotic scenario. Indeed, collision-avoidance constraints are shared and non-convex. Similarly, collision-avoidance costs or congestion terms introduce coupling between the players’ costs.

We explore the structure of the generalized Nash equilibrium (GNE) solutions in the presence of shared collision-avoidance constraints. We denote by C_k^{νν′} the collision-avoidance constraint between player ν and player ν′ at time step k. For each collision-avoidance constraint C_k^{νν′}, we introduce two Lagrange multipliers, λ^ν and λ^{ν′}: one for each player. We remark that, for a single constraint, we add two Lagrange multipliers. We denote by n_col the number of collision-avoidance constraints. By concatenating these constraints with the residual vector G, we add n_col entries and n_col rows to its Jacobian. We denote by G_a and H_a the “augmented” residual vector and Jacobian matrix. We need to differentiate the residual G_a with respect to the 2 n_col Lagrange multipliers associated with the collision constraints. This adds 2 n_col columns to the Jacobian H_a. Thus, the Jacobian H_a defines an underdetermined linear system, with n_col more columns than rows (Figure 4.b). However, only active collision-avoidance constraints should be included in the Jacobian. Therefore, the Jacobian only has n_act more columns than rows, where n_act denotes the number of active collision-avoidance constraints. Thus, the nullspace of the underdetermined linear system is at least of dimension n_act (Figure 4.b). Consequently, the solution set of a GNEP can potentially be composed of non-isolated points and could span multiple dimensions locally around a known equilibrium point.

We explore this nullspace at a Nash equilibrium point by slightly disturbing the current generalized Nash equilibrium in one of the nullspace’s directions (Figure 5.a). We obtain a continuum of GNE. Additionally, Figures 5.b and 5.c present the two main directions in which the solution can drift while remaining a GNE. The dimension of the nullspace corresponds to the number of active constraints at the equilibrium point. Yet, we notice that most of the trajectory variability is captured by a limited number of eigenvectors. We remark that the two principal eigenvectors have an elegant interpretation: they both favor one vehicle over the others. Additionally, they nicely show how disturbing the trajectory of one player influences the trajectories of the other players through the collision constraints. Finally, by combining these two eigenvectors and stepping in the resulting direction, one could favor any of the three vehicles.
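
The perturbation described above amounts to stepping along nullspace directions of the augmented Jacobian at the converged solution. A sketch of how such directions can be extracted, using an SVD as an assumed implementation choice, is given below.

```python
import numpy as np

def nullspace_directions(H_aug, tol=1e-8):
    """Orthonormal basis of the nullspace of the augmented (underdetermined)
    KKT Jacobian; each column is a direction along which the primal-dual point
    can be perturbed while still satisfying the first-order GNE conditions."""
    _, s, Vt = np.linalg.svd(H_aug)
    rank = int(np.sum(s > tol * s[0]))
    return Vt[rank:].T                # columns span Null(H_aug)

# Example: step a converged solution y_star along the first nullspace direction.
# y_perturbed = y_star + 0.01 * nullspace_directions(H_aug)[:, 0]
```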

Figure 5: We explore the generalized equilibrium subspace in one direction and obtain a sequence of GNE (top). Additionally, we represent the two principal vectors along which the trajectories can evolve while remaining a GNE (middle & bottom).

6.4 ALGAMES’ Convergence to Normalized Nash Equilibrium

We observe that the multipliers λ^ν and λ^{ν′} associated with a shared constraint C_k^{νν′} are equal at every iteration of the solver. Indeed, we can assume that the multipliers are initialized with the same value (typically zero). Moreover, these multipliers are updated with identical dual-ascent updates, defined in Equation 8,

λ^ν,(0) = λ^ν′,(0) = 0,    (18)
λ^ν,(t+1) = max(0, λ^ν,(t) + ρ C_k^{νν′}(X, U)),    (19)
λ^ν′,(t+1) = max(0, λ^ν′,(t) + ρ C_k^{νν′}(X, U)),    (20)
λ^ν,(t) = λ^ν′,(t),  ∀ t ≥ 0.    (21)

Here, t denotes the iteration index. A consequence of this trivial recursion is that the multipliers associated with the same shared constraint are equal at the solution. Therefore, if ALGAMES converges, it converges to a Normalized Nash Equilibrium (NNE) in the sense of Rosen Rosen1965. An NNE is a GNE with the additional requirement that the multipliers associated with shared constraints are equal. This reasoning was applied by Dreves to a potential reduction method Dreves2011; we transcribe it to the augmented Lagrangian context. At an NNE, because the multipliers are equal, the price to pay for violating the collision-avoidance constraint is the same for both players. This can be interpreted as enforcing a notion of “fairness” between the players in addition to optimality. One interesting characteristic of NNEs, compared to GNEs, is that they are not subject to the nullspace issue described in Section 6.3. Indeed, thanks to the additional constraints enforcing equality between multipliers, active constraints no longer introduce more columns than rows in the KKT system.

7 MPC Implementation of ALGAMES

In this section, we propose an MPC implementation of the algorithm that provides us with a feedback policy instead of an open-loop strategy and demonstrates real-time performance. We compare this MPC to a non-game-theoretic baseline on a crowded ramp-merging scenario, which is known to be conducive to the “frozen robot” problem.

7.1 MPC Feedback Policy

The strategies identified by ALGAMES are open-loop Nash equilibrium strategies: they are sequences of control inputs. In contrast, DDP-based approaches like iLQGames solve for feedback Nash equilibrium strategies that provide a sequence of control gains. In the MPC setting, we can obtain a feedback policy with ALGAMES by updating the strategy as fast as possible and only executing the beginning of the strategy. This assumes a fast update rate of the solution. To support the feasibility of the approach, we implemented an MPC on the ramp merging scenario described in Figure 2.a. There are 3 players constantly maintaining a 40-time-step strategy with 3 seconds of horizon. We simulate 3 seconds of operation of the MPC by constantly updating the strategies and propagating noisy unicycle dynamics for each vehicle. We compile the results from 100 MPC trajectories in Table 3.c. We obtain an update frequency of more than 60 Hz for the planner on average. We observe a similar update frequency on the intersection problem defined in Figure 2.b.
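
A minimal receding-horizon wrapper around the solver, assuming hypothetical solve() and simulate_step() functions, illustrates this feedback mechanism.

```python
def run_mpc(x, solve, simulate_step, sim_steps=30, warm_start=None):
    """Receding-horizon wrapper: re-solve the open-loop game at every control
    period, execute only the first control of the joint strategy, and warm
    start the next solve with the previous solution."""
    for _ in range(sim_steps):
        X_plan, U_plan = solve(x, warm_start)   # full-horizon open-loop strategies
        u_now = U_plan[0]                       # first joint control input only
        x = simulate_step(x, u_now)             # propagate noisy dynamics one step
        warm_start = (X_plan, U_plan)
    return x
```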

7.2 “Unfreezing” the Robot

To illustrate the benefits of using ALGAMES in a receding-horizon loop, we compare it to a non-game-theoretic baseline MPC. With this baseline, the prediction step and the planning step are decoupled. Specifically, each agent predicts the trajectories of the surrounding vehicles by propagating straight, constant-velocity trajectories. Then, each agent plans for itself assuming these predicted trajectories are immutable obstacles. We test these two controllers on a challenging scenario where a vehicle has to merge onto a crowded highway, as presented in Figure 1. We perform a Monte Carlo analysis by uniformly sampling the initial state around a nominal state, with perturbations in longitudinal displacement, lateral displacement, and heading for each car. Given the initial state, the vehicle on the ramp should be able to merge between the blue and orange cars or between the orange and green cars. However, waiting for all cars to pass before merging is not a desirable behavior. Indeed, with such a policy, the merging vehicle has to slow down significantly and could get stuck on the ramp if the highway does not clear. We run ALGAMES in a receding-horizon loop and the baseline MPC to generate trajectories for 100 different initial states. We record the position of the merging vehicle at the end of the simulation and compile the results in Figure 6.a.

We observe that the “frozen robot” problem occurs with the baseline MPC in a significant fraction of the simulations. An interpretation of this result is that the vehicle on the ramp cannot find a merging maneuver that does not collide with its constant-velocity trajectory predictions. Since there is no feasible merging maneuver, the only option left is to wait for the other vehicles to pass before merging.

On the contrary, by running ALGAMES in a receding-horizon loop, the vehicle merges into traffic in most of the simulations (Figure 6.a). ALGAMES avoids the “frozen robot” pitfall in most cases by gradually adjusting its velocity to merge with minimal disruption to the traffic (Figure 1).

Figure 6: We evaluate the ability of ALGAMES to avoid the “frozen robot” problem and to handle Nash equilibrium non-uniqueness.

7.3 Non-Uniqueness of Nash Equilibria in Practice

We assess the effect of the non-uniqueness of Nash equilibria in the MPC context. We focus on the coordination issue that players may face when there exist multiple Nash equilibria. In our experiment, each car independently runs ALGAMES as an MPC policy. Each car plans for itself and predicts the other vehicles’ trajectories. We purposefully provide each player with a very different initial guess, in order to generate a mismatch between the Nash equilibrium solutions that each player converges to. We simulate this on a ramp merging scenario with two players. In this scenario, an example of Nash equilibrium mismatch could be that both players think they let the other player go first (Figure 6.b). The results, presented in Figure 6.c, suggest that most of the mismatches disappear rapidly after the initialization, i.e., both players converge to the same Nash equilibrium. This can happen, for instance, when one Nash equilibrium is no longer feasible because it violates the bounds on the control inputs or the boundaries of the road. This positive result mitigates the concern caused by the potential occurrence of non-unique Nash equilibria. Nevertheless, it is also important to analyze the failure cases, where the two Nash equilibrium solutions found by the two players do not coincide. Typically, in these cases, each player’s solution remains fairly constant and does not oscillate between multiple equilibria. In such circumstances, it would be appropriate to estimate the equilibrium that the other player is following in order to switch to this equilibrium. Peters et al. demonstrated the feasibility of this approach in similar scenarios, using a particle filter Peters2020.

8 Conclusions

We have introduced a new algorithm for finding constrained Nash equilibrium trajectories in multi-player dynamic games. We demonstrated the performance and robustness of the solver through a Monte Carlo analysis on complex autonomous driving scenarios including nonlinear and non-convex constraints. We have shown real-time performance for up to 4 players and implemented ALGAMES in a receding-horizon framework to give a feedback policy. We empirically demonstrated the ability of ALGAMES to mitigate the “frozen robot” problem in comparison to a non-game-theoretic receding-horizon planner. The results we obtained from ALGAMES are promising, as they seem to let the vehicles share the responsibility for avoiding collisions, leading to natural-looking trajectories where players are able to negotiate complex, interactive traffic scenarios that are challenging for traditional, non-game-theoretic trajectory planners. For this reason, we believe that ALGAMES could be a very efficient tool to generate trajectories in situations where the level of interaction between players is strong. Our implementation of ALGAMES is available at https://github.com/RoboticExplorationLab/ALGAMES.jl.

References