GTP-SLAM: Game-Theoretic Priors for Simultaneous Localization and Mapping in Multi-Agent Scenarios

03/30/2022
by   Chih-Yuan Chiu, et al.

Robots operating in complex, multi-player settings must simultaneously model the environment and the behavior of human or robotic agents who share that environment. Environmental modeling is often approached using Simultaneous Localization and Mapping (SLAM) techniques; however, SLAM algorithms usually neglect multi-player interactions. In contrast, a recent branch of the motion planning literature uses dynamic game theory to explicitly model noncooperative interactions of multiple agents in a known environment with perfect localization. In this work, we fuse ideas from these disparate communities to solve SLAM problems with game theoretic priors. We present GTP-SLAM, a novel, iterative best response-based SLAM algorithm that accurately performs state localization and map reconstruction in an uncharted scene, while capturing the inherent game-theoretic interactions among multiple agents in that scene. By formulating the underlying SLAM problem as a potential game, we inherit a strong convergence guarantee. Empirical results indicate that, when deployed in a realistic traffic simulation, our approach performs localization and mapping more accurately than a standard bundle adjustment algorithm across a wide range of noise levels.


I Introduction

To navigate safely and efficiently in real-world scenarios, autonomous vehicles must accurately represent dynamic, uncharted environments, and execute robust motion plans that account for multi-player interactions in the scene. This requires a careful fusion of state estimation, prediction, and path planning modules in vehicle autonomy stacks. However, current, state-of-the-art autonomous navigation pipelines typically treat these problems separately and do not incorporate direct feedback between them.

Regarding estimation, algorithms tackling the Simultaneous Localization and Mapping (SLAM) problem aim to accurately reconstruct an uncharted environment while also localizing the “ego” player within it [LeonardDurrantWhyte1991SimultaneousMapBuilding, Cadena2016PastPresentandFuture, davison2018futuremapping1, davison2019futuremapping2]. In recent years, popular inference algorithms for SLAM, such as factor graph-based methods, have also been used to solve challenging motion planning and optimal control problems jointly with SLAM problems in unknown environments [Dellaert2017FactorGraphsforRobotPerception]. However, these approaches often do not account for purposeful interactions among dynamic players in the environment. It is thus unclear whether these methods can truly model safe, efficient, and robust motion plans generated by an ego player operating in the vicinity of other players.

Interactions between independently-minded, mobile players are naturally modeled as dynamic games between rational actors with differing objectives [isaacs1954differential, basar1998DynamicNoncooperativeGameTheory, starr1969nonzero, starr1969further]. Recent advances in game-theoretic motion planning exploit this structure to predict the responses of other players to one’s own decisions, and identify a desirable equilibrium strategy [fisac2015reach, fridovich2019efficient]. To the best of the authors’ knowledge, however, game-theoretic formulations of noncooperative multi-player interactions have not yet been considered in SLAM tasks.

In this work, we formulate the SLAM task from the perspective of an ego player, who is interacting with multiple other players while simultaneously estimating all players’ positions and all landmark locations. Inspired by the dynamic game theory literature, we first establish mild assumptions under which this problem can be formulated as a potential game. We then present a factor graph-based algorithm to solve this game and prove that it is guaranteed to converge to a local equilibrium. Unlike existing SLAM methods, this approach tightly integrates estimation, prediction, and decision-making for multiple players, simultaneously. Empirical results illustrate that, compared to standard bundle adjustment, incorporating game-theoretic interaction priors leads to higher localization and map reconstruction accuracy in a realistic traffic scenario.

II Related Work

II-A Simultaneous Localization and Mapping

Simultaneous Localization and Mapping is a fundamental state estimation task with a well-developed literature in the robotics community [LeonardDurrantWhyte1991SimultaneousMapBuilding, Cadena2016PastPresentandFuture], the unsolved aspects of which continue to attract great interest [davison2018futuremapping1, davison2019futuremapping2]. A standard method for solving SLAM problems is to reformulate the underlying maximum a posteriori (MAP) estimation problem into a nonlinear least squares problem, which can then be solved via factor graph optimization [DellaertKaess2006SquareRootSAM, KaessDellaert2012iSAM2].

In recent years, factor graphs have been used to formulate a wide range of robotics problems beyond the SLAM task in static environments, including model predictive control and trajectory tracking [TaDellaert2014AFactorGraphApproachtoEstimationAndMPC, Dellaert2017FactorGraphsforRobotPerception]. Factor graph-based methods have also been used to solve the dynamic SLAM problem, which involves the reconstruction of uncharted environments with dynamic players [zhang2020vdoslam] who may share measurement information [ZhangDellaert2021MRiSAM2]. These methods typically infer time-dependent variables pertaining to multiple players without accounting for players’ interactive, and likely noncooperative, behavior. By contrast, our approach explicitly accounts for purposeful and potentially noncooperative interactions between multiple players by using iterative best response to search for local Nash equilibria of the players’ variables.

II-B Multi-Player Path Planning via Dynamic Games

In robotics applications, interactions between multiple players are naturally modeled as dynamic games. In particular, scenarios in which two groups of players have opposing objectives, such as robust control problems and pursuit-evasion games, are often formulated as zero-sum dynamic games [fisac2015reach, FisacSastry2015PursuitEvasionDefense]. Meanwhile, problems in which multiple players have only partially conflicting objectives, such as path planning in busy traffic, are posed as general-sum dynamic games [fridovich2019efficient, peters2021rss]. Although solutions to continuous-time dynamic games are characterized by coupled Hamilton-Jacobi-Bellman (HJB) PDEs [starr1969nonzero, starr1969further, bansal2017hamilton], solving these equations is typically intractable due to the so-called “curse of dimensionality” [bellman1966dynamic], i.e., their computation time grows exponentially in the state space dimension. For this reason, such methods are impractical in many multi-player scenarios of interest.

In contrast, our work uses an iterative best response (IBR) scheme, in which each player takes a turn solving for their optimal strategy while assuming all other players’ strategies are fixed [fisac2018hierarchical, wang2018game, Kavuncu2021PotentialiLQR, ZanardiFrazzoli2021UrbanDrivingGamesWithLexicographicPreferences]. By replacing the dynamic game with a sequence of optimal control problems, the computational burden of solving for a local Nash equilibrium strategy is substantially reduced. Indeed, IBR has been successfully applied to a wide range of multi-player interaction scenarios, such as hierarchical planning for autonomous driving [fisac2018hierarchical] and racing [wang2018game]. Moreover, IBR is suitable for our application because it can be embedded in a factor graph-based framework, by iteratively solving estimation problems over the variables relevant to each player, while holding all other players’ variables fixed.

Our work also draws inspiration from the potential games literature, which exploits the cost structure of multi-player interactions in certain robotics applications [Kavuncu2021PotentialiLQR, ZanardiFrazzoli2021UrbanDrivingGamesWithLexicographicPreferences]. In particular, when cost terms that couple different players are all symmetric, there exists a single optimal control problem whose solution gives Nash equilibrium strategies of the interacting players. Recent literature indicates that iterative methods which exploit this symmetry structure in potential games can be more efficient than those that do not [Kavuncu2021PotentialiLQR]. Our approach draws inspiration from this observation: we recast the multi-agent SLAM problem under study as a potential game, and perform IBR in a manner that preserves its potential structure.

III Preliminaries

Below, we introduce core concepts in dynamic game theory. Readers are directed to [basar1998DynamicNoncooperativeGameTheory] for further details.

III-A Open-Loop Nash Equilibria

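Consider the $N$-player, $T$-stage general-sum dynamic game $\mathcal{G}$, with nonlinear, discrete-time system dynamics for each player $i \in \{1, \ldots, N\}$ and time $t \in \{1, \ldots, T\}$, given by:

$$ x^i_{t+1} = f^i_t(x^i_t, u^i_t) + w^i_t, \qquad w^i_t \sim \mathcal{N}(0, \Sigma_{f^i_t}), \tag{1} $$

where $x^i_t \in \mathbb{R}^{n_i}$, $u^i_t \in \mathbb{R}^{m_i}$, and $f^i_t: \mathbb{R}^{n_i} \times \mathbb{R}^{m_i} \to \mathbb{R}^{n_i}$ are respectively the state, control input, and (differentiable) state transition map of player $i$ at time $t$, and $\Sigma_{f^i_t}$ is the associated covariance matrix. Below, for each player $i$, we use the shorthand $x^i := (x^i_1, \ldots, x^i_{T+1})$ and $u^i := (u^i_1, \ldots, u^i_T)$. Moreover, we define $x := (x^1, \ldots, x^N)$ and $u := (u^1, \ldots, u^N)$, with time slices $x_t := (x^1_t, \ldots, x^N_t)$ and $u_t := (u^1_t, \ldots, u^N_t)$.

In this work, we jointly estimate the trajectories of all players from the perspective of one particular player, referred to below as player 1 or the ego player. Other players are termed non-ego players. The ego player observes $L$ landmarks, whose global positions in $\mathbb{R}^{n_\ell}$ are given by $\ell := (\ell_1, \ldots, \ell_L)$. Players also observe each other's positions. These measurements, at each time $t$, are given by:

$$ z^{\ell_k}_t = h^{\ell}(x^1_t, \ell_k) + v^{\ell_k}_t, \qquad v^{\ell_k}_t \sim \mathcal{N}(0, \Sigma_{\ell}), \tag{2} $$
$$ z^{ij}_t = h^{p}(x^i_t, x^j_t) + v^{ij}_t, \qquad v^{ij}_t \sim \mathcal{N}(0, \Sigma_{p}), \tag{3} $$

where $z^{\ell_k}_t$ is the measurement of landmark $\ell_k$ by the ego player, for each $k \in \{1, \ldots, L\}$, while $z^{ij}_t$ is the measurement by player $i$ of player $j$, for each $i, j \in \{1, \ldots, N\}$, $i \neq j$. Here, $\Sigma_{\ell}$ and $\Sigma_{p}$ denote associated covariance matrices. Additionally, each player's objective is defined by $J^i(x, u, \ell)$, with $J^i(x, u, \ell) \in \mathbb{R}$ for each $i$. In this work, we presume that the ego player knows other players' objectives $J^2, \ldots, J^N$. While this is certainly a strong assumption in practice, recent work has established that it is possible to infer unknown parameters of players' objectives in such games efficiently [peters2021rss, cleac2020ral]. Thus equipped, we now define the Nash equilibrium of the GTP-SLAM problem.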

Definition 1

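(Open-Loop Nash equilibrium, [basar1998DynamicNoncooperativeGameTheory, Ch. 6]) We call $u^\star := (u^{1\star}, \ldots, u^{N\star})$ an open-loop Nash equilibrium of $\mathcal{G}$ if no player $i$ can lower their cost by unilaterally deviating from their control $u^{i\star}$ while all other players' controls, $u^{-i\star}$, remain fixed, i.e.,

$$ J^i(x^\star, u^\star, \ell) \leq J^i\big(x, (u^i, u^{-i\star}), \ell\big) \quad \text{for all } u^i \text{ and all } i \in \{1, \ldots, N\}, \tag{4} $$

where $x^\star$ and $x$ denote the state trajectories generated by $u^\star$ and $(u^i, u^{-i\star})$, respectively.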

III-B Potential Dynamic Games

Our approach leverages well-established convergence guarantees of iterative best response (IBR) algorithms in the setting of potential games. For clarity, we define a finite-stage potential game as follows.

Definition 2

(Potential Dynamic Game, [Kavuncu2021PotentialiLQR], [FonsecaMoralesHernandezLerma2018PotentialDifferentialGames]) An $N$-player, $T$-stage general-sum dynamic game $\mathcal{G}$ is called a potential game if there exists an optimal control problem, defined over all players' controls $u$, whose solutions are Nash equilibria of the game $\mathcal{G}$.

In Section IV-B, we will recast the multi-player, noncooperative SLAM problem of interest into a potential game, and establish mild assumptions under which an appropriate IBR algorithm converges.

IV Methods

Our main contribution is GTP-SLAM, a novel SLAM algorithm for multi-player scenes, motivated by iterative best response. GTP-SLAM aims to jointly estimate the dynamic states and control inputs of all players in the scene, as well as landmark positions. It does so from the ego player’s perspective, while accounting for noncooperative, game-theoretic interactions between the players.

IV-A Constructing the GTP-SLAM Factor Graph

Fig. 1: Factor graphs for (Left) GTP-SLAM, our IBR-based algorithm, and (Right) a standard bundle adjustment approach, for a two-player example. Red and blue nodes represent dynamic variables (states $x^i_t$, controls $u^i_t$) for players 1 and 2, respectively, while gray nodes indicate variables temporarily held constant. Square, circular, and triangular nodes represent states, controls, and landmarks, respectively. Green factors represent dynamics constraints, blue factors represent landmark and inter-player distance measurements, and black factors represent priors on states and controls, e.g., for lane tracking, heading alignment, and control effort minimization.

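We begin by expressing the players' noncooperative preferences as factors in a bipartite factor graph. Each factor is a function which encodes the residual error among the connected variables. That is, factors are vector-valued maps with which we may compute the joint likelihood of all input variables. Following standard Gaussian assumptions, we use the squared Mahalanobis distance associated to each factor (i.e., $\|r\|^2_{\Sigma} := r^\top \Sigma^{-1} r$ for factor $r$ and covariance $\Sigma$) to compute the negative log-likelihood of a collection of variables. Concretely, then, we construct a factor graph from the following terms:

$$ g^i_t(x^i_t, u^i_t) \quad \text{(priors on states and controls)}, \tag{5} $$
$$ d^i_t(x^i_t, u^i_t, x^i_{t+1}) := x^i_{t+1} - f^i_t(x^i_t, u^i_t) \quad \text{(dynamics)}, \tag{6} $$
$$ g^{ij}_t(x^i_t, x^j_t) \quad \text{(interactions)}, \tag{7} $$
$$ m^{\ell_k}_t(x^1_t, \ell_k) := z^{\ell_k}_t - h^{\ell}(x^1_t, \ell_k) \quad \text{(landmark measurements)}, \tag{8} $$
$$ m^{1i}_t(x^1_t, x^i_t) := z^{1i}_t - h^{p}(x^1_t, x^i_t) \quad \text{(inter-player measurements)}, \tag{9} $$

which are color-coded in Figure 1. For example, the ternary dynamics factor (6) computes the difference between vehicles' states and those predicted by the appropriate state transition function (1). Likewise, the factors (7) between pairs of states belonging to players $i$ and $j$ describe interactions between pairs of players; such a factor can, for example, be chosen to encode collision avoidance. The factors $m^{\ell_k}_t$ denote the difference between expected and actual landmark measurements made by the ego player. Finally, $m^{1i}_t$, where $i \in \{2, \ldots, N\}$, denotes inter-player position measurements between the ego player and each non-ego player.

The maximum a posteriori (MAP) estimation problem faced by each player, then, is the minimization of a sum of squared factors. In other words, each player's individual decision problem is a nonlinear least squares problem, when other players' variables are held constant. Neglecting interaction factors, landmarks, and inter-player measurements (which couple players' variables together), we compute the partial log-likelihood of player $i$'s variables as:

$$ \tilde{J}^i(x^i, u^i) := \frac{1}{2} \sum_{t=1}^{T} \Big( \|g^i_t(x^i_t, u^i_t)\|^2_{\Sigma_{g^i_t}} + \|d^i_t(x^i_t, u^i_t, x^i_{t+1})\|^2_{\Sigma_{f^i_t}} \Big), \tag{10} $$

where the subscript on each $\Sigma$ indicates the covariance associated with the corresponding factor. Note that (10) does not include interaction terms ($g^{ij}_t$) or measurements ($m^{\ell_k}_t$ or $m^{1i}_t$). This is because these quantities depend upon multiple players' variables jointly, and also because measurements pertaining to landmarks and other players' states are only assumed to be collected by the ego player. Including these terms, the ego player's full estimation problem is thus given by:

$$ \min_{x^1, u^1, \ell} \; \tilde{J}^1(x^1, u^1) + \frac{1}{2} \sum_{t=1}^{T} \Big( \sum_{j \neq 1} \|g^{1j}_t(x^1_t, x^j_t)\|^2_{\Sigma_{g^{1j}_t}} + \sum_{k=1}^{L} \|m^{\ell_k}_t(x^1_t, \ell_k)\|^2_{\Sigma_{\ell}} + \sum_{i=2}^{N} \|m^{1i}_t(x^1_t, x^i_t)\|^2_{\Sigma_{p}} \Big), \tag{11} $$

while each non-ego player's MAP problem is given by:

$$ \min_{x^i, u^i} \; \tilde{J}^i(x^i, u^i) + \frac{1}{2} \sum_{t=1}^{T} \sum_{j \neq i} \|g^{ij}_t(x^i_t, x^j_t)\|^2_{\Sigma_{g^{ij}_t}}, \tag{12} $$

for each $i \in \{2, \ldots, N\}$.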
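As a minimal sketch of how individual factors contribute to a player's least squares objective, the snippet below evaluates the squared Mahalanobis distance of a residual and sums the resulting negative log-likelihood terms. The function names, the restriction to diagonal covariances, and the numerical values are assumptions made for illustration; they are not taken from the GTP-SLAM implementation.

```python
def squared_mahalanobis(residual, variances):
    """Return r^T Sigma^{-1} r for a diagonal covariance Sigma = diag(variances)."""
    if len(residual) != len(variances):
        raise ValueError("residual and variances must have the same length")
    return sum(r * r / v for r, v in zip(residual, variances))


def negative_log_likelihood(factors):
    """Sum 0.5 * r^T Sigma^{-1} r over (residual, variances) pairs.

    Constant normalisation terms are dropped because they do not
    change the minimiser of the least squares problem.
    """
    return 0.5 * sum(squared_mahalanobis(r, v) for r, v in factors)


# Hypothetical residuals: one dynamics factor and one landmark measurement factor.
dynamics_factor = ([0.2, -0.1], [0.04, 0.01])  # (residual, per-axis variance)
landmark_factor = ([1.0, 0.5], [1.0, 0.25])
cost = negative_log_likelihood([dynamics_factor, landmark_factor])
```

Each residual here sits exactly one standard deviation from zero along every axis, so every axis contributes one unit to the squared distance before the factor of one half is applied.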

IV-B GTP-SLAM as a Potential Game

Next, we illustrate that the GTP-SLAM problem of Section IV-A is a potential game (Lemma 1, Proposition 1). This connection to potential games is critical, as it suggests a locally-convergent solution method for GTP-SLAM problems given in Section IV-C (Corollary 1). The following results are based upon established concepts in the literature [FonsecaMoralesHernandezLerma2018PotentialDifferentialGames, ZanardiFrazzoli2021UrbanDrivingGamesWithLexicographicPreferences, Kavuncu2021PotentialiLQR]; here, we illustrate their pertinence to the noncooperative SLAM problem.

Lemma 1

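Consider an $N$-player, $T$-stage dynamic game $\mathcal{G}$, with fixed initial condition $x_1$. Suppose the system dynamics of each player $i$ are given by (1), and the cost function of each player $i$ is of the form:

$$ J^i(x, u) = \sum_{t=1}^{T} \Big( c^i_t(x^i_t, u^i_t) + \sum_{j \neq i} c^{ij}_t(x^i_t, x^j_t) \Big), $$

where the individual terms $c^i_t$ and the coupling terms $c^{ij}_t$ satisfy:

$$ c^{ij}_t(x^i_t, x^j_t) = c^{ji}_t(x^j_t, x^i_t) \quad \text{for all } i \neq j \text{ and all } t. $$

Then $\mathcal{G}$ is a potential game corresponding to the optimal control problem of minimizing the potential function:

$$ P(x, u) := \sum_{t=1}^{T} \Big( \sum_{i=1}^{N} c^i_t(x^i_t, u^i_t) + \sum_{i < j} c^{ij}_t(x^i_t, x^j_t) \Big) \tag{13} $$

subject to the dynamics (1), for each $i \in \{1, \ldots, N\}$.

Proof: Refer to the appendix.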

Given the result of Lemma 1, we now show that the GTP-SLAM game structure given in (10) is consistent with a potential game.

Proposition 1

The GTP-SLAM game, with players’ objectives given by (10), is a potential game.

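Proof: In the context of (10), the terms that depend only on player $i$'s own variables play the role of the individual costs in Lemma 1, while the interaction factors (7), which are shared symmetrically by each pair of players, play the role of the coupling costs.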

Thus, by Lemma 1, the game encoded in GTP-SLAM is a potential dynamic game.
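To make the potential structure concrete, the following sketch checks the defining property on a toy two-player, single-stage game: when the coupling cost is symmetric, any unilateral deviation changes the deviating player's cost and the potential by the same amount. The quadratic costs, the scalar decision variables `a` and `b`, and all function names are hypothetical and chosen only for illustration; they are not the GTP-SLAM objectives.

```python
def coupling(a, b):
    """Interaction cost shared by both players; symmetric in its arguments."""
    return 0.5 * (a - b) ** 2


def cost_1(a, b):
    """Player 1 controls a: individual term plus the coupling term."""
    return (a - 1.0) ** 2 + coupling(a, b)


def cost_2(a, b):
    """Player 2 controls b: individual term plus the same coupling term."""
    return (b + 1.0) ** 2 + coupling(b, a)


def potential(a, b):
    """Sum of the individual terms, with the coupling term counted once."""
    return (a - 1.0) ** 2 + (b + 1.0) ** 2 + coupling(a, b)


def deviation_gaps(a, b, a_new, b_new):
    """Mismatch between each player's cost change and the potential's change
    when that player alone deviates (player 1 to a_new, player 2 to b_new)."""
    gap_1 = (cost_1(a_new, b) - cost_1(a, b)) - (potential(a_new, b) - potential(a, b))
    gap_2 = (cost_2(a, b_new) - cost_2(a, b)) - (potential(a, b_new) - potential(a, b))
    return gap_1, gap_2


gap_1, gap_2 = deviation_gaps(a=0.25, b=-0.75, a_new=2.0, b_new=0.5)
```

Both gaps vanish, which is why a minimizer of the potential cannot be improved upon by any single player acting alone.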

IV-C Iterative Best Response

To find Nash equilibria of the GTP-SLAM game with objectives given by (11) and (12), we employ Algorithm 1, an approach inspired by iterative best response (IBR). Specifically, Algorithm 1 proceeds in rounds, where each player minimizes its MAP objective while holding variables pertaining to other players fixed. Convergence is guaranteed by the following corollary due to [ZanardiFrazzoli2021UrbanDrivingGamesWithLexicographicPreferences].

Corollary 1

Algorithm 1 converges to an open-loop Nash equilibrium when applied to potential games of the form of Definition 2, if the maximum number of iterations is set to $K = \infty$ and the convergence tolerance is set to $\epsilon = 0$.

Proof: See [ZanardiFrazzoli2021UrbanDrivingGamesWithLexicographicPreferences, Proposition 1].

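Data: Maximum number of iterations $K$, convergence tolerance $\epsilon$, cost functions $J^1, \ldots, J^N$.
Result: Nash equilibrium variables $(x^\star, u^\star, \ell^\star)$.
Initialize $k \leftarrow 0$ and $\Delta \leftarrow \infty$, where $\Delta$ denotes the change in the variables over the most recent round.
while $k < K$ and $\Delta > \epsilon$ do
    for $i = 1, \ldots, N$ do
        player $i$'s variables $\leftarrow \arg\min J^i$, holding all other players' variables fixed
    end for
    $k \leftarrow k + 1$ and update $\Delta$
end while
return $(x, u, \ell)$
Algorithm 1 Solving the GTP-SLAM Problem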
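The round structure of Algorithm 1 can be sketched on a toy problem in which each best response is available in closed form. The example below is a hypothetical two-player quadratic game with scalar decisions `a` and `b`, where player 1 minimizes $(a - 1)^2 + \frac{1}{2}(a - b)^2$ and player 2 minimizes $(b + 1)^2 + \frac{1}{2}(a - b)^2$. It stands in for the per-player nonlinear least squares solves and is not the GTP-SLAM implementation; the names `max_iters` and `tol` are illustrative.

```python
def best_response_1(b):
    """Minimiser over a of (a - 1)^2 + 0.5 * (a - b)^2, with b held fixed."""
    return (2.0 + b) / 3.0


def best_response_2(a):
    """Minimiser over b of (b + 1)^2 + 0.5 * (a - b)^2, with a held fixed."""
    return (a - 2.0) / 3.0


def iterative_best_response(a, b, max_iters=100, tol=1e-12):
    """Alternate best responses until the largest update is at most tol."""
    rounds = 0
    change = float("inf")
    while rounds < max_iters and change > tol:
        a_next = best_response_1(b)        # player 1 moves, b held fixed
        b_next = best_response_2(a_next)   # player 2 moves, a fixed at its new value
        change = max(abs(a_next - a), abs(b_next - b))
        a, b = a_next, b_next
        rounds += 1
    return a, b, rounds


a_star, b_star, rounds_used = iterative_best_response(a=5.0, b=-3.0)
```

Setting both best-response conditions to hold simultaneously gives the fixed point $a = 0.5$, $b = -0.5$, which is the Nash equilibrium of this toy game and also the minimizer of its potential.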

V Results

V-A Simulation Setup

To demonstrate the importance of game-theoretic priors in multi-player SLAM problems, we simulate a highway driving scenario, as shown in Figure 3. Specifically, four vehicles change lanes over a kilometer-long stretch of highway while avoiding collision and maintaining a desired speed. We assume that each vehicle follows Dubins car dynamics, i.e., moves with constant speed and can control its yaw rate. Vehicle motion is discretized at fixed time intervals. These dynamics constitute the state transition maps $f^i_t$. Moreover, we assume that the highway is sparsely populated with occasional landmarks, e.g., exit signs, speed limit signs, and light poles, shown as yellow circles in Figure 3.
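For readers unfamiliar with this vehicle model, a discrete-time state transition of the kind described above might look as follows. This is a sketch only: the forward-Euler discretization, the state layout `(x, y, heading)`, and the speed and time step values are assumptions, since the discretization actually used in the experiments is not specified here.

```python
import math


def dubins_step(state, yaw_rate, speed, dt):
    """Advance a Dubins-car state (x, y, heading) by one forward-Euler step."""
    x, y, heading = state
    return (
        x + speed * math.cos(heading) * dt,
        y + speed * math.sin(heading) * dt,
        heading + yaw_rate * dt,
    )


def rollout(initial_state, yaw_rates, speed, dt):
    """Apply a sequence of yaw-rate controls and return all visited states."""
    states = [initial_state]
    for yaw_rate in yaw_rates:
        states.append(dubins_step(states[-1], yaw_rate, speed, dt))
    return states


# Hypothetical values: 10 m/s, 0.1 s steps, two straight steps then a gentle left turn.
trajectory = rollout((0.0, 0.0, 0.0), [0.0, 0.0, 0.2, 0.2], speed=10.0, dt=0.1)
```

The only control input is the yaw rate, so the speed enters the model as a fixed parameter rather than a decision variable.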

A local Nash equilibrium of the highway driving game is found by applying Algorithm 1 with fixed initial states for all players and neglecting measurement likelihood factors. To understand the role of game-theoretic interactions in SLAM problems, we conduct a Monte Carlo study of the highway driving scenario of Figure 3, with results recorded in Figure 2. For each noise standard deviation level considered, we ran 50 experiments, each with a slightly perturbed set of initial conditions. For each experiment, we simulated random measurements of all landmarks, and of all non-ego players' planar coordinates, with respect to the ego player's local frame. We then ran Algorithm 1 to convergence, and compared the results to a standard bundle adjustment approach that neglected game-theoretic priors. That is, by ignoring (7) for the ego player and (5), (6), (7) for all non-ego players, the GTP-SLAM problem reduces to a single MAP problem which may be solved jointly for all players at once. Throughout all simulations, we use GTSAM [dellaert2012gtsam] to construct the factors above, compute Jacobians, and implement Levenberg-Marquardt steps [nocedal2006numerical] for both GTP-SLAM and bundle adjustment.
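A noisy planar measurement expressed in the ego player's local frame could be simulated along the following lines. The sketch assumes an ego state of the form `(x, y, heading)`, independent zero-mean Gaussian noise on each axis, and hypothetical function names; the measurement model used in the actual experiments may differ.

```python
import math
import random


def to_ego_frame(ego_state, point):
    """Express a world-frame point (x, y) in the ego player's local frame."""
    ego_x, ego_y, heading = ego_state
    dx, dy = point[0] - ego_x, point[1] - ego_y
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    # Rotate the world-frame offset by -heading: first axis points forward, second to the left.
    return (cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy)


def simulate_measurement(ego_state, point, noise_std, rng):
    """Return the local-frame position of a point, corrupted by Gaussian noise."""
    local_x, local_y = to_ego_frame(ego_state, point)
    return (local_x + rng.gauss(0.0, noise_std), local_y + rng.gauss(0.0, noise_std))


rng = random.Random(0)  # fixed seed so repeated runs draw the same noise
ego = (1.0, 1.0, math.pi / 2)  # ego at (1, 1), facing the world +y direction
landmark = (1.0, 3.0)
noisy = simulate_measurement(ego, landmark, noise_std=0.5, rng=rng)
exact = to_ego_frame(ego, landmark)
```

Here the landmark lies two units directly ahead of the ego player, so the noise-free measurement has a forward component of two and a lateral component of zero.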

V-B Discussion

Figure 2 records the localization and map reconstruction error of GTP-SLAM (red) and standard bundle adjustment (blue). Compared to the bundle adjustment baseline, the localization and map reconstruction error for GTP-SLAM is lower across all noise standard deviation levels, and degrades more gracefully as noise levels increase. In particular, conventional bundle adjustment becomes numerically unstable at low noise levels; by contrast, the introduction of game-theoretic priors appears to yield a more well-conditioned estimation problem. In summary, these results indicate that game-theoretic priors introduce additional structure in an otherwise complex estimation problem, enabling reliable recovery of vehicle states and map landmarks.

Fig. 2: Root-mean-square error of estimated positions vs. standard deviation of measurement noise, for player positions and landmarks.
Fig. 3: Schematic of the highway example. Here, players 1 (red), 2 (blue), 3 (green), and 4 (purple) navigate a kilometer-long stretch of highway and interact with each other while performing lane changes. The ego player detects landmarks in the scene, which describe objects common to realistic highway scenarios, e.g., speed limit signs, exit signs, light poles, etc.
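The error metric reported in Figure 2 can be computed with a few lines of code. The sketch below assumes that the root-mean-square error is taken over Euclidean position errors of matched estimate and ground-truth pairs; the exact averaging convention used for the figure is not stated, so this is one plausible reading rather than the authors' definition.

```python
import math


def position_rmse(estimates, ground_truth):
    """Root-mean-square Euclidean error between matched (x, y) positions."""
    if not estimates or len(estimates) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (ex - gx) ** 2 + (ey - gy) ** 2
        for (ex, ey), (gx, gy) in zip(estimates, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical landmark estimates compared against their true positions.
true_landmarks = [(0.0, 0.0), (10.0, 5.0)]
estimated_landmarks = [(1.0, 0.0), (10.0, 6.0)]
landmark_error = position_rmse(estimated_landmarks, true_landmarks)
```

The same function applies to player positions by passing the estimated and true trajectories as flat lists of planar coordinates.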

VI Conclusion and Future Work

We present a novel method for Simultaneous Localization and Mapping (SLAM) in dynamic scenes in which multiple players interact noncooperatively. Our approach is inspired by recent advances in numerical algorithms for solving dynamic games, and exploits the structure of potential games to ensure reliable convergence. Empirical results illustrate that our algorithm outperforms standard bundle adjustment methods in localization and map reconstruction accuracy.

We foresee several important directions for future work. First, our experiments do not yet consider loop closures, which are essential for the long-term recovery of static scenes in SLAM tasks. It is thus critical to study how to best incorporate game-theoretic priors when detecting and enforcing loop closures. Second, we demonstrated our method in full-graph optimization problems; in practice, however, SLAM graphs are often optimized incrementally, as measurements are acquired in real-time. Our approach readily extends to this setting. Finally, our method only computes open-loop game strategies, corresponding to feedforward, rather than feedback, controls. Future work will investigate game-theoretic SLAM priors in more complicated strategy spaces.

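Appendix: Proof of Lemma 1

Lemma 1 follows directly from analogous proofs established in [Kavuncu2021PotentialiLQR, GonzalezSanchez2014DynamicPotentialGames], rephrased here for completeness.

Let $P$ be given by (13), with a minimizer given by $x^\star$, $u^\star$. For any player $i$, and any unilateral deviation in player $i$'s controls away from $u^{i\star}$, i.e., $\tilde{u} := (\tilde{u}^i, u^{-i\star})$, with corresponding state trajectory $\tilde{x} := (\tilde{x}^i, x^{-i\star})$:

$$ J^i(\tilde{x}, \tilde{u}) - J^i(x^\star, u^\star) = P(\tilde{x}, \tilde{u}) - P(x^\star, u^\star) \geq 0. $$

The equality holds because the deviation changes only player $i$'s individual terms and the symmetric coupling terms involving player $i$, all of which appear in both $J^i$ and $P$; the inequality holds because $(x^\star, u^\star)$ minimizes $P$. (We have made implicit the dependence of $J^i$ and $P$ on the landmarks $\ell$, for notational convenience.) This condition is precisely that which defines open-loop Nash equilibria (Definition 1). Hence, $u^\star$ is an open-loop Nash equilibrium of the game $\mathcal{G}$.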