## I Introduction

To navigate safely and efficiently in real-world scenarios, autonomous vehicles must accurately represent dynamic, uncharted environments and execute robust motion plans that account for multi-player interactions in the scene. This requires a careful fusion of state estimation, prediction, and path planning modules in vehicle autonomy stacks. However, current state-of-the-art autonomous navigation pipelines typically treat these problems separately and do not incorporate direct feedback between them.

Regarding estimation, algorithms tackling the Simultaneous Localization and Mapping (SLAM) problem aim to accurately reconstruct an uncharted environment while also localizing the “ego” player within it [LeonardDurrantWhyte1991SimultaneousMapBuilding, Cadena2016PastPresentandFuture, davison2018futuremapping1, davison2019futuremapping2]. In recent years, popular inference algorithms for SLAM, such as factor graph-based methods, have also been used to solve challenging motion planning and optimal control problems jointly with SLAM problems in unknown environments [Dellaert2017FactorGraphsforRobotPerception]. However, these approaches often do not account for purposeful interactions among dynamic players in the environment. It is thus unclear whether these methods can truly model safe, efficient, and robust motion plans generated by an ego player operating in the vicinity of other players.

Interactions between independently-minded, mobile players are naturally modeled as dynamic games between rational actors with differing objectives [isaacs1954differential, basar1998DynamicNoncooperativeGameTheory, starr1969nonzero, starr1969further]. Recent advances in game-theoretic motion planning exploit this structure to predict the responses of other players to one’s own decisions, and identify a desirable equilibrium strategy [fisac2015reach, fridovich2019efficient]. To the best of the authors’ knowledge, however, game-theoretic formulations of noncooperative multi-player interactions have not yet been considered in SLAM tasks.

In this work, we formulate the SLAM task from the perspective of an ego player, who is interacting with multiple other players while simultaneously estimating all players’ positions and all landmark locations. Inspired by the dynamic game theory literature, we first establish mild assumptions under which this problem can be formulated as a potential game.
We then present a factor graph-based algorithm to solve this game and prove that it is guaranteed to converge to a local equilibrium. Unlike existing SLAM methods, this approach tightly integrates estimation, prediction, and decision-making *for multiple players, simultaneously*.
Empirical results illustrate that, compared to standard bundle adjustment, incorporating game-theoretic interaction priors leads to higher localization and map reconstruction accuracy in a realistic traffic scenario.

## II Related Work

### II-A Simultaneous Localization and Mapping

Simultaneous Localization and Mapping is a fundamental state estimation task with a well-developed literature in the robotics community [LeonardDurrantWhyte1991SimultaneousMapBuilding, Cadena2016PastPresentandFuture], the unsolved aspects of which continue to attract great interest [davison2018futuremapping1, davison2019futuremapping2]. A standard method for solving SLAM problems is to reformulate the underlying maximum a posteriori (MAP) estimation problem into a nonlinear least squares problem, which can then be solved via factor graph optimization [DellaertKaess2006SquareRootSAM, KaessDellaert2012iSAM2].

In recent years, factor graphs have been used to formulate a wide range of robotics problems beyond the SLAM task in static environments, including model predictive control and trajectory tracking [TaDellaert2014AFactorGraphApproachtoEstimationAndMPC, Dellaert2017FactorGraphsforRobotPerception]. Factor graph-based methods have also been used to solve the dynamic SLAM problem, which involves the reconstruction of uncharted environments with dynamic players [zhang2020vdoslam] who may share measurement information [ZhangDellaert2021MRiSAM2]. These methods typically infer time-dependent variables pertaining to multiple players without accounting for players’ interactive, and likely noncooperative, behavior. By contrast, our approach explicitly accounts for purposeful and potentially noncooperative interactions between multiple players by using iterative best response to search for local Nash equilibria of the players’ variables.

### II-B Multi-Player Path Planning via Dynamic Games

In robotics applications, interactions between multiple players are naturally modeled as dynamic games. In particular, scenarios in which two groups of players have opposing objectives, such as robust control problems and pursuit-evasion games, are often formulated as zero-sum dynamic games [fisac2015reach, FisacSastry2015PursuitEvasionDefense]. Meanwhile, problems in which multiple players have only partially conflicting objectives, such as path planning in busy traffic, are posed as general-sum dynamic games [fridovich2019efficient, peters2021rss]. Although solutions to continuous-time dynamic games are characterized by coupled Hamilton-Jacobi-Bellman (HJB) PDEs [starr1969nonzero, starr1969further, bansal2017hamilton], solving these equations is typically intractable due to the so-called “curse of dimensionality” [bellman1966dynamic], i.e., their computation time grows exponentially in the state space dimension. For this reason, such methods are impractical in many multi-player scenarios of interest.

In contrast, our work uses an iterative best response (IBR) scheme, in which each player takes a turn solving for their optimal strategy while assuming all other players’ strategies are fixed [fisac2018hierarchical, wang2018game, Kavuncu2021PotentialiLQR, ZanardiFrazzoli2021UrbanDrivingGamesWithLexicographicPreferences]. By replacing the dynamic game with a sequence of optimal control problems, the computational burden of solving for a local Nash equilibrium strategy is substantially reduced. Indeed, IBR has been successfully applied to a wide range of multi-player interaction scenarios, such as hierarchical planning for autonomous driving [fisac2018hierarchical] and racing [wang2018game]. Moreover, IBR is suitable for our application because it can be embedded in a factor graph-based framework, by iteratively solving estimation problems over the variables relevant to each player, while holding all other players’ variables fixed.
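The turn-taking structure of IBR can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a hypothetical two-player game with scalar decisions `a` and `b`, quadratic costs `(a - 1)^2 + c (a - b)^2` and `(b + 1)^2 + c (a - b)^2`, and closed-form best responses derived from those costs. All names and constants are illustrative assumptions.

```python
# Toy iterative best response (IBR) on a hypothetical two-player quadratic game.
# Player 1 minimizes (a - 1)^2 + c * (a - b)^2 over a, with b held fixed.
# Player 2 minimizes (b + 1)^2 + c * (a - b)^2 over b, with a held fixed.


def best_response_1(b, c=0.5):
    """Minimizer of player 1's cost in a, from setting d/da to zero."""
    return (1.0 + c * b) / (1.0 + c)


def best_response_2(a, c=0.5):
    """Minimizer of player 2's cost in b, from setting d/db to zero."""
    return (-1.0 + c * a) / (1.0 + c)


def iterative_best_response(a0=0.0, b0=0.0, c=0.5, tol=1e-12, max_rounds=100):
    """Alternate best responses until neither decision changes by more than tol."""
    a, b = a0, b0
    for round_idx in range(max_rounds):
        a_new = best_response_1(b, c)      # player 1 moves, b fixed
        b_new = best_response_2(a_new, c)  # player 2 moves, a fixed
        if abs(a_new - a) < tol and abs(b_new - b) < tol:
            return a_new, b_new, round_idx + 1
        a, b = a_new, b_new
    return a, b, max_rounds


a_star, b_star, rounds = iterative_best_response()
print(a_star, b_star, rounds)
```

For this particular toy game, the fixed point can be checked by hand: with `c = 0.5`, substituting `b = -a` into player 1's best response gives `a = 1 / (1 + 2c) = 0.5`. Each best response here is a single closed-form update; in the setting described above, each turn would instead be a full optimal control or estimation problem.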

Our work also draws inspiration from the potential games literature, which exploits the cost structure of multi-player interactions in certain robotics applications [Kavuncu2021PotentialiLQR, ZanardiFrazzoli2021UrbanDrivingGamesWithLexicographicPreferences]. In particular, when cost terms that couple different players are all symmetric, there exists a single optimal control problem whose solution gives Nash equilibrium strategies of the interacting players. Recent literature indicates that iterative methods which exploit this symmetry structure in potential games can be more efficient than those that do not [Kavuncu2021PotentialiLQR]. Our approach draws inspiration from this observation: we recast the multi-agent SLAM problem under study as a potential game, and perform IBR in a manner that preserves its potential structure.
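To make the role of symmetric coupling concrete, the following sketch checks the potential property numerically on a hypothetical two-player scalar game. The costs, the coupling weight, and the candidate potential are all assumptions chosen for illustration; they are not taken from the cited works. The check confirms that any unilateral change in one player's decision changes that player's cost and the shared potential by the same amount.

```python
# Numerical check of the potential-game property on a hypothetical toy game.
# Both players share the same symmetric coupling term COUPLING * (a - b)^2.

COUPLING = 0.5


def cost_1(a, b):
    """Player 1's cost: an individual term plus the symmetric coupling."""
    return (a - 1.0) ** 2 + COUPLING * (a - b) ** 2


def cost_2(a, b):
    """Player 2's cost: an individual term plus the same coupling."""
    return (b + 1.0) ** 2 + COUPLING * (a - b) ** 2


def potential(a, b):
    """Candidate potential: individual terms plus the coupling counted once."""
    return (a - 1.0) ** 2 + (b + 1.0) ** 2 + COUPLING * (a - b) ** 2


def max_potential_mismatch(points, deviations):
    """Largest gap between cost change and potential change under unilateral moves."""
    worst = 0.0
    for a, b in points:
        for d in deviations:
            gap_1 = (cost_1(a + d, b) - cost_1(a, b)) - (potential(a + d, b) - potential(a, b))
            gap_2 = (cost_2(a, b + d) - cost_2(a, b)) - (potential(a, b + d) - potential(a, b))
            worst = max(worst, abs(gap_1), abs(gap_2))
    return worst


def potential_gradient(a, b):
    """Gradient of the candidate potential with respect to (a, b)."""
    return (
        2.0 * (a - 1.0) + 2.0 * COUPLING * (a - b),
        2.0 * (b + 1.0) - 2.0 * COUPLING * (a - b),
    )


points = [(0.0, 0.0), (0.5, -0.5), (2.0, 1.0), (-1.5, 3.0)]
deviations = [-1.0, -0.1, 0.1, 1.0]
mismatch = max_potential_mismatch(points, deviations)
grad_at_candidate = potential_gradient(0.5, -0.5)
print(mismatch, grad_at_candidate)
```

Because the mismatch vanishes up to rounding, a minimizer of the single function `potential` leaves neither player with a profitable unilateral deviation. In this convex toy example the gradient of the potential vanishes at `(0.5, -0.5)`, which is therefore a Nash equilibrium of the two-player game.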

## III Preliminaries

Below, we introduce core concepts in dynamic game theory. Readers are directed to [basar1998DynamicNoncooperativeGameTheory] for further details.

### III-A Open-Loop Nash Equilibria

Consider the -player, -stage general-sum dynamic game , with nonlinear, discrete-time system dynamics for each player and time , given by:

(1) |

where , , and are respectively the state, control input, and (differentiable) state transition map of player at time , and is the associated covariance matrix. Below, for each player , we use the shorthand , and . Moreover, we define and , where and .

In this work, we jointly estimate the trajectories of all players from the perspective of one particular player, referred to below as player 1 or the ego player. Other players are termed non-ego players. The ego player observes landmarks, whose global positions in are given by . Players also observe each others’ positions. These measurements, at each time , are given by:

(2) | ||||

(3) |

where is the measurement of landmark by the ego player, for each , while is the measurement by player of player , for each , . Here, and denote associated covariance matrices. Additionally, each player’s objective is defined by , with for each . In this work, we presume that the ego player knows other players’ objectives . While this is certainly a strong assumption in practice, recent work has established that it is possible to infer unknown parameters of players’ objectives in such games efficiently [peters2021rss, cleac2020ral]. Thus equipped, we now define the Nash equilibrium of the GTP-SLAM problem.

###### Definition 1

(Open-Loop Nash equilibrium, [basar1998DynamicNoncooperativeGameTheory, Ch. 6])
We call an *open-loop Nash equilibrium* of if no player can lower their cost by unilaterally deviating from their control while all other players’ controls, , remain fixed, i.e.,

(4) |
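For reference, the open-loop Nash condition can be written out in generic notation. The symbols below are assumptions introduced here for illustration and may differ from the notation used elsewhere in this paper: `J^i` denotes player `i`'s cost, `u^{i\star}` that player's equilibrium control sequence, `u^{-i\star}` the equilibrium controls of all other players, and `N` the number of players.

```latex
J^i\left(u^{i\star}, u^{-i\star}\right) \;\le\; J^i\left(u^{i}, u^{-i\star}\right)
\quad \text{for every admissible } u^{i} \text{ and every player } i \in \{1, \dots, N\}.
```

In an open-loop equilibrium, each `u^i` is a full sequence of controls chosen in advance, rather than a feedback policy that reacts to the observed state.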

### III-B Potential Dynamic Games

Our approach leverages well-established convergence guarantees of iterative best response (IBR) algorithms in the setting of potential games. For clarity, we define a finite-stage potential game as follows.

###### Definition 2

(Potential Dynamic Game, [Kavuncu2021PotentialiLQR], [FonsecaMoralesHernandezLerma2018PotentialDifferentialGames])
An -player, -stage general-sum dynamic game is called a *potential game* if there exists an optimal control problem, defined over all players’ controls , whose solutions are Nash equilibria of the game .

In Section IV-B, we will recast the multi-player, noncooperative SLAM problem of interest into a potential game, and establish mild assumptions under which an appropriate IBR algorithm converges.

## IV Methods

Our main contribution is GTP-SLAM, a novel SLAM algorithm for multi-player scenes, motivated by iterative best response. GTP-SLAM aims to jointly estimate the dynamic states and control inputs of all players in the scene, as well as landmark positions. It does so from the ego player’s perspective, while accounting for noncooperative, game-theoretic interactions between the players.

### IV-A Constructing the GTP-SLAM Factor Graph

We begin by expressing the players’ noncooperative preferences as factors in a bipartite factor graph.
Each factor is a function which encodes the *residual error* among the connected variables. That is, factors are vector-valued maps with which we may compute the joint likelihood of all input variables. Following standard Gaussian assumptions, we use the Mahalanobis distance associated with each factor (i.e., for factor and covariance ) to compute the negative log-likelihood of a collection of variables. Concretely, then, we construct a factor graph from the following terms:

(5) | ||||

(6) | ||||

(7) | ||||

(8) | ||||

(9) |

which are color-coded in Figure 1. For example, the ternary dynamics factor (6) computes the difference between vehicles’ states and those predicted by the appropriate state transition function (1). Likewise, the factors (7) between pairs of states belonging to players describe interactions between pairs of players. For example, to encode collision avoidance, we may set . The factors denote the difference between expected and actual landmark measurements made by the ego player. Finally, , where , denotes inter-player position measurements between the ego player and each non-ego player.
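As a small illustration of how Gaussian factors contribute to a negative log-likelihood, the sketch below evaluates the squared Mahalanobis norm of a residual `r` under a covariance `Sigma`, that is, `r^T Sigma^{-1} r`, and sums these terms over a list of factors. It assumes two-dimensional residuals, drops additive constants, and uses the common `1/2` scaling. The function names and sample values are hypothetical and are not the factors (5) to (9) of this paper.

```python
# Squared Mahalanobis norms for 2D residuals, summed into a negative
# log-likelihood (up to an additive constant). Illustrative values only.


def squared_mahalanobis(residual, covariance):
    """Return r^T Sigma^{-1} r for a 2-vector r and a 2x2 covariance Sigma."""
    r0, r1 = residual
    (s00, s01), (s10, s11) = covariance
    det = s00 * s11 - s01 * s10
    # Simple sanity check: a 2x2 symmetric matrix is positive definite
    # exactly when its leading entry and its determinant are positive.
    if s00 <= 0.0 or det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # Closed-form 2x2 inverse applied to the residual on both sides.
    return (s11 * r0 * r0 - (s01 + s10) * r0 * r1 + s00 * r1 * r1) / det


def negative_log_likelihood(factors):
    """Sum 0.5 * r^T Sigma^{-1} r over (residual, covariance) pairs."""
    return 0.5 * sum(squared_mahalanobis(r, cov) for r, cov in factors)


factors = [
    ((1.0, 2.0), ((4.0, 0.0), (0.0, 1.0))),  # first axis is noisier
    ((3.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),  # identity covariance
]
total = negative_log_likelihood(factors)
print(total)
```

A larger covariance along one axis down-weights the residual along that axis, which is how measurement noise levels enter the least squares problem.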

The maximum a posteriori (MAP) estimation problem faced by each player, then, is the minimization of a sum of squared factors. In other words, each player’s individual decision problem is a nonlinear least squares problem, when other players’ variables are held constant. Neglecting interaction factors, landmarks, and inter-player measurements (which couple players’ variables together), we compute the partial log-likelihood of player ’s variables as:

(10) | ||||

Note that (10) does not include interaction terms () or measurements ( or ). This is because these quantities depend upon multiple players’ variables jointly, and also because measurements pertaining to landmarks and other players’ states are only assumed to be collected by the ego player. Including these terms, the ego player’s full estimation problem is thus given by:

(11) |

while each non-ego player’s MAP problem is given by:

(12) |

for each .

### IV-B GTP-SLAM as a Potential Game

Next, we illustrate that the GTP-SLAM problem of Section IV-A is a potential game (Lemma 1, Proposition 1). This connection to potential games is critical, as it suggests a locally-convergent solution method for GTP-SLAM problems given in Section IV-C (Corollary 1). The following results are based upon established concepts in the literature [FonsecaMoralesHernandezLerma2018PotentialDifferentialGames, ZanardiFrazzoli2021UrbanDrivingGamesWithLexicographicPreferences, Kavuncu2021PotentialiLQR]; here, we illustrate their pertinence to the noncooperative SLAM problem.

###### Lemma 1

Consider an -player, -stage dynamic game , with fixed initial condition . Suppose the system dynamics of each player are given by (1), and the cost function of each player is of the form:

respectively, where , and satisfy:

Then is a potential game corresponding to the optimal control problem of minimizing the potential function:

(13) |

subject to the dynamics (1), for each .

*Proof.* Refer to the appendix.

### IV-C Iterative Best Response

To find Nash equilibria of the GTP-SLAM game with objectives given by (11) and (12), we employ Algorithm 1, an approach inspired by iterative best response (IBR). Specifically, Algorithm 1 proceeds in rounds, in which each player minimizes its MAP objective while holding variables pertaining to other players fixed. Convergence is guaranteed by the following corollary due to [ZanardiFrazzoli2021UrbanDrivingGamesWithLexicographicPreferences].

###### Corollary 1

See [ZanardiFrazzoli2021UrbanDrivingGamesWithLexicographicPreferences, Proposition 1]

## V Results

### V-A Simulation Setup

To demonstrate the importance of game-theoretic priors in multi-player SLAM problems, we simulate a highway driving scenario, as shown in Figure 3. Specifically, four vehicles change lanes over a kilometer-long stretch of highway while avoiding collision and maintaining a desired speed. We assume that each vehicle follows Dubins paths, i.e., moves with constant speed (, here), and can control its yaw rate. Vehicle motion is discretized at intervals of . These dynamics constitute the state transition maps . Moreover, we assume that the highway is sparsely populated with occasional landmarks, e.g., exit signs, speed limit signs, and light poles, shown as yellow circles in Figure 3.
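The constant-speed, yaw-rate-controlled motion model described above can be sketched as a discrete-time state transition map. The sketch assumes a planar state `(x, y, heading)` and a forward Euler discretization, which may differ from the discretization used in the experiments. The speed and time step below are placeholder values chosen for illustration, not the values used in the simulation.

```python
import math

# Forward Euler discretization of a Dubins-style vehicle: constant speed,
# with yaw rate as the only control input. SPEED and DT are placeholders.


def dubins_step(state, yaw_rate, speed, dt):
    """Advance (x, y, heading) by one time step of length dt."""
    x, y, heading = state
    return (
        x + speed * math.cos(heading) * dt,
        y + speed * math.sin(heading) * dt,
        heading + yaw_rate * dt,
    )


def rollout(initial_state, yaw_rates, speed, dt):
    """Apply a sequence of yaw-rate controls and return all visited states."""
    trajectory = [initial_state]
    for yaw_rate in yaw_rates:
        trajectory.append(dubins_step(trajectory[-1], yaw_rate, speed, dt))
    return trajectory


SPEED = 10.0  # hypothetical speed, in meters per second
DT = 0.1      # hypothetical time step, in seconds

straight = rollout((0.0, 0.0, 0.0), [0.0] * 10, SPEED, DT)
turning = rollout((0.0, 0.0, 0.0), [0.5] * 10, SPEED, DT)
print(straight[-1], turning[-1])
```

With zero yaw rate the vehicle advances along its initial heading, while a constant positive yaw rate bends the path to the left, as in a lane change.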

A local Nash equilibrium of the highway driving game is found by applying Algorithm 1 with fixed initial states for all players and neglecting measurement likelihood factors. To understand the role of game-theoretic interactions in SLAM problems, we conduct a Monte Carlo study of the highway driving scenario of Figure 3, with results recorded in Figure 2. For each noise standard deviation level in the set , we ran 50 experiments, each with a slightly perturbed set of initial conditions. For each experiment, we simulated random measurements of all landmarks, and of all non-ego players’ planar coordinates, with respect to the ego player’s local frame. We then ran Algorithm 1 to convergence, and compared the results to a standard bundle adjustment approach that neglected game-theoretic priors. That is, by ignoring (7) for the ego player and (5), (6), (7) for all non-ego players, the GTP-SLAM problem reduces to a single MAP problem which may be solved jointly for all players at once. Throughout all simulations, we use GTSAM [dellaert2012gtsam] to construct the factors above, compute Jacobians, and implement Levenberg-Marquardt steps [nocedal2006numerical] for both GTP-SLAM and bundle adjustment.

### V-B Discussion

Figure 2 records the localization and map reconstruction error of GTP-SLAM (red) and standard bundle adjustment (blue). Compared to the bundle adjustment baseline, the localization and map reconstruction error for GTP-SLAM is lower across all noise standard deviation levels, and degrades more gracefully as noise levels increase. In particular, conventional bundle adjustment becomes numerically unstable at low noise levels; by contrast, the introduction of game-theoretic priors appears to yield a better-conditioned estimation problem. In summary, these results indicate that game-theoretic priors introduce additional structure in an otherwise complex estimation problem, enabling reliable recovery of vehicle states and map landmarks.

## VI Conclusion and Future Work

We present a novel method for Simultaneous Localization and Mapping (SLAM) in dynamic scenes in which multiple players interact noncooperatively. Our approach is inspired by recent advances in numerical algorithms for solving dynamic games, and exploits the structure of potential games to ensure reliable convergence. Empirical results illustrate that our algorithm outperforms standard bundle adjustment methods in localization and map reconstruction accuracy.

We foresee several important directions for future work. First, our experiments do not yet consider loop closures, which are essential for the long-term recovery of static scenes in SLAM tasks. It is thus critical to study how to best incorporate game-theoretic priors when detecting and enforcing loop closures. Second, we demonstrated our method in full-graph optimization problems; in practice, however, SLAM graphs are often optimized incrementally, as measurements are acquired in real-time. Our approach readily extends to this setting. Finally, our method only computes open-loop game strategies, corresponding to feedforward, rather than feedback, controls. Future work will investigate game-theoretic SLAM priors in more complicated strategy spaces.

## Appendix: Proof of Proposition 1

Proposition 1 follows directly from analogous proofs established in [Kavuncu2021PotentialiLQR, GonzalezSanchez2014DynamicPotentialGames], rephrased here for completeness.

Let be given by (13), with a minimizer given by , . For any player , and any unilateral deviation in player ’s controls away from , i.e., , with corresponding state trajectory :

(We have made implicit the dependence of and on the landmarks , for notational convenience.) This condition is precisely that which defines open-loop Nash equilibria (Definition 1). Hence, is an open-loop Nash equilibrium of the game .
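The core of this argument can be sketched in generic notation. The symbols are assumptions introduced here for illustration and may not match the paper's own: `P` denotes the potential function, `J^i` player `i`'s cost, `u^\star` a minimizer of `P`, and `(u^{i}, u^{-i\star})` a unilateral deviation by player `i`. The sketch further assumes that all terms of `P` which do not involve player `i`'s variables cancel in the difference, which is the structure that symmetric coupling terms provide.

```latex
\begin{aligned}
J^i\left(u^{i}, u^{-i\star}\right) - J^i\left(u^{i\star}, u^{-i\star}\right)
  &= P\left(u^{i}, u^{-i\star}\right) - P\left(u^{i\star}, u^{-i\star}\right) \\
  &\ge 0 .
\end{aligned}
```

The equality reflects that a unilateral deviation changes player `i`'s cost and the potential by the same amount, and the inequality holds because `u^\star` minimizes `P`.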