I Introduction
Multi-robot simultaneous localization and mapping (SLAM) is a fundamental capability for many real-world robotic applications. Pose graph optimization (PGO) is the backbone of state-of-the-art approaches to multi-robot SLAM: it fuses individual trajectories together and endows participating robots with a common spatial understanding of the environment. Many approaches to multi-robot PGO require centralized processing of observations at a base station, which is communication-intensive and vulnerable to a single point of failure. In contrast, decentralized approaches are favorable as they effectively mitigate the communication, privacy, and vulnerability concerns associated with centralization.
Recent works on distributed PGO have achieved important progress; see, e.g., [1, 2] and the references therein. However, to the best of our knowledge, existing distributed algorithms are inherently synchronous: they require that robots, for instance, pass messages over the network or wait at predetermined points in order to ensure up-to-date information sharing during distributed optimization. Doing so may incur considerable communication overhead and increase implementation complexity. On the other hand, simply dropping synchronization in the execution of synchronous algorithms may slow down convergence or even cause divergence, both in theory and in practice.
In this work, we overcome the aforementioned challenge by proposing ASAPP (Asynchronous StochAstic Parallel Pose Graph Optimization), the first asynchronous and provably convergent algorithm for distributed PGO. We take inspiration from existing parallel and asynchronous algorithms [3, 4, 5, 6, 7], and adapt these ideas to solve the non-convex Riemannian optimization problem underlying PGO. In ASAPP, each robot executes its local optimization loop at a high rate, without waiting for updates from others over the network. This makes ASAPP easy to implement in practice and robust to communication delay. Furthermore, we show that the same algorithm can be applied straightforwardly to solve the so-called rank-restricted semidefinite relaxations of PGO, a crucial class of non-convex Riemannian optimization problems that lies at the heart of recent PGO solvers with global optimality guarantees [8, 9, 2].
Since asynchronous algorithms allow communication delays to be substantial and unpredictable, it is usually unclear under what conditions they converge in practice. In this work, we provide a rigorous answer to this question and establish the first known convergence result for asynchronous algorithms on the non-convex PGO problem. In particular, we show that as long as the worst-case delay is not arbitrarily large, ASAPP always achieves global first-order convergence using a sufficiently small stepsize. The derived stepsize depends on the maximum delay and the inherent problem sparsity, and furthermore reduces to the well-known constant of $1/L$ (where $L$ is the Lipschitz constant) for synchronous algorithms when there is no delay. In our experiments, we verify the convergence property of ASAPP and demonstrate its resilience against a wide range of communication delays.
Contributions. We present ASAPP, the first asynchronous algorithm to solve distributed PGO and its rank-restricted semidefinite relaxations. Under suitable hypotheses on the worst-case delay due to asynchrony, we prove that ASAPP converges to a first-order critical point for a sufficiently small stepsize, and establish a global sublinear convergence rate. The stepsize we derive depends on the worst-case delay and the inherent problem sparsity, and furthermore matches the results of existing synchronous algorithms when the delay is zero. Numerical evaluations on simulated and real-world datasets demonstrate that ASAPP outperforms baseline algorithms in terms of overall execution time, and furthermore is resilient against a wide range of communication delays. Both results show the practical value of the proposed algorithm in a realistic distributed optimization setting.
Preliminaries on Riemannian Optimization
This work relies heavily on the first-order geometry of Riemannian manifolds; the reader is referred to [10] for a rigorous treatment of this subject. In SLAM, examples of matrix manifolds that frequently appear include the orthogonal group $O(d)$, the special orthogonal group $SO(d)$, and the special Euclidean group $SE(d)$. In this work, we use $\mathcal{M} \subseteq \mathcal{E}$ to denote a general matrix submanifold, where $\mathcal{E}$ is the so-called ambient space (in this work, $\mathcal{E}$ is always a Euclidean space). Each point $x$ on the manifold has an associated tangent space $T_x\mathcal{M}$. Informally, $T_x\mathcal{M}$ contains all possible directions of change at $x$ while staying on $\mathcal{M}$. As $T_x\mathcal{M}$ is a vector space, we endow it with the standard Frobenius inner product, i.e., for two tangent vectors $\eta, \xi \in T_x\mathcal{M}$, $\langle \eta, \xi \rangle \triangleq \operatorname{tr}(\eta^\top \xi)$. The inner product induces a norm $\|\eta\| \triangleq \sqrt{\langle \eta, \eta \rangle}$. Finally, a tangent vector can be mapped back to the manifold through a retraction $\operatorname{Retr}_x : T_x\mathcal{M} \to \mathcal{M}$, which is a smooth mapping that preserves the first-order structure of the manifold [10].

Riemannian optimization considers minimizing a function $f$ on the manifold. First-order Riemannian optimization algorithms, including the one proposed in this work, often use the Riemannian gradient $\operatorname{grad} f(x) \in T_x\mathcal{M}$, which corresponds to the direction of steepest ascent in the tangent space. For matrix submanifolds, the Riemannian gradient is obtained by an orthogonal projection of the usual Euclidean gradient onto the tangent space, i.e., $\operatorname{grad} f(x) = \operatorname{Proj}_{T_x\mathcal{M}} \nabla f(x)$ [10]. We call $x$ a first-order critical point if $\operatorname{grad} f(x) = 0$.
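The projection formula above can be made concrete for the embedded matrix manifolds used in this paper. The numpy sketch below uses the standard projector $\operatorname{Proj}_X(G) = G - X\,\operatorname{sym}(X^\top G)$ for the orthogonal group and Stiefel manifold; the function name and the toy cost are our own illustrations, not part of the paper's implementation.

```python
import numpy as np

def proj_tangent_stiefel(X, G):
    """Orthogonally project an ambient matrix G onto the tangent space
    of the Stiefel manifold (or orthogonal group) at X, with X.T @ X = I."""
    return G - X @ (0.5 * (X.T @ G + G.T @ X))

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # a point on O(3)

# Riemannian gradient = projection of the Euclidean gradient.
# Toy cost f(X) = <C, X>, whose Euclidean gradient is the constant C.
C = rng.standard_normal((3, 3))
rgrad = proj_tangent_stiefel(Q, C)

# Tangent vectors V at X satisfy X.T @ V + V.T @ X = 0.
assert np.allclose(Q.T @ rgrad + rgrad.T @ Q, 0.0)
```

Because the projector is idempotent, applying it twice leaves a tangent vector unchanged, which is a convenient sanity check in an implementation.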
II Related Work
II-A Distributed and Parallel PGO
In pursuit of decentralized asynchronous algorithms, we note that synchronous decentralized PGO has been well studied. Tron et al. [11, 12, 13, 14] propose a distributed consensus protocol based on Riemannian gradient descent. The key insight, which departs from the vanilla distributed gradient method, is the definition of a set of reshaped cost functions based on the geodesic distance, under which the method provably converges. A similar gradient-based method with line search has also been proposed [15]. Choudhary et al. [16] propose the alternating direction method of multipliers (ADMM) as a decentralized method to solve PGO. However, convergence of ADMM is not established due to the non-convex nature of the optimization problem. More recently, Choudhary et al. [1] propose a two-stage approach where each stage uses distributed successive over-relaxation (SOR) [3] to solve a relaxed or linearized PGO problem. The two-stage approach [1] is further combined with outlier rejection schemes in [17]. In our recent work [2], we avoid explicit linearization by directly optimizing PGO and its rank-restricted semidefinite relaxations [9]. The proposed solver performs distributed block-coordinate descent over the product of Riemannian manifolds, and provably converges to first-order critical points at a global sublinear rate. In a separate line of research, Fan and Murphey [18] propose an accelerated PGO solver suitable for distributed optimization based on generalized proximal methods.

II-B Asynchronous Parallel Optimization
The aforementioned works are promising, but critically rely on synchronization, which limits their practical value for networked autonomous systems. Within the broader optimization literature, however, there is a plethora of works on parallel and asynchronous optimization, partially motivated by popular applications in large-scale machine learning and deep learning. The study of asynchronous gradient-based algorithms began with the seminal work of Bertsekas and Tsitsiklis [3], and has led to the recent development of asynchronous randomized block-coordinate and stochastic gradient algorithms; see [4, 5, 6, 19, 7, 20, 21] and references therein. We are especially interested in asynchronous parallel schemes for non-convex optimization, which have been studied in [7, 21]. In this work, we generalize these approaches to the setting where the feasible set is a product of non-convex matrix manifolds, motivated by PGO. Our model of asynchrony is comparable to [20]: workers exchange local parameters asynchronously during optimization. However, unlike [20], we obviate the need for local averaging to achieve consensus, as each robot is only responsible for updating its own trajectory.

III Problem Formulation
In this section, we formally define pose graph optimization (PGO) in the context of multi-robot SLAM. Given relative pose measurements (possibly between different robots), we aim to jointly estimate the trajectories of all robots in a global reference frame. Let $\mathcal{R}$ be the set of indices associated with the robots. Denote the pose of robot $\alpha \in \mathcal{R}$ at time step $i$ as $(R_{\alpha_i}, t_{\alpha_i}) \in SO(d) \times \mathbb{R}^d$, where $d \in \{2, 3\}$ is the dimension of the estimation problem. Here $R_{\alpha_i}$ is a rotation matrix, and $t_{\alpha_i}$ is a translation vector. A relative pose measurement from $\alpha_i$ to $\beta_j$ is denoted as $(\tilde{R}_{\alpha_i \beta_j}, \tilde{t}_{\alpha_i \beta_j})$. We assume the following standard noise model for our measurements [2, 8, 9]:
$$\tilde{t}_{\alpha_i \beta_j} = \underline{t}_{\alpha_i \beta_j} + t^{\epsilon}, \quad t^{\epsilon} \sim \mathcal{N}(0, \sigma^2 I_d), \tag{1}$$
$$\tilde{R}_{\alpha_i \beta_j} = \underline{R}_{\alpha_i \beta_j} R^{\epsilon}, \quad R^{\epsilon} \sim \operatorname{Langevin}(I_d, \kappa). \tag{2}$$
Above, $(\underline{R}_{\alpha_i \beta_j}, \underline{t}_{\alpha_i \beta_j})$ denotes the true (i.e., noiseless) relative transformation. The isotropic Langevin noise on rotations [8] plays an analogous role to the Gaussian noise on translations. We note that our formulation trivially generalizes to the case where the values of $\sigma$ and $\kappa$ vary across measurements; in the following, we drop this variation for notational simplicity. Given noisy measurements of the form (1)-(2), we seek the maximum likelihood pose graph configurations for all robots in $\mathcal{R}$. Doing so amounts to the following non-convex program [8].
Problem 1 (Maximum Likelihood Estimation).
$$\min_{R_{\alpha_i} \in SO(d),\ t_{\alpha_i} \in \mathbb{R}^d} \; \sum_{(\alpha_i, \beta_j) \in \mathcal{E}} \kappa \left\| R_{\beta_j} - R_{\alpha_i} \tilde{R}_{\alpha_i \beta_j} \right\|_F^2 + \frac{1}{\sigma^2} \left\| t_{\beta_j} - t_{\alpha_i} - R_{\alpha_i} \tilde{t}_{\alpha_i \beta_j} \right\|_2^2 \tag{MLE}$$
Problem 1 can be compactly represented with a pose graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where each vertex in $\mathcal{V}$ corresponds to a single pose owned by a robot. Observe that the sum in the objective is taken over all edges in $\mathcal{E}$, where an edge from $\alpha_i$ to $\beta_j$ is formed if there is a relative measurement from $\alpha_i$ to $\beta_j$. Fig. 0(a) shows an example pose graph.
In this paper, we further consider the rank-restricted semidefinite relaxation of Problem 1 [8, 9]. Denote the Stiefel manifold as $\operatorname{St}(r, d) \triangleq \{ Y \in \mathbb{R}^{r \times d} : Y^\top Y = I_d \}$, where $r \geq d$. The rank-$r$ relaxation of Problem 1 is defined as the following non-convex Riemannian optimization problem.
Problem 2 (Rankrestricted Semidefinite Relaxation).
$$\min_{Y_{\alpha_i} \in \operatorname{St}(r, d),\ p_{\alpha_i} \in \mathbb{R}^r} \; \sum_{(\alpha_i, \beta_j) \in \mathcal{E}} \kappa \left\| Y_{\beta_j} - Y_{\alpha_i} \tilde{R}_{\alpha_i \beta_j} \right\|_F^2 + \frac{1}{\sigma^2} \left\| p_{\beta_j} - p_{\alpha_i} - Y_{\alpha_i} \tilde{t}_{\alpha_i \beta_j} \right\|_2^2 \tag{RR}$$
Observe that for $r = d$, the Stiefel manifold $\operatorname{St}(d, d)$ is identical to the orthogonal group $O(d)$. In this case, Problem 2 is referred to as the orthogonal relaxation of Problem 1, obtained by dropping the determinant constraints on the rotations. As $r$ increases beyond $d$, we obtain a hierarchy of rank-restricted problems, each having the form of Problem 2 but with a slightly “lifted” search space as determined by $r$. This hierarchy of rank-restricted problems lies at the heart of the so-called Riemannian Staircase procedure [22] for solving the semidefinite relaxation of Problem 1, which has proven extremely successful in the design of PGO solvers with global optimality guarantees [8, 9, 2]. Once we solve Problem 2, either globally or locally to a critical point, we can apply a distributed rounding procedure (e.g., as detailed in [2]) to obtain a feasible solution to the original MLE problem. In addition, note that Problem 2 shares the same sparsity structure as encoded by the pose graph.
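For reference, a common way to map a tangent step back onto the “lifted” Stiefel search space of Problem 2 is the QR-based retraction. The following numpy sketch is our own illustration (not the paper's solver); it checks that the retracted point stays on $\operatorname{St}(r, d)$.

```python
import numpy as np

def qr_retraction(Y, V):
    """QR-based retraction on St(r, d): map the tangent vector V at Y
    back onto the manifold (a standard first-order retraction)."""
    Q, R = np.linalg.qr(Y + V)
    # Make the factorization unique (positive diagonal of R) so the
    # retraction is a smooth, well-defined map.
    signs = np.sign(np.sign(np.diag(R)) + 0.5)
    return Q * signs

rng = np.random.default_rng(1)
r, d = 5, 3
Y, _ = np.linalg.qr(rng.standard_normal((r, d)))   # a point on St(5, 3)
V = 0.1 * rng.standard_normal((r, d))              # ambient perturbation
# Project the perturbation to the tangent space, then retract.
V_tan = V - Y @ (0.5 * (Y.T @ V + V.T @ Y))
Y_new = qr_retraction(Y, V_tan)
assert np.allclose(Y_new.T @ Y_new, np.eye(d))     # still on the manifold
```

The sign fix on the diagonal of `R` is what makes the QR retraction continuous; without it, `np.linalg.qr` may flip column signs between nearby inputs.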
For the purpose of designing decentralized algorithms (Section IV), it is more convenient to rewrite Problems 1 and 2 in a more abstract form at the level of robots, as follows.
Problem 3 (Robotlevel Optimization Problem).
$$\min_{x = (x_1, \ldots, x_n) \in \mathcal{M}_1 \times \cdots \times \mathcal{M}_n} \; \sum_{\{\alpha, \beta\} \subseteq \mathcal{R}} f_{\alpha\beta}(x_\alpha, x_\beta) + \sum_{\alpha \in \mathcal{R}} f_\alpha(x_\alpha) \tag{P}$$
In (P), each variable $x_\alpha$ concatenates all pose variables owned by robot $\alpha$. For instance, for Problem 2, $x_\alpha$ contains all the “lifted” rotation and translation variables of robot $\alpha$. Let $n_\alpha$ be the number of poses of robot $\alpha$. Then, for Problems 1 and 2 respectively,
$$x_\alpha = (R_{\alpha_1}, t_{\alpha_1}, \ldots, R_{\alpha_{n_\alpha}}, t_{\alpha_{n_\alpha}}) \in \mathcal{M}_\alpha \triangleq (SO(d) \times \mathbb{R}^d)^{n_\alpha}, \tag{5}$$
$$x_\alpha = (Y_{\alpha_1}, p_{\alpha_1}, \ldots, Y_{\alpha_{n_\alpha}}, p_{\alpha_{n_\alpha}}) \in \mathcal{M}_\alpha \triangleq (\operatorname{St}(r, d) \times \mathbb{R}^r)^{n_\alpha}. \tag{6}$$
The cost function in (P) consists of a set of shared costs $f_{\alpha\beta}$ between pairs of robots, and a set of private costs $f_\alpha$ for individual robots. Intuitively, $f_{\alpha\beta}$ is formed by the relative measurements between any of robot $\alpha$'s poses and any of robot $\beta$'s poses. In contrast, $f_\alpha$ is formed by the relative measurements within robot $\alpha$'s own trajectory.
Similar to the way a pose graph is defined, we can encode the structure of (P) using a robot-level graph $\mathcal{G}_R = (\mathcal{R}, \mathcal{E}_R)$; see Fig. 0(b). $\mathcal{G}_R$ can be viewed as a “reduced” version of the pose graph, in which each vertex corresponds to the entire trajectory of a single robot $\alpha \in \mathcal{R}$. Two robots $\alpha$ and $\beta$ are connected in $\mathcal{G}_R$ if they share any relative measurements. In this case, we call $\beta$ a neighboring robot of $\alpha$, and any pose of $\beta$ involved in a shared measurement a neighboring pose of robot $\alpha$. If a pose variable is not a neighboring pose of any other robot, we call it a private pose [2]. We note that for robot $\alpha$ to evaluate the shared cost $f_{\alpha\beta}$, it only needs to know its neighboring poses in robot $\beta$'s trajectory (see Fig. 1). This property is crucial for preserving the privacy of participating robots [1, 2]: at any time, a robot does not need to share its private poses with any of its teammates.
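The bookkeeping described above — neighboring robots, neighboring (public) poses, and private poses — can be derived directly from the edge list of the pose graph. A small Python sketch, where the robot names and pose indices are made up for illustration:

```python
from collections import defaultdict

# Hypothetical pose-graph edges: ((robot, pose_idx), (robot, pose_idx)).
edges = [(("a", 3), ("b", 7)), (("a", 4), ("b", 7)),
         (("b", 1), ("c", 2)),
         (("a", 0), ("a", 1))]          # intra-robot edge -> private cost

neighbors = defaultdict(set)            # robot-level graph adjacency
public_poses = defaultdict(set)         # neighboring (shared) poses per robot
for (r1, i), (r2, j) in edges:
    if r1 != r2:                        # inter-robot edge -> shared cost
        neighbors[r1].add(r2)
        neighbors[r2].add(r1)
        public_poses[r1].add(i)
        public_poses[r2].add(j)

assert neighbors["a"] == {"b"} and neighbors["b"] == {"a", "c"}
assert public_poses["a"] == {3, 4}      # poses robot a must share
# Poses (a, 0) and (a, 1) never appear in inter-robot edges: private.
```

Any pose index absent from `public_poses` is private and never leaves the robot, which is exactly the privacy property noted above.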
IV Proposed Algorithm
We present our main algorithm, Asynchronous Stochastic Parallel Pose Graph Optimization (ASAPP), for solving distributed PGO problems of the form (P). Our algorithm is inspired by asynchronous stochastic coordinate descent (e.g., see [6]), in which multiple processors update randomly selected coordinates of the variable concurrently. In the context of distributed PGO, each coordinate corresponds to the stacked pose variables $x_\alpha$ of a single robot, as defined in (P).
In a practical multirobot SLAM scenario, each robot can optimize its own pose estimates at any time, and can additionally share its (nonprivate) poses with others when communication is available. Correspondingly, each robot running ASAPP has two concurrent onboard processes, which we refer to as the optimization thread and communication thread. We emphasize that the robots perform both optimization and communication completely in parallel and without synchronization with each other. We begin by describing the communication thread and then proceed to the optimization thread. Without loss of generality, we describe the algorithm from the perspective of robot .
IV-A Communication Thread
As part of the communication module, each robot implements a local data structure, called a cache, that contains the robot's own variable $x_\alpha$, together with the most recent copies of neighboring poses received from the robot's neighbors. We note that since only robot $\alpha$ can modify $x_\alpha$, the value of $x_\alpha$ in robot $\alpha$'s cache is guaranteed to be up-to-date at any time. In contrast, the copies of neighboring poses from other robots can be out-of-date due to communication delay. For example, by the time robot $\alpha$ receives and uses a copy of robot $\beta$'s poses, $\beta$ might have already updated its poses in its local optimization process. In Section V, we show that ASAPP is resilient against such network delay. Nevertheless, for ASAPP to converge, we still assume that the total delay induced by the communication process remains bounded; we formally introduce this assumption in Section V.
The communication thread performs the following two operations over the cache.
Receive: After receiving a neighboring pose from a neighboring robot $\beta$ over the network, the communication thread updates the corresponding entry in the cache to store the new value.
Send: Periodically (when communication is available), robot $\alpha$ also transmits its latest public pose variables (i.e., poses that have inter-robot measurements with other robots) to its neighbors. Recall from Section III that robot $\alpha$ does not need to send its private poses, as these are not needed by other robots to optimize their estimates.
IV-B Optimization Thread
Concurrently with the communication thread, the optimization thread is invoked by a local clock that ticks according to a Poisson process of rate $\lambda > 0$.
Definition 1 (Poisson process [23]).
Consider a sequence $\{T_k\}_{k \geq 1}$ of positive, independent random variables that represent the time elapsed between consecutive events (in this case, clock ticks). Let $N(t)$ be the number of events up to time $t$. The counting process $\{N(t), t \geq 0\}$ is a Poisson process with rate $\lambda$ if the inter-arrival times $T_k$ have a common exponential distribution function,
$$P(T_k \leq t) = 1 - e^{-\lambda t}, \quad t \geq 0. \tag{7}$$
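Equation (7) is easy to check empirically: drawing inter-arrival times from an exponential distribution produces ticks at the nominal rate. A small numpy sketch, where the rate value is an arbitrary choice for illustration:

```python
import numpy as np

rate = 10.0                         # clock rate lambda (ticks per second)
rng = np.random.default_rng(42)

# Inter-arrival times of a Poisson process are i.i.d. Exponential(rate),
# i.e., mean 1/rate between consecutive ticks.
gaps = rng.exponential(scale=1.0 / rate, size=100_000)
ticks_per_sec = len(gaps) / gaps.sum()

# The empirical tick rate should be close to the nominal one.
assert abs(ticks_per_sec - rate) / rate < 0.02
```

In an implementation, a local clock is emulated by simply sleeping for one such exponential draw between consecutive optimization iterations.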
The use of Poisson clocks originates from the design of randomized gossip algorithms by Boyd et al. [24], and is a commonly used tool for analyzing the global behavior of distributed randomized algorithms. We assume that the rate parameter $\lambda$ is identical across robots. In practice, we can adjust $\lambda$ based on the extent of network delay and the robots' computational capacity. Using this local clock, the optimization thread performs the following operations in a loop.
Read: For each neighboring robot $\beta$, read the value of $x_\beta$ stored in the local cache, and denote the read value as $\hat{x}_\beta$. Recall that $\hat{x}_\beta$ can be outdated, for example if robot $\alpha$ has not received the latest messages from robot $\beta$. In addition, read the value of $x_\alpha$, denoted as $\hat{x}_\alpha$. Recall from Section IV-A that $\hat{x}_\alpha$ is guaranteed to be up-to-date.
In practice, $\hat{x}_\beta$ only contains the set of neighboring poses from robot $\beta$, since $f_{\alpha\beta}$ is independent of the rest of $\beta$'s poses (Fig. 1). However, for ease of notation and analysis (Section V), we treat $\hat{x}_\beta$ as if it contained the entire set of $\beta$'s poses.
Compute: Form the local cost function for robot $\alpha$, denoted as $g_\alpha$, by aggregating the costs in (P) that involve $x_\alpha$,
$$g_\alpha(x_\alpha) \triangleq \sum_{\beta \in \mathcal{N}_\alpha} f_{\alpha\beta}(x_\alpha, \hat{x}_\beta) + f_\alpha(x_\alpha), \tag{8}$$
where $\mathcal{N}_\alpha$ denotes the set of neighboring robots of $\alpha$.
Then compute the Riemannian gradient at robot $\alpha$'s current estimate $\hat{x}_\alpha$,
$$\eta_\alpha = \operatorname{grad} g_\alpha(\hat{x}_\alpha). \tag{9}$$
Update: At the next local clock tick, update $x_\alpha$ in the direction of the negative gradient,
$$x_\alpha \leftarrow \operatorname{Retr}_{\hat{x}_\alpha}(-\gamma\, \eta_\alpha), \tag{10}$$
where $\gamma > 0$ is a constant stepsize. Equation (10) gives the simplest update rule that robots can follow, and forms the basis of our convergence analysis in Section V. To further accelerate convergence in practice, state-of-the-art solvers often implement a heuristic known as preconditioning [8, 9, 2]. We note that ASAPP can be straightforwardly extended to use preconditioning, by using the following alternative update direction,
$$\eta_\alpha = \operatorname{Precon}_{\hat{x}_\alpha}\!\left(\operatorname{grad} g_\alpha(\hat{x}_\alpha)\right). \tag{11}$$
In (11), $\operatorname{Precon}_{\hat{x}_\alpha}$ is a linear, symmetric, and positive definite mapping on the tangent space that approximates the inverse of the Riemannian Hessian. Intuitively, preconditioning lets first-order methods benefit from the (approximate) second-order geometry of the cost function, which often yields significant speedups, especially on poorly conditioned problems.
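As a concrete (and deliberately simplified) illustration of the update rules (10) and (11), the numpy sketch below takes one Riemannian gradient step on the orthogonal group with a QR retraction, and one preconditioned step where an elementwise weight stands in for the inverse-Hessian approximation. The toy cost, stepsize, and weights are our own assumptions, not the paper's solver.

```python
import numpy as np

def proj(X, G):
    """Orthogonal projection of ambient G onto the tangent space at X."""
    return G - X @ (0.5 * (X.T @ G + G.T @ X))

def retract(X, V):
    """QR-based retraction: map tangent vector V at X back to the manifold."""
    Q, R = np.linalg.qr(X + V)
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)

rng = np.random.default_rng(3)
T, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # target rotation
X, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # current estimate
f = lambda X: 0.5 * np.linalg.norm(X - T) ** 2     # toy local cost g_alpha

gamma = 0.1                                        # constant stepsize
eta = proj(X, X - T)                               # Riemannian gradient, cf. (9)
X_plain = retract(X, -gamma * eta)                 # plain update, cf. (10)

# Preconditioned variant, cf. (11): scale by an elementwise weight standing
# in for an inverse-Hessian approximation, then re-project to stay tangent.
W = 1.0 + rng.random((3, 3))
eta_pre = proj(X, eta / W)
X_pre = retract(X, -gamma * eta_pre)

assert f(X_plain) < f(X) and f(X_pre) < f(X)       # both steps decrease f
assert np.allclose(X_plain.T @ X_plain, np.eye(3))
```

Note the re-projection after scaling: elementwise scaling generally leaves the tangent space, and projecting back preserves a positive inner product with the gradient, so the negated direction remains a descent direction.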
IV-C Implementation Details
To make the local clock model valid, we require that the total execution time of the Read-Compute-Update sequence be smaller than the inter-arrival time of the Poisson clock, so that the current sequence can finish before the next one starts. This requirement is fairly lax in practice, as all three steps involve only minimal computation and access to local memory. In the worst case, since the inter-arrival time is $1/\lambda$ on average [23], one can also decrease the clock rate $\lambda$ to create more time for each update.
In addition, we note that although the optimization and communication threads run concurrently, minimal synchronization is required to ensure so-called atomic reads and writes of individual poses. Specifically, a thread cannot read a pose in the cache while the other thread is actively modifying its value (otherwise the read value would be invalid). Such synchronization can be easily enforced using software locks. In practice, however, due to the large number of poses owned by each robot, contention on any individual pose is relatively rare.
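The required atomicity can be sketched with a mutex-guarded cache. In Python this looks as follows; it is a minimal illustration only — the actual C++ implementation would typically use finer-grained per-pose locks, and the key and value types here are our own choices.

```python
import threading

class PoseCache:
    """Minimal sketch of the per-robot cache: the communication thread
    writes neighbor poses, the optimization thread reads them. A lock
    makes each individual read/write atomic."""
    def __init__(self):
        self._lock = threading.Lock()
        self._poses = {}                 # (robot, pose_idx) -> pose value

    def write(self, key, pose):          # called by the communication thread
        with self._lock:
            self._poses[key] = pose

    def read(self, key):                 # called by the optimization thread
        with self._lock:
            return self._poses.get(key)

cache = PoseCache()
cache.write(("b", 7), [1.0, 2.0, 0.5])
assert cache.read(("b", 7)) == [1.0, 2.0, 0.5]
assert cache.read(("c", 0)) is None      # no message received yet
```

A single coarse lock as above serializes every access; using one lock per pose (or a reader-writer lock) keeps the contention rare, as noted in the text.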
V Convergence Analysis
V-A Global View of the Algorithm
In Section IV, we described ASAPP from the local perspective of each robot. For the purpose of establishing convergence, however, we need to analyze the behavior of this algorithm from a global perspective [6, 19, 20, 24]. To do so, let $k$ be a virtual counter that counts the total number of Update operations applied by all robots. In addition, let the random variable $\alpha_k$ represent the robot that updates at global iteration $k$. We emphasize that $k$ and $\alpha_k$ are used purely for theoretical analysis, and are unknown to any of the robots in practice.
Recall from Section IV-B that all Update steps are generated by independent Poisson processes, each with rate $\lambda$. From the global perspective, merging these local processes is equivalent to creating a single, global Poisson clock with rate $n\lambda$, where $n = |\mathcal{R}|$ is the number of robots. Furthermore, at any time, all robots are equally likely to generate the next Update step, i.e., for all $k$, $\alpha_k$ is i.i.d. uniformly distributed over the set $\mathcal{R}$. See [23] for proofs of these results.

Using this result, we can write the iterations of ASAPP from the global view; see Algorithm 1. We use $x^{(k)}$ to represent the value of all robots' poses after $k$ global iterations (i.e., after $k$ total Update steps). Note that $x^{(k)}$ lives on the product manifold $\mathcal{M} = \mathcal{M}_1 \times \cdots \times \mathcal{M}_n$. At global iteration $k$, a robot $\alpha_k$ is selected from $\mathcal{R}$ uniformly at random (line 4). Robot $\alpha_k$ then follows the steps in Section IV-B to update its own variable (lines 5-8). We have used the fact that $\hat{x}_{\alpha_k}$ is always up-to-date (line 5), while the read copies of neighboring poses may be outdated by several total Update steps (line 6). Except for robot $\alpha_k$, all other robots do not update (line 9). As an additional notation that will be useful for later analysis, we note that line 9 can be equivalently written as the retraction of a zero tangent vector, i.e., $x_\beta^{(k+1)} = \operatorname{Retr}_{x_\beta^{(k)}}(\eta_\beta^{(k)})$ with $\eta_\beta^{(k)} = 0$.
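The two facts used above — the merged clock has rate $n\lambda$, and the next updating robot is uniform over $\mathcal{R}$ — can be checked numerically by racing $n$ simulated local clocks. The values of $n$ and the rate below are arbitrary choices for illustration.

```python
import numpy as np

n, rate = 5, 2.0                    # n robots, each with a rate-2.0 clock
rng = np.random.default_rng(7)

# Draw the next tick of each local clock; the robot whose clock fires
# first performs the next global Update step.
ticks = rng.exponential(1.0 / rate, size=(200_000, n))
winners = ticks.argmin(axis=1)      # index of the next updating robot

# The winner should be uniform over the n robots...
freqs = np.bincount(winners, minlength=n) / len(winners)
assert np.allclose(freqs, 1.0 / n, atol=0.01)

# ...and the merged process (minimum of n exponentials) has rate n * rate.
merged_rate = 1.0 / ticks.min(axis=1).mean()
assert abs(merged_rate - n * rate) / (n * rate) < 0.02
```

This is the classical superposition property of Poisson processes: the minimum of $n$ independent $\operatorname{Exponential}(\lambda)$ variables is $\operatorname{Exponential}(n\lambda)$, and the argmin is uniform.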
V-B Sufficient Conditions for Convergence
We establish sufficient conditions for ASAPP to converge to first-order critical points. Due to space limitations, all proofs are deferred to the appendix. We adopt the commonly used partially asynchronous model [3], which assumes that the delay caused by asynchrony is not arbitrarily large. In practice, the magnitude of delay is affected by various factors, such as the rate of communication (Section IV-A), the rate of local optimization (Section IV-B), and intrinsic network latency. For the purpose of analysis, we assume that all these factors can be summarized by a single constant $D$, which bounds the maximum delay in terms of the number of global iterations (i.e., Update steps applied by all robots) in Algorithm 1.
Assumption 1 (Bounded Delay).
In Algorithm 1, there exists a constant $D \geq 0$ such that, at every iteration $k$, the neighboring poses read by the updating robot are at most $D$ global iterations old.
For both the MLE problem (Problem 1) and its rank-restricted semidefinite relaxation (Problem 2), the gradients of the cost functions enjoy a Lipschitz-type condition, which is proved in our previous work [2] and will be used extensively in the rest of the analysis.
Lemma 1 (Lipschitz-type gradient for pullbacks [2]).
There exists a constant $L \geq 0$ such that, for all $x \in \mathcal{M}$ and all $\eta \in T_x\mathcal{M}$,
$$\left| f(\operatorname{Retr}_x(\eta)) - f(x) - \langle \operatorname{grad} f(x), \eta \rangle \right| \leq \frac{L}{2} \|\eta\|^2. \tag{12}$$
The condition (12) was first proposed in [25] as an adaptation of the Lipschitz continuous gradient to Riemannian optimization. Using the bounded delay assumption and the Lipschitz-type condition (12), we can analyze the change in the cost function after a single iteration of Algorithm 1 (in the global view). We formally state the result in the following lemma.
Lemma 2 (Descent Property of Algorithm 1).
In (13), the last term on the right-hand side sums over the squared norms of a set of past update vectors, where each corresponds to the update taken by a neighbor at an earlier iteration. This term is a direct consequence of delay in the system, and is the main obstacle to proving convergence in the asynchronous setting. Indeed, without this term, it is straightforward to verify that any stepsize satisfying $\gamma < 2/L$ guarantees $f(x^{(k+1)}) \leq f(x^{(k)})$, and thus leads to convergent behavior. With the last term in (13), however, the overall cost could increase after an iteration.
While the delay-dependent error term gives rise to additional challenges, our next theorem states that with a sufficiently small stepsize, this error term is inconsequential and ASAPP provably converges to first-order critical points.
Theorem 1 (Global convergence of ASAPP).
Let $f^\star$ be any global lower bound on the optimum of (P). Define $\delta$ as the ratio between the maximum number of neighboring robots and the total number of robots $n$. Let $\bar{\gamma}$ be an upper bound on the stepsize that satisfies,
(14) 
In particular, the following choice of $\bar{\gamma}$ satisfies (14):
(15) 
Under Assumption 1, if $\gamma \leq \bar{\gamma}$, ASAPP converges to a first-order critical point at a global sublinear rate. Specifically, after $K$ total Update steps,
(16) 
Remark 1.
To the best of our knowledge, Theorem 1 establishes the first convergence result for asynchronous algorithms solving a non-convex optimization problem over a product of matrix manifolds. While the existence of a convergent stepsize is of theoretical importance, we further note that its expression offers the correct qualitative insights with respect to various problem-specific parameters, which we discuss next.
Relation with maximum delay ($D$): $\bar{\gamma}$ increases as the maximum delay $D$ decreases. Intuitively, as communication becomes increasingly available, each robot may take larger steps without causing divergence. The inverse relationship between $\bar{\gamma}$ and $D$ is well known in the asynchronous optimization literature, and was first established by Bertsekas and Tsitsiklis [3] in the Euclidean setting.
Relation with problem sparsity ($\delta$): $\bar{\gamma}$ increases as $\delta$ decreases. Recall that $\delta$ is defined as the ratio between the maximum number of neighbors a robot has and the total number of robots, so $\delta$ is a measure of the sparsity of the robot-level graph $\mathcal{G}_R$. Intuitively, as $\mathcal{G}_R$ becomes more sparse, robots can use larger stepsizes as their problems become increasingly decoupled. Such positive correlation between the stepsize and problem sparsity has been a crucial feature of state-of-the-art asynchronous algorithms; see, e.g., [5].
VI Experimental Results
We implement ASAPP in C++ and evaluate its performance on both simulated and real-world PGO datasets. We use ROPTLIB [26] for manifold-related computations, and the Robot Operating System (ROS) [27] for inter-robot communication. The Poisson clock is implemented by halting the optimization thread after each iteration for a random amount of time, exponentially distributed with rate $\lambda$. Since the time taken by each iteration is negligible, we expect the practical difference between this implementation and the theoretical model in Section IV-B to be insignificant. All robots are simulated as separate ROS nodes running on a desktop computer with an Intel i7 quad-core CPU.
Prior to distributed optimization, we initialize all robots' trajectory estimates by propagating the noisy relative measurements along a spanning tree of the global pose graph. Compared to other initialization techniques (e.g., the distributed chordal initialization in [1]), the spanning tree initialization incurs minimal communication cost and usually (in low-noise regimes) produces a reasonably good initial solution.
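Spanning tree initialization simply chains relative measurements from a fixed root. A minimal SE(2) sketch of this propagation, where the odometry values are made up for illustration:

```python
import numpy as np

def se2(theta, x, y):
    """Homogeneous SE(2) matrix from heading and translation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

# Hypothetical noisy relative measurements along a chain (a spanning tree):
# pose_{k+1} = pose_k @ T_k, with pose_0 fixed at the identity.
edges = [se2(0.1, 1.0, 0.0), se2(-0.05, 1.0, 0.1), se2(0.2, 0.9, 0.0)]

poses = [np.eye(3)]
for T in edges:                      # propagate along the tree
    poses.append(poses[-1] @ T)

assert len(poses) == 4
# Each estimate remains a valid SE(2) element (rotation block orthogonal).
R = poses[-1][:2, :2]
assert np.allclose(R.T @ R, np.eye(2))
```

For a general (non-chain) spanning tree, the same composition is applied along each root-to-node path; only one measurement per node is used, which is why the communication cost is minimal.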
For each PGO problem, we use ASAPP to solve its rank-restricted semidefinite relaxation (Problem 2). During optimization, we record the evolution of the Riemannian gradient norm $\|\operatorname{grad} f(x^{(k)})\|$, which measures convergence to a first-order critical point. In addition, we record the optimality gap $f(x^{(k)}) - f(x^\star)$, where $x^\star$ is a global minimizer of the PGO problem computed using the centralized semidefinite relaxation [9]. After optimization, we round the solution back to the feasible set of Problem 1 and then compute the rotation and translation root mean squared errors (RMSE) with respect to the global minimizer.
VI-A Evaluation in Simulation
We evaluate ASAPP in a simulated multi-robot SLAM scenario in which robots move next to each other in a 3D grid with lawn-mower trajectories (Fig. 1(a)). Loop closures within and across trajectories are generated, with a fixed probability, for poses that lie close to each other. All measurements are corrupted by Langevin rotation noise and Gaussian translation noise. As is commonly done in prior work [4, 5, 6, 19, 7, 20, 21], we select the stepsize empirically in our experiments.
In the first experiment, we simulate a fixed communication delay by letting each robot communicate at a fixed period. We compare the performance of ASAPP (without preconditioning) against a baseline algorithm in which each robot uses the second-order Riemannian trust-region (RTR) method to optimize its local variable, similar to the approach in [2]. RTR has emerged as the default solver in the synchronous setting due to its global convergence guarantees and its ability to exploit the second-order geometry of the cost function. For a comprehensive evaluation, we record the performance of this baseline at different optimization rates (i.e., the frequency at which robots update their local trajectories).
Fig. 1(b) shows the optimality gaps achieved by the evaluated algorithms as a function of wall-clock time. The corresponding reduction in the Riemannian gradient norm is shown in Fig. 1(c). ASAPP outperforms all variants of the baseline algorithm (dashed curves). We note that the behavior of the baseline algorithm is expected. At a low rate (dark blue dashed curve), the baseline algorithm is essentially synchronous, as each robot has access to up-to-date poses from others. The empirical convergence speed is nevertheless slow, since each robot needs to wait for up-to-date information to arrive after each iteration. At a high rate (dark yellow dashed curve), robots essentially behave asynchronously. However, since RTR does not regulate the stepsize at each iteration, robots often significantly alter their solutions in the wrong direction (as a result of using outdated information), which leads to slow convergence or even non-convergence. Lastly, we observe that at an intermediate rate, the convergence speed of the baseline approaches that of ASAPP. However, we emphasize that the baseline algorithm does not provide any convergence guarantees. In contrast, ASAPP is provably convergent, and furthermore is able to exploit asynchrony effectively to achieve a speedup.
In addition, we evaluate ASAPP under a wide range of communication delays. Due to space limitations, we only show the performance in terms of gradient norm in Fig. 3. We note that ASAPP converges in all cases, demonstrating its resilience against various delays in practice. Furthermore, as the delay decreases, convergence becomes faster, since robots have access to more up-to-date information from each other.
VI-B Evaluation on benchmark PGO datasets
Datasets | # Poses | # Edges | Stepsize | Init. Opt. Gap | Final Opt. Gap | Gradnorm | Rot. Error [deg] | Trans. Error [m]
CSAIL (2D) | 1045 | 1171 | – | 628.7 | 0.10 | 0.55 | 0.22 | 0.004
Intel Research Lab (2D) | 1228 | 1483 | – | 342.2 | 0.82 | 0.62 | 0.99 | 0.003
Parking Garage (3D) | 1661 | 6275 | – | 418.2 | 0.22 | 0.17 | 3.00 | 0.01
Sphere (3D) | 2500 | 4949 | – | 694.3 | 14.7 | 2.79 | 1.32 | 0.01
To further demonstrate the effectiveness of ASAPP, we evaluate the algorithm on several benchmark SLAM datasets. Each dataset is divided into multiple segments, simulating a collaborative SLAM mission with multiple robots. Due to space limitations, figures of these datasets are provided in the appendix. To accelerate empirical convergence, we run ASAPP with preconditioning as described in Section IV-B. By default, we use the spanning tree initialization before running ASAPP. On the synthetic Sphere dataset, however, the spanning tree initialization gives a particularly poor initial guess, and we use the distributed chordal initialization [1] instead. Table I reports the performance of ASAPP under a fixed communication delay.
We first note that with preconditioning, ASAPP can afford larger stepsizes than the one used without preconditioning in the previous section. This demonstrates the power of preconditioning in countering the poor conditioning of the optimization problem. Furthermore, on all datasets, ASAPP converges and significantly reduces the optimality gap from the initial solution. The small rotation and translation errors (last two columns) with respect to the global minimizer also indicate that the solutions returned by ASAPP are near-optimal.
VII Conclusion
We presented ASAPP, the first asynchronous and provably delay-tolerant algorithm for solving distributed pose graph optimization and its rank-restricted semidefinite relaxations. ASAPP enables each robot to run its local optimization process at a high rate, without waiting for updates from its peers over the network. Assuming a worst-case bound on the communication delay, we established the global first-order convergence of ASAPP, and showed the existence of a convergent stepsize whose value depends on the worst-case delay and the inherent problem sparsity. When there is no delay, we further showed that this stepsize matches the corresponding constant used by synchronous algorithms. Numerical evaluations on both simulated and real-world datasets confirm the advantages of ASAPP in reducing overall execution time, and demonstrate its resilience against a wide range of communication delays.
Our theoretical study in Section V is based on a worst-case analysis and involves constants, such as the maximum delay and the Lipschitz constant, that are hard to determine in practice. Future work could consider a less conservative strategy (e.g., based on average-case analysis) and, furthermore, explicitly estimate these constants. Another open question concerns the conditions under which stronger performance guarantees hold, e.g., convergence to second-order critical points or global minima. For synchronous first-order algorithms, recent works have shown promising results in this direction [28, 29].
References
 [1] S. Choudhary, L. Carlone, C. Nieto, J. Rogers, H. I. Christensen, and F. Dellaert, “Distributed mapping with privacy and communication constraints: Lightweight algorithms and object-based models,” The International Journal of Robotics Research, vol. 36, no. 12, pp. 1286–1311, 2017.
 [2] Y. Tian, K. Khosoussi, and J. P. How, “Block-coordinate descent on the Riemannian staircase for certifiably correct distributed rotation and pose synchronization,” tech. rep., Massachusetts Institute of Technology, 2019.
 [3] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and distributed computation: numerical methods, vol. 23. Prentice-Hall, Englewood Cliffs, NJ, 1989.
 [4] A. Agarwal and J. C. Duchi, “Distributed delayed stochastic optimization,” Advances in Neural Information Processing Systems (NIPS), 2011.

 [5] F. Niu, B. Recht, C. Ré, and S. Wright, “Hogwild!: A lock-free approach to parallelizing stochastic gradient descent,” Advances in Neural Information Processing Systems (NIPS), 2011.
 [6] J. Liu and S. J. Wright, “Asynchronous stochastic coordinate descent: Parallelism and convergence properties,” SIAM Journal on Optimization, 2015.
 [7] X. Lian, Y. Huang, Y. Li, and J. Liu, “Asynchronous parallel stochastic gradient for nonconvex optimization,” Advances in Neural Information Processing Systems (NIPS), 2015.
 [8] D. M. Rosen, L. Carlone, A. S. Bandeira, and J. J. Leonard, “SE-Sync: A certifiably correct algorithm for synchronization over the special Euclidean group,” The International Journal of Robotics Research, 2019.
 [9] J. Briales and J. Gonzalez-Jimenez, “Cartan-Sync: Fast and global SE(d)-synchronization,” IEEE Robotics and Automation Letters, Oct 2017.
 [10] P.A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds. Princeton University Press, 2009.
 [11] R. Tron and R. Vidal, “Distributed image-based 3D localization of camera sensor networks,” in Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference, 2009.
 [12] R. Tron, Distributed optimization on manifolds for consensus algorithms and camera network localization. The Johns Hopkins University, 2012.
 [13] R. Tron and R. Vidal, “Distributed 3D localization of camera sensor networks from 2D image measurements,” IEEE Transactions on Automatic Control, vol. 59, pp. 3325–3340, Dec 2014.
 [14] R. Tron, J. Thomas, G. Loianno, K. Daniilidis, and V. Kumar, “A distributed optimization framework for localization and formation control: Applications to vision-based measurements,” IEEE Control Systems Magazine, vol. 36, pp. 22–44, Aug 2016.
 [15] J. Knuth and P. Barooah, “Collaborative 3D localization of robots from relative pose measurements using gradient descent on manifolds,” in 2012 IEEE International Conference on Robotics and Automation, 2012.
 [16] S. Choudhary, L. Carlone, H. I. Christensen, and F. Dellaert, “Exactly sparse memory-efficient SLAM using the multi-block alternating direction method of multipliers,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015.
 [17] P.-Y. Lajoie, B. Ramtoula, Y. Chang, L. Carlone, and G. Beltrame, “DOOR-SLAM: Distributed, online, and outlier resilient SLAM for robotic teams,” IEEE Robotics and Automation Letters, 2020.
 [18] T. Fan and T. D. Murphey, “Generalized proximal methods for pose graph optimization,” in The International Symposium on Robotics Research, 2019.
 [19] J. Liu, S. J. Wright, C. Ré, V. Bittorf, and S. Sridhar, “An asynchronous parallel stochastic coordinate descent algorithm,” Journal of Machine Learning Research, 2015.
 [20] X. Lian, W. Zhang, C. Zhang, and J. Liu, “Asynchronous decentralized parallel stochastic gradient descent,” in Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.
 [21] L. Cannelli, F. Facchinei, V. Kungurtsev, and G. Scutari, “Asynchronous parallel algorithms for nonconvex optimization,” Mathematical Programming, 2019.
 [22] N. Boumal, “A Riemannian low-rank method for optimization over semidefinite matrices with block-diagonal constraints,” tech. rep., 2015.
 [23] H. Tijms, A First Course in Stochastic Models. John Wiley and Sons, Ltd, 2004.
 [24] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2508–2530, 2006.
 [25] N. Boumal, P.-A. Absil, and C. Cartis, “Global rates of convergence for nonconvex optimization on manifolds,” IMA Journal of Numerical Analysis, vol. 39, pp. 1–33, Feb 2018.
 [26] W. Huang, P.-A. Absil, K. A. Gallivan, and P. Hand, “ROPTLIB: an object-oriented C++ library for optimization on Riemannian manifolds,” tech. rep., Florida State University, 2016.

 [27] M. Quigley, K. Conley, B. P. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “ROS: an open-source robot operating system,” in ICRA Workshop on Open Source Software, 2009.
 [28] C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, “How to escape saddle points efficiently,” in International Conference on Machine Learning, 2017.
 [29] C. Criscitiello and N. Boumal, “Efficiently escaping saddle points on manifolds,” in Neural Information Processing Systems Conference, 2019.