# Optimization on manifolds: A symplectic approach

There has been great interest in using tools from dynamical systems and numerical analysis of differential equations to understand and construct new optimization methods. In particular, recently a new paradigm has emerged that applies ideas from mechanics and geometric integration to obtain accelerated optimization methods on Euclidean spaces. This has important consequences given that accelerated methods are the workhorses behind many machine learning applications. In this paper we build upon these advances and propose a framework for dissipative and constrained Hamiltonian systems that is suitable for solving optimization problems on arbitrary smooth manifolds. Importantly, this allows us to leverage the well-established theory of symplectic integration to derive "rate-matching" dissipative integrators. This brings a new perspective to optimization on manifolds whereby convergence guarantees follow by construction from classical arguments in symplectic geometry and backward error analysis. Moreover, we construct two dissipative generalizations of leapfrog that are straightforward to implement: one for Lie groups and homogeneous spaces, that relies on the tractable geodesic flow or a retraction thereof, and the other for constrained submanifolds that is based on a dissipative generalization of the famous RATTLE integrator.

## Authors

• 12 publications
• 13 publications
• 41 publications
• 218 publications
• ### Accelerated Optimization on Riemannian Manifolds via Discrete Constrained Variational Integrators

A variational formulation for accelerated optimization on normed spaces ...
04/15/2021 ∙ by Valentin Duruisseaux, et al. ∙ 0

• ### Accelerated Optimization in the PDE Framework: Formulations for the Manifold of Diffeomorphisms

We consider the problem of optimization of cost functionals on the infin...
04/04/2018 ∙ by Ganesh Sundaramoorthi, et al. ∙ 0

• ### Conformal Symplectic and Relativistic Optimization

Although momentum-based optimization methods have had a remarkable impac...
03/11/2019 ∙ by Guilherme França, et al. ∙ 0

• ### A Discrete Variational Derivation of Accelerated Methods in Optimization

Many of the new developments in machine learning are connected with grad...
06/04/2021 ∙ by Cédric M. Campos, et al. ∙ 0

• ### Hamiltonian Monte Carlo on Symmetric and Homogeneous Spaces via Symplectic Reduction

The Hamiltonian Monte Carlo method generates samples by introducing a me...
03/07/2019 ∙ by Alessandro Barp, et al. ∙ 0

• ### Neural Ordinary Differential Equations on Manifolds

Normalizing flows are a powerful technique for obtaining reparameterizab...
06/11/2020 ∙ by Luca Falorsi, et al. ∙ 0

• ### On Constraints in First-Order Optimization: A View from Non-Smooth Dynamical Systems

We introduce a class of first-order methods for smooth constrained optim...
07/17/2021 ∙ by Michael Muehlebach, et al. ∙ 14

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

We are concerned with the optimization problem

 minq∈Qf(q), (1.1)

where is a smooth manifold—called a configuration manifold—and

is smooth. Optimization problems on manifolds have several important applications in machine learning, statistics, and applied mathematics, including maxcut problems, phase retrieval, linear and nonlinear eigenvalue problems, principal component analysis, clustering, and dimensionality reduction, to name a few. In particular, configuration manifolds arise in the context of rank and orthogonality constraints, leading to nonconvex optimization problems. Such problems also appear in statistical physics, e.g., in finding ground states of disordered systems such as spin glasses.

Usually the geometry of is specified by a Riemannian metric that assigns at each point an inner product, , on the tangent space . We are interested in two important classes of manifolds. One is defined through a set of independent constraints, , satisfying

 ψa(q)=0,a=1,…,m. (1.2)

In this case is a -dimensional level set embedded into the higher -dimensional manifold ; i.e., where . See Fig. 1 for an illustration. The geometric properties of , in particular its Riemannian metric, are then induced by —which is usually a Euclidean space although it can be a general smooth manifold. The other main class of manifolds that we are interested in consists of Lie groups and homogeneous spaces where the Riemannian geometry is induced by an inner product on its associated Lie algebra.

We shall adopt a different—and in some sense complementary—approach than that traditionally found in the optimization literature; see, e.g., [AbsilBook, Sra:2016, Pymanot, Ahn20, Alimisis21] and references therein. The standard approach to optimization on manifolds relies on the Riemannian geometry of and employs the geodesic or gradient flows to obtain optimization schemes. Such an approach is valid once a Riemannian metric has been specified and requires approximating the geodesic flow. It is worth noticing, however, that this can only be done for a handful of manifolds with invariant Riemannian metrics. The dynamics and the underlying geodesic computations take place on the tangent bundle .

Geodesic flows can be seen as Hamiltonian flows over a manifold. More generally, the canonical way to define a dynamical system associated to an arbitrary smooth manifold is through its cotangent bundle (see again Fig. 1). The reason is that the cotangent bundle of any smooth manifold is itself a symplectic manifold and any dynamics that preserves its symplectic structure is, at least locally, necessarily a Hamiltonian flow. From this perspective, the natural way to approach problem (1.1) is through a dissipative Hamiltonian dynamics on where plays the role of an external potential; dissipation is necessary so that the phase space contracts to a point that corresponds to a solution of the optimization problem (1.1). Relationships between Hamiltonian systems and optimization algorithms on Euclidean spaces has recently seen an explosion of research; see, e.g., [Wibisono:2016, Franca:2020, Franca:2021, betancourt2018symplectic, muehlebach2021optimization, bravetti2019optimization, Franca:2021b]. In particular, Ref. [Franca:2021] proposes a general framework that transfers analysis and design tools from conservative to dissipative settings. Herein we will further extend this approach to arbitrary manifolds and for systems with constraints. In particular, this will allow us to solve problem (1.1) even when the configuration manifold is only implicitly defined; i.e., we do not even need to know the Riemannian metric of .

It has long been known that finding accurate long-term numerical approximations of a dynamical flow of interest is usually a challenging, if not hopeless, problem. Importantly, it is not always the problem that one should attempt to solve, a realization that led to the theory of geometric integrators. In this theory the goal is to identify critical properties of the dynamics and to preserve these properties under discretization [HairerBook, mclachlan2002splitting, mclachlan2006geometric, leimkuhler2004simulating]. Examples of such desiderata include conservation of energy, measures, or symmetries, decay of a Lyapunov function, and preservation of a symplectic, Poisson or contact structure. The theory of symplectic integrators, and in particular that of variational integrators (i.e., discrete Lagrangian mechanics), plays a prominent role in the simulation of conservative Hamiltonian systems since such integrators not only preserve the symplectic structure and phase space volume but also the energy up to a bounded error (and in many cases they preserve symmetries of the Hamiltonian as well). Such methods are numerically stable and extremely efficient in practice [marsden2001discrete].

Our strategy in this paper involves adapting the theory of symplectic integrators to the dissipative and constrained case via symplectification and splitting procedures. Our point of departure is the framework recently proposed in [Franca:2021], which applies to the Euclidean case and has been shown to be competive with widely used accelerated methods in that setting. We extend the framework to arbitrary smooth manifolds, and thereby to a significantly wider range of potential applications, including problems not only in machine learning and optimization, but also in molecular dynamics, control theory, complex systems, and statistical mechanics more generally. Our approach provides a first principles derivation—based on symplectic geometry and backward error analysis—of optimization methods on manifolds that emulate dissipative Hamiltonian systems in that they closely preserve the rates of convergence and have long-term stability via the preservation of a geometric structure.

We organize our work as follows. In Sec. 2 we provide a streamlined introduction to our algorithms in the general setting of constrained submanifolds and Lie groups; we provide a rigorous treatment in the Appendix. In Sec. 3 we construct the symplectification of dissipative Hamiltonian systems with constraints. We start with the conservative case and add extra dimensions to incorporate constraints and the explicit time dependency until reaching a symplectic manifold. The original system can then be recovered by a gauge fixing. In Sec. 3.4 we present a concrete example involving an arbitrary geodesic equation with dissipation, constraints, and a drift. This model is general enough to capture most cases of interest in practice. In Sec. 4 we discuss the construction of presymplectic integrators and state their most important properties for this dissipative constrained setting. In Sec. 5 we provide numerical experiments that illustrate the feasibility of our approach.

## 2 Algorithms

### 2.1 Submanifolds defined by level sets

Let us introduce a simple and practical algorithm that solves the optimization problem (1.1) subject to constraints (1.2), which define level sets on the ambient space (see Fig. 1). For simplicity, we assume that the Riemannian metric of is constant—this does not represent a limitation since, thanks to Nash or Whitney embedding theorems, the manifold can always be be embedded into for a sufficiently large . Thus, define the projection

 Xg(q)≡∂qψ(q)g−1∂qψ(q)T,Pg(q)≡I−X−1g(q)∂qψ(q)g−1, (2.1)

where denotes the Jacobian of . A specific algorithm that can be derived within our framework is specified as follows:

 pℓ+1/2 =μPg(qℓ)[pℓ−(h/2)∇f(qℓ)], (2.2a) ¯pℓ+1/2 =pℓ+1/2−(hμ/2)∂qψ(qℓ)Tλℓ, (2.2b) qℓ+1 =qℓ+hχg−1¯pℓ+1/2, (2.2c) 0 =ψ(qℓ+1), (2.2d) pℓ+1 =Pg(qℓ+1)[μ¯pℓ+1/2−(h/2)∇f(qℓ+1)], (2.2e)

where is the iteration number, is the discretization step size, is responsible for introducing dissipation, and . (Instead of a constant one can also use an adaptive which would be associated to a time-dependent dissipation term.) Note that is the position coordinates and their conjugate momenta. We make a number of remarks regarding this method:

1. Conveniently, it uses an Euclidean gradient, , instead of parallel transports or geodesic flows (exponential maps) as required by existing optimization approaches on manifolds [Pymanot, AbsilBook].

2. It operates in the cotangent bundle instead of the tangent bundle . Actually, it simulates a dynamical system parametrized on ; hence, it naturally includes “acceleration” from a geometric standpoint.

3. It is designed to always stay on the manifold , which is enforced by the nonlinear algebraic equation (2.2d). In principle, this equation can be solved for any smooth manifold ; note that it does not even require knowing the metric of . This equation can be solved with any nonlinear solver (such as Newton or quasi-Newton) and this is how the Lagrange multipliers ’s are determined.

4. This method is based on a more efficient version of RATTLE [Andersen:1983, Leimkuhler:1994, Reich:1996] that first splits the Hamiltonian and only applies RATTLE to the kinetic energy, thus providing a symplectic and robust approximation of the geodesic flow [leimkuhler2016efficient, graham2019manifold, au2020manifold]; see Appendix C for details. This method can also be easily modified to include a reversibility check as proposed in [lelievre2019hybrid].

5. We have the freedom to choose a constant (positive definite and symmetric) matrix that acts as a preconditioner and may improve convergence in optimization. In particular, for the purposes of optimization it is convenient to implement a rescaling, ; this corresponds to a change of variables that effectively redefines the discretization step size [Franca:2020].

6. When (conservative case) it recovers the method of [leimkuhler2016efficient, graham2019manifold, au2020manifold] which is widely employed in numerical simulations in areas such as molecular dynamics and sampling. Thus, the method (2.2) is a “dissipative” generalization of RATTLE. Moreover, in the absence of constraints it recovers one of the symplectic methods of [Franca:2021] and can be seen as a dissipative generalization of the leapfrog.

7. It is second order accurate and, critically, has a dissipative (shadow) Hamiltonian. As a consequence, it approximately preserves the convergence rates of the underlying dissipative Hamiltonian system.

8. The method was designed to solve optimization problems with nonlinear equality constraints (1.2). However, it is also suitable when some of the functions are replaced by inequality constraints, . In this case, the difficult part of the problem consists precisely in enforcing the boundary since when the minimum lies in the interior the constraints become inactive; i.e., we have an unconstrained optimization problem in the interior region. Therefore, this method can also be easily adapted to solve trust region problems.

### 2.2 Lie groups

Consider a matrix representation of the Lie group —which is the manifold —equipped with a bi-invariant metric. Its Lie algebra

is a vector space of matrices and we let

be a orthogonal basis thereof. The “dissipative leapfrog” integrator on Lie groups that we propose is

 Yℓ+1/2 =μ[Yℓ−(h/2)Tr(∂Xf(Xℓ)⋅Xℓ⋅Ti)Ti], (2.3a) Xℓ+1 =Xℓexp(hg−1χYℓ+1/2), (2.3b) Yℓ+1 =μYℓ+1/2−(h/2)Tr(∂Xf(Xℓ+1)⋅Xℓ+1⋅Ti)Ti, (2.3c)

where , with denoting the entries of the matrix (the dot and denote the matrix product and trace, respectively), and . As before, , , , and now is a constant (not a constant matrix as before); we refer to the Appendix D for the derivation of this method. Note that the “momenta” is an element of the Lie algebra and “” is the matrix exponential, which defines a map from the Lie algebra to the Lie group—this exponential can also be replaced by any symplectic approximation such as a Cayley transform.

Importantly, the above integrator may also be applied to naturally reductive homogeneous spaces, which include, for example, Stiefel and Grassmannian manifolds, the space of positive definite matrices (and their complex analogues), projective spaces, and affine spaces. In that case one simply needs to restrict the momentum to a vector space complementary to the Lie algebra of the isotropy group, as discussed in the context of sampling [barp2019hamiltonian].

## 3 Hamiltonian systems

In this section we construct the geometric formalism behind dissipative Hamiltonian systems subject to constraints and emphasize the symplectification procedure where such systems can be embedded into a higher-dimensional symplectic manifold—this will be important later when considering discretizations. We must assume familiarity with differential geometry and Hamiltonian systems; we thus refer to, e.g., [MarsdenBook, ArnoldBook] for background or the Appendix of [Franca:2021] for a quick review.

###### Definition 3.1 (presymplectic manifold).

Let be a smooth manifold of dimension , , endowed with a 2-form of rank which is closed. Then is a presymplectic manifold and is called a presymplectic form. If then is nondegenerate.

The nondegeneracy condition means that defines an isomorphism between the tangent and cotangent bundles—akin to the musical isomorphisms in pseudo-Riemannian manifolds. Presymplectic manifolds generalize the notion of symplectic manifolds (recovered when

). While symplectic manifolds can exist only in even dimensions, presymplectic manifolds can also exist in odd dimensions, in which case the presymplectic form

is necessarily degenerate and thus noninvertible. As we will discuss in a moment, the phase space of a conservative Hamiltonian system is a symplectic manifold while the phase space of a constrained—as well as a dissipative—Hamiltonian system is a presymplectic manifold; e.g., in Fig.

1, is a symplectic manifold but is a presymplectic manifold. Elegant numerical methods such as symplectic integrators were developed for conservative systems. Their most important properties—preservation of symplectic form, long-term stability, and near energy conservation—rely on the existence of a conservation law. This clearly breaks down in a dissipative setting. However, as recently demonstrated [Franca:2021], if one is able to “symplectify” the system then it is possible to extend these results to nonconservative settings. Here we will further extend such symplectification procedures to constrained cases.

### 3.1 Conservative and constrained

The Dirac-Bergmann theory of constrained Hamiltonian systems [Dirac:1950, Bergman:1954, Dirac] was developed to canonically quantize gauge theories. In this framework, the so-called primary constraints have the form . However, for the purposes of optimization it is natural to consider constraints only on the position coordinates (1.2). Such constraints are imposed by adding Lagrange multipliers to the original Lagrangian or Hamiltonian. In what follows we shall assume we have an embedding, , and assume that the Hamiltonian of interest is obtained as the restriction of a function that is also denoted by . Thus, the Hamiltonian motion on is defined as a constrained motion on that, as we shall see below, takes the form of a differential-algebraic equation. This assumption essentially means that the function to be minimized on is given as a function on the -dimensional manifold . Furthermore, we can consider a higher-dimensional space

that includes the Lagrange multipliers as degrees of freedom. Thus, on

, we consider the extended Hamiltonian

 ¯H(q,p,λ,π)≡H(q,p)+λaψa(q), (3.1)

where are local coordinates on (which the reader may think of as ). Note that denote the Lagrange multipliers and we have introduced their conjugate momenta —we are using Einstein’s summation convention. We have the standard Hamilton’s equations on with Hamiltonian . On the constraint surface the corresponding equations of motion are

 ˙qi=∂H∂pi,˙pi=−∂H∂qi−λa∂ψa∂qi,˙λa=0,˙πa=ψa(q)=0, (3.2)

which is a differential-algebraic equation on . This defines the motion on the submanifold . Moreover, on the dynamics is described by an ordinary differential equation that is obtained by solving for the Lagrange multipliers, i.e., substituting explicitly into (3.2). Geometrically, the motion can be derived via the degenerate Hamilton’s equations of the presymplectic 2-form , obtained by restricting the symplectic form of [marsden2001discrete]. This shows that the phase space of a conservative and constrained Hamiltonian system is a presymplectic manifold (Definition 3.1); we note that this fact has already been emphasized in [Gotay:1978, McLachlan:2014].

### 3.2 Nonconservative

Nonconservative Hamiltonian systems were discussed in [Franca:2021] and this is the setting of interest for optimization. Such systems can be described by a time-dependent Hamiltonian, . Hamilton’s equations have the standard form, however the conservation law is now replaced by

 dHdt=∂H∂t, (3.3)

which captures the phenomenon of dissipation. Since there are no constraints, (see Fig. 1). The dynamics of these systems can be analyzed by embedding into a higher-dimensional symplectic manifold, , where . Thus, let us promote time to a new coordinate, , and also introduce its conjugate momentum . We thus have the Hamiltonian

 H(q,p)=H(q0,q1,…,qn,p0,p1,…,pn). (3.4)

Without fixing any degrees of freedom we have Hamilton’s equations given by

 dqμds=∂H∂pμ,dpμds=−∂H∂qμ, (3.5)

where and the time evolution is now parametrized by . The phase space of this system is the cotangent bundle . The symplectic form is and is a symplectic manifold. Requiring that the vector field matches the vector field of the original nonconservative system yields [Franca:2021]

 H(q0,…,qn,p0,…,pn)≡p0+H(q0,q1…,qn,p1,…,pn), (3.6)

with and (the latter is the Hamiltonian as a function of time, i.e., with the actual trajectories replaced). Thus, Eq. (3.6) represents an embedded hypersurface in which is the phase space of the nonconservative system. Denoting this hypersurface by we thus have , i.e., the symplectic form now has rank over a manifold of dimension so it is degenerate. We conclude that, as in the conservative constrained case, the phase space of a dissipative Hamiltonian system is a presymplectic manifold (Definition 3.1).

### 3.3 Nonconservative and constrained

We now consider the situation where we have an explicit time-dependent Hamiltonian subject to constraints. Combining the previous cases we introduce the extended Hamiltonian

 ¯H(q,p,λ,π)≡H(q,p)+λa¯ψa(q), (3.7)

with being the Hamiltonian defined in (3.4) which is a function on (recall that ). Besides the time let us also append the Lagrange multipliers to the base manifold, i.e., we define . Note that in the absence of constraints we have , recovering Sec. 3.2, while in the conservative case , recovering Sec. 3.1. Hence the Hamiltonian absorbs both the explicit time dependency of the nonconservative system and the constraints as degrees of freedom and thus defines a dynamics over the symplectic manifold with given by Eq. (3.9) below. As will become clear in the next section through a concrete example, for Hamiltonians of interest the constraint term has the form

 ¯ψa(q0,q1,…,qn)≡α(q0)ψa(q1,…,qn), (3.8)

where is the original time variable, are the constraints (1.2), and the function accounts for dissipation—note that it does not influence the constraint . Without yet fixing any degrees of freedom, the equations of motion are given by standard Hamilton’s equations (3.5) with , i.e., the dynamics is defined on the symplectic manifold (of dimension ) which carries the symplectic form

 ¯Ω≡dqi∧dpiconservative+dλa∧dπaconstrained+dq0∧dp0dissipative. (3.9)

We now fix some of the degrees of freedom. Enforcing the constraints, i.e., setting , yields

which is a differential-algebraic equation (DAE) on . Further fixing and the system is then projected into the hypersurface (3.6)—which is the phase space —yielding

 dqidt=∂H∂pi,dpidt=−∂H∂qi−α(t)λa∂ψa∂qi,ψa(q)=0, (3.11)

where . This is again a differential-algebraic equation with degrees of freedom which are not all independent. Under mild assumptions (see Eq. (3.15) below) the solution for is unique so that we can finally substitute for the Lagrange multipliers to obtain an ordinary differential equation (ODE) on (of dimension ) associated to the original base manifold .

On the surface where the constraints are satisfied the second term in (3.9) vanishes and so does the third term by the gauge fixing implied by (3.6). Thus, the Hamiltonian system (3.7) on is a symplectification of the original nonconservative and constrained Hamiltonian system on . After removing the spurious degrees of freedom the symplectic form becomes , which is degenerate. Therefore, once again, the phase space is a presymplectic manifold (Definition 3.1). The above symplectification procedure can be summarized as follows (we indicate the respective dimensions of each manifold):

 (3.12)

Note that only is a symplectic manifold—the others are presymplectic manifolds—and from down the system is restricted to the constraint surface. Viewed from the ambient space, the system stays on such a submanifold due to explicit constraints in the equations of motion (which are DAEs). However, the system can also be described by its intrinsic degrees of freedom on , i.e., without reference to Lagrange multipliers or constraint functions (these are ODEs). The motion is uniquely specified provided we can solve for the Lagrange multipliers in terms of .

From the ambient space perspective, the equations of motion (3.11) depend on since there are derivatives of which may not vanish. We thus have the issue of whether the Lagrange multipliers are uniquely determined. Let us show that this is indeed the case when the constraints satisfy a mild condition. Differentiating with respect to time the last equation of (3.11):

These are the so-called hidden constraints. Differentiating once more:

 αλb[∂ψa∂qi∂2H∂pipj∂ψb∂qj]=∂H∂pi∂2ψa∂qi∂qj∂H∂pj+∂ψa∂qi[∂2H∂pi∂qj∂H∂pj+∂2H∂t∂pi−∂2H∂pi∂pj∂H∂qj]. (3.14)

Denoting the Jacobian matrix by we conclude that is uniquely determined as a function of provided the matrix

 ∂qψ(∂ppH)∂qψT (3.15)

is invertible—this is the case when has full rank and the Hessian is invertible in the kernel of . Thus, we can substitute obtained from (3.14) into (3.11) and the resulting system of differential equations has a locally unique solution. It is important to note that the initial conditions must be consistent, i.e., must obey Eq. (3.14), meaning that if the system starts on the constraint surface then it remains on this submanifold for all times.

As a side comment (that will not be needed in the reminder of the paper), note that one can also write (3.11) in Dirac’s framework [Dirac]. Assuming the secondary constraints have been resolved, i.e., included together with the primary constraints, which we now assume to have the more general form ,111Here we use the standard notation where denotes “weak equality” that holds only on the constraint surface. let us define and to write the equations of motion in terms of Poisson brackets:

 dQIds≈{QI,¯H},dPIds≈{PI,¯H}, (3.16)

with (3.7) and (3.8). Writing these equations back to the original coordinates we obtain:

 ˙qi ≈{qi,H}+α(t)λa{qi,ψa}, (3.17a) ˙pi ≈{pi,H}+α(t)λa{pi,ψa}, (3.17b)

where now . These are precisely the same as (3.11), except that the second term of (3.17a) yields an additional due to the more general form of nonholonomic constraints.222Note that it is possible to follow the derivation of algorithm (2.2) in the Appendix and include derivative terms to account for the more general constraints . The Diract bracket assumes the standard form:

 {f,g}D≡{f,g}+∑A,B{f,ψA}CAB{ψB,g}, (3.18)

where runs over the second-class333Recall that is first-class if for all other constraints , i.e., its vector field is everywhere tangent to the constraint surface (they generate gauge transformations). Otherwise, is second-class and should be removed by the Dirac bracket (3.18); this prescription isolates the physical degrees of freedom up to a gauge. constraints, and the matrix is guaranteed to have an inverse, denoted by —note that (3.18) has the standard form even in this dissipative case thanks to the choice (3.8), i.e., never appears in and thus can be factored outside the Poisson brackets and cancels out.

### 3.4 Dissipative geodesic equation with drift and constraints

Given a function to be minimized, we now construct a concrete Hamiltonian of interest that we shall study in the remaining sections. Let be an -dimensional Riemannian manifold equipped with a metric . As usual, we denote its components by and the components of its inverse by ; we can now use to lower and raise indices and this establishes an isomorphism between and (recall Fig. 1). We consider the Lagrangian of a dissipative and constrained system given by

 ¯L≡eη(t)[12gij(q)˙qi˙qj−f(q)−λaψa(q)], (3.19)

where is a function of time that is responsible for introducing dissipation. When this reduces to the Lagrangian of a conservative system, while when the system is unconstrained. The Euler-Lagrange equations yield

 ¨qi+Γijk˙qj˙qk+γ˙qi=−gij∂f∂qj−gijλa∂ψa∂qj,ψa(q)=0, (3.20)

where and are the Christoffel symbols. This is a generalization of the geodesic equation, i.e., we have introduced dissipation, constraints, and a drift . The standard geodesic equation is recovered when , , and —and this differential-algebraic equation evolves in the tangent bundle .

Let us now move to the cotangent bundle . The canonical momentum is defined as

 pi≡∂¯L∂˙qi=eη(t)gij˙qj, (3.21)

from which we obtain the Hamiltonian:

 ¯H=12e−η(t)gij(q)pipj+eη(t)[f(q)+λaψa(q)]. (3.22)

Note that in (3.8) is simply . We have obtained constrained Hamilton’s equations on :

 ˙qi=e−ηgijpj,˙pi=−e−η2∂gkℓ∂qipkpℓ−eη∂f∂qi−eηλa∂ψa∂qi,ψa(q)=0. (3.23)

One can check that combining these equations yields precisely (3.20). The (intrinsic) phase space of the system is

 T∗Q≡T∗M∣∣ψ=0={(q,p)∈T∗M|ψ(q)=0,∂qψTg−1p=0}. (3.24)

The requirement that is a Riemannian metric automatically satisfies (3.15) (assuming that has full rank).

## 4 “Rate-matching” geometric integrators

We now state general results for geometric integrators of nonconservative and constrained Hamiltonian systems that generalize classical results for symplectic integration of conservative, unconstrained systems. We are thereby able to construct numerical methods for solving problem (1.1) from first principles.

###### Definition 4.1 (presymplectic integrator).

A numerical map (with step size ) is said to be a presymplectic integrator for a constrained and nonconservative Hamiltonian system if it is obtained by reducing (i.e., by gauge fixing) a constrained symplectic integrator with a generating map to its symplectification.

Critically, the requirement that the integrator has a generating map ensures the existence of a globally defined shadow Hamiltonian without any assumption on the topology of the manifold. In particular, splitting methods such as leapfrog and variational integrators such as RATTLE do have a generating map given by a discrete Lagrangian [marsden2001discrete].

Concretely, we consider (3.7) which is a conservative system and therefore suitable for the application of a symplectic integrator. We then fix and to project the system onto the hypersurface (3.4), thus recovering the original degrees of freedom of the nonconservative constrained system. Importantly, does not participate in the dynamics since it is only a fixed function of time, i.e., does not couple to the other degrees of freedom and for this reason can be ignored. Moreover, the discretization of is exact. Thus, the only source of discretization error comes from the (constrained) symplectic integrator applied to the higher-dimensional system. As a consequence, the resulting “dissipative” presymplectic integrator has the same order of accuracy as the conservative symplectic integrator.

###### Theorem 4.2.

Consider a presymplectic integrator of order for a constrained and nonconservative Hamiltonian system whose true flow is denoted by . This method preserves the time-varying Hamiltonian up to a bounded error:

 H∘ϕℓℓ=H∘φtℓ+O(hr), (4.1)

for and some constant .

###### Proof.

Consider the symplectification of the Hamiltonian vector field:

 XH=Xp0+XH=∂∂q0−∂H∂q0∂∂p0+XH. (4.2)

The flow of is given by

 dq0ds=1,dp0ds=−∂H∂q0, (4.3)

where denotes the (new) time parameter. The solution of these equations are

 q0(s)=s,p0(s)=−H(s)+H(0)+p0(0)=−H(s)+H(0), (4.4)

provided (which now makes ). The Hamiltonian is determined up to an arbitrary constant and it is convenient to take since then the coordinate simply corresponds to the value of the Hamiltonian at a given instant of time. Note that these equations are still satisfied along the flow of . A presymplectic integrator is, according to Definition 4.1, any symplectic integrator for which are integrated exactly (see also [marthinsen2014geometric, asorey1983generalized]). Let us denote , and . Thus, requiring we have where denotes the true flow of . Because the integrator is symplectic and has a generating map—e.g., it is variational integrator such as RATTLE or leapfrog—we have a globally defined shadow Hamiltonian from which it follows that

 H∘Ψℓh=H∘Φhℓ+O(hr), (4.5)

for and some constant . Defining the projection , , we have . Moreover, where is the true flow of the original dissipative Hamiltonian system on . Hence from (4.5) we conclude:

 p0∘Ψℓh+H∘π∘Ψℓh=p0∘Φkℓ+H∘φhℓ∘π+O(hr). (4.6)

Since is integrated exactly, i.e., , we obtain

 H∘π∘Ψℓh=H∘φhℓ∘π+O(hr) (4.7)

and finally

 H∘ϕℓh=H∘φhℓ+O(hr) (4.8)

since

 ϕℓh∘π=π∘Ψℓh, (4.9)

where we recall that is the presymplectic integrator that acts on . ∎

This extends the main result of [Franca:2021] to constrained cases. In general it is impossible to preserve both the symplectic structure and the Hamiltonian [Ge:1988]. Presymplectic integrators are designed to exactly preserve the presymplectic structure of nonconservative constrained systems; however, they are also guaranteed to preserve the Hamiltonian up to a small error which, importantly, does not grow with time—this is in contrast to most discretizations for which such an error would be unbounded regardless of the order of the integrator.

This argument shows that the value of the dissipative Hamiltonian along the presymplectic integrator stays close to the true value of the Hamiltonian along the suspension vector field, , whose flow yields the true time-dependent mechanics on —note that in the above proof we can replace by any symplectic manifold. Furthermore, from the shadow expansion , there exists a shadow dissipative Hamiltonian given by

 ˜H∘π=H∘π+hH1+h2H2+⋯. (4.10)

In optimization and computer science one is often interested in complexity results. Suppose we have a dissipative system, such as (3.20), where a convergence rate,

 f(q(t))−f⋆=O(R(t)), (4.11)

is known a priori. Here is a decreasing function that describes how fast the system approaches a local minimum . As a direct consequence of Theorem 4.2, presymplectic integrators are guaranteed to nearly preserve such rates on arbitrary Riemannian manifolds. To see this, let be a presymplectic integrator of order . Then there exists a metric such that [Hansen:2011]

 d(ϕℓh(z∙),φtℓ(z∙))≤Cℓhr, (4.12)

where is the initial state, is the simulation time, and we recall that is the true flow of the system. The constant in general grows with . For instance, on it is common to use [HairerBook]

 Cℓ=C(eLϕtℓ−1), (4.13)

where is a Lipschitz constant of the integrator and is an independent constant that does not depend on . Such a bound holds for the majority of methods, even non-structure-preserving ones. Here we assume this rough bound is also true for (4.12) on the manifold . It is now straightforward to show the following.

###### Corollary 4.3.

Consider the dissipative and constrained geodesic equation (3.20)—or equivalently Hamilton’s equations (3.23)—over a Riemannian manifold . Suppose a convergence rate (4.11) is known. Then a presymplectic integrator of order preserves this rate up to

 f∘ϕℓh=f∘φhℓ+O(hre−η(htℓ)), (4.14)

provided , where is a Lipschitz constant of the numerical integrator, and this holds for exponentially large times .

###### Proof.

Consider the Hamiltonian (3.22), i.e., where is the original Hamiltonian. Replacing into (4.1)—note that this already accounts for the constraints—yields

 f∘ϕℓh−f∘φhℓ=e−2η(tℓ)(T∘φhℓ−T∘ϕℓh)+O(e−η(tℓ)hr), (4.15)

where is the kinetic energy. We have a Lipschitz condition:

 ∣∣(T∘φhℓ−T∘ϕℓh)(z∙)∣∣≤LTC(eLϕtℓ−1)hr, (4.16)

where we have used (4.12) with (4.13). Therefore,

 ∣∣f∘ϕℓh(z)−f∘φhℓ(z)∣∣≤e−η(tℓ)hr[LT(C/2)(eLϕtℓ−1)e−η(tℓ)+K]. (4.17)

The result follows since is bounded by assumption. ∎

Thus, provided the system is suitably dampened (choice of ) the continuous-time rates will be closely matched on any Riemannian manifold—the error is exponentially small in (4.14). In the Appendix we also provide an alternative argument to Corollary 4.3 by using the symplectified Hamiltonian. Based on the above result one can establish convergence rates of numerical methods by considering a Lyapunov analysis for the continuous system alone.444We stress that the focus of this paper is not in establishing such continuous-time rates but rather in providing a general framework that ensures that such rates—whatever they are—are closely preserved in discrete time.

### 4.1 Why presymplectic integrators?

Purely from the point of view of optimization, there is no necessary reason for employing a geometric integrator. Indeed, from the proof of Corollary 4.3 we see that the key ingredient relevant to numerical methods is that the geometric integrator preserves the Hamiltonian within a bounded error. This condition can, however, be ensured by other types of discretization as well, e.g., by constructing methods based on energy-preserving integrators. In this case the Hamiltonian would be exactly preserved and the rates would be matched even more directly (i.e., the constant would be absent from inequality (4.17)). Another approach would be to construct methods that exactly preserve a Lyapunov function, assuming one is known. While these approaches would be sufficient to obtain “rate-matching” discretizations, there are other desirable numerical consequences that derive from the use of geometric integrators.

It is important to note that in general it is impossible to preserve both the energy and the symplectic structure [Ge:1988]. Symplectic integrators exactly preserve the former and as a consequence they also nearly preserve the latter. Thus, the geometric integrators we have constructed for the dissipative case inherit the Hamiltonian structure of the system, implying that the numerical trajectories stay on the the manifold of interest; this is particularly important for optimization on manifolds. In contrast, the majority of other discretizations will move away from the manifold and necessitate some type of projection back to the manifold. Unfortunately, such projections can only be computed in very few special cases. Moreover, the constraint formulation we have presented is quite versatile since it allows one to solve problems on manifolds that are only implicitly defined, e.g., manifolds that can be immersed in .

Another advantage of our geometric integrators is that they are simple to implement—they only require standard gradient computations of the objective function and, in the constrained case, additionally require the solution of nonlinear algebraic equations. On the other hand, energy-preserving methods, or methods that preserve a Lyapunov function, are not only difficult to construct but also tend to have complicated and costly implementations (see, e.g., [McLachlan:1999]). For instance, they require solving implicit relations even in the unconstrained case. Another benefit is that geometric integrators tend to be extremely stable and can often operate with larger step sizes compared to other methods; this property is justified by the existence of a shadow Hamiltonian. Finally, from a theoretical standpoint, backward error analysis for geometric integrators tends to be simpler since it admits the same equations of motion—with slightly perturbed terms.

## 5 Numerical experiments

In this section we present the results of numerical experiments. Besides illustrating the feasibility of our methods, we compare to Riemannian gradient descent (RGD) which is a standard baseline for optimization on manifolds [Sra:2016]. RGD is given by

for where is the exponential map, obtained through the geodesic flow on and is only locally defined, and is the Riemannian gradient of at , i.e., where is the metric of . To approximate the exponential map one usually needs to know a projection to the manifold . When considering optimization over Lie groups or homogeneous spaces we can adapt (5.1) into the following form:

 Yℓ =Tr(∂xf(Xℓ)⋅Xℓ⋅Ti)Ti, (5.2a) Xℓ+1 =Xℓexp(−hYℓ), (5.2b)

where are the generators of the Lie algebra .555 For a practical implementation we actually do not need to use the generators but rather a projection to the Lie algebra:

We also employ this formula in our method (2.3).

Let us mention some guidelines used in our experiments:

• We plot the curves for RGD, i.e., (5.1) or (5.2), as a dashed black line. The curves for our methods, (2.2) or (2.3), are shown as solid colored lines. We test a few different values of the momentum factor for each experiment.

• The parameters of each algorithm were tuned with a rough grid search, and in particular we choose the largest step size such that RGD attains its “best” performance. Then we use the same step size for all methods, even though (2.2) and (2.3) were often able to work with larger step sizes compared to RGD.

• We always choose in (2.2) and (2.3)—this was justified in [Franca:2020] and corresponds to a change of variables such that discretizations of first- and second-order dynamics have the same simulation time and therefore the comparison between the associated methods is meaningful.

• For method (2.2) we use a standard root finder routine to solve the nonlinear constraint equation (2.2d); this is done with (damped) Newton’s method.

### 5.1 Spherical spin glass

Consider the 2-spin spherical Sherrington-Kirkpatrick (SSK) model given by the Hamiltonian

 H(σ)=−1NN∑i,j=1Jijσiσj−N∑i=1biσi, (5.3)

where is a vector on the hypersphere , i.e., . The disorder parameters and

are a random matrix and a random vector, respectively (they will be specified shortly). The vector

represents an external magnetic field. The SSK is a continuous approximation of a spin glass and is an important model in the theory of disordered systems, having also many applications ranging from physics to computer science. We want to find ground states,

 minσ∈SN−1H(σ), (5.4)

using our method (2.2)—thus plays the role of and of the variable . This is a particular instance of a quadratically constrained quadratic program, which in general is nonconvex and even NP hard. Problem (5.4) can, however, be solved efficiently despite its nonconvexity. In Fig. 2 we illustrate the performance of algorithm (2.2) in comparison to RGD (5.1) for few different instances of problem (5.4). We see that it significantly outperforms RGD—the constraints are satisfied to high accuracy, .

### 5.2 Frobenius distance minimization on SO(n)

Consider the following matrix distance minimization problem over the rotation group :

 minX∈SO(n)∥A−X∥2F