# Semi-Riemannian Manifold Optimization

We introduce in this paper a manifold optimization framework that utilizes semi-Riemannian structures on the underlying smooth manifolds. Unlike in Riemannian geometry, where each tangent space is equipped with a positive definite inner product, a semi-Riemannian manifold allows the metric tensor to be indefinite on each tangent space, i.e., possessing both positive and negative definite subspaces; differential geometric objects such as geodesics and parallel-transport can be defined on non-degenerate semi-Riemannian manifolds as well, and can be carefully leveraged to adapt Riemannian optimization algorithms to the semi-Riemannian setting. In particular, we discuss the metric independence of manifold optimization algorithms, and illustrate that the weaker but more general semi-Riemannian geometry often suffices for the purpose of optimizing smooth functions on smooth manifolds in practice.


## 1 Introduction

Manifold optimization [12, 2] is a class of techniques for solving optimization problems of the form

 minx∈M f(x) (1)

where M is a (typically nonlinear and nonconvex) manifold and f is a smooth function over M. These techniques generally begin by endowing the manifold M with a Riemannian structure, which amounts to specifying a smooth family of inner products on the tangent spaces of M, with which analogues of differential quantities such as gradient and Hessian can be defined on M in parallel with their well-known counterparts on Euclidean spaces. This geometric perspective enables us to tackle the constrained optimization problem Eq. 1 using methodologies of unconstrained optimization, which becomes particularly beneficial when the constraints (expressed in M) are highly nonlinear and nonconvex.

The optimization problem Eq. 1 is certainly independent of the choice of Riemannian structures on M; in fact, all critical points of f on M are metric independent. From a differential geometric perspective, equipping the manifold with a Riemannian structure and studying the critical points of a generic smooth function is highly reminiscent of the classical Morse theory [27, 33], for which the main interest is to understand the topology of the underlying manifold; the topological information needs to be extracted using tools from differential geometry, but is certainly independent of the choice of Riemannian structures. It is thus natural to inquire into the influence of different choices of Riemannian metrics on manifold optimization algorithms, which to our knowledge has never been explored in the existing literature. This paper stems from our attempts at understanding the dependence of manifold optimization on the Riemannian structure. It turns out that most technical tools for optimization on Riemannian manifolds can be extended to a larger class of metric structures on manifolds, namely, semi-Riemannian structures. Just as a Riemannian metric is a smooth assignment of inner products to tangent spaces, a semi-Riemannian metric smoothly assigns to each tangent space a scalar product, which is a symmetric bilinear form but without the constraint of positive definiteness; our major technical contribution in this paper is an optimization framework built upon the rich differential geometry of such weaker but more general metric structures, of which standard unconstrained optimization on Euclidean spaces and Riemannian manifold optimization are special cases. Though semi-Riemannian geometry has attracted generations of mathematical physicists for its effectiveness in providing space-time models in general relativity [35, 9], to the best of our knowledge, the link with manifold optimization has never been explored.

A different yet strong motivation for investigating optimization problems on semi-Riemannian manifolds arises from the Riemannian geometric interpretation of interior point methods [31, 41]. For a twice differentiable and strongly convex function f defined over an open convex domain Q in a Euclidean space, denote by ∇f and ∇2f the gradient and Hessian of f, respectively. The strong convexity of f ensures ∇2f(x)≻0, which defines a local inner product by

 gx(v,w):=v⊤[∇2f(x)]w,∀v,w∈TxQ.

With respect to this class of new local inner products, which can be interpreted as turning Q into a Riemannian manifold, the gradient of f takes the form

 ~∇f(x)=[∇2f(x)]−1∇f(x).

The negative manifold gradient −~∇f(x) coincides with the descent direction ηx satisfying Newton's equation

 [∇2f(x)]ηx=−∇f(x) (2)

at x. In other words, Newton's method, which is second order, can be interpreted as a first-order method in the Riemannian setting. Such equivalence between first- and second-order methods under coordinate transformation is also known in other contexts, such as natural gradient descent in information geometry; see [40] and the references therein. Extending this geometric picture beyond the relatively well-understood case of strongly convex functions requires understanding optimization on semi-Riemannian manifolds as a first step; we expect the theoretical foundation laid out in this paper to shed light on deeper geometric insights into the convergence of non-convex optimization algorithms.
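The equivalence above can be checked numerically. The following is a minimal sketch for a strongly convex quadratic, where the Hessian metric is constant; the matrices A and b are illustrative choices, not taken from the paper.

```python
import numpy as np

# Sketch: for the strongly convex quadratic f(x) = 0.5 x^T A x - b^T x,
# the negative gradient with respect to the Hessian metric
# g_x(v, w) = v^T [∇²f(x)] w coincides with the Newton direction of Eq. (2).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])            # constant Hessian, positive definite
b = np.array([1.0, 2.0])
x = np.array([0.5, -0.5])

grad = A @ x - b                          # Euclidean gradient ∇f(x)
newton_dir = np.linalg.solve(A, -grad)    # η_x solving [∇²f(x)] η_x = -∇f(x)
metric_grad = np.linalg.solve(A, grad)    # gradient w.r.t. the Hessian metric
assert np.allclose(-metric_grad, newton_dir)
```

The assertion holds exactly here because the Hessian is constant; for a general strongly convex f the identity holds pointwise with A replaced by ∇2f(x).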

The rest of this paper is organized as follows. In Section 2 we provide a brief but self-contained introduction to Riemannian optimization and semi-Riemannian geometry. Section 3 details the algorithmic framework of semi-Riemannian optimization, and proposes semi-Riemannian analogues of the Riemannian steepest descent and conjugate gradient algorithms; the metric independence of some second-order algorithms is also investigated. We specialize the general geometric framework to submanifolds in Section 4, in which we characterize the phenomenon (which does not exist in Riemannian geometry) of degeneracy for induced semi-Riemannian structures, and identify several (nearly) non-degenerate examples to which our general algorithmic framework applies. We illustrate the utility of the proposed framework with several examples in Section 5 and conclude with Section 6. More examples and some omitted proofs are deferred to the Supplementary Materials.

## 2 Preliminaries

### 2.1 Notations

We denote a smooth manifold by M or N. Lower case letters such as x or y will be used to denote vectors or points on a manifold, depending on the context. We write TM and T∗M for the tangent and cotangent bundles of M, respectively. For a fibre bundle E over M, Γ(E) will be used to denote smooth sections of this bundle. Unless otherwise specified, we use ⟨·,·⟩ or h to denote a semi-Riemannian metric. For a smooth function f, the notations Df and D2f stand for semi-Riemannian gradients and Hessians, respectively, when they exist; ∇f and ∇2f will be reserved for Riemannian gradients and Hessians, respectively. More generally, D will be used to denote the Levi-Civita connection on a semi-Riemannian manifold, while ∇ denotes the Levi-Civita connection on a Riemannian manifold. We denote the anti-symmetric (i.e. skew-symmetric) matrices and symmetric matrices of size n-by-n by Skew(n) and Sym(n), respectively. For a vector space V, ΛV and SymV stand for alternated or symmetrized copies of V, respectively.

### 2.2 Riemannian Manifold Optimization

As stated at the beginning of this paper, manifold optimization is a type of nonlinear optimization problem taking the form of Eq. 1. The methodology of Riemannian optimization is to equip the smooth manifold M with a Riemannian metric structure, i.e. positive definite bilinear forms on the tangent spaces of M that vary smoothly over the manifold [28, 10, 38]. The differentiable structure on M facilitates generalizing the concept of differentiable functions from Euclidean spaces to these nonlinear objects; in particular, notions such as gradient and Hessian are available on Riemannian manifolds and play the same role as their Euclidean space counterparts.

The algorithmic framework of Riemannian manifold optimization has been established and investigated in a sequence of works [13, 44, 12, 2]. These algorithms typically build upon the concept of the gradient, the first-order differential operator defined by

 ⟨∇f(x),X⟩=Xf(x)∀X∈TxM,

and Hessian, the covariant derivative of the gradient operator defined by

 ∇2f(X,Y)=XYf−(∇XY)f∀X,Y∈Γ(TM)

as well as a retraction Rx from each tangent space TxM to the manifold such that (1) Rx(0)=x for all x∈M, and (2) the differential map of Rx is the identity at 0∈TxM. On Riemannian manifolds it is natural to use the exponential map as the retraction, but any map from the tangent spaces to the Riemannian manifold satisfying these conditions suffices; in fact, the only requirement implied by conditions (1) and (2) is that the retraction map coincides with the exponential map up to first order.
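As a concrete illustration of the retraction conditions, the following sketch implements the standard projective retraction on the unit sphere, a well-known example that agrees with the exponential map to first order; the function name is ours.

```python
import numpy as np

# Sketch: the projective retraction R_x(v) = (x + v) / ||x + v|| on the unit
# sphere. It satisfies R_x(0) = x and has identity differential at 0 in T_x.
def retract_sphere(x, v):
    y = x + v
    return y / np.linalg.norm(y)

x = np.array([1.0, 0.0, 0.0])       # a point on the sphere
v = np.array([0.0, 0.1, 0.0])       # a tangent vector: v ⟂ x
y = retract_sphere(x, v)
assert np.isclose(np.linalg.norm(y), 1.0)   # R_x(v) stays on the manifold
assert np.allclose(retract_sphere(x, 0 * v), x)   # R_x(0) = x
```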

The optimality conditions for unconstrained optimization on Euclidean spaces in terms of gradients and Hessians can be naturally translated into the Riemannian manifold setting:

###### Proposition 1 ([8], Proposition 1.1)

A local optimum x of Problem Eq. 1 satisfies the following necessary conditions:

1. ∇f(x)=0 if f is first-order differentiable;

2. ∇f(x)=0 and ∇2f(x) positive semi-definite if f is second-order differentiable.

Following [8], we call x satisfying condition (i) in Proposition 1 a (first-order) critical point or stationary point, and a point satisfying condition (ii) in Proposition 1 a second-order critical point.

The heart of Riemannian manifold optimization is to transform the nonlinear constrained optimization problem Eq. 1 into an unconstrained problem on the manifold M. Following this methodology, classical unconstrained optimization algorithms such as gradient descent, conjugate gradients, Newton's method, and trust region methods have been generalized to Riemannian manifolds; see [2, Chapter 8]. For instance, gradient descent on Riemannian manifolds essentially replaces the Euclidean descent step with its Riemannian counterpart, combining the Riemannian gradient with a retraction. Other differential geometric objects such as parallel-transport, Hessian, and curvature render themselves naturally en route to adapting other unconstrained optimization algorithms to the manifold setting. We refer interested readers to [2] for more details.
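The Riemannian gradient descent iteration can be sketched end to end on the sphere. The objective f(x)=x⊤Ax, the step size, and the iteration count below are illustrative choices; the minimizer over the sphere is an eigenvector of the smallest eigenvalue of A.

```python
import numpy as np

# Sketch: Riemannian gradient descent on the unit sphere for f(x) = x^T A x.
# Each step projects the Euclidean gradient onto the tangent space and then
# retracts back onto the manifold.
def riemannian_gd(A, x0, step=0.1, iters=500):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2 * A @ x                   # Euclidean gradient of f
        rgrad = egrad - (x @ egrad) * x     # projection onto T_x (Riemannian gradient)
        y = x - step * rgrad                # descent step in the tangent space
        x = y / np.linalg.norm(y)           # retraction back onto the sphere
    return x

A = np.diag([3.0, 2.0, 1.0])
x = riemannian_gd(A, np.array([1.0, 1.0, 1.0]))
assert abs(x[2]) > 0.999                    # converged to ±e_3, eigenvalue 1
```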

### 2.3 Semi-Riemannian Geometry

Semi-Riemannian geometry differs from Riemannian geometry in that the bilinear form equipped on each tangent space can be indefinite. Classical examples include Lorentzian spaces and De Sitter spaces in general relativity; see e.g. [35, 9]. Although one may think of Riemannian geometry as a special case of semi-Riemannian geometry as all Riemannian metric tensors are automatically semi-Riemannian, the existence of a semi-Riemannian metric with nontrivial index (see definition below) actually imposes additional constraints on the tangent bundle of the manifold and is thus often more restrictive—the tangent bundle should admit a non-trivial splitting into the direct sum of “positive definite” and “negative definite” sub-bundles. Nevertheless, such metric structures have found vast applications in and beyond understanding the geometry of spacetime, for instance, in the study of the regularity of optimal transport maps [21, 20, 3].

###### Definition 1

A symmetric bilinear form ⟨·,·⟩ on a vector space V is non-degenerate if

 ⟨v,w⟩=0 for all w∈V ⇔ v=0.

The index q of a symmetric bilinear form on V is the dimension of a maximal negative definite subspace of V; similarly, we denote by p the dimension of a maximal positive definite subspace of V. A scalar product on a vector space V is a non-degenerate symmetric bilinear form on V. The signature of a scalar product on V with index q is a vector of length p+q with the first p entries equaling +1 and the remaining q entries equaling −1. A subspace W⊆V is said to be non-degenerate if the restriction of the scalar product to W is non-degenerate.

The main difference between a scalar product and an inner product is that the former need not be positive definite. The main issue with this lack of positivity is the consequent lack of a meaningful notion of "orthogonality" — a vector subspace may well be the orthogonal complement of itself: consider for example the subspace spanned by (1,1)⊤ in R2 equipped with a scalar product of signature (+1,−1). The same example illustrates that the property of non-degeneracy is not always inherited by subspaces. Nonetheless, the following is true:

###### Lemma 1 (Chapter 2, Lemma 23, [35])

A subspace W of a vector space V is non-degenerate if and only if V=W⊕W⊥.

###### Definition 2 (Semi-Riemannian Manifolds)

A metric tensor on a smooth manifold M is a symmetric non-degenerate tensor field on M of constant index. A semi-Riemannian manifold is a smooth manifold M equipped with a metric tensor.

###### Example 1 (Minkowski Spaces Rp,q)

Consider the Euclidean space Rn and denote by Ip,q the n-by-n diagonal matrix with the first p diagonal entries equaling 1 and the remaining q entries equaling −1, where p+q=n and p,q≥0. For arbitrary u,v∈Rn, define the bilinear form

 ⟨u,v⟩:=u⊤Ip,qv.

It is straightforward to verify that this bilinear form is non-degenerate on Rn, and that (Rn,⟨·,·⟩) so defined is a semi-Riemannian manifold. This space is known as the Minkowski space Rp,q of signature (p,q).
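A minimal numerical sketch of this scalar product, highlighting a feature impossible for an inner product: a nonzero vector can be orthogonal to itself (a null vector). The function name is ours.

```python
import numpy as np

# Sketch: the Minkowski scalar product ⟨u, v⟩ = u^T I_{p,q} v on R^{p+q}.
def minkowski(u, v, p, q):
    signs = np.concatenate([np.ones(p), -np.ones(q)])
    return float(np.sum(signs * u * v))

u = np.array([1.0, 1.0])
# u is a nonzero "null" vector: ⟨u, u⟩ = 1 - 1 = 0 under signature (1, 1),
# which cannot happen for a positive definite inner product.
assert minkowski(u, u, 1, 1) == 0.0
assert minkowski(np.array([2.0, 0.0]), np.array([2.0, 0.0]), 1, 1) == 4.0
```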

###### Example 2

Consider the vector space of matrices Rn×n, where n=p+q and p,q≥0. Define a bilinear form on Rn×n by

 ⟨A,B⟩:=Tr(A⊤Ip,qB),∀A,B∈Rn×n.

This bilinear form is non-degenerate on Rn×n, because for any A,B∈Rn×n we have

 Tr(A⊤Ip,qB)=vec(A)⊤(In⊗Ip,q)vec(B)

where In is the identity matrix of size n-by-n, ⊗ denotes the Kronecker product, and vec(·) is the vectorization operator that vertically stacks the columns of a matrix in Rn×n. The non-degeneracy then follows from Example 1. This example gives rise to a semi-Riemannian structure for matrices in Rn×n.
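The trace-to-Kronecker identity used above is easy to confirm numerically; the sketch below checks it on a random 3-by-3 pair with signature (2,1).

```python
import numpy as np

# Sketch: confirming Tr(A^T I_{p,q} B) = vec(A)^T (I_n ⊗ I_{p,q}) vec(B)
# numerically for n = 3, signature (2, 1).
rng = np.random.default_rng(0)
n, p, q = 3, 2, 1
Ipq = np.diag([1.0] * p + [-1.0] * q)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

vec = lambda M: M.reshape(-1, order="F")    # stack columns vertically
lhs = np.trace(A.T @ Ipq @ B)
rhs = vec(A) @ np.kron(np.eye(n), Ipq) @ vec(B)
assert np.isclose(lhs, rhs)
```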

The non-degeneracy of the semi-Riemannian metric tensor ensures that most classical constructions on Riemannian manifolds have their analogues on a semi-Riemannian manifold. Most fundamentally, the "miracle of Riemannian geometry" — the existence and uniqueness of a canonical connection — holds on semi-Riemannian manifolds as well. Quoting [35, Theorem 11], on a semi-Riemannian manifold M there is a unique connection D such that

 [V,W]=DVW−DWV (3)

and

 X⟨V,W⟩=⟨DXV,W⟩+⟨V,DXW⟩ (4)

for all V,W,X∈Γ(M,TM). This connection D is called the Levi-Civita connection of M and is characterized by the Koszul formula

 2⟨DVW,X⟩=V⟨W,X⟩+W⟨X,V⟩−X⟨V,W⟩−⟨V,[W,X]⟩+⟨W,[X,V]⟩+⟨X,[V,W]⟩,∀X,V,W∈Γ(M,TM). (5)

Geodesics, parallel-transport, and curvature of M can be defined via the Levi-Civita connection on M in an entirely analogous manner as on Riemannian manifolds.

Differential operators can be defined on semi-Riemannian manifolds much the same way as on Riemannian manifolds. For any smooth function f on a semi-Riemannian manifold (M,⟨·,·⟩), the gradient of f, denoted as Df, is defined by the equality (c.f. [35, Definition 47])

 ⟨Df,X⟩=Xf,∀X∈Γ(M,TM). (6)

The Hessian of f can be similarly defined, parallel to the Riemannian case ([35, Definition 48, Lemma 49]), by D2f(X,Y)=XYf−(DXY)f, or equivalently

 D2f(X,Y)=⟨DXDf,Y⟩,∀X,Y∈Γ(M,TM). (7)

Since the Levi-Civita connection on M is torsion-free, D2f is a symmetric tensor field on M, i.e.,

 D2f(X,Y)=D2f(Y,X),∀X,Y∈Γ(M,TM).

One way to compare the semi-Riemannian and Riemannian gradients and Hessians, when both metric structures exist on the same smooth manifold, is through their local coordinate expressions. In fact, the local coordinate expressions for the two types (Riemannian/semi-Riemannian) of differential operators can be unified as follows. Let (x1,…,xn) be a local coordinate system around an arbitrary point x∈M, and denote gij and hij for the components of the Riemannian and semi-Riemannian metric tensors, respectively; the corresponding Christoffel symbols will be denoted as gΓkij and hΓkij, respectively. Direct computation reveals

 ∇f=gij∂jf∂i,∇2f=(∂2ijf−gΓkij∂kf)dxi⊗dxj, (8) Df=hij∂jf∂i,D2f=(∂2ijf−hΓkij∂kf)dxi⊗dxj.

Using the musical isomorphism induced from the (Riemannian or semi-Riemannian) metric, the Hessians can be cast in the form of (2,0)-tensors on M as

 (∇2f)♯ =giℓgjm(∂2ℓmf−gΓkℓm∂kf)∂i⊗∂j, (D2f)♯ =hiℓhjm(∂2ℓmf−hΓkℓm∂kf)∂i⊗∂j.
###### Remark 1

Notably, for any x∈M, if we compute the Hessians ∇2f and D2f in the corresponding geodesic normal coordinates centered at x, Eq. 8 implies that the two Hessians take the same coordinate form, since both gΓ and hΓ vanish at x. For instance, Rn has the same geodesics (straight lines) under the Euclidean and Lorentzian metrics, and the standard coordinate system serves as a geodesic normal coordinate system for both metrics; see Example 3. In particular, the notion of geodesic convexity [39, 46] is equivalent for the two different metrics; this equivalence is not completely trivial in view of the well-known first- and second-order characterizations (see e.g. [46, Theorem 5.1] and [46, Theorem 6.1]), since geodesics need not be the same under different metrics.

###### Proposition 2

On a smooth manifold admitting two different Riemannian or semi-Riemannian structures, an optimization problem is geodesically convex with respect to one metric if and only if it is geodesically convex with respect to the other.

###### Proof

Denote the two metric tensors on M as g and h, respectively; each of g and h may be Riemannian or semi-Riemannian. For any x∈M, let (x1,…,xn) and (y1,…,yn) be the geodesic normal coordinates around x with respect to g and h, respectively. The coordinate expressions of a tangent vector v in the two normal coordinate systems are linked by (Einstein summation convention adopted)

 v=vi∂/∂xi=~vj∂/∂yj⇔vi=~vj∂xi/∂yj.

Therefore

 [∇2f(x)](v,v)≥0 ∀v∈TxM ⇔ vivj(∂2f/∂xi∂xj)(x)≥0 ∀v1,⋯,vn∈R ⇔ ~vℓ(∂xi/∂yℓ)~vm(∂xj/∂ym)(∂2f/∂xi∂xj)(x)≥0 ∀~v1,⋯,~vn∈R ⇔ [D2f(x)](v,v)≥0 ∀v∈TxM,

which establishes the desired equivalence.

###### Example 3 (Gradient and Hessian in Minkowski Spaces)

Consider the Euclidean space Rn. Denote by Ip,q the n-by-n diagonal matrix with the first p diagonal entries equaling 1 and the remaining q diagonal entries equaling −1. We compute and compare in this example the gradients and Hessians of differentiable functions on Rn. We take the Riemannian metric as the standard Euclidean metric, and the semi-Riemannian metric given by ⟨u,v⟩=u⊤Ip,qv. For any differentiable f, the gradient Df of f is determined by

 (Df)⊤Ip,qX =Xf=(∇f)⊤X,∀X∈Γ(Rn,Rn) ⇔Df=Ip,q∇f, where ∇f=(∂1f,⋯,∂nf)⊤∈Rn.

Furthermore, since in this case the semi-Riemannian metric tensor is constant on Rn, the Christoffel symbols vanish (c.f. [35, Chap 3. Proposition 13 and Lemma 14]), and thus the Levi-Civita connection reduces to the coordinatewise directional derivative. By the definition of the Hessian, for all X,Y∈Γ(Rn,Rn) we have

 D2f(X,Y)=⟨DXDf,Y⟩=Y⊤Ip,q⋅Ip,q(∇2f)X=Y⊤(∇2f)X

from which we deduce the equality D2f=∇2f. In fact, the equivalence of the two Hessians also follows directly from Remark 1, since the geodesics under the Riemannian and semi-Riemannian metrics coincide in this example (see e.g. [35, Chapter 3 Example 25]). In particular, the equivalence between the two types of geodesics and Hessians implies the equivalence of geodesic convexity for the two metrics.
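The relation Df=Ip,q∇f is easy to verify numerically. The sketch below uses central finite differences on the illustrative function f(x,y)=x2y over R1,1; the helper name and test point are ours.

```python
import numpy as np

# Sketch: checking Df = I_{p,q} ∇f on the Minkowski space R^{1,1} for the
# sample function f(x, y) = x^2 y, using central finite differences.
def euclidean_grad(f, z, eps=1e-6):
    g = np.zeros_like(z)
    for i in range(len(z)):
        dz = np.zeros_like(z)
        dz[i] = eps
        g[i] = (f(z + dz) - f(z - dz)) / (2 * eps)
    return g

f = lambda z: z[0] ** 2 * z[1]
I11 = np.diag([1.0, -1.0])
z = np.array([1.0, 2.0])

grad = euclidean_grad(f, z)    # ∇f = (2xy, x^2) ≈ (4, 1)
Df = I11 @ grad                # semi-Riemannian gradient ≈ (4, -1)
# Df satisfies the defining identity ⟨Df, X⟩ = Xf with ⟨u, v⟩ = u^T I11 v:
X = np.array([0.3, -0.7])
assert np.isclose(Df @ I11 @ X, grad @ X, atol=1e-4)
```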

## 3 Semi-Riemannian Optimization Framework

This section introduces the algorithmic framework of semi-Riemannian optimization. To begin with, we point out that the first- and second-order necessary conditions for optimality in unconstrained optimization and Riemannian optimization can be directly generalized to semi-Riemannian manifolds. We then generalize several Riemannian manifold optimization algorithms to their semi-Riemannian counterparts, and illustrate the difference with a few numerical examples. We end this section by showing global and local convergence results for semi-Riemannian optimization.

### 3.1 Optimality Conditions

The following Proposition 3 should be considered the semi-Riemannian analogue of the optimality conditions in Proposition 1.

###### Proposition 3 (Semi-Riemannian First- and Second-Order Necessary Conditions for Optimality)

Let (M,⟨·,·⟩) be a semi-Riemannian manifold. A local optimum x of Problem Eq. 1 satisfies the following necessary conditions:

1. Df(x)=0 if f is first-order differentiable;

2. Df(x)=0 and D2f(x) positive semi-definite if f is second-order differentiable.

###### Proof
1. If x is a local optimum of Eq. 1, then for any X∈TxM we have Xf(x)=0, which, by definition Eq. 6 and the non-degeneracy of the semi-Riemannian metric, implies that Df(x)=0.

2. If x is a local optimum of Eq. 1, then there exists a local neighborhood U of x such that f(y)≥f(x) for all y∈U. Without loss of generality we can assume that U is sufficiently small so as to be geodesically convex (see e.g. [10, §3.4]). Denote by γ a constant-speed geodesic segment connecting x to y that lies entirely in U. The one-variable function f∘γ admits the Taylor expansion

 f(y) =f∘γ(1)=f∘γ(0)+(f∘γ)′(0)+12(f∘γ)′′(ξ) =f(x)+⟨Df(x),γ′(0)⟩+12Dγ′(ξ)⟨Df(γ(ξ)),γ′(ξ)⟩ =f(x)+12[D2f(γ(ξ))](γ′(ξ),γ′(ξ))

where the last equality used Df(x)=0 together with the geodesic property of γ. Letting y→x in U, the smoothness of D2f ensures that

 D2f(x)[V,V]≥0∀V∈TxM

which establishes that D2f(x) is positive semi-definite.

The formal similarity between Proposition 3 and Proposition 1 is not entirely surprising. As can be seen from the proofs, both optimality conditions are based on geometric interpretations of the same Taylor expansion; the metrics affect the specific forms of the gradient and Hessian, but the optimality conditions are essentially derived from the Taylor expansions only. Completely parallel to the Riemannian setting, we can also translate the second-order sufficient conditions [26, §7.3] into the semi-Riemannian setting without much difficulty. The proof essentially follows [26, §7.3 Proposition 3], with the Euclidean Taylor expansion replaced by the expansion along geodesics in Proposition 3 (ii); we omit the proof since it is straightforward, but document the result in Proposition 4 below for future reference. Recall from [26, §7.1] that x∗ is a strict relative minimum point of f on M if there is a local neighborhood U of x∗ on M such that f(y)>f(x∗) for all y∈U with y≠x∗.

###### Proposition 4 (Semi-Riemannian Second-Order Sufficient Conditions)

Let f be a twice differentiable function on a semi-Riemannian manifold M, and let x∗ be an interior point. If Df(x∗)=0 and D2f(x∗) is positive definite, then x∗ is a strict relative minimum point of f.

The formal similarity between the Riemannian and semi-Riemannian optimality conditions indicates that it might be possible to transfer many technologies in manifold optimization from the Riemannian to the semi-Riemannian setting. For instance, the equivalence of the first-order necessary condition implies that, in order to search for a first-order stationary point, on a semi-Riemannian manifold we should look for points at which the semi-Riemannian gradient vanishes, just like in the Riemannian realm we look for points at which the Riemannian gradient vanishes. However, extra care has to be taken regarding the influence different metric structures have on the induced topology of the underlying manifold. For Riemannian manifolds, it is straightforward to check that the induced metric topology coincides with the original topology of the underlying manifold (see e.g. [10, Chap 7 Proposition 2.6]), whereas the "topology" induced by a semi-Riemannian structure is generally quite pathological — for instance, two distinct points connected by a light-like geodesic (a geodesic along which all tangent vectors are null vectors (c.f. Definition 3)) have zero distance. An exemplary consequence is that, in search of a first-order stationary point, we should not be looking for points at which ⟨Df(x),Df(x)⟩ vanishes, since this does not imply Df(x)=0.

### 3.2 Determining the “Steepest Descent Direction”

As long as gradients, Hessians, retractions, and parallel-transports can be properly defined, one might think there exists no essential difficulty in generalizing any Riemannian optimization algorithms to the semi-Riemannian setup, with the Riemannian geometric quantities replaced with their semi-Riemannian counterparts, mutatis mutandis. It is tempting to apply this methodology to all standard manifold optimization algorithms, including but not limited to first-order methods such as steepest descent, conjugate gradient descent, and quasi-Newton methods, or second-order methods such as Newton’s method and trust region methods. We discuss in this subsection how to determine a proper descent direction for steepest-descent-type algorithms on a semi-Riemannian manifold. Some exemplary first- and second-order methods will be discussed in the next subsection.

As one of the prototypical first-order optimization algorithms, gradient descent is known for its simplicity yet surprisingly powerful theoretical guarantees under mild technical assumptions. A plausible "Semi-Riemannian Gradient Descent" algorithm that naïvely follows the paradigm of Riemannian gradient descent could be designed by simply replacing the Riemannian gradient with the semi-Riemannian gradient defined in Eq. 6, as listed in Algorithm 1. Of course, a key step in Algorithm 1 is to determine the descent direction in each iteration. However, while the negative gradient is an obvious choice in Riemannian manifold optimization, the "steepest descent direction" is a slightly more subtle notion in semi-Riemannian geometry, as will be demonstrated shortly in this section.

A first difficulty with replacing ∇f by Df is that −Df need not be a descent direction at all: consider, for instance, an illustrative example of optimization in the Minkowski space (the Euclidean space equipped with the standard semi-Riemannian metric): the first-order Taylor expansion at x gives, for any small t>0,

 f(x−tDf(x))=f(x)−t⟨Df(x),Df(x)⟩+O(t2) (9)

but in the semi-Riemannian setting the scalar product term ⟨Df(x),Df(x)⟩ may well be negative, unlike in the Riemannian case. In order for the value of the objective function to decrease (at least to first order), we have to pick the descent direction to be either −Df(x) or Df(x), whichever makes the first-order term in Eq. 9 negative.

Though the quick fix of replacing −Df(x) with ±Df(x) would work generically in many problems of practical interest, a second, and more serious, issue with choosing ±Df(x) as the descent direction lies in the indefiniteness of the metric tensor. For standard gradient descent algorithms (e.g. on Euclidean spaces with the standard metric, or more generally on Riemannian manifolds), the algorithm terminates after the gradient norm becomes smaller than a predefined threshold; for norms induced from positive definite metric tensors, the vanishing of the gradient norm along a sequence of iterates is equivalent to the vanishing of the gradient itself, implying that the sequence is truly approaching a first-order stationary point. This intuition breaks down for indefinite metric tensors, as ⟨Df(x),Df(x)⟩→0 no longer implies the proximity between Df(x) and 0. Even though one can fix this ill-defined termination condition by introducing an auxiliary Riemannian metric (which always exists on a smooth manifold), when Df(x) is a null vector (i.e. ⟨Df(x),Df(x)⟩=0, see Definition 3), the gradient algorithm loses the first-order decrease in the objective function value (see Eq. 9); the validity of the algorithm then relies upon second-order information, with which we lose the benefits of first-order methods. As a concrete example, consider the unconstrained optimization problem on the Minkowski space R1,1 equipped with a metric of signature (+1,−1):

 minx,y∈Rf(x,y)=12(x−y)2.

Recall from Example 3 that

 Df(x,y)=I1,1∇f(x,y)=(x−y)⋅(1,1)⊤

which is a direction parallel to the isolines of the objective function f. Thus semi-Riemannian gradient descent will never decrease the objective function value.
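This failure mode is easy to reproduce numerically; the sketch below steps along the semi-Riemannian gradient of the example above and observes no first-order change in the objective.

```python
import numpy as np

# Sketch: for f(x, y) = (x - y)^2 / 2 on R^{1,1}, the semi-Riemannian gradient
# Df = I_{1,1} ∇f is parallel to the isolines {x - y = const}, so a step along
# -Df (or +Df) leaves the objective value exactly unchanged.
I11 = np.diag([1.0, -1.0])
f = lambda z: 0.5 * (z[0] - z[1]) ** 2
grad_f = lambda z: np.array([z[0] - z[1], -(z[0] - z[1])])   # Euclidean ∇f

z = np.array([2.0, 0.0])
Df = I11 @ grad_f(z)            # = (x - y) · (1, 1)^T up to sign convention
z_next = z - 0.1 * Df
assert np.isclose(f(z_next), f(z))   # no decrease at all
```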

To rectify these issues, it is necessary to revisit the motivating, geometric interpretation of the negative gradient direction as the direction of "steepest descent," i.e. for any Riemannian manifold (M,g) and function f on M differentiable at x∈M, we know from vector arithmetic that

 −∇f(x)/√g(∇f(x),∇f(x)) = argminV∈TxM,g(V,V)=1 g(V,∇f(x)) = argminV∈TxM,g(V,V)=1 Vf(x). (10)

In the semi-Riemannian setting, assuming M is equipped with a semi-Riemannian metric h, we can analogously seek the descent direction leading to the steepest decrease of the objective function value over the h-unit sphere. It is not hard to see that in general

 argminV∈TxM,h(V,V)=1 Vf(x) ≠ −Df(x)/√h(Df(x),Df(x)). (11)

In fact, in both versions the search for the "steepest descent direction" is guided by making the directional derivative Vf(x) as negative as possible, but constrained to different unit spheres. The precise relation between the two steepest descent directions is not readily visible, for the two unit spheres could differ drastically in geometry. In fact, when the unit ball is noncompact, the "steepest descent direction" so defined may not even exist.

###### Example 4

Consider the optimization problem over the Minkowski space R1,1 equipped with a metric of signature (+1,−1):

 minx,y∈Rf(x,y)=12[x2+(y+1)2].

At (0,0), recall from Example 3 that Df(0,0)=I1,1∇f(0,0)=(0,−1)⊤. Over the unit ball under this Lorentzian metric, parametrized by V=(cosh t,sinh t)⊤, the scalar product ⟨V,Df(0,0)⟩=sinh t→−∞ as t→−∞. Even worse, since the scalar product approaches −∞, it is not possible to find a descent direction attaining the infimum, or even one within a pre-set threshold of the (nonexistent) steepest descent direction.

One way to fix this non-compactness issue is to restrict the candidate tangent vectors in the minimization of Vf(x) to lie in a compact subset of the tangent space TxM. For instance, one can consider the unit sphere in TxM under a Riemannian metric. Comparing the right-hand sides of Eq. 10 and Eq. 11, descent directions determined in this manner will be the negative gradient direction under the Riemannian metric, and thus in general have nothing to do with the semi-Riemannian metric; moreover, if a Riemannian metric has to be laboriously defined in addition to the semi-Riemannian one, in principle we could already employ well-established, fully functioning Riemannian optimization techniques, thus bypassing the semi-Riemannian setup entirely. While this argument might well render first-order semi-Riemannian optimization futile, we emphasize here that one can define steepest descent directions with the aid of "Riemannian structures" that arise naturally from the semi-Riemannian structure, so there is no need to specify a separate Riemannian structure in parallel to the semi-Riemannian one, though this affiliated "Riemannian structure" is highly local.

The key observation here is that one does not need to consistently specify a Riemannian structure over the entire manifold if the only goal is to find one steepest descent direction in a single tangent space — in other words, when we search for the steepest descent direction in the tangent space TxM of a semi-Riemannian manifold M, it suffices to specify a Riemannian structure locally around x, or more extremely, only on the tangent space TxM, in order for the "steepest descent direction" to be well-defined over a compact subset of TxM. These local inner products do not have to "patch together" to give rise to a globally defined Riemannian structure. A very handy way to find local inner products is through geodesic normal coordinates, which reduce the local calculation to Minkowski spaces. For any x∈M, there is a normal neighborhood U containing x on which the exponential map expx restricts to a diffeomorphism, and one can pick an orthonormal basis of TxM (with respect to the semi-Riemannian metric), denoted as {e1,…,en}, such that ⟨ei,ej⟩=δijϵj, where δij are the Kronecker deltas and ϵi∈{+1,−1}. Without loss of generality, assume M is a semi-Riemannian manifold of index q, where p+q=n, and that ϵ1=⋯=ϵp=1, ϵp+1=⋯=ϵn=−1. The normal coordinates of any point in U are determined by the coefficients of its preimage under expx with respect to the orthonormal basis {e1,…,en}. It is straightforward (see [35, Proposition 33]) to verify that at the base point x the metric tensor components reduce to hij(x)=δijϵj and the Christoffel symbols vanish. Under this coordinate system, it is straightforward to verify that the scalar product between tangent vectors can be written as

$$\langle u,v\rangle=\sum_{i=1}^n \epsilon_i u^i v^i$$

where $u=u^i\partial_i$ and $v=v^i\partial_i$ (Einstein's summation convention implicitly invoked). The local Riemannian structure can thus be defined as

$$g(u,v)=\sum_{i=1}^n u^i v^i. \tag{12}$$

Essentially, such a local inner product is defined by imposing orthogonality between the positive and negative definite subspaces of $T_x\mathcal{M}$ and “reversing the sign” of the negative definite component of the scalar product. Making such a modification consistently and smoothly over the entire manifold is certainly subject to topological obstructions; nevertheless, locally (in fact, pointwise) defined Riemannian structures suffice for our purposes, and in practical applications we can simplify the workflow by choosing an arbitrary orthonormal basis in the tangent space in place of the geodesic frame. The orthonormalization process, of course, is adapted to the semi-Riemannian setting; see [35, Chapter 2, Lemma 24 and Lemma 25] or Algorithm 2. The output set of vectors $\{e_1,\dots,e_n\}$ satisfies

$$\langle e_i,e_j\rangle=\delta_{ij}\epsilon_i$$

where $\delta_{ij}$ are the Kronecker symbols and $\epsilon_i\in\{+1,-1\}$. A generic approach which works with high probability is to pick a random linearly independent set of vectors and apply a (pivoted) Gram–Schmidt orthogonalization process with respect to the indefinite scalar product; see Algorithm 3.
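As a concrete sketch of such a pivoted orthogonalization (an illustrative stand-in for Algorithms 2–3, not the paper's code), the following routine orthonormalizes a linearly independent set with respect to an indefinite scalar product $\langle u,v\rangle = u^\top G v$, assuming the pivots stay non-degenerate:

```python
import numpy as np

def semi_orthonormalize(vectors, metric):
    """Pivoted Gram-Schmidt with respect to the indefinite product <u, v> = u^T G v.

    At each step we pivot to the remaining vector whose self-product has the
    largest absolute value, normalize it so that <e, e> = +1 or -1, and
    subtract its (signed) component from the remaining vectors.
    """
    V = [np.asarray(v, dtype=float) for v in vectors]
    basis, signs = [], []
    while V:
        # Pivot on the largest |<v, v>| to avoid near-null vectors.
        k = int(np.argmax([abs(v @ metric @ v) for v in V]))
        v = V.pop(k)
        q = v @ metric @ v
        eps = 1 if q > 0 else -1
        e = v / np.sqrt(abs(q))
        # Remove the e-component (with sign eps) from the remaining vectors.
        V = [w - eps * (w @ metric @ e) * e for w in V]
        basis.append(e)
        signs.append(eps)
    return np.array(basis), np.array(signs)
```

By Sylvester's law of inertia, the multiset of signs returned always matches the signature of the metric, regardless of the pivoting order.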

In geodesic normal coordinates, the gradient takes the form

$$Df(x)=\sum_{i=1}^n \epsilon_i\,\partial_i f(x)\,\partial_i\big|_x$$

and choosing the steepest descent direction reduces to the problem

$$\max_{\substack{v^1,\dots,v^n\in\mathbb{R}\\ (v^1)^2+\cdots+(v^n)^2=1}}\;\sum_{i=1}^n \epsilon_i v^i \partial_i f(x)$$

of which the optimum is obviously attained at

$$\big(v^1,\dots,v^n\big)=\frac{1}{\sqrt{\sum_{i=1}^n \big(\partial_i f(x)\big)^2}}\,\big(\epsilon_1\partial_1 f(x),\dots,\epsilon_n\partial_n f(x)\big).$$

For the simplicity of statement, we introduce the notation

$$[X]_+:=\sum_{i=1}^n \langle X,e_i\rangle\, e_i$$

for $X\in T_x\mathcal{M}$, where $\{e_1,\dots,e_n\}$ is an orthonormal basis for the semi-Riemannian metric tensor on $T_x\mathcal{M}$. Using this notation, the descent direction we will choose can be written as

$$-\big[Df(x)\big]_+. \tag{13}$$

Note that, by [35, Lemma 3.25], with respect to an orthonormal basis we have in general

$$X=\sum_{i=1}^n \epsilon_i\langle X,e_i\rangle\, e_i\neq [X]_+,$$

which is consistent with our previous discussion that the steepest descent direction in the semi-Riemannian setting is not $-Df(x)$ in general. Intuitively, the “steepest descent direction” is obtained by reversing the signs of the components of the gradient that correspond to the negative definite subspace, and then rescaling according to the induced Riemannian metric. This leads to the routine Algorithm 4 for finding descent directions.
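This sign-flip construction is easy to sketch in code. The following is a minimal illustration (not the paper's implementation): `metric` is the Gram matrix of the semi-Riemannian scalar product in ambient coordinates, and `basis` is assumed orthonormal in the sense of the display above.

```python
import numpy as np

def plus_part(X, basis, metric):
    """[X]_+ = sum_i <X, e_i> e_i, where <u, v> = u^T G v and <e_i, e_j> = eps_i delta_ij.

    Compared with X itself, this flips the sign of the components along the
    negative definite directions, so -[X]_+ pairs negatively with the gradient.
    """
    return sum((X @ metric @ e) * e for e in basis)

def descent_direction(grad_semi, basis, metric):
    """Descent direction -[Df(x)]_+ used in place of -Df(x)."""
    return -plus_part(grad_semi, basis, metric)
```

For the Minkowski plane with the standard basis, `plus_part` maps the semi-Riemannian gradient $Df=I_{1,1}\nabla f$ back to the Euclidean gradient $\nabla f$, so the output is a genuine descent direction.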

###### Remark 2

The definition of $[\,\cdot\,]_+$ certainly depends on the choice of the orthonormal basis with respect to the semi-Riemannian metric tensor. In other words, if we choose a different orthonormal basis with respect to the same semi-Riemannian metric on $T_x\mathcal{M}$, the resulting descent direction will also be different. In practical computations, we could pre-compute an orthonormal basis for all points on the manifold, but that would complicate the proofs of convergence, since the amounts of descent would be incomparable across tangent spaces. A compromise is to cover the entire semi-Riemannian manifold with an atlas consisting of geodesic normal neighborhoods, and extend the definition Eq. 13 from a single point to the geodesic normal neighborhood around each point, with the orthonormal basis given by geodesic normal frame fields [35, pp. 84–85] defined over each normal neighborhood. Under suitable compactness assumptions, this construction essentially defines a Riemannian structure on the semi-Riemannian manifold by means of a partition of unity and

$$g(X,Y):=\langle X,[Y]_+\rangle=\sum_{i=1}^n\langle X,e_i\rangle\langle Y,e_i\rangle. \tag{14}$$

The arbitrariness of the choice of geodesic normal frame fields makes this Riemannian structure non-canonical, but the bilinear form is symmetric and coercive, and can thus be used for performing steepest descent in the semi-Riemannian setting.

###### Remark 3

For Minkowski spaces, it is easy to check that the descent direction output from Algorithm 4 coincides exactly with the Euclidean steepest descent direction $-\nabla f(x)$. In this sense Algorithm 1 can be viewed as a generalization of the Riemannian steepest descent algorithm. In fact, the pointwise construction of positive-definite scalar products in each tangent space, Eq. 12, indicates that the methodology of Riemannian manifold optimization can be carried over to settings with weaker geometric assumptions, namely, when the inner product structure on the tangent spaces need not vary smoothly from point to point. From this perspective, we can also view semi-Riemannian optimization as a type of manifold optimization with weaker geometric assumptions.

###### Remark 4

Algorithm 1 can indeed be viewed as an instance of a more general paradigm of line-search based optimization on manifolds [42, §3]. Our choice of the descent direction in Algorithm 4 ensures that the objective function value indeed decreases, at least for sufficiently small step size, which further facilitates convergence.

###### Example 5 (Semi-Riemannian Gradient Descent for Minkowski Spaces)

Recall from Example 3 that the semi-Riemannian gradient of a differentiable function $f$ on Minkowski space $\mathbb{R}^{p,q}$ is $Df(x)=I_{p,q}\nabla f(x)$. If we choose the standard canonical basis for $\mathbb{R}^{p,q}$, the descent direction produced by Algorithm 4 and needed for Algorithm 1 is

$$[Df(x)]_+=I_n\cdot I_{p,q}\cdot I_n\cdot I_{p,q}\,\nabla f(x)=\nabla f(x)$$

and thus semi-Riemannian gradient descent coincides with the standard gradient descent algorithm on Euclidean space if the standard orthonormal basis is used at every point of $\mathbb{R}^{p,q}$. Of course, if we use a randomly generated orthonormal basis (under the semi-Riemannian metric) at each point, semi-Riemannian gradient descent will be drastically different from standard gradient descent on Euclidean spaces; see Section 5.1 for an illustration.
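A small numeric illustration of this point (with made-up gradient values, and a Lorentz boost playing the role of the “random” orthonormal basis on $\mathbb{R}^{1,1}$) shows how the direction changes with the basis while remaining a descent direction:

```python
import numpy as np

# Minkowski metric on R^{1,1} and an illustrative Euclidean gradient of f at x.
G = np.diag([1.0, -1.0])
grad_f = np.array([2.0, 3.0])
Df = G @ grad_f                      # semi-Riemannian gradient: Df = I_{p,q} grad f

def plus_part(X, basis):
    """[X]_+ = sum_i <X, e_i> e_i with <u, v> = u^T G v."""
    return sum((X @ G @ e) * e for e in basis)

# Standard orthonormal basis: the descent direction is the Euclidean one.
std = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
d_std = -plus_part(Df, std)

# A Lorentz-boosted basis is still orthonormal for G (<e1,e1>=1, <e2,e2>=-1,
# <e1,e2>=0) but produces a different descent direction.
t = 0.5
boost = [np.array([np.cosh(t), np.sinh(t)]), np.array([np.sinh(t), np.cosh(t)])]
d_boost = -plus_part(Df, boost)
```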

When studying self-concordant barrier functions for interior point methods, a useful guiding principle is to consider the Riemannian geometry defined by the Hessian of a strictly convex self-concordant barrier function [31, 11, 41, 32]; in this setting, descent directions produced by Newton's method can be equivalently viewed as gradients with respect to the Riemannian structure. When the barrier function is non-convex, however, the Hessians are no longer positive definite, and the Riemannian geometry is replaced with semi-Riemannian geometry. It is well known that the direction computed from Newton's equation Eq. 2 may not always be a descent direction if the Hessian is not positive definite [48, §3.3], which is consistent with our observation in this subsection that semi-Riemannian gradients need not be descent directions in general. In this particular case, our modification Eq. 13 can also be interpreted as a novel variant of the Hessian modification strategy [48, §3.4], as follows. Denote the function under consideration as $f:\Omega\to\mathbb{R}$, where $\Omega\subset\mathbb{R}^n$ is a connected, closed convex subset with non-empty interior that contains no straight lines. Assume the Hessian $\nabla^2 f$ is non-degenerate on $\Omega$, which necessarily implies that $\nabla^2 f$ is of constant signature on $\Omega$. At any $x\in\Omega$, the negative gradient of $f$ with respect to the semi-Riemannian metric defined by the Hessian of $f$ is $-Df(x)=-[\nabla^2 f(x)]^{-1}\nabla f(x)$, where $\nabla f$ and $\nabla^2 f$ stand for the gradient and Hessian of $f$ with respect to the Euclidean geometry of $\mathbb{R}^n$. Our proposed modification first finds a matrix $U$ satisfying

$$U^\top\big[\nabla^2 f(x)\big]\,U=I_{p,q}$$

where $(p,q)$ is the constant signature of $\nabla^2 f$ on $\Omega$, and then sets

$$-[Df(x)]_+=-UU^\top\big[\nabla^2 f(x)\big]\,Df(x)=-UU^\top\nabla f(x) \tag{15}$$

which is guaranteed to be a descent direction since

$$-[\nabla f(x)]^\top[Df(x)]_+=-\big\|U^\top\nabla f(x)\big\|^2\le 0.$$

From Eq. 15 it is evident that the semi-Riemannian descent direction is obtained from the Newton direction $-[\nabla^2 f(x)]^{-1}\nabla f(x)$ by replacing the inverse Hessian $[\nabla^2 f(x)]^{-1}$ with $UU^\top$. This is close to Hessian modification in spirit, but also drastically different from common Hessian modification techniques, which add a correction matrix to the true Hessian $\nabla^2 f(x)$; see [48, §3.4] for a more detailed explanation.
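A minimal sketch of this modification, constructing a candidate $U$ through an eigendecomposition (one of several possible factorizations; the paper does not prescribe this choice):

```python
import numpy as np

def signature_factor(H):
    """Find U with U^T H U = I_{p,q} for a nondegenerate symmetric H.

    Sketch via the eigendecomposition H = Q diag(lam) Q^T: sort so the positive
    eigenvalues come first, then scale each eigenvector by |lam|^{-1/2}.
    """
    lam, Q = np.linalg.eigh(H)
    order = np.argsort(-lam)              # positive eigenvalues first
    lam, Q = lam[order], Q[:, order]
    return Q / np.sqrt(np.abs(lam))       # divides column j by sqrt(|lam_j|)

def modified_newton_direction(grad, H):
    """Replace the inverse Hessian in the Newton step by U U^T, as in Eq. 15."""
    U = signature_factor(H)
    return -U @ (U.T @ grad)
```

Since $UU^\top$ is positive definite whenever $H$ is non-degenerate, the returned direction pairs negatively with the gradient.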

Using the same steepest descent directions and line search strategy, we can also adapt conjugate gradient methods to the semi-Riemannian setting; see Algorithm 5 for the algorithm description. Note that in Algorithm 5 we used the Polak–Ribière formula to determine the conjugacy coefficient $\beta$, but alternatives such as the Hestenes–Stiefel or Fletcher–Reeves methods (see e.g. [12, §2.6] or [42]) can be easily adapted to the semi-Riemannian setting as well, since none of the major steps in the Riemannian conjugate gradient algorithm relies essentially on the positive-definiteness of the metric tensor, except that the (steepest) descent direction needs to be modified according to Eq. 13. We noticed in practice that the Polak–Ribière and Hestenes–Stiefel formulae tend to be more robust and efficient than the Fletcher–Reeves formula for the choice of $\beta$, which is consistent with general observations of nonlinear conjugate gradient methods [48, §5.2].
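In the flat setting the resulting method can be sketched as follows; the line-search routine and the stopping threshold are illustrative assumptions, and on a curved manifold the direction update would additionally require parallel transport and a retraction:

```python
import numpy as np

def semi_riemannian_cg(grad_semi, plus, x0, line_search, n_iter=50):
    """Nonlinear CG sketch with the Polak-Ribiere beta, using -[Df]_+ as the
    steepest descent direction. On a flat space the parallel transport of the
    previous search direction is the identity, so the update below is the flat
    one; `line_search(x, d)` returns a step size and is supplied by the caller.
    """
    x = np.asarray(x0, dtype=float)
    g = plus(grad_semi(x))              # [Df(x)]_+
    d = -g
    for _ in range(n_iter):
        x = x + line_search(x, d) * d
        g_new = plus(grad_semi(x))
        if g_new @ g_new < 1e-24:       # converged
            return x
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))   # Polak-Ribiere (restarted)
        d = -g_new + beta * d
        g = g_new
    return x
```

For a quadratic objective on a Minkowski space with the standard basis and exact line search, this reduces to linear conjugate gradients, matching Remark 5 below.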

###### Remark 5

For Minkowski spaces (including Lorentzian spaces) with the standard orthonormal basis, both steepest descent and conjugate gradient methods coincide with their counterparts on standard Euclidean spaces, since they share identical descent directions, parallel-transports, and Hessians of the objective function.

###### Remark 6

Algorithm 5 can also be applied to self-concordant barrier functions for interior point methods, when the objective function is not necessarily strictly convex but has non-degenerate Hessians. In this context, where the semi-Riemannian metric tensor is given by the Hessian of the objective function, Algorithm 5 can be viewed as a hybrid of Newton and conjugate gradient methods, in the sense that the “steepest descent directions” are determined by the Newton equations but the actual descent directions are combined using the methodology of conjugate gradient methods. To the best of our knowledge, such a hybrid algorithm has not been investigated in existing literature.

### 3.4 Metric Independence of Second Order Methods

In this subsection we consider two prototypical second-order optimization methods on semi-Riemannian manifolds, namely, Newton's method and the trust region method. Surprisingly, both methods turn out to produce descent directions that are independent of the choice of scalar products on tangent spaces. We give a geometric interpretation of this independence from the perspective of jets in Section 3.4.2.

#### 3.4.1 Semi-Riemannian Newton’s Method

As an archetypal second-order method, Newton's method on Riemannian manifolds has already been developed in detail in the early literature of Riemannian optimization [2, Chap 6]. The rationale behind Newton's method is that the first order stationary points of a differentiable function $f$ are in one-to-one correspondence with the minima of $\langle\nabla f,\nabla f\rangle$ when the metric is positive-definite (i.e., when $\mathcal{M}$ is a Riemannian manifold). Thus by choosing the direction $V$ to satisfy the Newton equation $\nabla_V\nabla f=-\nabla f$, we ensure that $V$ is a descent direction for $\langle\nabla f,\nabla f\rangle$:

$$V\langle\nabla f,\nabla f\rangle=2\langle\nabla_V\nabla f,\nabla f\rangle=-2\langle\nabla f,\nabla f\rangle=-2\|\nabla f\|^2$$

and the right hand side is strictly negative as long as $\nabla f\neq 0$. The main difficulty in generalizing this procedure to the semi-Riemannian setting is similar to the difficulty we faced in Section 3.2: when the metric is indefinite, $\langle\nabla f,\nabla f\rangle$ has nothing to do with $\|\nabla f\|^2$, and thus one can no longer find the stationary points of $f$ by minimizing $\langle\nabla f,\nabla f\rangle$. The approach we adopt to fix this issue is also similar to that in Section 3.2: instead of minimizing $\langle\nabla f,\nabla f\rangle$, we will focus on the coercive bilinear form $g(\nabla f,\nabla f)$ of Eq. 14.
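In the flat Riemannian case this identity is easy to verify numerically for a quadratic model $f(x)=\tfrac12 x^\top Hx$ with a (hypothetical) positive definite $H$:

```python
import numpy as np

# Quadratic model f(x) = x^T H x / 2 with an illustrative positive definite Hessian.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, -0.5])
grad = H @ x                              # grad f(x)
V = np.linalg.solve(H, -grad)             # Newton direction: H V = -grad f

# Directional derivative of |grad f|^2 along V: 2 grad^T (H V) = -2 |grad f|^2,
# so V is a descent direction for |grad f|^2 whenever grad f != 0.
deriv = 2 * grad @ (H @ V)
```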

Let $\{E_1,\dots,E_n\}$ be a local geodesic normal coordinate frame centered at $x$, i.e., for any $1\le i,j\le n$,

$$\langle E_i(x),E_j(x)\rangle=\epsilon_i\delta_{ij},\qquad \nabla_{E_i}E_j(x)=0.$$

Then we have

 (16)

and thus for any tangent vector $V$ we have

 V ⟨Df(x),[