1 Introduction
Manifold optimization [12, 2] is a class of techniques for solving optimization problems of the form

(1) $\min_{x \in \mathcal{M}} f(x)$,

where $\mathcal{M}$ is a (typically nonlinear and nonconvex) manifold and $f$ is a smooth function on $\mathcal{M}$. These techniques generally begin by endowing the manifold with a Riemannian structure, which amounts to specifying a smooth family of inner products on the tangent spaces of $\mathcal{M}$, with which analogues of differential quantities such as the gradient and Hessian can be defined on $\mathcal{M}$ in parallel with their well-known counterparts on Euclidean spaces. This geometric perspective enables us to tackle the constrained optimization problem Eq. 1 using methodologies of unconstrained optimization, which becomes particularly beneficial when the constraints (expressed in $\mathcal{M}$) are highly nonlinear and nonconvex.
The optimization problem Eq. 1 is certainly independent of the choice of Riemannian structure on $\mathcal{M}$; in fact, all critical points of $f$ on $\mathcal{M}$ are metric independent. From a differential geometric perspective, equipping the manifold with a Riemannian structure and studying the critical points of a generic smooth function is highly reminiscent of classical Morse theory [27, 33], whose main interest is to understand the topology of the underlying manifold; the topological information needs to be extracted using tools from differential geometry, but is certainly independent of the choice of Riemannian structure. It is thus natural to inquire about the influence of different choices of Riemannian metrics on manifold optimization algorithms, which to our knowledge has never been explored in the existing literature. This paper stems from our attempts at understanding the dependence of manifold optimization on the Riemannian structure. It turns out that most technical tools for optimization on Riemannian manifolds can be extended to a larger class of metric structures on manifolds, namely, semi-Riemannian structures. Just as a Riemannian metric is a smooth assignment of inner products to tangent spaces, a semi-Riemannian metric smoothly assigns to each tangent space a scalar product, which is a symmetric bilinear form without the constraint of positive definiteness; our major technical contribution in this paper is an optimization framework built upon the rich differential geometry of such weaker but more general metric structures, of which standard unconstrained optimization on Euclidean spaces and Riemannian manifold optimization are special cases. Though semi-Riemannian geometry has attracted generations of mathematical physicists for its effectiveness in providing spacetime models in general relativity [35, 9], to the best of our knowledge, the link with manifold optimization has never been explored.
A different yet strong motivation for investigating optimization problems on semi-Riemannian manifolds arises from the Riemannian geometric interpretation of interior point methods [31, 41]. For a twice differentiable and strongly convex function $f$ defined over an open convex domain in a Euclidean space, denote by $\nabla f$ and $\nabla^2 f$ the gradient and Hessian of $f$, respectively. The strong convexity of $f$ ensures $\nabla^2 f(x) \succ 0$, which defines a local inner product by

$\langle u, v\rangle_x := u^\top \nabla^2 f(x)\, v.$
With respect to this class of new local inner products, which can be interpreted as turning the domain into a Riemannian manifold, the gradient of $f$ takes the form

$\mathrm{grad}\, f(x) = \left[\nabla^2 f(x)\right]^{-1} \nabla f(x).$
The negative manifold gradient coincides with the descent direction $d$ satisfying Newton's equation

(2) $\nabla^2 f(x)\, d = -\nabla f(x)$

at $x$. In other words, Newton's method, which is second order, can be interpreted as a first-order method in the Riemannian setting. Such equivalence between first- and second-order methods under coordinate transformation is also known in other contexts such as natural gradient descent in information geometry; see [40] and the references therein. Extending this geometric picture beyond the relatively well-understood case of strongly convex functions requires understanding optimization on semi-Riemannian manifolds as a first step; we expect the theoretical foundation laid out in this paper to shed light on deeper geometric insights into the convergence of nonconvex optimization algorithms.
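This equivalence is easy to verify numerically. The following sketch (the quadratic test function and the NumPy-based setup are our own illustration, not taken from the paper) checks that the Newton direction equals the negative gradient computed with respect to the Hessian-induced inner product:

```python
import numpy as np

# For a strongly convex quadratic f(x) = 0.5 x^T A x - b^T x, the gradient of f
# with respect to the local inner product <u, v>_x = u^T (Hess f) v is
# (Hess f)^{-1} grad f, so Riemannian steepest descent in this metric takes
# exactly the Newton step.  A and b below are arbitrary test data.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4.0 * np.eye(4)       # Hessian: symmetric positive definite
b = rng.standard_normal(4)
x = rng.standard_normal(4)

euclidean_grad = A @ x - b                           # grad f at x
newton_dir = np.linalg.solve(A, -euclidean_grad)     # d solving Eq. (2)
hessian_metric_grad = np.linalg.solve(A, euclidean_grad)  # gradient under <.,.>_x

assert np.allclose(newton_dir, -hessian_metric_grad)
```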
The rest of this paper is organized as follows. In Section 2 we provide a brief but self-contained introduction to Riemannian optimization and semi-Riemannian geometry. Section 3 details the algorithmic framework of semi-Riemannian optimization, and proposes semi-Riemannian analogues of the Riemannian steepest descent and conjugate gradient algorithms; the metric independence of some second-order algorithms is also investigated. We specialize the general geometric framework to submanifolds in Section 4, in which we characterize the phenomenon (which does not exist in Riemannian geometry) of degeneracy for induced semi-Riemannian structures, and identify several (nearly) nondegenerate examples to which our general algorithmic framework applies. We illustrate the utility of the proposed framework with several examples in Section 5 and conclude with Section 6. More examples and some omitted proofs are deferred to the Supplementary Materials.
2 Preliminaries
2.1 Notations
We denote a smooth manifold by $\mathcal{M}$ or $\mathcal{N}$. Lower case letters such as $x$ or $v$ will be used to denote vectors or points on a manifold, depending on the context. We write $T\mathcal{M}$ and $T^*\mathcal{M}$ for the tangent and cotangent bundles of $\mathcal{M}$, respectively. For a fibre bundle $E$ over $\mathcal{M}$, $\Gamma(E)$ will be used to denote the smooth sections of this bundle. Unless otherwise specified, we use $g$ or $\langle\cdot,\cdot\rangle$ to denote a semi-Riemannian metric. For a smooth function $f$, the notations $\widetilde{\mathrm{grad}}\, f$ and $\widetilde{\mathrm{Hess}}\, f$ stand for semi-Riemannian gradients and Hessians, respectively, when they exist; $\mathrm{grad}\, f$ and $\mathrm{Hess}\, f$ will be reserved for Riemannian gradients and Hessians, respectively. More generally, $\widetilde{\nabla}$ will be used to denote the Levi-Civita connection on a semi-Riemannian manifold, while $\nabla$ denotes the Levi-Civita connection on a Riemannian manifold. We denote the antisymmetric (i.e. skew-symmetric) matrices and symmetric matrices of size $n$ by $\mathrm{Skew}(n)$ and $\mathrm{Sym}(n)$, respectively. For a vector space $V$, $\Lambda^k V$ and $\mathrm{Sym}^k V$ stand for alternating and symmetrized copies of $V$, respectively.

2.2 Riemannian Manifold Optimization
As stated at the beginning of this paper, manifold optimization refers to nonlinear optimization problems of the form Eq. 1. The methodology of Riemannian optimization is to equip the smooth manifold $\mathcal{M}$ with a Riemannian metric structure, i.e. positive definite bilinear forms on the tangent spaces of $\mathcal{M}$ that vary smoothly over the manifold [28, 10, 38]. The differentiable structure on $\mathcal{M}$ facilitates generalizing the concept of differentiable functions from Euclidean spaces to these nonlinear objects; in particular, notions such as the gradient and Hessian are available on Riemannian manifolds and play the same role as their Euclidean counterparts.
The algorithmic framework of Riemannian manifold optimization has been established and investigated in a sequence of works [13, 44, 12, 2]. These algorithms typically build upon the concepts of the gradient, the first-order differential operator defined by

$\langle \mathrm{grad}\, f, X\rangle = Xf \quad \text{for all } X \in \Gamma(T\mathcal{M}),$

and the Hessian, the covariant derivative of the gradient defined by

$\mathrm{Hess}\, f(X, Y) = \langle \nabla_X\, \mathrm{grad}\, f, Y\rangle,$

as well as a retraction $R_x$ from each tangent space $T_x\mathcal{M}$ to the manifold such that (1) $R_x(0) = x$ for all $x \in \mathcal{M}$, and (2) the differential map of $R_x$ is the identity at $0 \in T_x\mathcal{M}$. On Riemannian manifolds it is natural to use the exponential map as the retraction, but any general map from tangent spaces to the Riemannian manifold satisfying these two conditions suffices; in fact, the only requirement implied by conditions (1) and (2) is that the retraction coincide with the exponential map up to first order.
The optimality conditions for unconstrained optimization on Euclidean spaces in terms of gradients and Hessians can be naturally translated into the Riemannian manifold setting:
Proposition 1 ([8], Proposition 1.1)
A local optimum $x^*$ of Problem Eq. 1 satisfies the following necessary conditions:

(i) $\mathrm{grad}\, f(x^*) = 0$ if $f$ is first-order differentiable;

(ii) $\mathrm{grad}\, f(x^*) = 0$ and $\mathrm{Hess}\, f(x^*) \succeq 0$ if $f$ is second-order differentiable.
Following [8], we call $x^*$ satisfying condition (i) in Proposition 1 a (first-order) critical point or stationary point, and a point satisfying condition (ii) in Proposition 1 a second-order critical point.
The heart of Riemannian manifold optimization is to transform the nonlinear constrained optimization problem Eq. 1 into an unconstrained problem on the manifold $\mathcal{M}$. Following this methodology, classical unconstrained optimization algorithms such as gradient descent, conjugate gradient, Newton's method, and trust region methods have been generalized to Riemannian manifolds; see [2, Chapter 8]. For instance, the gradient descent algorithm on Riemannian manifolds essentially replaces the Euclidean descent step $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$ with its Riemannian counterpart $x_{k+1} = R_{x_k}\!\left(-\alpha_k\, \mathrm{grad}\, f(x_k)\right)$. Other differential geometric objects such as parallel transport, the Hessian, and curvature render themselves naturally en route to adapting other unconstrained optimization algorithms to the manifold setting. We refer interested readers to [2] for more details.
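As a concrete illustration of this recipe (the cost function, step size, and normalization retraction below are our own choices, not an algorithm taken from the paper), consider Riemannian gradient descent on the unit sphere for the quadratic form $f(x) = x^\top A x$:

```python
import numpy as np

# Sketch: Riemannian gradient descent on the unit sphere S^{n-1} for
# f(x) = x^T A x.  The Riemannian gradient is the projection of the Euclidean
# gradient onto the tangent space at x, and R_x(v) = (x + v)/||x + v|| is a
# valid retraction (it agrees with the exponential map to first order).
def sphere_gradient_descent(A, x0, step=0.05, iters=5000):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x
        rgrad = egrad - (x @ egrad) * x      # project onto tangent space at x
        y = x - step * rgrad                 # descent step in the tangent space
        x = y / np.linalg.norm(y)            # retraction back onto the sphere
    return x

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2.0                          # symmetric test matrix
x = sphere_gradient_descent(A, rng.standard_normal(5))
# the minimum of x^T A x over the sphere is the smallest eigenvalue of A
assert abs(x @ A @ x - np.linalg.eigvalsh(A)[0]) < 1e-6
```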
2.3 Semi-Riemannian Geometry
Semi-Riemannian geometry differs from Riemannian geometry in that the bilinear form equipped on each tangent space can be indefinite. Classical examples include Lorentzian spaces and de Sitter spaces in general relativity; see e.g. [35, 9]. Although one may think of Riemannian geometry as a special case of semi-Riemannian geometry, as all Riemannian metric tensors are automatically semi-Riemannian, the existence of a semi-Riemannian metric with nontrivial index (see the definition below) actually imposes additional constraints on the tangent bundle of the manifold and is thus often more restrictive: the tangent bundle must admit a nontrivial splitting into the direct sum of "positive definite" and "negative definite" subbundles. Nevertheless, such metric structures have found vast applications in and beyond understanding the geometry of spacetime, for instance, in the study of the regularity of optimal transport maps [21, 20, 3].
Definition 1
A symmetric bilinear form $b$ on a vector space $V$ is nondegenerate if $b(v, w) = 0$ for all $w \in V$ implies $v = 0$. The index of a symmetric bilinear form on $V$ is the dimension of a maximal negative definite subspace of $V$; the dimension of a maximal positive definite subspace of $V$ is defined similarly. A scalar product on a vector space $V$ is a nondegenerate symmetric bilinear form on $V$. The signature of a scalar product on $V$ with index $q$ is a vector of length $n = \dim V$ with the first $n - q$ entries equal to $+1$ and the remaining $q$ entries equal to $-1$. A subspace $W \subset V$ is said to be nondegenerate if the restriction of the scalar product to $W$ is nondegenerate.
The main difference between a scalar product and an inner product is that the former need not be positive definite. The main issue with this lack of positivity is the consequent lack of a meaningful notion of "orthogonality": a vector subspace may well be the orthogonal complement of itself; consider for example the subspace spanned by $(1, 1)$ in $\mathbb{R}^2$ equipped with a scalar product of signature $(+, -)$. The same example illustrates that the property of nondegeneracy is not always inherited by subspaces. Nonetheless, the following is true:
Lemma 1 (Chapter 2, Lemma 23, [35])
A subspace $W$ of a scalar product space $V$ is nondegenerate if and only if $V = W \oplus W^{\perp}$.
Definition 2 (SemiRiemannian Manifolds)
A metric tensor $g$ on a smooth manifold $\mathcal{M}$ is a symmetric nondegenerate $(0,2)$-tensor field on $\mathcal{M}$ of constant index. A semi-Riemannian manifold is a smooth manifold equipped with a metric tensor.
Example 1 (Minkowski Spaces)
Consider the Euclidean space $\mathbb{R}^n$ and denote by $I_{p,q}$ the $n$-by-$n$ diagonal matrix with the first $p$ diagonal entries equal to $+1$ and the remaining $q$ entries equal to $-1$, where $p + q = n$. For arbitrary $u, v \in \mathbb{R}^n$, define the bilinear form

$\langle u, v\rangle_{p,q} := u^\top I_{p,q}\, v.$

It is straightforward to verify that this bilinear form is nondegenerate on $\mathbb{R}^n$, and that $(\mathbb{R}^n, \langle\cdot,\cdot\rangle_{p,q})$ so defined is a semi-Riemannian manifold. This space is known as the Minkowski space of signature $(p, q)$.
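A minimal numerical sketch of this scalar product (assuming the convention that $I_{p,q}$ carries $p$ entries $+1$ followed by $q$ entries $-1$; the test vectors are our own illustration):

```python
import numpy as np

# Minkowski scalar product <u, v> = u^T I_{p,q} v, with the assumed convention
# I_{p,q} = diag(+1,...,+1, -1,...,-1) (p plus signs followed by q minus signs).
def minkowski(u, v, p, q):
    J = np.diag(np.concatenate([np.ones(p), -np.ones(q)]))
    return u @ J @ v

u = np.array([1.0, 1.0])
print(minkowski(u, u, 1, 1))        # 0.0: a nonzero null vector in signature (+, -)
# nondegeneracy: the Gram matrix I_{p,q} is invertible for any p, q
assert np.linalg.det(np.diag([1.0, 1.0, -1.0])) != 0.0
```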
Example 2
Consider the vector space $\mathbb{R}^{m \times n}$ of $m$-by-$n$ matrices, where $m = p + q$, $p \geq 1$, and $q \geq 0$. Define a bilinear form on $\mathbb{R}^{m \times n}$ by

$\langle U, V\rangle := \mathrm{tr}\!\left(U^\top I_{p,q}\, V\right), \qquad U, V \in \mathbb{R}^{m \times n}.$

This bilinear form is nondegenerate on $\mathbb{R}^{m \times n}$, because for any $U, V \in \mathbb{R}^{m \times n}$ we have

$\mathrm{tr}\!\left(U^\top I_{p,q}\, V\right) = \mathrm{vec}(U)^\top \left(I_n \otimes I_{p,q}\right) \mathrm{vec}(V),$

where $I_n$ is the identity matrix of size $n$-by-$n$, $\otimes$ denotes the Kronecker product, and $\mathrm{vec}$ is the vectorization operator that vertically stacks the columns of a matrix in $\mathbb{R}^{mn}$. The nondegeneracy then follows from Example 1. This example gives rise to a semi-Riemannian structure for matrices in $\mathbb{R}^{m \times n}$.

The nondegeneracy of the semi-Riemannian metric tensor ensures that most classical constructions on Riemannian manifolds have analogues on a semi-Riemannian manifold. Most fundamentally, the "miracle of Riemannian geometry", the existence and uniqueness of a canonical connection, holds on semi-Riemannian manifolds as well. Quoting [35, Theorem 11], on a semi-Riemannian manifold there is a unique connection $\widetilde{\nabla}$ such that
(3) $[X, Y] = \widetilde{\nabla}_X Y - \widetilde{\nabla}_Y X$

and

(4) $X\langle Y, Z\rangle = \langle \widetilde{\nabla}_X Y, Z\rangle + \langle Y, \widetilde{\nabla}_X Z\rangle$

for all $X, Y, Z \in \Gamma(T\mathcal{M})$. This connection is called the Levi-Civita connection of $\mathcal{M}$ and is characterized by the Koszul formula

(5) $2\langle \widetilde{\nabla}_X Y, Z\rangle = X\langle Y, Z\rangle + Y\langle Z, X\rangle - Z\langle X, Y\rangle - \langle X, [Y, Z]\rangle + \langle Y, [Z, X]\rangle + \langle Z, [X, Y]\rangle.$
Geodesics, parallel transport, and the curvature of $\mathcal{M}$ can be defined via the Levi-Civita connection on $\mathcal{M}$ in an entirely analogous manner as on Riemannian manifolds.
Differential operators can be defined on semi-Riemannian manifolds in much the same way as on Riemannian manifolds. For any $f \in C^{\infty}(\mathcal{M})$, where $\mathcal{M}$ is a semi-Riemannian manifold, the gradient of $f$, denoted $\widetilde{\mathrm{grad}}\, f$, is defined by the equality (c.f. [35, Definition 47])

(6) $\langle \widetilde{\mathrm{grad}}\, f, X\rangle = Xf \quad \text{for all } X \in \Gamma(T\mathcal{M}).$

The Hessian of $f$ can be similarly defined, also in parallel with the Riemannian case ([35, Definition 48, Lemma 49]), by $\widetilde{\mathrm{Hess}}\, f(X, Y) := \langle \widetilde{\nabla}_X\, \widetilde{\mathrm{grad}}\, f, Y\rangle$, or equivalently

(7) $\widetilde{\mathrm{Hess}}\, f(X, Y) = XYf - \left(\widetilde{\nabla}_X Y\right) f.$
Since the Levi-Civita connection on $\mathcal{M}$ is torsion-free, $\widetilde{\mathrm{Hess}}\, f$ is a symmetric tensor field on $\mathcal{M}$, i.e., $\widetilde{\mathrm{Hess}}\, f(X, Y) = \widetilde{\mathrm{Hess}}\, f(Y, X)$ for all $X, Y \in \Gamma(T\mathcal{M})$.
One way to compare the semi-Riemannian and Riemannian gradients and Hessians, when both metric structures exist on the same smooth manifold, is through their local coordinate expressions. In fact, the local coordinate expressions for the two types (Riemannian/semi-Riemannian) of differential operators can be unified as follows. Let $(x^1, \dots, x^n)$ be a local coordinate system around an arbitrary point $x \in \mathcal{M}$, and denote $g_{ij}$ and $\tilde{g}_{ij}$ for the components of the Riemannian and semi-Riemannian metric tensors, respectively; the corresponding Christoffel symbols will be denoted $\Gamma^k_{ij}$ and $\widetilde{\Gamma}^k_{ij}$, respectively. Direct computation reveals

(8) $\mathrm{grad}\, f = g^{ij}\, \partial_j f\, \partial_i, \quad (\mathrm{Hess}\, f)_{ij} = \partial_i \partial_j f - \Gamma^k_{ij}\, \partial_k f,$
$\qquad \widetilde{\mathrm{grad}}\, f = \tilde{g}^{ij}\, \partial_j f\, \partial_i, \quad (\widetilde{\mathrm{Hess}}\, f)_{ij} = \partial_i \partial_j f - \widetilde{\Gamma}^k_{ij}\, \partial_k f.$

Using the musical isomorphism induced from the (Riemannian or semi-Riemannian) metric, the Hessians can also be cast in the form of $(1,1)$-tensors on $\mathcal{M}$.
Remark 1
Notably, for any $x \in \mathcal{M}$, if we compute the Hessians $\mathrm{Hess}\, f$ and $\widetilde{\mathrm{Hess}}\, f$ in the corresponding geodesic normal coordinates centered at $x$, Eq. 8 implies that the two Hessians take the same coordinate form, since both $\Gamma^k_{ij}$ and $\widetilde{\Gamma}^k_{ij}$ vanish at $x$. For instance, $\mathbb{R}^n$ has the same geodesics under the Euclidean and Lorentzian metrics (straight lines), and the standard coordinate system serves as a geodesic normal coordinate system for both metrics; see Example 3. In particular, the notion of geodesic convexity [39, 46] is equivalent for the two different metrics; this equivalence is not completely trivial in view of the well-known first- and second-order characterizations (see e.g. [46, Theorem 5.1] and [46, Theorem 6.1]), since geodesics need not be the same under different metrics.
Proposition 2
On a smooth manifold admitting two different Riemannian or semi-Riemannian structures, an optimization problem is geodesically convex with respect to one metric if and only if it is geodesically convex with respect to the other.
Proof
Denote the two metric tensors on $\mathcal{M}$ by $g$ and $\tilde{g}$, respectively; each of the two can be Riemannian or semi-Riemannian. For any $x \in \mathcal{M}$, let $(x^i)$ and $(\tilde{x}^i)$ be the geodesic normal coordinates around $x$ with respect to $g$ and $\tilde{g}$, respectively. Denote by $J$ the Jacobian of the coordinate transformation between the two normal coordinate systems. The coordinate expressions of a tangent vector in the two normal coordinate systems are linked by (Einstein summation convention adopted)

$\tilde{v}^i = J^i_j\, v^j.$

Therefore the coordinate matrices of the two Hessians at $x$ differ by the congruence transformation $J^\top (\cdot)\, J$ with $J$ invertible, which preserves positive semidefiniteness; this establishes the desired equivalence.
Example 3 (Gradient and Hessian in Minkowski Spaces)
Consider the Euclidean space $\mathbb{R}^n$. Denote by $I_{p,q}$ the $n$-by-$n$ diagonal matrix with the first $p$ diagonal entries equal to $+1$ and the remaining $q$ diagonal entries equal to $-1$. We compute and compare in this example the gradients and Hessians of differentiable functions on $\mathbb{R}^n$. We take the Riemannian metric as the standard Euclidean metric, and the semi-Riemannian metric given by $\langle u, v\rangle := u^\top I_{p,q}\, v$. For any differentiable $f$, the gradient of $f$ is determined by Eq. 6, which gives

$\widetilde{\mathrm{grad}}\, f(x) = I_{p,q}\, \nabla f(x),$

since $\langle I_{p,q} \nabla f(x), v\rangle = \nabla f(x)^\top v$ for all $v \in \mathbb{R}^n$. Furthermore, since in this case the semi-Riemannian metric tensor is constant on $\mathbb{R}^n$, the Christoffel symbols vanish (c.f. [35, Chap 3. Proposition 13 and Lemma 14]). By the definition of the Hessian, for all $u, v \in \mathbb{R}^n$ we have

$\widetilde{\mathrm{Hess}}\, f(u, v) = u^\top \nabla^2 f(x)\, v,$

from which we deduce the equality $\widetilde{\mathrm{Hess}}\, f = \mathrm{Hess}\, f$. In fact, the equivalence of the two Hessians also follows directly from Remark 1, since the geodesics under the Riemannian and semi-Riemannian metrics coincide in this example (see e.g. [35, Chapter 3 Example 25]). In particular, the equivalence between the two types of geodesics and Hessians implies the equivalence of geodesic convexity for the two metrics.
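The identities above can be checked numerically; in the sketch below the test function and the signature $(+,-)$ are assumptions made for illustration:

```python
import numpy as np

# On Minkowski space (R^n, I_{p,q}) the semi-Riemannian gradient of f is
# I_{p,q} @ (Euclidean gradient): for g = grad f we have
# <I_{p,q} g, v> = (I_{p,q} g)^T I_{p,q} v = g^T v = df(v) for every v,
# which is exactly the defining identity Eq. (6).
J = np.diag([1.0, -1.0])                     # signature (+, -), assumed example

def f(x):                                    # illustrative test function
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def egrad(x):                                # its Euclidean gradient
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x = np.array([0.7, -0.4])
sr_grad = J @ egrad(x)                       # semi-Riemannian gradient
v = np.array([0.3, 0.9])

lhs = sr_grad @ J @ v                        # <sr_grad, v> under the metric
eps = 1e-6
rhs = (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)  # directional derivative df(v)
assert abs(lhs - rhs) < 1e-6
```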
3 Semi-Riemannian Optimization Framework
This section introduces the algorithmic framework of semi-Riemannian optimization. To begin with, we point out that the first- and second-order necessary conditions for optimality in unconstrained optimization and Riemannian optimization can be directly generalized to semi-Riemannian manifolds. We then generalize several Riemannian manifold optimization algorithms to their semi-Riemannian counterparts, and illustrate the differences with a few numerical examples. We end this section by showing global and local convergence results for semi-Riemannian optimization.
3.1 Optimality Conditions
The following Proposition 3 should be considered the semi-Riemannian analogue of the optimality conditions in Proposition 1.
Proposition 3 (Semi-Riemannian First- and Second-Order Necessary Conditions for Optimality)
Let $\mathcal{M}$ be a semi-Riemannian manifold. A local optimum $x^*$ of Problem Eq. 1 satisfies the following necessary conditions:

(i) $\widetilde{\mathrm{grad}}\, f(x^*) = 0$ if $f$ is first-order differentiable;

(ii) $\widetilde{\mathrm{grad}}\, f(x^*) = 0$ and $\widetilde{\mathrm{Hess}}\, f(x^*) \succeq 0$ if $f$ is second-order differentiable.
Proof

If $x^*$ is a local optimum of Eq. 1, then there exists a local neighborhood $U$ of $x^*$ such that $f(x) \geq f(x^*)$ for all $x \in U$. Without loss of generality we can assume that $U$ is sufficiently small so as to be geodesically convex (see e.g. [10, §3.4]). Denote by $\gamma$ a constant-speed geodesic segment connecting $x^*$ to $x$ that lies entirely in $U$. The one-variable function $t \mapsto f(\gamma(t))$ admits the Taylor expansion

$f(\gamma(t)) = f(x^*) + t\, \langle \widetilde{\mathrm{grad}}\, f(x^*), \gamma'(0)\rangle + \frac{t^2}{2}\, \widetilde{\mathrm{Hess}}\, f\!\left(\gamma'(0), \gamma'(0)\right) + o(t^2),$

where the last equality used the geodesic equation $\widetilde{\nabla}_{\gamma'} \gamma' = 0$. Letting $t \to 0^+$, the smoothness of $f$ ensures that the first-order term is nonnegative for every choice of $\gamma'(0)$, and that the second-order term is nonnegative whenever the first-order term vanishes,

which establishes (i) and (ii).
The formal similarity between Proposition 3 and Proposition 1 is not entirely surprising. As can be seen from the proofs, both optimality conditions are based on geometric interpretations of the same Taylor expansion; the metrics affect the specific forms of the gradient and Hessian, but the optimality conditions are essentially derived from the Taylor expansions only. Completely parallel to the Riemannian setting, we can also translate the second-order sufficient conditions [26, §7.3] into the semi-Riemannian setting without much difficulty. The proof essentially follows [26, §7.3 Proposition 3], with the Taylor expansion replaced by the expansion along geodesics in Proposition 3 (ii); we omit the proof since it is straightforward, but document the result in Proposition 4 below for future reference. Recall from [26, §7.1] that $x^*$ is a strict relative minimum point of $f$ on $\mathcal{M}$ if there is a local neighborhood $U$ of $x^*$ on $\mathcal{M}$ such that $f(x) > f(x^*)$ for all $x \in U \setminus \{x^*\}$.
Proposition 4 (Semi-Riemannian Second-Order Sufficient Conditions)
Let $f$ be a twice differentiable function on a semi-Riemannian manifold $\mathcal{M}$, and let $x^*$ be an interior point. If $\widetilde{\mathrm{grad}}\, f(x^*) = 0$ and $\widetilde{\mathrm{Hess}}\, f(x^*) \succ 0$, then $x^*$ is a strict relative minimum point of $f$.
The formal similarity between the Riemannian and semi-Riemannian optimality conditions indicates that it might be possible to transfer many technologies in manifold optimization from the Riemannian to the semi-Riemannian setting. For instance, the equivalence of the first-order necessary conditions implies that, in order to search for a first-order stationary point on a semi-Riemannian manifold, we should look for points at which the semi-Riemannian gradient vanishes, just as in the Riemannian realm we look for points at which the Riemannian gradient vanishes. However, extra care has to be taken regarding the influence different metric structures have on the induced topology of the underlying manifold. For Riemannian manifolds, it is straightforward to check that the induced topology coincides with the original topology of the underlying manifold (see e.g. [10, Chap 7 Proposition 2.6]), whereas the "topology" induced by a semi-Riemannian structure is generally quite pathological; for instance, two distinct points connected by a lightlike geodesic (a geodesic along which all tangent vectors are null vectors, c.f. Definition 3) have zero distance. An exemplary consequence is that, in search of a first-order stationary point, we should not be looking for points at which $\langle \widetilde{\mathrm{grad}}\, f, \widetilde{\mathrm{grad}}\, f\rangle$ vanishes, since this does not imply $\widetilde{\mathrm{grad}}\, f = 0$.
3.2 Determining the “Steepest Descent Direction”
As long as gradients, Hessians, retractions, and parallel transports can be properly defined, one might think there exists no essential difficulty in generalizing any Riemannian optimization algorithm to the semi-Riemannian setup, with the Riemannian geometric quantities replaced by their semi-Riemannian counterparts, mutatis mutandis. It is tempting to apply this methodology to all standard manifold optimization algorithms, including but not limited to first-order methods such as steepest descent, conjugate gradient descent, and quasi-Newton methods, or second-order methods such as Newton's method and trust region methods. We discuss in this subsection how to determine a proper descent direction for steepest-descent-type algorithms on a semi-Riemannian manifold. Some exemplary first- and second-order methods will be discussed in the next subsection.
As one of the prototypical first-order optimization algorithms, gradient descent is known for its simplicity yet surprisingly powerful theoretical guarantees under mild technical assumptions. A plausible "semi-Riemannian gradient descent" algorithm that naïvely follows the paradigm of Riemannian gradient descent could be designed by simply replacing the Riemannian gradient with the semi-Riemannian gradient defined in Eq. 6, as listed in Algorithm 1. Of course, a key step in Algorithm 1 is to determine the descent direction in each iteration. However, while the negative gradient is an obvious choice in Riemannian manifold optimization, the "steepest descent direction" is a subtler notion in semi-Riemannian geometry, as will be demonstrated shortly in this section.
A first difficulty with replacing $-\mathrm{grad}\, f$ by $-\widetilde{\mathrm{grad}}\, f$ is that $-\widetilde{\mathrm{grad}}\, f$ need not be a descent direction at all: consider, for instance, an illustrative example of optimization in Minkowski space (Euclidean space equipped with the standard semi-Riemannian metric): the first-order Taylor expansion at $x$ gives, for any small $t > 0$,

(9) $f\!\left(x - t\, \widetilde{\mathrm{grad}}\, f(x)\right) = f(x) - t\, \langle \widetilde{\mathrm{grad}}\, f(x), \widetilde{\mathrm{grad}}\, f(x)\rangle + o(t),$

but in the semi-Riemannian setting the scalar product term $\langle \widetilde{\mathrm{grad}}\, f, \widetilde{\mathrm{grad}}\, f\rangle$ may well be negative, unlike in the Riemannian case. In order for the value of the objective function to decrease (at least to first order), we have to pick the descent direction to be either $-\widetilde{\mathrm{grad}}\, f$ or $+\widetilde{\mathrm{grad}}\, f$, whichever makes the first-order change negative.
Though this quick sign fix would work generically in many problems of practical interest, a second and more serious issue with choosing $\pm\widetilde{\mathrm{grad}}\, f$ as the descent direction lies inherently in the indefiniteness of the metric tensor. For standard gradient descent algorithms (e.g. on Euclidean spaces with the standard metric, or more generally on Riemannian manifolds), the algorithm terminates after the gradient norm becomes smaller than a predefined threshold; for norms induced from positive definite metric tensors, $\|\mathrm{grad}\, f(x_k)\| \to 0$ is equivalent to $\mathrm{grad}\, f(x_k) \to 0$, implying that the sequence is truly approaching a first-order stationary point. This intuition breaks down for indefinite metric tensors, as $\langle \widetilde{\mathrm{grad}}\, f(x_k), \widetilde{\mathrm{grad}}\, f(x_k)\rangle \to 0$ no longer implies the proximity between $\widetilde{\mathrm{grad}}\, f(x_k)$ and $0$. Even though one can fix this ill-defined termination condition by introducing an auxiliary Riemannian metric (which always exists on a smooth manifold), when $\widetilde{\mathrm{grad}}\, f$ is a null vector (i.e. $\langle \widetilde{\mathrm{grad}}\, f, \widetilde{\mathrm{grad}}\, f\rangle = 0$, see Definition 3), the gradient algorithm loses the first-order decrease in the objective function value (see Eq. 9); the validity of the algorithm then relies upon second-order information, with which we lose the benefits of first-order methods. As a concrete example, consider the following unconstrained optimization problem on the Minkowski space equipped with a metric of signature $(+, -)$:
Recall from Example 3 that the semi-Riemannian gradient is $I_{p,q} \nabla f$, which in this example is a direction parallel to the isolines of the objective function $f$. Thus semi-Riemannian gradient descent will never decrease the objective function value.
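A numerical instance of this failure mode (the objective $f(x, y) = x + y$ and the signature $(+,-)$ below are our own illustrative choices): the semi-Riemannian gradient is a null vector tangent to the isolines of $f$, so stepping along it changes $f$ not at all.

```python
import numpy as np

# On (R^2, diag(1, -1)) take the assumed objective f(x, y) = x + y.
# Its semi-Riemannian gradient I_{1,1} @ grad f = (1, -1) is a null vector
# parallel to the isolines {x + y = const}, so the first-order decrease
# along the negative semi-Riemannian gradient is exactly zero.
J = np.diag([1.0, -1.0])
egrad = np.array([1.0, 1.0])         # Euclidean gradient of f(x, y) = x + y
sr_grad = J @ egrad                  # semi-Riemannian gradient: (1, -1)

assert sr_grad @ J @ sr_grad == 0.0  # <sr_grad, sr_grad> = 0: a null vector
assert egrad @ (-sr_grad) == 0.0     # zero directional derivative of f
```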
To rectify these issues, it is necessary to revisit the motivating geometric interpretation of the negative gradient direction as the direction of "steepest descent": for any Riemannian manifold $\mathcal{M}$ and function $f$ on $\mathcal{M}$ differentiable at $x$, we know from vector arithmetic that

(10) $-\frac{\mathrm{grad}\, f(x)}{\|\mathrm{grad}\, f(x)\|} = \operatorname*{arg\,min}_{v \in T_x\mathcal{M},\, \langle v, v\rangle = 1} \langle \mathrm{grad}\, f(x), v\rangle.$

In the semi-Riemannian setting, assuming $\mathcal{M}$ is equipped with a semi-Riemannian metric, we can also seek the descent direction leading to the steepest decrease of the objective function value. It is not hard to see that in general

(11) $\operatorname*{arg\,min}_{v \in T_x\mathcal{M},\, \langle v, v\rangle = 1} \langle \widetilde{\mathrm{grad}}\, f(x), v\rangle \neq -\frac{\widetilde{\mathrm{grad}}\, f(x)}{\left|\langle \widetilde{\mathrm{grad}}\, f(x), \widetilde{\mathrm{grad}}\, f(x)\rangle\right|^{1/2}}.$
In fact, in both versions the search for the "steepest descent direction" is guided by making the directional derivative as negative as possible, but constrained to different unit spheres. The precise relation between the two steepest descent directions is not readily visible, for the two unit spheres could differ drastically in geometry. In fact, when the unit sphere is noncompact, the "steepest descent direction" so defined may not even exist.
Example 4
Consider an optimization problem over the Minkowski space equipped with a metric of signature $(+, -)$. At a given point, recall from Example 3 that the semi-Riemannian gradient is $I_{1,1} \nabla f$. Over the unit sphere $\{v : \langle v, v\rangle = 1\}$ under this Lorentzian metric, which is a noncompact hyperbola, the scalar product $\langle \widetilde{\mathrm{grad}}\, f, v\rangle$ can be made arbitrarily negative. Even worse, since the scalar product approaches $-\infty$, it is not possible to find a steepest descent direction attaining the infimum, no matter what threshold is preset.
One way to fix this noncompactness issue is to restrict the candidate tangent vectors in the minimization of the directional derivative to lie in a compact subset of the tangent space $T_x\mathcal{M}$. For instance, one can consider the unit sphere in $T_x\mathcal{M}$ under a Riemannian metric. Comparing the right hand sides of Eq. 10 and Eq. 11, descent directions determined in this manner will be the negative gradient direction under the Riemannian metric, and thus in general have nothing to do with the semi-Riemannian metric; moreover, if a Riemannian metric has to be defined laboriously in addition to the semi-Riemannian one, in principle we can already employ well-established, fully functioning Riemannian optimization techniques, thus bypassing the semi-Riemannian setup entirely. While this argument might well render first-order semi-Riemannian optimization futile, we emphasize here that one can define steepest descent directions with the aid of "Riemannian structures" that arise naturally from the semi-Riemannian structure, so that there is no need to specify a separate Riemannian structure in parallel with the semi-Riemannian one, though this affiliated "Riemannian structure" is highly local.
The key observation here is that one does not need to consistently specify a Riemannian structure over the entire manifold if the only goal is to find one steepest descent direction in a single tangent space: when we search for the steepest descent direction in the tangent space $T_x\mathcal{M}$ of a semi-Riemannian manifold $\mathcal{M}$, it suffices to specify a Riemannian structure locally around $x$, or more extremely, only on the tangent space $T_x\mathcal{M}$, in order for the "steepest descent direction" to be well-defined over a compact subset of $T_x\mathcal{M}$. These local inner products do not have to "patch together" to give rise to a globally defined Riemannian structure. A very handy way to find local inner products is through geodesic normal coordinates, which reduce the local calculation to the Minkowski spaces. For any $x \in \mathcal{M}$, there is a normal neighborhood $U$ containing $x$ such that the exponential map is a diffeomorphism when restricted to a neighborhood of the origin in $T_x\mathcal{M}$, and one can pick an orthonormal basis (with respect to the semi-Riemannian metric on $T_x\mathcal{M}$), denoted $e_1, \dots, e_n$, such that $\langle e_i, e_j\rangle = \epsilon_i \delta_{ij}$, where $\delta_{ij}$, $1 \leq i, j \leq n$, are the Kronecker deltas, and $\epsilon_i \in \{+1, -1\}$. Without loss of generality, assume $\mathcal{M}$ is a semi-Riemannian manifold of index $q$, where $0 \leq q \leq n$, and that $\epsilon_1 = \cdots = \epsilon_{n-q} = +1$, $\epsilon_{n-q+1} = \cdots = \epsilon_n = -1$. The normal coordinates of any $y \in U$ are determined by the coefficients of $\exp_x^{-1}(y)$ with respect to the orthonormal basis $e_1, \dots, e_n$. It is straightforward (see [35, Proposition 33]) to verify that at $x$

$\tilde{g}_{ij}(x) = \epsilon_i\, \delta_{ij}, \qquad \widetilde{\Gamma}^k_{ij}(x) = 0,$

where $\tilde{g}_{ij}$ denotes the semi-Riemannian metric tensor components and $\widetilde{\Gamma}^k_{ij}$ stands for the Christoffel symbols. Under this coordinate system, it is straightforward to verify that the scalar product between tangent vectors can be written as

$\langle u, v\rangle = \epsilon_i\, u^i v^i,$

where $u = u^i e_i$ and $v = v^i e_i$ (Einstein's summation convention implicitly invoked). The local Riemannian structure can thus be defined as

(12) $\langle u, v\rangle^{+} := \sum_{i=1}^{n} u^i v^i.$
Essentially, such a local inner product is defined by imposing orthogonality between the positive and negative definite subspaces of $T_x\mathcal{M}$ and "reversing the sign" of the negative definite component of the scalar product. Making such a modification consistently and smoothly over the entire manifold is certainly subject to topological obstructions; nevertheless, locally (in fact, pointwise) defined Riemannian structures suffice for our purposes, and in practical applications we can simplify the workflow by choosing an arbitrary orthonormal basis in the tangent space in place of the geodesic frame. The orthonormalization process, of course, is adapted to the semi-Riemannian setting; see [35, Chapter 2, Lemma 24 and Lemma 25] or Algorithm 2. The output set of vectors $e_1, \dots, e_n$ satisfies

$\langle e_i, e_j\rangle = \epsilon_i\, \delta_{ij},$

where $\delta_{ij}$ are the Kronecker symbols and $\epsilon_i \in \{+1, -1\}$. A generic approach which works with high probability is to pick a random linearly independent set of vectors and apply a (pivoted) Gram-Schmidt orthogonalization process with respect to the indefinite scalar product; see Algorithm 3.

In geodesic normal coordinates, the gradient takes the form

$\widetilde{\mathrm{grad}}\, f = \sum_{i=1}^{n} \epsilon_i\, (\partial_i f)\, \partial_i,$

and choosing the steepest descent direction reduces to the problem

$\min_{v \in T_x\mathcal{M}} \langle \widetilde{\mathrm{grad}}\, f, v\rangle \quad \text{subject to} \quad \langle v, v\rangle^{+} = 1,$

of which the optimum is obviously attained at

$v^* = -\Big(\sum_{i=1}^{n} (\partial_i f)^2\Big)^{-1/2} \sum_{i=1}^{n} (\partial_i f)\, \partial_i.$
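The orthonormalization step can be sketched in code as follows (a minimal, unpivoted variant of the randomized Gram-Schmidt process just described; the degeneracy threshold and all names are our own assumptions):

```python
import numpy as np

# Gram-Schmidt with respect to an indefinite scalar product <u, v> = u^T J v.
# Each retained output vector satisfies <e_i, e_j> = eps_i * delta_ij with
# eps_i in {+1, -1}; near-null pivots are skipped (a pivoted variant would
# reorder candidates instead).
def indefinite_gram_schmidt(vectors, J, tol=1e-12):
    basis, signs = [], []
    for v in vectors:
        w = np.array(v, dtype=float)
        for e, s in zip(basis, signs):
            w -= s * (w @ J @ e) * e         # subtract (<w, e>/<e, e>) e
        norm2 = w @ J @ w
        if abs(norm2) < tol:
            continue                         # (near-)degenerate direction: skip
        basis.append(w / np.sqrt(abs(norm2)))
        signs.append(np.sign(norm2))
    return np.array(basis), np.array(signs)

J = np.diag([1.0, 1.0, -1.0])
rng = np.random.default_rng(2)
E, eps = indefinite_gram_schmidt(rng.standard_normal((3, 3)), J)
G = E @ J @ E.T                              # Gram matrix of the output basis
assert np.allclose(G, np.diag(eps), atol=1e-8)
```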
For simplicity of statement, we introduce the notation

$v^{+} := \sum_{i=1}^{n} \epsilon_i\, v^i\, e_i$

for $v = v^i e_i \in T_x\mathcal{M}$, where $e_1, \dots, e_n$ is an orthonormal basis for the semi-Riemannian metric tensor on $T_x\mathcal{M}$. Using this notation, the descent direction we will choose can be written as

(13) $d = -\big(\widetilde{\mathrm{grad}}\, f\big)^{+}.$
Note that, by [35, Lemma 3.25], with respect to an orthonormal basis we have in general

$\widetilde{\mathrm{grad}}\, f = \sum_{i=1}^{n} \epsilon_i\, \langle \widetilde{\mathrm{grad}}\, f, e_i\rangle\, e_i,$

which is consistent with our previous discussion that the steepest descent direction in the semi-Riemannian setting is not $-\widetilde{\mathrm{grad}}\, f$ in general. Intuitively, the "steepest descent direction" is obtained by reversing the signs of the components of the gradient that "correspond to" the negative definite subspace, and then rescaling according to the induced Riemannian metric. This leads to the routine in Algorithm 4 for finding descent directions.
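This sign-reversal rule can be sketched in code (the frame handling and the Minkowski sanity check are our own illustration of the idea behind Algorithm 4, not the paper's listing):

```python
import numpy as np

# Given the semi-Riemannian gradient g and a J-orthonormal frame e_1..e_n with
# <e_i, e_i> = eps_i, expand g in the frame and flip the signs of the
# components lying in the negative definite subspace.
def descent_direction(g, frame, eps, J):
    d = np.zeros_like(g)
    for e, s in zip(frame, eps):
        c = s * (g @ J @ e)          # coordinate of g in the frame: g = sum c_i e_i
        d -= s * c * e               # steepest descent under the local inner product
    return d

# Minkowski sanity check with the standard frame: the rule recovers the
# negative Euclidean gradient.
J = np.diag([1.0, -1.0, -1.0])
frame = np.eye(3)
eps = np.array([1.0, -1.0, -1.0])
egrad = np.array([0.5, -2.0, 1.5])   # Euclidean gradient of some test function
g = J @ egrad                        # semi-Riemannian gradient
assert np.allclose(descent_direction(g, frame, eps, J), -egrad)
```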
Remark 2
The definition Eq. 13 certainly depends on the choice of the orthonormal basis with respect to the semi-Riemannian metric tensor. In other words, if we choose a different orthonormal basis with respect to the same semi-Riemannian metric on $T_x\mathcal{M}$, the resulting descent direction will also be different. In practical computations, we could precompute an orthonormal basis for all points on the manifold, but that will complicate the proofs of convergence, since the amounts of descent will be incomparable across tangent spaces. A compromise is to cover the entire semi-Riemannian manifold with an atlas consisting of geodesic normal neighborhoods, and extend the definition Eq. 13 from a single point to the geodesic normal neighborhood around each point, with the orthonormal basis given by the geodesic normal frame fields [35, pp. 84-85] defined over each normal neighborhood. Under suitable compactness assumptions, this construction essentially defines a Riemannian structure on the semi-Riemannian manifold by means of a partition of unity:

(14) $g^{+} := \sum_{\alpha} \rho_\alpha\, g^{+}_\alpha,$

where $\{\rho_\alpha\}$ is a partition of unity subordinate to the cover and $g^{+}_\alpha$ is the local Riemannian metric over the $\alpha$-th normal neighborhood. The arbitrariness of the choice of geodesic normal frame fields makes this Riemannian structure noncanonical, but the bilinear form is symmetric and coercive, and can thus be used for performing steepest descent in the semi-Riemannian setting.
Remark 3
For Minkowski spaces, it is easy to check that the descent direction output by Algorithm 4 with the standard basis coincides exactly with the negative Euclidean gradient. In this sense Algorithm 1 can be viewed as a generalization of the Riemannian steepest descent algorithm. In fact, the pointwise construction of positive definite scalar products in each tangent space, Eq. 12, indicates that the methodology of Riemannian manifold optimization can be carried over to settings with weaker geometric assumptions, namely, when the inner product structure on the tangent spaces need not vary smoothly from point to point. From this perspective, we can also view semi-Riemannian optimization as a type of manifold optimization under weaker geometric assumptions.
Remark 4
Algorithm 1 can indeed be viewed as an instance of a more general paradigm of line-search-based optimization on manifolds [42, §3]. Our choice of the descent direction in Algorithm 4 ensures that the objective function value indeed decreases, at least for sufficiently small step sizes, which further facilitates establishing convergence.
Example 5 (SemiRiemannian Gradient Descent for Minkowski Spaces)
Recall from Example 3 that the semi-Riemannian gradient of a differentiable function $f$ on the Minkowski space $\mathbb{R}^{p,q}$ is $J \nabla f$, where $J = \operatorname{diag}(I_p, -I_q)$ and $\nabla f$ is the Euclidean gradient. If we choose the standard canonical basis for $\mathbb{R}^{p,q}$, the descent direction produced by Algorithm 4 and needed for Algorithm 1 is
$$d = -J \left( J \nabla f(x) \right) = -\nabla f(x),$$
and thus semi-Riemannian gradient descent coincides with the standard gradient descent algorithm on Euclidean space if the standard orthonormal basis is used at every point of $\mathbb{R}^{p,q}$. Of course, if we use a randomly generated orthonormal basis (under the semi-Riemannian metric) at each point, semi-Riemannian gradient descent will be drastically different from standard gradient descent on Euclidean spaces; see Section 5.1 for an illustration.
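As a quick numerical illustration of this example, the following NumPy sketch (the test function and evaluation point are our own hypothetical choices) computes the semi-Riemannian gradient $J\nabla f$ on $\mathbb{R}^{2,1}$, checks that its negative can fail to be a descent direction, and verifies that the sign-flip modification of Eq. 13 in the standard basis recovers $-\nabla f$:

```python
import numpy as np

# Minkowski space R^{2,1}: metric tensor J = diag(1, 1, -1).
J = np.diag([1.0, 1.0, -1.0])

# Hypothetical smooth test function f(x) = (x0^2 + x1^2 + 3*x2^2) / 2.
def euclidean_grad(x):
    return np.array([x[0], x[1], 3.0 * x[2]])

x = np.array([1.0, -0.5, 2.0])
g = euclidean_grad(x)
g_semi = J @ g          # semi-Riemannian gradient on Minkowski space

# Directional derivative of f along -g_semi equals -g^T J g, which is
# positive here: the raw negative semi-Riemannian gradient points uphill.
assert g @ (-g_semi) > 0

# Sign-flip modification (Eq. 13) in the standard orthonormal basis:
# multiply the i-th component by epsilon_i = J_ii, recovering -grad f.
d = -J @ g_semi
assert np.allclose(d, -g)
assert g @ d < 0        # a genuine descent direction
```

The timelike coordinate dominates the gradient at this point, which is exactly the situation in which the indefinite metric reverses the sign of the directional derivative.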
When studying self-concordant barrier functions for interior point methods, a useful guiding principle is to consider the Riemannian geometry defined by the Hessian of a strictly convex self-concordant barrier function [31, 11, 41, 32]; in this setting, descent directions produced by Newton's method can be equivalently viewed as gradients with respect to the Riemannian structure. When the barrier function is nonconvex, however, the Hessians are no longer positive definite, and the Riemannian geometry is replaced with semi-Riemannian geometry. It is well known that the direction computed from Newton's equation Eq. 2 may fail to be a descent direction if the Hessian is not positive definite [48, §3.3], which is consistent with our observation in this subsection that semi-Riemannian gradients need not be descent directions in general. In this particular case, our modification Eq. 13 can also be interpreted as a novel variant of the Hessian modification strategy [48, §3.4], as follows. Denote the function under consideration as $f : \Omega \to \mathbb{R}$, where $\Omega \subset \mathbb{R}^n$ is a connected, closed convex subset with nonempty interior that contains no straight lines. Assume the Hessian $\nabla^2 f$ is nondegenerate on the interior of $\Omega$, which necessarily implies that $\nabla^2 f$ is of constant signature there. At any $x$ in the interior of $\Omega$, the negative gradient of $f$ with respect to the semi-Riemannian metric defined by the Hessian of $f$ is $-\left[\nabla^2 f(x)\right]^{-1} \nabla f(x)$, where $\nabla f$ and $\nabla^2 f$ stand for the gradient and Hessian of $f$ with respect to the Euclidean geometry of $\mathbb{R}^n$. Our proposed modification first finds a matrix $L_x$ satisfying
$$\nabla^2 f(x) = L_x J L_x^{\top},$$
where $J$ is the constant signature matrix of $\nabla^2 f$ on the interior of $\Omega$, and then sets
(15) $$d_x = -\left(L_x L_x^{\top}\right)^{-1} \nabla f(x),$$
which is guaranteed to be a descent direction since
$$\left\langle \nabla f(x), d_x \right\rangle = -\nabla f(x)^{\top} \left(L_x L_x^{\top}\right)^{-1} \nabla f(x) = -\left\| L_x^{-1} \nabla f(x) \right\|^2 < 0 \quad \text{whenever } \nabla f(x) \neq 0.$$
From Eq. 15 it is evident that the semi-Riemannian descent direction $d_x$ is obtained from the Newton direction $-\left[\nabla^2 f(x)\right]^{-1} \nabla f(x)$ by replacing the inverse Hessian $\left[\nabla^2 f(x)\right]^{-1}$ with $\left(L_x L_x^{\top}\right)^{-1}$. This is close to Hessian modification in spirit, but also drastically different from common Hessian modification techniques, which add a correction matrix to the true Hessian $\nabla^2 f(x)$; see [48, §3.4] for a more detailed explanation.
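A minimal numerical sketch of this Hessian-modification reading of Eq. 15, under our own assumption that $L_x$ is built from the eigendecomposition $\nabla^2 f(x) = Q \Lambda Q^{\top}$ as $L_x = Q\,|\Lambda|^{1/2}$ (so that $J = \operatorname{sign}(\Lambda)$); the test Hessian and gradient are hypothetical:

```python
import numpy as np

def semi_riemannian_descent_direction(H, g):
    """Eq. 15: d = -(L L^T)^{-1} g, where H = L J L^T is a signature
    decomposition of the symmetric, nondegenerate Hessian H.
    Here L = Q |Lam|^{1/2} from the eigendecomposition H = Q Lam Q^T,
    so (L L^T)^{-1} = Q |Lam|^{-1} Q^T: the Newton direction with the
    inverse Hessian replaced by its absolute-value counterpart."""
    lam, Q = np.linalg.eigh(H)
    assert np.all(np.abs(lam) > 1e-12), "Hessian must be nondegenerate"
    return -Q @ ((Q.T @ g) / np.abs(lam))

# Indefinite Hessian: the plain Newton direction fails to descend here.
H = np.array([[1.0, 0.0],
              [0.0, -2.0]])
g = np.array([0.0, 1.0])

newton = -np.linalg.solve(H, g)
d = semi_riemannian_descent_direction(H, g)

assert g @ newton > 0   # Newton step is an ascent direction in this case
assert g @ d < 0        # the Eq. 15 direction is guaranteed to descend
```

The eigendecomposition is only one convenient way to obtain such an $L_x$; any factorization realizing the constant signature $J$ would serve the same purpose.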
3.3 SemiRiemannian Conjugate Gradient
Using the same steepest descent directions and line search strategy, we can also adapt conjugate gradient methods to the semi-Riemannian setting; see Algorithm 5 for the algorithm description. Note that in Algorithm 5 we use the Polak-Ribière formula to determine the coefficient $\beta_k$, but alternatives such as the Hestenes-Stiefel or Fletcher-Reeves formulae (see e.g. [12, §2.6] or [42]) can be easily adapted to the semi-Riemannian setting as well, since none of the major steps in the Riemannian conjugate gradient algorithm relies essentially on the positive-definiteness of the metric tensor, except that the (steepest) descent direction needs to be modified according to Eq. 13. We noticed in practice that the Polak-Ribière and Hestenes-Stiefel formulae tend to be more robust and efficient than the Fletcher-Reeves formula for the choice of $\beta_k$, which is consistent with general observations about nonlinear conjugate gradient methods [48, §5.2].
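For the flat case singled out in Remark 5 below (Minkowski space with the standard basis, where the descent direction reduces to $-\nabla f$ and parallel transport is trivial), the conjugate gradient iteration with a Polak-Ribière coefficient can be sketched as follows; the clipping to $\beta_k \ge 0$, the backtracking line search, and the quadratic test problem are illustrative assumptions of ours, not a transcription of Algorithm 5:

```python
import numpy as np

def polak_ribiere_cg(f, grad, x0, iters=500, tol=1e-8):
    """Nonlinear CG with the (clipped) Polak-Ribiere formula on flat
    space, where parallel transport of the search direction is trivial.
    Uses a simple backtracking Armijo line search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:          # safeguard: restart with steepest descent
            d = -g
        t, fx = 1.0, f(x)       # backtracking Armijo line search
        while f(x + t * d) > fx + 1e-4 * t * (g @ d):
            t *= 0.5
        x_new = x + t * d
        g_new = grad(x_new)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # PR+ coefficient
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Convex quadratic test problem: the minimizer is A^{-1} b = (0.2, 0.4).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = polak_ribiere_cg(lambda x: 0.5 * x @ A @ x - b @ x,
                          lambda x: A @ x - b,
                          np.zeros(2))
```

In the general semi-Riemannian case, `-g_new` would be replaced by the modified descent direction of Eq. 13 and the old direction `d` would be parallel-transported to the new point before forming the combination.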
Remark 5
For Minkowski spaces (including Lorentzian spaces) with the standard orthonormal basis, both the steepest descent and conjugate gradient methods coincide with their counterparts on standard Euclidean spaces, since they share identical descent directions, parallel transports, and Hessians of the objective function.
Remark 6
Algorithm 5 can also be applied to self-concordant barrier functions for interior point methods when the objective function is not necessarily strictly convex but has nondegenerate Hessians. In this context, where the semi-Riemannian metric tensor is given by the Hessian of the objective function, Algorithm 5 can be viewed as a hybrid of Newton and conjugate gradient methods, in the sense that the "steepest descent directions" are determined by the Newton equations while the actual descent directions are combined using the methodology of conjugate gradient methods. To the best of our knowledge, such a hybrid algorithm has not been investigated in the existing literature.
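One possible flat-space reading of this hybrid, sketched under our own assumptions rather than as a transcription of Algorithm 5: the "steepest descent direction" $s$ solves the modified Newton equation in the spirit of Eq. 15 (Hessian eigenvalues replaced by their absolute values), and successive directions are combined with a Polak-Ribière-type coefficient. The safeguard, line search, and nonconvex test function are all hypothetical choices:

```python
import numpy as np

def hybrid_newton_cg(f, grad, hess, x0, iters=100, tol=1e-10):
    """Newton/CG hybrid sketch: s is the modified Newton direction
    (Eq. 15 with absolute eigenvalues), combined across iterations
    with a Polak-Ribiere-type coefficient and an Armijo line search."""
    def modified_newton(x):
        lam, Q = np.linalg.eigh(hess(x))
        return -Q @ ((Q.T @ grad(x)) / np.abs(lam))

    x = np.asarray(x0, dtype=float)
    s = modified_newton(x)
    d = s.copy()
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                  # safeguard: restart from s
            d = s.copy()
        t, fx = 1.0, f(x)               # backtracking Armijo line search
        while f(x + t * d) > fx + 1e-4 * t * (g @ d):
            t *= 0.5
        x = x + t * d
        s_new = modified_newton(x)
        beta = max(0.0, s_new @ (s_new - s) / (s @ s))
        d, s = s_new + beta * d, s_new
    return x

# Nonconvex test function with an indefinite Hessian at the start point:
# f(x, y) = (x^2 - 1)^2 + y^2, with minimizers at (+-1, 0).
f = lambda z: (z[0]**2 - 1.0)**2 + z[1]**2
grad = lambda z: np.array([4.0 * z[0] * (z[0]**2 - 1.0), 2.0 * z[1]])
hess = lambda z: np.diag([12.0 * z[0]**2 - 4.0, 2.0])

x_star = hybrid_newton_cg(f, grad, hess, np.array([0.5, 1.0]))
```

Starting where the Hessian is indefinite, the plain Newton step would not be a descent direction, while the hybrid iteration decreases $f$ at every step and settles at the minimizer $(1, 0)$.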
3.4 Metric Independence of Second Order Methods
In this subsection we consider two prototypical second-order optimization methods on semi-Riemannian manifolds, namely Newton's method and the trust region method. Surprisingly, both methods turn out to produce descent directions that are independent of the choice of scalar products on the tangent spaces. We give a geometric interpretation of this independence from the perspective of jets in Section 3.4.2.
3.4.1 SemiRiemannian Newton’s Method
As an archetypal second-order method, Newton's method on Riemannian manifolds has already been developed in detail in the early literature of Riemannian optimization [2, Chap 6]. The rationale behind Newton's method is that the first-order stationary points of a differentiable function $f$ are in one-to-one correspondence with the minima of $\langle \operatorname{grad} f, \operatorname{grad} f \rangle$ when the metric is positive-definite (i.e., when $\mathcal{M}$ is a Riemannian manifold). Thus by choosing the direction $d$ to satisfy the Newton equation $\operatorname{Hess} f(x)\, d = -\operatorname{grad} f(x)$, we ensure that $d$ is a descent direction for $\langle \operatorname{grad} f, \operatorname{grad} f \rangle$:
$$\mathrm{D}\,\langle \operatorname{grad} f, \operatorname{grad} f \rangle(x)[d] = 2\,\langle \operatorname{Hess} f(x)\, d, \operatorname{grad} f(x) \rangle = -2\,\langle \operatorname{grad} f(x), \operatorname{grad} f(x) \rangle,$$
and the right hand side is strictly negative as long as $\operatorname{grad} f(x) \neq 0$. The main difficulty in generalizing this procedure to the semi-Riemannian setting is similar to the difficulty we faced in Section 3.2: when the metric is indefinite, $\langle \operatorname{grad} f, \operatorname{grad} f \rangle = 0$ has nothing to do with $\operatorname{grad} f = 0$, and thus one can no longer find the stationary points of $f$ by minimizing $\langle \operatorname{grad} f, \operatorname{grad} f \rangle$. The approach we will adopt to fix this issue is also similar to that of Section 3.2: instead of minimizing $\langle \operatorname{grad} f, \operatorname{grad} f \rangle$, we will focus on the coercive bilinear form of Eq. 14.
Let $\{E_1, \ldots, E_n\}$ be a local geodesic normal coordinate frame centered at $x$, i.e., $\left(\nabla_{E_i} E_j\right)(x) = 0$ for any $1 \le i, j \le n$. Then we have
(16) $$\operatorname{Hess} f(x)\left(E_i, E_j\right) = E_i\left(E_j f\right)(x) - \left(\nabla_{E_i} E_j\right) f(x) = E_i\left(E_j f\right)(x),$$
and thus for any tangent vector $v = \sum_{i=1}^{n} v^i E_i(x)$ we have
$$\operatorname{Hess} f(x)(v, v) = \sum_{i,j=1}^{n} v^i v^j\, E_i\left(E_j f\right)(x).$$