Impossibility of dimension reduction in the nuclear norm

Let S_1 (the Schatten–von Neumann trace class) denote the Banach space of all compact linear operators T:ℓ_2→ℓ_2 whose nuclear norm ‖T‖_{S_1}=∑_{j=1}^∞ σ_j(T) is finite, where {σ_j(T)}_{j=1}^∞ are the singular values of T. We prove that for arbitrarily large n∈N there exists a subset C⊆S_1 with |C|=n that cannot be embedded with bi-Lipschitz distortion O(1) into any n^{o(1)}-dimensional linear subspace of S_1. Moreover, C is not even an O(1)-Lipschitz quotient of any subset of any n^{o(1)}-dimensional linear subspace of S_1. Thus, S_1 does not admit a dimension reduction result à la Johnson and Lindenstrauss (1984), which complements the work of Harrow, Montanaro and Short (2011) on the limitations of quantum dimension reduction under the assumption that the embedding into low dimensions is a quantum channel. Such a statement was previously known with S_1 replaced by the Banach space ℓ_1 of absolutely summable sequences via the work of Brinkman and Charikar (2003). In fact, the above set C can be taken to be the same set as the one that Brinkman and Charikar considered, viewed as a collection of diagonal matrices in S_1. The challenge is to demonstrate that C cannot be faithfully realized in an arbitrary low-dimensional subspace of S_1, while Brinkman and Charikar obtained such an assertion only for subspaces of S_1 that consist of diagonal operators (i.e., subspaces of ℓ_1). We establish this by proving that the Markov 2-convexity constant of any finite-dimensional linear subspace X of S_1 is at most a universal constant multiple of √(dim(X)).


1. Introduction

The (bi-Lipschitz) distortion of a metric space in a metric space , which is a numerical quantity that is commonly denoted [60] by or simply if the metrics are clear from the context, is the infimum over those for which there exists (an embedding) and (a scaling factor) such that

(1)

When (1) occurs one says that embeds with (bi-Lipschitz) distortion into .
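For a finite point set and one concrete map, the distortion in the sense of (1) can be computed directly. The following is a minimal sketch, assuming the standard form of (1), namely s·d_M(x,y) ≤ d_N(f(x),f(y)) ≤ D·s·d_M(x,y); the helper names (`distortion`, `cycle_dist`, `corners`) are hypothetical and chosen for illustration only. It gives the distortion of one fixed map, which is only an upper bound on the infimum over all embeddings.

```python
import math
from itertools import combinations

def distortion(points, d_source, d_target, f):
    """Smallest D for which the injective map f satisfies
    s*d_source <= d_target(f(.), f(.)) <= D*s*d_source for some s > 0."""
    ratios = [d_target(f(x), f(y)) / d_source(x, y)
              for x, y in combinations(points, 2)]
    # The best scaling factor is s = min(ratios); then D = max/min.
    return max(ratios) / min(ratios)

def cycle_dist(i, j):
    """Shortest-path metric on the 4-cycle with vertices 0, 1, 2, 3."""
    k = abs(i - j) % 4
    return min(k, 4 - k)

# Hypothetical embedding: the 4-cycle placed on the corners of the unit square.
corners = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}

def l1(u, v):
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def l2(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])

d_l1 = distortion(range(4), cycle_dist, l1, corners.get)  # isometric in l_1
d_l2 = distortion(range(4), cycle_dist, l2, corners.get)  # sqrt(2) in l_2
print(d_l1, d_l2)
```

The same map is an isometry for the ℓ_1 metric on the plane but has distortion √2 for the Euclidean metric, because the two diagonals of the square are shortened relative to the graph distance.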

Following [75], a Banach space is said to admit metric dimension reduction if for every , every subset of size embeds with distortion into some linear subspace of of dimension . Formally, given and , denote by the smallest such that for every with there exists a -dimensional linear subspace of into which embeds with distortion . Using this notation, the above terminology can be rephrased to say that admits metric dimension reduction if there exists for which

(2)

The reason why the specific asymptotic behavior in (2) is singled out here is that, based on previous works some of which are described below, it is a recurring bottleneck in several cases of interest. Also, such behavior is what would be needed in relation to the existence of a nontrivial data structure for approximate nearest neighbor search in , due to the forthcoming work [5].

Footnote 1. Formally, for the purpose of efficient approximate nearest neighbor search one cannot use a dimension reduction statement like (2) as a “black box” without additional information about the low-dimensional embedding itself rather than its mere existence. One would want the embedding to be fast to compute and “data oblivious,” as in the classical Johnson–Lindenstrauss lemma [44]. There is no need to give a precise formulation here because the present article is devoted to ruling out any low-dimensional low-distortion embedding whatsoever.

By fixing any and considering we have . So, the pertinent question is to obtain a bound on that is significantly smaller than . This natural question turns out to be an elusive longstanding goal for all but a few of the classical Banach spaces.

If is a Hilbert space, then (2) holds true, i.e., admits metric dimension reduction. In fact, the influential Johnson–Lindenstrauss lemma [44] asserts the stronger bound

(3)

Footnote 2. We shall use throughout this article the following (standard) asymptotic notation. Given two quantities , the notations and mean that for some universal constant . The notation stands for . If we need to allow for dependence on certain parameters, we indicate this by subscripts. For example, in the presence of an auxiliary parameter , the notation means that , where is allowed to depend only on , and similarly for the notations and .

See [44, 1, 51] and [6, 42] for the implicit dependence on in (3) as and , respectively. By [46] there exists a universal constant and a Banach space that is not isomorphic to a Hilbert space for which . In other words, there exist non-Hilbertian Banach spaces that admit metric dimension reduction (even with a stronger logarithmic guarantee).

By [45] there is a universal constant for which (see [63, 64] for simplifications and improvements). By [64] this is sharp up to the value of (see also [76] for a stronger statement and a different proof). Therefore, does not admit metric dimension reduction.

The case is especially important from the perspectives of both pure mathematics and algorithms. Nevertheless, it required substantial effort to even show that, say, one has for some universal constant : This is achieved in the forthcoming work [4] which obtains the estimate . The question whether admits metric dimension reduction was open for many years, until it was resolved negatively in [17] by showing that there exists a universal constant such that . Indeed, let be the finite subset that is considered in [17] and suppose that is a -dimensional linear subspace into which embeds with distortion . By [91] (the earlier estimates of [88, 14] suffice here), embeds with distortion into . Hence, embeds with distortion into . This implies that for some universal constant by the main result of [17], which gives the stated lower bound on .

Remarkably, despite major efforts over the past three decades, the above quoted results are the entirety of what is known about metric dimension reduction in Banach spaces in terms of the size of the point set; in particular, no nontrivial upper or lower bounds on are currently known when for any . The purpose of the present article is to increase the repertoire of classical Banach spaces which fail to admit metric dimensionality reduction by one more space. Specifically, we will demonstrate that this is so for the Schatten–von Neumann trace class S_1.

S_1 consists of those linear operators T:ℓ_2→ℓ_2 for which ‖T‖_{S_1}=∑_{j=1}^∞ σ_j(T)<∞, where {σ_j(T)}_{j=1}^∞ are the singular values of T; see Section 2 below for background (in particular, ‖·‖_{S_1} is a norm [94] which is sometimes called the nuclear norm).

Footnote 3. Those who prefer to consider the nuclear norm on matrices can do so throughout, since all of our results are equivalent to their matricial counterparts; see Lemma 6 below for a formulation of this (straightforward) statement.

Our main result is the following theorem.

Theorem 1.

There is a universal constant such that for all and .

Since the publication of [17], there was an obvious candidate for an -point subset of that could potentially exhibit the failure of metric dimensionality reduction in . Namely, since is the subspace of diagonal operators in , one could consider the same subset as the one that was used in [17] to rule out metric dimensionality reduction in (see Figure 1 below). The main result of [17] states that this subset does not well-embed into any low-dimensional subspace of all of whose elements are diagonal operators. So, the challenge amounted to strengthening this assertion so as to apply to low-dimensional subspaces of whose elements can be any operator whatsoever.

Footnote 4. One should note here that does not admit a bi-Lipschitz embedding into any space, as follows by combining the corresponding linear result of [65, 83] with a classical differentiation argument [12], or directly by using a bi-Lipschitz invariant that is introduced in the forthcoming work [78].

This is exactly what Theorem 1 achieves, i.e., its contribution is not a construction of a new example but rather proving that the natural guess indeed works. The analogue in of the fact that any finite-dimensional subspace of well-embeds into with is not known (see Section 1.3 below for more on this). Our proof of Theorem 1 circumvents this problem about the linear structure of by taking a different route. As we shall soon explain, this proof actually yields a stronger geometric conclusion (which is new even for dimension reduction in ) that does not follow from the approaches that were used in the literature [17, 53, 3, 87] to treat the setting.

S_1 is of immense importance to mathematics, statistics and physics; it would be unrealistically ambitious to attempt to describe this here, but we shall now briefly indicate some of the multifaceted uses of S_1 in combinatorics and computer science. The nuclear norm of the adjacency matrix of a graph is also called [36] its graph energy; see e.g. the article [80], the monograph [57] and the references therein for many applications which naturally give rise to a variety of algorithmic issues involving nuclear norm computations. The nuclear norm arises in many optimization scenarios, ranging from notable work [18, 19, 85] on matrix completion and other non-convex optimization problems in matrix analysis and numerical linear algebra (e.g. [86, 23]) to differential privacy (e.g. [38, 56]), machine learning (e.g. [37]), signal processing (e.g. [16]), computer vision (e.g. [34, 33]), sketching and data streams (e.g. [59, 58]), and quantum computing (e.g. [96, 39]). In terms of direct relevance to dimension reduction, a natural question would be that of approximate nearest neighbor search in S_1. This was posed explicitly in [2] but resisted attempts to devise nontrivial data structures until the forthcoming work [5]. The above cited work [39] on quantum computing is a direct precursor to the present article. Specifically, in [39] the notion of quantum dimension reduction was introduced with the additional requirement that the nuclear norm-preserving embedding into low dimensions is a quantum channel, and a strong impossibility result was obtained under this assumption (the additional structural information on the embedding makes the older approach of [21] applicable). Theorem 1 (and even more so Theorem 3 below) complements this investigation by ruling out any sufficiently faithful low-dimensional embedding without any further restriction on its structure.

1.1. Quotients of subsets

In what follows, the closed ball of radius centered at a point of a metric space will be denoted . Fix . Following [95, 43, 32, 11], a metric space is said to be an -Lipschitz quotient of a metric space if there is an onto mapping and (a scaling factor) such that

(4)

The second inclusion in (4) is just a rephrasing of the requirement that is Lipschitz, and the first inclusion in (4) means that is “Lipschitzly open.” For Banach spaces (and linear mappings), this definition is the dual of the bi-Lipschitz embedding requirement: given two Banach spaces , a linear mapping has distortion , i.e., for all and some , if and only if its adjoint is an -Lipschitz quotient. For general metric spaces, in lieu of duality one directly defines Lipschitz quotients as above.

In accordance with the Ribe program [73, 8], following insights from Banach space theory a natural way to weaken the notion of bi-Lipschitz embedding into a metric space is to study those metric spaces that are a Lipschitz quotient of a subset of . Quantitatively, given a metric space , denote by the infimum over those for which there exists a subset such that is an -Lipschitz quotient of . The geometric meaning of this concept is elucidated via the following reformulation. Given two nonempty subsets , denote their minimal distance and Hausdorff distance, respectively, as follows.
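For finite subsets, the two distances just named can be computed by brute force. The sketch below assumes the standard definitions (the minimal distance is the smallest distance between a point of one set and a point of the other; the Hausdorff distance is the larger of the two one-sided sup-inf distances); the function names and the example sets on the real line are hypothetical.

```python
def min_distance(A, B, d):
    """Smallest distance between a point of A and a point of B."""
    return min(d(a, b) for a in A for b in B)

def hausdorff_distance(A, B, d):
    """Larger of the two one-sided distances sup_a inf_b d(a, b) and sup_b inf_a d(a, b)."""
    a_to_B = max(min(d(a, b) for b in B) for a in A)
    b_to_A = max(min(d(a, b) for a in A) for b in B)
    return max(a_to_B, b_to_A)

def d_line(x, y):
    """Usual metric on the real line."""
    return abs(x - y)

A = [0.0, 1.0]
B = [3.0, 5.0]
gap = min_distance(A, B, d_line)         # attained by the pair (1, 3)
haus = hausdorff_distance(A, B, d_line)  # attained by the point 5, whose nearest point in A is 1
print(gap, haus)
```

The minimal distance measures how well separated the two sets are, while the Hausdorff distance measures how far one must thicken each set to cover the other; a representation by subsets as in Fact 2 compares exactly these two quantities.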

The following fact is straightforward to check directly from the definitions; see [66, Lemma 6.1].

Fact 2.

Let and be metric spaces. The quantity is equal to the infimum over those that satisfy the following property. One can assign to every a nonempty subset of such that there exists (a scaling factor) for which

(5)

Clearly because if an embedding satisfies (1), then by considering the singleton for every one obtains a collection of subsets that satisfies (5). Hence, the following impossibility result for dimension reduction is stronger than Theorem 1.

Theorem 3.

There is a universal constant with the following property. For every there is an -point subset such that for every and every linear subspace of ,

1.2. Markov convexity

The subtlety of proving results such as Theorem 3, i.e., those that provide limitations on the structure of subsets of quotients, is that one needs to somehow argue that no representation of using arbitrary subsets of can satisfy (5). Note that can be much smaller than , and qualitatively the class of metric spaces that are Lipschitz quotients of subsets of is typically much richer than the class of metric spaces that admit a bi-Lipschitz embedding into ; a striking example of this is Milman’s Quotient of Subspace Theorem [71] that yields markedly stronger guarantees than the classical Dvoretzky theorem [27, 70] (for the purpose of this comparison, it suffices to consider the earlier work [24] that is weaker by a logarithmic factor, or even the bounds on the quotient of subspace problem in [31]; one could also consider here the nonlinear results on quotients of subsets in [66] in comparison to their “subset” counterparts [10]).

Our proof of Theorem 3 (hence also Theorem 1 as a special case) uses the bi-Lipschitz invariant Markov convexity that was introduced in [54] and was shown in [67] to be preserved under Lipschitz quotients. Let be a Markov chain on a state space . Given an integer , denote by the process that equals for time , and evolves independently of (with respect to the same transition probabilities) for time . Following [54], the Markov 2-convexity constant of a metric space , denoted , is the infimum over those such that for every Markov chain on a state space and every we have

(6)
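The auxiliary process that enters (6) can be simulated directly. This is a minimal sketch under two assumptions that are not spelled out in the text above: the copy agrees with the chain up to and including a time k and uses fresh randomness afterwards, and the chain is taken to be a simple random walk on the integers purely for illustration. The names `step` and `coupled_paths` are hypothetical.

```python
import random

def step(state, rng):
    """One transition of the illustrative chain: a simple random walk on the integers."""
    return state + rng.choice((-1, 1))

def coupled_paths(k, horizon, seed=0):
    """Return (X_0..X_horizon, Xtilde_0..Xtilde_horizon), where the second path
    copies the first up to time k and then evolves with separate randomness.
    Requires 0 <= k <= horizon."""
    rng = random.Random(seed)
    x = [0]
    for _ in range(horizon):
        x.append(step(x[-1], rng))

    # Copy the original chain for times 0, 1, ..., k.
    x_tilde = x[: k + 1]
    # After time k, use a separately seeded generator with the same transition rule.
    fresh = random.Random(seed + 1)
    while len(x_tilde) <= horizon:
        x_tilde.append(step(x_tilde[-1], fresh))
    return x, x_tilde

x, x_tilde = coupled_paths(k=3, horizon=10)
print(x)
print(x_tilde)
```

The two paths share their first k+1 states by construction; how far apart they drift afterwards is the kind of quantity that a Markov convexity inequality compares against the increments of the original chain.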

Because (6) involves only pairwise distances, for every metric space and any . Also, by [67] if for some a metric space is an -Lipschitz quotient of , then . As in [67], by combining these facts we see that Markov convexity yields the following obstruction to the existence of Lipschitz quotients from an arbitrary subset of onto , or equivalently the existence of a representation of using subsets of as in (5).

(7)

The following theorem is the key structural contribution of the present article.

Theorem 4.

Every finite-dimensional linear subspace X of S_1 has Markov 2-convexity constant at most a universal constant multiple of √(dim(X)).

We deduce Theorem 3 from Theorem 4 using another result of [67] which shows that there exists a sequence of connected series-parallel graphs such that and , where is equipped with its shortest-path metric. The graphs are known as the Laakso graphs [48, 50], and the corresponding metric spaces are even -doubling (see [40] for the notion of doubling metric spaces; we do not need to use it here). These are not the same graphs as the ones that were used in [17] (though in [52] the Laakso graphs were used as another way to rule out metric dimension reduction in ). The graphs of [17] are the diamond graphs (see Figure 1 for a depiction of and ), which are also series-parallel and one can show that the argument of [67] applies mutatis mutandis to yield the same properties for as those that we stated above for (this is carried out in the forthcoming work [30]). In any case, in [35] it was shown that any connected series-parallel graph (equipped with its shortest path metric) embeds into (hence also into ) with distortion . Let be the image of such an embedding of . If is a finite-dimensional linear subspace of , then by combining Theorem 4 with (7) and the fact [67] that , we see that

(8)

This simplifies to give Theorem 3. As we explained above, the same conclusion holds for the images in that arise from an application of [35] to diamond graphs.

1.3. Comments on the proof of Theorem 4

Having explained the ingredients of the proof of Theorem 1, we shall end this introduction by commenting on the proof of Theorem 4, stating additional consequences of this proof, and discussing limitations of previous methods in this context.

For denote the subspace of that consists of the matrices by (to be extra formal, one can think of these matrices as the top left corner of infinite matrices corresponding to operators on , with all the entries that are not in that corner vanishing). is a subspace of of dimension , so Theorem 4 implies in this special case that . However, it is much simpler to prove this for than to prove the full statement of Theorem 4 for a general subspace of . Indeed, this property of follows from a combination of results of [82, 9, 54]. The conclusion of Theorem 1 is therefore easier in the special case . As we explained above, since by [88, 14, 91] every finite-dimensional subspace of embeds with distortion into for some , for it suffices to prove the impossibility of metric dimension reduction when the target is the special subspace (as done in [17]) rather than a general subspace. However, it remains open whether or not every finite-dimensional subspace of embeds with distortion into for some . It isn’t clear if it is reasonable to expect that such a phenomenon holds in , because the proofs in [88, 14, 91] rely on (substantial) coordinate sampling arguments that seem to be inherently commutative and without a matricial interpretation.

We prove Theorem 4 by showing directly that for every , any finite-dimensional subspace of embeds into with distortion ; see Theorem 12 below. This distortion is when , so Theorem 4 follows from the fact that for every , which can be shown to hold true by combining results of [82, 9, 54]. The above estimate builds on a structural result of [93] that is akin to an important lemma of Lewis [55] in the commutative setting (namely for an space instead of ), in combination with matricial estimates that constitute the bulk of the technical part of our contribution. As an aside, we provide a substantially simpler proof of a slight variant (that suffices for our purposes) of the aforementioned noncommutative Lewis-like lemma of [93] via a quick variational argument.

Remark 5.

The fact (Theorem 12) that any finite-dimensional linear subspace of embeds with distortion into for yields additional useful information beyond the estimate on of Theorem 4. Indeed, it directly implies that the martingale cotype constant of is at most ; the definition of this invariant, which is due to [84], is recalled in Section 2 below. By [7, 69] this implies that the metric Markov cotype constant (see [7, 69] for the relevant definition) of is , which in turn implies improved extension results for -valued Lipschitz functions using [7] as well as improved estimates for -valued nonlinear spectral calculus using [68]. This also yields several improved Littlewood–Paley estimates [97, 62, 41] for -valued functions and improved quantitative differentiation estimates for such functions [41]. Finally, using [49], it yields improved vertical-versus-horizontal Poincaré inequalities for functions on the Heisenberg group that take values in low-dimensional subspaces of (as an aside, it is natural to recall here the very interesting open question whether the Heisenberg group admits a bi-Lipschitz embedding into ; see [79]). We shall not include detailed statements of these applications here because this would result in an exceedingly long digression, but one should note their availability.

The impossibility result for dimension reduction in was proved in [17] using linear programming duality. Different proofs were subsequently found in [53, 3, 87]. Specifically, the proof in [53] was a geometric argument (see also [52] for a variant of the same idea for the Laakso graph), the proof in [3] was a combinatorial argument (though inspired by the linear programming approach of [17]), and the proof in [87] was an information-theoretical argument. Of these proofs, those of [17, 3, 87] rely on the coordinate structure of and do not seem to extend to the noncommutative setting of . The geometric approach of [53] (and its variants in [52, 47]) is more robust and could be used to deduce the impossibility of dimension reduction into for small , but not to obtain the full strength of Theorem 1. Also, we shall now explain why the method of [53] is inherently unsuited for obtaining the impossibility statement of Theorem 3 for quotients of subsets. To this end, we need to describe the bi-Lipschitz invariant that was used in [53] and is dubbed “diamond convexity” in [30]. The iterative construction of the diamond graphs (see Figure 1) replaces each edge in the ’th stage by a quadrilateral , i.e., are the corresponding new edges in . The pair is called a level- anti-edge and the set of all level- anti-edges is denoted .

Figure 1. The diamond graph (on the right) and the Laakso graph (on the left). Both the diamond graphs and the Laakso graphs are defined iteratively as follows, starting with being a single edge. To pass from to , replace each edge of by two parallel paths of length . To pass from to , subdivide each edge of into a path of length , remove the middle two edges in this path, and replace them by two parallel paths of length .
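The iterative constructions in the caption can be sketched in a few lines. The path lengths are missing from the caption as extracted, so this sketch assumes the standard values: for the diamond graphs each edge becomes two parallel paths of length 2, and for the Laakso graphs each edge is subdivided into a path of length 4 whose two middle edges are replaced by two parallel paths of length 2. All function names are hypothetical.

```python
from collections import deque

def next_diamond(edges, n_vertices):
    """Replace every edge {u, v} by two parallel paths u-a-v and u-b-v."""
    new_edges = []
    for u, v in edges:
        a, b = n_vertices, n_vertices + 1
        n_vertices += 2
        new_edges += [(u, a), (a, v), (u, b), (b, v)]
    return new_edges, n_vertices

def next_laakso(edges, n_vertices):
    """Replace every edge {u, v} by u-p, then two parallel paths p-a-q and p-b-q, then q-v."""
    new_edges = []
    for u, v in edges:
        p, a, b, q = range(n_vertices, n_vertices + 4)
        n_vertices += 4
        new_edges += [(u, p), (p, a), (a, q), (p, b), (b, q), (q, v)]
    return new_edges, n_vertices

def build(step, levels):
    """Start from a single edge {0, 1} and apply the replacement rule `levels` times."""
    edges, n = [(0, 1)], 2
    for _ in range(levels):
        edges, n = step(edges, n)
    return edges, n

def distance(edges, n, source, target):
    """Shortest-path distance via breadth-first search."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist[target]

diamond_edges, diamond_n = build(next_diamond, 2)
laakso_edges, laakso_n = build(next_laakso, 2)
print(len(diamond_edges), diamond_n, distance(diamond_edges, diamond_n, 0, 1))
print(len(laakso_edges), laakso_n, distance(laakso_edges, laakso_n, 0, 1))
```

Under these assumptions the number of edges is multiplied by 4 (diamond) or 6 (Laakso) at each stage, while the distance between the two original endpoints is multiplied by 2 or 4, respectively.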

Following [30], the diamond -convexity constant of a metric space , denoted , is the infimum over those such that for every , every satisfies

With this terminology, the proof of [53] derives an upper bound on and contrasts it with , allowing one to deduce that must be large if is small, similarly to the way we used the Markov -convexity constant in (8). But, working with diamond convexity cannot yield impossibility results for quotients of subsets as in Theorem 3, because in [30] it is shown that there exist metric spaces and such that is a Lipschitz quotient of yet and . In other words, in contrast to Markov convexity, diamond -convexity is not preserved under Lipschitz quotients. Thus, working with Markov -convexity as we do here has an advantage over the approach of [53] by yielding Theorem 3 whose statement is new even for .

Acknowledgements. We thank A. Andoni, R. Krauthgamer and M. Mendel for helpful input.

2. Proof of Theorem 4

We shall start by recording for ease of later reference some basic notation and well-known facts about Schatten–von Neumann trace classes that will be used repeatedly in what follows. The standard material that appears below can be found in many texts, including e.g. [26, 13, 90, 20].

Throughout, the Hilbert space will be over the real scalar field . The standard scalar product on is . Given a closed linear subspace of , the orthogonal projection onto will be denoted and the orthogonal complement of will be denoted . The group of orthogonal operators on is denoted and the set of compact operators is denoted . The elements of are characterized as those operators that admit a singular value decomposition, i.e., they can be written as , where and is a diagonal operator (say, relative to the standard coordinate basis of ) with nonnegative entries that tend to . The diagonal entries of are called the singular values of and their decreasing rearrangement is denoted . Note that and , and so (polar decompositions).

Footnote 5. For many purposes it is important to work with complex scalars, and correspondingly complex matrices. However, for the purpose of the ensuing metric results, statements over are equivalent to their complex counterparts.

Given and a symmetric positive semidefinite operator , the power is defined via the usual functional calculus, i.e., if is the singular value decomposition of , then , where is obtained from the operator (which we recall is diagonal with nonnegative entries) by raising each of its entries to the power . In what follows, it will also be very convenient to adhere to (and make frequent use of) the following convention for negative powers of symmetric positive semidefinite operators . If the diagonal of is , then let be the diagonal operator whose ’th diagonal entry equals if and equals if . Then, write . Observe that if is invertible in addition to being symmetric and positive semidefinite, then under this convention coincides with the usual inverse of . But, in general we have , where is the kernel of .
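The convention for negative powers is easiest to see on a diagonal operator, where powers act entrywise. The following sketch covers only that diagonal case (a general symmetric positive semidefinite operator would first be diagonalized); the function name `diag_power` and the sample diagonal are hypothetical.

```python
def diag_power(diagonal, beta):
    """Entrywise power of a nonnegative diagonal: d -> d**beta when d > 0,
    and 0 when d == 0 (this is the convention used for negative beta as well)."""
    return [d ** beta if d > 0 else 0.0 for d in diagonal]

D = [4.0, 1.0, 0.0]               # a singular (non-invertible) diagonal
inv_sqrt = diag_power(D, -0.5)    # [0.5, 1.0, 0.0]
sqrt = diag_power(D, 0.5)         # [2.0, 1.0, 0.0]

# The product is the identity on the nonzero coordinates and zero on the kernel,
# i.e. the orthogonal projection onto the orthogonal complement of the kernel.
product = [a * b for a, b in zip(inv_sqrt, sqrt)]
print(inv_sqrt, sqrt, product)    # product is [1.0, 1.0, 0.0]
```

When no diagonal entry vanishes, the product is the full identity, which matches the remark that the convention agrees with the usual inverse for invertible operators.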

An operator is said to be nuclear if . In this case, the trace of is well defined as . Given , the Schatten–von Neumann trace class is the space of all whose singular values are -summable, in which case one defines by

(9)

When the quantity is the operator norm of . If , then is a norm; the (non-immediate) proof of this fact is a classical theorem of von Neumann [94].

The Schatten–von Neumann norms are invariant under the group , i.e., for all ,

(10)
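For 2 by 2 real matrices the Schatten–von Neumann norms can be evaluated in closed form, which gives a quick numerical check of the invariance (10). The sketch below uses the identities σ_1² + σ_2² = (sum of squared entries) and σ_1σ_2 = |det|, so that (σ_1 ± σ_2)² equals the sum of squared entries ± 2|det|; it assumes that (9) is the usual ℓ_p norm of the singular values, and the function names are hypothetical.

```python
import math

def singular_values_2x2(a, b, c, d):
    """Singular values of [[a, b], [c, d]], largest first."""
    fro2 = a * a + b * b + c * c + d * d             # sigma_1^2 + sigma_2^2
    det = abs(a * d - b * c)                         # sigma_1 * sigma_2
    s_plus = math.sqrt(fro2 + 2 * det)               # sigma_1 + sigma_2
    s_minus = math.sqrt(max(fro2 - 2 * det, 0.0))    # sigma_1 - sigma_2
    return (s_plus + s_minus) / 2, (s_plus - s_minus) / 2

def schatten_norm(matrix, p):
    """Schatten p-norm (p = 1 is the nuclear norm) of a 2 by 2 matrix."""
    (a, b), (c, d) = matrix
    s1, s2 = singular_values_2x2(a, b, c, d)
    return (s1 ** p + s2 ** p) ** (1 / p)

def rotate(matrix, theta):
    """Left-multiply the matrix by the rotation through angle theta (an orthogonal matrix)."""
    (a, b), (c, d) = matrix
    co, si = math.cos(theta), math.sin(theta)
    return ((co * a - si * c, co * b - si * d),
            (si * a + co * c, si * b + co * d))

diagonal = ((3.0, 0.0), (0.0, -4.0))
M = ((1.0, 2.0), (3.0, 4.0))

nuclear_diag = schatten_norm(diagonal, 1)            # |3| + |-4| = 7
nuclear_M = schatten_norm(M, 1)
nuclear_rotated = schatten_norm(rotate(M, 0.7), 1)   # agrees with nuclear_M up to rounding
print(nuclear_diag, nuclear_M, nuclear_rotated)
```

On diagonal matrices the nuclear norm reduces to the ℓ_1 norm of the diagonal, which is the sense in which ℓ_1 sits inside S_1 as the subspace of diagonal operators.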

The von Neumann trace inequality [94] (see also [72]) asserts that every satisfy

(11)

This implies in particular that if is positive semidefinite and , then

(12)

Also, by trace duality (see e.g. [20, Theorem 7.1]), the von Neumann inequality (11) implies the Hölder inequality for Schatten–von Neumann norms (see e.g. [13, Corollary IV.2.6]), which asserts that if satisfy , then for every and we have

(13)

For , the above discussion can be repeated mutatis mutandis with the infinite dimensional Hilbert space replaced by the -dimensional Euclidean space . In this setting, we denote the corresponding Schatten–von Neumann matrix spaces by for every . We shall also use the standard notations and . Some of the ensuing arguments are carried out for linear subspaces of rather than for arbitrary finite-dimensional linear subspaces of . We suspect that this restriction is only a matter of convenience and it could be removed (specifically, in Lemma 8 below), but for our purposes it suffices to treat subspaces of by the following simple and standard lemma (a truncation argument). Since we could not locate a clean reference for this statement, we include its straightforward proof in Remark 14 below.

Lemma 6.

Fix and suppose that is a finite-dimensional linear subspace of . For every there exist an integer and a linear operator such that

(14)

For ease of later reference, we shall record the following general lemma. In it, as well as in the subsequent discussion, the usual PSD partial order is denoted by , i.e., given two symmetric bounded operators the notation means that is positive semidefinite.

Lemma 7.

Let be symmetric positive semidefinite operators such that . Then

Proof.

Fix . Then and therefore the classical Löwner theorem [61] (see e.g. [13, Theorem V.1.9]) asserts that the function is operator monotone on . Hence the assumption implies that , i.e.,

(15)

While need not be invertible, we have . Hence, for every we have

Thus for all , which is the desired conclusion. ∎

Our proof of Theorem 4 relies on Lemma 8 below, which is a useful structural result for subspaces of . In fact, we will only need the case of Lemma 8, but we include its proof for general because this does not require additional effort beyond the special case .

Lemma 8 is a noncommutative analogue of an important classical lemma that was proved by Lewis in [55] for (finite-dimensional linear subspaces of) spaces. The Lewis lemma was extended by Tomczak-Jaegermann in [93] in a different manner to both Banach lattices and Schatten–von Neumann classes. In particular, Theorem 2.3 of [93] states a slightly different noncommutative Lewis-type lemma for when (note that this is proved in [93] only when since only that range is needed in [93]). The variant that is stated in [93] would suffice for our purposes as well, but we include a different proof here because the argument of [93] is significantly more sophisticated than the way we proceed below. As an aside, our proof applies also to the range while the proof in [93] does not because it relies inherently on duality. The need to obtain such a result for spaces when arose in [89], where a new proof of the Lewis lemma was obtained so as to be applicable to these values of (Lewis’ argument in [55] also relied on duality, hence requiring ). This generalization turned out to lead to a simpler approach, and our proof below consists of a noncommutative adaptation of the argument of [89].

Lemma 8 (Lewis-type basis for subspaces of ).

Fix and . Let be a linear subspace of with . Then there exists a basis of such that if we define

(16)

then for all , denoting by the Kronecker delta, we have

(17)

Proof.

Fix an arbitrary basis of . For every matrix define

(18)

and

(19)

Since are linearly independent, . It follows that is equivalent to a norm on . Indeed, and one computes directly that is a Hilbertian semi-norm on . Hence, if we define

(20)

then the set is compact. Therefore the continuous mapping attains its maximum on this set, so fix from now on some such that

Because , we will soon explain that is continuously differentiable on a neighborhood of . Therefore there exists (a Lagrange multiplier) such that . A standard formula for the gradient of the determinant (which follows directly from the cofactor expansion) asserts that . We will also soon compute that

(21)

Therefore, the above Lagrange multiplier identity asserts that for all ,

(22)

Fix , multiply (22) by and sum over , thus arriving at

(23)

By summing (23) over while recalling the definition of the matrix in (19), we see that . Since is positive semidefinite, it follows that . Hence, the assertions of Lemma 8 hold true for .

It remains to verify that is continuously differentiable and the identity (21) holds true; this is a standard exercise in spectral calculus which we include for completeness. Consider the subspace and observe that for every invertible matrix the definitions (18) and (19) of and , respectively, imply that we also have . So, for all the restrictions of to and satisfy and , respectively (the latter assertion is that is invertible). Fix that is strictly smaller than the smallest nonzero eigenvalue of and let be a simply connected open domain that is contained in the complex half-plane and contains all of the nonzero eigenvalues of . By continuity of the mapping and the fact that is invertible, the above reasoning implies that there exists an open neighborhood of such that every is invertible and all of the nonzero eigenvalues of are contained in . Since the function is analytic on , by the Cauchy integral formula (for both this function and its derivative), every satisfies

(24)

and

(25)

where is the identity matrix. It is important to note that (25) respects our convention for negative powers of symmetric positive semidefinite matrices that are not invertible, because the nonzero eigenvalues of any such are contained in , and also so that the Cauchy integral vanishes on the kernel . Recalling (19), the (quadratic) mapping is continuously differentiable, and in fact for every