A quadratically convergent iterative scheme for locating conical degeneracies in the spectra of parametric self-adjoint matrices

01/08/2020
by Gregory Berkolaiko, et al.

A simple iterative scheme is proposed for locating the parameter values for which a 2-parameter family of real symmetric matrices has a double eigenvalue. The convergence is proved to be quadratic. An extension of the scheme to complex Hermitian matrices (with 3 parameters) and to the location of triple eigenvalues (5 parameters for real symmetric matrices) is also described. Algorithm convergence is illustrated in several examples: a real symmetric family, a complex Hermitian family, a family of matrices with an "avoided crossing" (no convergence) and a 5-parameter family of real symmetric matrices with a triple eigenvalue.

1. Introduction

A theorem of von Neumann and Wigner states that, generically, a two-parameter family of real symmetric matrices has multiple eigenvalues at isolated points [22]. In other words, the matrices with multiple eigenvalues have co-dimension 2 in the manifold of real symmetric matrices [1, Appendix 10]. In this paper, we would like to address the problem of locating these isolated points of eigenvalue multiplicity in the 2-dimensional parameter space. To be more precise, we consider the following problem.

Problem.

Given a smooth real symmetric matrix-valued function $A(x)$ of two parameters $x = (x_1, x_2)$, locate the values of the parameters which yield a matrix with degenerate (multiple) eigenvalues.

Example 1.1.

The function

has eigenvalues as shown in Figure 1. Note that the degeneracies occur at isolated points.

For a family of complex Hermitian matrices, the co-dimension of the matrices with multiple eigenvalues is 3. Therefore the analogous question can be posed about locating multiple eigenvalues of a Hermitian family $A(x)$ with $x \in \mathbb{R}^3$. We will formulate an extension of our results to complex Hermitian matrices but will concentrate on the real symmetric case in our proofs.

The problem of locating the points of eigenvalue multiplicity is of practical importance. In condensed matter physics [2] the wave propagation through a periodic medium is studied via the Floquet–Bloch transform [15, 16], which results in a parametric family of self-adjoint operators (or matrices) with discrete spectrum. The eigenvalue surfaces (sheets of the "dispersion relation") may touch, see Fig. 1, which has a profound effect on wave propagation and its sensitivity to a small perturbation of the medium. This touching corresponds precisely to a multiplicity in the eigenvalue spectrum. To give a well-studied example, the unusual electron properties of graphene occur due to the presence of eigenvalue multiplicity [5, 18].

Figure 1. Eigenvalue surfaces corresponding to Example 1.1. There are three conical points; the surfaces appear not to touch at the middle point due to insufficient grid precision.

The question of locating eigenvalue multiplicity in a family of $2\times2$ real symmetric matrices has a straightforward solution (which also illustrates why the co-dimension is 2). The discriminant of $A(x)$ can be written as a sum of two squares,

$\mathrm{disc}(A) = (\lambda_2 - \lambda_1)^2 = (A_{11} - A_{22})^2 + 4 A_{12}^2.$    (1)

By definition, the discriminant is 0 if and only if the two eigenvalues coincide, therefore we have two conditions that must simultaneously be met for the multiplicity to occur:

$A_{11}(x) - A_{22}(x) = 0 \quad\text{and}\quad A_{12}(x) = 0.$    (2)

Unfortunately, for larger matrices the discriminant quickly becomes unwieldy and cannot be used in practical computations. The discriminant can still be written as a sum of squares [13, 17, 19, 6], but the number of terms grows fast with the size of the matrix.
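For the $2\times2$ case, identity (1) is easy to verify symbolically; a quick check (ours, using sympy, not part of the paper):

```python
# Symbolic check of identity (1) for a generic 2x2 real symmetric matrix.
import sympy as sp

a, b, c = sp.symbols('a b c', real=True)
eigs = list(sp.Matrix([[a, b], [b, c]]).eigenvals().keys())
gap_squared = (eigs[0] - eigs[1])**2
print(sp.simplify(gap_squared - ((a - c)**2 + 4*b**2)))  # prints 0
```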

Thus, for an $n \times n$ real symmetric matrix depending on two parameters $x_1$ and $x_2$, there is only one easily computable function, the gap $\lambda_2 - \lambda_1$, whose root in the variables $x_1$ and $x_2$ we are seeking. (Here, without loss of generality, we have assumed that one is interested in the degeneracy $\lambda_1 = \lambda_2$.) However, to apply a standard method with quadratic convergence, such as the Newton–Raphson algorithm, one needs 2 functions for 2 variables.

One can change the basis to make $A(x)$ block-diagonal, with a $2\times2$ block corresponding to the eigenvalues $\lambda_1$ and $\lambda_2$. The existence of this change of basis in a neighborhood of the multiplicity point is assured if $\lambda_1$ and $\lambda_2$ remain bounded away from the rest of the spectrum. However, the new basis will depend on the parameters and is not directly accessible for numerical computations. Despite this obstacle, we will show that a "naive" approach produces equivalently good convergence: one can use a constant eigenvector basis which is recomputed at each point of the Newton–Raphson iteration. More precisely, we establish the following theorem.

Theorem 1.2.

Let $A(x)$ be a real symmetric matrix-valued function which is continuously twice differentiable in each entry, with a non-degenerate conical point (defined below) between $\lambda_1$ and $\lambda_2$ at the parameter point $x^*$. For any $x_n$, define $x_{n+1}$ by

$x_{n+1} = x_n - J^{-1}(x_n) \begin{pmatrix} \lambda_2 - \lambda_1 \\ 0 \end{pmatrix},$    (3)

where

$J(x_n) = \begin{pmatrix} v_2^T A_{x_1} v_2 - v_1^T A_{x_1} v_1 & v_2^T A_{x_2} v_2 - v_1^T A_{x_2} v_1 \\ 2\, v_1^T A_{x_1} v_2 & 2\, v_1^T A_{x_2} v_2 \end{pmatrix},$    (4)

$\lambda_1 \le \lambda_2$ denote the two lowest eigenvalues of $A$ at the point $x_n$, $v_1$ and $v_2$ denote the corresponding eigenvectors, and $A_{x_j} = \partial A / \partial x_j$ is evaluated at $x_n$.

Then there exists an open neighborhood $\mathcal{U}$ of $x^*$ and a constant $C$ such that for all $x_n \in \mathcal{U}$, the corresponding $x_{n+1}$ satisfies the estimate

$\| x_{n+1} - x^* \| \le C\, \| x_n - x^* \|^2.$    (5)
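For concreteness, here is a minimal numerical sketch of one step of the iteration (3)-(4). This is our illustration rather than code from the paper; the callables `A` and `dA` (the matrix family and its two parameter derivatives) are assumed to be supplied by the user.

```python
import numpy as np

def iteration_step(A, dA, x):
    """One step of (3)-(4) towards a conical point between the two
    lowest eigenvalues.

    A  : callable, A(x) -> real symmetric (n, n) ndarray
    dA : callable, dA(x) -> [dA/dx1, dA/dx2], each an (n, n) ndarray
    x  : current guess, ndarray of shape (2,)
    """
    lam, V = np.linalg.eigh(A(x))         # eigenvalues in increasing order
    v1, v2 = V[:, 0], V[:, 1]             # eigenvectors of the coalescing pair
    # Jacobian (4): the first row tracks the eigenvalue gap, the second row
    # (with the factor 2) tracks the rotation of the eigenvectors.
    J = np.array([[v2 @ D @ v2 - v1 @ D @ v1 for D in dA(x)],
                  [2 * (v1 @ D @ v2) for D in dA(x)]])
    f = np.array([lam[1] - lam[0], 0.0])  # right-hand side of (3)
    return x - np.linalg.solve(J, f)
```

In practice one repeats the step until the gap $\lambda_2 - \lambda_1$ falls below a tolerance; by (5), a handful of steps suffices near a non-degenerate conical point.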

Before we prove this theorem in Section 4, we explain in Section 2 the geometrical picture behind the iterative procedure (3) and also point out the main differences between (3) and the Newton–Raphson method in a conventional setting. We also review related literature in Section 2.1 once we introduce the relevant notions. The precise definition and properties of a "non-degenerate conical point" are given in Section 3. Section 5 contains some computational examples.

1.1. Notation

We consider real symmetric matrix-valued functions mapping $\mathbb{R}^2$ to the $n \times n$ matrices, with each entry being continuously twice differentiable. The eigenvalues of the matrix function $A(x)$ are numbered in increasing order, $\lambda_1(x) \le \dots \le \lambda_n(x)$, and without loss of generality we will look for $x^*$ such that $\lambda_1(x^*) = \lambda_2(x^*)$. Naturally, all results apply equally well to any pair of consecutive eigenvalues. We remark that the functions $\lambda_j(x)$ are continuous but not necessarily smooth: the points of eigenvalue multiplicity are typically the points where the eigenvalues involved are not differentiable, see Fig. 1.

For any real symmetric matrix-valued function $A$ and any point $x_0$, we let $\tilde A^{(x_0)}$ denote the representation of $A$ in the eigenvector basis computed at the point $x_0$. That is,

$\tilde A^{(x_0)}(x) = U^T A(x)\, U,$

where $U$ is a fixed orthogonal matrix whose columns are the eigenvectors of $A(x_0)$. The eigenvectors are assumed to be numbered according to the eigenvalue ordering. This means that $\tilde A^{(x_0)}(x)$ is a diagonal matrix at the point $x = x_0$ but not necessarily anywhere else.

We let

$\tilde A_2^{(x_0)}(x)$    (6)

denote the top-left $2\times2$ submatrix of $\tilde A^{(x_0)}(x)$, i.e. the block corresponding to the coalescing eigenvectors. By definition of $\tilde A^{(x_0)}$, we have

$\tilde A_2^{(x_0)}(x_0) = \begin{pmatrix} \lambda_1(x_0) & 0 \\ 0 & \lambda_2(x_0) \end{pmatrix}.$    (7)

We let

$f(M) = \begin{pmatrix} M_{22} - M_{11} \\ 2 M_{12} \end{pmatrix}$    (8)

denote the target function, similar to (2). We stress that $f$ is a function of the matrix $M$; it becomes a function of $x$ once evaluated on $M = \tilde A_2^{(x_0)}(x)$.

Throughout the paper, $\nabla$ will denote the row vector of derivatives taken with respect to the parameters $x_1, x_2$,

$\nabla g = \left( \frac{\partial g}{\partial x_1}, \ \frac{\partial g}{\partial x_2} \right).$

If $g$ is a vector-function, $\nabla g$ is a matrix with 2 columns. We use the notation $\nabla g \big|_{x_0}$ to denote the derivative evaluated at the point $x_0$.

We use the notation $J(x_0)$ to denote the Jacobian of $f\big(\tilde A_2^{(x_0)}(x)\big)$,

$J(x_0) = \begin{pmatrix} v_2^T (\nabla A)\, v_2 - v_1^T (\nabla A)\, v_1 \\ 2\, v_1^T (\nabla A)\, v_2 \end{pmatrix},$    (9)

where $v_1, v_2$ are the eigenvectors of $A(x_0)$ and the derivatives have been evaluated at the point $x_0$. We remark that $J$ in Theorem 1.2 can be calculated as in (9). The factor 2 in the definition of $f$ arises naturally in calculations; it can also be used to put the second-row terms in the more symmetric form $v_1^T (\nabla A)\, v_2 + v_2^T (\nabla A)\, v_1$.

Finally, we remark that by our definitions $f$ and $J$ depend on $\tilde A^{(x_0)}$ only through its $2\times2$ subblock. Therefore, the distinction between $\tilde A^{(x_0)}$ and its subblock, defined in equation (6), will usually be suppressed once we invoke the functions $f$ and $J$.

2. Discussion

2.1. Geometric interpretation

What is described in this paper is a variation of the Newton–Raphson method applied to the objective function $\lambda_2 - \lambda_1$. This is only one condition on two parameters (in the real case), and it leads to an underdetermined Newton–Raphson iteration. In particular, given an initial guess $x_n$, we would like to update our guess to $x_{n+1}$ such that, in the linear approximation,

$(\lambda_2 - \lambda_1)(x_n) + \nabla(\lambda_2 - \lambda_1)\big|_{x_n} \cdot (x_{n+1} - x_n) = 0.$    (10)

However, there is a whole line of points that satisfy this condition, as illustrated in Figure 2.

Figure 2. (a) Conical degeneracy of eigenvalues. (b) Linear approximation of the top eigenvalue about the initial guess. (c) Linear approximation of the bottom eigenvalue about the initial guess. (d) The intersection of the two linear approximations is a line, not a point. We need to use the conical nature of the intersection to determine a unique point to choose as our next guess.

To incorporate our knowledge that the degeneracy occurs at an isolated point, we use a heuristic derived from the Berry phase [10, 3, 21], a phenomenon which underlies the inability to find a smooth diagonalisation around a degeneracy: on a loop in the parameter space around a non-degenerate conical point, a continuous choice of eigenvectors must rotate by $\pi$ mod $2\pi$ (as opposed to $0$ mod $2\pi$).

But if smoothly going in a loop around the degeneracy rotates the eigenvectors, the direction of minimal rotation is a direction towards the point of degeneracy. Let $v_1(x)$, $v_2(x)$ be a smooth choice of normalized eigenvectors around the point $x_n$ (this is possible because $x_n$ is not a point of eigenvalue multiplicity). Then we are looking for the direction in the parameter space in which the eigenvector $v_1$, as a function of $x$, does not rotate in the plane spanned by $v_1$ and $v_2$ (it may still rotate "out of the plane"). This condition can be written as

$v_2^T \big( \nabla v_1 \big|_{x_n} \cdot (x_{n+1} - x_n) \big) = 0.$    (11)

Conditions (10) and (11) together generically (see Sections 3 and 4 for a precise formulation) define a unique point which can be taken as the next step in the iteration. We can solve for it explicitly using the well-known perturbation formulas [4, 14],

$\nabla \lambda_j = v_j^T (\nabla A)\, v_j,$    (12)

$\nabla v_j = \sum_{k \neq j} c_{jk}\, v_k,$    (13)

where

$c_{jk} = \frac{v_k^T (\nabla A)\, v_j}{\lambda_j - \lambda_k}.$    (14)

We stress that in equation (14) the eigenvectors are evaluated at the point $x_n$ and do not depend on $x$.

The tangent-plane condition (10) and the non-rotation condition (11) can now be written succinctly as

$J(x_n)\, (x_{n+1} - x_n) = -\begin{pmatrix} \lambda_2(x_n) - \lambda_1(x_n) \\ 0 \end{pmatrix},$    (15)

or, less succinctly, as the explicit $2\times2$ linear system with the matrix (4), which immediately leads to (3).

The Berry phase also lies at the heart of another set of works devoted to locating points of eigenvalue multiplicity. Pugliese, Dieci and co-authors [20, 8, 7] developed a procedure which uses the Berry phase to grid-search the available space and identify regions with conical points. For the final convergence they used the standard Newton–Raphson method to locate the critical point of the eigenvalue gap. The convergence of this final step is quadratic, as in Theorem 1.2.

In terms of ease of application, coding equation (3) is straightforward, and a lack of convergence of the method also carries information (see Section 5.3). To perform a thorough search of all available space and to locate all conical points, it is preferable to use the methods of [20, 8, 7].
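To illustrate the grid-search idea, here is a rough sketch (our reconstruction of the general principle, not the code of [20, 8, 7]): transport an eigenvector around a closed loop of parameter values; a return value of $-1$ signals a Berry phase of $\pi$, and hence a conical point likely enclosed by the loop.

```python
import numpy as np

def berry_sign(A, loop, k=0):
    """loop: sequence of parameter points tracing a closed loop
    (last point equal to the first); A(x) a real symmetric family."""
    _, V = np.linalg.eigh(A(loop[0]))
    v_first = V[:, k].copy()
    v_prev = v_first
    for x in loop[1:]:
        _, V = np.linalg.eigh(A(x))
        v = V[:, k]
        if v @ v_prev < 0:        # resolve the sign ambiguity continuously
            v = -v
        v_prev = v
    return np.sign(v_prev @ v_first)   # -1: conical point inside the loop
```

The loop must be sampled finely enough that consecutive eigenvectors stay close, and the eigenvalue gap must remain open along the loop.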

2.2. Relation to Newton-Raphson method

Recalling the definition of $\tilde A^{(x_n)}$ and in particular equation (7), we have

$f\big(\tilde A^{(x_n)}(x_n)\big) = \begin{pmatrix} \lambda_2(x_n) - \lambda_1(x_n) \\ 0 \end{pmatrix}.$

This allows us to rewrite equation (15) as

$J(x_n)\,(x_{n+1} - x_n) = -f\big(\tilde A^{(x_n)}(x_n)\big),$

which is the same as a single step of the Newton–Raphson iteration applied to $f\big(\tilde A^{(x_n)}(x)\big)$. In other words, $x_{n+1}$ is chosen to be a solution to

$f\big(\tilde A^{(x_0)}(x_0)\big) + J(x_0)\,(x - x_0) = 0$    (16)

for some fixed $x_0$ (here $x_0 = x_n$).

To understand the difference between our algorithm and a seemingly conventional Newton–Raphson method, we need to revisit the computation of $f\big(\tilde A^{(x_n)}(x)\big)$. It can be viewed as first expressing $A(x)$ in the eigenvector basis computed at the point $x_n$ and then extracting the $2\times2$ subblock of the resulting matrix.

In this notation, the problem of finding the degeneracy is equivalent to finding a point $x$ such that

$f\big(\tilde A^{(x)}(x)\big) = 0.$    (17)

In contrast, solving equation (16) is a first step in finding a point $x$ such that

$f\big(\tilde A^{(x_n)}(x)\big) = 0.$    (18)

Going all the way to find the solution to equation (18) is pointless; this is not the equation we need to solve. Instead, we go one step, computing the first Newton–Raphson approximation $x_{n+1}$, and then update our target equation to

$f\big(\tilde A^{(x_{n+1})}(x)\big) = 0,$

compute the first Newton–Raphson approximation to that equation, and so on.

2.3. Complex Hermitian matrices

Let us now consider a complex Hermitian matrix-valued function $A(x)$. To find a point of eigenvalue multiplicity, we typically need three real parameters (the off-diagonal terms can be complex, and that introduces an additional degree of freedom), which we still denote by $x$.

The conditions can now be written as

$\tilde A_{22} - \tilde A_{11} = 0, \qquad \operatorname{Re} \tilde A_{12} = 0, \qquad \operatorname{Im} \tilde A_{12} = 0,$    (19)

where the target function correspondingly becomes

$f(M) = \begin{pmatrix} M_{22} - M_{11} \\ 2 \operatorname{Re} M_{12} \\ 2 \operatorname{Im} M_{12} \end{pmatrix},$    (20)

with a $3\times3$ Jacobian $J$ defined analogously to (9). One can equivalently use the objective function

$\operatorname{disc}(M) = (M_{11} - M_{22})^2 + 4\, |M_{12}|^2.$    (21)
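A minimal sketch of the corresponding step in the Hermitian case (ours, under the same assumptions as the sketch in Section 1, now with three parameters and three parameter derivatives):

```python
import numpy as np

def hermitian_step(A, dA, x):
    """One step for a Hermitian family A(x), x a 3-vector;
    dA(x) returns the three Hermitian parameter derivatives."""
    lam, V = np.linalg.eigh(A(x))
    v1, v2 = V[:, 0], V[:, 1]
    # 3x3 Jacobian mirroring (19)-(20): gap row, then the real and
    # imaginary parts of the off-diagonal matrix element.
    rows = [[(v2.conj() @ D @ v2 - v1.conj() @ D @ v1).real for D in dA(x)],
            [2 * (v1.conj() @ D @ v2).real for D in dA(x)],
            [2 * (v1.conj() @ D @ v2).imag for D in dA(x)]]
    J = np.array(rows)
    f = np.array([lam[1] - lam[0], 0.0, 0.0])
    return x - np.linalg.solve(J, f)
```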

3. Conical Intersection

Let $x^*$ be a point in the parameter space such that $A(x^*)$ has a double eigenvalue $\lambda^* = \lambda_1(x^*) = \lambda_2(x^*)$. The existence of an eigenvalue multiplicity precludes a smooth diagonalization in a region containing the degeneracy. However, a smooth block diagonalization exists. The standard construction (see, for example, [14, II.4.2 and Remark 4.4 therein]) uses the Riesz projector.

We can choose a contour $\gamma \subset \mathbb{C}$ enclosing $\lambda^*$ and no other point in the spectrum of $A(x^*)$. This property of $\gamma$ must persist for $A(x)$ when $x$ is in a small neighborhood of $x^*$. The Riesz projector

$P(x) = -\frac{1}{2\pi i} \oint_\gamma \big( A(x) - z \big)^{-1} dz$    (22)

projects onto the continuation of the eigenspace of $\lambda_1, \lambda_2$ at $x^*$ [11]. The projector itself is smooth, as the points on the contour are all in the resolvent set of $A(x)$ (and so $A(x) - z$ has a bounded inverse for all $z \in \gamma$). Starting with an arbitrary eigenvector basis at $x^*$, we can obtain a basis at a nearby $x$ by applying the Gram–Schmidt procedure to the projected basis vectors, which preserves smoothness. We can do the same with the orthogonal complement and a complementary basis. To summarize, for some region $\Omega$ with $x^* \in \Omega$, we find a change of basis $\hat U(x)$ such that

$\hat U^T(x)\, A(x)\, \hat U(x) = \begin{pmatrix} B(x) & 0 \\ 0 & C(x) \end{pmatrix},$    (23)

where $B(x)$ is $2\times2$ and $C(x)$ is $(n-2)\times(n-2)$. We can further diagonalise both $B$ and $C$ at any point $x_0$ to obtain

$\hat A(x) = \begin{pmatrix} \hat B(x) & 0 \\ 0 & \hat C(x) \end{pmatrix},$    (24)

where $\hat B$, $\hat C$ differ from $B$, $C$ by a constant orthogonal change of basis, and both are diagonal at the point $x_0$. A stronger result of Hsieh and Sibuya [12], and of Gingold [9], states that such a block-diagonalization exists even for matrices that are not necessarily Hermitian, and for any closed rectangular region that contains an isolated degeneracy.
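Numerically, the Riesz projector (22) can be approximated by quadrature over a circle separating $\lambda^*$ from the rest of the spectrum. A minimal sketch (ours, with an assumed separation radius):

```python
import numpy as np

def riesz_projector(M, lam_star, radius, m=64):
    """Approximate the Riesz projector of the matrix M onto the
    eigenvalues inside the circle |z - lam_star| = radius."""
    n = M.shape[0]
    I = np.eye(n)
    P = np.zeros((n, n), dtype=complex)
    for j in range(m):                        # trapezoidal rule on the circle
        z = lam_star + radius * np.exp(2j * np.pi * j / m)
        dz = 2j * np.pi * (z - lam_star) / m  # z'(theta) * dtheta
        P += np.linalg.solve(M - z * I, I) * dz
    return -P / (2j * np.pi)                  # cf. (22)
```

The trapezoidal rule is spectrally accurate for this periodic integrand, so modest values of `m` already give the projector to near machine precision when the contour is well separated from the spectrum.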

Note that since $B(x^*)$ is a $2\times2$ matrix which has an eigenvalue multiplicity at the point $x^*$, $B(x^*)$ is a multiple of the identity. The eigenvalue multiplicity is detected by the discriminant of $B$, which in the $2\times2$ case is defined as

$\operatorname{disc}(B) = (B_{11} - B_{22})^2 + 4 B_{12}^2.$    (25)

The discriminant achieves its minimum value 0 at the point $x^*$. It is also a twice continuously differentiable function of $x$ and its Hessian is well-defined.

Definition 3.1.

A point of eigenvalue multiplicity $x^*$ is a non-degenerate conical point if the discriminant $\operatorname{disc}(B(x))$ has a non-degenerate minimum at $x^*$.

In other words, there is a positive definite matrix $H$ such that

$\operatorname{disc}(B(x)) = (x - x^*)^T H\, (x - x^*) + O\big( \| x - x^* \|^3 \big),$

and, along any ray originating at $x^*$, the eigenvalues separate at a non-zero linear rate. This picture justifies the use of the term "conical".

Unfortunately, while the existence of $B(x)$ is assured, it is not easily accessible analytically. The following theorem provides a more practical method of checking whether $x^*$ is non-degenerate.

Theorem 3.2.

The Hessian of $\operatorname{disc}(B(x))$ at $x^*$ is given by

$\operatorname{Hess} = 2\, J^T J.$    (26)

Consequently, $x^*$ is a non-degenerate conical point if and only if $\det J \neq 0$.
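In code, the non-degeneracy test amounts to evaluating $\det J$ at the computed point. A small sketch, reusing the hypothetical `A`, `dA` and a converged point `x_star` from the sketch in Section 1:

```python
lam, V = np.linalg.eigh(A(x_star))
v1, v2 = V[:, 0], V[:, 1]
J = np.array([[v2 @ D @ v2 - v1 @ D @ v1 for D in dA(x_star)],
              [2 * (v1 @ D @ v2) for D in dA(x_star)]])
# Theorem 3.2: the Hessian of the discriminant is 2 J^T J, so the conical
# point is non-degenerate iff det J != 0 (up to a numerical tolerance).
print("non-degenerate:", abs(np.linalg.det(J)) > 1e-8)
```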

We remark that it is the same $J$ that appears, inverted, in Theorem 1.2. The condition $\det J \neq 0$ has a nice geometric meaning: it is precisely the condition that the manifold traced by $A(x)$ in the space of real symmetric matrices is transversal to the line of symmetric matrices with repeated eigenvalues.

The choice of basis in the definition of $B$ is assumed to align with the choice of basis used to compute $J$, i.e. the first two columns of $\hat U$ are the eigenvectors used to compute $J$. This choice does not affect the definition of the non-degenerate point because of the following lemma.

Lemma 3.3.

Let $M(x)$ be a $2\times2$ symmetric matrix-valued function of $x$. Then for any orthogonal matrix $S$ there is an orthogonal matrix $R$ such that for all $x$ we have

$f\big( S^T M(x)\, S \big) = R\, f\big( M(x) \big),$    (27)

and therefore

$\big| \det \nabla f\big( S^T M(x)\, S \big) \big| = \big| \det \nabla f\big( M(x) \big) \big|.$    (28)
Proof.

This identity for $2\times2$ matrix-functions can be checked by direct computation, but the details are excessively tedious. Instead we use a more generalizable approach.

We fix an orthogonal $S$ and let $\operatorname{Sym}(2)$ denote the linear space of $2\times2$ real symmetric matrices. The map $f$, see equation (8), acts as a linear transformation from $\operatorname{Sym}(2)$ to $\mathbb{R}^2$. It is obviously onto and has the kernel consisting of multiples of the identity. On the other hand, conjugation by $S$ (namely the map $M \mapsto S^T M S$) is a linear transformation of $\operatorname{Sym}(2)$ to itself. It maps multiples of the identity to themselves and therefore induces a linear transformation from the quotient space $\operatorname{Sym}(2)/\{cI\}$ to itself. This linear transformation, via the isomorphism between $\operatorname{Sym}(2)/\{cI\}$ and $\mathbb{R}^2$, induces a linear transformation on $\mathbb{R}^2$ mapping $f(M)$ to $f(S^T M S)$.

The above can be summarized in a commutative diagram. In other words, for a given orthogonal $S$, there exists a constant matrix $R$ such that

$f\big( S^T M S \big) = R\, f(M).$

From the identity (see (25) for the definition of the discriminant)

$\big\| f(S^T M S) \big\|^2 = \operatorname{disc}(S^T M S) = \operatorname{disc}(M) = \big\| f(M) \big\|^2,$

we conclude that $R$ is orthogonal. Finally, taking derivatives we get (28), since the determinant of an orthogonal matrix is either $1$ or $-1$. ∎

The following identity will be helpful in the proof of Theorem 3.2 and also in Section 4.

Lemma 3.4.

For any $x_0$ and $\hat A$ as in equation (24),

$\nabla f\big( \hat A \big)\Big|_{x_0} = \begin{pmatrix} u_2^T (\nabla A)\, u_2 - u_1^T (\nabla A)\, u_1 \\ 2\, u_1^T (\nabla A)\, u_2 \end{pmatrix} = J(x_0),$    (29)

where $u_1, u_2$ are the first two columns of the matrix $\hat U(x_0)$.

Proof.

We remark that identity (29) is only claimed for the Jacobian evaluated at the point $x_0$, where both blocks of $\hat A$ are diagonal; therefore $u_1$ and $u_2$ are eigenvectors of $A(x_0)$.

For all $x$, the columns $u_1(x), u_2(x)$ are orthonormal, and differentiating the orthonormality relations we get

$u_j^T\, \nabla u_k + \big( \nabla u_j \big)^T u_k = 0.$    (30)

We can now relate the derivatives of $\hat A$ to the derivatives of $A$, using (30) and the diagonality of $\hat A$ at the point $x_0$. The calculation is identical for the $x_2$-derivatives. ∎

Proof of Theorem 3.2.

We write

$\operatorname{disc}(B) = \big\| f(\hat A) \big\|^2$

and note that $f\big( \hat A(x^*) \big) = 0$. The latter observation implies that the product rule for the second derivatives at the point $x^*$ collapses to

$\operatorname{Hess} = 2\, \big( \nabla f \big)^T \big( \nabla f \big) \Big|_{x^*}.$

Therefore the Hessian can be written in terms of the Jacobian of $f$. Finally, setting $x_0 = x^*$ in Lemma 3.4 yields

$\nabla f\big( \hat A \big)\Big|_{x^*} = J(x^*)$    (31)

and concludes the proof of (26). ∎

4. Proof of the main result

Here we restate the procedure used to locate the degeneracy in the notation that has been introduced.

Theorem 4.1.

For a family of matrix functions $M(x)$, define $\Phi$ by

$\Phi(M) = x_0 - J^{-1}(x_0)\, f\big( M(x_0) \big),$    (32)

where $f$ and the Jacobian $J$ are computed from the family $M$ and evaluated at the current point $x_0$. Let $A$ have a non-degenerate conical point at $x^*$ between the eigenvalues $\lambda_1$ and $\lambda_2$. Then there exists an open $\mathcal{U}$ with $x^* \in \mathcal{U}$ and $C > 0$, such that for all $x_0 \in \mathcal{U}$,

$\big\| \Phi\big( \tilde A^{(x_0)} \big) - x^* \big\| \le C\, \| x_0 - x^* \|^2,$    (33)

where the matrix-function $\tilde A^{(x_0)}$ is defined by

$\tilde A^{(x_0)}(x) = U^T A(x)\, U,$    (34)

with $U$ the constant matrix whose columns are the eigenvectors of $A(x_0)$.

We remark that the assumption of non-degeneracy of the conical point is justified, for example, by the fact that any degenerate conical point can be made non-degenerate by a small perturbation of the function $A(x)$.

We recall that the superscript in $\tilde A^{(x_0)}$ refers to the basis, computed at the point $x_0$, in which the matrix is represented. The derivatives of $\tilde A^{(x_0)}$ that are taken to compute $\Phi$ in (32) are also evaluated at the point $x_0$. The result of evaluating $\Phi\big( \tilde A^{(x_0)} \big)$ is explicitly written out in equations (3)-(4).

Proof.

We present a brief outline of the proof, which combines several facts established in the remainder of this section.

Let $\hat A$ be the block-diagonalized matrix function constructed in equations (23)-(24). We will see, in Lemmas 4.2 and 4.4 below, that there is a neighbourhood $\mathcal{U}$ of the conical point $x^*$ and constants $C_1, C_2$ such that for all $x_0 \in \mathcal{U}$ we have

$\big\| \Phi\big( \hat A \big) - x^* \big\| \le C_1\, \| x_0 - x^* \|^2$

and

$\big\| \Phi\big( \tilde A^{(x_0)} \big) - \Phi\big( \hat A \big) \big\| \le C_2\, \| x_0 - x^* \|^2.$

Together, these give us

$\big\| \Phi\big( \tilde A^{(x_0)} \big) - x^* \big\| \le (C_1 + C_2)\, \| x_0 - x^* \|^2,$

as desired. ∎

Now we establish the lemmas used in the proof of Theorem 4.1.

Lemma 4.2.

There exists $\mathcal{U}_1$ with $x^* \in \mathcal{U}_1$ and $C_1 > 0$ such that

$\big\| \Phi\big( \hat A \big) - x^* \big\| \le C_1\, \| x_0 - x^* \|^2$    (35)

when $x_0 \in \mathcal{U}_1$.

Proof.

This is the usual Newton–Raphson method applied to the conical point search for the matrix $\hat A$. For completeness we provide the proof. For the function $f\big( \hat A(x) \big)$, we have the Taylor expansion around the point $x_0$, evaluated at the point $x^*$,

$0 = f\big( \hat A(x^*) \big) = f\big( \hat A(x_0) \big) + \nabla f \cdot (x^* - x_0) + O\big( \| x^* - x_0 \|^2 \big),$

where the constant in $O(\cdot)$ is independent of $x_0$ as long as it is in a neighborhood of $x^*$. The dot denotes the matrix-by-vector multiplication (to distinguish it from the argument of the function $f$).

By assumption $\det J(x^*) \neq 0$, and, by smoothness, we know that $\nabla f$ is boundedly invertible in some region containing $x^*$. Therefore, for the point $x_1 = \Phi\big( \hat A \big)$, or equivalently,

$x_1 - x_0 = -\big( \nabla f \big)^{-1} f\big( \hat A(x_0) \big),$

we have

$\nabla f \cdot (x_1 - x^*) = O\big( \| x^* - x_0 \|^2 \big),$

with the estimate (35) following by inverting $\nabla f$. ∎

Lemma 4.3.

For any $x_0$ and any constant orthogonal matrix $S$, we have

$\Phi\big( S^T M S \big) = \Phi(M).$    (36)
Proof.

Equation (36) follows directly from the definition (32) of the one-step iteration function and Lemma 3.3. ∎

Lemma 4.4.

There exists $\mathcal{U}_2$ with $x^* \in \mathcal{U}_2$ and $C_2 > 0$ such that

$\big\| \Phi\big( \tilde A^{(x_0)} \big) - \Phi\big( \hat A \big) \big\| \le C_2\, \| x_0 - x^* \|^2$    (37)

when $x_0 \in \mathcal{U}_2$.

Proof.

By the assumption that $x^*$ is a non-degenerate conical point and equation (26), we have that $\det J(x^*) \neq 0$, and therefore $J$ has a bounded inverse in a region around $x^*$. By equation (29) we conclude that the Jacobian of $f\big( \hat A(x) \big)$ also has a bounded inverse in some region around $x^*$ where the difference between the two Jacobians is small. We can express the difference of the inverses as

$J_1^{-1} - J_2^{-1} = J_1^{-1} \big( J_2 - J_1 \big) J_2^{-1},$

and so, using the boundedness of the family and its derivatives, we get a bound of the same order on the difference of the inverses.

We also recall that, by the definition of $\tilde A^{(x_0)}$ and $\hat A$, the two matrix functions agree at the point $x_0$ up to a constant orthogonal change of basis.

Finally, abbreviating $d = \| x_0 - x^* \|$, we estimate the difference of the two one-step maps by $C_2\, d^2$. Equation (37) now follows by applying Lemma 4.3 to remove the constant orthogonal change of basis. ∎

5. Examples

5.1. Elements of A are linear in parameters

If the $2\times2$ matrix $A(x)$ is linear in each parameter, we have $f\big( A(x) \big) = L\,x + f_0$, where $L$ is a constant $2\times2$ matrix and $f_0$ a constant vector that depend on the entries of $A$. The eigenvalues of this matrix are

$\lambda_{1,2} = \tfrac12 \operatorname{tr} A(x) \mp \tfrac12 \big\| L\,x + f_0 \big\|,$

so the eigenvalue gap is a cone in the new parameter space. In fact, a simple calculation shows that the degeneracy of the function $A(x) - \tfrac12 \operatorname{tr} A(x)\, I$, which has the same eigenvectors and shifted eigenvalues (and hence the same degeneracy point), can be located using a single step of the above rule: the Newton–Raphson step is exact for affine functions.
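The one-step behaviour is easy to see numerically. The affine family below is our illustration (not an example from the paper); its conical point is at $(0, 0.3)$, and `iteration_step` from the sketch in Section 1 lands on it in a single step, up to rounding.

```python
import numpy as np

def A(x):   # entries affine in (x1, x2); double eigenvalue at (0, 0.3)
    return np.array([[1.0 + x[0], x[1] - 0.3],
                     [x[1] - 0.3, 1.0 - x[0]]])

def dA(x):
    return [np.array([[1.0, 0.0], [0.0, -1.0]]),   # dA/dx1
            np.array([[0.0, 1.0], [1.0, 0.0]])]    # dA/dx2

x1 = iteration_step(A, dA, np.array([0.7, -0.2]))
print(x1)   # approximately [0.0, 0.3] after a single step
```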

5.2. Non-linear examples

Consider the following matrix-function example,

(38)

Since $A(x)$ is a rank-one perturbation of a diagonal matrix, it can be shown that there is a double eigenvalue at a point $x^*$ that is available in closed form. The results of running the algorithm of Theorem 1.2 with random starting points in a rectangle are shown in Figure 3(a).

Figure 3. (a) Logarithm of the distance from the $n$-th iteration to the conical point of the family from equation (38), plotted as a function of $n$; the algorithm saturates at the limit of numerical precision in 3-5 steps. (b) Logarithm of the update norm $\| x_{n+1} - x_n \|$, where $x_n$ is the $n$-th iteration of the algorithm applied to the family given by equation (39). Several independent runs are plotted, each beginning at a random point.
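A driver loop in the spirit of this subsection. Since the explicit matrix (38) is not reproduced in this text, the $3\times3$ family below is an illustrative stand-in (two crossing levels weakly coupled to a third), and `iteration_step` is the sketch from Section 1; near a non-degenerate conical point the printed gap is expected to shrink roughly quadratically until it reaches machine precision.

```python
import numpy as np

def A3(x):   # stand-in family with a conical point near the origin
    return np.array([[ x[0],  x[1], 0.2],
                     [ x[1], -x[0], 0.3],
                     [ 0.2,   0.3,  2.0]])

def dA3(x):
    return [np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]),
            np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])]

x = np.array([0.1, 0.0])          # starting guess near the degeneracy
for n in range(6):
    lam = np.linalg.eigvalsh(A3(x))
    print(f"n={n}  gap={lam[1] - lam[0]:.3e}")
    x = iteration_step(A3, dA3, x)
```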

The complex Hermitian case described in Section 2.3 is demonstrated in Figure 3(b). The matrix

(39)

corresponds to the discrete Laplacian of the graph shown in Figure 4, with the dashed edges carrying a magnetic potential. The third parameter is introduced artificially, and the conical point is found numerically. Since the location of the conical point is not known analytically, the error is estimated using the norms of the updates $\| x_{n+1} - x_n \|$ instead of $\| x_n - x^* \|$. The results of several runs of the algorithm are shown in Figure 3(b).

Figure 4. Graph corresponding to equation (39).

5.3. Avoided crossing

While a non-degenerate conical point is stable under small perturbations of the real symmetric matrix-function