# Extending the Davis-Kahan theorem for comparing eigenvectors of two symmetric matrices II: Computation and Applications

The extended Davis-Kahan theorem makes use of polynomial matrix transformations to produce bounds at least as tight as those of the standard Davis-Kahan theorem. The optimization problem of finding transformation parameters resulting in optimal bounds from the extended Davis-Kahan theorem is presented for affine transformations. It is demonstrated how globally optimal bound values can be computed automatically using fractional programming theory. Two different solution approaches, the Charnes-Cooper transformation and Dinkelbach's algorithm, are reviewed. Our implementation of the extended Davis-Kahan theorem is used to calculate bound values in three significant examples. First, a pairwise comparison is made of the spaces spanned by the eigenvectors of the graph shift operator matrices corresponding to different stochastic blockmodel graphs. Second, our bound is calculated on the distance of the spaces spanned by eigenvectors of the graph shift operators and their corresponding generating matrices in the stochastic blockmodel, and, third, on those of the sample and population covariance matrices in a spiked covariance model. Our extended bound values, using affine transformations, not only outperform the standard Davis-Kahan bounds in all examples where both theorems apply, but also demonstrate good performance in several cases where the standard Davis-Kahan theorem cannot be used.


## I Introduction

In the first part of this work (Paper I) we introduced the extended Davis–Kahan (DK) theorem for comparing two sets of consecutive and corresponding eigenvectors from any two symmetric matrices.

The extension incorporated a polynomial transform of one of the matrices, which allows the eigenvalue structure imposed by the standard DK theorem to be relaxed and then exploited. As a result, the bounds determined by the extended theorem are always at least as tight as those from the standard DK theorem.

Paper I concentrated on the mathematics of the proposed approach. In this second part of the work, we turn our attention to computational issues, and also give some significant examples of applications of our extended DK theorem. The computational aspects are certainly challenging, and in this paper we only give a full discussion for the case of affine (linear) transformations; however, as exemplified by the applications, the affine transformation can be very beneficial.

We classify the problem of finding optimal affine transformation parameters for the extended DK bound as a fractional program. Fractional programming is a less well-known class of optimization problems; for its history and recent advances see [9]. An excellent overview of the solution and implementation of concave-convex fractional programs, the subclass of fractional programs which applies to our optimization problem, is given in [13]. In the wireless communication literature, fractional programming approaches have recently found much use [5, 18, 25, 27].

Once we have chosen a solution method for our optimization problem, we produce bounds in a range of different applications. The matrices considered are graph shift operators corresponding to graphs in a stochastic blockmodel, and covariance matrices in a spiked covariance model, where the principal component analysis algorithm is well motivated. As stated in Paper I, bounding the distance of the spaces spanned by the eigenvectors of the graph shift operators is particularly relevant to the signal processing community, since these eigenvectors form the basis of the much utilised graph Fourier transform.

The remainder of this paper is structured as follows. In Section II we summarise our extended DK theorem proved in Paper I using general polynomial transformations and then specialize to affine transformations. In Section III we present the problem of finding transformation parameters resulting in optimal bound values in our extended affine DK theorem. Furthermore, we prove that a trivial bound on the distance of the spaces spanned by the compared eigenvectors is always matched or outperformed for comparisons of eigenvectors corresponding to either the largest or smallest eigenvalues. In Section IV we introduce concave-convex fractional programs, demonstrate that the problem from Section III can be brought into a fractional programming form, and discuss two solution approaches, the Charnes–Cooper transformation and Dinkelbach's algorithm. Section V presents three significant examples in which our bound is computed: the comparison of the spaces spanned by the eigenvectors of the graph shift operators in Section V-C, of the graph shift operators and their corresponding generating matrices in the stochastic blockmodel in Section V-D, and of the sample and population covariance matrices in a spiked covariance model in Section V-E. Our summary and conclusions are given in Section VI.

## II Problem Summary

### II-A Bounds

Let $\mathbb{V}_{n\times r}$ denote the Stiefel manifold of $n \times r$ matrices with orthonormal columns. Let $\Phi$ and $\Psi$ be $n \times n$ symmetric matrices with eigenvalues $\phi_1, \ldots, \phi_n$ and $\psi_1, \ldots, \psi_n$ and corresponding eigenvectors $w_1, \ldots, w_n$ and $v_1, \ldots, v_n$, respectively. For $0 \le j \le n - r$ let the matrices holding the eigenvectors corresponding to $r$ consecutive eigenvalues of each matrix be denoted by $W_j = [w_{j+1}, \ldots, w_{j+r}] \in \mathbb{V}_{n\times r}$ and $V_j = [v_{j+1}, \ldots, v_{j+r}] \in \mathbb{V}_{n\times r}$, the columns of which span the spaces $\mathcal{W}_j$ and $\mathcal{V}_j$, respectively. Under the stated conditions, it was shown in Paper I that there exists a $Q \in O(r)$ such that

$$\|W_j - V_jQ\|_F \;\le\; c_{n,r}\,\big\|W_jW_j^T(I - V_jV_j^T)\big\|_2 \;\le\; c_{n,r}\,\frac{\|p(\Phi) - \Psi\|_2}{\delta_i}, \qquad (1)$$

where $O(r)$ is the group of $r \times r$ orthogonal matrices, $p$ is a polynomial matrix transformation, $c_{n,r}$ is a constant depending only on $n$ and $r$, and $\delta_1, \delta_2$ are the two values of the DK interval separation parameter corresponding to the two different DK interval choices. The usual or standard DK bounds follow by taking $p$ to be the identity in the second inequality. In Paper I we saw that the first norm is directly related to the Frobenius-norm metric on the pair of subspaces, and the second relates to the corresponding 2-norm metric.

### II-B Affine Transformations

In this paper we focus on affine matrix transformations, $f(\Phi) = c_1\Phi + c_0 I$ with $c_1, c_0 \in \mathbb{R}$, of one of the matrices under comparison (arbitrarily, $\Phi$). So we set $p = f$ and, with $f(\phi_i) = c_1\phi_i + c_0$ as in Paper I, the two DK interval choices take the form $[a_i, b_i]$ with

$$a_1 = \min_{i \in \{j+1,\ldots,j+r\}} f(\phi_i), \quad b_1 = \max_{i\in\{j+1,\ldots,j+r\}} f(\phi_i), \quad \delta_1 = \min\big(\psi_{j+r+1} - b_1,\; a_1 - \psi_j\big); \qquad (2)$$

$$a_2 = \psi_{j+1}, \quad b_2 = \psi_{j+r}, \quad \delta_2 = \min\Big[\min_{i\in A_1} f(\phi_i) - b_2,\; a_2 - \max_{i\in A_2} f(\phi_i)\Big]. \qquad (3)$$

The index sets $A_1$ and $A_2$ are given in Paper I.

A main purpose of this paper is to solve the problem of optimizing the bound on the right-hand side of (1) when $p = f$, i.e., to minimize

$$c_{n,r}\,\frac{\|f(\Phi) - \Psi\|_2}{\delta_i}, \quad i \in \{1, 2\}, \qquad (4)$$

subject to the associated constraints given in Paper I. For an affine transformation, Constraints 2 of Paper I must be applied (Constraints 1 are subsumed). Constraints 2 take the form:

2A. In the case of interval choice (2), let the transformation parameters of $f$ be chosen such that, for given $j$ and $r$,

$$\delta_1 > 0, \qquad (5)$$
$$a_1 - \psi_{j+1} < \delta_1, \qquad (6)$$
$$\psi_{j+r} - b_1 < \delta_1. \qquad (7)$$

2B. For interval choice (3), let the transformation parameters of $f$ be chosen such that, for given $j$ and $r$,

$$\delta_2 > 0, \qquad (8)$$
$$a_2 - \min_{i\in\{j+1,\ldots,j+r\}} f(\phi_i) < \delta_2, \qquad (9)$$
$$\max_{i\in\{j+1,\ldots,j+r\}} f(\phi_i) - b_2 < \delta_2. \qquad (10)$$

Also, as pointed out in Paper I, in addition to the two DK interval choices, which give $\delta_1$ and $\delta_2$ in (4), there are also the possibilities $c_1 > 0$ and $c_1 < 0$ in the affine transform. So there are four different possible values for the DK interval separation, which are

$$\delta_{1,+} = \min\big(\psi_{j+r+1} - c_1\phi_{j+r} - c_0,\; c_1\phi_{j+1} + c_0 - \psi_j\big) \qquad (11)$$
$$\delta_{1,-} = \min\big(\psi_{j+r+1} - c_1\phi_{j+1} - c_0,\; c_1\phi_{j+r} + c_0 - \psi_j\big) \qquad (12)$$
$$\delta_{2,+} = \min\big(c_1\phi_{j+r+1} + c_0 - \psi_{j+r},\; \psi_{j+1} - c_1\phi_j - c_0\big) \qquad (13)$$
$$\delta_{2,-} = \min\big(c_1\phi_j + c_0 - \psi_{j+r},\; \psi_{j+1} - c_1\phi_{j+r+1} - c_0\big) \qquad (14)$$
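For concreteness, these four separations can be computed directly from the two eigenvalue sequences. The helper below is an illustrative sketch, not the paper's code: it assumes both spectra are sorted in ascending order and uses 0-based indices, so the paper's $\phi_{j+1}$ is `phi[j]` and $\psi_{j+r+1}$ is `psi[j+r]`; the compared block must have at least one eigenvalue on each side.

```python
def dk_separations(phi, psi, j, r, c1, c0):
    """The four DK interval separations (11)-(14) for f(x) = c1*x + c0.

    phi, psi: eigenvalues sorted ascending; the compared block has
    0-based indices j..j+r-1, so 1 <= j and j + r <= len(phi) - 1.
    """
    f = lambda x: c1 * x + c0
    # (11): transformed block inside, psi-neighbours outside.
    d1p = min(psi[j + r] - f(phi[j + r - 1]), f(phi[j]) - psi[j - 1])
    # (12): as (11) but with the roles of the block endpoints swapped (c1 < 0).
    d1m = min(psi[j + r] - f(phi[j]), f(phi[j + r - 1]) - psi[j - 1])
    # (13): psi-block inside, transformed phi-neighbours outside.
    d2p = min(f(phi[j + r]) - psi[j + r - 1], psi[j] - f(phi[j - 1]))
    # (14): as (13) with the roles of the phi-neighbours swapped (c1 < 0).
    d2m = min(f(phi[j - 1]) - psi[j + r - 1], psi[j] - f(phi[j + r]))
    return d1p, d1m, d2p, d2m
```

For instance, with `psi` exactly equal to `2 * phi`, the transformation `c1 = 2, c0 = 0` reproduces the eigenvalue gaps of the `psi` spectrum as the separations.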

## III The bound as a numerical optimization problem

In this section we frame the optimization of the affine bounds (4) as constrained optimization problems over the transformation parameters $c_1$ and $c_0$. Solving the optimization problems derived in this section results in the minimal bound under affine transformations on the distance of the spaces spanned by two sets of eigenvectors. Since there are four possibilities for the denominator in (4), namely (11)-(14), we have four optimization subproblems which need to be solved in order to obtain the overall optimal DK bound.

We study the solution of one in detail; the rest follow analogously. In (15) we show the optimization subproblem for interval choice (2) for affine transformations with $c_1 > 0$, i.e., where $\delta_i = \delta_{1,+}$ in (11). Then $a_1 = c_1\phi_{j+1} + c_0$ and $b_1 = c_1\phi_{j+r} + c_0$. (Here $a_1$ and $b_1$ are given in Table I of Paper I, but follow from the preservation of the ordering of the eigenvalues under an affine transform with $c_1 > 0$.) For interval choice (2) we need Constraints 2A, which we apply in (15) via the last three rows of the "s.t." ("subject to") statement. From Corollary 1 of Paper I the objective function to be minimized is $\|c_1\Phi + c_0 I - \Psi\|_2/\delta_{1,+}$. Hence the first subproblem takes the form

$$\min_{c_1, c_0}\; \frac{\|c_1\Phi + c_0 I - \Psi\|_2}{\delta_{1,+}}, \qquad (15)$$
$$\text{s.t.}\quad c_1 > 0, \quad \delta_{1,+} > 0, \quad \delta_{1,+} > \psi_{j+r} - c_1\phi_{j+r} - c_0, \quad \delta_{1,+} > c_1\phi_{j+1} + c_0 - \psi_{j+1}.$$

The remaining three subproblems follow the same structure as (15), where $\delta_{1,+}$ is replaced by (12), (13) or (14), and the values of the transformed spectrum are correspondingly taken from Table I of Paper I.

###### Remark 1.

In the objective function in (15) we omit the constant $c_{n,r}$ present in (1), since it is inconsequential to the minimization. However, solutions of (15) have to be multiplied by $c_{n,r}$ in order to obtain valid bounds.

In Proposition 1 we obtain a trivial upper bound on $\|W_j - V_jQ\|_F$ and demonstrate that, for comparisons of the first $r$ or the last $r$ eigenvectors, solutions of (15) always approximate or improve upon this bound. Without considering the matrix transform, no such guarantees could be given.

###### Proposition 1.

1. Let $W_j$ and $V_j$ each hold $r$ orthonormal columns. Then, for any $j$ and $r$, there exists $Q \in O(r)$ such that,

$$\|W_j - V_jQ\|_F \le c_{n,r}. \qquad (16)$$

2. When comparing spaces spanned by the eigenvectors corresponding to either the $r$ largest or $r$ smallest eigenvalues, the bound produced from (15) always approximates or improves upon (16).

###### Proof:

We begin by proving part 1 of the proposition. From Lemma 2 of Paper I, $\|W_j - V_jQ\|_F \le c_{n,r}\|W_jW_j^T(I - V_jV_j^T)\|_2$.

Now, $W_jW_j^T$ and $I - V_jV_j^T$ are both projectors and hence all their eigenvalues are equal to either 0 or 1 [1, p. 358]. Therefore, by submultiplicativity of the 2-norm,

$$\|W_j - V_jQ\|_F \;\le\; c_{n,r}\,\big\|W_jW_j^T\big\|_2\,\big\|I - V_jV_j^T\big\|_2 \;\le\; c_{n,r}.$$

In Appendix -A we show that, for comparisons of spaces spanned by the eigenvectors corresponding to either the $r$ largest or $r$ smallest eigenvalues, the objective function in (15) approximates 1 as the transformation parameters tend to their extremes. This result holds for all optimization problems of the form (15), i.e., with $\delta_{1,+}$ replaced by $\delta_{1,-}$, $\delta_{2,+}$ or $\delta_{2,-}$. As stated in Remark 1, multiplying the objective function by $c_{n,r}$ yields valid bound values. Hence, there exist large negative or positive values of the transformation parameters such that the cost function in (15) results in bounds approximating (16). For some problems of the form (15), an appropriate choice of transformation parameters may result in a bound smaller than 1, as shown in Section V. Therefore, the bound produced from a solution of (15) always approximates or improves upon (16). ∎
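The projector argument in the proof is easy to check numerically. The snippet below is our own illustration, not the paper's code: it draws two random $r$-dimensional subspaces of $\mathbb{R}^n$ via QR factorisations and verifies that the scaled quantity $\|W_jW_j^T(I - V_jV_j^T)\|_2$ never exceeds the trivial bound of 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 10, 3
# Orthonormal bases of two random r-dimensional subspaces.
W, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
P1 = W @ W.T                # projector onto span(W)
P2 = np.eye(n) - V @ V.T    # projector onto the orthogonal complement of span(V)
# Both projectors have eigenvalues 0 or 1, so the product has 2-norm <= 1.
val = np.linalg.norm(P1 @ P2, 2)
assert 0.0 <= val <= 1.0 + 1e-12
```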

###### Remark 2.

When considering other than the first $r$ or last $r$ eigenvectors, i.e., eigenvector comparisons with $0 < j < n - r$, we observe from Equations (11), (12), (13) and (14) that all four quantities $\delta_{1,+}$, $\delta_{1,-}$, $\delta_{2,+}$ and $\delta_{2,-}$ tend to $-\infty$ as the magnitude of the transformation parameters tends to infinity. Hence, for eigenvector comparisons with $0 < j < n - r$, Constraints 2 are violated in this limit, i.e., the limit lies outside of the feasible set of (15) and its related subproblems. Hence, for comparisons of spaces spanned by eigenvectors other than the first $r$ or last $r$ (ordered by corresponding eigenvalue magnitude), a similar statement to part 2 of Proposition 1 is not guaranteed.

###### Remark 3.

The trivial upper bound in (16) applies for all values of $j$ and $r$. Suppose we are unable to find a set of affine transformation parameters such that $\|c_1\Phi + c_0 I - \Psi\|_2/\delta_i < 1$. This means there exists no affine transformation reducing the distance of the two matrices in the 2-norm to a value less than the separation of the relevant eigenvalues. In this case, the trivial bound in (16) should be used to bound the distance of the spaces spanned by the consecutive eigenvectors of the two matrices, instead of the bound resulting from the solutions of (15) and its related subproblems.

## IV Calculating the bound in practice

### IV-A Fractional Programming

Here we show that the optimization subproblem (15) and its related subproblems can be efficiently solved using fractional programming theory.

Ratio optimization problems are commonly called fractional programs [9]. Hence, the optimization problem (15) of choosing the affine transformation parameters resulting in a minimal bound is a fractional program. We now describe the properties of fractional programs and their solutions. Then we transform (15) to fit the standard class of concave-convex fractional programs and discuss the implementation of its solution.

Firstly, we formally define fractional programs.

###### Definition 1.

[13] A general nonlinear fractional program has the form,

$$\max_x\; \frac{g_1(x)}{g_2(x)}, \quad \text{s.t. } x \in S, \qquad (17)$$

where $g_1, g_2 : \mathbb{R}^n \to \mathbb{R}$, $g_2(x) > 0$, and $S \subseteq \mathbb{R}^n$. Problem (17) is called a concave-convex fractional program if $g_1$ is concave, $g_2$ is convex, and $S$ is a convex set; additionally $g_1(x) \ge 0$ for $x \in S$ is required, unless $g_2$ is affine.

In [9], concave-convex fractional programs are referred to as concave fractional programs. An excellent overview of concave-convex fractional programs is given in [13], with a focus on wireless communication.

###### Remark 4.

For concave-convex fractional programs, a powerful and useful practical result is that any local maximum is a global maximum [9].

When discussing the solution of concave-convex fractional programs the concept of equivalence of optimization problems is essential.

###### Definition 2.

[2, p. 130] define two optimization problems as equivalent if the solution of one problem can be readily obtained given the solution of the other problem and vice versa.

Furthermore, we make use of the standard definition of the feasible set of an optimization problem.

###### Definition 3.

[2, p. 127] The feasible set of an optimization problem is equal to the set of points which satisfy all the constraints of the optimization problem.

### IV-B Creating a Concave-convex Fractional Program

Subproblem (15) can be transformed to fall into the class of concave-convex fractional programs. As pointed out in [21],

$$\max_{x\in S}\left(\frac{g_1(x)}{g_2(x)}\right) = \left[\min_{x\in S}\left(\frac{g_2(x)}{g_1(x)}\right)\right]^{-1}. \qquad (18)$$

Using (18) we find that solving (15) is equivalent to solving,

$$\max_{c_1, c_0}\; \frac{\delta_{1,+}}{\|c_1\Phi + c_0 I - \Psi\|_2}, \qquad (19)$$
$$\text{s.t.}\quad c_1 > 0, \quad \delta_{1,+} > 0, \quad \delta_{1,+} > \psi_{j+r} - c_1\phi_{j+r} - c_0, \quad \delta_{1,+} > c_1\phi_{j+1} + c_0 - \psi_{j+1}.$$

The bound value is found by transforming the solution of (19) according to (18).

We now demonstrate that the subproblems, such as (19), are concave-convex fractional programs.

###### Lemma 1.

The $\delta$'s in Equations (11), (12), (13) and (14) are all concave functions of the transformation parameters $(c_1, c_0)$.

###### Proof:

All the $\delta$'s in Equations (11)–(14) are equal to the minimum of two functions which are affine in the transformation parameters.

Let $g$ and $h$ be affine functions. Then, $-g$ and $-h$ are still affine functions. Affine functions can be thought of as either convex or concave [2, p. 67] and, further, the pointwise maximum of convex functions is convex [2, p. 80]. Hence, $\max(-g, -h)$ is a convex function.

If a function is convex, its negative is concave [2, p. 67]. Therefore, $-\max(-g, -h) = \min(g, h)$ is concave. But $\min(g, h)$ is of the same form as the $\delta$'s; therefore each $\delta$ is concave. ∎

All four subproblems share the denominator $\|c_1\Phi + c_0 I - \Psi\|_2$, which is easily shown to be convex in $(c_1, c_0)$ via the triangle inequality. Furthermore, we require the denominator in (19) to be strictly positive. Since the denominator is not affine, we additionally require the numerator of (19) to be positive on its feasible set; this is ensured by the $\delta$'s being required to be positive on this set. Hence, (19) and the remaining three subproblems are elements of the class of concave-convex fractional programs as in Definition 1.
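Lemma 1 can also be sanity-checked numerically. The sketch below is illustrative only; it uses 0-based indices of ascending eigenvalues (so the paper's $\psi_{j+r+1}$ is `psi[j+r]`) and verifies midpoint concavity of $\delta_{1,+}$, a minimum of two affine functions, on random pairs of parameter vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.sort(rng.standard_normal(8))   # ascending eigenvalues of Phi
psi = np.sort(rng.standard_normal(8))   # ascending eigenvalues of Psi
j, r = 2, 3

def d1p(c):
    """delta_{1,+} of (11) as a function of the parameters c = (c1, c0)."""
    c1, c0 = c
    return min(psi[j + r] - c1 * phi[j + r - 1] - c0,
               c1 * phi[j] + c0 - psi[j - 1])

# Midpoint concavity of a minimum of two affine functions (Lemma 1).
for _ in range(100):
    a, b = rng.standard_normal(2), rng.standard_normal(2)
    assert d1p((a + b) / 2) >= (d1p(a) + d1p(b)) / 2 - 1e-12
```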

### IV-C Solving a Concave-convex Fractional Program

Several general approaches to solving concave-convex fractional programs are presented in [13] and in [25] fractional programming in the context of multiple-ratio problems is discussed.

In Section IV-D, we discuss the parameter-free approach, where an equivalent convex problem is obtained through transformation of the optimization parameters; the transformation used is commonly referred to as the Charnes–Cooper transformation. This transformed problem only needs to be solved once. In [5] the Charnes–Cooper transformation is used in the optimization of the energy spectral efficiency of a communication network, and in [18] the authors show that the maximum likelihood estimate of the steering direction of a signal for radar detection can be found by utilising the Charnes–Cooper transformation of a fractional programming problem.

In Section IV-E, we treat the parametric approach, which introduces an additional parameter $\lambda$ to obtain an equivalent problem that is not jointly convex in the transformation parameters and $\lambda$. The equivalent problem is, however, convex in the transformation parameters for fixed $\lambda$ and monotone in $\lambda$. Therefore, we iteratively solve the convex problem for fixed $\lambda$ and update $\lambda$ using a Newton–Raphson step. This algorithm is credited to Dinkelbach [6]. In [27], minimization of the system outage probability in a communication network using Dinkelbach's algorithm is discussed, using a closed-form solution to the problem at each iteration.

For computational reasons we mainly utilise the parameter-free approach, i.e., the Charnes–Cooper transformation. This follows advice in [22] and [23], who state that the iterative solution via Dinkelbach’s algorithm is only to be preferred over the single Charnes–Cooper transformed problem, if the solution via Dinkelbach exploits the structure of the numerator and denominator of the fractional program which the Charnes–Cooper solution does not. For instance for quadratic fractional programs – fractional programs with a quadratic numerator and denominator and affine constraints – Dinkelbach’s algorithm solves a quadratic program at every iteration, while the Charnes–Cooper transformation yields a concave problem. Therefore, if not many Dinkelbach iterations are necessary for convergence, then Dinkelbach’s algorithm is to be preferred over the Charnes–Cooper approach for quadratic fractional programs. We find that for our problem (15) both the Charnes–Cooper transformation and Dinkelbach’s algorithm solve convex or concave problems. Therefore, we prefer the Charnes–Cooper solution method. However, most importantly, the results of the two different approaches agree in our simulations, as would be anticipated from [13] who showed that the optimality conditions of the two approaches are equivalent, so in theory the results should indeed not vary.

### IV-D The Charnes–Cooper Transformation

Charnes and Cooper [4] proposed a variable transform for linear fractional programs – fractional programs with an affine numerator and denominator and linear constraints. Schaible [20] generalised the transformation to concave-convex fractional programs. A good recent summary of the transformation of concave-convex fractional programs is given in [13]. For the transformation of (19), appropriate transformation parameters are:

$$y_1 = \frac{c_1}{\|c_1\Phi + c_0 I - \Psi\|_2}; \quad y_2 = \frac{c_0}{\|c_1\Phi + c_0 I - \Psi\|_2}; \quad t = \frac{1}{\|c_1\Phi + c_0 I - \Psi\|_2}. \qquad (20)$$

Let

$$\delta_{1,+}(t) = \min\big(t\psi_{j+r+1} - y_1\phi_{j+r} - y_2,\; y_1\phi_{j+1} + y_2 - t\psi_j\big).$$

Transforming (19) using the parameters in (20) we obtain the following convex optimization problem:

$$\max_{y_1, y_2, t}\; \delta_{1,+}(t), \qquad (21)$$
$$\text{s.t.}\quad t > 0, \quad \|y_1\Phi + y_2 I - t\Psi\|_2 \le 1, \quad y_1 > 0, \quad \delta_{1,+}(t) > 0,$$
$$\delta_{1,+}(t) > t\psi_{j+r} - y_1\phi_{j+r} - y_2, \quad \delta_{1,+}(t) > y_1\phi_{j+1} + y_2 - t\psi_{j+1}.$$

In the original proposal of the transformation for linear fractional programs [4] the equality constraint $\|y_1\Phi + y_2 I - t\Psi\|_2 = 1$ was used. This constraint cannot be placed on concave-convex fractional problems, since convex optimization problems can only have linear equality constraints [2, p. 191]. It is proved in [20] that for concave-convex fractional programs the equality constraint and the relaxed constraint $\|y_1\Phi + y_2 I - t\Psi\|_2 \le 1$ are equivalent. Therefore, we work with the relaxed constraint.

Note that (21) is not a linear program, since the constraint $\|y_1\Phi + y_2 I - t\Psi\|_2 \le 1$ contains a non-linear function of the parameters. The constraint is, however, convex; therefore, (21) is a convex optimization problem.

We implement the 4 subproblems using the cvx package in MATLAB [10, 11]. cvx does not accept strict inequalities. We therefore solve relaxed subproblems, where the strict inequalities are relaxed to include their boundaries, and then check whether the obtained solutions satisfy the strict inequalities. We have found this approach to work extremely well in practice with no convergence issues.
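While our implementation uses the cvx package in MATLAB, the variable map (20) itself is easy to illustrate in a few lines. The snippet below is a hypothetical numerical example of our own, not the paper's code: it builds two symmetric matrices, applies (20) at an arbitrary $(c_1, c_0)$, and checks that the transformed point satisfies the norm constraint of (21) with equality and that the original parameters are recovered as $c_1 = y_1/t$ and $c_0 = y_2/t$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
Phi = (B + B.T) / 2                                   # first symmetric matrix
E = rng.standard_normal((n, n))
Psi = 2.0 * Phi + np.eye(n) + 0.1 * (E + E.T) / 2     # second symmetric matrix

c1, c0 = 2.0, 1.0
s = np.linalg.norm(c1 * Phi + c0 * np.eye(n) - Psi, 2)   # the denominator of (19)
y1, y2, t = c1 / s, c0 / s, 1.0 / s                      # transformation (20)

# The transformed point meets the norm constraint of (21) with equality,
# and (c1, c0) are recovered from (y1, y2, t).
assert np.isclose(np.linalg.norm(y1 * Phi + y2 * np.eye(n) - t * Psi, 2), 1.0)
assert np.isclose(y1 / t, c1) and np.isclose(y2 / t, c0)
```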

### IV-E Dinkelbach's Algorithm

Dinkelbach's algorithm was proposed in [6]. Equivalent to (19) is the problem

$$\max_{c_1, c_0}\; \delta_{1,+} - \lambda\|c_1\Phi + c_0 I - \Psi\|_2, \qquad (22)$$
$$\text{s.t.}\quad c_1 > 0, \quad \delta_{1,+} > 0, \quad \delta_{1,+} > \psi_{j+r} - c_1\phi_{j+r} - c_0, \quad \delta_{1,+} > c_1\phi_{j+1} + c_0 - \psi_{j+1}.$$

[6] states that the algorithm can be initialised at a feasible point $(c_1, c_0)$ chosen such that the corresponding $\lambda$ is positive, or at $\lambda = 0$. When we initialise at $\lambda = 0$, any feasible set of transformation parameters can be chosen for the initialisation. Therefore, we choose to always initialise at $\lambda = 0$, with $c_1$ and $c_0$ equal to their respective optima from the Charnes–Cooper algorithm in the corresponding subproblem.

As with the Charnes–Cooper implementation we utilise relaxed subproblems, where the strict inequality constraints are relaxed to include their boundaries. Then we check and report if any of the strict inequality constraints in the Dinkelbach implementation are violated. We have found the solution of Dinkelbach’s algorithm to agree with the solution of the Charnes–Cooper algorithm in all cases we tested.

Dinkelbach’s scheme was extremely useful for checking that our implementation of the Charnes–Cooper scheme was correct, but it offered no advantages over the latter, and was much slower.
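For illustration, a minimal Python sketch of Dinkelbach's scheme for subproblem (15) is given below. It is not the paper's MATLAB/cvx implementation: the inner concave maximisation of (22) is handled by a generic derivative-free solver with an infeasibility penalty rather than a convex-programming package, eigenvalues are 0-based and sorted ascending, and the feasible starting point (the affine map matching the endpoints of the compared eigenvalue block) is our own choice rather than a Charnes–Cooper optimum.

```python
import numpy as np
from scipy.optimize import minimize

def dinkelbach_affine_bound(Phi, Psi, j, r, iters=30, tol=1e-9):
    """Scaled extended DK bound for interval choice (2) with c1 > 0.

    Phi, Psi: symmetric matrices.  The compared eigenvectors have
    0-based indices j..j+r-1 after sorting eigenvalues ascending;
    1 <= j and j + r <= n - 1 is assumed.  Returns (bound, c1, c0).
    """
    phi = np.linalg.eigvalsh(Phi)          # ascending order
    psi = np.linalg.eigvalsh(Psi)
    I = np.eye(len(phi))

    def delta(c):                          # delta_{1,+} of (11)
        c1, c0 = c
        return min(psi[j + r] - c1 * phi[j + r - 1] - c0,
                   c1 * phi[j] + c0 - psi[j - 1])

    def num(c):                            # spectral-norm numerator of (15)
        c1, c0 = c
        return np.linalg.norm(c1 * Phi + c0 * I - Psi, 2)

    def feasible(c):                       # c1 > 0 and Constraints 2A
        c1, c0 = c
        d = delta(c)
        return (c1 > 0 and d > 0
                and d > psi[j + r - 1] - c1 * phi[j + r - 1] - c0
                and d > c1 * phi[j] + c0 - psi[j])

    # Feasible start: affine map sending the endpoints of the compared
    # phi-block onto the corresponding psi values.
    c1 = (psi[j + r - 1] - psi[j]) / (phi[j + r - 1] - phi[j])
    x = np.array([c1, psi[j] - c1 * phi[j]])
    lam = 0.0                              # initialise at lambda = 0
    for _ in range(iters):
        # Inner concave maximisation of (22) via Nelder-Mead with a penalty.
        obj = lambda c, l=lam: -(delta(c) - l * num(c)) if feasible(c) else 1e10
        x = minimize(obj, x, method="Nelder-Mead").x
        new_lam = delta(x) / num(x)        # Newton-Raphson update of lambda
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return num(x) / delta(x), x[0], x[1]   # bound = 1 / lambda at the solution
```

Since any feasible $(c_1, c_0)$ yields a valid bound, the returned value upper-bounds the scaled attained distance even if the inner solver terminates early; by Remark 4, any local maximum it reaches is global.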

## V Visualising the bound values: three examples

Problem (15) and its solution via (19), along with the three related optimization subproblems, can be used to calculate bounds on the distance of the spaces spanned by eigenvectors of any two symmetric matrices satisfying Assumption 2 of Paper I. Therefore, we envisage that the affine transform could contribute tighter bounds in a range of fields where eigenvectors are used. We highlight three such applications. In Section V-C we study our bound on the distance of the spaces spanned by the eigenvectors of the three graph shift operator matrices. Then, in Section V-D we apply the bound to the comparison of the graph shift operator matrices to their respective generating matrices in the stochastic blockmodel. Our final example application, in Section V-E, is in a principal component analysis setting, where we compare the space spanned by the eigenvectors of the sample covariance matrix and its corresponding population covariance matrix in a spiked covariance model.

Throughout Sections V-C and V-D we will generate networks from the stochastic blockmodel, introduced next.

### V-A The Stochastic Blockmodel

The stochastic blockmodel, which is widely used in the networks literature [12, 16, 17], allows us to encode a block structure in a random graph via different probabilities of edges within and between node-blocks. The definition and parametrisation below is adapted from [17].

###### Definition 4.

Consider a graph with node set $\{v_1, \ldots, v_n\}$. Split this node set into $k$ disjoint blocks denoted $B_1, \ldots, B_k$. We encode block membership of the nodes via a membership matrix $M \in \{0,1\}^{n\times k}$, where $M_{i\ell} = 1$ if $v_i \in B_\ell$ and $M_{i\ell} = 0$ otherwise. Finally, we fix the probability of edges between blocks to be constant and collect these probabilities in a probability matrix $P \in [0,1]^{k\times k}$, i.e., for nodes $v_i \in B_\ell$ and $v_j \in B_m$ the probability of an edge between $v_i$ and $v_j$ is equal to $P_{\ell m}$.

Hence, the parameters of the stochastic blockmodel are $M$ and $P$, where the number of nodes $n$ and the number of clusters $k$ are implicitly defined via the dimensions of $M$. We simulate graphs from this model by fixing these parameters and then sampling edges from Bernoulli trials. The Bernoulli parameter of the trial corresponding to the edge connecting $v_i$ to $v_j$ is given by entry $(i, j)$ of the matrix $MPM^T$.

Since our results apply only to symmetric matrices, we work with undirected graphs, which can be derived from a stochastic blockmodel by sampling the upper triangular half of the adjacency matrix from the stochastic blockmodel and then equating the lower triangular part of the adjacency matrix to the transpose of the upper triangle.

Throughout this section we take the matrix of edge probabilities $P$ to be composed of only two values. On the diagonal we have $p_w$, encoding the probability of edges within the different blocks, taken to be the same for all blocks. Off-diagonal we have $p_b$, encoding the probability of edges between nodes in different blocks. For example, for $k = 3$, $P$ takes the form:

$$P = \begin{pmatrix} p_w & p_b & p_b \\ p_b & p_w & p_b \\ p_b & p_b & p_w \end{pmatrix}. \qquad (23)$$
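Sampling from this model is straightforward. The sketch below is our own illustration (the function and parameter names are hypothetical, not from the paper): it builds the membership matrix, the two-valued probability matrix as in (23), samples the upper triangle from Bernoulli trials, and symmetrises, so that the resulting graph is undirected with no self-loops.

```python
import numpy as np

def sample_sbm(block_sizes, p_w, p_b, rng):
    """Sample an undirected SBM adjacency matrix with within-block
    probability p_w and between-block probability p_b, as in (23)."""
    n, k = sum(block_sizes), len(block_sizes)
    M = np.zeros((n, k))                   # membership matrix
    start = 0
    for l, size in enumerate(block_sizes):
        M[start:start + size, l] = 1.0
        start += size
    P = np.full((k, k), p_b)
    np.fill_diagonal(P, p_w)               # two-valued probability matrix
    probs = M @ P @ M.T                    # edge probabilities M P M^T
    upper = np.triu(rng.random((n, n)) < probs, 1)   # Bernoulli trials
    A = upper + upper.T                    # symmetrise; diagonal stays zero
    return A.astype(float)
```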

### V-B Scaling

For affine transforms, from (1) and (4) the bound of interest is the right-hand side of

$$\|W_j - V_jQ\|_F \;\le\; c_{n,r}\,\big\|W_jW_j^T(I - V_jV_j^T)\big\|_2 \;\le\; c_{n,r}\,\frac{\|c_1\Phi + c_0 I - \Psi\|_2}{\delta_i},$$

where $\delta_i$ equals $\delta_{1,+}$, $\delta_{1,-}$, $\delta_{2,+}$ or $\delta_{2,-}$ from Equations (11)-(14), which generate the four different optimization subproblems of the form (19).

In this section we divide all bound values and attained values by $c_{n,r}$, i.e., we consider instead

$$c_{n,r}^{-1}\|W_j - V_jQ\|_F \;\le\; \big\|W_jW_j^T(I - V_jV_j^T)\big\|_2 \;\le\; \frac{\|c_1\Phi + c_0 I - \Psi\|_2}{\delta_i}. \qquad (24)$$

This rescaling has the advantage that, independent of $n$ and $r$, our bound values are on the same relative scale. Furthermore, the trivial bound derived in Proposition 1 corresponds to the upper bound of 1 in all plots, rather than the value $c_{n,r}$, which would vary across the different simulations.

In what follows, the (scaled) attained distance in the Frobenius-norm metric refers to the quantity $c_{n,r}^{-1}\|W_j - V_jQ\|_F$ and the (scaled) attained distance in the 2-norm metric refers to $\|W_jW_j^T(I - V_jV_j^T)\|_2$. (For the first of these we recall from Paper I that, when calculating distances, finding the matrix $Q$ for which the infimum is attained can be avoided by the use of canonical angles.)

### V-C Different Pairs of Graph Shift Operator Matrices

In this section we calculate the bound on the distance of the spaces spanned by the eigenvectors of the graph shift operator matrices. Recall that the largest eigenvalues of the adjacency matrix correspond to the smallest eigenvalues of the Laplacians. Hence, in two of the three presented comparisons we will compare spaces spanned by eigenvectors corresponding to eigenvalues on opposing ends of the eigenvalue spectrum. Therefore, in the majority of cases presented in this section the standard DK Theorem does not apply, while bound values can be obtained via our extended DK Theorem.

Throughout this section we consider stochastic blockmodels in which every block is composed of equally many nodes. The parameters identifying the compared eigenvectors are $j = 1$ and $r = 2$. This choice of $j$ and $r$ is motivated by the fact that the first eigenvector of the Laplacian matrices is a constant vector and is therefore not informative in the recovery of the blocks in the stochastic blockmodel. Hence, we are comparing spaces spanned by eigenvectors corresponding to the second and third largest eigenvalues of the adjacency matrix to those corresponding to the second and third smallest eigenvalues of the two graph Laplacians.

In Figs. 1(a) and (b) we plot the bound and attained values arising from the comparison of the spaces spanned by eigenvectors of the three graph shift operator matrices against the degree extremes of the corresponding graphs. All stochastic blockmodels were simulated with the same parameter values.

In Fig. 1(a) we observe that the bound grows with a growing degree extreme difference for the comparisons of the adjacency matrix with each of the two Laplacians, while the bound remains relatively constant across different degree extremes when the two Laplacians are compared with each other. The bound values for the comparisons of the adjacency matrix with the two Laplacians are very close, differing only by very small amounts. The bound arising from the comparison of the two Laplacians attains much lower values than the bound values arising from the other two comparisons. Hence, using an affine transformation we are able to obtain a very small bound on the difference of the spaces spanned by the eigenvectors of the two Laplacians, which suggests that they are very close. Not only are they close to each other, they also produce extremely similar bounds when individually compared to the eigenvectors of the adjacency matrix.

In Fig. 1(b) we observe that the attained distances of the three comparisons remain rather constant across the different degree extreme differences.

The results in Fig. 1 show that, for the comparison of the two Laplacians, an affine transformation is sufficient to remove the dependence of the bound value on the degree extreme difference. In contrast, the much higher bounds for the comparisons of the eigenvectors of the adjacency matrix with the two Laplacians still depend on the degree extreme difference, and the affine transformation was not sufficient to remove this dependence.

In Fig. 2, we observe the effect of a growing number of network nodes $n$ on our bound. For each value of $n$ we simulated 25 stochastic blockmodels with equal parameters. In all plots the four values of $n$ are displayed on the $x$-axis. The first column of plots in Fig. 2 displays boxplots of the bound values of the three possible pairwise comparisons, while the second and third columns display boxplots of the optimal affine transformation parameters $c_1$ and $c_0$, respectively. The rows of plots show the three pairwise comparisons of the graph shift operators: the adjacency matrix with each of the two Laplacians (first and third rows) and the two Laplacians with each other (second row). Just the 25 samples from each stochastic blockmodel parametrisation were sufficient to reveal general trends in the 9 plots.

From Fig. 2(a), (d) and (g) we immediately see that the bound values decrease as the number of nodes in the stochastic blockmodels grows, indicating that as $n$ grows the differences between the spaces spanned by the eigenvectors of the graph shift operator matrices decrease. From Fig. 2(b), (c), (e), (f), (h) and (i) we observe that the majority of optimal affine transformation parameters depend on $n$. For the comparison in the first row, we find the optimal additive parameter in Fig. 2(c) to grow from roughly 10 for the smallest $n$ to roughly 110 for the largest, while the multiplicative parameter in Fig. 2(b) remains almost constant for all values of $n$. Comparison of the two Laplacian matrices shows both transformation parameters vary slightly with $n$ (Figs. 2(e) and (f)). Finally, the comparison in the third row shows the additive parameter to remain mostly constant with changing $n$ (Fig. 2(i)) while the multiplicative parameter grows with $n$ (Fig. 2(h)). The magnitude by which the transformation parameters change as $n$ grows is clearly quite variable.

We were unable to run the simulation displayed in Fig. 2 for larger values of $n$ within reasonable computation time; the simulation times for the four different values of $n$ grew accordingly. For each value of $n$, 300 convex optimization problems were solved, since each bound on the three possible graph shift operator matrix comparisons was calculated for 25 different stochastic blockmodels per value of $n$, and the calculation of each bound involves the solution of 4 different convex optimization subproblems.

### V-D Graph Shift Operator Matrices and Generating Matrices

In this section we compare (i) spaces spanned by eigenvectors of the graph shift operator matrices to (ii) spaces spanned by eigenvectors of their corresponding generating matrices in the stochastic blockmodel. (The generating matrices are built from $MPM^T$; a correction term in their calculation ensures that our stochastic blockmodels do not have self-loops.) It is natural to compare the spaces spanned by these eigenvectors, since consistency and rate of convergence of different methods, based on the eigenvectors of the graph shift operator matrices in a stochastic blockmodel setting, can be demonstrated [3, 7, 19].

In Figs. 3(a) and (b) the bound and attained values from the comparison of the spaces spanned by the eigenvectors of the three graph shift operator matrices to those of their respective generating matrices are plotted against the degree extremes of the corresponding graphs. In the adjacency matrix comparison, the eigenvectors corresponding to the three largest eigenvalues are compared, while for the Laplacians we concern ourselves with the eigenvectors corresponding to the three smallest eigenvalues. Realisations of a stochastic blockmodel were simulated. In this comparison we included the first eigenvector of the matrices under comparison.

We see in Fig. 3 that the ordering in magnitude of the attained values is reflected in the bound values, with the bounds on the unnormalised Laplacian comparison taking the largest values. In one of the three comparisons, both the attained and the bound values seem to grow with growing degree extreme difference, which is not the case for the other two comparisons.
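The attained distances between spaces spanned by eigenvectors can be computed, for example, as the spectral norm of the difference of the corresponding orthogonal projections, which equals the sine of the largest principal angle between the subspaces. This is one standard choice of subspace distance; the paper's precise metric is defined in Paper I. A minimal sketch:

```python
import numpy as np

# Distance between the spaces spanned by the top-r eigenvectors of two
# symmetric matrices, via the difference of orthogonal projections.
def subspace_distance(S1, S2, r):
    # eigh returns eigenvalues in ascending order; take the top-r eigenvectors
    U1 = np.linalg.eigh(S1)[1][:, -r:]
    U2 = np.linalg.eigh(S2)[1][:, -r:]
    return np.linalg.norm(U1 @ U1.T - U2 @ U2.T, 2)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)); A = (A + A.T) / 2   # symmetric matrix
E = rng.standard_normal((8, 8)) * 0.1; E = (E + E.T) / 2  # small perturbation
d = subspace_distance(A, A + E, 3)
```

Because projections are basis-independent, this distance is unaffected by sign flips or rotations within each eigenspace, and it always lies in [0, 1].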

In Fig. 4 we observe the effect of a growing number of network nodes n on our bound. For each value of n we simulated stochastic blockmodels with fixed parameters. The transformation parameters are roughly centred around the parameters of the identity transformation, i.e., c1 = 1 and c0 = 0. Interestingly, for one of the comparisons the variance of the additive parameter seems to increase with increasing n. For all other displayed values in Fig. 4 we find the variance of the observed values to decrease as the number of nodes in the network grows. As the number of nodes n in the stochastic blockmodel grows, the bound values of all three comparisons decrease.

For the spaces spanned by the leading eigenvectors of the graph shift operator matrices compared to their corresponding generating matrices, Fig. 5 shows the usual DK bounds (Theorem 3 of Paper I), our sharpened bounds (Theorem 5 of Paper I and (24)) and the attained values; all were standardized. Here realisations of a stochastic blockmodel were generated. In all three comparisons, our sharpened bound values improve on the usual DK bound values. In the case of the unnormalised Laplacian, several of the usual DK bound values are greater than 1; therefore, even the trivial bound value of 1 (see Proposition 1 and Section V-B) is tighter than the usual DK bound. In contrast, our bound produces values consistently lower than 1.

In addition to the attained distances of the spaces spanned by the eigenvectors, Fig. 5 also shows the distances in a second metric, which is discussed in Paper I. As expected from Lemma 2 and Theorem 5 in Paper I, we find the attained values to fall between the distances in this metric and our sharpened DK bound values. Of particular interest are the values attained in simulation number 10 in Fig. 5(b), where the distance of the spaces spanned by the eigenvectors comes very close to 1 in this metric and our sharpened bound is very close to tight.

### V-E Sample Covariance and Population Covariance Matrices

Our final example of the application of our sharpened DK bound is in the setting of Principal Component Analysis (PCA). Consider independent, identically distributed samples from a multivariate normal distribution with given mean and covariance matrix Σ, collected as the columns of a data matrix, from which we form the sample covariance matrix. When a low-dimensional structure truly generates the covariance matrix, PCA is the correct tool to recover this low-dimensional space. The standard PCA algorithm maps the data into the space spanned by the eigenvectors corresponding to the largest eigenvalues of the sample covariance matrix. In this setting it is of interest to study the convergence of the space spanned by these leading sample eigenvectors to the leading eigenvectors spanning the true low-dimensional covariance space of Σ. Using a so-called spiked covariance model, we will apply our bound to the spaces spanned by the eigenvectors corresponding to the largest eigenvalues of the sample and population covariance matrices.
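The convergence just described can be illustrated numerically: draw samples from a spiked population covariance, form the sample covariance, and measure the distance between the leading eigenspaces via the difference of orthogonal projections. All dimensions and eigenvalues below are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
p, r = 10, 3
# Population covariance with 3 spiked directions well separated from the
# unit bulk (illustrative spiked covariance model).
Sigma = np.diag([50.0, 40.0, 30.0] + [1.0] * (p - r))

def leading_space_dist(n):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    S = np.cov(X, rowvar=False)                          # sample covariance
    U = np.linalg.eigh(S)[1][:, -r:]                     # top-r sample evecs
    V = np.linalg.eigh(Sigma)[1][:, -r:]                 # top-r population evecs
    return np.linalg.norm(U @ U.T - V @ V.T, 2)

d_small, d_large = leading_space_dist(20), leading_space_dist(2000)
```

With the large eigengap chosen here, the leading sample eigenspace estimates the population eigenspace well once the sample size is large.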

The spiked covariance model was first introduced by [14], who described a phenomenon in real-world data where the largest eigenvalues of the covariance matrix are separated by a large eigengap from the rest of the spectrum. [15] proved consistency of the leading eigenvectors of the sample covariance towards those of the population covariance in the case of zero-mean normally distributed data, under certain conditions on the growth of the largest eigenvalues with growing dimension and sample size. The spiked covariance model has been found to be implied by the factor model [8, 24, 26], which models a multivariate time series as being driven by a few main factors. The factor model, and consequently the spiked covariance model, find application to financial data.

In our parametrisation of the spiked covariance model, M encodes the latent dimension membership (similar to the stochastic blockmodel), P encodes the correlations between latent dimensions, and the number of columns of M determines the dimension of the low-dimensional latent space in which the data are generated. Σ has to be a valid covariance matrix and so must be symmetric positive definite. As for the stochastic blockmodels, we chose P to consist of only two values and take the form given in (23). We define the population covariance matrix as,

 \Sigma = MPM^T + I.

By building our covariance matrix in this way, we obtain a covariance matrix following the spiked covariance model: the leading eigenvalues are significantly larger than the rest of the spectrum ([14, 15]), the latter consisting of eigenvalues all equal to 1 as a result of adding the identity matrix into the covariance structure.
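This spiked structure is easy to verify numerically: since MPMᵀ has rank equal to the number of latent dimensions k, Σ = MPMᵀ + I has exactly k eigenvalues well above 1 and the remaining p − k all equal to 1. The block sizes and entries of P below are illustrative choices.

```python
import numpy as np

p, k = 10, 2
M = np.zeros((p, k))                       # latent dimension membership
M[:5, 0] = 1.0
M[5:, 1] = 1.0
P = np.array([[0.8, 0.1],
              [0.1, 0.6]])                 # symmetric positive definite
Sigma = M @ P @ M.T + np.eye(p)            # spiked covariance model

evals = np.sort(np.linalg.eigvalsh(Sigma)) # ascending eigenvalues
```

Here the bulk of the spectrum equals 1 exactly (up to floating point), while the two spiked eigenvalues sit well above it.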

In Fig. 6(a) we plot our sharpened bound for the spaces spanned by the eigenvectors corresponding to the largest 3 eigenvalues of the sample and population covariance matrices, and in Figs. 6(b) and (c) the corresponding optimal multiplicative and additive transformation parameters, respectively. In Fig. 6(a) we see the bound decreases as the sample size grows. This makes sense: more samples should improve the estimation of the sample covariance matrix, and therefore the distance of the subspaces spanned by the first three eigenvectors of the sample and population covariance matrices should decrease. For the smallest sample sizes we find that in a few cases the bound value 1 is attained. This situation was discussed theoretically in Proposition 1; we do indeed find the bound to converge to 1 with diverging transformation parameters in the worst case in practice. In plots (b) and (c) we find the transformation parameters to converge to the identity transformation as the sample size grows.

The format of Fig. 7 follows that of Fig. 6, with the difference that we keep the sample size fixed at 100 and study the behaviour of our bound as the dimension grows. Interestingly, our bound remains fairly constant over the range of dimensions considered. The transformation parameters hover around the identity transformation, with the uncertainty in the additive parameter increasing as the dimension increases.

Using 25 simulations, Fig. 8 shows that our extended, or sharpened, DK bounds improve upon the usual DK bounds by roughly a factor of 2. In Proposition 1 it was discussed that bound values above 1 are non-informative. In Fig. 8 we observe our bound to fall consistently below 1, improving on the trivial bound, whereas the usual DK bound attains values consistently above 1.

## VI Summary and Conclusions

Paper I discussed the theory of polynomial matrix transformations. Here we have concentrated on affine transformations, which have the advantage of being monotone; hence the largest and smallest transformed eigenvalues in eigenvalue intervals are easily determined, e.g.,

 \min_{i\in\{1,\dots,r\}} f(\phi_i) = \begin{cases} c_1\phi_1 + c_0 & \text{for } c_1 \ge 0,\\ c_1\phi_r + c_0 & \text{for } c_1 < 0. \end{cases}

The overall minimal bound is thus found by considering 2 different optimization problems, where affine transforms with c1 ≥ 0 and c1 < 0 are treated separately.
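The case analysis above can be checked numerically. A minimal sketch, assuming the eigenvalues are given in ascending order ϕ1 ≤ … ≤ ϕr (matching the endpoint choice in the display above):

```python
import numpy as np

# Closed-form minimum of the affine transform f(x) = c1*x + c0 over the
# eigenvalues phi[0] <= ... <= phi[r-1]: monotonicity places the minimum
# at an endpoint determined by the sign of c1.
def affine_min(phi, c1, c0):
    return c1 * phi[0] + c0 if c1 >= 0 else c1 * phi[-1] + c0

phi = np.array([0.5, 1.2, 3.0, 4.7])       # illustrative eigenvalues
for c1, c0 in [(2.0, 1.0), (-0.5, 3.0), (0.0, 7.0)]:
    # agrees with brute-force minimization over all eigenvalues
    assert affine_min(phi, c1, c0) == min(c1 * phi + c0)
```

The analogous maximum swaps the two endpoints, which is what lets each of the 2 sign-restricted optimization problems use fixed endpoint formulas.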

By way of contrast, higher-order transformations are not monotone in general. As a result, it is difficult to classify the resulting optimization problem as falling within a certain class of solvable optimization problems. Hence, we have restricted ourselves here to the practical implementation of affine transformations. We found that in comparisons amongst the graph shift operator matrices, and of graph shift operator matrices with their generating matrices (in a stochastic blockmodel setting), our fractional programming implementation of the affine DK bounds is superior to the standard DK bounds. The same was found when working with the eigenvectors of the sample and population covariance matrices in a spiked covariance model (for which the PCA algorithm is a well-motivated analysis tool).

## Acknowledgment

The work of Johannes Lutzeyer is supported by the EPSRC (UK).

### -a Proof of Part 2 of Proposition 1

In this Appendix we first work out the limit of bound (15) as c0 → ∞ or c0 → −∞ for comparisons involving the first eigenvectors. Then we draw parallels to the comparisons involving the last eigenvectors.

We begin by producing a lower and an upper bound for the cost function in (15). Using the matrix triangle and reverse-triangle inequalities on the numerator of the cost function gives

 \frac{\bigl||c_0|\,\|I\|_2 - \|{-c_1\Phi+\Psi}\|_2\bigr|}{\delta} \le \frac{\|c_1\Phi + c_0 I - \Psi\|_2}{\delta} \le \frac{|c_0|\,\|I\|_2 + \|c_1\Phi-\Psi\|_2}{\delta}. \qquad (25), (26)

But ∥I∥2 = 1, so henceforth this term will be omitted. Here δ is any of δ1,+, δ1,−, δ2,+ and δ2,−, corresponding to the four subproblems which need to be solved in the affine case. In the boundary cases we encounter the issue that the relevant extreme eigenvalues are undefined and therefore, as is done in [28, p. 317], we assign them limiting values. Then, (11), (12), (13) and (14) take the following form,

 \delta_{1,+} = \psi_{r+1} - c_1\phi_r - c_0, \qquad (27)
 \delta_{1,-} = \psi_{r+1} - c_1\phi_1 - c_0, \qquad (28)
 \delta_{2,+} = c_1\phi_{r+1} + c_0 - \psi_r, \qquad (29)
 \delta_{2,-} = \psi_1 - c_1\phi_{r+1} - c_0. \qquad (30)

In order to work out the limit of the lower and upper bound in (25) and (26) as c0 → ∞ or c0 → −∞, we use l'Hôpital's rule. For l'Hôpital's rule to apply we require both the numerator and the denominator of the lower and upper bound to tend to 0 or ∞ in the relevant limit. We find that both numerators and their common denominators do indeed tend to ∞, i.e.,

 \lim_{c_0\to\pm\infty} \bigl||c_0| - \|{-c_1\Phi+\Psi}\|_2\bigr| = \infty, \qquad \lim_{c_0\to\pm\infty} |c_0| + \|c_1\Phi-\Psi\|_2 = \infty,
 \lim_{c_0\to-\infty}\delta_{1,+} = \lim_{c_0\to-\infty}\delta_{1,-} = \lim_{c_0\to\infty}\delta_{2,+} = \lim_{c_0\to-\infty}\delta_{2,-} = \infty.
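This limiting behaviour can be checked numerically: for diagonal Φ and Ψ, the ratio of the cost-function numerator to the subproblem denominator δ1,+ from (27) approaches 1 as c0 → −∞. The example eigenvalues below are illustrative.

```python
import numpy as np

phi = np.array([0.3, 1.0, 2.0, 5.0])   # eigenvalues of Phi (ascending)
psi = np.array([0.1, 0.9, 2.5, 6.0])   # eigenvalues of Psi (ascending)
r = 2
c1 = 1.5

def ratio(c0):
    # cost-function numerator ||c1*Phi + c0*I - Psi||_2 for diagonal matrices
    num = np.linalg.norm(np.diag(c1 * phi + c0 - psi), 2)
    # denominator delta_{1,+} = psi_{r+1} - c1*phi_r - c0 (1-based indices)
    delta = psi[r] - c1 * phi[r - 1] - c0
    return num / delta

ratios = [ratio(c0) for c0 in (-1e2, -1e4, -1e8)]
```

Both numerator and denominator grow like |c0|, so the ratio tends to 1, matching the worst-case behaviour of the bound under diverging transformation parameters noted in Section V-E.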

We start by evaluating the derivative with respect to c0 of the numerator of both the lower and upper bound in (25) and (26), and of their denominators.

Consider firstly the derivative with respect to c0 of ||c0| − ∥−c1Φ+Ψ∥2|. This derivative equals 1 for c0 > 0 with |c0| > ∥−c1Φ+Ψ∥2, equals −1 for c0 > 0 with |c0| < ∥−c1Φ+Ψ∥2, equals −1 for c0 < 0 with |c0| > ∥−c1Φ+Ψ∥2, and equals 1 for c0 < 0 with |c0| < ∥−c1Φ+Ψ∥2.

Next we note that the derivative with respect to c0 of |c0| + ∥c1Φ−Ψ∥2 equals 1 for c0 > 0 and −1 for c0 < 0.

The derivatives of the terms (27), (28), (29) and (30) follow trivially since they are linear functions of c0:

 \frac{\partial}{\partial c_0}\delta_{1,+} = \frac{\partial}{\partial c_0}\delta_{1,-} = \frac{\partial}{\partial c_0}\delta_{2,-} = -1;