Spiked covariances and principal components analysis in high-dimensional random effects models

06/25/2018
by   Zhou Fan, et al.
Stanford University

We study principal components analyses in multivariate random and mixed effects linear models, assuming a spherical-plus-spikes structure for the covariance matrix of each random effect. We characterize the behavior of outlier sample eigenvalues and eigenvectors of MANOVA variance components estimators in such models under a high-dimensional asymptotic regime. Our results show that an aliasing phenomenon may occur in high dimensions, in which eigenvalues and eigenvectors of the MANOVA estimate for one variance component may be influenced by the other components. We propose an alternative procedure for estimating the true principal eigenvalues and eigenvectors that asymptotically corrects for this aliasing problem.


1. Introduction

We study multivariate random and mixed effects linear models. As a simple example, consider a twin study measuring $p$ quantitative traits in $n$ individuals, consisting of $I$ pairs of identical twins. We may model the observed traits $y_{ij} \in \mathbb{R}^p$ of individual $j \in \{1, 2\}$ in pair $i = 1, \ldots, I$ as

(1.1)  $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}.$

Here, $\mu \in \mathbb{R}^p$ is a deterministic vector of mean trait values in the population, and $\alpha_i, \varepsilon_{ij} \in \mathbb{R}^p$ are unobserved, independent random vectors with covariances $\Sigma_1$ and $\Sigma_2$, modeling trait variation at the pair and individual levels. Assuming the absence of shared environment, the covariance matrices $\Sigma_1$ and $\Sigma_2$ may be interpreted as the genetic and environmental components of variance.

Since the pioneering work of R. A. Fisher [Fis18], such models have been widely used to decompose the variation of quantitative traits into constituent variance components. Genetic variance is commonly further decomposed into additive, dominance, and epistatic components [Wri35]. Components of environmental variance may be individual-specific or potentially also shared within families or batches of an experimental protocol. In many applications, for example measuring the heritability of traits, predicting evolutionary response to selection, and correcting for confounding variation from experimental procedures, it is of interest to estimate the individual variance components [FM96, LW98, VHW08]. Classically, this may be done by examining the resemblance between relatives in simple classification designs [Fis18, CR48]. In modern genome-wide association studies, where genotypes are observed at a set of genetic markers, this is often done using models which treat contributions of single-nucleotide polymorphisms to polygenic traits as independent and unobserved random effects [YLGV11, ZCS13, MLH15, LTBS15, FBSG15].

These types of mixed effects linear models are often applied in univariate contexts, $p = 1$, to study the genetic basis of individual traits. However, certain questions arising in evolutionary biology require an understanding of the joint variation of multiple, and oftentimes many, quantitative phenotypes [Lan79, LA83, Hou10, HGO10]. In such multivariate contexts, it is often natural to interpret the covariance matrices of the variance components in terms of their principal component decompositions [Blo07, BM15]. For example, the largest eigenvalues and effective rank of the additive genetic component of covariance indicate the extent to which evolutionary response to natural selection is genetically constrained to a lower dimensional phenotypic subspace, and the principal eigenvectors indicate likely directions of phenotypic response [MH05, HB06, WB09, HMB14, BAC15]. Similar interpretations apply to the spectral structure of variance components that capture variation due to genetic mutation [MAB15]. In studies involving gene-expression phenotypes, trait dimensionality in the several thousands is common [MCM14, CMA18].

Figure 1. Eigenvalues and principal eigenvector of the MANOVA and REML estimates of $\Sigma_1$ in a one-way design. The true group covariance $\Sigma_1$ is of rank one, and the true error covariance $\Sigma_2$ consists of a single partially aligned spike plus isotropic noise. Histograms display eigenvalues averaged across 100 simulations. The rightmost plot displays the empirical mean and 90% ellipsoids for the first two coordinates of the unit-norm principal eigenvector (MANOVA in red and REML in blue), with the principal eigenvectors of $\Sigma_1$ and $\Sigma_2$ shown in black.

Motivated by these applications, we study in this work the spectral behavior of variance components estimates when the number of traits $p$ is large. To illustrate some of the problems that may arise, Figure 1 depicts the eigenvalues and principal eigenvector of the multivariate analysis of variance (MANOVA) [SR74, SCM09] and multivariate restricted maximum likelihood (REML) [KP69, Mey91] estimates of $\Sigma_1$ in the balanced one-way model (1.1). REML estimates were computed by the post-processing procedure described in [Ame85]. In this example, the true group covariance $\Sigma_1$ has rank one, representing a single direction of variation. The true error covariance $\Sigma_2$ also represents a single direction of variation which is partially aligned with that of $\Sigma_1$, plus additional isotropic noise. Partial alignment of eigenvectors of $\Sigma_2$ with those of $\Sigma_1$ may be common, for example, in sibling designs where the additive genetic covariance contributes both to $\Sigma_1$ and $\Sigma_2$. We observe several problematic phenomena concerning either the MANOVA or REML estimate $\hat\Sigma_1$ (illustrated numerically in the simulation sketch following this list):

Eigenvalue dispersion. The eigenvalues of $\hat\Sigma_1$ are widely dispersed, even though all but one true eigenvalue of $\Sigma_1$ are zero.

Eigenvalue aliasing. The estimate $\hat\Sigma_1$ exhibits multiple outlier eigenvalues which indicate significant directions of variation, even though the true matrix $\Sigma_1$ has rank one.

Eigenvalue bias. The largest eigenvalue of $\hat\Sigma_1$ is biased upwards from the true nonzero eigenvalue of $\Sigma_1$.

Eigenvector aliasing. The principal eigenvector of $\hat\Sigma_1$ is not aligned with the true principal eigenvector of $\Sigma_1$, but rather is biased in the direction of the principal eigenvector of $\Sigma_2$.
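To make these four phenomena concrete, here is a minimal simulation sketch in numpy for the balanced one-way design, computing the classical MANOVA estimate of $\Sigma_1$ from between-group and within-group mean squares. The dimensions, spike sizes, and the alignment between the two spike directions are illustrative assumptions chosen to mimic the qualitative setup of Figure 1, not its exact parameters.

    import numpy as np

    rng = np.random.default_rng(0)
    I, J, p = 150, 2, 100                    # I groups of size J, p traits (illustrative)
    n = I * J

    # Rank-one group covariance Sigma1 = 4 v1 v1'; error covariance Sigma2 = Id + 4 v2 v2',
    # with spike directions partially aligned: <v1, v2> = 1/sqrt(2).
    v1 = np.zeros(p); v1[0] = 1.0
    v2 = np.zeros(p); v2[:2] = 1.0 / np.sqrt(2.0)

    # Sample the one-way model y_ij = alpha_i + eps_ij (taking mu = 0).
    alpha = 2.0 * rng.standard_normal((I, 1)) * v1                 # Cov = 4 v1 v1'
    eps = (rng.standard_normal((I, J, p))
           + 2.0 * rng.standard_normal((I, J, 1)) * v2)            # Cov = Id + 4 v2 v2'
    Y = (alpha[:, None, :] + eps).reshape(n, p)

    # Classical MANOVA estimate of Sigma1 from mean squares.
    Ybar = Y.reshape(I, J, p).mean(axis=1)                         # group means
    MSb = J * (Ybar - Ybar.mean(0)).T @ (Ybar - Ybar.mean(0)) / (I - 1)
    R = (Y.reshape(I, J, p) - Ybar[:, None, :]).reshape(n, p)
    MSw = R.T @ R / (n - I)
    Sigma1_hat = (MSb - MSw) / J

    evals, evecs = np.linalg.eigh(Sigma1_hat)
    print("largest eigenvalues:", np.round(evals[-3:][::-1], 2))   # dispersion, bias, extra outlier
    vhat = evecs[:, -1]
    print("|<vhat, v1>| =", round(abs(vhat @ v1), 3))              # imperfect alignment with v1
    print("|<vhat, v2>| =", round(abs(vhat @ v2), 3))              # tilt toward Sigma2's spike

A single run already shows many nonzero eigenvalues for a rank-one $\Sigma_1$, an upward-biased top eigenvalue, and a principal eigenvector tilted toward $v_2$.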

Several eigenvalue shrinkage and rank-reduced estimation procedures have been proposed to address some of these shortcomings, with associated simulation studies of their performance in low-to-moderate dimensions [HH81, KM04, MK05, MK08, MK10]. In this work, we will focus on higher-dimensional applications and study these phenomena theoretically and from an asymptotic viewpoint.

We focus on MANOVA-type estimators, with particular attention to balanced classification designs where such estimators are canonically defined (cf. Section 5). We leave the study of REML and likelihood-based estimation as an important avenue for future work. We consider the asymptotic regime where $n$ and $p$ grow proportionally, and the number of realizations of each random effect also increases proportionally with $n$ and $p$. In this setting, the dispersion of sample eigenvalues was studied in [FJ16]. We study here the latter three phenomena above, under the simplifying assumption of a spiked covariance model in which the noise exhibited by each random effect is isotropic [Joh01]. It was observed in [FJ17] that in this isotropic setting, the equations describing eigenvalue dispersion reduce to the Marcenko-Pastur equation [MP67], and we review this in Section 2.

In Section 3, we provide a probabilistic characterization of the behavior of outlier eigenvalues and eigenvectors. We show that in the presence of high-dimensional noise, each outlier eigenvalue of a MANOVA estimate is close to an eigenvalue $\mu$ of a certain surrogate matrix, which is a linear combination of the different population variance components. When $\mu$ is an isolated eigenvalue, we show furthermore that it exhibits asymptotic Gaussian fluctuations on the $n^{-1/2}$ scale, and its corresponding eigenvector is partially aligned with the corresponding eigenvector of this surrogate.

These results describe quantitatively the aliasing phenomena exhibited in Figure 1—eigenvalues and eigenvectors of the MANOVA estimate for one variance component may be influenced by the other components. In Section 4, we propose a new procedure for estimating the true principal eigenvalues and eigenvectors of a single variance component, by identifying alternative matrices in the linear span of the classical MANOVA estimates for which the surrogate matrix depends only on the single component being estimated. We prove theoretically that the resulting eigenvalue estimates are consistent in the high-dimensional asymptotic regime. The eigenvector estimates remain inconsistent due to the high-dimensional noise, but we show that they are asymptotically free of aliasing effects. We provide finite-sample simulations of the performance of this algorithm in the one-way design (1.1) for moderately large $n$ and $p$.

Proofs are contained in Section 6 and Appendix A. Our probabilistic results are analogous to those regarding outlier eigenvalues and eigenvectors for the spiked sample covariance model, studied in [BBP05, BS06, Pau07, Nad08, BY08], and our proofs use the matrix perturbation approach of [Pau07] which is similar also to the approaches of [BGN11, BGGM11, BY12]. An extra ingredient needed in our proof is a deterministic approximation for arbitrary linear and quadratic functions of entries of the resolvent in the Marcenko-Pastur model. We establish this for spectral arguments separated from the limiting support, building on the local laws for this setting in [BEK14, KY17] and using a fluctuation averaging idea inspired by [EYY11, EYY12, EKYY13a, EKYY13b]. We note that new qualitative phenomena emerge in our model which are not present in the setting of spiked sample covariance matrices—outliers may depend on the alignments between population spike eigenvectors in different variance components, and a single spike may generate multiple outliers. This latter phenomenon was observed in a different context in [BBC17], which studied sums and products of independent unitarily invariant matrices in spiked settings. We discuss two points of contact between our results and those of [BGN11] and [BBC17] in Examples 3.6 and 3.7.

Notational conventions

For a square matrix $A$, $\operatorname{spec}(A)$ is its multiset of eigenvalues (counting multiplicity). For a law $\mu$ on $\mathbb{R}$, we denote its closed support by $\operatorname{supp}(\mu)$.

$e_i$ is the standard basis vector, $\operatorname{Id}$ is the identity matrix, and $\mathbf{1}$ is the all-1's column vector, where dimensions are understood from context. We use $\operatorname{Id}_n$ and $\mathbf{1}_n$ to explicitly emphasize the dimension $n$.

$\|\cdot\|$ is the Euclidean norm for vectors and the Euclidean operator norm for matrices. $\|\cdot\|_{\mathrm{HS}}$ is the matrix Hilbert-Schmidt norm.

$\otimes$ is the matrix tensor product. When $Y$ and $M$ are matrices, $Y \sim \mathcal{N}(M, \Omega)$ is shorthand for $\operatorname{vec}(Y) \sim \mathcal{N}(\operatorname{vec}(M), \Omega)$, where $\operatorname{vec}(Y)$ and $\operatorname{vec}(M)$ are the row-wise vectorizations of $Y$ and $M$. $A^T$ is the transpose of $A$, $\operatorname{col}(A)$ is its column span, and $\ker(A)$ is its kernel or null space.

For subspaces $V \subseteq W$, $\dim(V)$ is the dimension of $V$, $\oplus$ is the orthogonal direct sum, and $W \ominus V$ is the orthogonal complement of $V$ in $W$.

For $z \in \mathbb{C}$, we typically write $z = E + i\eta$ where $E = \operatorname{Re} z$ and $\eta = \operatorname{Im} z$. For $x \in \mathbb{R}$ and $S \subseteq \mathbb{R}$, $\operatorname{dist}(x, S)$ is the distance from $x$ to $S$.

Acknowledgments

We thank quantitative geneticist Mark W. Blows for introducing us to this problem and its applications in evolutionary biology. ZF was supported by a Hertz Foundation Fellowship. IMJ was supported in part by grants NIH R01 EB001988 and NSF DMS 1407813. YS was supported by a Junior Fellow award from the Simons Foundation and NSF Grant DMS-1701654. ZF and YS would like to acknowledge the Park City Mathematics Institute (NSF grant DMS:1441467) where part of this research was conducted.

2. Model

We consider observations $y_1, \ldots, y_n \in \mathbb{R}^p$ of $p$ traits in $n$ individuals, stacked as the rows of $Y \in \mathbb{R}^{n \times p}$ and modeled by a Gaussian mixed effects linear model

(2.1)  $Y = X\beta + U_1\alpha_1 + \cdots + U_k\alpha_k.$

The matrices $\alpha_1, \ldots, \alpha_k$ are independent, with each matrix $\alpha_r \in \mathbb{R}^{m_r \times p}$ having independent rows, representing $m_r$ (unobserved) realizations of a $p$-dimensional random effect with distribution $\mathcal{N}(0, \Sigma_r)$. The incidence matrix $U_r \in \mathbb{R}^{n \times m_r}$, which is known from the experimental protocol, determines how the random effect contributes to the observations $Y$. The first term $X\beta$ models possible additional fixed effects, where $X$ is a known design matrix of regressors and $\beta$ contains the corresponding regression coefficients.

This model is usually written with an additional residual error term $\varepsilon$. We incorporate this by allowing the last random effect to be $\alpha_k = \varepsilon$ with $U_k = \operatorname{Id}_n$. For example, the one-way model (1.1) corresponds to (2.1) with $k = 2$. Supposing there are $I$ groups of equal size $J$, we set $n = IJ$, stack the vectors $y_{ij}$, $\alpha_i$, and $\varepsilon_{ij}$ as the rows of $Y$, $\alpha_1$, and $\alpha_2$, and identify

(2.2)  $X = \mathbf{1}_n, \qquad U_1 = \operatorname{Id}_I \otimes \mathbf{1}_J, \qquad U_2 = \operatorname{Id}_n.$

Here, $X$ is a single all-1's regressor, and $U_1$ has columns indicating the groups. We discuss examples with more than two random effects in Section 5.

Under the general model (2.1), $Y$ has the multivariate normal distribution

(2.3)  $Y \sim \mathcal{N}\Big(X\beta,\; \textstyle\sum_{r=1}^k U_r U_r^T \otimes \Sigma_r\Big),$

in the row-wise vectorization shorthand introduced above.

The unknown parameters of the model are $\beta$ and $\Sigma_1, \ldots, \Sigma_k$. We study estimators of $\Sigma_1, \ldots, \Sigma_k$ which are invariant to $\beta$ and take the form

(2.4)  $\hat\Sigma = Y^T B Y,$

where the estimation matrix $B \in \mathbb{R}^{n \times n}$ is symmetric and satisfies $BX = 0$. To obtain an estimate of $\Sigma_r$, observe that $\mathbb{E}[\alpha_r^T M \alpha_r] = (\operatorname{tr} M)\,\Sigma_r$ for any matrix $M \in \mathbb{R}^{m_r \times m_r}$. Then, as $\alpha_1, \ldots, \alpha_k$ are independent with mean 0,

(2.5)  $\mathbb{E}[\hat\Sigma] = \sum_{r=1}^k \operatorname{tr}(U_r^T B U_r)\,\Sigma_r.$

So $\hat\Sigma$ is an unbiased estimate of $\Sigma_r$ when $B$ satisfies $\operatorname{tr}(U_r^T B U_r) = 1$ and $\operatorname{tr}(U_s^T B U_s) = 0$ for all $s \neq r$.

In balanced classification designs, discussed in greater detail in Section 5, the classical MANOVA estimators are obtained by setting $B$ to be combinations of projections onto subspaces of $\mathbb{R}^n$. For example, in the one-way model corresponding to (2.2), defining $P_1$ and $P_2$ as the orthogonal projections onto $\operatorname{col}(U_1) \ominus \operatorname{col}(X)$ and $\mathbb{R}^n \ominus \operatorname{col}(U_1)$, the MANOVA estimators of $\Sigma_1$ and $\Sigma_2$ are given by

(2.6)  $\hat\Sigma_2 = \frac{1}{n - I}\, Y^T P_2 Y, \qquad \hat\Sigma_1 = \frac{1}{J}\left(\frac{1}{I - 1}\, Y^T P_1 Y - \hat\Sigma_2\right).$

In unbalanced designs and more general models, various alternative choices of $B$ lead to estimators in the generalized MANOVA [SCM09] and MINQUE/MIVQUE families [Rao72, LaM73, SS78].
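As a quick numerical check of (2.5) and (2.6), the following sketch builds the one-way design from (2.2), forms the projections $P_1$ and $P_2$, and verifies the unbiasedness trace conditions for the resulting estimation matrices; the small dimensions are arbitrary.

    import numpy as np

    I, J = 6, 3                                   # small illustrative design
    n = I * J
    X = np.ones((n, 1))                           # all-1's regressor
    U1 = np.kron(np.eye(I), np.ones((J, 1)))      # group incidence matrix
    U2 = np.eye(n)

    def proj(A):
        # Orthogonal projection onto col(A).
        Q, _ = np.linalg.qr(A)
        return Q @ Q.T

    P1 = proj(U1) - proj(X)                       # projection onto col(U1) minus col(X)
    P2 = np.eye(n) - proj(U1)                     # projection onto complement of col(U1)

    # Estimation matrices: Sigma2_hat = Y' B2 Y and Sigma1_hat = Y' B1 Y, as in (2.6).
    B2 = P2 / (n - I)
    B1 = (P1 / (I - 1) - B2) / J

    # Conditions from (2.5): tr(U_r' B U_r) = 1 for the target component and 0 for
    # the other, with B X = 0 ensuring invariance to the fixed effects.
    for name, B in [("B1", B1), ("B2", B2)]:
        print(name,
              round(np.trace(U1.T @ B @ U1), 8),
              round(np.trace(U2.T @ B @ U2), 8),
              round(np.abs(B @ X).max(), 8))

This prints (1, 0, 0) for B1 and (0, 1, 0) for B2, confirming that each estimator is unbiased for its target component.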

We study spectral properties of the matrix $\hat\Sigma$ in (2.4) in a high-dimensional asymptotic regime, assuming a spiked model for each variance component $\Sigma_r$.

Assumption 2.1.

The number of effects $k$ is fixed while $n \to \infty$. There are constants $C, c > 0$ such that:

  (a) (Number of traits) $c \leq p/n \leq C$.

  (b) (Model design) $m_r \leq Cn$ and $\|U_r\| \leq C$ for each $r = 1, \ldots, k$.

  (c) (Estimation matrix) $B$ is symmetric, $BX = 0$, and $n\|B\| \leq C$.

  (d) (Spiked covariance) For each $r = 1, \ldots, k$,

    $\Sigma_r = \sigma_r^2\,\operatorname{Id}_p + V_r \Theta_r V_r^T,$

    where $V_r \in \mathbb{R}^{p \times l_r}$ has orthonormal columns, $\Theta_r \in \mathbb{R}^{l_r \times l_r}$ is diagonal, $l_r \leq C$, $\sigma_r^2 \leq C$, and $\|\Theta_r\| \leq C$. (We set $V_r \Theta_r V_r^T = 0$ when $l_r = 0$.)

Under Assumption 2.1(d), each $\Sigma_r$ has an isotropic noise level $\sigma_r^2$ (possibly 0 if $\Sigma_r$ is low-rank) and a bounded number $l_r$ of signal eigenvalues greater than this noise level. We allow $\sigma_r^2$, $V_r$, $\Theta_r$, and $l_r$ to vary with $n$ and $p$. We will be primarily interested in scenarios where at least one noise variance $\sigma_r^2$ is of constant order, although let us remark that setting $\sigma_1^2 = \cdots = \sigma_k^2 = 0$ also recovers the classical low-dimensional asymptotic regime where the ambient dimension of the data is bounded as $n \to \infty$.

In classification designs, Assumption 2.1(b) holds when the number of outer-most groups is proportional to $n$, and groups (and sub-groups) are bounded in size. This encompasses typical designs in classical settings [Rob59a, Rob59b], and we discuss several examples in Section 5. In models where $U_r$ is a matrix of genotype values at $m_r$ SNPs, Assumption 2.1(b) holds if $m_r \asymp n$ and $U_r$ is entrywise bounded by $C/\sqrt{m_r}$. This latter condition is satisfied if genotypes at each SNP are normalized to mean 0 and variance $1/m_r$, and SNPs with minor allele frequency below a constant threshold are removed. Under Assumption 2.1(b), the scaling $n\|B\| \leq C$ in Assumption 2.1(c) is then natural to ensure $\operatorname{tr}(U_r^T B U_r)$ is bounded for each $r$, and hence $\hat\Sigma$ is on the same scale as $\Sigma_1, \ldots, \Sigma_k$.
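The SNP normalization just described can be sketched as follows; the dimensions, allele-frequency range, and the mean-0, variance-$1/m$ convention are illustrative assumptions matching the discussion above.

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 500, 600                               # individuals and SNPs, m of order n
    maf = rng.uniform(0.05, 0.5, m)               # allele frequencies bounded away from 0
    G = rng.binomial(2, maf, size=(n, m)).astype(float)   # 0/1/2 genotype counts

    # Normalize each SNP to mean 0 and variance 1/m, so entries are O(1/sqrt(m)).
    U = (G - 2 * maf) / np.sqrt(2 * maf * (1 - maf) * m)

    print("max |entry|:", np.abs(U).max())        # entrywise bound of order 1/sqrt(m)
    print("operator norm:", np.linalg.norm(U, 2)) # stays O(1) as n and m grow together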

Throughout, we denote by $\mathcal{V} \subseteq \mathbb{R}^p$ the combined column span of $V_1, \ldots, V_k$, where $V_r$ is empty if $l_r = 0$. $P_{\mathcal{V}}$ and $P_{\mathcal{V}^\perp}$ denote the orthogonal projections onto $\mathcal{V}$ and its orthogonal complement. We set $M = m_1 + \cdots + m_k$ and define a block matrix $F \in \mathbb{R}^{M \times M}$ by

(2.7)  $F = \big(\sigma_r \sigma_s\, U_r^T B U_s\big)_{r, s = 1}^{k}.$

The “null” setting of no spikes, $l_1 = \cdots = l_k = 0$, was studied in [FJ17], which made the following simple observation.

Proposition 2.2.

If $l_1 = \cdots = l_k = 0$, then $\hat\Sigma$ is equal in law to $G^T F G$, where $G \in \mathbb{R}^{M \times p}$ has i.i.d. $\mathcal{N}(0, 1)$ entries.

Proof.

If $l_1 = \cdots = l_k = 0$, we may write $\alpha_r = \sigma_r G_r$, where $G_r \in \mathbb{R}^{m_r \times p}$ has i.i.d. $\mathcal{N}(0, 1)$ entries. Then, applying $BX = 0$,

$\hat\Sigma = Y^T B Y = \sum_{r, s = 1}^{k} \sigma_r \sigma_s\, G_r^T\, U_r^T B U_s\, G_s.$

The result follows upon stacking $G_1, \ldots, G_k$ row-wise as $G$. ∎
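A direct Monte Carlo check of this equality in law is straightforward, assuming the block form of $F$ stated in (2.7); the one-way design, noise levels, and choice of $B$ below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    I, J, p = 100, 2, 80
    n = I * J
    X = np.ones((n, 1))
    U1 = np.kron(np.eye(I), np.ones((J, 1)))
    U2 = np.eye(n)
    s1, s2 = 1.5, 1.0                     # null case: Sigma_r = s_r^2 Id, no spikes

    Q = np.eye(n) - X @ X.T / n           # annihilates X, so that B X = 0
    B = Q @ np.diag(rng.uniform(0.5, 1.5, n)) @ Q / n   # an arbitrary symmetric B

    # Block matrix F = (s_r s_s U_r' B U_s)_{r,s}, as in (2.7).
    F = np.block([[s1 * s1 * U1.T @ B @ U1, s1 * s2 * U1.T @ B @ U2],
                  [s2 * s1 * U2.T @ B @ U1, s2 * s2 * U2.T @ B @ U2]])
    M = I + n

    def top_eigs_direct():
        Y = s1 * U1 @ rng.standard_normal((I, p)) + s2 * rng.standard_normal((n, p))
        return np.linalg.eigvalsh(Y.T @ B @ Y)[-3:]

    def top_eigs_surrogate():
        G = rng.standard_normal((M, p))
        return np.linalg.eigvalsh(G.T @ F @ G)[-3:]

    # The two spectra agree in law; their averaged top eigenvalues should match
    # up to Monte Carlo error.
    print(np.mean([top_eigs_direct() for _ in range(100)], axis=0))
    print(np.mean([top_eigs_surrogate() for _ in range(100)], axis=0))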

In this case, the asymptotic spectrum of $\hat\Sigma$ is described by the Marcenko-Pastur equation:

Theorem 2.3.

For each $z \in \mathbb{C}^+$, there is a unique value $m_0(z) \in \mathbb{C}^+$ which satisfies

(2.8)  $z = -\frac{1}{m_0(z)} + \operatorname{tr}\Big( F \big( \operatorname{Id}_M + p\, m_0(z)\, F \big)^{-1} \Big).$

This function $z \mapsto m_0(z)$ defines the Stieltjes transform of a probability distribution $\mu_0$ on $\mathbb{R}$.

Under Assumption 2.1, if $l_1 = \cdots = l_k = 0$, then $\hat\mu - \mu_0 \to 0$ weakly almost surely as $n \to \infty$, where $\hat\mu$ is the empirical distribution of eigenvalues of $\hat\Sigma$.

Proof.

See [MP67, Sil95, SB95] in the setting where $p/n$ converges to a positive constant and the spectral distribution of $F$ converges to a fixed limit. The above formulation follows from Prohorov’s theorem and a subsequence argument. ∎
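Taking (2.8) in the fixed-point form written above (the precise form is a reconstruction, so this is a sketch under that assumption), $m_0(z)$ can be computed by a damped fixed-point iteration at $z = E + i\eta$ with small $\eta > 0$, and the density of $\mu_0$ recovered by Stieltjes inversion. The spectrum of $F$ used below is an arbitrary illustration on the $1/n$ scale, including negative eigenvalues.

    import numpy as np

    def m0(z, tau, p, iters=2000, damp=0.5):
        # Damped fixed-point iteration for the Stieltjes transform m0(z), solving
        # z = -1/m + tr( F (Id + p m F)^{-1} ), with tau the eigenvalues of F.
        m = -1.0 / z
        for _ in range(iters):
            m_new = 1.0 / (-z + np.sum(tau / (1.0 + p * m * tau)))
            m = damp * m + (1.0 - damp) * m_new
        return m

    n, p = 400, 300
    tau = np.concatenate([np.full(150, 2.0 / n), np.full(250, -0.7 / n)])

    # Density of mu_0 on a grid via the inversion formula Im m0(E + i eta) / pi.
    grid = np.linspace(-1.5, 2.5, 300)
    density = np.array([m0(E + 1e-2j, tau, p).imag / np.pi for E in grid])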

Denote the $\varepsilon$-neighborhood of the support of $\mu_0$ by

$\operatorname{supp}(\mu_0)_\varepsilon = \big\{ x \in \mathbb{R} : \operatorname{dist}(x, \operatorname{supp}(\mu_0)) \leq \varepsilon \big\}.$

We emphasize that $\mu_0$ and its support depend on $n$, although we suppress this dependence notationally. Then, if $l_1 = \cdots = l_k = 0$, all eigenvalues of $\hat\Sigma$ fall within $\operatorname{supp}(\mu_0)_\varepsilon$ with high probability:

Theorem 2.4.

Fix any constants $\varepsilon, D > 0$. Under Assumption 2.1, if $l_1 = \cdots = l_k = 0$, then for a constant $n_0 \equiv n_0(\varepsilon, D)$ and all $n \geq n_0$,

$\mathbb{P}\big[ \operatorname{spec}(\hat\Sigma) \subset \operatorname{supp}(\mu_0)_\varepsilon \big] \geq 1 - n^{-D}.$

Proof.

See [BS98, KY17] for positive definite $F$, and [FJ17, Theorem 2.8] for the extension to $F$ having negative eigenvalues. ∎

More generally, when $l_1, \ldots, l_k$ are not all 0, so that $\Sigma_1, \ldots, \Sigma_k$ exhibit a bounded number of spike eigenvalues, the bulk eigenvalue distribution of $\hat\Sigma$ is still described by the above law $\mu_0$, and Theorem 2.4 implies that only a bounded number of eigenvalues of $\hat\Sigma$ should fall far from $\operatorname{supp}(\mu_0)$. These outlier eigenvalues and their corresponding eigenvectors are the focus of our study.

3. Outlier eigenvalues and eigenvectors

In this section, we describe results that characterize the asymptotic behavior of outlier eigenvalues and eigenvectors of a general matrix of the form (2.4).

Let $m_0(z)$ be the Stieltjes transform of the law $\mu_0$ in Theorem 2.3, defined for all $z \in \mathbb{C} \setminus \operatorname{supp}(\mu_0)$ via

(3.1)  $m_0(z) = \int \frac{1}{x - z}\, d\mu_0(x).$

Let $\operatorname{tr}_r$ denote the trace of the $(r, r)$ block in the block decomposition of $\mathbb{R}^{M \times M}$ corresponding to $\alpha_r$. For $z \in \mathbb{R} \setminus \operatorname{supp}(\mu_0)$, define the matrix-valued function $T(z)$ by

(3.2)

Here, if $\sigma_r^2 = 0$, then $T(z)$ remains well-defined by the identity

(3.3)

and the definition of $F$ in (2.7). Let

(3.4)  $\Lambda = \big\{ z \in \mathbb{R} \setminus \operatorname{supp}(\mu_0) : \det T(z) = 0 \big\}$

be the multiset of real roots of the function $z \mapsto \det T(z)$, counted with their analytic multiplicities. We record here the following alternative definition of $T(z)$, and properties of $T(z)$ and $\Lambda$.

Proposition 3.1 (Properties of $T(z)$ and $\Lambda$).
  (a) The matrix $T(z)$ is equivalently defined as

    (3.5)
  (b) For each $z \in \mathbb{R} \setminus \operatorname{supp}(\mu_0)$, .

  (c) For $z \in \mathbb{R} \setminus \operatorname{supp}(\mu_0)$, is positive semi-definite.

  (d) For $\mu \in \Lambda$, its multiplicity as a root of $z \mapsto \det T(z)$ is equal to $\dim \ker T(\mu)$.

Proof.

By conjugation symmetry and continuity, the Marcenko-Pastur identity (2.8) holds for each $z \in \mathbb{R} \setminus \operatorname{supp}(\mu_0)$. Part (a) then follows from substituting into (3.2) and applying (2.8). Part (b) follows from (a), as the relevant matrix is the direct sum of an operator on $\mathcal{V}$ and a non-zero multiple of the identity on the orthogonal complement $\mathcal{V}^\perp$. Differentiating (3.1), $m_0'(z) = \int (x - z)^{-2}\, d\mu_0(x) > 0$ for each $z \in \mathbb{R} \setminus \operatorname{supp}(\mu_0)$. Then part (c) follows from (3.2). This monotonicity implies each eigenvalue of $T(z)$ is strictly monotone in $z$, so the multiplicity of $\mu$ as a root of $\det T(z)$ equals the number of eigenvalues of $T(\mu)$ equal to 0. This yields (d). ∎

For two finite multisets $S_1, S_2 \subset \mathbb{R}$ of equal cardinality $q$, define

$d(S_1, S_2) = \max_{j = 1, \ldots, q} |\lambda_j - \nu_j|,$

where $\lambda_1 \leq \cdots \leq \lambda_q$ and $\nu_1 \leq \cdots \leq \nu_q$ are the ordered values of $S_1$ and $S_2$ counting multiplicity. The following shows that the outlier eigenvalues of $\hat\Sigma$ are close to the elements of $\Lambda$. Note that by (3.2), each $\mu \in \Lambda$ is an eigenvalue of the matrix

(3.6)

When $\hat\Sigma$ is the MANOVA estimator of a variance component $\Sigma_r$, we may interpret this matrix as a “surrogate” for the true matrix $\Sigma_r$ of interest.

Theorem 3.2 (Outlier locations).

Fix constants $\varepsilon, D > 0$. Under Assumption 2.1, for a constant $n_0 \equiv n_0(\varepsilon, D)$ and all $n \geq n_0$, with probability at least $1 - n^{-D}$ there exist multisets $\hat\Lambda \subseteq \operatorname{spec}(\hat\Sigma)$ and $\Lambda' \subseteq \Lambda$, containing all elements of these multisets outside $\operatorname{supp}(\mu_0)_\varepsilon$, such that $d(\hat\Lambda, \Lambda') < \varepsilon$.

The multiset $\Lambda$ represents a theoretical prediction for the locations of the outlier eigenvalues of $\hat\Sigma$—this is depicted in Figure 2 for an example of the one-way design. We clarify that $\Lambda$ is deterministic but $n$-dependent, and it may contain values arbitrarily close to $\operatorname{supp}(\mu_0)$. Hence we state the result as a matching between two multisets $\hat\Lambda$ and $\Lambda'$ rather than as the convergence of outlier eigenvalues of $\hat\Sigma$ to a fixed set. We allow $\hat\Lambda$ and $\Lambda'$ to contain values within $\operatorname{supp}(\mu_0)_\varepsilon$ so as to match values of the other set close to the boundaries of $\operatorname{supp}(\mu_0)_\varepsilon$.

Figure 2. Outlier predictions for the MANOVA estimate $\hat\Sigma_1$ in a one-way design. Left: Mean eigenvalue locations of $\hat\Sigma_1$ across 10000 simulations, with black dots on the axis indicating the predicted values in $\Lambda$. Right: Means and 90% ellipsoids for the projections of the three outlier eigenvectors onto $\mathcal{V}$, with black dots indicating the predictions of Theorem 3.3.
Remark.

In the setting of sample covariance matrices for i.i.d. multivariate samples, there is a phase transition phenomenon in which spike values greater than a certain threshold yield outlier eigenvalues in the sample covariance matrix, while spike values less than this threshold do not [BBP05, BS06, Pau07]. This phenomenon occurs also in our setting and is implicitly captured by the cardinality $|\Lambda|$, which represents the number of predicted outlier eigenvalues of $\hat\Sigma$. In particular, $\Lambda$ will be empty if the spike values of $\Sigma_1, \ldots, \Sigma_k$ are sufficiently small. However, the phase transition thresholds and predicted outlier eigenvalue locations in our setting are defined jointly by the design and the alignments between the spike eigenvectors $V_1, \ldots, V_k$, rather than by the individual spectra of $\Sigma_1, \ldots, \Sigma_k$.
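The classical transition is easy to see numerically. In the spiked sample covariance model with aspect ratio $\gamma = p/n$, a population spike $\ell$ detaches from the bulk only when $\ell > 1 + \sqrt{\gamma}$, in which case the outlier sits near $\ell(1 + \gamma/(\ell - 1))$; otherwise the top sample eigenvalue sticks to the bulk edge $(1 + \sqrt{\gamma})^2$. The dimensions below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    p, n = 400, 1600
    gamma = p / n                                  # aspect ratio; threshold 1 + sqrt(gamma) = 1.5

    for ell in [1.2, 1.5, 2.5]:                    # below, at, and above the threshold
        scale = np.ones(p); scale[0] = np.sqrt(ell)     # Sigma = diag(ell, 1, ..., 1)
        Z = rng.standard_normal((n, p)) * scale
        top = np.linalg.eigvalsh(Z.T @ Z / n)[-1]
        edge = (1 + np.sqrt(gamma)) ** 2                # Marcenko-Pastur bulk edge
        pred = ell * (1 + gamma / (ell - 1)) if ell > 1 + np.sqrt(gamma) else edge
        print(f"spike {ell}: top eigenvalue {top:.3f}, prediction {pred:.3f}")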

We next describe eigenvector projections and eigenvalue fluctuations for isolated outliers.

Theorem 3.3 (Eigenvector projections).

Fix constants $\varepsilon, D > 0$. Suppose $\mu \in \Lambda$ has multiplicity one, and $|\mu - \mu'| \geq \varepsilon$ for all other $\mu' \in \Lambda$. Let $v$ be the unit vector in $\ker T(\mu)$, and let $\hat{v}$ be the unit eigenvector of the eigenvalue of $\hat\Sigma$ closest to $\mu$. Then, under Assumption 2.1,

  (a) For all $n \geq n_0(\varepsilon, D)$ and some choice of sign for $\hat{v}$, with probability at least $1 - n^{-D}$,

  (b) $P_{\mathcal{V}^\perp} \hat{v}$, normalized to unit length, is uniformly distributed over unit vectors in $\mathcal{V}^\perp$ and is independent of $P_{\mathcal{V}} \hat{v}$.

Thus the prediction in (a) describes the projection of the sample eigenvector $\hat{v}$ onto the subspace $\mathcal{V}$—this is also displayed in Figure 2 for the one-way design. Here, the norm of this predicted projection is the predicted inner-product alignment between $\hat{v}$ and $v$, which by Proposition 3.1(c) is at most 1.

Next, let $\|\cdot\|_{\operatorname{HS}, rs}$ denote the Hilbert-Schmidt norm of the $(r, s)$ block in the block decomposition of $\mathbb{R}^{M \times M}$ corresponding to $\alpha_r$ and $\alpha_s$. Define

(3.7)

where this is again well-defined by (3.3) even if $\sigma_r^2 = 0$ and/or $\sigma_s^2 = 0$.

Theorem 3.4 (Gaussian fluctuations).

Fix a constant $\varepsilon > 0$. Suppose $\mu \in \Lambda$ has multiplicity one, and $|\mu - \mu'| \geq \varepsilon$ for all other $\mu' \in \Lambda$. Let $v$ be the unit vector in $\ker T(\mu)$, and let $\hat\lambda$ be the eigenvalue of $\hat\Sigma$ closest to $\mu$. Then under Assumption 2.1,

where

Furthermore, for a constant .

Figure 3 illustrates the accuracy of this Gaussian approximation for two settings of the one-way design. We observe that the approximation is fairly accurate in a setting with a single outlier, but (for the simulated sample sizes $n$ and $p$) does not adequately capture a skew in the outlier distribution in a setting with an additional positive outlier produced by a large spike in $\Sigma_2$. This skew is reduced in examples where there is increased separation between these two positive outliers.

Figure 3. Outlier eigenvalue fluctuations in a one-way design. Displayed are fluctuations of the largest outlier eigenvalue of $\hat\Sigma_1$ across 10000 simulations, compared with the density function and quantiles of the Gaussian distribution with mean and variance given in Theorem 3.4. The two simulated settings (top and bottom) place spikes in $\Sigma_1$ alone and in both $\Sigma_1$ and $\Sigma_2$, as described in the text.
Example 3.5.

In the setting of large population spike eigenvalues, it is illustrative to understand the predictions of Theorem 3.2 using a Taylor expansion. Let us carry this out for the MANOVA estimator $\hat\Sigma_1$ in the setting of a balanced one-way design (1.1) with $I$ groups of $J$ individuals.

Recalling the form (2.6) for $B$, the computation in Proposition 5.4(b) for general balanced designs will yield, in this setting, the explicit expressions

Suppose first that there is a single large spike eigenvalue in $\Sigma_1$, and no spike eigenvalues in $\Sigma_2$. Theorem 3.2 and the form (3.5) for $T(z)$ indicate that outlier eigenvalues should appear near the locations

It is known that $m_0$ is injective on $\mathbb{R} \setminus \operatorname{supp}(\mu_0)$ (see [SC95, Theorems 4.1 and 4.2]). Hence the map above is also injective by its explicit form, so there is at most one such location. Applying a Taylor expansion around , we obtain from (2.8)

where . For large and , solving yields

So we expect to observe one outlier with an approximate upward bias of .

Next, suppose there is a single large spike eigenvalue in $\Sigma_2$, and no spike eigenvalues in $\Sigma_1$. Then we expect outlier eigenvalues near the locations

Since $m_0$ is injective and the condition is quadratic in , we obtain . Taylor expanding around , we have after some simplification

Then for large , solving yields two predicted outlier eigenvalues near

Let us emphasize that these predictions are in the asymptotic regime where $n, p \to \infty$ and the spike eigenvalue is a large but fixed constant, rather than growing jointly with $n$ and $p$.

Finally, consider a single spike in $\Sigma_1$ and a single spike in $\Sigma_2$. Letting the corresponding spike eigenvectors have inner-product , we expect outliers near

This is a cubic condition in , so . Applying the above Taylor expansions around , this condition becomes

In a setting where the two spikes are large and of comparable size, there is a predicted outlier near . More precisely, expanding the above around , the location of this outlier is

Thus the upward bias of this outlier is increased from , when there are no spikes in $\Sigma_2$, to .
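The qualitative conclusions of this example can be probed by direct simulation: a single spike in $\Sigma_1$ produces one upward-biased outlier of the MANOVA estimate $\hat\Sigma_1$, while a single spike in $\Sigma_2$ alone produces outliers of $\hat\Sigma_1$ even though $\Sigma_1 = 0$. The dimensions and spike sizes below are illustrative.

    import numpy as np

    rng = np.random.default_rng(4)
    I, J, p = 300, 2, 150
    n = I * J
    e1 = np.zeros(p); e1[0] = 1.0

    def manova_eigs(theta1, theta2):
        # One-way model with Sigma1 = theta1 e1 e1' and Sigma2 = Id + theta2 e1 e1'.
        alpha = np.sqrt(theta1) * rng.standard_normal((I, 1)) * e1
        eps = (rng.standard_normal((I, J, p))
               + np.sqrt(theta2) * rng.standard_normal((I, J, 1)) * e1)
        Y = (alpha[:, None, :] + eps).reshape(n, p)
        Ybar = Y.reshape(I, J, p).mean(axis=1)
        MSb = J * (Ybar - Ybar.mean(0)).T @ (Ybar - Ybar.mean(0)) / (I - 1)
        R = (Y.reshape(I, J, p) - Ybar[:, None, :]).reshape(n, p)
        MSw = R.T @ R / (n - I)
        return np.linalg.eigvalsh((MSb - MSw) / J)

    ev = manova_eigs(10.0, 0.0)   # spike in Sigma1 only: one upward-biased outlier
    print(round(ev[0], 2), round(ev[-2], 2), round(ev[-1], 2))
    ev = manova_eigs(0.0, 20.0)   # spike in Sigma2 only: outliers although Sigma1 = 0
    print(round(ev[0], 2), round(ev[-2], 2), round(ev[-1], 2))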

We conclude this section by describing two points of contact between Theorem 3.2 and the results of [BGN11] and [BBC17].

Example 3.6.

Consider the model (2.1) with $k = 1$, $X = 0$, $U_1 = \operatorname{Id}_n$, and $\sigma_1^2 = 1$. Then (2.7) yields $F = B$. Writing $\alpha_1 = G \Sigma_1^{1/2}$, where $G \in \mathbb{R}^{n \times p}$ has i.i.d. $\mathcal{N}(0, 1)$ entries, we then simply have

$\hat\Sigma = Y^T B Y = \Sigma_1^{1/2}\, G^T B G\, \Sigma_1^{1/2}.$

The law $\mu_0$ approximates the empirical spectral distribution of $G^T B G$. Applying (3.2), (2.8), and (3.1) we obtain

This function is the “$T$-transform” of $\mu_0$. By the form (3.5), the outlier eigenvalues of $\hat\Sigma$ are predicted by

where the spike values are the diagonal entries of $\Theta_1$. This matches the multiplicative perturbation result of [BGN11, Theorem 2.7]. Depending on $B$, this function is not necessarily injective over $\mathbb{R} \setminus \operatorname{supp}(\mu_0)$, and hence a single spike can generate multiple outlier eigenvalues of $\hat\Sigma$ even in this simple setting.

Example 3.7.

Consider the model (2.1) with , , , and the columns of $U_1, \ldots, U_k$ together forming an orthonormal basis of $\mathbb{R}^n$. (Thus , , and .) Consider and

for any real scalars . Then . Writing , we have