1. Introduction
We study multivariate random and mixed effects linear models. As a simple example, consider a twin study measuring $p$ quantitative traits in individuals consisting of $n$ pairs of identical twins. We may model the observed traits $y_{i,j} \in \mathbb{R}^p$ of individual $j \in \{1, 2\}$ in pair $i \in \{1, \dots, n\}$ as

(1.1) $y_{i,j} = \mu + \alpha_i + \varepsilon_{i,j}.$

Here, $\mu \in \mathbb{R}^p$ is a deterministic vector of mean trait values in the population, and

$\alpha_i \sim \mathcal{N}(0, \Sigma_A), \qquad \varepsilon_{i,j} \sim \mathcal{N}(0, \Sigma_E)$

are unobserved, independent random vectors modeling trait variation at the pair and individual levels. Assuming the absence of shared environment, the covariance matrices $\Sigma_A$ and $\Sigma_E$ may be interpreted as the genetic and environmental components of variance.
Since the pioneering work of R. A. Fisher [Fis18], such models have been widely used to decompose the variation of quantitative traits into constituent variance components. Genetic variance is commonly further decomposed into additive, dominance, and epistatic components [Wri35]. Components of environmental variance may be individual-specific or potentially also shared within families or batches of an experimental protocol. In many applications, for example measuring the heritability of traits, predicting evolutionary response to selection, and correcting for confounding variation from experimental procedures, it is of interest to estimate the individual variance components [FM96, LW98, VHW08]. Classically, this may be done by examining the resemblance between relatives in simple classification designs [Fis18, CR48]. In modern genome-wide association studies, where genotypes are observed at a set of genetic markers, this is often done using models which treat contributions of single-nucleotide polymorphisms to polygenic traits as independent and unobserved random effects [YLGV11, ZCS13, MLH15, LTBS15, FBSG15].
These types of mixed effects linear models are often applied in univariate contexts, $p = 1$, to study the genetic basis of individual traits. However, certain questions arising in evolutionary biology require an understanding of the joint variation of multiple, and oftentimes many, quantitative phenotypes [Lan79, LA83, Hou10, HGO10]. In such multivariate contexts, it is often natural to interpret the covariance matrices of the variance components in terms of their principal component decompositions [Blo07, BM15]. For example, the largest eigenvalues and effective rank of the additive genetic component of covariance indicate the extent to which evolutionary response to natural selection is genetically constrained to a lower dimensional phenotypic subspace, and the principal eigenvectors indicate likely directions of phenotypic response [MH05, HB06, WB09, HMB14, BAC15]. Similar interpretations apply to the spectral structure of variance components that capture variation due to genetic mutation [MAB15]. In studies involving gene-expression phenotypes, trait dimensionality in the several thousands is common [MCM14, CMA18].

Motivated by these applications, we study in this work the spectral behavior of variance components estimates when the number of traits is large. To illustrate some of the problems that may arise, Figure 1
depicts the eigenvalues and principal eigenvector of the multivariate analysis of variance (MANOVA) [SR74, SCM09] and multivariate restricted maximum likelihood (REML) [KP69, Mey91] estimates of $\Sigma_A$ in the balanced one-way model (1.1). REML estimates were computed by the post-processing procedure described in [Ame85]. In this example, the true group covariance $\Sigma_A$ has rank one, representing a single direction of variation. The true error covariance $\Sigma_E$ also represents a single direction of variation which is partially aligned with that of $\Sigma_A$, plus additional isotropic noise. Partial alignment of eigenvectors of $\Sigma_E$ with those of $\Sigma_A$ may be common, for example, in sibling designs where the additive genetic covariance contributes both to $\Sigma_A$ and $\Sigma_E$. We observe several problematic phenomena concerning either the MANOVA or REML estimate $\hat{\Sigma}_A$:

Eigenvalue dispersion. The eigenvalues of $\hat{\Sigma}_A$ are widely dispersed, even though all but one true eigenvalue of $\Sigma_A$ is zero.

Eigenvalue aliasing. The estimate $\hat{\Sigma}_A$ exhibits multiple outlier eigenvalues which indicate significant directions of variation, even though the true matrix $\Sigma_A$ has rank one.

Eigenvalue bias. The largest eigenvalue of $\hat{\Sigma}_A$ is biased upwards from the true nonzero eigenvalue of $\Sigma_A$.

Eigenvector aliasing. The principal eigenvector of $\hat{\Sigma}_A$ is not aligned with the true eigenvector of $\Sigma_A$, but rather is biased in the direction of the leading eigenvector of $\Sigma_E$.
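These phenomena are easy to reproduce in simulation. The following sketch is our own illustration, not the code used for Figure 1: it draws data from the one-way model (1.1) with a rank-one group covariance and a partially aligned error spike, forms the classical MANOVA estimate of the group covariance, and exhibits the dispersion and bias described above. All sizes and spike values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, I, J = 300, 300, 2          # traits, pairs (groups), individuals per group
n = I * J
ell_A, ell_E = 5.0, 5.0        # spike sizes in Sigma_A and Sigma_E

# Sigma_A = ell_A * v v' (rank one); Sigma_E = Id + ell_E * w w', with w
# partially aligned to v, mimicking the setting described for Figure 1.
v = np.zeros(p); v[0] = 1.0
w = np.zeros(p); w[:2] = [0.5, np.sqrt(0.75)]

alpha = np.outer(np.sqrt(ell_A) * rng.standard_normal(I), v)      # group effects
E = np.eye(p) + ell_E * np.outer(w, w)                            # error covariance
evals, evecs = np.linalg.eigh(E)
eps = rng.standard_normal((n, p)) @ (evecs * np.sqrt(evals)) @ evecs.T
Y = np.repeat(alpha, J, axis=0) + eps                             # mean mu = 0

# Classical MANOVA estimate: Sigma_A_hat = (MS_between - MS_within) / J
means = Y.reshape(I, J, p).mean(axis=1)
SSB = J * (means - Y.mean(axis=0)).T @ (means - Y.mean(axis=0))
resid = Y - np.repeat(means, J, axis=0)
Sigma_A_hat = (SSB / (I - 1) - resid.T @ resid / (n - I)) / J

eigs = np.linalg.eigvalsh(Sigma_A_hat)                            # ascending
print("top three eigenvalues:", np.round(eigs[-3:], 2))
print("smallest eigenvalue:  ", round(eigs[0], 2))
```

With these sizes one typically observes a largest eigenvalue well above the true spike value, along with a wide bulk of spurious positive and negative eigenvalues, even though the true group covariance has rank one.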
Several eigenvalue shrinkage and rank-reduced estimation procedures have been proposed to address some of these shortcomings, with associated simulation studies of their performance in low-to-moderate dimensions [HH81, KM04, MK05, MK08, MK10]. In this work, we will focus on higher-dimensional applications and study these phenomena theoretically and from an asymptotic viewpoint.
We focus on MANOVA-type estimators, with particular attention on balanced classification designs where such estimators are canonically defined (cf. Section 5). We leave the study of REML and likelihood-based estimation as an important avenue for future work. We consider the asymptotic regime where $n, p \to \infty$ proportionally, and the number of realizations of each random effect also increases proportionally with $n$ and $p$. In this setting, the dispersion of sample eigenvalues was studied in [FJ16]. We study here the latter three phenomena above, under the simplifying assumption of a spiked covariance model where the noise exhibited by each random effect is isotropic [Joh01]. It was observed in [FJ17] that in this isotropic setting, the equations describing eigenvalue dispersion reduce to the Marcenko-Pastur equation [MP67], and we review this in Section 2.
In Section 3, we provide a probabilistic characterization of the behavior of outlier eigenvalues and eigenvectors. We show that in the presence of high-dimensional noise, each outlier eigenvalue $\hat\lambda$ of a MANOVA estimate is close to an eigenvalue $\lambda$ of a certain surrogate matrix, which is a linear combination of the different population variance components. When $\lambda$ is an isolated eigenvalue, we show furthermore that $\hat\lambda$ exhibits asymptotic Gaussian fluctuations on the $n^{-1/2}$ scale, and its corresponding eigenvector is partially aligned with the eigenvector of this surrogate.
These results describe quantitatively the aliasing phenomena exhibited in Figure 1—eigenvalues and eigenvectors of the MANOVA estimate for one variance component may be influenced by the other components. In Section 4, we propose a new procedure for estimating the true principal eigenvalues and eigenvectors of a single variance component, by identifying alternative matrices in the linear span of the classical MANOVA estimates for which the surrogate matrix depends only on the single component being estimated. We prove theoretically that the resulting eigenvalue estimates are consistent in the high-dimensional asymptotic regime. The eigenvector estimates remain inconsistent due to the high-dimensional noise, but we show that they are asymptotically void of aliasing effects. We provide finite-sample simulations of the performance of this algorithm in the one-way design (1.1) for moderately large $n$ and $p$.
Proofs are contained in Section 6 and Appendix A. Our probabilistic results are analogous to those regarding outlier eigenvalues and eigenvectors for the spiked sample covariance model, studied in [BBP05, BS06, Pau07, Nad08, BY08], and our proofs use the matrix perturbation approach of [Pau07] which is similar also to the approaches of [BGN11, BGGM11, BY12]. An extra ingredient needed in our proof is a deterministic approximation for arbitrary linear and quadratic functions of entries of the resolvent in the Marcenko-Pastur model. We establish this for spectral arguments separated from the limiting support, building on the local laws for this setting in [BEK14, KY17] and using a fluctuation averaging idea inspired by [EYY11, EYY12, EKYY13a, EKYY13b]. We note that new qualitative phenomena emerge in our model which are not present in the setting of spiked sample covariance matrices—outliers may depend on the alignments between population spike eigenvectors in different variance components, and a single spike may generate multiple outliers. This latter phenomenon was observed in a different context in [BBC17], which studied sums and products of independent unitarily invariant matrices in spiked settings. We discuss two points of contact between our results and those of [BGN11] and [BBC17] in Examples 3.6 and 3.7.
Notational conventions
For a square matrix $A$, $\operatorname{eig}(A)$ is its multiset of eigenvalues (counting multiplicity). For a law $\mu$ on $\mathbb{R}$, we denote its closed support by $\operatorname{supp}(\mu)$. $e_i$ is the $i$-th standard basis vector, $\operatorname{Id}$ is the identity matrix, and $\mathbf{1}$ is the all-1's column vector, where dimensions are understood from context. We use $\operatorname{Id}_p$ and $\mathbf{1}_p$ to explicitly emphasize the dimension $p$. $\|\cdot\|$ is the Euclidean norm for vectors and the Euclidean operator norm for matrices. $\|\cdot\|_{\mathrm{HS}}$ is the matrix Hilbert-Schmidt norm. $\otimes$ is the matrix tensor product, and $\operatorname{vec}$ denotes row-wise vectorization, so that $(A \otimes B)\operatorname{vec}(M) = \operatorname{vec}(A M B')$ when the dimensions are compatible. $A'$ is the transpose of $A$, $\operatorname{col}(A)$ is its column span, and $\ker(A)$ is its kernel or null space. For subspaces $V$ and $W$, $\dim(V)$ is the dimension of $V$, $V \oplus W$ is the orthogonal direct sum, and $V \ominus W$ is the orthogonal complement of $W$ in $V$. For $z \in \mathbb{C}^+$, we typically write $z = E + i\eta$ where $E = \operatorname{Re} z$ and $\eta = \operatorname{Im} z$. For $E \in \mathbb{R}$ and $S \subseteq \mathbb{R}$, $\operatorname{dist}(E, S)$ is the distance from $E$ to $S$.
Acknowledgments
We thank quantitative geneticist Mark W. Blows for introducing us to this problem and its applications in evolutionary biology. ZF was supported by a Hertz Foundation Fellowship. IMJ was supported in part by grants NIH R01 EB001988 and NSF DMS 1407813. YS was supported by a Junior Fellow award from the Simons Foundation and NSF Grant DMS-1701654. ZF and YS would like to acknowledge the Park City Mathematics Institute (NSF grant DMS:1441467) where part of this research was conducted.
2. Model
We consider observations $Y \in \mathbb{R}^{n \times p}$ of $p$ traits in $n$ individuals, modeled by a Gaussian mixed effects linear model

(2.1) $Y = X\beta + U_1\alpha_1 + \dots + U_k\alpha_k.$

The matrices $\alpha_1, \dots, \alpha_k$ are independent, with each matrix $\alpha_r \in \mathbb{R}^{m_r \times p}$ having independent rows with distribution $\mathcal{N}(0, \Sigma_r)$, representing $m_r$ (unobserved) realizations of a $p$-dimensional random effect. The incidence matrix $U_r \in \mathbb{R}^{n \times m_r}$, which is known from the experimental protocol, determines how the random effect $\alpha_r$ contributes to the observations $Y$. The first term $X\beta$ models possible additional fixed effects, where $X$ is a known design matrix of regressors and $\beta$ contains the corresponding regression coefficients.

This model is usually written with an additional residual error term $\varepsilon$. We incorporate this by allowing the last random effect to be $U_k = \operatorname{Id}_n$ and $\alpha_k = \varepsilon$. For example, the one-way model (1.1) corresponds to (2.1) with $k = 2$. Supposing there are $I$ groups of equal size $J$, we set $n = IJ$, $m_1 = I$, stack the vectors $y_{i,j}$, $\alpha_i$, and $\varepsilon_{i,j}$ as the rows of $Y$, $\alpha_1$, and $\alpha_2$, and identify

(2.2) $X = \mathbf{1}_n, \quad \beta = \mu', \quad U_1 = \operatorname{Id}_I \otimes \mathbf{1}_J, \quad U_2 = \operatorname{Id}_n, \quad \Sigma_1 = \Sigma_A, \quad \Sigma_2 = \Sigma_E.$

Here, $X$ is a single all-1's regressor, and $U_1$ has columns indicating the $I$ groups. We discuss examples with further random effects in Section 5.
Under the general model (2.1), $Y$ has the multivariate normal distribution

(2.3) $\operatorname{vec}(Y) \sim \mathcal{N}\Big(\operatorname{vec}(X\beta),\ \sum_{r=1}^k (U_r U_r') \otimes \Sigma_r\Big).$

The unknown parameters of the model are $\beta, \Sigma_1, \dots, \Sigma_k$. We study estimators of $\Sigma_1, \dots, \Sigma_k$ which are invariant to $\beta$ and take the form

(2.4) $\hat\Sigma = Y'BY,$

where the estimation matrix $B \in \mathbb{R}^{n \times n}$ is symmetric and satisfies $BX = 0$. To obtain an estimate of $\Sigma_r$, observe that $Y'BY = (Y - X\beta)'B(Y - X\beta)$ for any such matrix $B$. Then, as $\alpha_1, \dots, \alpha_k$ are independent with mean 0,

(2.5) $\mathbb{E}[Y'BY] = \sum_{s=1}^k \mathbb{E}[\alpha_s' U_s' B U_s \alpha_s] = \sum_{s=1}^k \operatorname{tr}(U_s' B U_s)\,\Sigma_s.$

So $\hat\Sigma = Y'BY$ is an unbiased estimate of $\Sigma_r$ when $B$ satisfies $\operatorname{tr}(U_r' B U_r) = 1$ and $\operatorname{tr}(U_s' B U_s) = 0$ for all $s \neq r$.

In balanced classification designs, discussed in greater detail in Section 5, the classical MANOVA estimators are obtained by setting $B$ to be combinations of projections onto subspaces of $\mathbb{R}^n$. For example, in the one-way model corresponding to (2.2), defining $P_1$ and $P_2$ as the orthogonal projections onto $\operatorname{col}(X)$ and $\operatorname{col}(U_1) \ominus \operatorname{col}(X)$, the MANOVA estimators of $\Sigma_A$ and $\Sigma_E$ are given by

(2.6) $\hat\Sigma_A = \frac{1}{J}\, Y'\left(\frac{P_2}{I-1} - \frac{\operatorname{Id} - P_1 - P_2}{n - I}\right) Y, \qquad \hat\Sigma_E = Y'\,\frac{\operatorname{Id} - P_1 - P_2}{n - I}\, Y.$
In unbalanced designs and more general models, various alternative choices of $B$ lead to estimators in the generalized MANOVA [SCM09] and MINQUE/MIVQUE families [Rao72, LaM73, SS78].
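The unbiasedness conditions surrounding (2.5) and (2.6) can be verified numerically for a small balanced one-way design. The sketch below is our own check (the incidence matrix construction is the standard one for $I$ groups of size $J$): it builds the projections $P_1, P_2$, forms the two estimator matrices, and confirms that each annihilates the fixed-effect regressor and satisfies the trace conditions.

```python
import numpy as np

I, J = 6, 3                                 # small design, exact check
n = I * J
X = np.ones((n, 1))                         # all-1's regressor
U1 = np.kron(np.eye(I), np.ones((J, 1)))    # group incidence matrix
U2 = np.eye(n)                              # residual effect

def proj(M):
    """Orthogonal projection onto col(M)."""
    Q, _ = np.linalg.qr(M)
    return Q @ Q.T

P1 = proj(X)
P2 = proj(U1) - P1                          # col(U1) minus col(X)

B_A = (P2 / (I - 1) - (np.eye(n) - P1 - P2) / (n - I)) / J
B_E = (np.eye(n) - P1 - P2) / (n - I)

# Conditions from (2.5): B X = 0; tr(U_r' B U_r) = 1 for the target
# component and 0 for the other (note tr(U2' B U2) = tr(B)).
for name, B in [("B_A", B_A), ("B_E", B_E)]:
    t1 = np.trace(U1.T @ B @ U1)
    t2 = np.trace(U2.T @ B @ U2)
    print(name, "tr(U1'BU1) =", round(t1, 10), " tr(U2'BU2) =", round(t2, 10))
```

The same check extends to any balanced design in which $B$ is a combination of projections onto nested subspaces.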
We study spectral properties of the matrix $\hat\Sigma$ in (2.4) in a high-dimensional asymptotic regime, assuming a spiked model for each variance component $\Sigma_r$.
Assumption 2.1.
The number of effects $k$ is fixed while $n, p \to \infty$. There are constants $C, c > 0$ such that:

(a) (Number of traits) $c \leq p/n \leq C$.

(b) (Model design) $\|U_r\| \leq C$ and $m_r \leq Cn$ for each $r = 1, \dots, k$.

(c) (Estimation matrix) $B' = B$, $BX = 0$, and $\|B\| \leq C/n$.

(d) (Spiked covariance) For each $r = 1, \dots, k$,
$\Sigma_r = \sigma_r^2 \operatorname{Id}_p + V_r \Theta_r V_r',$
where $V_r \in \mathbb{R}^{p \times l_r}$ has orthonormal columns, $\Theta_r \in \mathbb{R}^{l_r \times l_r}$ is diagonal, $l_r \leq C$, $0 \leq \sigma_r^2 \leq C$, and $\|\Theta_r\| \leq C$. (We set $l_r = 0$ when $\Sigma_r = \sigma_r^2 \operatorname{Id}_p$.)
Under Assumption 2.1(d), each $\Sigma_r$ has an isotropic noise level $\sigma_r^2$ (possibly 0 if $\Sigma_r$ is low-rank) and a bounded number of signal eigenvalues greater than this noise level. We allow $\sigma_r^2$, $V_r$, $\Theta_r$, and $l_r$ to vary with $n$ and $p$. We will be primarily interested in scenarios where at least one variance $\sigma_r^2$ is of size $\asymp 1$, although let us remark that setting every $\sigma_r^2 = 0$ also recovers the classical low-dimensional asymptotic regime where the ambient dimension of the data is bounded as $n, p \to \infty$.
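Concretely, a covariance of the spiked form $\Sigma_r = \sigma_r^2\,\mathrm{Id} + V_r \Theta_r V_r'$ (as reconstructed above) can be built as follows; this is a minimal sketch of our own, and all numbers are arbitrary:

```python
import numpy as np

p, l = 8, 2                                  # ambient dimension and number of spikes
sigma2 = 0.5                                 # isotropic noise level sigma_r^2
theta = np.array([4.0, 1.5])                 # spike sizes (diagonal of Theta_r)

# V has orthonormal columns (QR factor of a random Gaussian matrix)
V = np.linalg.qr(np.random.default_rng(2).standard_normal((p, l)))[0]
Sigma = sigma2 * np.eye(p) + V @ np.diag(theta) @ V.T

# Spectrum: sigma2 with multiplicity p - l, plus sigma2 + theta_i for each spike
eigs = np.linalg.eigvalsh(Sigma)
print(np.round(eigs, 6))
```

The spectrum makes the structure of the assumption transparent: a flat noise floor at $\sigma_r^2$, plus a bounded number of signal eigenvalues $\sigma_r^2 + \theta_i$ above it.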
In classification designs, Assumption 2.1(b) holds when the number of outermost groups is proportional to $n$, and groups (and subgroups) are bounded in size. This encompasses typical designs in classical settings [Rob59a, Rob59b], and we discuss several examples in Section 5. In models where $U_r$ is a matrix of genotype values at $m_r$ SNPs, Assumption 2.1(b) holds if $m_r \asymp n$ and $U_r$ is entrywise bounded by $C/\sqrt{n}$. This latter condition is satisfied if genotypes at each SNP are normalized to mean 0 and variance $1/m_r$, and SNPs with minor allele frequency below a constant threshold are removed. Under Assumption 2.1(b), the scaling in Assumption 2.1(c) is then natural to ensure $\operatorname{tr}(U_r' B U_r)$ is bounded for each $r$, and hence $\mathbb{E}[\hat\Sigma]$ is on the same scale as $\Sigma_1, \dots, \Sigma_k$.
Throughout, we denote by $\operatorname{col}(V)$ the combined column span of $V_1, \dots, V_k$, where $V_r$ is omitted if $l_r = 0$. $P_V$ and $P_V^\perp$ denote the orthogonal projections onto $\operatorname{col}(V)$ and its orthogonal complement. We set $M = m_1 + \dots + m_k$ and define a block matrix $F \in \mathbb{R}^{M \times M}$ by

(2.7) $F = (F_{rs})_{r,s=1}^k, \qquad F_{rs} = \sigma_r \sigma_s\, U_r' B U_s.$
The “null” setting of no spikes, $l_1 = \dots = l_k = 0$, was studied in [FJ17], which made the following simple observation.

Proposition 2.2.
If $l_1 = \dots = l_k = 0$, then $\hat\Sigma$ is equal in law to $X'FX$, where $X \in \mathbb{R}^{M \times p}$ has i.i.d. $\mathcal{N}(0, 1)$ entries.

Proof.
If $l_r = 0$ for each $r$, we may write $\alpha_r = \sigma_r X_r$, where $X_r \in \mathbb{R}^{m_r \times p}$ has i.i.d. $\mathcal{N}(0, 1)$ entries. Then, applying $BX = 0$,

$\hat\Sigma = Y'BY = \sum_{r,s=1}^k \sigma_r \sigma_s\, X_r' U_r' B U_s X_s.$

The result follows upon stacking $X_1, \dots, X_k$ row-wise as $X$. ∎
In this case, the asymptotic spectrum of $\hat\Sigma$ is described by the Marcenko-Pastur equation:

Theorem 2.3.
For each $z \in \mathbb{C}^+$, there is a unique value $m_0(z) \in \mathbb{C}^+$ which satisfies

(2.8) $z = -\frac{1}{m_0(z)} + \sum_{\lambda \in \operatorname{eig}(F)} \frac{\lambda}{1 + p\,\lambda\, m_0(z)}.$

This function $m_0: \mathbb{C}^+ \to \mathbb{C}^+$ defines the Stieltjes transform of a probability distribution $\mu_0$ on $\mathbb{R}$. Under Assumption 2.1, if $l_1 = \dots = l_k = 0$, then $\mu_{\hat\Sigma} - \mu_0 \to 0$ weakly almost surely as $n, p \to \infty$, where $\mu_{\hat\Sigma}$ is the empirical distribution of eigenvalues of $\hat\Sigma$.
Proof.
See Section 6. ∎

Denote the $\delta$-neighborhood of the support of $\mu_0$ by

$\operatorname{supp}(\mu_0)_\delta = \{E \in \mathbb{R} : \operatorname{dist}(E, \operatorname{supp}(\mu_0)) \leq \delta\}.$

We emphasize that $\mu_0$ and its support depend on $n$ and $p$, although we suppress this dependence notationally. Then, if $l_1 = \dots = l_k = 0$, all eigenvalues of $\hat\Sigma$ fall within $\operatorname{supp}(\mu_0)_\delta$ with high probability:
Theorem 2.4.
Fix any constants $\delta, D > 0$. Under Assumption 2.1, if $l_1 = \dots = l_k = 0$, then for a constant $n_0 > 0$ and all $n \geq n_0$,

$\mathbb{P}\big[\operatorname{eig}(\hat\Sigma) \subset \operatorname{supp}(\mu_0)_\delta\big] \geq 1 - n^{-D}.$

Proof.
See Section 6. ∎
More generally, when the ranks $l_1, \dots, l_k$ are not all 0, so that $\Sigma_1, \dots, \Sigma_k$ exhibit a bounded number of spike eigenvalues, the bulk eigenvalue distribution of $\hat\Sigma$ is still described by the above law $\mu_0$, and Theorem 2.4 together with eigenvalue interlacing implies that only a bounded number of eigenvalues of $\hat\Sigma$ should fall far from $\operatorname{supp}(\mu_0)$. These outlier eigenvalues and their corresponding eigenvectors are the focus of our study.
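Before turning to outliers, the bulk approximation can be checked in a concrete special case of our own choosing: for the one-way estimator of the error component in (2.6) with no spikes and $\Sigma_A = \Sigma_E = \mathrm{Id}$, the estimation matrix is a scaled projection of rank $n - I$, and $\mu_0$ reduces to the standard Marcenko-Pastur law with aspect ratio $p/(n - I)$, whose support edges $(1 \pm \sqrt{\gamma})^2$ are classical. The sketch below compares a simulated spectrum against these edges.

```python
import numpy as np

rng = np.random.default_rng(1)
p, I, J = 200, 400, 2
n = I * J                                   # the estimator projection has rank n - I

# Null one-way data: mu = 0, Sigma_A = Sigma_E = Id (no spikes)
alpha = rng.standard_normal((I, p))
Y = np.repeat(alpha, J, axis=0) + rng.standard_normal((n, p))

# Error-component MANOVA estimate (within-group mean square), eq. (2.6)
means = Y.reshape(I, J, p).mean(axis=1)
resid = Y - np.repeat(means, J, axis=0)
Sigma_E_hat = resid.T @ resid / (n - I)

eigs = np.linalg.eigvalsh(Sigma_E_hat)
gamma = p / (n - I)
lower, upper = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
print(f"eigenvalue range:        [{eigs[0]:.3f}, {eigs[-1]:.3f}]")
print(f"Marcenko-Pastur support: [{lower:.3f}, {upper:.3f}]")
```

The empirical spectrum fills the predicted support, with no eigenvalues far outside it, as Theorem 2.4 asserts in the unspiked case.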
3. Outlier eigenvalues and eigenvectors
In this section, we describe results that characterize the asymptotic behavior of outlier eigenvalues and eigenvectors of a general matrix $\hat\Sigma$ of the form (2.4).

Let $m_0(z)$ be the Stieltjes transform of the law $\mu_0$ in Theorem 2.3, defined for all $z \in \mathbb{C} \setminus \operatorname{supp}(\mu_0)$ via

(3.1) $m_0(z) = \int \frac{d\mu_0(x)}{x - z}.$
For each $r = 1, \dots, k$, let $t_r(z)$ denote the trace of the $r$-th diagonal block in the $k \times k$ block decomposition of $F(\operatorname{Id}_M + p\, m_0(z) F)^{-1}$, the matrix appearing in (2.8). For $z \in \mathbb{R} \setminus \operatorname{supp}(\mu_0)$, define the surrogate matrix $T(z)$, a linear combination of the variance components $\Sigma_1, \dots, \Sigma_k$ with coefficients depending on $z$:

(3.2) 

Here, if $\sigma_r = 0$ for some $r$, then $T(z)$ remains well-defined by the identity

(3.3) 

and the definition of $F$ in (2.7). Let

(3.4) $\Lambda = \big\{\text{real roots of } z \mapsto \det(T(z) - z\operatorname{Id}_p)\big\}$

be the multiset of real roots of this function, counted with their analytic multiplicities. We record here the following alternative definition of $T(z)$, and properties of $T(z)$ and $\Lambda$.
Proposition 3.1 (Properties of $T(z)$ and $\Lambda$).

The matrix is equivalently defined as
(3.5) 
For each , .

For , is positive semidefinite.

For $\lambda \in \Lambda$, its multiplicity as a root of $z \mapsto \det(T(z) - z\operatorname{Id})$ is equal to $\dim \ker(T(\lambda) - \lambda \operatorname{Id})$.
Proof.
By conjugation symmetry and continuity, the Marcenko-Pastur identity (2.8) holds for each $z \in \mathbb{R} \setminus \operatorname{supp}(\mu_0)$. Part (a) then follows from substituting into (3.2) and applying (2.8). Part (b) follows from (a), as $T(z)$ is the direct sum of an operator on $\operatorname{col}(V)$ and a nonzero multiple of the identity on the orthogonal complement $\operatorname{col}(V)^\perp$. Differentiating (3.1), $m_0'(z) = \int (x - z)^{-2}\, d\mu_0(x) > 0$ for each $z \in \mathbb{R} \setminus \operatorname{supp}(\mu_0)$. Then part (c) follows from (3.2). For $\lambda \in \Lambda$, this implies each eigenvalue of $T(z) - z\operatorname{Id}$ is strictly decreasing in $z$ near $\lambda$, so the multiplicity of $\lambda$ as a root is the number of these eigenvalues vanishing at $\lambda$. This yields (d). ∎
For two finite multisets $S, S' \subset \mathbb{R}$ with $|S| = |S'|$, define

$d(S, S') = \max_i\, |s_{(i)} - s'_{(i)}|,$

where $s_{(1)} \leq \dots \leq s_{(|S|)}$ and $s'_{(1)} \leq \dots \leq s'_{(|S'|)}$ are the ordered values of $S$ and $S'$ counting multiplicity. The following shows that the outlier eigenvalues of $\hat\Sigma$ are close to the elements of $\Lambda$. Note that by (3.2), each $\lambda \in \Lambda$ is an eigenvalue of the matrix

(3.6) $T(\lambda).$

When $\hat\Sigma$ is the MANOVA estimator of a variance component $\Sigma_r$, we may interpret this matrix as a “surrogate” for the true matrix $\Sigma_r$ of interest.
Theorem 3.2 (Outlier locations).
Fix constants $\delta, D > 0$. Under Assumption 2.1, for a constant $n_0 > 0$ and all $n \geq n_0$, with probability at least $1 - n^{-D}$ there exist $S \subseteq \operatorname{eig}(\hat\Sigma)$ and $S' \subseteq \Lambda$ with $|S| = |S'|$, containing all elements of these multisets outside $\operatorname{supp}(\mu_0)_\delta$, such that $d(S, S') < \delta$.

The multiset $\Lambda$ represents a theoretical prediction for the locations of the outlier eigenvalues of $\hat\Sigma$—this is depicted in Figure 2 for an example of the one-way design. We clarify that $\Lambda$ is deterministic but $(n, p)$-dependent, and it may contain values arbitrarily close to $\operatorname{supp}(\mu_0)$. Hence we state the result as a matching between two sets $S$ and $S'$ rather than the convergence of outlier eigenvalues of $\hat\Sigma$ to a fixed set $\Lambda$. We allow $S$ and $S'$ to contain values within $\operatorname{supp}(\mu_0)_\delta$ so as to match values of the other set close to the boundaries of $\operatorname{supp}(\mu_0)_\delta$.
Remark.
In the setting of sample covariance matrices for i.i.d. multivariate samples, there is a phase transition phenomenon in which spike values greater than a certain threshold yield outlier eigenvalues outside the support of the bulk distribution, while spike values less than this threshold do not [BBP05, BS06, Pau07]. This phenomenon occurs also in our setting and is implicitly captured by the cardinality $|\Lambda|$, which represents the number of predicted outlier eigenvalues of $\hat\Sigma$. In particular, $\Lambda$ will be empty if the spike values of $\Sigma_1, \dots, \Sigma_k$ are sufficiently small. However, the phase transition thresholds and predicted outlier eigenvalue locations in our setting are defined jointly by $F$ and the alignments between $V_1, \dots, V_k$, rather than by the individual spectra of $\Sigma_1, \dots, \Sigma_k$.

We next describe eigenvector projections and eigenvalue fluctuations for isolated outliers.
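For the classical spiked sample covariance model referenced in the remark above, the phase transition can be seen directly in simulation. This is our own illustration of the [BBP05]-type behavior, using the standard threshold $\sqrt{\gamma}$ and outlier location formulas for a unit-variance bulk:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 2000, 500
gamma = p / n                        # aspect ratio; BBP threshold: spike > sqrt(gamma)

def top_eig(spike):
    """Top sample-covariance eigenvalue for population Id + spike * e1 e1'."""
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(1.0 + spike)
    return np.linalg.eigvalsh(X.T @ X / n)[-1]

edge = (1 + np.sqrt(gamma)) ** 2     # bulk upper edge, = 2.25 here
sub = top_eig(0.3)                   # subcritical: sticks to the bulk edge
sup = top_eig(2.0)                   # supercritical: outlier near 3 * (1 + gamma / 2)
print(round(sub, 3), round(sup, 3), "bulk edge:", edge)
```

In our model, by contrast, the analogous threshold and outlier locations cannot be read off from a single covariance in this way, since they depend jointly on $F$ and the cross-component alignments.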
Theorem 3.3 (Eigenvector projections).
Fix constants $\delta, D > 0$. Suppose $\lambda \in \Lambda$ has multiplicity one, and $|\lambda - \lambda'| \geq \delta$ for all other $\lambda' \in \Lambda$. Let $v$ be the unit vector in $\ker(T(\lambda) - \lambda\operatorname{Id})$, and let $\hat{v}$ be the unit eigenvector of the eigenvalue of $\hat\Sigma$ closest to $\lambda$. Then, under Assumption 2.1:

(a) For all $n \geq n_0$ and some choice of sign for $\hat{v}$, with probability at least $1 - n^{-D}$,

Thus $v$ represents a theoretical prediction for the projection of the sample eigenvector $\hat{v}$ onto the subspace $\operatorname{col}(V)$—this is also displayed in Figure 2 for the one-way design. Here, the predicted inner-product alignment between $\hat{v}$ and $v$ is, by Proposition 3.1(c), at most 1.
Next, for each pair $(r, s)$, let the Hilbert-Schmidt norm of the corresponding block in the $k \times k$ block decomposition of $F$ be denoted as above. Define

(3.7) 

where this is again well-defined by (3.3) even if $\sigma_r = 0$ and/or $\sigma_s = 0$.
Theorem 3.4 (Gaussian fluctuations).
Fix a constant $\delta > 0$. Suppose $\lambda \in \Lambda$ has multiplicity one, and $|\lambda - \lambda'| \geq \delta$ for all other $\lambda' \in \Lambda$. Let $v$ be the unit vector in $\ker(T(\lambda) - \lambda\operatorname{Id})$, and let $\hat\lambda$ be the eigenvalue of $\hat\Sigma$ closest to $\lambda$. Then under Assumption 2.1,

$\sqrt{n}\,(\hat\lambda - \lambda) \to \mathcal{N}(0, \sigma_\lambda^2)$ in distribution,

where

Furthermore, $\sigma_\lambda^2 \leq C$ for a constant $C > 0$.
Figure 3 illustrates the accuracy of this Gaussian approximation for two settings of the one-way design. We observe that the approximation is fairly accurate in a setting with a single outlier, but (in the simulated sample sizes $n$ and $p$) does not adequately capture a skew in the outlier distribution in a setting with an additional positive outlier produced by a large spike in $\Sigma_E$. This skew is reduced in examples where there is increased separation between these two positive outliers.

[Figure 3: distribution of the outlier eigenvalue across 10000 simulations, compared with the density function and quantiles of the Gaussian distribution with mean and variance given in Theorem 3.4, in two settings of the one-way design.]

Example 3.5.
In the setting of large population spike eigenvalues, it is illustrative to understand the predictions of Theorem 3.2 using a Taylor expansion. Let us carry this out for the MANOVA estimator $\hat\Sigma_A$ in the setting of a balanced one-way design (1.1) with $I$ groups of $J$ individuals.

Recalling the form (2.6) for $B$, the computation in Proposition 5.4(b) for general balanced designs will yield, in this setting, explicit expressions for $m_0(z)$ and $t_r(z)$.

Suppose first that there is a single large spike eigenvalue $\ell$ in $\Sigma_A$, and no spike eigenvalues in $\Sigma_E$. Theorem 3.2 and the form (3.5) for $T(z)$ indicate that outlier eigenvalues should appear near the locations

It is known that $m_0$ is injective on $\mathbb{R} \setminus \operatorname{supp}(\mu_0)$ (see [SC95, Theorems 4.1 and 4.2]). Hence the above condition defines an injective function of $z$ by the explicit form, so $|\Lambda| \leq 1$. Applying a Taylor expansion around $z = \ell$, we obtain from (2.8)

where the remainder is of lower order for large $\ell$. For large $\ell$ and $n$, solving yields

So we expect to observe one outlier, with the approximate upward bias displayed above.
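The direction of this bias prediction is easy to check in simulation. This is our own sketch; since the explicit constants in this copy of the text are garbled, we verify only that the largest eigenvalue of the MANOVA group-covariance estimate overshoots a single large spike $\ell$ when $p$ is comparable to $n$, and all parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
p, I, J = 400, 400, 2
n = I * J
ell = 4.0                                   # single spike in Sigma_A; Sigma_E = Id

tops = []
for _ in range(5):
    alpha = np.zeros((I, p))
    alpha[:, 0] = np.sqrt(ell) * rng.standard_normal(I)   # Sigma_A = ell * e1 e1'
    Y = np.repeat(alpha, J, axis=0) + rng.standard_normal((n, p))
    means = Y.reshape(I, J, p).mean(axis=1)
    SSB = J * (means - Y.mean(axis=0)).T @ (means - Y.mean(axis=0))
    resid = Y - np.repeat(means, J, axis=0)
    S_A = (SSB / (I - 1) - resid.T @ resid / (n - I)) / J  # MANOVA estimate, eq. (2.6)
    tops.append(np.linalg.eigvalsh(S_A)[-1])

print("true spike:", ell, " mean top eigenvalue:", round(float(np.mean(tops)), 3))
```

Averaged over a few replicates, the top eigenvalue consistently exceeds $\ell$, consistent with the predicted upward bias.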
Next, suppose there is a single large spike eigenvalue $\ell$ in $\Sigma_E$, and no spike eigenvalues in $\Sigma_A$. Then we expect outlier eigenvalues near the locations

Since $m_0$ is injective and the condition is quadratic, we obtain $|\Lambda| \leq 2$. Taylor expanding, we have after some simplification

Then for large $\ell$, solving yields two predicted outlier eigenvalues near

Let us emphasize that these predictions are in the asymptotic regime where $n, p \to \infty$ and $\ell$ is a large but fixed constant, rather than $\ell \to \infty$ jointly with $n$ and $p$.

Finally, consider a single spike $\ell_A$ in $\Sigma_A$ and a single spike $\ell_E$ in $\Sigma_E$. Letting the corresponding spike eigenvectors have inner-product $\rho$, we expect outliers near

This is a cubic condition, so $|\Lambda| \leq 3$. Applying the above Taylor expansions, this condition becomes

In a setting where $\ell_A$ and $\ell_E$ are large and of comparable size, there is a predicted outlier near $\ell_A$. More precisely, expanding the above around this point, the location of this outlier is

Thus the upward bias of this outlier is increased, relative to the case where there are no spikes in $\Sigma_E$, by an amount depending on the alignment $\rho$.
We conclude this section by describing two points of contact between Theorem 3.2 and the results of [BGN11] and [BBC17].
Example 3.6.
Consider the model (2.1) with $k = 1$, no fixed effects, $U_1 = \operatorname{Id}_n$, and $\Sigma_1 = \sigma^2 \operatorname{Id}_p + V \Theta V'$. Then (2.7) yields $F = \sigma^2 B$. Writing $\alpha_1 = X \Sigma_1^{1/2}$ where $X \in \mathbb{R}^{n \times p}$ has i.i.d. $\mathcal{N}(0, 1)$ entries, we then simply have

$\hat\Sigma = \Sigma_1^{1/2}\, X' B X\, \Sigma_1^{1/2}.$

The law $\mu_0$ approximates the empirical spectral distribution of $\sigma^2 X'BX$. Applying (3.2), (2.8), and (3.1) we obtain

The resulting function is the “D-transform” of $\mu_0$. By the form (3.5), the outlier eigenvalues of $\hat\Sigma$ are predicted by

where $\theta_1, \dots, \theta_l$ are the diagonal entries of $\Theta$. This matches the multiplicative perturbation result of [BGN11, Theorem 2.7]. Depending on $\mu_0$, this function is not necessarily injective over $\mathbb{R} \setminus \operatorname{supp}(\mu_0)$, and hence a single spike can generate multiple outlier eigenvalues of $\hat\Sigma$ even in this simple setting.
Example 3.7.
Consider the model (2.1) with $k = 2$, no fixed effects, and the columns of $U_1$ and $U_2$ together forming an orthonormal basis of $\mathbb{R}^n$. (Thus $m_1 + m_2 = n$, $U_1'U_1 = \operatorname{Id}_{m_1}$, and $U_2'U_2 = \operatorname{Id}_{m_2}$.) Consider spiked $\Sigma_1, \Sigma_2$ and an estimation matrix $B$ of the form

for any real scalars $b_1, b_2$. Then (2.7) gives the corresponding $F$. Writing $\alpha_r = \sigma_r X_r$ as above, we have