Approximate Factor Models with Weaker Loadings

09/08/2021 ∙ by Jushan Bai, et al. ∙ Columbia University

Pervasive cross-section dependence is increasingly recognized as an appropriate characteristic of economic data, and the approximate factor model provides a useful framework for analysis. Assuming a strong factor structure, early work established convergence of the principal component estimates of the factors and loadings up to a rotation matrix. This paper shows that the estimates are still consistent and asymptotically normal for a broad range of weaker factor loadings, albeit at slower rates and under additional assumptions on the sample size. Standard inference procedures can be used except in the case of extremely weak loadings, which has encouraging implications for empirical work. The simplified proofs are of independent interest.




1 Introduction

Starting with Forni et al. (2000) and Stock and Watson (1998, 2002), a large body of research has been developed to estimate the latent common variations in large panels in which the $N$ units observed over $T$ periods are cross-sectionally correlated. A fundamental result shown in Bai and Ng (2002) is that the space spanned by the factors can be consistently estimated by the method of static principal components (APC) at rate $\min(N,T)$. Bai (2003) then establishes asymptotic normality of the estimated factors up to a rotation matrix. The maintained assumption is that the factor structure is strong, meaning that if $\mathbf{F}$ and $\boldsymbol{\Lambda}$ are the latent factors and loadings, the matrices $\mathbf{F}'\mathbf{F}/T$ and $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}/N$ are both positive definite in the limit. However, Onatski (2012) shows that the APC estimates are inconsistent when $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}$ (without dividing by $N$) has a positive definite limit. This has generated a good deal of interest in determining the number of less pervasive factors. Some assume large idiosyncratic variances, some assume that the entries of $\boldsymbol{\Lambda}$ are non-zero but small, while others assume a sparse $\boldsymbol{\Lambda}$ with many zero entries. See, for example, DeMol et al. (2008), Lettau and Pelger (2020), Uematsu and Yamagata (2019), Freyaldenhoven (2021). Though the term ‘weak factors’ is used in different ways, there is a presumption that the APC estimator has undesirable properties when the strong factor assumption fails. However, to our knowledge, there does not exist a clear statement of what those properties are.

In this paper, we consider the weaker condition that $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}/N^{\alpha}$ has a positive definite limit with $\alpha \in (0,1]$. Since it is the strength of the loadings that is being weakened and positive definiteness of $\mathbf{F}'\mathbf{F}/T$ is maintained throughout, we use the terminology of weaker loadings. Our main result is that while the strong factor assumption of $\alpha = 1$ yields the fastest convergence rates possible, and the estimates are inconsistent when the loadings are extremely weak in the case of $\alpha = 0$, the principal component estimator continues to be consistent when $\alpha \in (0,1)$. For the factor space, we establish that , which depends on $\alpha$. However, for the common component, we obtain as in the strong factor case. In terms of distribution theory, we show that the estimated factors, loadings, and common component are all asymptotically normal when . Though some additional assumptions on $N$ and $T$ are needed, $N$ is not required to grow at the same rate as $T$, and knowledge of $\alpha$ is not necessary for inference. The results have implications for factor augmented regressions, and more generally, for panel data models with cross-section dependence.

The convergence rate of the factor space we now obtain for the strong factor case of $\alpha = 1$ is , which is faster than previously derived for the strong factor case. This is made possible by a different proof technique that also leads to significant simplifications and is hence of independent interest. (Footnote 1: An earlier version of the paper, circulated as "Simpler proofs for approximate factor models of large dimensions", considers the strong factor case only.) The simplifications come partly from using higher level assumptions, and partly from using approximations to the original rotation matrix, which also make it possible to conduct inference using a representation of the asymptotic variance that the user deems most convenient.

The paper proceeds as follows. Section 2 sets up the econometric framework and presents three useful preliminary results. Section 3 studies consistent estimation of the factors and the loadings, and introduces four asymptotically equivalent rotation matrices. The distribution theory is given in Section 4. Implications of weaker loadings for factor augmented regressions are also discussed.

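Throughout, matrices are written in bold-face to distinguish them from vectors. As a matter of notation, $\|\mathbf{A}\|_F^2 = \mathrm{tr}(\mathbf{A}'\mathbf{A})$ is the squared Frobenius norm of a matrix $\mathbf{A}$, and $\|\mathbf{A}\|_{sp}^2 = \rho_{\max}(\mathbf{A}'\mathbf{A})$ denotes the squared spectral norm of $\mathbf{A}$, where $\rho_{\max}(\mathbf{B})$ denotes the largest eigenvalue of a positive semi-definite matrix $\mathbf{B}$. Note that $\|\mathbf{A}\|_{sp}^2 \le \|\mathbf{A}\|_F^2 \le q\,\|\mathbf{A}\|_{sp}^2$, where $q = \mathrm{rank}(\mathbf{A})$. Thus when the rank is fixed, the two norms are equivalent in terms of asymptotic behavior.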
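The relation between the two norms can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `squared_norms` and the simulated low-rank matrix are illustrative choices, not part of the paper.

```python
import numpy as np

def squared_norms(a):
    """Return (squared Frobenius norm, squared spectral norm, rank) of a."""
    gram = a.T @ a                            # positive semi-definite matrix a'a
    fro_sq = np.trace(gram)                   # sum of all eigenvalues of a'a
    spec_sq = np.linalg.eigvalsh(gram)[-1]    # eigvalsh returns ascending order
    return fro_sq, spec_sq, np.linalg.matrix_rank(a)

rng = np.random.default_rng(0)
# A 200 x 50 matrix of rank 3, built as a product of two thin matrices.
a = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 50))
fro_sq, spec_sq, rank = squared_norms(a)

# The squared spectral norm is bounded by the squared Frobenius norm,
# which in turn is bounded by rank times the squared spectral norm.
print(rank, spec_sq <= fro_sq <= rank * spec_sq)
```

Because the rank stays fixed while the dimensions grow, the two norms differ by at most a constant factor, which is why bounds in either norm can be used interchangeably in rate calculations.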

2 The Econometric Setup

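We use $i = 1,\dots,N$ to index cross-section units and $t = 1,\dots,T$ to index time series observations. Let $X_i = (X_{i1},\dots,X_{iT})'$ be a $T \times 1$ vector of random variables and $\mathbf{X} = (X_1,\dots,X_N)$ be a $T \times N$ matrix. The normalized data $\mathbf{Z} = \mathbf{X}/\sqrt{NT}$ admit the singular value decomposition (SVD)

$$\mathbf{Z} = \mathbf{U}\mathbf{D}\mathbf{V}' = \mathbf{U}_r\mathbf{D}_r\mathbf{V}_r' + \mathbf{U}_{n-r}\mathbf{D}_{n-r}\mathbf{V}_{n-r}'$$

where $n = \min(N,T)$, $\mathbf{D}_r = \mathrm{diag}(d_1,\dots,d_r)$ and $\mathbf{D}_{n-r} = \mathrm{diag}(d_{r+1},\dots,d_n)$. In the above, $\mathbf{D}_r$ is a diagonal matrix of $r$ singular values arranged in descending order, and $\mathbf{U}_r$, $\mathbf{V}_r$ are the corresponding left and right singular vectors respectively. By the Eckart and Young (1936) theorem, the best rank $r$ approximation of $\mathbf{Z}$ is $\mathbf{U}_r\mathbf{D}_r\mathbf{V}_r'$. This is obtained without imposing probabilistic assumptions on the data.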
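The low rank approximation property can be illustrated with a short sketch. It assumes NumPy and a simulated $T \times N$ panel scaled by $1/\sqrt{NT}$; the function name `best_rank_r` and the data are hypothetical and serve only as an illustration.

```python
import numpy as np

def best_rank_r(z, r):
    """Truncated SVD of z: keep the r largest singular values and vectors."""
    u, d, vt = np.linalg.svd(z, full_matrices=False)  # d is sorted descending
    z_r = (u[:, :r] * d[:r]) @ vt[:r, :]
    return z_r, d

rng = np.random.default_rng(1)
T, N, r = 60, 40, 2
x = rng.standard_normal((T, N))      # placeholder data, no factor structure needed
z = x / np.sqrt(N * T)               # normalized data
z_r, d = best_rank_r(z, r)

# The squared Frobenius error of the truncation equals the sum of the
# discarded squared singular values.
err = np.linalg.norm(z - z_r, "fro") ** 2
tail = np.sum(d[r:] ** 2)
print(np.isclose(err, tail))
```

The identity between `err` and `tail` is purely algebraic, which mirrors the remark that the approximation is obtained without probabilistic assumptions.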

We represent the data using a static factor model with $r$ factors. In matrix form,

$$\mathbf{X} = \mathbf{F}\boldsymbol{\Lambda}' + \mathbf{e}.$$

To simplify notation, the subscripts indicating that $\mathbf{F}$ is $T \times r$ and $\boldsymbol{\Lambda}$ is $N \times r$ will be suppressed when the context is clear. The common component $\mathbf{C} = \mathbf{F}\boldsymbol{\Lambda}'$ has reduced rank $r$ because $\mathbf{F}$ and $\boldsymbol{\Lambda}$ both have rank $r$. The covariance matrix of $\mathbf{X}$ takes the form

$$\Sigma_X = \boldsymbol{\Lambda}\Sigma_F\boldsymbol{\Lambda}' + \Sigma_e.$$

A strict factor model obtains when the errors $e_{it}$ are cross-sectionally and serially uncorrelated so that $\Sigma_e$ is a diagonal matrix. The classical factor model studied in Anderson and Rubin (1956) uses the stronger assumption that $e_{it}$ is iid and normally distributed. For economic analysis, this error structure is overly restrictive. We work with the approximate factor model formulated in Chamberlain and Rothschild (1983), which allows the idiosyncratic errors to be weakly correlated in both the cross-section and time series dimensions. In such a case, $\Sigma_e$ need not be a diagonal matrix.

2.1 The APC Estimator and Assumptions

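Let $\mathbf{F}^0$ and $\boldsymbol{\Lambda}^0$ be the true values of $\mathbf{F}$ and $\boldsymbol{\Lambda}$. The model for unit $i$ at time $t$ is

$$X_{it} = \Lambda_i^{0\prime} F_t^0 + e_{it}.$$

Letting $X_i = (X_{i1},\dots,X_{iT})'$ and $e_i = (e_{i1},\dots,e_{iT})'$, the model for unit $i$ is

$$X_i = \mathbf{F}^0 \Lambda_i^0 + e_i.$$

Estimation of $\mathbf{F}$ and $\boldsymbol{\Lambda}$ in an approximate factor model with $r$ factors proceeds by minimizing the sum of squared residuals:

$$\min_{\mathbf{F},\boldsymbol{\Lambda}} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(X_{it} - \Lambda_i' F_t\right)^2.$$

As $\mathbf{F}$ and $\boldsymbol{\Lambda}$ are not separately identified, we impose the normalization restrictions

$$\frac{\mathbf{F}'\mathbf{F}}{T} = \mathbf{I}_r, \qquad \boldsymbol{\Lambda}'\boldsymbol{\Lambda} \ \text{is diagonal}.$$

The solution is the (static) APC estimator defined as:

$$(\tilde{\mathbf{F}}, \tilde{\boldsymbol{\Lambda}}) = (\sqrt{T}\,\mathbf{U}_r,\ \sqrt{N}\,\mathbf{V}_r\mathbf{D}_r),$$

where $\mathbf{U}_r$, $\mathbf{V}_r$ and $\mathbf{D}_r$ collect the $r$ leading left singular vectors, right singular vectors and singular values of the normalized data $\mathbf{X}/\sqrt{NT}$.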
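A minimal sketch of the estimator may help fix ideas. It assumes the usual SVD-based construction under the normalization $\mathbf{F}'\mathbf{F}/T = \mathbf{I}_r$ with diagonal $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}$, so that the factor estimates are $\sqrt{T}$ times the leading left singular vectors of $\mathbf{X}/\sqrt{NT}$; the function name `apc` and the simulated data are hypothetical.

```python
import numpy as np

def apc(x, r):
    """APC estimates from a T x N panel x with r factors (assumed normalization).

    Returns factors (T x r), loadings (N x r) and the r leading singular
    values of x / sqrt(N * T).
    """
    T, N = x.shape
    u, d, vt = np.linalg.svd(x / np.sqrt(N * T), full_matrices=False)
    f_tilde = np.sqrt(T) * u[:, :r]                 # satisfies F'F / T = I_r
    lam_tilde = np.sqrt(N) * vt[:r, :].T * d[:r]    # V_r D_r scaled by sqrt(N)
    return f_tilde, lam_tilde, d[:r]

rng = np.random.default_rng(3)
T, N, r = 120, 80, 2
# Illustrative data: two factors with standard normal loadings and errors.
x = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) \
    + rng.standard_normal((T, N))
f_tilde, lam_tilde, d_r = apc(x, r)

print(np.allclose(f_tilde.T @ f_tilde / T, np.eye(r)))      # normalization holds
print(np.allclose(lam_tilde, x.T @ f_tilde / T))            # loadings as regression
print(np.allclose(lam_tilde.T @ lam_tilde / N, np.diag(d_r ** 2)))
```

The second check shows that, under this normalization, the loading estimates coincide with the least squares coefficients from regressing each series on the estimated factors.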


APC estimation of large dimensional approximate factor models must overcome two challenges not present in the classical factor analysis of Anderson and Rubin (1956). The first pertains to the fact that the errors are now allowed to be cross-sectionally correlated. The second issue arises because the covariance matrix of and the covariance of are of infinite dimensions when and are large. To study the properties of the APC estimates, we use to obtain:


But and thus . It follows that


Rearranging terms yields

where , , , , and . Stock and Watson (2002); Bai and Ng (2002); Bai (2003) established properties of the APC estimator by analyzing the four terms under certain assumptions, and this is by and large the approach that the literature has taken. In what follows, we work directly with the matrix norms of the terms in (5). This makes it possible to obtain simpler proofs under more general assumptions.

Assumption A1:

Let not depending on and and define

  • Mean independence: .

  • Weak (cross-sectional and serial) correlation in the errors.

    • ,

    • For all , , and for all ,

    • For all , and for all , .

    • .

Assumption A2:

(i) ;
(ii) , , for some with ;
(iii) the eigenvalues of are distinct.

Assumption A3:

For each , (i) , (ii) ; for each , (iii) , (iv) ; (v) .

Assumption A4:

As , for the same in Assumption A2.

Assumption A1.i uses mean independence in place of moment conditions on

as in previous work. Assumption A1.ii assumes weak time and cross-section dependence. Assumption A1.ii(d) is a bound on the maximum eigenvalues of

. For iid data with uniformly bounded fourth moments, the rate is implied by random matrix theory.

Moon and Weidner (2017) extends the case to data that are weakly correlated across and . We use it to obtain simpler proofs. Assumption A2 implies and . Parts of Assumption A3 also appear in Bai and Ng (2002). When the errors are independent, Assumptions A1 and A2 are enough to validate A3. The assumption should hold under weak cross-sectional and serial correlations. Allowing for weaker factors comes at a cost. As stated in Assumption A4, which is new, a small must be compensated by a larger .

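The defining characteristic of an approximate factor model is that the $r$ largest population eigenvalues of the covariance matrix of the common component diverge with $N$ while all of its remaining eigenvalues are zero, and all eigenvalues of the covariance matrix of the idiosyncratic errors are bounded. Previous works model the ‘diverge with $N$’ feature by assuming that $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}/N$ and $\mathbf{F}'\mathbf{F}/T$ are positive definite in the limit. These two conditions have come to be known as the strong factor structure. Onatski (2012) considers the other extreme in which $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}$, without dividing by $N$, has a positive definite limit. We accommodate weaker loadings but require that $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}/N^{\alpha}$ has a positive definite limit for some $\alpha \in (0,1]$, which nests the strong factor model as a special case. Note that the strength of the factor loadings affects the normalization of $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}$ but not $\mathbf{F}'\mathbf{F}$. Assumption A2.(ii) allows the eigenvalues of $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}$ to diverge at a rate of $N^{\alpha}$. Assumptions A3.(ii) and (iv) reflect the more general setup. Assumption A3 implies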


DeMol et al. (2008) considers a setting in which the eigenvalues of are large relative to those of . As discussed in Onatski (2012), such a setup can be rewritten in terms of weaker loadings with . The focus of DeMol et al. (2008) is shrinkage priors in Bayesian estimation and the implications for forecasting. Lettau and Pelger (2020) considers a penalized APC estimator to account for pricing errors in expected returns. Uematsu and Yamagata (2019, 2020) assume in our notation that is a diagonal matrix with in the -th diagonal, where reflects the strength of factor , and for an unspecified choice of . Their main interest is in a new regularized estimator that captures sparsity of the loadings. Freyaldenhoven (2021) develops criteria to determine the number of factors of differing strength (local factors) and provides some results for the APC estimates when . We consider standard APC estimation without knowledge of and assume that it is the same for all factors, but we provide the complete distribution theory for , and , making explicit that is the limit of a matrix that depends on the data generating process. As we will now proceed to show, plays an important role in the asymptotic theory.
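To make the role of the loading strength concrete, the sketch below simulates loadings whose Gram matrix grows like $N^{\alpha}$ and compares two normalizations. The design (standard normal draws scaled by $N^{(\alpha-1)/2}$) and the name `weaker_loadings` are assumptions made for illustration only; they are not the paper's data generating process.

```python
import numpy as np

def weaker_loadings(N, r, alpha, rng):
    """Draw N x r loadings with Lambda'Lambda of order N**alpha (assumed design)."""
    g = rng.standard_normal((N, r))
    return N ** ((alpha - 1.0) / 2.0) * g

rng = np.random.default_rng(2)
alpha, r = 0.5, 2
for N in (500, 4000):
    lam = weaker_loadings(N, r, alpha, rng)
    gram = lam.T @ lam
    # Dividing by N**alpha gives eigenvalues that stay away from zero.
    eig_alpha = np.linalg.eigvalsh(gram / N ** alpha)
    # Dividing by N, as in the strong factor case, makes them shrink with N.
    eig_strong = np.linalg.eigvalsh(gram / N)
    print(N, eig_alpha.round(3), eig_strong.round(4))
```

In this design the eigenvalues under the $N^{\alpha}$ normalization remain close to one as $N$ grows, while those under the $N$ normalization shrink toward zero, which is the sense in which the strong factor normalization is no longer appropriate.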

2.2 Useful Identities and Matrices

In the strong factor case, is a diagonal matrix of the largest eigenvalues of . The singular values of are those of divided by . In practice, each column of is transformed to have unit variance so is the fraction of variation in explained by factor . The following lemma shows that to accommodate weaker factors, must be scaled up by to have a limit matrix that is full rank.

Lemma 1

Let be a diagonal matrix consisting of the ordered eigenvalues of . Under Assumption A, we have


Proper normalization of is key to accommodating . The diagonal matrix consists of the largest eigenvalues of , and

Since the largest eigenvalue of is of order , the largest eigenvalue of the last matrix is bounded by . Furthermore, , and . A bound in spectral norm for the second matrix on the right hand side is

The third matrix is the transpose of the second. Thus, the largest eigenvalues of the last three matrices converge to zero. By the matrix perturbation theorem, the largest eigenvalues of are determined by the first matrix on the right hand side. The eigenvalues of this matrix are the same as those of

This matrix converges to whose eigenvalues are , proving the lemma.

Next, we turn to two matrices that will play important roles subsequently. The first is the covariance matrix . To obtain its limit, we multiply on each side of (5) and use the fact that to obtain


The right hand side converges to a positive definite matrix (thus invertible) by Lemma 1. The last three matrices on the left hand side converge in probability to zero. In particular,



The limit on the left hand side is thus determined by the first matrix, i.e.


The limit of can be obtained from this representation.

Lemma 2

Under Assumption A,

  • , where

    consists of the eigenvectors of the matrix

    with .

  • For .

Part (i) is obtained by taking limit on each side of (10) and to yield . Since is diagonal with distinct elements, we can solve for as where consists of the orthonormal eigenvectors of the matrix . The matrix is full rank and invertible. The solution holds up to a column sign change, just like is determined up to a column sign change.

The rotation matrix , first derived in Stock and Watson (1998), has been used to evaluate the precision of . Bai (2003) shows that when . To accommodate weaker loadings, we consider

By assumption, the first matrix on the right hand side is invertible while the last two matrices are invertible by the previous lemmas. Hence . The matrix and its relation to are fundamental to the asymptotic theory in the strong factor case. Lemma 2 shows that the relations are unaffected when weaker loadings are allowed.

3 Consistent Estimation of the Factor Space

This section has three parts. Subsection 1 presents consistent estimation of the factors up to rotation by . Subsection 2 introduces four new rotation matrices. Subsection 3 uses these new matrices to show consistent estimation of the loadings.

3.1 The Factors

To establish consistent estimation of for up to rotation by , we multiply to both sides of (5) and use the definition of to obtain


This implies

But by (7) and . Thus

Squaring it gives the following proposition.

Proposition 1

Under Assumption A, the following holds:

The result is stated in squared Frobenius norm. For the strong factor case, Theorem 1 of Bai and Ng (2002) gives a convergence rate for the same quantity of $\min(N,T)$. The proposition here uses a different proof to obtain a faster convergence rate of for the strong factor case of . Implications of the proposition will be discussed subsequently.

3.2 Equivalent Rotation Matrices

The rotation matrix is a product of three matrices and it is not easy to interpret. However, we can rewrite (8) as


As , the product of and is an identity matrix up to a negligible term if it can be shown that the three terms inside the bracket are small. The next lemma formalizes this result and shows that it also holds for four other rotation matrices.

Lemma 3

Under Assumption A,

  • For , , where
    , and

  • .

Part (i), shown in the Appendix, establishes the error in approximating by while part (ii) considers four additional approximations that provide an intuitive interpretation of . For example, is the coefficient matrix from projecting on the space spanned by and is asymptotically the fit from the projection. These alternative rotation matrices were used in Bai and Ng (2019) for . The above lemma shows that they can still be used in place of when , but the adequacy of the approximation will depend on .
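The projection interpretation of a rotation matrix can be illustrated by simulation. The sketch below assumes a strong factor design with standard normal factors, loadings and errors, and an SVD-based APC estimator under the normalization $\mathbf{F}'\mathbf{F}/T = \mathbf{I}_r$; the names `apc`, `f0`, `lam0` and `h_proj` are hypothetical, and the least squares coefficient shown is only one of several possible rotation matrices.

```python
import numpy as np

def apc(x, r):
    """SVD-based APC estimates from a T x N panel (assumed normalization F'F/T = I)."""
    T, N = x.shape
    u, d, vt = np.linalg.svd(x / np.sqrt(N * T), full_matrices=False)
    return np.sqrt(T) * u[:, :r], np.sqrt(N) * vt[:r, :].T * d[:r]

rng = np.random.default_rng(4)
T, N, r = 200, 200, 2
f0 = rng.standard_normal((T, r))      # hypothetical true factors
lam0 = rng.standard_normal((N, r))    # hypothetical strong loadings
x = f0 @ lam0.T + rng.standard_normal((T, N))

f_tilde, _ = apc(x, r)
# Coefficient matrix from projecting the estimated factors on the true factors.
h_proj, *_ = np.linalg.lstsq(f0, f_tilde, rcond=None)
# Average squared distance between the estimates and the rotated true factors.
mse = np.linalg.norm(f_tilde - f0 @ h_proj, "fro") ** 2 / T
print(h_proj.round(2), round(float(mse), 4))
```

In this simulated design the estimated factors are close to a rotation of the true factors, and the projection coefficient is an invertible $r \times r$ matrix. Repeating the exercise with loadings scaled down as $N$ grows would show how the quality of the approximation varies with the loading strength.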

3.3 The Loadings and the Common Component

The APC estimator satisfies and we already have We can now provide a simple consistency proof for . Multiply to both sides of to obtain . We have