Pervasive cross-section dependence is increasingly recognized as an appropriate characteristic of economic data, and the approximate factor model provides a useful framework for analysis. Assuming a strong factor structure, early work established convergence of the principal component estimates of the factors and loadings up to a rotation matrix. This paper shows that the estimates are still consistent and asymptotically normal for a broad range of weaker factor loadings, albeit at slower rates and under additional assumptions on the sample size. Standard inference procedures can be used except in the case of extremely weak loadings, which has encouraging implications for empirical work. The simplified proofs are of independent interest.

## 1 Introduction

Starting with Forni et al. (2000) and Stock and Watson (1998, 2002), a large body of research has been developed to estimate the latent common variations in large panels in which the $N$ units observed over $T$ periods are cross-sectionally correlated. A fundamental result shown in Bai and Ng (2002) is that the space spanned by the factors can be consistently estimated by the method of static principal components (APC) at rate $\min(N,T)$. Bai (2003) then establishes asymptotic normality of the estimated factors up to a rotation matrix. The maintained assumption is that the factor structure is strong, meaning that if $\mathbf{F}$ and $\boldsymbol{\Lambda}$ are the latent factors and loadings, the matrices $\mathbf{F}'\mathbf{F}/T$ and $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}/N$ are both positive definite in the limit. However, Onatski (2012) shows that the APC estimates are inconsistent when $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}$ (without dividing by $N$) has a positive definite limit. This has generated a good deal of interest in determining the number of less pervasive factors. Some assume large idiosyncratic variances, some assume that the entries of $\boldsymbol{\Lambda}$ are non-zero but small, while others assume a sparse $\boldsymbol{\Lambda}$ with many zero entries. See, for example, DeMol et al. (2008), Lettau and Pelger (2020), Uematsu and Yamagata (2019), Freyaldenhoven (2021). Though the term ‘weak factors’ is used in different ways, there is a presumption that the APC estimator has undesirable properties when the strong factor assumption fails. However, to our knowledge, there does not exist a clear statement of what those properties are.

In this paper, we consider the weaker condition that has a positive definite limit with . Since it is the strength of the loadings that is being weakened and positive definiteness of is maintained throughout, we use the terminology of weaker loadings. Our main result is that while the strong factor assumption of yields the fastest convergence rates possible, and the estimates are inconsistent when the loadings are extremely weak in the case of , the principal component estimator continues to be consistent when . For the factor space, we establish that , which depends on . However, for the common component, we obtain as in the strong factor case. In terms of distribution theory, we show that , , and are all asymptotically normal when . Though some additional assumptions on and are needed, is not required to grow at the same rate as , and knowledge of is not necessary for inference. The results have implications for factor augmented regressions, and more generally, for panel data models with cross-section dependence.

The convergence rate of the factor space we now obtain for $\alpha=1$ is $O_p(1/N)+O_p(1/T^2)$, which is faster than previously derived for the strong factor case. This is made possible by a different proof technique that also leads to significant simplifications and is hence of independent interest. (Footnote 1: An earlier version of the paper, circulated as "Simpler proofs for approximate factor models of large dimensions", considers $\alpha=1$ only.) The simplifications come partly from using higher level assumptions, and partly from using approximations to the original rotation matrix, which also make it possible to conduct inference using a representation of the asymptotic variance that the user deems most convenient.

The paper proceeds as follows. Section 2 sets up the econometric framework and presents three useful preliminary results. Section 3 studies consistent estimation of the factors, the loadings, and introduces four asymptotically equivalent rotation matrices. The distribution theory is given in Section 4. Implications of weaker loadings for factor augmented regressions are discussed.

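Throughout, matrices are written in bold-face to distinguish them from vectors. As a matter of notation, $\|\mathbf{A}\|^2=\mathrm{tr}(\mathbf{A}'\mathbf{A})$ is the squared Frobenius norm of a matrix $\mathbf{A}$, and $\|\mathbf{A}\|_{sp}^2=\rho_{\max}(\mathbf{A}'\mathbf{A})$ denotes the squared spectral norm of $\mathbf{A}$, where $\rho_{\max}(\mathbf{B})$ denotes the largest eigenvalue of a positive semi-definite matrix $\mathbf{B}$. Note that $\|\mathbf{A}\|_{sp}^2\le\|\mathbf{A}\|^2\le q\,\|\mathbf{A}\|_{sp}^2$, where $q=\mathrm{rank}(\mathbf{A})$. Thus when the rank is fixed, the two norms are equivalent in terms of asymptotic behavior.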
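As a quick numerical illustration of why the two norms are interchangeable for fixed rank, the sketch below checks the standard bounds (squared spectral norm at most the squared Frobenius norm, which is at most the rank times the squared spectral norm) on a randomly generated low-rank matrix. The matrix sizes, the rank `q = 3`, and the seed are hypothetical choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: a 200 x 50 matrix of rank q = 3.
q = 3
A = rng.standard_normal((200, q)) @ rng.standard_normal((q, 50))

fro_sq = np.sum(A ** 2)                     # squared Frobenius norm, tr(A'A)
sp_sq = np.linalg.eigvalsh(A.T @ A).max()   # largest eigenvalue of A'A
rank = np.linalg.matrix_rank(A)

# The two norms differ by at most a factor equal to the rank.
bounds_hold = sp_sq <= fro_sq <= rank * sp_sq
print(rank, bounds_hold)
```

Because the upper bound scales only with the rank and not with the dimensions of the matrix, a rate derived in one norm carries over to the other when the number of factors is fixed.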

## 2 The Econometric Setup

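We use $i=1,\ldots,N$ to index cross-section units and $t=1,\ldots,T$ to index time series observations. Let $X_i=(x_{i1},\ldots,x_{iT})'$ be a $T\times 1$ vector of random variables and $\mathbf{X}=(X_1,\ldots,X_N)$ be a $T\times N$ matrix. The normalized data $\mathbf{Z}=\mathbf{X}/\sqrt{NT}$ has singular value decomposition (svd)

$$\mathbf{Z}=\frac{\mathbf{X}}{\sqrt{NT}}=\mathbf{U}_{NT}\mathbf{D}_{NT}\mathbf{V}_{NT}'=\mathbf{U}_{NT,k}\mathbf{D}_{NT,k}\mathbf{V}_{NT,k}'+\mathbf{U}_{NT,N-k}\mathbf{D}_{NT,N-k}\mathbf{V}_{NT,N-k}'$$

where $\mathbf{D}_{NT,k}$ holds the $k$ largest singular values and $\mathbf{D}_{NT,N-k}$ the remaining $N-k$. In the above, $\mathbf{D}_{NT}$ is a diagonal matrix of singular values arranged in descending order, and $\mathbf{U}_{NT}$ and $\mathbf{V}_{NT}$ are the corresponding left and right singular vectors respectively. By the Eckart and Young (1936) theorem, the best rank $k$ approximation of $\mathbf{Z}$ is $\mathbf{U}_{NT,k}\mathbf{D}_{NT,k}\mathbf{V}_{NT,k}'$. This is obtained without imposing probabilistic assumptions on the data.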
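The low-rank approximation property can be checked numerically. The following is a minimal sketch, assuming an arbitrary simulated panel (the sizes, the seed, and the random comparison matrix `B` are hypothetical): it truncates the svd of the normalized data at rank $k$ and confirms that the Frobenius error equals the root sum of squares of the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, k = 60, 40, 2                      # hypothetical panel sizes and rank
X = rng.standard_normal((T, N))
Z = X / np.sqrt(N * T)                   # normalized data

# Singular values are returned in descending order.
U, d, Vt = np.linalg.svd(Z, full_matrices=False)
Z_k = U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]   # rank-k truncation

err = np.linalg.norm(Z - Z_k, "fro")
tail = np.sqrt(np.sum(d[k:] ** 2))       # norm of the discarded part

# An arbitrary alternative rank-k matrix should not fit better.
B = rng.standard_normal((T, k)) @ rng.standard_normal((k, N)) / np.sqrt(N * T)
err_B = np.linalg.norm(Z - B, "fro")
print(err, tail, err_B)
```

Nothing in this calculation uses a probability model for the data, which is the point made in the text: the truncation is a purely algebraic fit.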

We represent the data using a static factor model with $r$ factors. In matrix form,

$$\mathbf{X}=\mathbf{F}\boldsymbol{\Lambda}'+\mathbf{e}. \tag{1}$$

To simplify notation, the subscripts indicating that $\mathbf{F}$ is $T\times r$ and $\boldsymbol{\Lambda}$ is $N\times r$ will be suppressed when the context is clear. The common component $\mathbf{C}=\mathbf{F}\boldsymbol{\Lambda}'$ has reduced rank $r$ because $\mathbf{F}$ and $\boldsymbol{\Lambda}$ both have rank $r$. The covariance matrix of $\mathbf{X}$ takes the form

$$\boldsymbol{\Sigma}_X=\boldsymbol{\Lambda}\boldsymbol{\Sigma}_F\boldsymbol{\Lambda}'+\boldsymbol{\Sigma}_e=\boldsymbol{\Sigma}_C+\boldsymbol{\Sigma}_e.$$

A strict factor model obtains when the errors are cross-sectionally and serially uncorrelated so that $\boldsymbol{\Sigma}_e$ is a diagonal matrix. The classical factor model studied in Anderson and Rubin (1956) uses the stronger assumption that $e_{it}$ is iid and normally distributed. For economic analysis, this error structure is overly restrictive. We work with the approximate factor model formulated in Chamberlain and Rothschild (1983), which allows the idiosyncratic errors to be weakly correlated in both the cross-section and time series dimensions. In such a case, $\boldsymbol{\Sigma}_e$ need not be a diagonal matrix.

### 2.1 The APC Estimator and Assumptions

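Let $\mathbf{F}^0$ and $\boldsymbol{\Lambda}^0$ be the true values of $\mathbf{F}$ and $\boldsymbol{\Lambda}$. The model for unit $i$ at time $t$ is

$$x_{it}=\Lambda_i^{0\prime}F_t^0+e_{it}.$$

Letting $X_i=(x_{i1},\ldots,x_{iT})'$ and $e_i=(e_{i1},\ldots,e_{iT})'$, the model for unit $i$ is

$$X_i=\mathbf{F}^0\Lambda_i^0+e_i.$$

Estimation of $\mathbf{F}$ and $\boldsymbol{\Lambda}$ in an approximate factor model with $r$ factors proceeds by minimizing the sum of squared residuals:

$$\min_{\mathbf{F},\boldsymbol{\Lambda}}\mathrm{SSR}(\mathbf{F},\boldsymbol{\Lambda};r)=\min_{\mathbf{F},\boldsymbol{\Lambda}}\frac{1}{NT}\|\mathbf{X}-\mathbf{F}\boldsymbol{\Lambda}'\|^2=\min_{\mathbf{F},\boldsymbol{\Lambda}}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it}-\Lambda_i'F_t)^2.$$

As $\mathbf{F}$ and $\boldsymbol{\Lambda}$ are not separately identified, we impose the normalization restrictions

$$\frac{\mathbf{F}'\mathbf{F}}{T}=\mathbf{I}_r,\qquad \boldsymbol{\Lambda}'\boldsymbol{\Lambda}\ \text{is diagonal}. \tag{2}$$

The solution is the (static) APC estimator defined as:

$$(\tilde{\mathbf{F}},\tilde{\boldsymbol{\Lambda}})=(\sqrt{T}\,\mathbf{U}_{NT,r},\ \sqrt{N}\,\mathbf{V}_{NT,r}\mathbf{D}_{NT,r}). \tag{3}$$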
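The estimator in (3) is straightforward to compute from one svd. The sketch below is an illustration under assumed inputs, not the authors' code: the helper name `apc`, the simulated strong-factor panel, and the seed are all hypothetical. It builds the estimates and checks the normalization in (2), as well as the fact that the columns of the estimated factors are eigenvectors of $\mathbf{X}\mathbf{X}'/(NT)$.

```python
import numpy as np

def apc(X, r):
    """Hypothetical helper: principal component estimates from a T x N panel X."""
    T, N = X.shape
    U, d, Vt = np.linalg.svd(X / np.sqrt(N * T), full_matrices=False)
    F_tilde = np.sqrt(T) * U[:, :r]                # T x r, sqrt(T) * U_r
    Lam_tilde = np.sqrt(N) * Vt[:r, :].T * d[:r]   # N x r, sqrt(N) * V_r D_r
    return F_tilde, Lam_tilde, d[:r] ** 2          # last item: diagonal of D_r^2

# Simulated panel with r = 2 factors (an assumed design for illustration).
rng = np.random.default_rng(2)
T, N, r = 100, 80, 2
F0 = rng.standard_normal((T, r))
Lam0 = rng.standard_normal((N, r))
X = F0 @ Lam0.T + rng.standard_normal((T, N))

F_tilde, Lam_tilde, d2 = apc(X, r)

FtF = F_tilde.T @ F_tilde / T          # should be the identity matrix
LtL = Lam_tilde.T @ Lam_tilde          # should be diagonal
# Eigenvector check: (XX'/NT) F_tilde - F_tilde D^2 should vanish.
eig_gap = X @ X.T @ F_tilde / (N * T) - F_tilde * d2
print(np.round(FtF, 6), np.abs(eig_gap).max())
```

Returning the squared singular values alongside the estimates is a design choice of this sketch; they are the quantities that need rescaling when the loadings are weaker.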

APC estimation of large dimensional approximate factor models must overcome two challenges not present in the classical factor analysis of Anderson and Rubin (1956). The first pertains to the fact that the errors are now allowed to be cross-sectionally correlated. The second issue arises because the covariance matrix of and the covariance of are of infinite dimensions when and are large. To study the properties of the APC estimates, we use to obtain:

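$$\frac{1}{NT}\mathbf{X}\mathbf{X}'=\mathbf{F}^0\frac{(\boldsymbol{\Lambda}^{0\prime}\boldsymbol{\Lambda}^0)}{N}\frac{\mathbf{F}^{0\prime}}{T}+\frac{\mathbf{F}^0\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'}{NT}+\frac{\mathbf{e}\boldsymbol{\Lambda}^0\mathbf{F}^{0\prime}}{NT}+\frac{\mathbf{e}\mathbf{e}'}{NT}. \tag{4}$$

But $\frac{1}{NT}\mathbf{X}\mathbf{X}'\mathbf{U}_{NT,r}=\mathbf{U}_{NT,r}\mathbf{D}_{NT,r}^2$ by the svd of the normalized data, and thus $\frac{1}{NT}\mathbf{X}\mathbf{X}'\tilde{\mathbf{F}}=\tilde{\mathbf{F}}\mathbf{D}_{NT,r}^2$. It follows that

$$\mathbf{F}^0\frac{(\boldsymbol{\Lambda}^{0\prime}\boldsymbol{\Lambda}^0)}{N}\frac{\mathbf{F}^{0\prime}\tilde{\mathbf{F}}}{T}+\frac{\mathbf{F}^0\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\tilde{\mathbf{F}}}{NT}+\frac{\mathbf{e}\boldsymbol{\Lambda}^0\mathbf{F}^{0\prime}\tilde{\mathbf{F}}}{NT}+\frac{\mathbf{e}\mathbf{e}'\tilde{\mathbf{F}}}{NT}=\tilde{\mathbf{F}}\mathbf{D}_{NT,r}^2. \tag{5}$$

Rearranging terms yields

$$\tilde F_t-\tilde{\mathbf{H}}_{NT,0}'F_t^0=\tilde{\mathbf{D}}_{NT}^{-2}\left(\frac{1}{T}\sum_{s=1}^{T}\tilde F_s\gamma_{st}+\frac{1}{T}\sum_{s=1}^{T}\tilde F_s\zeta_{st}+\frac{1}{T}\sum_{s=1}^{T}\tilde F_s\eta_{st}+\frac{1}{T}\sum_{s=1}^{T}\tilde F_s\xi_{st}\right)$$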

where , , , , and . Stock and Watson (2002); Bai and Ng (2002); Bai (2003) established properties of the APC estimator by analyzing the four terms under certain assumptions, and this is by and large the approach that the literature has taken. In what follows, we work directly with the matrix norms of the terms in (5). This makes it possible to obtain simpler proofs under more general assumptions.

#### Assumption A1:

Let not depending on and and define

$$\delta_{NT}=\min(\sqrt{N},\sqrt{T}).$$

• Mean independence: .

• Weak (cross-sectional and serial) correlation in the errors.

• ,

• For all , , and for all ,

• For all , and for all , .

• .

#### Assumption A2:

(i) ;
(ii) , , for some with ;
(iii) the eigenvalues of are distinct.

#### Assumption A3:

For each , (i) , (ii) ; for each , (iii) , (iv) ; (v) .

#### Assumption A4:

As , for the same in Assumption A2.

Assumption A1.i uses mean independence in place of moment conditions on

as in previous work. Assumption A1.ii assumes weak time and cross-section dependence. Assumption A.1(d) is a bound on the maximum eigenvalues of

. For iid data with uniformly bounded fourth moments, the rate is implied by random matrix theory.

Moon and Weidner (2017) extends the case to data that are weakly correlated across and . We use it to obtain simpler proofs. Assumption A2 implies and . Parts of Assumption A3 also appear in Bai and Ng (2002). When the errors are independent, Assumptions A1 and A2 are enough to validate A3. The assumption should hold under weak cross-sectional and serial correlations. Allowing for weaker factors comes at a cost. As stated in Assumption A4, which is new, a small must be compensated by a larger .

The defining characteristic of an approximate factor model is that the first largest population eigenvalues of diverge with while all remaining eigenvalues of are zero, and all eigenvalues of are bounded. Previous works model the ‘diverge with ’ feature by assuming that and are positive definite in the limit. These two conditions have come to be known as the strong factor structure. Onatski (2012) considers the other extreme of . We accommodate weaker loadings but require that which nests the strong factor model as a special case. Note that the strength of the factor loadings affects the normalization of but not . Assumption A2.(ii) allows the eigenvalues of to diverge at a rate of with . Assumptions A3.(b) and (d) reflect the more general setup. Assumption A3 implies

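$$\frac{\mathbf{F}^{0\prime}\mathbf{e}\mathbf{e}'\mathbf{F}^0}{NT}=\frac{1}{N}\sum_{i=1}^{N}\left[\left(\frac{1}{\sqrt{T}}\sum_{t}F_t^0e_{it}\right)\left(\frac{1}{\sqrt{T}}\sum_{t}F_t^0e_{it}\right)'\right]=O_p(1), \tag{6}$$

$$\frac{\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\mathbf{e}\boldsymbol{\Lambda}^0}{N^{\alpha}T}=\frac{1}{T}\sum_{t=1}^{T}\left[\left(\frac{1}{\sqrt{N^{\alpha}}}\sum_{i}\Lambda_i^0e_{it}\right)\left(\frac{1}{\sqrt{N^{\alpha}}}\sum_{i}\Lambda_i^0e_{it}\right)'\right]=O_p(1). \tag{7}$$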

DeMol et al. (2008) considers a setting in which the eigenvalues of are large relative to those of . As discussed in Onatski (2012), such a setup can be rewritten in terms of weaker loadings with . The focus of DeMol et al. (2008) is shrinkage priors in Bayesian estimation and the implications for forecasting. Lettau and Pelger (2020) considers a penalized APC estimator to account for pricing errors in expected returns. Uematsu and Yamagata (2019, 2020) assume in our notation that is a diagonal matrix with in the -th diagonal, where reflects the strength of factor , and for an unspecified choice of . Their main interest is in a new regularized estimator that captures sparsity of the loadings. Freyaldenhoven (2021) develops criteria to determine the number of factors of differing strength (local factors) and provides some results for the APC estimates when . We consider standard APC estimation without knowledge of and assume that it is the same for all factors, but we provide the complete distribution theory for , and , making explicit that is the limit of a matrix that depends on the data generating process. As we will now proceed to show, plays an important role in the asymptotic theory.

### 2.2 Useful Identities and Matrices

In the strong factor case, is a diagonal matrix of the largest eigenvalues of . The singular values of are those of divided by . In practice, each column of is transformed to have unit variance so is the fraction of variation in explained by factor . The following lemma shows that to accommodate weaker factors, must be scaled up by to have a limit matrix that is full rank.

###### Lemma 1

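Let $\mathbf{D}_r^2$ be a diagonal matrix consisting of the ordered eigenvalues of . Under Assumption A, we have

$$\left(\frac{N}{N^{\alpha}}\right)\mathbf{D}_{NT,r}^2\overset{p}{\longrightarrow}\mathbf{D}_r^2>0,\qquad 1\ge\alpha>0.$$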

#### Proof:

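Proper normalization of $\mathbf{X}\mathbf{X}'$ is key to accommodating $\alpha<1$. The diagonal matrix $\frac{N}{N^{\alpha}}\mathbf{D}_{NT,r}^2$ consists of the $r$ largest eigenvalues of $\frac{1}{N^{\alpha}T}\mathbf{X}\mathbf{X}'$, and

$$\frac{1}{N^{\alpha}T}\mathbf{X}\mathbf{X}'=\mathbf{F}^0\frac{(\boldsymbol{\Lambda}^{0\prime}\boldsymbol{\Lambda}^0)}{N^{\alpha}}\frac{\mathbf{F}^{0\prime}}{T}+\frac{\mathbf{F}^0\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'}{N^{\alpha}T}+\frac{\mathbf{e}\boldsymbol{\Lambda}^0\mathbf{F}^{0\prime}}{N^{\alpha}T}+\frac{\mathbf{e}\mathbf{e}'}{N^{\alpha}T}.$$

Since the largest eigenvalue of $\mathbf{e}\mathbf{e}'$ is of order $\max\{N,T\}$, the largest eigenvalue of the last matrix is bounded by $O_p\!\left(\max\{N,T\}/(N^{\alpha}T)\right)$. Furthermore, $\|\mathbf{F}^0\|_{sp}=O_p(\sqrt{T})$, and $\|\boldsymbol{\Lambda}^0\|_{sp}=O_p(N^{\alpha/2})$. A bound in spectral norm for the second matrix on the right hand side is

$$\frac{\|\mathbf{e}\|_{sp}\|\mathbf{F}^0\|_{sp}\|\boldsymbol{\Lambda}^0\|_{sp}}{TN^{\alpha}}\le O_p\!\left(\sqrt{\frac{N}{N^{\alpha}}\frac{1}{T}}\right)+O_p\!\left(\frac{1}{N^{\alpha/2}}\right)\overset{p}{\longrightarrow}0.$$

The third matrix is the transpose of the second. Thus, the largest eigenvalues of the last three matrices converge to zero. By the matrix perturbation theorem, the $r$ largest eigenvalues of $\frac{1}{N^{\alpha}T}\mathbf{X}\mathbf{X}'$ are determined by the first matrix on the right hand side. The eigenvalues of this matrix are the same as those of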

This matrix converges to whose eigenvalues are , proving the lemma.
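The scaling in Lemma 1 can be seen in a small simulation. This is a rough sketch under an assumed data generating process that is not taken from the paper: factors and errors are iid standard normal, and every loading is shrunk by $N^{(\alpha-1)/2}$ so that $\boldsymbol{\Lambda}^{0\prime}\boldsymbol{\Lambda}^0/N^{\alpha}$ stays close to $\mathrm{diag}(2,1)$. The values of `N`, `T`, `alpha`, and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
T = N = 500
r, alpha = 2, 0.7

# Assumed design (illustration only): Sigma_F = I_r, loadings shrunk so that
# Lam0'Lam0 / N^alpha is approximately diag(2, 1).
F0 = rng.standard_normal((T, r))
Lam0 = rng.standard_normal((N, r)) * np.sqrt([2.0, 1.0]) * N ** ((alpha - 1) / 2)
X = F0 @ Lam0.T + rng.standard_normal((T, N))

d = np.linalg.svd(X / np.sqrt(N * T), compute_uv=False)
d2_r = d[:r] ** 2                    # r largest eigenvalues of XX'/(NT)
scaled = N ** (1 - alpha) * d2_r     # (N / N^alpha) * D^2_{NT,r}

# Eigenvalues of the leading term, computed from a symmetric r x r matrix
# that shares its eigenvalues with (F0'F0/T)(Lam0'Lam0/N^alpha).
S_F = F0.T @ F0 / T
C = np.linalg.cholesky(Lam0.T @ Lam0 / N ** alpha)
lead = np.linalg.eigvalsh(C.T @ S_F @ C)[::-1]   # descending order

print(d2_r, scaled, lead)
```

In this design the unscaled eigenvalues shrink as $N$ grows, while the rescaled ones track the eigenvalues of the leading term, which is the sense in which the rescaling delivers a full rank limit.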

Next, we turn to two matrices that will play important roles subsequently. The first is the covariance matrix . To obtain its limit, we multiply on each side of (5) and use the fact that to obtain

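$$\left(\frac{\tilde{\mathbf{F}}'\mathbf{F}^0}{T}\right)\frac{\boldsymbol{\Lambda}^{0\prime}\boldsymbol{\Lambda}^0}{N^{\alpha}}\left(\frac{\mathbf{F}^{0\prime}\tilde{\mathbf{F}}}{T}\right)+\frac{\tilde{\mathbf{F}}'\mathbf{F}^0\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\tilde{\mathbf{F}}}{N^{\alpha}T^2}+\frac{\tilde{\mathbf{F}}'\mathbf{e}\boldsymbol{\Lambda}^0\mathbf{F}^{0\prime}\tilde{\mathbf{F}}}{N^{\alpha}T^2}+\frac{\tilde{\mathbf{F}}'\mathbf{e}\mathbf{e}'\tilde{\mathbf{F}}}{N^{\alpha}T^2}=\frac{N}{N^{\alpha}}\mathbf{D}_{NT,r}^2. \tag{8}$$

The right hand side converges to a positive definite matrix (thus invertible) by Lemma 1. The last three matrices on the left hand side converge in probability to zero. In particular,

$$\left\|\frac{\tilde{\mathbf{F}}'\mathbf{F}^0\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\tilde{\mathbf{F}}}{N^{\alpha}T^2}\right\|\le\left\|\frac{\tilde{\mathbf{F}}'\mathbf{F}^0}{T}\right\|\,\|\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\|\,\|\tilde{\mathbf{F}}\|\,\frac{1}{N^{\alpha}T}=O_p\!\left(\frac{1}{N^{\alpha/2}}\right)=o_p(1)$$

and

$$\left\|\frac{\tilde{\mathbf{F}}'\mathbf{e}\mathbf{e}'\tilde{\mathbf{F}}}{N^{\alpha}T^2}\right\|\le\rho_{\max}(\mathbf{e}\mathbf{e}')\,\frac{\|\tilde{\mathbf{F}}\|^2}{T}\,\frac{1}{N^{\alpha}T}\le\frac{\max\{N,T\}}{N^{\alpha}T}\,O_p(1)=o_p(1). \tag{9}$$

The limit on the left hand side is thus determined by the first matrix, i.e.,

$$\left(\frac{\tilde{\mathbf{F}}'\mathbf{F}^0}{T}\right)\frac{\boldsymbol{\Lambda}^{0\prime}\boldsymbol{\Lambda}^0}{N^{\alpha}}\left(\frac{\mathbf{F}^{0\prime}\tilde{\mathbf{F}}}{T}\right)+o_p(1)=\frac{N}{N^{\alpha}}\mathbf{D}_{NT,r}^2. \tag{10}$$

The limit of $\tilde{\mathbf{F}}'\mathbf{F}^0/T$ can be obtained from this representation.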

###### Lemma 2

Under Assumption A,

• , where

consists of the eigenvectors of the matrix

with .

• For .

Part (i) is obtained by taking limit on each side of (10) and to yield . Since is diagonal with distinct elements, we can solve for as where consists of the orthonormal eigenvectors of the matrix . The matrix is full rank and invertible. The solution holds up to a column sign change, just like is determined up to a column sign change.

The rotation matrix , first derived in Stock and Watson (1998), has been used to evaluate the precision of . Bai (2003) shows that when . To accommodate weaker loadings, we consider

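$$\mathbf{H}_{NT,0}=\left(\frac{\boldsymbol{\Lambda}^{0\prime}\boldsymbol{\Lambda}^0}{N^{\alpha}}\right)\left(\frac{\mathbf{F}^{0\prime}\tilde{\mathbf{F}}}{T}\right)\left(\frac{N}{N^{\alpha}}\mathbf{D}_{NT,r}^2\right)^{-1}.$$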

By assumption, the first matrix on the right hand side is invertible while the last two matrices are invertible by the previous lemmas. Hence . The matrix and its relation to are fundamental to the asymptotic theory in the strong factor case. Lemma 2 shows that the relations are unaffected when weaker loadings are allowed.

## 3 Consistent Estimation of the Factor Space

This section has three parts. Subsection 1 presents consistent estimation of the factors up to rotation by $\mathbf{H}_{NT,0}$. Subsection 2 introduces four new rotation matrices. Subsection 3 uses these new matrices to show consistent estimation of the loadings.

### 3.1 The Factors

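To establish consistent estimation of $\mathbf{F}^0$ for $0<\alpha\le 1$ up to rotation by $\mathbf{H}_{NT,0}$, we multiply $\mathbf{D}_{NT,r}^{-2}$ to both sides of (5) and use the definition of $\mathbf{H}_{NT,0}$ to obtain

$$\begin{aligned}\tilde{\mathbf{F}}-\mathbf{F}^0\mathbf{H}_{NT,0}&=\left(\frac{\mathbf{F}^0\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\tilde{\mathbf{F}}}{NT}+\frac{\mathbf{e}\boldsymbol{\Lambda}^0\mathbf{F}^{0\prime}\tilde{\mathbf{F}}}{NT}+\frac{\mathbf{e}\mathbf{e}'\tilde{\mathbf{F}}}{NT}\right)\mathbf{D}_{NT,r}^{-2}\\&=\left(\frac{\mathbf{F}^0\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\tilde{\mathbf{F}}}{N^{\alpha}T}+\frac{\mathbf{e}\boldsymbol{\Lambda}^0\mathbf{F}^{0\prime}\tilde{\mathbf{F}}}{N^{\alpha}T}+\frac{\mathbf{e}\mathbf{e}'\tilde{\mathbf{F}}}{N^{\alpha}T}\right)\left(\frac{N}{N^{\alpha}}\mathbf{D}_{NT,r}^2\right)^{-1}.\end{aligned} \tag{11}$$

This implies

$$\begin{aligned}\frac{1}{\sqrt{T}}\|\tilde{\mathbf{F}}-\mathbf{F}^0\mathbf{H}_{NT,0}\|&\le\left\{2\left(\frac{\|\mathbf{F}^0\|\,\|\tilde{\mathbf{F}}\|}{T}\right)\left(\frac{1}{\sqrt{T}N^{\alpha}}\|\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\|\right)+\frac{\|\mathbf{e}\mathbf{e}'\tilde{\mathbf{F}}\|}{N^{\alpha}T^{3/2}}\right\}\left\|\left(\frac{N}{N^{\alpha}}\mathbf{D}_{NT,r}^2\right)^{-1}\right\|\\&=O_p\!\left(\frac{1}{\sqrt{T}N^{\alpha}}\|\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\|\right)+O_p\!\left(\frac{\|\mathbf{e}\mathbf{e}'\tilde{\mathbf{F}}\|}{N^{\alpha}T^{3/2}}\right).\end{aligned}$$

But $\|\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\|=O_p(\sqrt{N^{\alpha}T})$ by (7) and $\|\mathbf{e}\mathbf{e}'\tilde{\mathbf{F}}\|\le\rho_{\max}(\mathbf{e}\mathbf{e}')\|\tilde{\mathbf{F}}\|=O_p(\max\{N,T\}\sqrt{T})$. Thus

$$\frac{1}{\sqrt{T}}\|\tilde{\mathbf{F}}-\mathbf{F}^0\mathbf{H}_{NT,0}\|=O_p\!\left(\frac{1}{\sqrt{N^{\alpha}}}\right)+\frac{1}{T}\frac{N}{N^{\alpha}}O_p(1).$$

Squaring it gives the following proposition.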

###### Proposition 1

Under Assumption A, the following holds:

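$$\frac{1}{T}\|\tilde{\mathbf{F}}-\mathbf{F}^0\mathbf{H}_{NT,0}\|^2=\frac{1}{T}\sum_{t=1}^{T}\|\tilde F_t-\mathbf{H}_{NT,0}'F_t^0\|^2=O_p\!\left(\frac{1}{N^{\alpha}}\right)+\frac{1}{T^2}\left(\frac{N}{N^{\alpha}}\right)^2O_p(1).$$

The result is stated in squared Frobenius norm. For $\alpha=1$, Theorem 1 of Bai and Ng (2002) gives a convergence rate for the same quantity of $O_p(1/\min(N,T))$. The proposition here uses a different proof to obtain a faster convergence rate of $O_p(1/N)+O_p(1/T^2)$ for the strong factor case of $\alpha=1$. Implications of the proposition will be discussed subsequently.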
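To make the proposition concrete, the sketch below computes the rotation matrix $\mathbf{H}_{NT,0}$ on simulated data and evaluates the average squared error of the rotated factors. It is an illustration under assumptions that are not from the paper: the helper name `rotation_error`, the iid normal design, the loadings shrunk by $N^{(\alpha-1)/2}$, and the particular values of `N`, `T`, and `alpha` are all hypothetical.

```python
import numpy as np

def rotation_error(N, T, alpha, r=2, seed=0):
    """Hypothetical helper: (1/T) * ||F_tilde - F0 H_{NT,0}||^2 for one panel."""
    rng = np.random.default_rng(seed)
    F0 = rng.standard_normal((T, r))
    # Assumed design: loadings scaled so that Lam0'Lam0 / N^alpha = O_p(1).
    Lam0 = rng.standard_normal((N, r)) * np.sqrt([2.0, 1.0]) * N ** ((alpha - 1) / 2)
    X = F0 @ Lam0.T + rng.standard_normal((T, N))

    U, d, _ = np.linalg.svd(X / np.sqrt(N * T), full_matrices=False)
    F_tilde = np.sqrt(T) * U[:, :r]
    D2 = np.diag(d[:r] ** 2)

    # H_{NT,0} = (Lam0'Lam0 / N^alpha) (F0'F_tilde / T) ((N / N^alpha) D^2)^{-1}
    H = (Lam0.T @ Lam0 / N ** alpha) @ (F0.T @ F_tilde / T) \
        @ np.linalg.inv(N ** (1 - alpha) * D2)
    return np.sum((F_tilde - F0 @ H) ** 2) / T

# Same T and alpha, two cross-section sizes.
err_small = rotation_error(N=100, T=400, alpha=0.7)
err_large = rotation_error(N=1600, T=400, alpha=0.7)
print(err_small, err_large)
```

Because $\mathbf{H}_{NT,0}$ is built from the estimated factors themselves, it absorbs the column sign indeterminacy of the svd, so no separate sign alignment is needed before computing the error. In this design the first term of the bound should dominate, so the error is expected to fall as $N$ grows with $T$ held fixed.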

### 3.2 Equivalent Rotation Matrices

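The rotation matrix $\mathbf{H}_{NT,0}$ is a product of three matrices and it is not easy to interpret. However, we can rewrite (8) as

$$\left(\frac{\tilde{\mathbf{F}}'\mathbf{F}^0}{T}\right)\mathbf{H}_{NT,0}=\mathbf{I}_r-\left\{\frac{\tilde{\mathbf{F}}'\mathbf{F}^0\boldsymbol{\Lambda}^{0\prime}\mathbf{e}'\tilde{\mathbf{F}}}{N^{\alpha}T^2}+\frac{\tilde{\mathbf{F}}'\mathbf{e}\boldsymbol{\Lambda}^0\mathbf{F}^{0\prime}\tilde{\mathbf{F}}}{N^{\alpha}T^2}+\frac{\tilde{\mathbf{F}}'\mathbf{e}\mathbf{e}'\tilde{\mathbf{F}}}{N^{\alpha}T^2}\right\}\left(\frac{N}{N^{\alpha}}\mathbf{D}_{NT,r}^2\right)^{-1}. \tag{12}$$

As $\left(\frac{N}{N^{\alpha}}\mathbf{D}_{NT,r}^2\right)^{-1}=O_p(1)$ by Lemma 1, the product of $\tilde{\mathbf{F}}'\mathbf{F}^0/T$ and $\mathbf{H}_{NT,0}$ is an identity matrix up to a negligible term if it can be shown that the three terms inside the bracket are small. The next lemma formalizes this result and shows that it also holds for four other rotation matrices.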

###### Lemma 3

Under Assumption A,

• For , , where
,
,
, and
.

• .

Part (i), shown in the Appendix, establishes the error in approximating by while part (ii) considers four additional approximations that provide an intuitive interpretation of . For example, is coefficient matrix from projecting on the space spanned by and is asymptotically the fit from the projection. These alternative rotation matrices were used in Bai and Ng (2019) for . The above Lemma shows that they can still be used in place of when , but the adequacy of approximation will depend on .