 # Simpler Proofs for Approximate Factor Models of Large Dimensions

Estimates of the approximate factor model are increasingly used in empirical work. Their theoretical properties, studied some twenty years ago, also laid the groundwork for the analysis of large dimensional panel data models with cross-section dependence. This paper presents simplified proofs for the estimates by using alternative rotation matrices and by exploiting properties of low rank matrices, as well as the singular value decomposition of the data in addition to its covariance structure. These simplifications facilitate interpretation of results and provide a friendlier introduction for researchers new to the field. New results are provided to allow linear restrictions to be imposed on factor models.


## 1 Introduction

An active area of research in the last twenty years is the analysis of panel data with cross-section dependence, where the panel has dimension $T\times N$ and both the time dimension $T$ and the cross-section dimension $N$ are large. Classical factor models studied by Anderson and Rubin (1956) and Lawley and Maxwell (1974), among others, are designed to capture cross-section dependence when either $N$ or $T$ is fixed and the errors are iid across time and units. The approximate factor model formulated in Chamberlain and Rothschild (1983) relaxes many of these assumptions, so what remains is to be able to take the theory to the data. Connor and Korajczyk (1993) suggest estimating the factors by the method of asymptotic principal components (APC). Consistency proofs were subsequently given in Stock and Watson (2002a) and Bai and Ng (2002) as $N,T\to\infty$. Bai and Ng (2006) provide the conditions under which the factor estimates can be treated in subsequent regressions as though they were observed. Novel uses of the factor estimates, such as the diffusion index forecasting pioneered in Stock and Watson (2002b) and the factor-augmented autoregressions considered in Bernanke et al. (2005), along with the natural role that common factors play in many theoretical models in economics and finance, have contributed to the popularity of large dimensional factor analysis.

Arguably, the three fundamental results in this literature are (i) the consistency of the estimated factor space at rate $\min(N,T)$, (ii) consistent estimation of the number of factors, and (iii) asymptotic normality of the estimated factors, the loadings, and the common component. The point of departure for these results, given in Bai and Ng (2002) and Bai (2003), is an analysis of the factor estimates relative to a specific rotation of the true factors, first considered in Stock and Watson (1998), that is defined from the covariance structure of the data. This leads to a decomposition of the estimation error into four terms, the limit of each of which must be carefully derived. Though a large body of research is built on these theoretical results, the arguments are lengthy and often not particularly intuitive.

In this paper, we show that the key results can be obtained using simpler arguments and under higher level assumptions. It turns out that inspection of the norm of the population covariance of the errors is already sufficient to establish that the factor space can be consistently estimated at rate $\min(N,T)$, from which consistent estimation of the number of factors can be easily established. Exploiting the eigen-decomposition of the data, and not only of its covariance, leads to a different representation of the factor estimates that also simplifies the analysis. Most important is the recognition that the rotation matrix is not unique. We present four asymptotically equivalent rotation matrices that simplify the proofs of asymptotic normality. It will be shown that the asymptotic variance of the factor estimates can be represented in many ways. This little-known fact makes it possible to conduct inference using whichever estimate of the variance the researcher finds most computationally convenient. The simplified arguments, presented in consistent notation, should help students and researchers new to the field better understand the role that large $N$ and large $T$ play in the estimation of approximate factor models.

Economic analysis sometimes imposes specific restrictions on the model. Because we can only estimate the factor space up to a rotation matrix, imposing such restrictions is trickier. We provide results for estimation of factor models with linear restrictions. These results should be of interest as factor estimation finds more ways into economic applications.

## 2 Model Setup and Assumptions

We use $i=1,\ldots,N$ to index cross-section units and $t=1,\ldots,T$ to index time series observations. Let $X_t=(X_{1t},\ldots,X_{Nt})'$ be an $N\times 1$ vector of random variables and $X=(X_1,\ldots,X_T)'$ be a $T\times N$ matrix. In practice, $X$ is transformed to be stationary, demeaned, and often standardized. The normalized data $Z=X/\sqrt{NT}$ has singular value decomposition (svd)

$$Z=\frac{X}{\sqrt{NT}}=U_{NT}D_{NT}V_{NT}'=U_{NT,r}D_{NT,r}V_{NT,r}'+U_{NT,N-r}D_{NT,N-r}V_{NT,N-r}'.$$

In the above, $D_{NT}$ is a diagonal matrix of singular values arranged in descending order, and $U_{NT}$ and $V_{NT}$ are the corresponding left and right singular vectors, respectively. Note that while the $r$ largest singular values of $X$ diverge at rate $\sqrt{NT}$ and the remaining ones are of smaller order, the $r$ largest singular values of $Z$ are bounded and the remaining ones tend to zero, because the singular values of $Z$ are those of $X$ divided by $\sqrt{NT}$. The Eckart and Young (1936) theorem posits that the best rank $r$ approximation of $Z$ is $U_{NT,r}D_{NT,r}V_{NT,r}'$. The nonzero eigenvalues of $ZZ'$ are the same as those of $Z'Z$, which, when multiplied by $NT$, equal the nonzero eigenvalues of $XX'$ and $X'X$.
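These algebraic facts can be checked numerically. The following is a minimal sketch in Python with NumPy (our choice; the paper contains no code) on hypothetical simulated data: it verifies that the nonzero eigenvalues of $ZZ'$ and $Z'Z$ equal the squared singular values of $Z$, that multiplying them by $NT$ gives the eigenvalues of $X'X$, and that the Eckart-Young truncation leaves an error equal to the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, r = 60, 40, 2
# Hypothetical data: a rank-r signal plus iid noise, used only to illustrate the algebra.
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))

Z = X / np.sqrt(N * T)
U, d, Vt = np.linalg.svd(Z, full_matrices=False)  # singular values d in descending order

# Eckart-Young: the best rank-r approximation keeps the r leading singular triplets.
Z_r = U[:, :r] @ np.diag(d[:r]) @ Vt[:r, :]
approx_err = np.linalg.norm(Z - Z_r, "fro") ** 2  # sum of the discarded d_k^2

# Nonzero eigenvalues of ZZ' (T x T) and Z'Z (N x N) both equal d^2 ...
eig_ZZt = np.sort(np.linalg.eigvalsh(Z @ Z.T))[::-1][: d.size]
eig_ZtZ = np.sort(np.linalg.eigvalsh(Z.T @ Z))[::-1]
# ... and multiplying by NT gives the eigenvalues of X'X.
eig_XtX = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]

print(np.allclose(eig_ZZt, d**2), np.allclose(eig_ZtZ, d**2), np.allclose(eig_XtX, N * T * d**2))
```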

We are interested in the low rank component of $X$ viewed from the perspective of a factor model. The static factor representation of the data is

$$X = F\Lambda'+e. \qquad (1)$$

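The common component $C=F\Lambda'$ has reduced rank $r$ because $F$ ($T\times r$) and $\Lambda$ ($N\times r$) both have rank $r$. Let $F=(F_1,\ldots,F_T)'$ and $\Lambda=(\Lambda_1,\ldots,\Lambda_N)'$. The factor representation for the data of each unit $i$, a $T\times 1$ vector $X_i$, is

$$X_i = F\Lambda_i+e_i.$$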

The covariance matrix of $X_t$ takes the form

$$\Sigma_X=\Lambda\Sigma_F\Lambda'+\Sigma_e=\Sigma_C+\Sigma_e.$$

A strict factor model obtains when $\Sigma_e$ is a diagonal matrix, which holds when the errors are cross-sectionally and serially uncorrelated. The classical factor model studied in Anderson and Rubin (1956) uses the stronger assumption that $e_t$ is iid and normally distributed. For economic analysis, this error structure is overly restrictive. We work with the approximate factor model formulated in Chamberlain and Rothschild (1983), which allows the idiosyncratic errors to be weakly correlated in both the cross-section and time series dimensions. In such a case, $\Sigma_e$ need not be a diagonal matrix.

The defining characteristic of an approximate factor model is that the $r$ largest population eigenvalues of $\Sigma_X$ diverge with $N$ while all eigenvalues of $\Sigma_e$ are bounded. Since $r$ can be consistently estimated, we will assume that $r$ is known. To simplify notation, the subscripts indicating that $F$ is $T\times r$ and $\Lambda$ is $N\times r$ will be suppressed when the context is clear. Estimation of $F$ and $\Lambda$ in an approximate factor model with $r$ factors proceeds by minimizing the sum of squared residuals:

$$\min_{F,\Lambda}\mathrm{SSR}(F,\Lambda;r)=\min_{F,\Lambda}\frac{1}{NT}\|X-F\Lambda'\|_F^2=\min_{F,\Lambda}\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T(x_{it}-\Lambda_i'F_t)^2.$$

As $F$ and $\Lambda$ are not separately identified, we impose the normalization restrictions

$$\frac{F'F}{T}=I_r,\qquad \frac{\Lambda'\Lambda}{N}\ \text{is diagonal}. \qquad (2)$$

Even with these restrictions, the problem is not convex and is difficult to solve directly. But it is bi-convex, so we can iteratively solve two convex problems: (i) conditional on $F$, minimizing the objective function with respect to $\Lambda$ amounts to time series regressions of $X_i$ on $F$, which give estimates of $\Lambda_i$ for each $i$; (ii) conditional on $\Lambda$, cross-section regressions of $X_t$ on $\Lambda$ give estimates of $F_t$ for each $t$. That is, we iteratively compute

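$$\tilde F = X\tilde\Lambda(\tilde\Lambda'\tilde\Lambda)^{-1}, \qquad (3a)$$

$$\tilde\Lambda' = (\tilde F'\tilde F)^{-1}\tilde F'X=\frac{1}{T}\tilde F'X. \qquad (3b)$$

The solution upon convergence is the (static) asymptotic principal components (APC):

$$(\tilde F,\tilde\Lambda)=\big(\sqrt{T}\,U_{NT,r},\ \sqrt{N}\,V_{NT,r}D_{NT,r}\big). \qquad (3c)$$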
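The closed form (3c) is easy to verify. Below is a minimal NumPy sketch, assuming the data matrix has already been demeaned; the function name `apc` and the simulated design are hypothetical. It checks that the output satisfies the normalizations (2), is a fixed point of the updates (3a) and (3b), and attains a minimized SSR equal to the sum of the discarded squared singular values of $Z$.

```python
import numpy as np

def apc(X, r):
    """Static APC estimates (3c): F = sqrt(T) U_r, Lambda = sqrt(N) V_r D_r."""
    T, N = X.shape
    U, d, Vt = np.linalg.svd(X / np.sqrt(N * T), full_matrices=False)
    F = np.sqrt(T) * U[:, :r]
    Lam = np.sqrt(N) * Vt[:r, :].T * d[:r]  # broadcasting scales column k by d_k
    return F, Lam, d

rng = np.random.default_rng(1)
T, N, r = 100, 80, 3
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))
F, Lam, d = apc(X, r)

norm_F = F.T @ F / T                                # should be I_r, as in (2)
norm_L = Lam.T @ Lam / N                            # should be diag(d_1^2, ..., d_r^2)
F_from_3a = X @ Lam @ np.linalg.inv(Lam.T @ Lam)    # update (3a)
Lam_from_3b = (F.T @ X / T).T                       # update (3b)
ssr = np.linalg.norm(X - F @ Lam.T) ** 2 / (N * T)  # minimized objective
```

Using the SVD directly avoids iterating at all; the iterative view is still useful because it explains why eigenvectors appear.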

Evidently, the solution involves eigenvectors because the algorithm is an implementation of the 'orthogonal subspace iteration' algorithm for computing eigenvectors (Golub and Van Loan, 2012, Algorithm 8.2). A related method is the 'alternating least squares' developed in De Leeuw (2004) and refined in Unkel and Trendafilov (2010), which treats $e$ as unknowns to be recovered. Provided that a low rank structure exists, the error bounds for these algorithms can be shown without probabilistic assumptions about $F$, $\Lambda$, and $e$. We will need such assumptions to obtain distribution theory, and will treat $e$ as residuals rather than choice variables.
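The iteration behind (3a) and (3b) can be sketched as follows. This is a minimal illustration under our own assumptions: a random starting value, a QR step that re-imposes $F'F/T=I_r$ after each pass, and a stopping rule based on the change in the projection onto the column space of $F$. None of these details are taken from the paper or from Golub and Van Loan, and the function name `subspace_iteration` is hypothetical.

```python
import numpy as np

def subspace_iteration(X, r, max_iter=500, tol=1e-12, seed=0):
    """Alternate (3b) and (3a), re-orthonormalizing F so that F'F/T = I_r."""
    T, N = X.shape
    rng = np.random.default_rng(seed)
    F = np.sqrt(T) * np.linalg.qr(rng.standard_normal((T, r)))[0]
    for _ in range(max_iter):
        Lam = X.T @ F / T                               # (3b), using F'F = T I_r
        F_new = X @ Lam @ np.linalg.inv(Lam.T @ Lam)    # (3a)
        F_new = np.sqrt(T) * np.linalg.qr(F_new)[0]     # QR step restores F'F/T = I_r
        # F F'/T is the projection onto span(F); stop once it no longer moves.
        change = np.linalg.norm(F_new @ F_new.T - F @ F.T) / T
        F = F_new
        if change < tol:
            break
    return F, X.T @ F / T

rng = np.random.default_rng(4)
T, N, r = 100, 80, 3
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))
F_it, Lam_it = subspace_iteration(X, r)

# Compare with the space spanned by the r leading left singular vectors of X.
U = np.linalg.svd(X, full_matrices=False)[0][:, :r]
gap = np.linalg.norm(F_it @ F_it.T / T - U @ U.T)
```

The converged $F$ spans the same space as the APC estimate but may differ from $\sqrt{T}\,U_{NT,r}$ by an $r\times r$ rotation; rotating so that $\Lambda'\Lambda/N$ is diagonal recovers (3c) up to column signs.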

Analysis of the APC estimates in a setting of large $N$ and large $T$ must overcome two new challenges not present in the classical factor analysis of Anderson and Rubin (1956). The first pertains to the fact that the errors are now allowed to be cross-sectionally correlated. The second pertains to the fact that the covariance matrices of $X_t$ and $X_i$ are of dimensions $N\times N$ and $T\times T$, respectively, which become infinite dimensional as $N$ and $T$ grow. The asymptotic properties of the factor estimates were first studied in Stock and Watson (2002a), Bai and Ng (2002), and Bai (2003). Though the theory is well developed, the derivations are quite involved.

In what follows, we will establish the properties of $\tilde F$ and $\tilde\Lambda$ using simpler proofs and under weaker assumptions than previously used. Throughout, we let

$$\delta_{NT}=\min(\sqrt N,\sqrt T).$$

Unless otherwise stated, $\|A\|^2$ is understood to be the squared Frobenius norm of a matrix $A$. That is, $\|A\|^2=\operatorname{tr}(A'A)=\sum_i\sum_j a_{ij}^2$. The factor model can also be represented as

$$X_{it}=\Lambda_i'F_t+e_{it}.$$

A strict factor model assumes that $E(e_{it}e_{jt})=0$ for $i\neq j$. An approximate factor model relaxes this requirement.

#### Assumption A1:

Let $F^0$ and $\Lambda^0$ be the true values of $F$ and $\Lambda$. Let $M<\infty$ be a constant not depending on $N$ and $T$.

• Mean independence: .

• Weak (cross-sectional and serial) correlation in the errors.

• ,

• For all ,

• For all , and for all , .

#### Assumption A2:

(i) ; (ii); ; (iii) the eigenvalues of are distinct.

#### Assumption A3:

(i) For each , and ; (ii) for each , and .

Assumption A1 assumes mean independence and some moment conditions. Assumption A2 implies that $\|F^0\|^2/T=O_p(1)$ and $\|\Lambda^0\|^2/N=O(1)$, and that all $r$ eigenvalues of $\Lambda^{0\prime}\Lambda^0$ diverge at the same rate $N$. The conditions ensure a strong factor structure, which is needed for identification. Under Assumption A3, the following holds:

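$$\frac{1}{T}\frac{F^{0\prime}ee'F^0}{NT}=\frac{1}{T}\frac{1}{N}\sum_{i=1}^N\Big[\Big(\frac{1}{\sqrt T}\sum_tF^0_te_{it}\Big)\Big(\frac{1}{\sqrt T}\sum_tF^0_te_{it}\Big)'\Big]=O_p(1/T) \qquad (4)$$

$$\frac{1}{N}\frac{\Lambda^{0\prime}e'e\Lambda^0}{NT}=\frac{1}{N}\frac{1}{T}\sum_{t=1}^T\Big[\Big(\frac{1}{\sqrt N}\sum_i\Lambda^0_ie_{it}\Big)\Big(\frac{1}{\sqrt N}\sum_i\Lambda^0_ie_{it}\Big)'\Big]=O_p(1/N). \qquad (5)$$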
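The orders in (4) and (5) can be illustrated by simulation. The sketch below assumes iid standard normal factors, loadings, and errors, a special case that satisfies the weak-dependence conditions; the helper `scaled_terms` is hypothetical. If (4) and (5) hold, the left-hand sides multiplied by $T$ and $N$, respectively, should remain bounded as the panel grows.

```python
import numpy as np

def scaled_terms(N, T, r=2, seed=0):
    """Return T * ||lhs of (4)|| and N * ||lhs of (5)|| for one simulated panel."""
    rng = np.random.default_rng(seed)
    F0 = rng.standard_normal((T, r))
    Lam0 = rng.standard_normal((N, r))
    e = rng.standard_normal((T, N))                    # e[t, i] = e_it
    lhs4 = (F0.T @ e) @ (e.T @ F0) / (N * T) / T       # (1/T) F0'ee'F0/(NT)
    lhs5 = (Lam0.T @ e.T) @ (e @ Lam0) / (N * T) / N   # (1/N) Lam0'e'e Lam0/(NT)
    return T * np.linalg.norm(lhs4), N * np.linalg.norm(lhs5)

for N, T in [(50, 50), (200, 100), (400, 400)]:
    print(N, T, scaled_terms(N, T))
```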
###### Lemma 1

Under Assumption A,

$$\Big\|\frac{ee'}{NT}\Big\|^2=O_p\Big(\frac{1}{T}\Big)+O_p\Big(\frac{1}{N}\Big)=O_p(\delta_{NT}^{-2}).$$

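Lemma 1 establishes that the normalized sum of squared covariances of the errors is of a stochastic order that depends on the size of the panel in both dimensions. The proof comes from observing that $ee'$ is a $T\times T$ matrix with $\sum_{j=1}^Ne_{jt}e_{js}$ as its $(t,s)$ entry. Thus

$$\Big\|\frac{ee'}{NT}\Big\|^2=\frac{1}{N^2T^2}\sum_{t=1}^T\sum_{s=1}^T\Big(\sum_{j=1}^Ne_{jt}e_{js}\Big)^2=\frac{1}{T}\Big[\underbrace{\frac{1}{T}\sum_{t=1}^T\Big(\frac{1}{N}\sum_{j=1}^Ne_{jt}^2\Big)^2}_{t=s}\Big]+\frac{1}{N}\Big[\underbrace{\frac{1}{T^2}\sum_{t=1}^T\sum_{s\neq t}\Big(\frac{1}{\sqrt N}\sum_{j=1}^Ne_{jt}e_{js}\Big)^2}_{t\neq s}\Big].$$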

The first term is $O_p(1/T)$. The second term is $O_p(1/N)$ in the special case that the $e_{it}$ are serially uncorrelated. In general, the second term is $O_p(\delta_{NT}^{-2})$, which can be proved by adding and subtracting $E(e_{jt}e_{js})$ and using Assumption A1(ii)(b). Hence, under Assumption A, the idiosyncratic errors can only have limited time and cross-section correlations.
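The decomposition above is an exact identity, so it can be verified numerically. Here is a minimal sketch with a hypothetical helper `lemma1_decomposition`; the simulation assumes iid standard normal errors, in which case the $t=s$ part is close to $1/T$ and the $t\neq s$ part is close to $1/N$.

```python
import numpy as np

def lemma1_decomposition(e):
    """Split ||ee'/(NT)||^2 into its t = s and t != s parts; e is T x N with e[t, j] = e_jt."""
    T, N = e.shape
    G = e @ e.T                                    # (t, s) entry: sum_j e_jt e_js
    total = np.linalg.norm(G / (N * T)) ** 2
    t_eq_s = np.mean((np.diag(G) / N) ** 2) / T    # (1/T)(1/T) sum_t (N^{-1} sum_j e_jt^2)^2
    off = G / np.sqrt(N)
    np.fill_diagonal(off, 0.0)                     # keep only the t != s entries
    t_ne_s = np.sum(off ** 2) / T ** 2 / N         # (1/N)(1/T^2) sum_{t != s} (...)^2
    return total, t_eq_s, t_ne_s

rng = np.random.default_rng(3)
T, N = 100, 200
total, t_eq_s, t_ne_s = lemma1_decomposition(rng.standard_normal((T, N)))
```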

## 3 Consistency Results

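From $ZZ'U_{NT,r}=U_{NT,r}D^2_{NT,r}$, we have $\frac{XX'}{NT}\tilde F=\tilde FD^2_{NT,r}$. Plugging in $X=F^0\Lambda^{0\prime}+e$ and expanding terms gives

$$\frac{F^0(\Lambda^{0\prime}\Lambda^0)}{N}\frac{F^{0\prime}\tilde F}{T}+\frac{F^0\Lambda^{0\prime}e'\tilde F}{NT}+\frac{e\Lambda^0F^{0\prime}\tilde F}{NT}+\frac{ee'\tilde F}{NT}=\tilde FD^2_{NT,r}. \qquad (6)$$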

Various results will be obtained from this useful identity. Define the rotation matrix

$$H_{NT,0}=\frac{\Lambda^{0\prime}\Lambda^0}{N}\frac{F^{0\prime}\tilde F}{T}D^{-2}_{NT,r}.$$

Note that this is the transpose of the one defined in Bai and Ng (2002).

### 3.1 Consistent Estimation of the Factor Space

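We want to establish that $\tilde F$ is close to $F^0H_{NT,0}$ and $\tilde\Lambda$ is close to $\Lambda^0(H_{NT,0}')^{-1}$ in some well-defined sense. Right-multiplying both sides of (6) by $D^{-2}_{NT,r}$ and using the definition of $H_{NT,0}$, we have

$$\tilde F-F^0H_{NT,0}=\Big(\frac{F^0\Lambda^{0\prime}e'\tilde F}{NT}+\frac{e\Lambda^0F^{0\prime}\tilde F}{NT}+\frac{ee'\tilde F}{NT}\Big)D^{-2}_{NT,r}. \qquad (7)$$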

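Taking the norm on both sides, we have

$$\frac{1}{T}\|\tilde F-F^0H_{NT,0}\|^2\le 3\Big\{2\Big(\frac{\|F^0\|^2\|\tilde F\|^2}{T^2}\Big)\Big(\frac{1}{T}\Big\|\frac{1}{N}\Lambda^{0\prime}e'\Big\|^2\Big)+\frac{\|\tilde F\|^2}{T}\Big\|\frac{ee'}{NT}\Big\|^2\Big\}\|D^{-2}_{NT,r}\|^2,$$

where the factor 3 comes from the inequality $\|a+b+c\|^2\le 3(\|a\|^2+\|b\|^2+\|c\|^2)$.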
###### Proposition 1

Under Assumption A, the following hold in squared Frobenius norm:

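$$\text{(i)}\quad \frac{1}{T}\|\tilde F-F^0H_{NT,0}\|^2=\frac{1}{T}\sum_{t=1}^T\|\tilde F_t-H_{NT,0}'F^0_t\|^2=O_p(\delta_{NT}^{-2})$$

$$\text{(ii)}\quad \frac{1}{N}\|\tilde\Lambda-\Lambda^0(H_{NT,0}')^{-1}\|^2=\frac{1}{N}\sum_{i=1}^N\|\tilde\Lambda_i-H_{NT,0}^{-1}\Lambda^0_i\|^2=O_p(\delta_{NT}^{-2})$$

$$\text{(iii)}\quad \frac{1}{NT}\|\tilde C-C^0\|^2=\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T(\tilde C_{it}-C^0_{it})^2=O_p(\delta_{NT}^{-2}).$$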

Part (i) of Proposition 1 says that the average squared deviation between $\tilde F$ and the space spanned by the true factors vanishes at rate $\delta_{NT}^2=\min(N,T)$, the smaller of the sample sizes in the two dimensions. This result corresponds to Theorem 1 of Bai and Ng (2002), but the argument is now simpler. It uses the facts that $\|F^0\|^2/T=O_p(1)$ by Assumption A2, $\|\tilde F\|^2/T=r$ by normalization, $\|D^{-2}_{NT,r}\|=O_p(1)$, $\frac{1}{T}\|\frac{1}{N}\Lambda^{0\prime}e'\|^2=O_p(1/N)$ from equation (5), and $\|ee'/(NT)\|^2=O_p(\delta_{NT}^{-2})$ by Lemma 1. Part (ii) follows by symmetry. Part (iii) does not depend on $H_{NT,0}$ and is a consequence of (i) and (ii).

Part (i) is weaker than uniform convergence of $\tilde F_t$ to $H_{NT,0}'F^0_t$. However, this result is sufficient to validate many uses of $\tilde F$, the most important being consistent estimation of the number of factors and the ability to treat $\tilde F$ as $F^0$ in factor-augmented regressions.
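Part (i) of Proposition 1 can be illustrated by a small Monte Carlo. The sketch assumes iid standard normal factors, loadings, and errors with $r=2$; the function `factor_space_error` is hypothetical. It computes $H_{NT,0}$ as defined above and reports $\frac{1}{T}\|\tilde F-F^0H_{NT,0}\|^2$, which should shrink roughly in proportion to $\delta_{NT}^{-2}=1/\min(N,T)$.

```python
import numpy as np

def factor_space_error(N, T, r=2, seed=0):
    """(1/T)||F_tilde - F0 H_{NT,0}||^2 for one simulated panel."""
    rng = np.random.default_rng(seed)
    F0 = rng.standard_normal((T, r))
    Lam0 = rng.standard_normal((N, r))
    X = F0 @ Lam0.T + rng.standard_normal((T, N))
    U, s, _ = np.linalg.svd(X / np.sqrt(N * T), full_matrices=False)
    F_tilde, d2 = np.sqrt(T) * U[:, :r], s[:r] ** 2
    # H_{NT,0} = (Lam0'Lam0/N)(F0'F_tilde/T) D_{NT,r}^{-2}
    H0 = (Lam0.T @ Lam0 / N) @ (F0.T @ F_tilde / T) @ np.diag(1.0 / d2)
    return np.linalg.norm(F_tilde - F0 @ H0) ** 2 / T

for n in (50, 100, 200, 400):
    err = factor_space_error(n, n)
    print(n, err, err * n)  # the last column should stay roughly stable
```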

### 3.2 The Limit of $\tilde F'F^0/T$

An important quantity in determining the properties of the factor estimates is $\tilde F'F^0/T$.

###### Proposition 2

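Let $\Sigma_F$ and $\Sigma_\Lambda$ denote the limits of $F^{0\prime}F^0/T$ and $\Lambda^{0\prime}\Lambda^0/N$, let the matrix $\Sigma$ denote $\Sigma_\Lambda^{1/2}\Sigma_F\Sigma_\Lambda^{1/2}$, and let $\Sigma=\Upsilon D_r^2\Upsilon'$ be its spectral decomposition with $\Upsilon'\Upsilon=I_r$. Under Assumption A, $D_{NT,r}\xrightarrow{p}D_r$ and

$$\frac{\tilde F'F^0}{T}\xrightarrow{p}Q=D_r\Upsilon'\Sigma_\Lambda^{-1/2}.$$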

#### Proof.

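The proof of $D_{NT,r}\xrightarrow{p}D_r$ is given in Stock and Watson (1998). We focus on the limit of $\tilde F'F^0/T$. Left-multiplying both sides of (6) by $F^{0\prime}/T$, we have¹

$$\frac{F^{0\prime}F^0}{T}\frac{\Lambda^{0\prime}\Lambda^0}{N}\frac{F^{0\prime}\tilde F}{T}+\frac{F^{0\prime}F^0}{T}\frac{\Lambda^{0\prime}e'\tilde F}{NT}+\frac{F^{0\prime}e\Lambda^0}{NT}\frac{F^{0\prime}\tilde F}{T}+\frac{F^{0\prime}ee'\tilde F}{NT^2}=\frac{F^{0\prime}\tilde F}{T}D^2_{NT,r}.$$

¹Proposition 2 corresponds to Proposition 1 of Bai (2003), which is stated in terms of different quantities.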

The second and third terms on the left-hand side are negligible since the matrix

$$\frac{F^{0\prime}e\Lambda^0}{NT}=\frac{1}{NT}\sum_i\sum_tF^0_t\Lambda_i^{0\prime}e_{it}=O_p(\delta_{NT}^{-2}).$$

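The fourth term is also negligible because $\frac{F^{0\prime}ee'\tilde F}{NT^2}=\frac{F^{0\prime}ee'F^0H_{NT,0}}{NT^2}+\frac{F^{0\prime}ee'(\tilde F-F^0H_{NT,0})}{NT^2}$ and each term is negligible. This implies that

$$\Big(\frac{F^{0\prime}F^0}{T}\Big)\Big(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Big)\Big(\frac{F^{0\prime}\tilde F}{T}\Big)+o_p(1)=\frac{F^{0\prime}\tilde F}{T}D^2_{NT,r}.$$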

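If we left-multiply each side by $(\Lambda^{0\prime}\Lambda^0/N)^{1/2}$ and define

$$\Sigma_{NT}=\Big(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Big)^{1/2}\Big(\frac{F^{0\prime}F^0}{T}\Big)\Big(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Big)^{1/2},\qquad \bar\Upsilon_{NT}=\Big(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Big)^{1/2}\Big(\frac{F^{0\prime}\tilde F}{T}\Big),$$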

we have

$$\Sigma_{NT}\bar\Upsilon_{NT}+o_p(1)=\bar\Upsilon_{NT}D^2_{NT,r}.$$

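Now $\bar\Upsilon_{NT}$ can be interpreted as the (non-normalized) eigenvectors of the matrix $\Sigma_{NT}$. These eigenvectors do not have unit length even asymptotically because $\bar\Upsilon_{NT}'\bar\Upsilon_{NT}=\frac{\tilde F'F^0}{T}\frac{\Lambda^{0\prime}\Lambda^0}{N}\frac{F^{0\prime}\tilde F}{T}=D^2_{NT,r}+o_p(1)$. We can define normalized eigenvectors as $\Upsilon_{NT}=\bar\Upsilon_{NT}D^{-1}_{NT,r}$ so that $\Upsilon_{NT}'\Upsilon_{NT}=I_r+o_p(1)$. Since $\Sigma_{NT}\xrightarrow{p}\Sigma$ and $D_{NT,r}\xrightarrow{p}D_r$, $\Upsilon_{NT}$ converges to a limit $\Upsilon$. From $\Sigma_{NT}\Upsilon_{NT}+o_p(1)=\Upsilon_{NT}D^2_{NT,r}$, taking the limit yields $\Sigma\Upsilon=\Upsilon D_r^2$ (since the eigenvalues of $\Sigma$ are distinct, $\Upsilon$ is unique up to a column sign change, depending on the column signs of $\tilde F$). So $D_r^2$ is the diagonal matrix consisting of the eigenvalues of $\Sigma$, and $\Upsilon$ is the matrix of eigenvectors with $\Upsilon'\Upsilon=I_r$. We have

$$\frac{F^{0\prime}\tilde F}{T}=\Big(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Big)^{-1/2}\Upsilon_{NT}D_{NT,r}\xrightarrow{p}\Sigma_\Lambda^{-1/2}\Upsilon D_r\equiv Q'.$$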

Note that $Q$ is not, in general, an identity matrix. Proposition 2 implies two useful results for what is to follow:

$$Q'D_r^{-2}=\Sigma_\Lambda^{-1}Q^{-1} \qquad (8a)$$

$$\Sigma_F^{-1}Q'=Q^{-1}. \qquad (8b)$$

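The first identity follows from the definition of $Q$, which implies $Q\Sigma_\Lambda Q'=D_r\Upsilon'\Upsilon D_r=D_r^2$. The second identity uses $Q'Q=\Sigma_\Lambda^{-1/2}\Upsilon D_r^2\Upsilon'\Sigma_\Lambda^{-1/2}=\Sigma_\Lambda^{-1/2}\Sigma\,\Sigma_\Lambda^{-1/2}$, which simplifies to $\Sigma_F$. The two identities can equivalently be stated as $Q\Sigma_\Lambda Q'=D_r^2$ and $Q'Q=\Sigma_F$, respectively.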
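Because (8a) and (8b) are algebraic consequences of the definition of $Q$, they can be checked for any positive definite $\Sigma_F$ and $\Sigma_\Lambda$ for which $\Sigma$ has distinct eigenvalues. The matrices below are hypothetical values chosen for illustration, and the helper names `sym_power` and `limit_Q` are our own.

```python
import numpy as np

def sym_power(A, p):
    """A^p for a symmetric positive definite matrix A, via its eigen-decomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w ** p) @ V.T

def limit_Q(Sigma_F, Sigma_L):
    """Q = D_r Upsilon' Sigma_L^{-1/2}, with Sigma_L^{1/2} Sigma_F Sigma_L^{1/2} = Upsilon D_r^2 Upsilon'."""
    S_half = sym_power(Sigma_L, 0.5)
    w, Ups = np.linalg.eigh(S_half @ Sigma_F @ S_half)
    order = np.argsort(w)[::-1]          # descending, matching the ordering of D_{NT,r}
    D = np.diag(np.sqrt(w[order]))
    Q = D @ Ups[:, order].T @ sym_power(Sigma_L, -0.5)
    return Q, D

Sigma_F = np.array([[1.0, 0.3], [0.3, 0.5]])   # hypothetical
Sigma_L = np.array([[2.0, 0.4], [0.4, 1.0]])   # hypothetical
Q, D = limit_Q(Sigma_F, Sigma_L)
inv = np.linalg.inv
lhs_8a, rhs_8a = Q.T @ inv(D @ D), inv(Sigma_L) @ inv(Q)
lhs_8b, rhs_8b = inv(Sigma_F) @ Q.T, inv(Q)
```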

### 3.3 Equivalent Rotation Matrices

As seen above, $\tilde F$ is based on $U_{NT,r}$, the left singular vectors of $Z$, and thus all invertible linear transformations of $\tilde F$ are also solutions. The following lemma will be useful in establishing that $H_{NT,0}$ has asymptotically equivalent representations.

###### Lemma 2

Under Assumption A, $\Big\|\frac{\tilde F'ee'\tilde F}{NT^2}\Big\|=O_p(\delta_{NT}^{-2})$.

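Proof: From (4), $\Big\|\frac{F^{0\prime}ee'F^0}{NT^2}\Big\|=O_p(1/T)$. Now, writing $H=H_{NT,0}$ and adding and subtracting terms,

$$\frac{\tilde F'ee'\tilde F}{NT^2}=\frac{(\tilde F-F^0H)'ee'(\tilde F-F^0H)}{NT^2}+\frac{H'F^{0\prime}ee'(\tilde F-F^0H)}{NT^2}+\frac{(\tilde F-F^0H)'ee'F^0H}{NT^2}+\frac{H'F^{0\prime}ee'F^0H}{NT^2}=a+b+c+d,$$

where

$$\begin{aligned}
\|a\|&\le\frac{\|\tilde F-F^0H\|^2}{T}\frac{\|ee'\|}{NT}=O_p(\delta_{NT}^{-2})O_p(\delta_{NT}^{-1}),\\
\|b\|&\le\frac{\|\tilde F-F^0H\|}{\sqrt T}\frac{\|ee'\|}{NT}\frac{\|F^0\|}{\sqrt T}\|H\|=O_p(\delta_{NT}^{-1})O_p(\delta_{NT}^{-1})O_p(1)=O_p(\delta_{NT}^{-2}),\\
\|c\|&=\|b\|\quad(\text{since } c=b'),\\
\|d\|&\le\|H\|^2\frac{\|F^{0\prime}ee'F^0\|}{NT^2}=O_p(\delta_{NT}^{-2}).
\end{aligned}$$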

We are now in a position to consider asymptotically equivalent rotation matrices:

###### Lemma 3

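Let $H_{NT,0}=\frac{\Lambda^{0\prime}\Lambda^0}{N}\frac{F^{0\prime}\tilde F}{T}D^{-2}_{NT,r}$ and define

$$\begin{aligned}
H_{NT,1}&=(\Lambda^{0\prime}\Lambda^0)(\tilde\Lambda'\Lambda^0)^{-1}, & H^{-1}_{NT,1}&=(\tilde\Lambda'\Lambda^0)(\Lambda^{0\prime}\Lambda^0)^{-1},\\
H_{NT,2}&=(F^{0\prime}F^0)^{-1}(F^{0\prime}\tilde F), & H^{-1}_{NT,2}&=(F^{0\prime}\tilde F)^{-1}(F^{0\prime}F^0),\\
H_{NT,3}&=(\tilde F'F^0)^{-1}(\tilde F'\tilde F)=(\tilde F'F^0/T)^{-1}, & H^{-1}_{NT,3}&=(\tilde F'F^0/T)=(\tilde F'\tilde F)^{-1}(\tilde F'F^0),\\
H_{NT,4}&=(\Lambda^{0\prime}\tilde\Lambda)(\tilde\Lambda'\tilde\Lambda)^{-1}=(\Lambda^{0\prime}\tilde\Lambda/N)D^{-2}_{NT,r}, & H^{-1}_{NT,4}&=D^2_{NT,r}(\Lambda^{0\prime}\tilde\Lambda/N)^{-1}.
\end{aligned}$$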

Under Assumption A, the following hold for $j=1,2,3,4$:

• (i) $H_{NT,j}=H_{NT,0}+O_p(\delta_{NT}^{-2})$;

• (ii) $H_{NT,j}\xrightarrow{p}Q^{-1}$.
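Part (i) of Lemma 3 can be illustrated by computing all five rotation matrices on one simulated panel. The sketch assumes iid standard normal factors and errors; the loadings are scaled column by column (a hypothetical choice) so that the eigenvalues of $\Sigma$ are distinct, as Assumption A2(iii) requires. The helper `rotation_matrices` is our own.

```python
import numpy as np

def rotation_matrices(X, F0, Lam0, r):
    """H_{NT,j}, j = 0,...,4, as defined in Lemma 3, using the APC estimates (3c)."""
    T, N = X.shape
    U, s, Vt = np.linalg.svd(X / np.sqrt(N * T), full_matrices=False)
    F = np.sqrt(T) * U[:, :r]                 # F_tilde
    Lam = np.sqrt(N) * Vt[:r, :].T * s[:r]    # Lambda_tilde
    D2_inv = np.diag(1.0 / s[:r] ** 2)        # D_{NT,r}^{-2}
    inv = np.linalg.inv
    return {
        0: (Lam0.T @ Lam0 / N) @ (F0.T @ F / T) @ D2_inv,
        1: (Lam0.T @ Lam0) @ inv(Lam.T @ Lam0),
        2: inv(F0.T @ F0) @ (F0.T @ F),
        3: inv(F.T @ F0 / T),
        4: (Lam0.T @ Lam / N) @ D2_inv,
    }

rng = np.random.default_rng(5)
T, N, r = 300, 300, 2
F0 = rng.standard_normal((T, r))
Lam0 = rng.standard_normal((N, r)) * np.array([2.0, 1.0])  # distinct factor strengths
X = F0 @ Lam0.T + rng.standard_normal((T, N))
H = rotation_matrices(X, F0, Lam0, r)
gaps = {j: np.linalg.norm(H[j] - H[0]) for j in (1, 2, 3, 4)}
```

All five matrices are built from the same $\tilde F$, so they share its column signs and can be compared directly.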

Proof: Part (ii) follows from Proposition 2, which gives $\tilde F'F^0/T\xrightarrow{p}Q$. It remains to show that all alternative rotation matrices are asymptotically equivalent.

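We begin with $H_{NT,3}$. Recall that $D^2_{NT,r}$ is the matrix of eigenvalues of $XX'/(NT)$ associated with the eigenvectors $\tilde F$. Using the normalization $\tilde F'\tilde F/T=I_r$, we have $D^2_{NT,r}=\frac{\tilde F'XX'\tilde F}{NT^2}$. Substituting $X=F^0\Lambda^{0\prime}+e$ into the above, we have

$$D^2_{NT,r}=\Big(\frac{\tilde F'F^0}{T}\Big)\Big(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Big)\Big(\frac{F^{0\prime}\tilde F}{T}\Big)+\underbrace{\frac{1}{T}\Big(\frac{\tilde F'ee'\tilde F}{NT}\Big)}_{O_p(\delta_{NT}^{-2})}+O_p(\delta_{NT}^{-2}) \qquad (9)$$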

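where the last term represents the cross-product terms, which are dominated. The second term on the right-hand side is $O_p(\delta_{NT}^{-2})$ by Lemma 2. Substituting for $D^2_{NT,r}$ into $H_{NT,0}$ gives

$$H_{NT,0}=\Big(\frac{\tilde F'F^0}{T}\Big)^{-1}+O_p(\delta_{NT}^{-2})=H_{NT,3}+O_p(\delta_{NT}^{-2}).$$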

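Next, left- and right-multiplying $X=F^0\Lambda^{0\prime}+e$ by $\tilde F'$ and $\Lambda^0$, respectively, dividing by $NT$, and using $\tilde\Lambda'=\tilde F'X/T$, we obtain

$$\frac{\tilde\Lambda'\Lambda^0}{N}=\Big(\frac{\tilde F'F^0}{T}\Big)\Big(\frac{\Lambda^{0\prime}\Lambda^0}{N}\Big)+O_p(\delta_{NT}^{-2}).$$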

Substituting into $H_{NT,1}$, we obtain