# Consistent Fixed-Effects Selection in Ultra-high dimensional Linear Mixed Models with Error-Covariate Endogeneity

Recently, applied sciences, including longitudinal and clustered studies in biomedicine require the analysis of ultra-high dimensional linear mixed effects models where we need to select important fixed effect variables from a vast pool of available candidates. However, all existing literature assume that all the available covariates and random effect components are independent of the model error which is often violated (endogeneity) in practice. In this paper, we first investigate this important issue in ultra-high dimensional linear mixed effects models with particular focus on the fixed effects selection. We study the effects of different types of endogeneity on existing regularization methods and prove their inconsistencies. Then, we propose a new profiled focused generalized method of moments (PFGMM) approach to consistently select fixed effects under error-covariate endogeneity. Our proposal is proved to be oracle consistent with probability tending to one and works well under most other types of endogeneity too. Additionally, we also propose and illustrate a few consistent parameter estimators, including those of the variance components, along with variable selection through PFGMM. Empirical simulations and an interesting real data example further support the claimed utility of our proposal.

• 31 publications
• 8 publications
06/02/2022

### Bayesian high-dimensional covariate selection in non-linear mixed-effects models using the SAEM algorithm

High-dimensional data, with many more covariates than observations, such...
05/24/2018

### Convex method for selection of fixed effects in high-dimensional linear mixed models

Analysis of high-dimensional data is currently a popular field of resear...
07/28/2020

### Outcome model free causal inference with ultra-high dimensional covariates

Causal inference has been increasingly reliant on observational studies ...
07/27/2019

### Estimating the Random Effect in Big Data Mixed Models

We consider three problems in high-dimensional Gaussian linear mixed mod...
08/14/2017

### Fixed effects testing in high-dimensional linear mixed models

Many scientific and engineering challenges -- ranging from pharmacokinet...
04/09/2018

### Tensor Mixed Effects Model with Applications in Nanomanufacturing Inspection

Raman mapping technique has been used to do in-line quality inspections ...
07/18/2014

### Extensions of stability selection using subsamples of observations and covariates

We introduce extensions of stability selection, a method to stabilise va...

## 1 Introduction

Linear mixed effects models are widely used for analysis of clustered data in econometrics, biomedicine and other applied sciences. It consists of additional random-effect components, along with the usual fixed-effects regression modeling, to account for variability among clusters. In biomedical applications, typical examples are longitudinal studies with repeated measurements within individuals and multi-center studies with patients clustered within centers. Due to recent technological advances, we often have access to sets of extremely high-dimensional explanatory variables, typically so-called omics data, in such studies. Hence, the potential fixed effects variables are often in the order of millions even in studies with relatively few patients. Thus, we have to select the important fixed-effects variables from the vast pool of available variables under an ultra-high dimensional set-up. Note that, in most such studies the relevant random effect variables are typically few and hence, their selection is not necessary; all of them can be included in the model.

Mathematically, given groups (e.g., centers) indexed by , we observe responses in the -th group, denoted by the

-dimensional vector

. The associated fixed and random effect covariate values are, respectively denoted by the matrix and the matrix ; often is a subset of . Let denote the total number of observations. Then, the linear mixed model (LMM) is defined by the relation (Pinheiro and Bates, 2000)

 yi=Xiβ+Zibi+ϵi,     i=1,...,I, (1)

where is the fixed effects (regression) coefficient vector, s are the random effects and s are random error components in the model. We assume that and , for each , and they are independent of each other and also of the s; here is a -dimensional variance parameter that completely specifies the matrix and

denotes the identity matrix of order

. Then, given (and ), , independently for each , where . Stacking the variables in larger matrices, we can rewrite the LMM (1) as

 y=Xβ+Zb+ϵ, (2)

where , and are the stacked matrices, and and . Now given , we have with . Given this model, our objective is to perform inference about the unknown regression parameter vector and the variance parameter vector . Following recent applications as described in the beginning, we assume that the number of fixed-effect variables () is much larger than the total sample size but the number of random effects , and prefixed. In particular, we assume the ultra-high dimensional set-up with for some , which is often the case with omics data analysis. Then the total number of parameters is and we need to impose some sparsity condition when estimating . This entitles a selection of important fixed-effect variables; we assume that the number of such variables is . Under such a sparsity assumption, the selection of the important fixed-effect variables and the corresponding parameter estimation are done by maximizing a suitably penalized log-likelihood function (Schelldorfer, Buhlmann and Van de Geer, 2011; Ghosh and Thoresen, 2018; Fan and Li, 2012); the resulting estimator of is known as the maximum penalized likelihood estimator (MPLE) and a brief description is provided in Section 2.

Under our ultra-high dimensional regime, an important desired property of the MPLE is the oracle variable selection consistency which ensures that only and all the true important variables are selected asymptotically with probability tending to one. All the existing literature studies this property of the MPLE under the assumption of full exogeneity of the model i.e., independence between clusters and mutual independence between the covarites, the random effects and the error variable. However, these independence assumptions may not always hold in practice as already studied in detail for the classical low-dimensional settings along with appropriate remedies like the instrumental variable (IV) method; see, among many others, Ebbes et al. (2004, 2015), Kim and Frees (2007), Wooldridge (2010,2012), Bates et al. (2014). In our ultra-high dimensional set-up, it is practically too demanding to always expect all exogeneity assumptions to hold; in particular, the assumption regarding independence between the error and all the covarites is quite vulnerable and also not verifiable for extremely large values of . We will see, in Section 2, that the usual MPLE of the linear mixed model parameters gets seriously affected under violation of different exogeneity assumptions; it also significantly increases the number of false positives in fixed effects selections. To our knowledge, there is no literature on studying the effects of such endogeneity and developing appropriate remedies under high-dimensional mixed models. This paper aims to tackle this important problem with particular focus on level-1 endogeneity arising from the error-covariate correlation and to propose a new consistent selection procedure of fixed-effect variables, along with estimation of all parameters, under such endogenity.

The endogeneity issue under high or ultra-high dimensional set-up was first considered in Fan and Liao (2014) under the usual regression set-up where the authors proposed a focused generalized method-of-moments (FGMM) estimator to consistently select and estimate the non-zero regression coefficients. In this paper, we will extend their FGMM approach to consistently select the important fixed effect variables under our ultra-high dimensional linear mixed model set-up and then to estimate the variance parameters along with the (fixed-effects) regression coefficients of the selected variables in a second stage. The proposed method is shown to satisfy the oracle variable selection consistency for the fixed effects even under error-covariate endogeneity. The overall procedure is implemented by an efficient algorithm and verified with suitable numerical illustrations.

The main contributions of the paper can be summarized as follows.

• We investigate the effect of endogeneity on the selection of fixed-effects and parameter estimation in ultra-high dimensional linear mixed-effects models (LMMs). This is indeed the first such attempt for mixed models with exponentially increasing number of fixed-effects covariates and we prove the inconsistency of the usual penalized likelihood procedures in such cases under violation of the exogeneity assumption.

• We propose a new procedure for selecting important fixed-effects variables in presence of level-1 (error-covariate) endogeneity in such ultra-high dimensional LMM. Our method is based on the profiled focused generalized method of moments (PFGMM). It handles the endogeneity issue through use of appropriate instrumental variables and it uses general non-concave penalties like SCAD to carry out sparse variable selection. The problem of unknown variance components is solved by use of an appropriate proxy matrix as in Fan and Li (2012). Our proposal is seen to produce significantly less false positives, both in simulations and in a real data application, compared to the usual penalized likelihood method of Fan and Li (2012) in presence of endogeneity in data.

• We rigorously prove the consistency of the parameter estimates of fixed-effects coefficients and their oracle variable selection property under appropriate verifiable conditions. Our assumptions on the penalty are completely general to cover most common non-concave penalties like SCAD or MCP. The proof also allows the important selected variables to be endogenous, by allowing the instrumental variables to be completely external to the regression model.

• We also prove, under appropriate conditions, an asymptotic normality result for the estimates of the fixed-effects coefficients obtained by our PFGMM. This will further help us to develop testing procedures in endogenous high-dimensional LMM in the future.

• An efficient computational algorithm is also discussed along with the practical issue of selecting the proxy matrix and the regularization parameter. Along with extensive numerical illustrations, good-to-go suggestions for their choices are also provided which are expected to work for most practical data with strong signal-to-noise ratio. The (unoptimized) MATLAB code is available from the authors upon request.

• Once the important fixed-effects variables are selected consistently, we also discuss and illustrate a few second stage estimation procedures to estimate the variance parameters of the LMM along with refinements of the fixed-effects coefficients.

• Although our primary focus is on error-covariate endogeneity, finally we also briefly illustrate the effects of other types of endogeneity (level-2) on our proposed PFGMM approach of variable selection. Interestingly, our proposal is seen to work consistently in most such scenarios as well which we would like to investigate theoretically in our subsequent works.

The rest of the paper is organized as follows. We start with the description of the usual maximum penalized likelihood approach and its inconsistency in the presence of endogeneity in Section 2. In Section 3, we discuss the proposed PFGMM approach with its motivation, oracle consistency of variable selection and parameter estimation, asymptotic normality result and computational aspects with numerical illustrations. Estimation of the variance parameters in the second stage refinement are discussed and illustrated in Section 4. The effect of level-2 endogeneity is examined numerically in Section 5 and a real data application is presented in Section 6. Finally the paper ends with some brief concluding remarks in Section 7.

## 2 The MPLE and Endogeneity: Types and Effects

Let us start with a brief description of the MPLE under the ultra-high dimensional linear mixed model and study the effects of different types of endogeneity on it. Consider the notation and set-up of Section 1. Using the normality of the stacked response vector in our LMM (2), the corresponding log-likelihood function of the parameters turns out to be

 ln(β,η) = −12[nlog(2π)+log|σ2V(θ,σ2)|+1σ2(y−Xβ)TV(θ,σ2)−1(y−Xβ)]. (3)

Adding an appropriate penalty function, say , on each component of , the corresponding MPLE is defined as the minimizer of the penalized objective function given by

 Qn,λ(β,η)=−ln(β,η)+p∑j=1Pn,λ(|βj|). (4)

With suitable regularization parameter , the MPLE obtained by the minimization of with respect to simultaneously selects the important components of (by estimating unimportant components of as zero) along with estimating the variance parameters consistently. However, the computation is a little tricky for different penalty functions and several extensions of basic coordinate descent methods have been proposed for the ultra-high dimensional case. In particular, Schelldorfer, Buhlmann and Van de Geer (2011) have considered the penalty in (4) under high-dimensionality, whereas Ghosh and Thoresen (2018) have extended the theory for general non-concave penalties under both low and high-dimensional set-ups. An alternative two-stage approach has been proposed in Fan and Li (2012) which uses a proxy matrix in place of the unknown variance matrix and then maximize the resulting profile likelihood of only, with suitable penalizations, to select the important fixed effect variables. The problem of estimation and selection of random effect variables is considered in a second step.

Under certain assumptions, all the existing approaches to MPLE with different penalty functions are shown to satisfy the oracle variable selection consistency, i.e., they estimate exactly the true active set (set of non-zero regression coefficients) with probability tending to one. One crucial assumption is the independence of all the random effects, which is refereed to as the exogeneity assumption. This refer to the following three types of independence:

1. Clusters are independent, i.e., , and for any two different clusters .

2. Covariates are independent of the error terms, i.e., for all clusters . (Unit level exogeneity)

3. Covariates are independent of the random effect coefficients, i.e., for all clusters . (Cluster level exogeneity)

Although (1.) may often be true in most applications through proper design, (2.) and (3.) can be violated due to unknown underlying mechanisms and relationships between the variables under study; we refer to them, respectively, as the “unit level endogeneity” or “level-1 endogeneity” and the “cluster level endogeneity” or “level-2 endogeneity”.

We start with a numerical illustration of the effect of different types of endogeneity on the MPLE and corresponding variable selection methods. Here, we use the algorithm proposed by Ghosh and Thoresen (2018) with the famous SCAD penalty (Antoniadis and Fan, 2001; Fan and Li, 2001); other existing algorithms also indicate the same behavior of the MPLE under endogeneity and are skipped for brevity. More illustrations are provided in later sections.

Example 2.1. We simulate random samples from the LMM (1) with , for each (so that ), , , and the random effects coefficients having distribution , where . The design matrix has the first column as yielding the intercept, and the next

columns are chosen from a multivariate normal distribution with mean

and a covariance matrix having -th element as for all ; the first two columns of correspond to the two random effect covariates and are kept non-penalized. The true values of the parameters , and are , and for , whereas (correlated covariates) is considered. The regularization tuning parameter is chosen by minimizing the BIC for each replication separately whereas the parameter in the definition of the SCAD penalty has been kept fixed at .

First, we repeat the above simulation 100 times without violating any exogeneity assumption and compute the summary measures about the performance of the MPLE as reported in Table 1. Next, to study the effects of endogenity, some covariates are made endogenous with either the error () or the -th component of the random effect vector , respectively, through the transformations

 Xij←(Xij+1)(ρeϵi+1), or Xij←(Xij+1)(ρbbik+1), for all i.

These produce correlations of and , respectively, for the error or -th random-effect coefficients with the endogenous covariates. The summary performance measures of the resulting MPLE under such endogeneity are reported in Table 1 for (strong correlations of 0.688 and 0.698, respectively) and four particular sets of endogenous covariates: (i) Set 1: , i.e, 10 unimportant covariates are endogenous, (ii) Set 2: , i.e, 10 unimportant covariates and one important fixed effect covariate are endogenous, (iii) Set 3: , i.e, one important covariate that have both fixed effect component and random effect slope is endogenous along with 10 unimportant covariates, and (iv) Set 4: , i.e, all unimportant covariates are endogenous.

The major observations from Table 1 and other similar simulations, not reported here for brevity, can be summarized as follows.

• Under endogeneity, we have a significant increase in false positives compared to the ideal exogenous case. The number of such wrongly selected fixed effect variables further increases with the strength of endogeneity and/or number of endogenous variables. Such an effect is more serious for level-1 endogeneity compared to level-2 endogeneity.

• Under level-1 endogeneity, we are not expected to loose any truly significant fixed-effect variables. But, in some cases of level-2 endogeneity, we may loose true positives as well.

• The model prediction error is reduced in presence of level-1 endogeneity, since more variables are selected in the final model. However, for level-2 endogeneity, the model prediction error can increase significantly when we loose the few true positives.

• The estimates of the fixed-effect intercept are more affected (increased bias and MSE) for level-1 endogeneity, whereas the estimates of other fixed-effects are affected more by level-2 endogeneity.

• The error variance also becomes severely underestimated in presence of level-1 endogeneity. Level-2 endogeneity has a mixed effect in this case, producing significantly overestimated values of for some cases with higher degrees of endogeneity.

• As known, the random effect variances are generally underestimated even under the ideal exogenous conditions. The effect of endogeneity on them is not very clear but always moderate except for level-2 endogeneity with a lot of variables (Set 4).

As our motivation is to select the important fixed-effect variables from a large pool of available candidates, in summary, the effect of level-1 endogeneity is more serious and needs proper treatment to decrease the false positives; on the other hand, level-2 endogeneity needs to be controlled to ensure no loss in true positives.

Having an idea of the effect of different types of endogeneity on the MPLE, we can now investigate further the underlying mechanism of these effects with theoretical justification. In this regard, we will first present a set of necessary conditions for the MPLE under the LMM (1) to be consistent for both estimation and fixed-effects selection. Violation of some of these necessary conditions under endogeneity in turn leads to inconsistency of the MPLE.

###### Theorem 2.1 (Necessary conditions for variable selection consistency)

Consider a general loss function

(need not to be likelihood loss) for the LMM (1) and a general penalty . Further, assume the following.

1. is twice differentiable with respect to its arguments and the maximum of its second derivatives at the true parameter value is .

2. There is a local minimizer and of the penalized objective function

 Ln(β,η)+p∑j=1Pn,λ(|βj|),

which satisfies , and ( is the dimension of ), as .

3. The penalty function is non-negative with , is nonincreasing on for some , and .

Then, for any , we have

 ∣∣∣∂Ln(β0,η0)∂βl∣∣∣P→0.

Proof: Define . Then, by an application of the Karush-Kuhn-Tucker (KKT) condition on the local minimizers and , we get

 ∂Ln(ˆβ,ˆη)∂βl+vl=0,    for all l≤p,

where if , and if . Therefore, by using the monotonicity and the limit of from Condition 1, we get

 ∣∣ ∣∣∂Ln(ˆβ,ˆη)∂βl∣∣ ∣∣≤P′n,λ(0+)=o(1). (5)

Next, by the first order Taylor series expansion of at around , we get a on the line segment joining and such that

 ∂Ln(ˆβ,ˆη)∂βl−∂Ln(β0,η0)∂βl=p∑j=1∂2Ln(˜β,˜η)∂βlβj(ˆβj−β0j)+m∑k=1∂2Ln(˜β,˜η)∂βlηk(ˆηk−η0k).

Therefore, in the event having probability tending to one (by Condition 2), we get

 ∣∣ ∣∣∂Ln(ˆβ,ˆη)∂βl−∂Ln(β0,η0)∂βl∣∣ ∣∣=∣∣ ∣∣∑j∈S∂2Ln(˜β,˜η)∂βlβj(ˆβj−β0j)+m∑k=1∂2Ln(˜β,˜η)∂βlηk(ˆηk−η0k)∣∣ ∣∣ ≤maxl,j≤p∣∣ ∣∣∂2Ln(˜β,˜η)∂βlβj∣∣ ∣∣∣∣∣∣ˆβS−β0S∣∣∣∣1+maxl,k≤p∣∣ ∣∣∂2Ln(˜β,˜η)∂βlηk∣∣ ∣∣||ˆη−η0||1 ≤maxl,j≤p∣∣ ∣∣∂2Ln(˜β,˜η)∂βlβj∣∣ ∣∣√s∣∣∣∣ˆβS−β0S∣∣∣∣2+maxl,k≤p∣∣ ∣∣∂2Ln(˜β,˜η)∂βlηk∣∣ ∣∣√m||ˆη−η0||2,

where the last step follows by Cauchy-Swartz inequality. Now, by Conditions 1 and 2, we get

 ∣∣ ∣∣∂Ln(ˆβ,ˆη)∂βl−∂Ln(β0,η0)∂βl∣∣ ∣∣=oP(1).

Then the theorem follows using (5)

Note that, Condition 3 in the above theorem, imposed on the penalty function, is the same as the one used in Theorem 2.1 of Fan and Liao (2014) and is quite general. It is satisfied by most common penalties including , SCAD or MCP by appropriately choosing the regularization parameter going to zero asymptotically ( as ). Therefore, as in Fan and Liao (2014), our result in Theorem 2.1 above also provides a necessary condition on the loss function for a large class of useful penalty functions. As a special case, our next result proves the inconsistency of the PMLE, in at least one aspect, under endogeneous LMM.

###### Theorem 2.2 (Inconsistency of penalized MLE)

Consider i.i.d. data and assume there is endogeneity in at least one in at least one group . The loss function of the penalized MLE (PMLE) is given by (3). Assume that and has finite 4th order moments and that satisfies the condition of Theorem 2.1. If and are local optimizers of the PMLE objective function such that ( is the dimension of ), then either , or .

Proof: The proof follows directly from Theorem 2.1 by noting the fact that, if is endogeneous, then

 ∣∣∣∂Ln(β0,η0)∂βl∣∣∣→a,  almost surely,  % as n→∞,

for some

. The proof of this fact can be done by standard calculation of the derivative and applying the Strong Law of Large Numbers; hence it is omitted for brevity.

## 3 Focused Selection of the Fixed Effect Variables

Consider the linear mixed model set-up as described in Section 1

. We will now propose a new extension of the MPLE of Fan and Li (2012) that will lead to consistent oracle selection of important fixed effect variables. For this purpose, we will use the concept of the focused generalized method of moments (FGMM) approach, with non-concave penalization, as proposed initially by Fan and Liao (2014) in the context of high-dimensional linear regression; the resulting loss function indeed simultaneously performs sparse selection and applies the IV (Instrumental Variable) method against endogeneity. The IV method basically assumes the availability of a vector of observable

instrumental variables which is correlated with the covariates but uncorrelated with the model error, i.e.,

 E[ϵ|W]=0.

The choice of a proper IV helps us to tackle different statistical problems; they are often chosen as a function of the covariates or even a subset of and hence the above condition can be easily verified through some simple moment conditions. As noted earlier, the IV technique is seen to be extremely useful to address the endogeneity issues in classical low-dimensional LMM; see Hall and Horowitz (2005), Wooldridge (2010), Lin et al. (2015), Chesher and Rosen (2017) for details on popular recent IV methods. Here we will extend them to the high-dimensional LMM in the line of Fan and Liao (2014).

### 3.1 The Profiled Focused GMM (PFGMM) with non-concave penalization

Under the linear mixed model set-up considered in this paper, keeping consistent with the many real-life applications, we have assumed that the number of random effects are small enough so that their individual analysis is possible in the classical sense. Hence, we will assume that the matrix is positive definite (pd). Let us first assume, for the time being, that the variance parameter is known. Then, based on (3), the likelihood of only the fixed effects coefficients (ignoring the constant term) is given by

 Lprofile(β)∝exp{−12σ2(y−Xβ)TV(θ,σ2)−1(y−Xβ)}. (6)

Note that, this is also the profiled likelihood of obtained by substituting the MLE of the random effect vector , given , in the joint likelihood of and given covariates; see Fan and Li (2012) for details. A penalized version of this profile likelihood (in logarithm) can be maximized for sparse selection of the fixed effects and estimation of the corresponding coefficients; Fan and Li (2012) have suggested to use a suitable proxy matrix for the unknown matrix .

Now, in case of endogeneity, we need to additionally apply the IV method to achieve consistency. Let us again assume, for the time being, that and hence is known. Let us define the transformed variables

 y∗=V(θ,σ2)−1/2y,      X∗=V(θ,σ2)−1/2X,      ϵ∗=V(θ,σ2)−1/2ϵ.

Then we have and hence . Therefore, the profile likelihood of , given in (6), under the linear mixed effects model (1) is also the ordinary likelihood of under the following linear regression model in the transformed space:

 y∗=X∗β+ϵ∗,    ϵ∗∼Nn(0,σ2Ip). (7)

Under level-1 endogeneity in the mixed model (1), we also have endogeneity in the transformed regression (7) with . Noting that (7) is exactly the same as considered by Fan and Liao (2014), our idea is to apply their FGMM approach to this transformed model in the transformed space and then go back to the original space of the data to achieve our goal of fixed effects selection in the LMM (1) with endogeneity. This would have been straightforward if the original data were independent and identically distributed (iid) and was known, but none of these conditions hold in practice. Hence, we need appropriate non-trivial extensions to handle the implementation and theoretical derivations. Let us start with defining our proposed loss function below.

Note that the components of are iid and those of

are independent with the same variance but different means. Let us denote the corresponding random variables in the transformed space by

and , respectively. Then, obtaining a consistent solution under endogeneity is based on the availability of an appropriate set of observable instrumental variables in the transformed space such that

 E[ϵ∗|W∗]=0.

Fan and Liao (2014) achieved variable selection consistency under endogeneity through over-identification by use of two sets of sieve functions (Chen, 2007), say, and where and are scalar functions. Letting denote the index set of true non-zero coefficients, the above condition of IV implies that, for , we have the following set of over-identified equations:

 E[(Y∗−X∗SβS)F∗S]=0,    E[(Y∗−X∗SβS)H∗S]=0. (8)

Under these conditions, Fan and Liao (2014) have proposed to consider the FGMM loss function

 Ln(β) = [1nn∑i=1(Y∗i−X∗iβ)Π∗i(β)]TJ(β)[1nn∑i=1(Y∗i−X∗iβ)Π∗i(β)] (9) = [1nΠ∗(β)(y∗−X∗β)]TJ(β)[1nΠ∗(β)(y∗−X∗β)], (10)

where for all and is a diagonal weight matrix with non-zero weights corresponding only to the non-zero components of . In particular, the non-zero weights of components can be chosen as the inverse of the estimated variances of and , respectively. Then, a consistent solution of the transformed problem can be obtained by minimizing the penalized FGMM loss function; see Fan and Liao (2014) for details.

Now, let us look back at our original problem of the linear mixed model (1) and map the FGMM loss function (10) back into our data space to get a clear idea for this case. Note that, through an inverse transformation, we can assume for some IV in the data space. Hence the FGMM loss function for our mixed model set-up turns out to have the form

 Ln(β) = [1nΠ(β)V(θ,σ2)−1(y−Xβ)]TJ(β)[1nΠ(β)V(θ,σ2)−1(y−Xβ)]. (11)

Note that, in practice with mixed models, we can not directly minimize this FGMM loss function or its penalized version because it depends on the unknown variance parameters through . To avoid this problem, we follow the approach of Fan and Li (2012) and propose to use in place of , where is some suitable proxy matrix for the unknown variance component matrix . Therefore, we finally minimize, with respect to , the penalized objective function

 Qn(β) = LPn(β)+p∑j=1Pn,λ(|βj|), (12)

where

 LPn(β) = [1nΠ(β)˜V−1z(y−Xβ)]TJ(β)[1nΠ(β)˜V−1z(y−Xβ)]. (13)

We will refer to as the profiled Focused GMM (PFGMM) loss function based on its link to the profile likelihood. If we would have used for known variance parameters, the asymptotic consistency results for the resulting estimator would have followed directly from the results of Fan and Liao (2014). However, we will here prove that, even using the proxy matrix , we can still achieve variable selection consistency under the linear mixed model provided the proxy matrix is not very far away; we will present the rigorous proof along with the necessary assumptions in the next subsection.

### 3.2 Oracle Variable Selection Consistency

Consider the set-up of the previous subsection and assume that the true parameter value is the unique solution of the set of over-identified IV equations in (8), where the non-zero component vector . Further, we need the following sets of assumptions.

Assumptions on the penalty (P):
The general penalty function satisfies

• is concave and non-decreasing on , with ,

• has continuous derivative on , with ,
where denotes the strength of the signal,

• There exists a constant such that , where

 (14)

It is worthwhile to note that Conditions (P1)–(P3) are quite standard in high-dimensional analysis and used by several authors including Fan and Liao (2014). These are satisfied by a large class of folded-concave penalties including with , hard-thresholding, SCAD and MCP for appropriately chosen tuning parameters. Also for any , by the concavity of the penalty functions. Condition (P2) is related to the signal strength, on which we need the following additional assumptions depending on the dimension of the problem; these are needed to ensure variable selection consistency and are satisfied also by properly chosen SCAD and MCP penalties for strong signal and small .

Assumptions on the dimension and signal strength (A):

• , , .

• and

Next we assume the following conditions on the instrumental variables and with the notation and for . These are motivated from Fan and Liao (2014) and similar justifications hold for their selection in our setup; see Remark 4.1 there.

Assumptions on the Instruments (I):

• There exists such that

 maxl≤pP(|F∗l|>t)≤e−(tb1)r1,  maxl≤pP(|H∗l|>t)≤e−(tb2)r2,  for any t>0.
• Var and Var are bounded away from both zero and infinity uniformly in and .

• and are bounded away from zero.

• There exists constants such that and , where

 A=limn→∞1nΠ(β0S)˜V−1zXS. (15)
• There exists a constant such that , where

 Υ=limn→∞σ2nΠ(β0)˜V−1zV(θ,σ2)˜V−1zΠ(β0). (16)

Note that, Assumptions (I4) and (I5) depend on the choice of proxy matrix . We further need the following additional assumptions regarding .

Assumptions on the Proxy Matrix (M):

• There exists a constant such that
and .

• .

Then we have the following main theorem.

###### Theorem 3.1

Suppose and Assumptions (P), (A), (I) and (M) hold under the linear mixed model set-up. Then, there exists a local minimizer of in (12) that satisfies

1. . In addition, the local minimizer is strict with probability arbitrarily close to one for all sufficiently large .

2. Assuming denotes the estimated active set, .

3. For any unit vector ,

 √nαtΓ−1/2Σ(ˆβS−β0S)D→N(0,1),

where and .

For simplicity in presentation, we have deferred the proof of the above theorem to Appendix A and continue with implementation details and applications of our proposal below.

### 3.3 Computational Aspects

For implementing the proposed PFGMM algorithm, one can follow the same algorithm as used by Fan and Liao (2014) on the transformed variables and . But, these transformed variables leading to the loss function in (11) are not known, so we need to use the proxy matrix and the approximated loss given in (13). So, given a proxy matrix , we first compute the matrix and its square root . For computation of the matrix square root we use the Blocked Schur algorithm as developed by Deadman et al. (2013); its implementation can be found in standard statistical softwares like MATLAB or R (function named ‘sqrtm’ in both). Then, the approximation of the transformed variables are computed as and Following this, the FGMM loss function based on and is nothing but the proposed loss in (13) and hence we can next device an algorithm following the Fan and Liao (2014) approach. The minimization of the resulting penalized PFGMM objective function is done through the iterative coordinate algorithm applied to a smoothed version of a non-smooth PFGMM minimization problem. More details and justifications can be found in Fan and Liao (2014); for brevity, we only present the crucial considerations regarding the choice of proxy matrix and the choice of regularization parameter in our context.

On the Choice of Proxy matrix:
The choice of proxy matrix is not straightforward from the assumed conditions but some light can be shed following the discussions in Fan and Li (2012, Section 2.3). In particular, assuming standard non-singularity conditions involving the random effects covariates , one possible choice of which can be obtained for large is times the identity matrix. We have used this particular proxy matrix for all our empirical illustrations in the present paper.

On the Choice of :
Although for the regression modeling as considered in Fan and Liao (2014), the regularization parameter can be chosen by cross-validation and hence can also be used in connection with any loss function other than likelihood-loss (like the FGMM loss), it is not ideal to apply the cross-validation technique in case of mixed models. In likelihood based estimation and variable selection in high or ultra-high dimensional mixed-effects models, the usual proposal is to choose corresponding to the minimum BIC given by (Schelldorfer et al., 2011; Delattre et al., 2014; Ghosh and Thoresen, 2018)

 BIC(λ)=−2ln(ˆβ,ˆη)+[|ˆS|+dim(λ)]logn.

When we are using the proposed PFGMM loss function to estimate in the ultra-high dimensional mixed model, one can define a natural extension of BIC as

 ExBIC(λ)=−2LPn(ˆβPλ)+|ˆS|logn,

where is the estimate of obtained through the proposed PFGMM approach with regularization parameter . But, in this context, it may be questionable if the above formulations provide the correct penalty and this clearly needs further detailed investigation. However, it has been observed that the simple choice of , as suggested in Fan and Liao (2014), works sufficiently well for all our numerical studies.

### 3.4 Empirical illustrations

We consider the same simulation set-up as in Example 2.1 with level-1 endogeneity and different values of the underlying parameters and apply the proposed PFGMM algorithm to select the relevant fixed-effects variables. In particular, we consider the true values of as representing strong signal, and the values of other parameters as and as in Example 2.1. Further, we consider two values of ; 0 and , indicating uncorrelated and correlated covariates, respectively, and different values of to represent varying strength of endogeneity (correlations being 0, 0.1, 0.24, 0.51, 0.69, respectively); note that gives the ideal case with no endogeneity. We have also studied negative values of with the same magnitudes leading to negative correlations, but their effects are the same as the positive cases (only depends on the magnitudes) and hence the results are not reported in the paper for brevity. For comparison, we also apply the profile likelihood proposal (referred to here as the PLS) of Fan and Li (2012). The average sizes of the estimated active sets obtained by both methods are presented in Figures 1 and 2, respectively, for the correlated and the independent covariate cases.

These empirical illustrations clearly show the significantly improved performance of the proposed PFGMM method under level-1 endogeneity. In particular, for correlated covariates, even a small amount of endogeneity (small ) increases the estimated active set sizes in the PLS method, which becomes further damaging for larger extent of endogeneity, either through more endogenous variables or higher values of . On the other hand, the proposed PFGMM method produces an active set of size almost equal to the original active set size (5) under any extent of endogeneity and the variation over different replications is also negligible compared to the PLS method. The results for the independent variables are also similar although the harmful effect of endogeneity on the PLS method is not significant for smaller values of . The proposed PFGMM method still performs better than the PLS method overall, producing the same sets of active variables in the cases where the PLS also performs well.

The next section describes the performance of the estimated regression coefficients obtained through PFGMM, PLS and further refinements along with proposals for variance component estimation.

## 4 Estimation of the Variance Parameters

Once we have selected the important fixed effect variables consistently through the proposed PFGMM algorithm, our problem reduces to a low dimensional one. Let denote the set of indices of estimated non-zero coefficients, which is asymptotically the same as the true active set with probability tending to one from Theorem 3.1. So, we now have the reduced model

 yi=XiˆSβˆS+Zibi+ϵi,     i=1,...,I. (17)

Also, we have an estimate of , which is consistent and asymptotically normal from Theorem 3.1. Then, the most straightforward and intuitive estimates of can be obtained by applying the Maximum likelihood method to the resulting residual (random effect) model

 ˆri:=yi−XiˆSˆβS=Zibi+ϵi,     i=1,...,I. (18)

We will refer to the resulting estimator, say as the PFGMM estimator of in the line of the associated PFGMM estimator of . It is important to note that the PFGMME of will also be consistent and asymptotically normal by standard results on likelihood based inference for the low-dimensional residual model (18). Once has been computed as described in Section 3, the PFGMME of can be computed routinely by using the available software packages for low-dimensional LMM (e.g., package ‘lme4’ in R, function ‘fitlme’ in MATLAB).

Alternatively, if we just want to use the proposed PFGMM for selection of important fixed effects, in the second stage we can also fine-tune the estimates of along with estimation of to achieve better finite sample efficiency. For this purpose, we consider the reduced low-dimensional linear mixed effect model given in (17), containing only the fixed-effect variables from selected by the PFGMM algorithm, and apply the standard maximum likelihood (ML) or the restricted maximum likelihood (REML) approach to get the new estimates of and of . Let us refer to the resulting estimators