
# Semiparametric model averaging for high dimensional conditional quantile prediction

In this article, we propose a penalized high dimensional semiparametric model average quantile prediction approach that is robust for forecasting the conditional quantile of the response. We consider a two-step estimation procedure. In the first step, we use a local linear regression approach to estimate the individual marginal quantile functions, and approximate the conditional quantile of the response by an affine combination of one-dimensional marginal quantile regression functions. In the second step, based on the nonparametric kernel estimates of the marginal quantile regression functions, we utilize a penalized method to estimate the suitable model weights vector involved in the approximation. The objective of the second step is to select significant variables whose marginal quantile functions make a significant contribution to estimating the joint multivariate conditional quantile function. Under some mild conditions, we have established the asymptotic properties of the proposed robust estimator. Finally, simulations and a real data analysis have been used to illustrate the proposed method.


## 1 Introduction

In many practical situations, especially in economics and medicine, forecasting and predictive inference are the main goals. In practice, we often face a large number of predictors and uncertain functional forms when making statistical predictions. A popular approach to this problem is model selection, which picks an optimal model from all candidate models. However, model selection yields only one final model, so useful information may be lost when significant variables are absent from the final model, which may result in misleading predictions. Instead of depending on a single best model, an alternative method, called model averaging, aims to improve prediction accuracy by giving higher weights to the better marginal models. Thus, model averaging can be regarded as a smoothed extension of model selection and generally leads to a lower risk than model selection. Early developments in model averaging were closely linked to Bayesian statistics, including Hoeting et al. (1999), Raftery et al. (1997) and Hjort and Claeskens (2003). Recently, various strategies have been developed to construct optimal model averaging weights for frequentist models. For example, Hansen (2007) proposed a frequentist model averaging approach with weights selected by minimizing a Mallows criterion. Wan et al. (2010) focused on two assumptions of Hansen (2007) and provided a stronger theoretical basis for the use of the Mallows criterion in model averaging. Liang et al. (2011) considered a new weight choice procedure that minimizes the mean squared error of the frequentist model average estimator. To deal with heteroscedastic data, Hansen and Racine (2012) developed a jackknife model averaging approach that chooses weights by minimizing a leave-one-out cross-validation criterion, and proved that the proposed approach achieves the lowest possible asymptotic squared error. Zhang et al. (2013) further extended the method of Hansen and Racine (2012) to general models with a non-diagonal error covariance structure or lagged dependent variables. In the framework of linear mixed-effects models, Zhang et al. (2014) constructed an unbiased estimator of the squared risk for model averaging, which was shown to be asymptotically optimal under some regularity conditions. Zhang et al. (2016) studied optimal model averaging methods for generalized linear models and generalized linear mixed-effects models, which can be viewed as an extension of Zhang et al. (2014). Under the local asymptotic framework, Liu et al. (2015) studied the limiting distributions of least squares averaging estimators and proposed a plug-in averaging estimator that minimizes the sample asymptotic mean squared error. Other related literature includes Hansen (2008), Claeskens and Hjort (2008), Zhang et al. (2012) and Cheng and Hansen (2015).

Almost all of the research mentioned above focuses on averaging a set of parametric models by assuming some parametrically linear or nonlinear relationship between the response and the predictors. Although parametric models are easy to understand and widely accepted by researchers, they impose strong assumptions in practical applications, which may increase the risk of biased prediction. In contrast, nonparametric models with fewer structural restrictions may provide more flexible predictive inference. Recently, Li et al. (2015) first proposed a nonparametric model averaging approach, which is more flexible than traditional parametric averaging methods. They estimated the multivariate conditional mean regression function by averaging a set of estimated marginal mean regression functions with weights obtained by minimizing a least squares loss. Motivated by the nonparametric model averaging technique, Chen et al. (2016) studied semiparametric dynamic portfolio choice and used a novel data-driven method to estimate the nonparametric optimal portfolio choice. Huang and Li (2018) extended the method of Li et al. (2015) to panel data and established the asymptotic results of the proposed procedure. Li et al. (2018) approximated the conditional mean regression function by a weighted average of varying coefficient regression functions, which can handle both discrete and continuous predictors.

In recent years, we often encounter datasets with a very large number of potential predictors, of which only a minority are truly relevant for prediction. However, most of the literature focuses on determining the weights for individual models under a fixed number of covariates. Among the few exceptions, Ando and Li (2014) proposed a two-step model averaging procedure to predict the conditional mean of the response for an ultra-high dimensional linear regression. In order to obtain more accurate predictions of the conditional mean of the response for ultra-high dimensional time series, Chen et al. (2018) introduced a two-step semiparametric procedure that combines the kernel sure independence screening technique with a semiparametric penalized method for model averaging of marginal regressions. All of the references mentioned above aim to forecast the conditional mean of the response, but sometimes we are more interested in predicting the conditional quantile of the response. Compared with mean regression, quantile regression not only provides a more complete description of the entire response distribution but also does not require specification of the error distribution, and thus it is more robust.

In this paper, we aim to develop a new semiparametric model averaging procedure that achieves more accurate prediction of the true conditional quantile of the response in the high dimensional setting. This paper makes several contributions: (1) our objective is to predict the conditional quantile of the response rather than its conditional mean. Establishing the asymptotic theory of the model weights is therefore more challenging, since the weights have no closed-form expression; (2) the proposed approach can offer a more complete prediction of the response when different quantiles are adopted; (3) our method produces more accurate in-sample and out-of-sample predictions when non-normal errors are considered.

The rest of the paper is organized as follows. In Section 2, we first give the approximation of the conditional quantile function of the response. Then a two-step semiparametric model averaging approach is applied to estimate the conditional quantile function of the response. In Section 3, we establish the asymptotic theory for the proposed estimator. In Section 4, numerical studies including simulation studies and a real data analysis are carried out to investigate the finite sample performance of the proposed method. Some discussions are reported in Section 5. Finally, all technical proofs are given in the Appendix.

## 2 Model approximation and estimation method

Let $\{(Y_i, X_i)\}_{i=1}^{n}$ be independent and identically distributed observations from $(Y, X)$, where $X = (X_1, \ldots, X_{p_n})^T$ is a $p_n$-vector of predictors and $Y$ is the response variable. The goal of this paper is to develop a new procedure for forecasting the $\tau$th conditional quantile function of $Y$ given $X$, namely, $m_\tau(X) = Q_\tau(Y|X)$. If the dimension of $X$ is high, it is not practical to model the conditional quantile function $m_\tau(X)$ without any structural assumption, due to the curse of dimensionality. Recently, authors have approximated the quantile function $m_\tau(X)$ by semiparametric models such as quantile additive models (Horowitz and Lee, 2005; Lv et al., 2017) and quantile varying coefficient models (Tang et al., 2013), among others. However, using a specified model with a fixed structure may increase the risk of model misspecification, which results in poor predictive performance. Therefore, we adopt the model averaging technique to predict $m_\tau(X)$. Specifically, motivated by Li et al. (2015), we model or approximate $m_\tau(X)$ by an affine combination of one-dimensional nonparametric functions, $w_0 + \sum_{j=1}^{p_n} w_j m_j(X_j)$, where $m_j(X_j)$ is the $\tau_j$th conditional quantile of $Y$ given $X_j$. Here, each marginal regression $m_j(X_j)$ can be regarded as a candidate model and $w_j$ is the corresponding model weight coefficient. In the rest of the article, we omit $\tau$ and $\tau_j$ from $w_j$ and $m_j$ for notational simplicity, but it is helpful to bear in mind that these quantities are $\tau$- and $\tau_j$-specific.

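Our main interest is to accurately estimate the marginal functions $m_j(\cdot)$ and the model average weight vector $w_n = (w_0, w_1, \ldots, w_{p_n})^T$. We consider a two-step estimation procedure. In the first step, we employ the local linear regression technique to estimate the individual marginal regression functions $m_j(\cdot)$. Specifically, considering a Taylor expansion around a point $x$, we have

$$m_j(X_{ij}) \approx m_j(x) + \dot{m}_j(x)(X_{ij} - x) \equiv a + b(X_{ij} - x), \quad i = 1, \ldots, n, \; j = 1, \ldots, p_n,$$

where $\dot{m}_j(\cdot)$ is the first-order derivative of $m_j(\cdot)$. Let $\rho_{\tau_j}(u) = u\{\tau_j - I(u < 0)\}$ be the check loss function at the $\tau_j$th quantile. Then, we estimate $(a, b)$ by minimizing the following local weighted quantile loss

$$\sum_{i=1}^{n} \rho_{\tau_j}\{Y_i - a - b(X_{ij} - x)\} K\left(\frac{X_{ij} - x}{h_j}\right), \tag{1}$$

where $K(\cdot)$ is a kernel function and $h_j$ is a bandwidth. Let $(\hat{a}, \hat{b})$ be the minimizer of the objective function (1). Then, we have $\hat{m}_j(x) = \hat{a}$.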
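The first step can be sketched in code. What follows is a minimal, illustrative Python sketch of the local linear quantile fit in (1) for a single covariate, not the authors' implementation. The function names `check_loss`, `epanechnikov` and `local_linear_quantile` are hypothetical. The brute-force search over pairs of points is a swapped-in technique chosen for transparency: it relies on the standard linear-programming fact that a two-parameter weighted quantile regression attains its minimum on a line passing through two data points.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def epanechnikov(u):
    """Epanechnikov kernel 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_linear_quantile(xs, ys, x0, h, tau):
    """Return (a_hat, b_hat) minimizing the kernel-weighted check loss at x0.

    a_hat estimates the marginal quantile function at x0 and b_hat its slope.
    Every line through two locally weighted points is tried (brute force).
    """
    # Keep only observations that receive a positive kernel weight.
    pts = [(x - x0, y, epanechnikov((x - x0) / h)) for x, y in zip(xs, ys)]
    pts = [p for p in pts if p[2] > 0.0]

    best = None
    for (d1, y1, _), (d2, y2, _) in combinations(pts, 2):
        if d1 == d2:
            continue  # a vertical pair does not define a line a + b * d
        b = (y2 - y1) / (d2 - d1)
        a = y1 - b * d1
        loss = sum(k * check_loss(y - a - b * d, tau) for d, y, k in pts)
        if best is None or loss < best[0]:
            best = (loss, a, b)

    if best is None:
        raise ValueError("need two distinct points inside the kernel window")
    return best[1], best[2]
```

The search costs on the order of $m^3$ operations for $m$ points inside the kernel window, so it is only practical for small local samples; a linear-programming or interior-point quantile regression solver would be the usual choice in practice.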

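In the second step, let $w_o$ be the optimal values of the weights in the model averaging defined in Li et al. (2015). To estimate $w_o$, we minimize the following function with respect to $w_n = (w_0, w_1, \ldots, w_{p_n})^T$,

$$Q_n(w_n) = \sum_{i=1}^{n} \rho_\tau\Big\{Y_i - w_0 - \sum_{j=1}^{p_n} \hat{m}_j(X_{ij}) w_j\Big\} + n \sum_{j=1}^{p_n} p_\lambda(|w_j|), \tag{2}$$

where $p_\lambda(\cdot)$ is a penalty function with a tuning parameter $\lambda$, such as the SCAD penalty function, whose first-order derivative $\dot{p}_\lambda(\cdot)$ is defined by

$$\dot{p}_\lambda(x) = \lambda\left\{I(x \le \lambda) + \frac{(a\lambda - x)_+}{(a - 1)\lambda} I(x > \lambda)\right\},$$

where $a > 2$ is a constant, and $\lambda$ is a nonnegative penalty parameter which governs the sparsity of the model. It is easy to see that the estimated weight $\hat{w}_j$ is close to zero if $\lambda$ is large.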
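As a small illustration of the SCAD derivative displayed above, the sketch below evaluates it in Python. The function name `scad_derivative` is hypothetical, and the default `a = 3.7` is an assumption (the conventional choice suggested by Fan and Li for SCAD), not a value taken from this paper.

```python
def scad_derivative(x, lam, a=3.7):
    """First derivative of the SCAD penalty at x >= 0 (pass |w_j| for a weight).

    Implements lam * { 1{x <= lam} + (a*lam - x)_+ / ((a - 1)*lam) * 1{x > lam} }.
    The default a = 3.7 is an assumed, conventional value.
    """
    if x < 0:
        raise ValueError("x must be nonnegative; pass the absolute weight")
    if lam < 0:
        raise ValueError("lam must be nonnegative")
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    if lam == 0:
        return 0.0  # no penalty at all
    if x <= lam:
        return lam  # lasso-like region: constant slope lam
    # For x > lam the factor lam cancels: lam * (a*lam - x)_+ / ((a - 1)*lam).
    return max(a * lam - x, 0.0) / (a - 1.0)
```

The derivative equals $\lambda$ for small weights, decays linearly on $(\lambda, a\lambda]$, and vanishes beyond $a\lambda$, so large weights receive no additional shrinkage.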

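The estimator of the optimal weights can be obtained by minimizing the objective function (2), that is, $\hat{w}_n = \arg\min_{w_n} Q_n(w_n)$. This paper uses the R package “rqPen” to obtain the estimator $\hat{w}_n$. Finally, for a future observation $X = (X_1, \ldots, X_{p_n})^T$, we can predict $m_\tau(X)$ by $\hat{w}_0 + \sum_{j=1}^{p_n} \hat{w}_j \hat{m}_j(X_j)$.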
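The final prediction step is a plain weighted sum of the fitted marginal quantile functions. The sketch below illustrates it in Python under the assumption that each fitted marginal function is available as a callable; the name `model_average_predict` and its argument layout are hypothetical.

```python
def model_average_predict(w0, weights, marginal_fits, x_new):
    """Return w0 + sum_j w_j * m_hat_j(x_new[j]).

    weights       : estimated weights w_1, ..., w_p (many may be exactly zero)
    marginal_fits : callables m_hat_1, ..., m_hat_p, one per covariate
    x_new         : covariate values of the future observation
    """
    if not (len(weights) == len(marginal_fits) == len(x_new)):
        raise ValueError("weights, marginal_fits and x_new must align")
    total = w0
    for w, m_hat, x in zip(weights, marginal_fits, x_new):
        if w != 0.0:  # zero-weight marginal models never need evaluating
            total += w * m_hat(x)
    return total
```

Because the penalty sets many weights exactly to zero, only the selected marginal functions have to be evaluated for a new observation.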

## 3 The theoretical results

Define and , where is the second-order derivative of , , , and . We assume and . To prove the theoretical results of the proposed estimators, we next present the following technical conditions.

(C1) Let be the marginal density function of the covariates , the -th element of . Assume that has continuous derivatives up to the second order and

 0

where is the compact support of . For each , the conditional density functions of for given exists and satisfies the Lipschitz continuous condition. Furthermore, the length of is uniformly bounded by a positive constant.

(C2) The kernel function $K(\cdot)$ is a Lipschitz continuous, symmetric and bounded probability density function with a compact support.

(C3) The marginal regression function has continuous derivatives up to the second order and there exists a positive constant such that

(C4) Let and be the marginal density and distribution functions of , and be the density and distribution functions of . The density functions and are bounded and bounded away from zero in a neighborhood of zero.

(C5) There exists a sequence of fixed vectors in , with bounded, such that , where denotes the norm for any vector.

(C6) The matrix

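$$\Lambda_n = \begin{pmatrix} E[m_1(X_{i1}) m_1(X_{i1})] & \cdots & E[m_1(X_{i1}) m_{p_n}(X_{ip_n})] \\ \vdots & \vdots & \vdots \\ E[m_{p_n}(X_{ip_n}) m_1(X_{i1})] & \cdots & E[m_{p_n}(X_{ip_n}) m_{p_n}(X_{ip_n})] \end{pmatrix}$$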

is positive definite with the eigenvalues bounded away from zero and infinity. In particular, the smallest eigenvalue of

is larger than , a small positive constant.

(C7) , and for all , where and are the smallest and largest eigenvalues of .

(C8) .

(C9) Let and , and there exist two positive constants and such that when .

Without loss of generality, we define the vector of the optimal weights

$$w_o = (w_{o0}, w_{o1}, \ldots, w_{op_n})^T = (w_{o0}, w_{o(1)}^T, w_{o(2)}^T)^T,$$

where stands for non-zero weights with dimension and is zero weights with dimension . Let and be the estimators of and respectively.

Define , , is first vector of , and are the top-left submatrix of and , and . Let with , with and . Obviously, the mean of is zero, and we define . Let , and , where is the sign function. Define and . In the following theorems, we give the asymptotic theories of and .

Theorem 1. Suppose that is an interior point of the support of . Under the regularity conditions (C1)–(C4), if and for

, then the asymptotic conditional bias and variance of the local linear estimator

are given by

Furthermore, conditioning on , we have

for , where stands for convergence in distribution.

Remark 1. Theorem 1 shows that the proposed nonparametric estimate $\hat{m}_j(x)$ is consistent and asymptotically normally distributed.

Theorem 2. Under conditions (C1)–(C9), together with for and , if , then such that , we have

(i) there exists a local minimizer of the objective function defined in (2) such that ;

(ii) with probability approaching one;

(iii) .

Remark 2. Theorem 2 indicates that the estimate of the optimal weights remains consistent even though the dimension of the predictors goes to infinity. It also shows that the proposed estimate enjoys well-known properties in high dimensional variable selection, such as sparsity and the oracle property.

## 4 Numerical studies

We investigate the performance of the proposed approach by three simulation examples and an empirical application. In our numerical studies, we set the kernel function as the Epanechnikov kernel, namely, $K(u) = 0.75(1 - u^2) I(|u| \le 1)$. Bandwidth selection is crucial in local smoothing since it governs the curvature of the fitted function. Similar to Kai et al. (2011), we use the following formula to choose the bandwidth: $h_j = h_{LS}\{\tau_j(1 - \tau_j)/\phi(\Phi^{-1}(\tau_j))^2\}^{1/5}$, where $h_{LS}$ is the selected optimal bandwidth for least squares, and $\phi(\cdot)$ and $\Phi(\cdot)$ represent the density function and distribution function of the standard normal distribution, respectively. The rule of thumb is used to select the bandwidth $h_{LS}$. In addition, the tuning parameter $\lambda$ in the proposed penalized procedure plays an important role. Lian (2012) proved that the Schwarz information criterion (SIC) is a consistent variable selection criterion under the framework of fixed dimension. In this paper, we select $\lambda$ by minimizing the following modified SIC criterion (MSIC)

$$\mathrm{MSIC}(\lambda) = \log\big(Q_n(\hat{w}_n)\big) + df\, C_n \log(n)/(2n), \tag{3}$$

where $\hat{w}_n$ is the estimated model weight vector for a given $\lambda$, $df$ is the number of nonzero coefficients in $\hat{w}_n$ and $C_n$ diverges with $n$. For example, the MSIC criterion reduces to the traditional SIC criterion (Lian, 2012) when $C_n = 1$, and the MSIC criterion is more suitable for high dimensional data when $C_n$ is allowed to diverge.
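The two tuning choices in this section can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the bandwidth rule is taken to be $h_\tau = h_{LS}\{\tau(1-\tau)/\phi(\Phi^{-1}(\tau))^2\}^{1/5}$, a form commonly used to convert a least squares bandwidth into a quantile-specific one, and the MSIC is taken to be $\log(\text{loss}) + df\,C_n\log(n)/(2n)$ as in (3). The function names and the candidate triples `(lam, loss, df)` are hypothetical; computing the loss and the number of nonzero weights for each $\lambda$ is left to whatever penalized quantile regression solver is used.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: pdf is phi, inv_cdf is Phi^{-1}


def quantile_bandwidth(h_ls, tau):
    """Convert a least squares bandwidth into a tau-specific bandwidth.

    Assumed rule: h_tau = h_ls * {tau (1 - tau) / phi(Phi^{-1}(tau))^2}^(1/5).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    z = _STD_NORMAL.inv_cdf(tau)
    ratio = tau * (1.0 - tau) / _STD_NORMAL.pdf(z) ** 2
    return h_ls * ratio ** 0.2


def msic(loss, df, n, c_n=1.0):
    """MSIC value log(loss) + df * c_n * log(n) / (2 n); c_n = 1 gives SIC."""
    if loss <= 0 or n <= 0:
        raise ValueError("loss and n must be positive")
    return math.log(loss) + df * c_n * math.log(n) / (2.0 * n)


def select_lambda(candidates, n, c_n=1.0):
    """Pick the lambda with the smallest MSIC.

    candidates: iterable of (lam, loss, df) triples, one per tuning value.
    """
    return min(candidates, key=lambda c: msic(c[1], c[2], n, c_n))[0]
```

Under the assumed rule the adjusted bandwidth is smallest at the median and grows toward the tails, where fewer observations are informative about the quantile.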

In order to assess the proposed method, we compare the following methods: (1) the proposed semiparametric model average quantile prediction (without SCAD penalty, denoted as SMAQP), (2) the proposed penalized semiparametric model average quantile prediction (with SCAD penalty, denoted as PSMAQP), (3) semiparametric model average mean prediction proposed by Li et al. (2015) (without SCAD penalty, denoted as SMAMP), (4) penalized semiparametric model average mean prediction proposed by Chen et al. (2018) (with SCAD penalty, denoted as PSMAMP). SMAMP and PSMAMP aim to forecast the conditional mean function $E(Y|X)$, and detailed descriptions of the two methods can be found in Section 3 of Li et al. (2015) and Subsection 2.1 of Chen et al. (2018). The tuning parameter involved in PSMAMP is chosen by cross-validation, following the advice of Chen et al. (2018), and the R package “ncvreg” can be used to obtain the penalized estimator PSMAMP.

### 4.1 Simulation studies

In all simulation examples, the sample size consists of a training set of size and a testing set of size , namely, .

Example 1. For a clear comparison, we adopt similar settings used in Chen et al. (2018) and generate the random samples from the following model

$$Y_i = m_1(X_{i1}) + m_2(X_{i2}) + m_3(X_{i3}) + m_4(X_{i4}) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{4}$$

where , , and . We fix and consider and for example 1. The covariates are independently drawn from , and we set the dimension of covariates as which satisfies the theoretical condition , where stands for the largest integer not greater than . Obviously, the first four variables make a significant contribution to estimating the joint multivariate quantile function , while the rest are not. Therefore, we have reasons to believe that the first four model weights are nonzero and the rest are zero. Please note that the model average component given in section 2 is different from reported in model (4) for . Our mission is to achieve the goal of accurately predicting the conditional quantile function , so we are not attempting to estimate in this paper.

In order to examine the robustness of the proposed procedure, we consider the following three different error distributions of $\varepsilon_i$: standard normal distribution (SN), $t$-distribution with 3 degrees of freedom ($t_3$), contaminated normal distribution (MN) representing a mixture of and with weights and respectively. In addition, four criteria are adopted to evaluate the performance of the proposed approach. Firstly, “C”, “IC” and “CF” are considered to examine variable selection performance, where “C” represents the average number of zero coefficients in the model weight vector that are correctly estimated to be zero; “IC” represents the average number of nonzero coefficients in the model weight vector that are incorrectly estimated to be zero; and “CF” represents the proportion of correctly fitted models (“correctly fit” means that the estimation procedure correctly chooses all significant components from the model weight vector). Secondly, the mean prediction error (MPE) is used to measure the accuracy of prediction, which is defined as , where stands for an index set of either the training sample or the testing sample.
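The three variable selection criteria can be computed from the estimated weight vectors of the simulation replications. The Python sketch below is illustrative only; the function name `selection_metrics` is hypothetical, the intercept is assumed to be excluded from each weight vector, and “correctly fitted” is interpreted here as exact recovery of the set of nonzero weights, which is an assumption about the definition in the text.

```python
def selection_metrics(estimates, true_support):
    """Return (C, IC, CF) averaged over replications.

    estimates    : list of estimated weight vectors, one per replication
                   (intercept excluded)
    true_support : set of indices whose true weights are nonzero
    """
    n_rep = len(estimates)
    if n_rep == 0:
        raise ValueError("need at least one replication")

    c_total = ic_total = cf_total = 0
    for w in estimates:
        zeros = {j for j, v in enumerate(w) if v == 0.0}
        selected = set(range(len(w))) - zeros
        # C: truly zero weights that were estimated as zero.
        c_total += sum(1 for j in zeros if j not in true_support)
        # IC: truly nonzero weights that were wrongly estimated as zero.
        ic_total += sum(1 for j in zeros if j in true_support)
        # CF: assumed to mean the selected set equals the true support.
        cf_total += 1 if selected == set(true_support) else 0

    return c_total / n_rep, ic_total / n_rep, cf_total / n_rep
```

With four significant components out of $p_n$, an ideal procedure would report C equal to $p_n - 4$, IC equal to zero and CF equal to one.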

Example 2. In this example, similar to Huang and Li (2018), we generate the random samples from the following model

$$Y_i = 3m_1(X_{i1}) + 3m_2(X_{i2}) + 2m_3(X_{i3}) + 2m_4(X_{i4}) + \sqrt{1.74}\,\varepsilon_i, \quad i = 1, \ldots, n, \tag{5}$$

where , , and . The covariates are simulated by for and , where and are independently drawn from and . We also fix and consider and for example 2. Other settings are the same as that in example 1.

It is easy to see that the conditional mean function is equal to the conditional quantile function at $\tau = 0.5$. Thus, we can compare the mean prediction approaches (SMAMP and PSMAMP) with the quantile prediction approaches (SMAQP and PSMAQP) at $\tau = 0.5$. The MPE criterion is reduced to for $\tau = 0.5$, and thus this criterion can also be used to assess the prediction performance of the mean prediction approaches. The corresponding results of the mean prediction approaches (SMAMP and PSMAMP) and the quantile prediction approaches (SMAQP and PSMAQP) at $\tau = 0.5$ are reported in Tables 1 and 3. We obtain the following findings. Firstly, the values in the column labeled “C” gradually approach the true number of zero components as the training sample size increases. The CF values are very close to one for a large training sample size, which shows that the proposed penalized procedure can consistently select the significant components of the weight vector. In contrast, the existing mean prediction approach PSMAMP performs poorly, with lower CF values. Secondly, the unpenalized methods always have smaller in-sample MPE than the penalized methods, but this does not hold for out-of-sample MPE. For the heavy-tailed distribution and the contaminated distribution MN, our proposed penalized method PSMAQP has the best prediction accuracy among all methods. Meanwhile, there is little difference between PSMAMP and PSMAQP under the normal error distribution. Thirdly, Tables 2 and 4 give the simulation results of SMAQP and PSMAQP at quantile levels other than the median. The results also show that PSMAQP has better prediction performance.

Example 3. The conditional quantile function is considered as

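$$m_\tau(X) = Q_\tau(Y|X) = 1 + \Phi^{-1}(\tau) + 2X_{i1} + 3X_{i2}^2 - \log(1 - X_{i3}) + \Phi^{-1}(X_{i4}) + \big(1 + \Phi^{-1}(\tau) - \log(1 - \tau)\big)X_{i5}, \quad i = 1, \ldots, n, \tag{6}$$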

where $\Phi(\cdot)$ is the standard normal distribution function, are independently drawn from and can be regarded as the intercept. We fix and consider and for example 3 and . Obviously, the fifth covariate’s coefficient varies with $\tau$, and only the first five predictors are significant for predicting $m_\tau(X)$. The first two examples come from the nonparametric additive model, but the proposed approach does not need any model assumption. Thus we aim to confirm that our method is model-free in this example. To assess the estimation accuracy of , we consider the mean estimation error (MEE) defined as in this example. Table 5 lists the simulation results, which show that the proposed PSMAQP performs well for different quantiles.
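The conditional quantile function in (6) can be evaluated directly, which is convenient when checking predictions against the truth in a simulation. The Python sketch below is a direct transcription of (6); the function name `example3_quantile` is hypothetical, and the domain checks (third covariate below one, fourth covariate strictly inside the unit interval) are assumptions needed for the logarithm and $\Phi^{-1}$ to be defined.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # inv_cdf plays the role of Phi^{-1}


def example3_quantile(x, tau):
    """Evaluate the conditional quantile function of Example 3 at covariates x.

    x   : sequence whose first five entries are X_1, ..., X_5
    tau : quantile level strictly between 0 and 1
    """
    x1, x2, x3, x4, x5 = x[:5]
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if x3 >= 1.0 or not 0.0 < x4 < 1.0:
        raise ValueError("need X_3 < 1 and 0 < X_4 < 1")
    z_tau = _STD_NORMAL.inv_cdf(tau)
    intercept = 1.0 + z_tau
    slope5 = 1.0 + z_tau - math.log(1.0 - tau)  # tau-varying coefficient of X_5
    return (intercept + 2.0 * x1 + 3.0 * x2 ** 2 - math.log(1.0 - x3)
            + _STD_NORMAL.inv_cdf(x4) + slope5 * x5)
```

When the fifth covariate is nonnegative the function is increasing in $\tau$, so evaluating it at a uniformly drawn $\tau$ is one way to generate responses with these conditional quantiles.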

Overall, the proposed model free procedure PSMAQP is competitive when compared with the existing methods, and its finite sample performances are satisfactory.

### 4.2 An application

In this section, we apply our proposed method to analyze the body fat dataset (Johnson, 1996), which is available from http://lib.stat.cmu.edu/datasets/bodyfat. This dataset consists of 252 observations without missing values. The purpose of studying this dataset is to predict the percentage of body fat from various body circumference measurements. Thus, the percentage of body fat is taken as the response variable and the other body circumference measurements are regarded as the predictors. Brief descriptions and marginal Pearson correlations of the 14 variables are summarized in Table 6. More details can be found in Johnson (1996). Before employing the prediction methods, we apply a logarithmic transformation to all predictors.

To evaluate the predictive performance of various methods, the data is split into two parts. One part including observations is used as a training data set to estimate the weight vector and the marginal quantile functions , and the other part including observations is considered as a testing data set to evaluate the predictive ability of various methods. In this real data analysis, we consider and , and .

Table 7 reports the in-sample and out-of-sample mean prediction errors (MPE) and the corresponding sample standard deviations (SD) over 500 random partitions. Firstly, for and in-sample performance, it is easy to see that SMAQP performs best among four approaches. For out-of-sample performance, one can see clearly that the proposed penalized approach PSMAQP has smallest MPE and SD for different settings, which shows that our proposed method has better predictive ability. Secondly, for and , we can see that PSMAQP always performs better than SMAQP in terms of out-of-sample performance.

To investigate the estimated weights, we list the estimated weights at and their standard deviations (in brackets) calculated by the bootstrap resampling method (Horowitz, 1998). The weights for the penalized prediction methods (PSMAMP and PSMAQP) are relatively sparse, with much smaller standard deviations than those of the unpenalized prediction methods (SMAMP and SMAQP). Meanwhile, PSMAQP is the most efficient among all methods, as it has the smallest standard deviations. In addition, for PSMAMP, only the sixth predictor () is chosen as a significant variable whose marginal quantile function has a significant influence on estimating . However, PSMAQP selects five predictors (including , , , and ) as the significant variables. In summary, our proposed model averaging procedure generally works well and outperforms the other existing methods.

## 5 Conclusion

In this paper, we provide a new semiparametric model averaging procedure for forecasting the conditional quantile function in high-dimensional settings. Based on local linear regression, we first estimate the individual marginal regression functions by minimizing the local weighted quantile loss function. Then, a penalized quantile regression is developed to select the regressors whose marginal regression functions make a significant contribution to estimating the conditional quantile function. The simulations and the empirical example in Section 4 show that the proposed method performs reasonably well in finite samples.

Recently, in the ultra-high dimensional setting, Ando and Li (2014) developed a new model averaging approach based on a delete-one cross-validation criterion and proved that the proposed method could achieve the lowest possible prediction loss asymptotically. However, they only considered high dimensional parametric model averaging, which may increase the risk of model misspecification. Thus, it is interesting to study semiparametric model averaging estimation for ultra-high dimensional data. Research in these aspects is ongoing.

Acknowledgments
This work is supported by the National Social Science Fund of China (Grant No. 17CTJ015).

Appendix

Let denote a positive constant that may be different at different place throughout this paper. Let , , and . Define .

Lemma 1. Denote as the minimizer of (1). Then, under the regularity conditions (C1)–(C4), we have

$$\hat{\theta} + \frac{1}{f_j(x)} S^{-1} E[W_n^{*} \mid X_j] \xrightarrow{d} N\left(0, \frac{1}{f_j(x)} S^{-1} \Sigma S^{-1}\right),$$

where and .

Proof of Lemma 1. To apply the identity (Knight, 1998)

 ρ