1 Introduction
For the classical linear regression model
$$Y = X\beta + e,$$
we are interested in the problem of variable selection and estimation, where $Y=(Y_1,\dots,Y_n)^{T}$ is the response vector, $X$ is an $n\times p_n$ design matrix with rows $x_i^{T}$, and $e=(e_1,\dots,e_n)^{T}$ is a random error vector. The main topic is how to estimate the coefficient vector $\beta$ when the dimension $p_n$ increases with the sample size $n$ and many elements of $\beta$ equal zero. This problem can be transformed into the minimization of a penalized least squares objective function
$$\|Y-X\beta\|^{2}+n\sum_{j=1}^{p_n}p_{\lambda_n}(|\beta_j|),$$
where $\|\cdot\|$ is the $L_2$ norm of a vector, $\lambda_n$ is a tuning parameter, and $p_{\lambda_n}(\cdot)$ is a penalty function. It is well known that least squares estimation is not robust, especially when the data contain outliers or the error term has a heavy-tailed distribution.
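As a small illustration of this lack of robustness, the following sketch (the design, sample size and error distributions are assumed purely for illustration and are not taken from the paper) compares the least squares estimation error under Gaussian errors with the error under heavy-tailed Cauchy errors:

```python
# Minimal sketch (assumed setup): OLS estimation error under light- vs heavy-tailed noise.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]          # a sparse coefficient vector
X = rng.normal(size=(n, p))

def ols_error(errors):
    """l2 distance between the OLS estimate and the true coefficients."""
    y = X @ beta_true + errors
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.linalg.norm(beta_hat - beta_true)

print("OLS error, Gaussian errors:", ols_error(rng.normal(size=n)))
print("OLS error, Cauchy errors:  ", ols_error(rng.standard_cauchy(size=n)))
```

On typical draws the error under Cauchy noise is far larger, which is the behaviour the robust loss functions below are designed to avoid.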
In this paper we take the loss function to be the least absolute deviation, i.e., we minimize an objective function of the form
$$\sum_{i=1}^{n}\bigl|Y_i-x_i^{T}\beta\bigr|+n\sum_{j=1}^{p_n}p_{\lambda_n}(|\beta_j|),$$
where the loss is the least absolute deviation (LAD for short); it does not require the noise to follow a Gaussian distribution and is more robust than least squares estimation. In fact, LAD estimation is a special case of M-estimation, which was first introduced by Huber (1964, 1973, 1981) [1] [2] [3] and is obtained by minimizing the objective function
$$\sum_{i=1}^{n}\rho\bigl(Y_i-x_i^{T}\beta\bigr),$$
where the function $\rho$ can be chosen by the user. For example, if we choose $\rho(x)=x^{2}/2$ for $|x|\le c$ and $\rho(x)=c|x|-c^{2}/2$ for $|x|>c$, where $c>0$, the Huber estimator is obtained; if we choose $\rho(x)=|x|^{q}$, where $q\ge 1$, the $L_q$ estimator is obtained, with two special cases: the LAD estimator for $q=1$ and the OLS estimator for $q=2$. If we choose $\rho(x)=x\bigl(\tau-I(x<0)\bigr)$, where $0<\tau<1$, we call it quantile regression, which in particular recovers the LAD estimator for $\tau=1/2$.
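As a quick numerical illustration of these choices of $\rho$ (the constants $c$, $q$ and $\tau$ below are arbitrary illustrative values, not values prescribed by the paper), a minimal sketch:

```python
# Illustrative sketch of the loss functions discussed above.
import numpy as np

def rho_huber(x, c=1.345):
    """Huber loss: quadratic near zero, linear in the tails."""
    return np.where(np.abs(x) <= c, 0.5 * x**2, c * np.abs(x) - 0.5 * c**2)

def rho_lq(x, q=1.0):
    """L_q loss: q = 1 gives LAD, q = 2 gives (a multiple of) OLS."""
    return np.abs(x) ** q

def rho_quantile(x, tau=0.5):
    """Check loss of quantile regression: tau = 0.5 is proportional to LAD."""
    return x * (tau - (x < 0))

x = np.linspace(-3.0, 3.0, 7)
print(rho_lq(x, q=1.0))             # LAD loss |x|
print(2.0 * rho_quantile(x, 0.5))   # equals |x| when tau = 0.5
print(rho_huber(x))                 # quadratic-then-linear compromise
```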
When the dimension $p_n$ approaches infinity as the sample size $n$ tends to infinity, we assume that the function $\rho$ is convex but not monotone, and that the monotone function $\psi$ is the derivative of $\rho$. By imposing appropriate regularity conditions, Huber (1973), Portnoy (1984) [4], Welsh (1989) [5] and Mammen (1989) [6] proved that the M-estimator enjoys consistency and asymptotic normality, where Welsh (1989) imposed a weaker condition on $\psi$ and a stronger condition on the growth rate of $p_n$. Bai and Wu [7] further pointed out that the condition on $\psi$ can be absorbed into an integrability condition imposed on the design matrix. Moreover, He and Shao (2000) [8] studied the asymptotic properties of M-estimators in a general model setting with a growing dimension. Li, Peng and Zhu (2011) [9] obtained the oracle property of the non-concave penalized M-estimator in the high-dimensional model, and proposed RSIS, a rank sure independence screening method, for variable selection in the ultra-high-dimensional model. Zou and Li (2008) [10] combined penalty functions with the local linear approximation method (LLA) to prove that the resulting estimator enjoys good asymptotic properties, and demonstrated in simulations that this method improves on the computational efficiency of the local quadratic approximation (LQA).
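For completeness, recall the local linear approximation underlying the LLA method of Zou and Li (2008): around an initial estimate $\beta^{(0)}$, the penalty is replaced by its tangent line,
$$p_{\lambda}\bigl(|\beta_j|\bigr)\;\approx\;p_{\lambda}\bigl(|\beta_j^{(0)}|\bigr)+p'_{\lambda}\bigl(|\beta_j^{(0)}|\bigr)\bigl(|\beta_j|-|\beta_j^{(0)}|\bigr),\qquad j=1,\dots,p_n,$$
so that, up to an additive constant, the penalized objective becomes a weighted $L_1$-penalized problem.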
Inspired by this, in this paper we consider the following problem:
$$Q_n(\beta)=\sum_{i=1}^{n}\rho\bigl(Y_i-x_i^{T}\beta\bigr)+n\sum_{j=1}^{p_n}p'_{\lambda_n}\bigl(|\tilde{\beta}_j|\bigr)\,|\beta_j|,$$
where $p'_{\lambda_n}(\cdot)$ is the derivative of the penalty function and $\tilde{\beta}=(\tilde{\beta}_1,\dots,\tilde{\beta}_{p_n})^{T}$ is the non-penalized estimator. In this paper we assume that the function $\rho$ is convex; hence the objective function is still convex, and the obtained local minimizer is a global minimizer.
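To make the objective above concrete, the following sketch (the SCAD penalty derivative, the value of $\lambda$, and the use of a generic derivative-free optimizer are illustrative assumptions, not prescriptions from the paper) computes an unpenalized LAD pilot fit, forms the weights $p'_{\lambda_n}(|\tilde{\beta}_j|)$, and then minimizes the weighted penalized LAD objective:

```python
# Sketch of the one-step (LLA-weighted) penalized LAD estimator discussed above.
# The SCAD derivative, lambda, and the optimizer are illustrative choices only.
import numpy as np
from scipy.optimize import minimize

def lad_loss(beta, X, y):
    return np.sum(np.abs(y - X @ beta))

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty of Fan and Li (2001)."""
    t = np.abs(t)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam))

def penalized_lad(X, y, lam):
    n, p = X.shape
    # Non-penalized LAD pilot estimator (plays the role of beta-tilde).
    pilot = minimize(lad_loss, np.zeros(p), args=(X, y), method="Powell").x
    w = scad_derivative(pilot, lam)          # weights p'_lambda(|beta-tilde_j|)
    objective = lambda b: lad_loss(b, X, y) + n * np.sum(w * np.abs(b))
    return minimize(objective, pilot, method="Powell").x

# Tiny assumed example with heavy-tailed errors.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
beta0 = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
y = X @ beta0 + rng.standard_cauchy(100)
print(np.round(penalized_lad(X, y, lam=0.5), 2))
```

In practice the penalized LAD problem can also be written as a linear program and solved exactly; the general-purpose optimizer above only keeps the sketch short.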
2 Main results
For convenience of statement, we first give some notation. Let $\beta_0$ be the true parameter. Without loss of generality, we assume that the first $k_n$ coefficients are nonzero and that the remaining $m_n=p_n-k_n$ covariates have zero coefficients, and we write $\beta_0=(\beta_{10}^{T},\beta_{20}^{T})^{T}$ with $\beta_{20}=0$; any estimator $\hat{\beta}_n=(\hat{\beta}_{n1}^{T},\hat{\beta}_{n2}^{T})^{T}$ and the design matrix $X=(X_1,X_2)$ are partitioned correspondingly. For a given symmetric matrix $A$, denote by $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ the minimum and maximum eigenvalues of $A$, respectively. Next, we state some assumptions which will be needed in the following results.
The function $\rho$ is convex on $\mathbb{R}$, and its left derivative $\psi_{-}$ and right derivative $\psi_{+}$ satisfy $\psi_{-}(t)\le\psi(t)\le\psi_{+}(t)$ for every $t\in\mathbb{R}$.
The error terms $e_1,\dots,e_n$ are i.i.d., and the distribution function $F$ of $e_1$ puts no mass on the set of discontinuity points of $\psi$. Moreover, $E\psi(e_1)=0$ and $0<E\psi^{2}(e_1)<\infty$. Besides these, we assume that the function $G(t)=E\psi(e_1+t)$ has a strictly positive derivative at $t=0$.
There exist constants $0<b_1\le b_2<\infty$ such that $b_1\le\lambda_{\min}\bigl(\tfrac{1}{n}X^{T}X\bigr)\le\lambda_{\max}\bigl(\tfrac{1}{n}X^{T}X\bigr)\le b_2$, and the same bounds hold for $\tfrac{1}{n}X_1^{T}X_1$.
, .
Let $x_{1i}^{T}$ be the transpose of the $i$th row vector of $X_1$, such that
$$\max_{1\le i\le n}x_{1i}^{T}\bigl(X_1^{T}X_1\bigr)^{-1}x_{1i}\to 0 .$$
It is worth mentioning that the first two conditions are classical assumptions for M-estimation in the linear model, which can be found in many references, for example Bai, Rao and Wu (1992) [11] and Wu (2007) [12]. The condition on the design matrix is frequently used for sparse models in linear regression theory; it requires that the eigenvalues of the matrices $\tfrac{1}{n}X^{T}X$ and $\tfrac{1}{n}X_1^{T}X_1$ are bounded. The condition on the dimension is weaker than those in previous references: we relax the order allowed for $p_n$, whereas Huber (1973), Li, Peng and Zhu (2011) [9], Portnoy (1984), and Mammen (1989) each required stronger growth restrictions on $p_n$ relative to $n$. Compared with these results, our sparsity condition is much weaker. The last condition is the same as that in Huang, Horowitz and Ma (2008) [13], and it is used to prove the asymptotic properties of the nonzero part of the M-estimator.
Theorem 2.1 (Consistency of the estimator) If the above conditions hold, then there exists a non-concave penalized M-estimator $\hat{\beta}_n$ such that $\|\hat{\beta}_n-\beta_0\|=O_P\bigl(\sqrt{p_n/n}\bigr)$.
Remark 2.1 From Theorem 2.1 we obtain that, if we choose the tuning parameter $\lambda_n$ appropriately, there exists a global M-estimator that is $\sqrt{n/p_n}$-consistent. This convergence rate is the same as in Huber (1973) and Li, Peng and Zhu (2011).
Theorem 2.2 (Sparsity of the model) If the above conditions hold and $\lambda_n$ is chosen suitably, then for the non-concave penalized M-estimator $\hat{\beta}_n=(\hat{\beta}_{n1}^{T},\hat{\beta}_{n2}^{T})^{T}$ we have $P(\hat{\beta}_{n2}=0)\to 1$.
Remark 2.2
By Theorem 2.2, we get that under suitable conditions the components of the global M-estimator corresponding to the zero coefficients equal zero with high probability when $n$ is large enough. This also shows that the fitted model is sparse.

Theorem 2.3 (Oracle property) If the above conditions hold and $\lambda_n$ is chosen suitably, then with probability converging to one the non-concave penalized M-estimator has the following properties:
(1) (Consistency of model selection) $\hat{\beta}_{n2}=0$;
(2) (Asymptotic normality) the estimator $\hat{\beta}_{n1}$ of the nonzero coefficients is asymptotically normal,
where $d_n$ is any $k_n$-dimensional vector such that $\|d_n\|=1$, and $x_{1i}^{T}$ is the transpose of the $i$th row vector of the matrix $X_1$.
Remark 2.3 From Theorem 2.3, the M-estimator enjoys the oracle property; that is, the penalized M-estimator correctly selects the covariates with nonzero coefficients with probability converging to one, and the estimator of the nonzero coefficients has the same asymptotic distribution that it would have if the zero coefficients were known in advance.
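Schematically, writing $\hat{\beta}_1^{\mathrm{oracle}}$ for the M-estimator computed as if the true submodel $\{j:\beta_{0j}\ne 0\}$ were known in advance (this is a standard paraphrase of the oracle property, not the exact normalization used in Theorem 2.3), the two conclusions can be summarized as
$$P\bigl(\hat{\beta}_{n2}=0\bigr)\to 1,\qquad \sqrt{n}\,d_n^{T}\bigl(\hat{\beta}_{n1}-\beta_{10}\bigr)\ \text{and}\ \sqrt{n}\,d_n^{T}\bigl(\hat{\beta}_1^{\mathrm{oracle}}-\beta_{10}\bigr)\ \text{have the same limiting normal distribution}$$
for any $k_n$-dimensional vector $d_n$ with $\|d_n\|=1$.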
Remark 2.4 In Fan and Peng (2004) [14], the authors showed that the non-concave penalized estimator is consistent under the condition $p_n^{4}/n\to 0$, and enjoys asymptotic normality under the condition $p_n^{5}/n\to 0$. By Theorems 2.1-2.3, we can see that the corresponding conditions imposed here are considerably weaker.
3 Proofs of main results
The proof of Theorem 2.1: Let $\beta=\beta_0+\alpha_n u$, where $\alpha_n=\sqrt{p_n/n}$ and $u$ is any $p_n$-dimensional vector such that $\|u\|=C$. In the following part we only need to prove that, for any $\epsilon>0$, there exists a large enough positive constant $C$ such that
$$P\Bigl\{\inf_{\|u\|=C}Q_n(\beta_0+\alpha_n u)>Q_n(\beta_0)\Bigr\}\ge 1-\epsilon,$$
that is, there exists at least one local minimizer $\hat{\beta}_n$ such that $\|\hat{\beta}_n-\beta_0\|=O_P(\alpha_n)$ in the closed ball $\{\beta:\|\beta-\beta_0\|\le\alpha_n C\}$. Firstly, by the triangle inequality we can get that
(3.2)
where , . Noticing that
(3.3)
where , ,
combining this with the von Bahr-Esseen inequality, we instantly have
hence
Secondly for , let , where , so
We can easily obtain the corresponding bound. From the von Bahr-Esseen inequality, the Cauchy-Schwarz inequality and the conditions above, it follows that
which together with the Markov inequality yields that
hence
As for ,
Finally considering , we can easily obtain
This together with (3.3)-(3.7) yields that we can choose a large enough constant $C$ such that the remaining terms are dominated, which implies that there exists at least one local minimizer $\hat{\beta}_n$ such that $\|\hat{\beta}_n-\beta_0\|=O_P(\alpha_n)$ in the closed ball $\{\beta:\|\beta-\beta_0\|\le\alpha_n C\}$.
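For the reader's convenience, we recall the moment inequality invoked above; this is the standard statement from the literature rather than a result of this paper. The von Bahr-Esseen inequality states that for independent random variables $Z_1,\dots,Z_n$ with $EZ_i=0$ and any $1\le r\le 2$,
$$E\Bigl|\sum_{i=1}^{n}Z_i\Bigr|^{r}\le 2\sum_{i=1}^{n}E|Z_i|^{r}.$$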
The proof of Theorem 2.2: From Theorem 2.1, as long as we choose a large enough constant $C$ and an appropriate $\lambda_n$, the estimator will lie in the ball $\{\beta:\|\beta-\beta_0\|\le\alpha_n C\}$ with probability converging to one, where $\alpha_n=\sqrt{p_n/n}$. For any $p_n$-dimensional vector $\beta$ in this ball, we write $\beta=(\beta_1^{T},\beta_2^{T})^{T}$ according to the partition introduced in Section 2. Meanwhile let
then by minimizing it we can obtain the corresponding estimator. In the following part, we will prove that, as long as the stated condition on $\lambda_n$ holds, the desired inequality is valid for any $m_n$-dimensional vector $\beta_2$ in the ball. We can easily find that
where the two components are $k_n$- and $m_n$-dimensional vectors, respectively. Similarly to the proof of Theorem 2.1, we get that
and
By formulas (3.10)-(3.12) and the condition imposed above, it follows that
which yields that, as long as the stated condition on $\lambda_n$ holds, the claimed inequality is valid for any $m_n$-dimensional vector $\beta_2$ in the ball.
The proof of Theorem 2.3: It is obvious that conclusion (1) can be obtained instantly from Theorem 2.2, so we only need to prove conclusion (2). It follows from Theorem 2.1 that $\hat{\beta}_{n1}$ is a consistent estimator of $\beta_{10}$, and from Theorem 2.2 that $\hat{\beta}_{n2}=0$ holds with probability converging to one. Therefore it holds that
that is
where
In the following part we give the Taylor expansion of the first term above:
Noticing that , we have
which yields that
Then as long as