In recent years, there has been a growing interest in statistical estimation of non-smooth functionals [1, 6, 13, 14, 7, 8, 2, 5]. Some of these papers deal with the normal means model [1, 2] addressing the problems of estimation of the -norm and of the sparsity index, respectively. In the present paper, we analyze a family of non-smooth functionals including, in particular, the -norm. We establish non-asymptotic minimax optimal rates of estimation on the classes of sparse vectors and we construct estimators achieving these rates.
Assume that we observe
where is an unknown vector of parameters, is a known noise level, and are i.i.d. standard normal random variables. We consider the problem of estimating the functionals
assuming that the vector is -sparse, that is, belongs to the class
Here, denotes the number of nonzero components of and . We measure the accuracy of an estimator of by the maximal quadratic risk over :
Here and in the sequel, we denote by
the expectation with respect to the joint distribution of the observations satisfying (1).
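To fix ideas, the observation model and the plug-in difficulty can be illustrated numerically. The sketch below is illustrative only: it takes the ℓ1-norm N(θ) = Σ_i |θ_i| as a representative member of the family of functionals (an assumption on our part, since the paper treats a general family), and the names `simulate`, `functional` and `plugin_risk` are ours.

```python
import random

def simulate(theta, eps, rng):
    """Draw y_i = theta_i + eps * xi_i with xi_i i.i.d. standard normal, as in (1)."""
    return [t + eps * rng.gauss(0.0, 1.0) for t in theta]

def functional(theta):
    """Representative functional: the l1-norm N(theta) = sum_i |theta_i| (our choice)."""
    return sum(abs(t) for t in theta)

def plugin_risk(theta, eps, n_mc=500, seed=0):
    """Monte Carlo quadratic risk E (N(y) - N(theta))^2 of the naive plug-in N(y)."""
    rng = random.Random(seed)
    target = functional(theta)
    total = 0.0
    for _ in range(n_mc):
        total += (functional(simulate(theta, eps, rng)) - target) ** 2
    return total / n_mc
```

For an s-sparse vector with s much smaller than the dimension, the naive plug-in accumulates a bias of the order of the dimension times the noise level from the pure-noise coordinates, which is what motivates the thresholding and polynomial-approximation constructions of Section 2.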
In this paper, for all we propose rate optimal estimators in a non-asymptotic minimax sense, that is, estimators such that
where denotes the infimum over all estimators and, for two quantities and possibly depending on , we write if there exist positive constants that may depend only on such that . We also establish the following explicit non-asymptotic characterization of the minimax risk :
Note that the rate on the right hand side of (2) is an increasing function of , which is slightly greater than for much smaller than , equal to for , and slightly smaller than for much greater than .
In the case , , the same minimax risk was studied in Cai and Low, where it was proved that
and also claimed that for with , which agrees with (2).
We see from (2) that, for the general sparsity classes and any , there exist two different regimes with an elbow at . We call them the sparse zone and the dense zone. The estimation methods for these two regimes are quite different. In the sparse zone, where is smaller than , we show that one can use suitably adjusted thresholding to achieve optimality. In this zone, rate optimal estimators can be obtained based on the techniques developed in  to construct minimax optimal estimators of linear and quadratic functionals. In the dense zone, where is greater than , we use another approach. We follow the general scheme of estimation of non-smooth functionals from  and our construction is especially close in spirit to . Specifically, we consider the best polynomial approximation of the function
in a neighborhood of the origin and plug in unbiased estimators of the coefficients of this polynomial. Outside of this neighborhood, for the observations such that is, roughly speaking, greater than the "noise level" of order , we use as an estimator of . The main difference from the estimator suggested in  for lies in the fact that, for the polynomial approximation part, we need to introduce a block structure with exponentially increasing blocks and carefully chosen thresholds depending on . This is needed to achieve optimal bounds for all in the dense zone, and not only for (or comfortably greater than ).
This paper is organized as follows. In Section 2, we introduce the estimators and state the upper bounds for their risks. Section 3 provides the matching lower bounds. The rest of the paper is devoted to the proofs. In particular, some useful results from approximation theory are collected in Section 6.
2. Definition of estimators and upper bounds for their risks
In this section, we propose two different estimators, for the dense and sparse regimes defined by the inequalities and , respectively. Recall that, in the Introduction, we used the inequalities and , respectively, to define the two regimes. The factor 4 that we introduce in the definition here is a matter of convenience for the proofs. We note that such a change does not influence the final result since the optimal rate (cf. (2)) is the same, up to a constant, for all such that .
2.1. Dense zone:
For any positive integer , we denote by the best approximation of by polynomials of degree at most on the interval , that is
where is the class of all real polynomials of degree at most . Since is an even function, it suffices to consider approximation by polynomials of even degree. The quality of the best polynomial approximation of is described by Lemma 7 below.
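Computing the exact best (minimax) polynomial approximation requires a Remez-type algorithm. As a numerical proxy, the sketch below uses interpolation at Chebyshev nodes, which is near-minimax, to approximate the even function |x| on [-1, 1]. The helper names are ours, and this only illustrates the approximation-theoretic ingredient, not the paper's exact construction.

```python
import math

def cheb_interp_coeffs(f, n):
    """Chebyshev coefficients of the degree-(n-1) interpolant of f at the n
    Chebyshev nodes on [-1, 1] (a near-minimax proxy for best approximation)."""
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    vals = [f(x) for x in nodes]
    coeffs = []
    for j in range(n):
        c = 2.0 / n * sum(vals[k] * math.cos(math.pi * j * (k + 0.5) / n)
                          for k in range(n))
        coeffs.append(c)
    coeffs[0] /= 2.0
    return coeffs

def cheb_eval(coeffs, x):
    """Evaluate sum_j coeffs[j] * T_j(x) by the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + coeffs[0]

def sup_error(f, coeffs, grid=1001):
    """Sup-norm error of the polynomial against f on a uniform grid of [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (grid - 1) for i in range(grid)]
    return max(abs(f(x) - cheb_eval(coeffs, x)) for x in xs)
```

For f(x) = |x|, the sup-norm error decays at the rate predicted by approximation theory (of order 1/degree, consistent with Lemma 7 below for the best approximation), and since |x| is even, the odd coefficients vanish, matching the remark that even degrees suffice.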
We denote by the coefficients of the canonical representation of :
and by the th Hermite polynomial
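Assuming the probabilists' normalization He_0 = 1, He_1(x) = x, He_{k+1}(x) = x He_k(x) - k He_{k-1}(x) (an assumption about the convention, since the formula is not reproduced here), Hermite polynomials are relevant because E[He_k(θ + ξ)] = θ^k for ξ ~ N(0, 1): they yield unbiased estimators of monomials, which is what makes plugging in estimators of the polynomial coefficients possible. A minimal sketch, with the noise level normalized to 1:

```python
import random

def hermite(k, x):
    """Probabilists' Hermite polynomial He_k(x) via the three-term recurrence
    He_0 = 1, He_1(x) = x, He_{k+1}(x) = x*He_k(x) - k*He_{k-1}(x)."""
    h0, h1 = 1.0, x
    if k == 0:
        return h0
    for j in range(1, k):
        h0, h1 = h1, x * h1 - j * h0
    return h1

def mc_mean(k, theta, n=50000, seed=1):
    """Monte Carlo check of the unbiasedness E[He_k(theta + xi)] = theta^k, xi ~ N(0,1)."""
    rng = random.Random(seed)
    return sum(hermite(k, theta + rng.gauss(0.0, 1.0)) for _ in range(n)) / n
```

For instance, mc_mean(2, 0.5) is close to 0.5^2 = 0.25.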
To construct the estimator in the dense zone, we use the sample duplication device, i.e., we transform into randomized observations as follows. Let be i.i.d. random variables such that and are independent of . Set
Then, , for , where , and the random variables are mutually independent.
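The construction above is the standard Gaussian sample-duplication trick: from y = θ + εξ and an independent z ~ N(0, 1), the pair y1 = y + εz, y2 = y - εz consists of independent N(θ, 2ε²) variables, since Cov(y1, y2) = ε² - ε² = 0. A small simulation confirming this (the function name is ours):

```python
import random

def duplicate(theta, eps, n, seed=0):
    """From y = theta + eps*xi, build y1 = y + eps*z and y2 = y - eps*z with an
    independent z ~ N(0, 1); then y1 and y2 are independent N(theta, 2*eps^2)."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        y = theta + eps * rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        y1.append(y + eps * z)
        y2.append(y - eps * z)
    return y1, y2
```

Empirically, each coordinate has mean θ, variance 2ε², and the empirical covariance between the two copies is negligible, which is what allows the two parts of the estimator to be built from (conditionally) independent samples.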
Define the estimator of as follows:
Here and in what follows,  denotes the indicator function, and  is a constant that will be chosen small enough (see the proof of Theorem 1 below).
We will show that the estimator is optimal in a non-asymptotic minimax sense on the class in the dense zone. The next theorem provides an upper bound on the risk of in this zone.
Let the integers and be such that and let . Then the estimator defined in (3) satisfies
where is a constant depending only on .
2.2. Sparse zone:
If belongs to the sparse zone we do not invoke the sample duplication and we use the estimator
The next theorem establishes an upper bound on the risk of this estimator.
Let the integers and be such that and . Then the estimator defined in (5) satisfies
where is a constant depending only on .
Note that, intuitively, the optimal estimator in the sparse zone can be viewed as an instance of the following routine developed in . We start from the optimal estimator in the case and threshold every term. Then, we center every term by its mean under the assumption that there is no signal. Finally, we choose a threshold that makes the best compromise between the type I and type II errors in the support estimation problem. The only subtle point in applying this argument in the present context is that we drop the polynomial part, which would almost always be removed by thresholding. Indeed, the polynomial approximation is only useful in a neighborhood of , but in the sparse zone we forgo estimating small values of .
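A schematic version of this routine can be sketched as follows, again for the representative functional N(θ) = Σ_i |θ_i| (our choice); the threshold of order ε sqrt(log(d/s²)) and the omission of the centering terms are our simplifications, not the exact estimator (5):

```python
import math
import random

def plugin(y):
    """Naive plug-in estimator of N(theta) = sum_i |theta_i| (our representative choice)."""
    return sum(abs(v) for v in y)

def thresholded(y, eps, d, s):
    """Schematic sparse-zone estimator: keep only coordinates above a threshold
    of order eps*sqrt(log(d/s^2)); the exact threshold and the centering terms
    of the paper's estimator (5) are not reproduced here."""
    t = eps * math.sqrt(2.0 * math.log(max(math.e, d / s ** 2)))
    return sum(abs(v) for v in y if abs(v) > t)

def mc_risk(est, theta, eps, n_mc=200, seed=0):
    """Monte Carlo quadratic risk of an estimator est(y) of N(theta)."""
    rng = random.Random(seed)
    target = sum(abs(t) for t in theta)
    total = 0.0
    for _ in range(n_mc):
        y = [t + eps * rng.gauss(0.0, 1.0) for t in theta]
        total += (est(y) - target) ** 2
    return total / n_mc
```

On a sparse vector with a few large coordinates, the thresholded version avoids the large bias that the plug-in accumulates over the pure-noise coordinates, illustrating the type I / type II trade-off described above.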
3. Lower bounds
We denote by the set of all monotone non-decreasing functions such that and .
Let  be integers such that  and let  be any loss function in the class . There exist positive constants  and  depending only on  and  such that
where denotes the infimum over all estimators.
The proof follows the lines of the proof of the lower bound in [3, Theorem 1], with the only difference that  should be replaced by . Note that, although Theorem 3 is valid for all , the bound becomes suboptimal in the dense zone.
Let be integers such that and let be any loss function in the class . There exist positive constants and depending only on and and a constant depending only on such that, if , then
where denotes the infimum over all estimators.
4. Proofs of the upper bounds
Throughout the proofs, we denote by positive constants that can depend only on and may take different values on different appearances.
4.1. Proof of Theorem 1
Denote by  the support of . We start with a bias-variance decomposition
leading to the bound
where is the bias of as an estimator of and is its variance. We now bound separately the four terms in (6).
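For a generic estimator T̂ of N(θ) (placeholder symbols, since the paper's notation is not reproduced here), the decomposition invoked above is the standard identity:

```latex
\mathbf{E}_\theta\bigl(\hat T - N(\theta)\bigr)^2
  \;=\; \underbrace{\bigl(\mathbf{E}_\theta \hat T - N(\theta)\bigr)^2}_{\text{squared bias}}
  \;+\; \underbrace{\mathbf{E}_\theta\bigl(\hat T - \mathbf{E}_\theta \hat T\bigr)^2}_{\text{variance}}
```

Applied termwise to the estimator, it splits the risk into the four terms bounded separately below.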
Bias for . If , then using Lemma 2 we obtain
The last exponential is smaller than by the definition of , so that
Variance for . If , then
if is chosen such that . Here, we use the assumption . For , we use Lemma 3 to obtain
if we choose  such that . In conclusion, under this choice of , using the facts that  and , we get
Bias for . If , the bias has the form
where . We will analyze this expression separately in three different ranges of values of .
Case . In this case, we use the bound
where . Since for all , we can use Lemma 4 to obtain
In addition, using Lemma 1 we get
where and we have used the inequalities and . It follows that
Case . Let be the integer such that . We have
where . Analogously to (10) we find
Next, Lemma 1 and the fact that imply
Finally, we consider the first sum on the right hand side of (12). Notice that
since for . Using these inequalities and Lemma 5 we get
Choose such that . As , this yields
where we have used that . Since , this also implies that (14) does not exceed
Combining the above arguments yields
Case . Recall that the bias has the form
where . Using Lemma 5 we get
and the last upper bound is smaller than if is small enough. On the other hand, it follows from (13) that . Thus,
Finally, we get
Variance for . We consider the same three cases as in item above. For the first two cases, it suffices to use a coarse bound granting that, for all ,
Case . In this case, we deduce from (17) that
where . Lemma 4 and the fact that imply
Hence, if is small enough, we conclude that
Case . As in item above, we denote by the integer such that . We deduce from (17) that
where . The last two terms on the right hand side are controlled as in item . For the first term, we find using Lemma 5 that, for ,
Choosing small enough allows us to obtain the desired bound
Case . We first note that