On estimation of nonsmooth functionals of sparse normal means

by Olivier Collier et al.

We study the problem of estimation of the value N_gamma(θ) = sum(i=1)^d |θ_i|^gamma for 0 < gamma <= 1 based on the observations y_i = θ_i + ϵξ_i, i = 1,...,d, where θ = (θ_1,...,θ_d) are unknown parameters, ϵ>0 is known, and ξ_i are i.i.d. standard normal random variables. We find the non-asymptotic minimax rate of estimation on the class B_0(s) of s-sparse vectors and we propose estimators achieving this rate.





1. Introduction

In recent years, there has been a growing interest in statistical estimation of non-smooth functionals [1, 6, 13, 14, 7, 8, 2, 5]. Some of these papers deal with the normal means model [1, 2], addressing the problems of estimation of the ℓ_1-norm and of the sparsity index, respectively. In the present paper, we analyze a family of non-smooth functionals including, in particular, the ℓ_1-norm. We establish non-asymptotic minimax optimal rates of estimation on the classes of sparse vectors and we construct estimators achieving these rates.

Assume that we observe

y_i = θ_i + ϵξ_i,  i = 1,...,d,  (1)

where θ = (θ_1,...,θ_d) is an unknown vector of parameters, ϵ > 0 is a known noise level, and ξ_1,...,ξ_d are i.i.d. standard normal random variables. We consider the problem of estimating the functionals

N_γ(θ) = sum_{i=1}^d |θ_i|^γ,  0 < γ ≤ 1,

assuming that the vector θ is s-sparse, that is, belongs to the class

B_0(s) = {θ ∈ R^d : ||θ||_0 ≤ s}.

Here, ||θ||_0 denotes the number of nonzero components of θ and s ∈ {1,...,d}. We measure the accuracy of an estimator T̂ of N_γ(θ) by the maximal quadratic risk over B_0(s):

sup_{θ ∈ B_0(s)} E_θ (T̂ − N_γ(θ))².

Here and in the sequel, we denote by E_θ the expectation with respect to the joint distribution P_θ of (y_1,...,y_d) satisfying (1).
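As a quick illustration of model (1) and of why estimating N_γ(θ) is delicate, the following self-contained Python sketch (all names and sizes are our own choices, not from the paper) simulates the observations and shows that the naive plug-in estimator sum_i |y_i|^γ is dominated by the noise contribution of the d − s zero coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (our own choices, not from the paper).
d, s, eps, gamma = 10_000, 20, 1.0, 0.5

theta = np.zeros(d)          # an s-sparse mean vector
theta[:s] = 5.0

y = theta + eps * rng.standard_normal(d)   # observations, model (1)

true_value = np.sum(np.abs(theta) ** gamma)   # N_gamma(theta)
naive = np.sum(np.abs(y) ** gamma)            # naive plug-in estimator

# Each of the d - s zero coordinates contributes about E|eps*xi|^gamma > 0,
# so the plug-in estimator overshoots the truth by a huge margin.
print(true_value, naive)
```

This upward bias, of order d·E|ϵξ|^γ, is exactly what the thresholding (sparse zone) and polynomial approximation (dense zone) constructions described below are designed to remove.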

In this paper, for all s we propose rate optimal estimators in a non-asymptotic minimax sense, that is, estimators T̂ such that

sup_{θ ∈ B_0(s)} E_θ (T̂ − N_γ(θ))² ≍_γ inf_{T̃} sup_{θ ∈ B_0(s)} E_θ (T̃ − N_γ(θ))²,

where inf_{T̃} denotes the infimum over all estimators and, for two quantities a and b possibly depending on (γ, s, d, ϵ), we write a ≍_γ b if there exist positive constants c and C that may depend only on γ such that cb ≤ a ≤ Cb. We also establish the following explicit non-asymptotic characterization of the minimax risk:

inf_{T̃} sup_{θ ∈ B_0(s)} E_θ (T̃ − N_γ(θ))² ≍_γ s²ϵ^{2γ} [log(1 + d/s²)]^γ if s ≤ √d,  and  ≍_γ s²ϵ^{2γ} [log(1 + s²/d)]^{−γ} if s > √d.  (2)

Note that the rate on the right hand side of (2) is an increasing function of s, which is slightly (logarithmically) greater than s²ϵ^{2γ} for s much smaller than √d, equal to s²ϵ^{2γ} up to a constant for s of order √d, and slightly smaller than s²ϵ^{2γ} for s much greater than √d.

In the case γ = 1, s = d, the same minimax risk was studied in Cai and Low [1], where it was proved that it is of order d²ϵ²/log d, and it was also claimed that the rate s²ϵ² log d holds for s = d^a with a < 1/2, which agrees with (2).

We see from (2) that, for the general sparsity classes B_0(s) and any 0 < γ ≤ 1, there exist two different regimes with an elbow at s = √d. We call them the sparse zone and the dense zone. The estimation methods for these two regimes are quite different. In the sparse zone, where s is smaller than √d, we show that one can use suitably adjusted thresholding to achieve optimality. In this zone, rate optimal estimators can be obtained based on the techniques developed in [3] to construct minimax optimal estimators of linear and quadratic functionals. In the dense zone, where s is greater than √d, we use another approach. We follow the general scheme of estimation of non-smooth functionals from [9] and our construction is especially close in spirit to [1]. Specifically, we consider the best polynomial approximation of the function

x ↦ |x|^γ

in a neighborhood of the origin and plug in unbiased estimators of the coefficients of this polynomial. Outside of this neighborhood, that is, for observations y_i

such that |y_i| is, roughly speaking, greater than the "noise level", which is of order ϵ up to a logarithmic factor, we use |y_i|^γ as an estimator of |θ_i|^γ. The main difference from the estimator suggested in [1] for γ = 1 lies in the fact that, for the polynomial approximation part, we need to introduce a block structure with exponentially increasing blocks and carefully chosen thresholds depending on s. This is needed to achieve optimal bounds for all s in the dense zone and not only for s = d (or s comfortably greater than √d).

This paper is organized as follows. In Section 2, we introduce the estimators and state the upper bounds for their risks. Section 3 provides the matching lower bounds. The rest of the paper is devoted to the proofs. In particular, some useful results from approximation theory are collected in Section 6.

2. Definition of estimators and upper bounds for their risks

In this section, we propose two different estimators, for the dense and sparse regimes defined by the inequalities s ≥ 4√d and s < 4√d, respectively. Recall that, in the Introduction, we used the inequalities s ≥ √d and s < √d, respectively, to define the two regimes. The factor 4 that we introduce in the definition here is a matter of convenience for the proofs. We note that such a change does not influence the final result since the optimal rate (cf. (2)) is the same, up to a constant, for all s such that √d ≤ s ≤ 4√d.

2.1. Dense zone: s ≥ 4√d

For any positive integer K, we denote by P_K the best approximation of the function x ↦ |x|^γ by polynomials of degree at most K on the interval [−1, 1], that is,

max_{x ∈ [−1,1]} | |x|^γ − P_K(x) | = min_{P ∈ 𝒫_K} max_{x ∈ [−1,1]} | |x|^γ − P(x) |,

where 𝒫_K is the class of all real polynomials of degree at most K. Since |x|^γ is an even function, it suffices to consider approximation by polynomials of even degree. The quality of the best polynomial approximation of |x|^γ is described by Lemma 7 below.
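The minimax polynomial P_K is not needed here in closed form. As a rough numerical illustration (our own, with Chebyshev interpolation standing in for the exact best approximation, to which it is comparable up to a logarithmic factor), one can watch the sup-norm error of a degree-K fit to |x|^γ shrink as K grows:

```python
import numpy as np

gamma = 0.5
f = lambda x: np.abs(x) ** gamma

def cheb_approx_error(deg, n_grid=10_001):
    # Interpolate f at the deg+1 Chebyshev nodes of [-1, 1]; this is a
    # near-best polynomial approximation (close to the minimax P_K).
    k = np.arange(deg + 1)
    nodes = np.cos((2 * k + 1) * np.pi / (2 * (deg + 1)))
    coefs = np.polynomial.chebyshev.chebfit(nodes, f(nodes), deg)
    grid = np.linspace(-1.0, 1.0, n_grid)
    return np.max(np.abs(f(grid) - np.polynomial.chebyshev.chebval(grid, coefs)))

# The sup-norm error decreases as the degree grows; for |x|^gamma the best
# approximation error is known to decay polynomially in 1/K.
print(cheb_approx_error(4), cheb_approx_error(64))
```

The slow (polynomial rather than exponential) decay of the error reflects the non-smoothness of |x|^γ at the origin, which is the source of the logarithmic factors in the rate.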

We denote by a_0, ..., a_K the coefficients of the canonical representation of P_K:

P_K(x) = sum_{k=0}^K a_k x^k,

and by H_k the kth Hermite polynomial:

H_k(x) = (−1)^k e^{x²/2} (d^k/dx^k) e^{−x²/2}.
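The following sketch (our own illustration; `he` is a hypothetical helper) evaluates the Hermite polynomials in the probabilists' convention above and checks, by Gauss–Hermite quadrature, the classical identity E[H_k(θ + ξ)] = θ^k for ξ ~ N(0,1), which is what makes exactly unbiased estimation of the monomial terms of P_K possible:

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def he(k, x):
    """Evaluate the k-th probabilists' Hermite polynomial H_k at x."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return He.hermeval(x, c)

# H_0 = 1, H_1 = x, H_2 = x^2 - 1, H_3 = x^3 - 3x, ...
assert np.isclose(he(3, 2.0), 2.0 ** 3 - 3 * 2.0)

# If y ~ N(theta, 1), then E[H_k(y)] = theta^k: Hermite polynomials give
# exactly unbiased estimators of the monomials.  Checked here with
# Gauss-HermiteE quadrature (weight function exp(-x^2/2)).
nodes, weights = He.hermegauss(60)
theta, k = 0.7, 5
expectation = np.sum(weights * he(k, theta + nodes)) / np.sqrt(2.0 * np.pi)
print(expectation, theta ** k)   # both equal 0.7^5 = 0.16807
```

Replacing θ by θ/ϵ and rescaling gives unbiased estimators of θ_i^k from a single observation y_i, which is how the coefficients a_k of P_K are turned into an estimator.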
To construct the estimator in the dense zone, we use the sample duplication device, i.e., we transform y_1, ..., y_d into randomized observations as follows. Let z_1, ..., z_d be i.i.d. standard normal random variables such that (z_1, ..., z_d) is independent of (y_1, ..., y_d). Set

y_{i,1} = y_i + ϵ z_i,   y_{i,2} = y_i − ϵ z_i,   i = 1, ..., d.

Then, y_{i,j} = θ_i + √2 ϵ ξ_{i,j}, for j = 1, 2, where the ξ_{i,j} are standard normal and the random variables (ξ_{i,j}, i = 1, ..., d, j = 1, 2) are mutually independent.
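In code, the duplication step reads as follows (a minimal sketch with our own variable names; the √2 inflation of the noise level is visible in the empirical variance):

```python
import numpy as np

rng = np.random.default_rng(1)

d, eps = 100_000, 0.3
theta = rng.normal(size=d)                  # arbitrary mean vector
y = theta + eps * rng.standard_normal(d)    # observations, model (1)

# Duplication: fresh noise z, independent of y.
z = rng.standard_normal(d)
y1 = y + eps * z
y2 = y - eps * z

# The two copies average back to the original observation ...
assert np.allclose((y1 + y2) / 2.0, y)

# ... while their noises are uncorrelated Gaussians, hence independent,
# and each copy has the inflated variance 2 * eps^2.
corr = np.corrcoef(y1 - theta, y2 - theta)[0, 1]
print(abs(corr), np.var(y1 - theta), 2 * eps ** 2)
```

The point of the device is to decouple the two roles one sample would otherwise play (selecting a regime and estimating within it) at the price of a constant-factor loss in the noise level.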

Define the estimator N̂ of N_γ(θ) as follows:





Here and in what follows, 1{·} denotes the indicator function, and c_0 > 0 is a constant that will be chosen small enough (see the proof of Theorem 1 below).

We will show that the estimator N̂ is optimal in a non-asymptotic minimax sense on the class B_0(s) in the dense zone. The next theorem provides an upper bound on the risk of N̂ in this zone.

Theorem 1.

Let the integers s and d be such that s ≥ 4√d, and let 0 < γ ≤ 1. Then the estimator N̂ defined in (3) satisfies

sup_{θ ∈ B_0(s)} E_θ (N̂ − N_γ(θ))² ≤ C s²ϵ^{2γ} [log(1 + s²/d)]^{−γ},

where C > 0 is a constant depending only on γ.

2.2. Sparse zone: s < 4√d

If s belongs to the sparse zone, we do not invoke the sample duplication and we use the estimator



The next theorem establishes an upper bound on the risk of this estimator.

Theorem 2.

Let the integers s and d be such that 1 ≤ s ≤ d and s < 4√d. Then the estimator Ñ defined in (5) satisfies

sup_{θ ∈ B_0(s)} E_θ (Ñ − N_γ(θ))² ≤ C s²ϵ^{2γ} [log(1 + d/s²)]^γ,

where C > 0 is a constant depending only on γ.

Note that, intuitively, the optimal estimator in the sparse zone can be viewed as an example of applying the following routine developed in [3]. We start from the optimal estimator in the case s = d and we threshold every term. Then, we center every term by its mean under the assumption that there is no signal. Finally, we choose a threshold that makes the best compromise between the errors of the first and second type in the support estimation problem. The only subtle ingredient in applying this argument in the present context is that we drop the polynomial part, which would almost always be removed by thresholding. In fact, one can notice that the polynomial approximation is only useful in a neighborhood of the origin, but in the sparse zone we forgo estimating small components of θ.
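The routine just described can be sketched numerically as follows (a toy implementation with hypothetical constants; the exact threshold and centering term of the estimator (5) differ):

```python
import numpy as np

rng = np.random.default_rng(2)

d, s, eps, gamma = 10_000, 20, 1.0, 0.5   # illustrative sizes, our own
theta = np.zeros(d)
theta[:s] = 8.0
y = theta + eps * rng.standard_normal(d)

# Hypothetical threshold of order eps * sqrt(log(1 + d/s^2)); the constants
# used in the paper differ.
tau = 2.0 * eps * np.sqrt(2.0 * np.log(1.0 + d / s ** 2))

# Centering term: E[|eps*xi|^gamma 1{|eps*xi| > tau}] under pure noise,
# computed here by Monte Carlo for simplicity.
noise = eps * rng.standard_normal(1_000_000)
center = np.mean(np.abs(noise) ** gamma * (np.abs(noise) > tau))

# Threshold every term, then center it by its pure-noise mean.
estimate = np.sum(np.abs(y) ** gamma * (np.abs(y) > tau) - center)

true_value = np.sum(np.abs(theta) ** gamma)
naive = np.sum(np.abs(y) ** gamma)
print(abs(estimate - true_value), abs(naive - true_value))
```

On this toy configuration the thresholded and centered estimator is far closer to N_γ(θ) than the naive plug-in, which carries the full noise bias of the d − s zero coordinates.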

3. Lower bounds

We denote by 𝒲 the set of all monotone non-decreasing functions ℓ: [0, ∞) → [0, ∞) such that ℓ(0) = 0 and ℓ ≢ 0.

Theorem 3.

Let s and d be integers such that 1 ≤ s ≤ d, and let ℓ(·) be any loss function in the class 𝒲. There exist positive constants c and B depending only on γ and ℓ(·) such that

inf_{T̃} sup_{θ ∈ B_0(s)} E_θ ℓ( c (s ϵ^γ [log(1 + d/s²)]^{γ/2})^{−1} |T̃ − N_γ(θ)| ) ≥ B,

where inf_{T̃} denotes the infimum over all estimators.

The proof follows the lines of the proof of the lower bound in [3, Theorem 1], with the only difference that the value a of the nonzero components of the test vectors should be replaced by a^γ. Note that though Theorem 3 is valid for all s, the bound becomes suboptimal in the dense zone s > √d.

Theorem 4.

Let s and d be integers such that s ≤ d, and let ℓ(·) be any loss function in the class 𝒲. There exist positive constants c and B depending only on γ and ℓ(·), and a constant C* > 0 depending only on γ, such that, if s ≥ C*√d, then

inf_{T̃} sup_{θ ∈ B_0(s)} E_θ ℓ( c (s ϵ^γ [log(1 + s²/d)]^{−γ/2})^{−1} |T̃ − N_γ(θ)| ) ≥ B,

where inf_{T̃} denotes the infimum over all estimators.

In the case of the quadratic loss ℓ(u) = u², combining these two theorems with the bounds of Theorems 1 and 2 immediately leads to relation (2).

4. Proofs of the upper bounds

Throughout the proofs, we denote by C (possibly with indices) positive constants that can depend only on γ and may take different values on different appearances.

4.1. Proof of Theorem 1

Denote by S the support of θ. We start with a bias–variance decomposition leading to the bound

E_θ (N̂ − N_γ(θ))² ≤ 2(sum_{i ∈ S} b_i)² + 2(sum_{i ∉ S} b_i)² + sum_{i ∈ S} v_i + sum_{i ∉ S} v_i,  (6)

where b_i is the bias of the ith term of N̂ as an estimator of |θ_i|^γ and v_i is its variance. We now bound separately the four terms in (6).

Bias for i ∉ S. If i ∉ S, then θ_i = 0 and, using Lemma 2, we obtain

By the definition of the threshold, the last exponential term is small enough, so that


Variance for i ∉ S. If i ∉ S, then


The last term in (8) is bounded from above as in the previous item. Next, in view of Lemma 3,

if c_0 is chosen small enough. Here, we use the assumption s ≥ 4√d. For the remaining term, we use Lemma 3 to obtain

if we choose c_0 small enough. In conclusion, under this choice of c_0, we get


Bias for i ∈ S. If i ∈ S, the bias has the form

We will analyze this expression separately in three different ranges of values of |θ_i|.

Case of small |θ_i|. In this case, we use the bound

Since the assumption of Lemma 4 is satisfied for all i in this range, we can use Lemma 4 to obtain


In addition, using Lemma 1 we get

where we have used the inequalities established above. It follows that


Case of intermediate |θ_i|. Let m be the integer such that |θ_i| belongs to the mth block. We have


Analogously to (10), we find

Next, Lemma 1 and the preceding bounds imply


Finally, we consider the first sum on the right hand side of (12). Notice that

since these inequalities hold in the present range of |θ_i|. Using them together with Lemma 5, we get

Choose c_0 small enough; this yields



where we have used the preceding inequalities. This also implies that (14) does not exceed

Combining the above arguments yields


Case of large |θ_i|. Recall that the bias has the form

Using Lemma 5, we get

and the last upper bound is small enough if c_0 is chosen small enough. On the other hand, the remaining term is bounded using (13). Thus,

Finally, we get


Variance for i ∈ S. We consider the same three cases as in the previous item. For the first two cases, it suffices to use a coarse bound ensuring that, for all i ∈ S,



Case of small |θ_i|. In this case, we deduce from (17) that

Lemma 4 then implies

Hence, if c_0 is small enough, we conclude that


Case of intermediate |θ_i|. As in the previous item, we denote by m the integer such that |θ_i| belongs to the mth block. We deduce from (17) that

The last two terms on the right hand side are controlled as in the previous case. For the first term, using Lemma 5 we find that


Choosing c_0 small enough allows us to obtain the desired bound


Case of large |θ_i|. We first note that