Ensemble minimaxity of James-Stein estimators

by Yuzo Maruyama, et al.
University of Pennsylvania

This article discusses estimation of a multivariate normal mean based on heteroscedastic observations. Under heteroscedasticity, estimators that shrink more on the coordinates with larger variances seem desirable. Although such James-Stein type estimators are not necessarily minimax in the ordinary sense, we show that they can be ensemble minimax, that is, minimax with respect to the ensemble risk, a notion related to the empirical Bayes perspective of Efron and Morris.





1 Introduction

Let X = (X_1, …, X_p)^T, where X_i ~ N(θ_i, d_i) independently for i = 1, …, p, and the variances d_1, …, d_p are known. Let us assume

d_1 ≤ d_2 ≤ ⋯ ≤ d_p.  (1.1)
We are interested in the estimation of θ = (θ_1, …, θ_p)^T with respect to the ordinary squared error loss function

L(θ, δ) = ‖δ − θ‖² = Σ_{i=1}^p (δ_i − θ_i)²,  (1.2)

where the risk of an estimator δ is R(θ, δ) = E_θ[L(θ, δ(X))]. The MLE X, with constant risk Σ_{i=1}^p d_i, is shown to be extended Bayes and hence minimax for any p and any d_1, …, d_p.

In the homoscedastic case d_1 = ⋯ = d_p = d, James and Stein (1961) showed that the shrinkage estimator

θ̂_JS = (1 − (p − 2) d/‖X‖²) X  (1.3)

dominates the MLE for p ≥ 3. There is some literature discussing the minimax properties of shrinkage estimators under heteroscedasticity. Brown (1975) showed that the James-Stein estimator (1.3) is not necessarily minimax when the variances are not equal; specifically, it fails to be minimax for any p when the largest variance is sufficiently large relative to the others. Berger (1976) showed that

δ^B(X) = X − (p − 2) D^{-1} X / (X^T D^{-2} X),  D = diag(d_1, …, d_p),  (1.4)
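The dominance of the James-Stein estimator over the MLE in the homoscedastic case is easy to verify by simulation. The following sketch (an illustration of the classical result, not code from this paper) compares Monte Carlo risks at θ = 0, where the improvement is largest:

```python
import numpy as np

# Homoscedastic case d_1 = ... = d_p = d: compare MLE vs James-Stein risk.
rng = np.random.default_rng(0)
p, d, n = 10, 1.0, 20000
theta = np.zeros(p)                # improvement over the MLE is largest at theta = 0

X = rng.normal(theta, np.sqrt(d), size=(n, p))
mle_loss = ((X - theta) ** 2).sum(axis=1)

# James-Stein shrinkage factor 1 - (p - 2) d / ||X||^2, applied per sample
shrink = 1.0 - (p - 2) * d / (X ** 2).sum(axis=1, keepdims=True)
js_loss = ((shrink * X - theta) ** 2).sum(axis=1)

mle_risk = mle_loss.mean()         # theoretical value: p * d = 10
js_risk = js_loss.mean()           # theoretical value at theta = 0: 2 * d = 2
print(f"MLE risk ~ {mle_risk:.2f}, James-Stein risk ~ {js_risk:.2f}")
```

At θ = 0 the exact risks are p·d for the MLE and 2d for James-Stein, which the simulation reproduces closely.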
is minimax for p ≥ 3 and any d_1, …, d_p. However, Casella (1980) argued that the James-Stein estimator (1.4) may not be desirable even if it is minimax. Ordinary minimax estimators, as in (1.4), typically shrink most on the coordinates with smaller variances. From Casella's (1980) viewpoint, one of the most natural James-Stein variants is

δ^C_i(X) = (1 − (p − 2) d_i/‖X‖²) X_i,  i = 1, …, p,  (1.5)

which we aim to rescue by establishing some minimax properties related to the Bayesian viewpoint.
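To make the contrast in shrinkage ordering concrete, here is a small numerical sketch. It assumes Berger's (1976) estimator takes the form δ(X) = X − (p − 2) D^{-1} X / (X^T D^{-2} X), our reading of (1.4); under that form, the fraction removed from coordinate i is proportional to 1/d_i, so the low-variance coordinates are shrunk hardest:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
d = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # ascending variances, as in (1.1)
X = rng.normal(0.0, np.sqrt(d), size=p)

# Fraction of X_i removed by the Berger-type estimator (assumed form):
# delta_i = (1 - (p - 2) / (d_i * sum_j X_j^2 / d_j^2)) * X_i
frac = (p - 2) / (d * (X ** 2 / d ** 2).sum())
print(np.round(frac, 3))
```

Since frac is proportional to 1/d_i, it is strictly decreasing in d_i: the minimax ordering is the opposite of the well-conditioned one that shrinks high-variance coordinates most.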

In many applications, θ_1, …, θ_p are thought to follow some exchangeable prior distribution π. It is then natural to consider the compound risk function, which is the Bayes risk with respect to the prior π,

r(π, δ) = ∫ R(θ, δ) π(dθ).  (1.6)
Efron and Morris (1971); Efron and Morris (1972a, b, 1973) addressed this problem from both the Bayes and empirical Bayes perspectives. In particular, they considered a prior distribution under which θ_1, …, θ_p are i.i.d. N(0, λ), and used the term “ensemble risk” for the compound risk. By introducing a set of ensemble risks

{r(π, δ) : π ∈ Γ},  (1.7)

we can define ensemble minimaxity with respect to a set of priors

Γ = {π_λ : λ ≥ 0}, where under π_λ the θ_i are i.i.d. N(0, λ);

that is, an estimator δ̂ is said to be ensemble minimax with respect to Γ if

sup_{π ∈ Γ} r(π, δ̂) = inf_δ sup_{π ∈ Γ} r(π, δ).
As a matter of fact, the second author, in his unpublished manuscript Brown, Nie and Xie (2011), had already introduced the concept of ensemble minimaxity. In this article, we follow their spirit but propose a simpler and clearer approach for establishing ensemble minimaxity of estimators.

Our article is organized as follows. In Section 2, we elaborate on the definition of ensemble minimaxity and explain Casella's (1980) viewpoint on the contradiction between minimaxity and well-conditioning. In Section 3, we show the ensemble minimaxity of various shrinkage estimators, including a variant of the James-Stein estimator


as well as the generalized Bayes estimator with respect to the hierarchical prior


which is a generalization of the harmonic prior for the heteroscedastic case.

2 Minimaxity, Ensemble Minimaxity and Casella’s viewpoint

If the prior were known, the resulting posterior mean would be the optimal estimate under the sum of squared error loss. However, it is typically not feasible to specify the prior exactly. One approach to avoiding excessive dependence on the choice of prior is to consider a set of priors Γ and study the properties of estimators based on the corresponding set of ensemble risks. As in classical decision theory, there rarely exists an estimator that achieves the minimum ensemble risk uniformly over Γ. A more realistic goal, as pursued in this paper, is to study the ensemble minimaxity of James-Stein type estimators.

Recall that, with ordinary risk R(θ, δ), an estimator δ̂ is said to be minimax if

sup_θ R(θ, δ̂) = inf_δ sup_θ R(θ, δ).

Similarly, for the case of ensemble risk, we have the following definition. Note the Bayes risk of δ under the prior π is given by (1.6). The estimator δ̂ is said to be ensemble minimax with respect to Γ if

sup_{π ∈ Γ} r(π, δ̂) = inf_δ sup_{π ∈ Γ} r(π, δ).
The motivation for the above definitions comes from the use of the empirical Bayes method in simultaneous inference.

Efron and Morris (1972b) derived the James-Stein estimator through the parametric empirical Bayes model with θ_i i.i.d. N(0, λ). Note that in such an empirical Bayes model, λ is the unknown non-random parameter. Given the family {π_λ : λ ≥ 0}, the Bayes risk is a function of λ, as follows:

r(λ, δ) = ∫ R(θ, δ) π_λ(dθ).

Hence, with Γ = {π_λ : λ ≥ 0}, the estimator δ̂ is said to be ensemble minimax with respect to Γ if

sup_{λ ≥ 0} r(λ, δ̂) = inf_δ sup_{λ ≥ 0} r(λ, δ),
which may be seen as the counterpart of ordinary minimaxity in the empirical Bayes model.
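This family of ensemble risks can be sketched numerically. The following illustration (our own, under the assumptions noted in the comments) contrasts, at one value of λ, the Bayes risk of the MLE, which equals Σ d_i for every λ, with that of the oracle posterior mean under the N(0, λ) prior, which equals Σ λ d_i/(λ + d_i):

```python
import numpy as np

rng = np.random.default_rng(2)
p, lam, n = 8, 4.0, 40000
d = np.linspace(1.0, 3.0, p)          # heteroscedastic variances d_1 <= ... <= d_p

theta = rng.normal(0.0, np.sqrt(lam), size=(n, p))  # theta_i ~ N(0, lambda), i.i.d.
X = rng.normal(theta, np.sqrt(d))

mle_risk = ((X - theta) ** 2).sum(axis=1).mean()    # constant in lambda: sum(d) = 16
post = (lam / (lam + d)) * X                        # posterior mean given lambda
bayes_risk = ((post - theta) ** 2).sum(axis=1).mean()
exact = (lam * d / (lam + d)).sum()                 # closed-form Bayes risk

print(f"MLE: {mle_risk:.2f}, posterior mean: {bayes_risk:.2f} (exact {exact:.2f})")
```

Of course the posterior mean requires knowing λ; the point of ensemble minimaxity is to control the risk over the whole family {π_λ} without that knowledge.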

Clearly the usual estimator X has constant risk Σ_{i=1}^p d_i, hence constant Bayes risk, and is therefore ensemble minimax. The ensemble minimaxity of an estimator δ then follows if

r(π, δ) ≤ Σ_{i=1}^p d_i for all π ∈ Γ.
Remark 2.1.

Note that ensemble minimaxity can also be interpreted as a particular case of Gamma minimaxity, studied in the context of robust Bayes analysis by Good (1952) and Berger (1979). However, in such studies, a “large” set consisting of many diffuse priors is usually included in the analysis. Since this is quite different from our formulation of the problem, we use the term ensemble minimaxity throughout our paper, following the Efron and Morris papers cited above.

A class of shrinkage estimators which we consider in this paper is given by


where with

Berger and Srinivasan (1978) showed, in their Corollary 2.7, that, given positive-definite and non-singular matrices, a necessary condition for an estimator of the form

to be admissible is a certain representation for some constant, which is satisfied by estimators in the class (2.5).

A version of Baranchik's (1964) sufficient condition for ordinary minimaxity is given in Appendix A; for a given φ which satisfies

given by (2.5) is ordinary minimax if


Berger (1976) showed that, for any given ,

which seems the right choice. However, from the “conditioning” viewpoint of Casella (1980), which advocates more shrinkage on higher-variance estimates, the descending order


is desirable, whereas the minimax choice corresponds to the ascending order of the variances given by (1.1). As Casella (1980) pointed out, ordinary minimaxity cannot be enjoyed together with the well-conditioning given by (2.7) when

for some . In fact, when and , we have

and hence follows. The motivation of Casella (1980, 1985) seems to have been to provide a better treatment of this case. Indeed, Brown (1975) pointed out essentially the same phenomenon from a slightly different viewpoint.

Ensemble minimaxity, based on the ensemble risk given by (1.7), provides a way of rescuing well-conditioned shrinkage estimators, estimators which are not necessarily minimax in the ordinary sense.

3 Ensemble minimaxity

3.1 A general theorem

We have the following theorem on the ensemble minimaxity of estimators with a general shrinkage structure, though we will eventually focus on the choice with the descending order as in (2.7).

Theorem 3.1.

Assume φ is non-negative, non-decreasing and concave. Also, φ(w)/w is assumed non-increasing. Then

is ensemble minimax if


Proof. Recall that, for π ∈ Γ,

Then the posterior and marginal are given by

respectively, where are mutually independent and are mutually independent. Then the Bayes risk is given by

Since the first term of the r.h.s. of the above equality is rewritten as

we have




where the relevant quantities are mutually independent. With this notation, we have

and hence

Since is non-increasing and is non-decreasing, by the correlation inequality, we have

and hence


In the first part of the r.h.s. of the inequality (3.3), we have


where the first and second inequalities follow from the correlation inequality and Jensen's inequality, respectively. In the second part of the r.h.s. of the inequality (3.3), by the inequality

we have


By (3.3), (3.4) and (3.5), we have


and, by (3.2) and (3.6),

which guarantees r(π, δ) ≤ Σ_{i=1}^p d_i for all π ∈ Γ under the condition (3.1). ∎
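The conclusion of the theorem, that the ensemble risk stays below Σ d_i, can be spot-checked by simulation. This sketch assumes an estimator of the form δ_i = (1 − c d_i/(‖X‖² + b)) X_i with illustrative constants c and b (not the constants prescribed by condition (3.1)) and normal priors of varying variance λ:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 8, 50000
d = np.linspace(1.0, 3.0, p)          # ascending variances, sum(d) = 16
c, b = p - 2.0, (p - 2.0) * d[-1]     # illustrative constants, not from the theorem

for lam in [0.0, 1.0, 4.0, 25.0]:
    theta = rng.normal(0.0, np.sqrt(lam), size=(n, p))   # theta_i ~ N(0, lambda)
    X = rng.normal(theta, np.sqrt(d))
    W = (X ** 2).sum(axis=1, keepdims=True)
    delta = (1.0 - c * d / (W + b)) * X                  # assumed shrinkage form
    risk = ((delta - theta) ** 2).sum(axis=1).mean()
    print(f"lambda = {lam:5.1f}: ensemble risk ~ {risk:.2f} (MLE constant {d.sum():.0f})")
```

For each λ on the grid the simulated ensemble risk stays below the MLE's constant Bayes risk Σ d_i, consistent with the ensemble minimaxity criterion.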

Given (1.1), the choice with the descending order is one of the most natural choices from Casella's (1980) viewpoint. In this case, we have

and hence the following corollary.

Corollary 3.1.

Assume that φ is non-negative, non-decreasing and concave. Also, φ(w)/w is assumed non-increasing. Then

is ensemble minimax if


3.2 An ensemble minimax James-Stein variant

As an example of Corollary 3.1, we consider

φ(w) = c w/(w + b)  (3.8)
for c > 0 and b ≥ 0, which is motivated by Stein (1956) and James and Stein (1961). Under homoscedasticity, Stein (1956) suggested that there exist estimators dominating the usual estimator among the class of estimators with φ given by (3.8) for sufficiently small c and large b. Following Stein (1956), James and Stein (1961) showed that the estimator with b = 0 and c = p − 2 is ordinary minimax. The choice b = 0 is, however, not good since, by Corollary 3.1, c then cannot exceed a small bound. With positive b, we can see that c can be much larger, as follows.

Note that φ given by (3.8) is non-negative, increasing and concave and that φ(w)/w is decreasing. Then the sufficient condition in (3.7) is

which is equivalent to


Hence we have the following result.

Theorem 3.2.
  1. When


    the shrinkage estimator

    is ensemble minimax.

  2. It is ordinary minimax if

Part 2 above follows from Theorem A.1.

It seems to us that one of the most interesting ensemble minimax estimators from Part 1 is


with the choice satisfying (3.9). It is clear that the i-th shrinkage factor

is nonnegative for any X and any d_1, …, d_p, which is a nice property.
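The nonnegativity claim is easy to check numerically. The following sketch assumes a shrinkage factor of the form 1 − c d_i/(‖X‖² + b) with b ≥ c d_p; the constants here are illustrative choices, not the ones prescribed by (3.9):

```python
import numpy as np

rng = np.random.default_rng(4)
p = 6
d = np.sort(rng.uniform(0.5, 4.0, size=p))  # d_1 <= ... <= d_p
c = 2.0 * (p - 2)                           # illustrative shrinkage constant
b = c * d[-1]                               # b >= c * d_p forces every factor >= 0

worst = np.inf
for _ in range(2000):
    X = rng.normal(0.0, np.sqrt(d), size=p)
    factor = 1.0 - c * d / ((X ** 2).sum() + b)
    worst = min(worst, factor.min())
print(f"smallest shrinkage factor seen: {worst:.4f}")
```

Since c d_i ≤ c d_p ≤ ‖X‖² + b for every X, each factor is nonnegative by construction; the simulation only confirms the algebra.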

3.3 A generalized Bayes ensemble minimax estimator

In this subsection, we provide a generalized Bayes ensemble minimax estimator. Following Strawderman (1971), Berger (1976) and Maruyama and Strawderman (2005), we consider the generalized harmonic prior


where satisfies . Note that for , the density of is exactly , since and

The prior π(θ) = ‖θ‖^{2−p} is called the harmonic prior and was originally investigated by Baranchik (1964) and Stein (1974). Berger (1980) and Berger and Strawderman (1996) recommended the use of the prior (3.11), mainly because it is on the boundary of admissibility.

Following the approach of Strawderman (1971), the generalized Bayes estimator with respect to the prior is given by



where φ satisfies the following properties:

  1. φ is increasing in w.

  2. φ is concave.

  3. .

  4. φ(w)/w is decreasing in w.

  5. The derivative of φ at w = 0 is .

Under the choice and with the condition of Corollary 3.1, we have the following result.

Theorem 3.3.
  1. The estimator is ensemble minimax.

  2. The estimator is ordinary minimax when

3. The estimator is admissible in the conventional sense.


[Part 1] Recall that the sufficient condition for ensemble minimaxity is given in Corollary 3.1. By properties 1–5, we have only to check (3.7) in Corollary 3.1.

For , we have

By properties 1 and 3,

for . Hence for , it follows that

So it suffices to show

when and . By properties 2 and 5, we have for all w. Then