Ensemble minimaxity of James-Stein estimators

06/22/2022
by   Yuzo Maruyama, et al.
University of Pennsylvania

This article discusses estimation of a multivariate normal mean based on heteroscedastic observations. Under heteroscedasticity, estimators that shrink more on the coordinates with larger variances seem desirable. Although such James-Stein-type estimators are not necessarily minimax in the ordinary sense, we show that they can be ensemble minimax, that is, minimax with respect to the ensemble risk, a notion related to the empirical Bayes perspective of Efron and Morris.



1 Introduction

Let X_i ~ N(θ_i, d_i), i = 1, …, p, be independent, where θ = (θ_1, …, θ_p) is the unknown mean vector and the variances d_1, …, d_p > 0 are known. Let us assume

(1.1) d_1 ≤ d_2 ≤ ⋯ ≤ d_p.

We are interested in the estimation of θ with respect to the ordinary squared error loss function

(1.2) L(θ, δ) = Σ_{i=1}^p (δ_i − θ_i)²,

where the risk of an estimator δ is R(θ, δ) = E_θ[L(θ, δ)]. The MLE δ_0(X) = X, which has constant risk Σ_i d_i, is known to be extended Bayes and hence minimax for any p and any d_1, …, d_p.

In the homoscedastic case d_1 = ⋯ = d_p = d, James and Stein (1961) showed that the shrinkage estimator

(1.3) δ_JS(X) = (1 − (p − 2)d/‖X‖²) X

dominates the MLE for p ≥ 3. There is some literature discussing the minimax properties of shrinkage estimators under heteroscedasticity. Brown (1975) showed that the James-Stein estimator (1.3) is not necessarily minimax when the variances are not equal; specifically, it fails to be minimax for any p when the largest variance is sufficiently large relative to the remaining ones. Berger (1976) showed that

(1.4) δ_B(X) = (I − (p − 2) D^{−1}/(X^⊤ D^{−2} X)) X, where D = diag(d_1, …, d_p),

is minimax for p ≥ 3 and any d_1, …, d_p. However, Casella (1980) argued that the estimator (1.4) may not be desirable even though it is minimax: ordinary minimax estimators, as in (1.4), typically shrink most on the coordinates with smaller variances. From Casella’s (1980) viewpoint, one of the most natural James-Stein variants is

(1.5)

which we aim to rescue by establishing minimax properties related to a Bayesian viewpoint.
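As a quick numerical sketch (ours, not the paper's), the dominance claimed for the James-Stein estimator in the homoscedastic case can be checked by Monte Carlo. The classical form δ_JS(X) = (1 − (p − 2)d/‖X‖²)X from James and Stein (1961) is used, and the true mean vector below is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, n_rep = 10, 1.0, 20000          # dimension, common variance, replications
theta = np.full(p, 2.0)               # an arbitrary true mean vector

# Draw n_rep independent observations X ~ N(theta, d * I_p)
X = theta + np.sqrt(d) * rng.standard_normal((n_rep, p))

# MLE: delta_0(X) = X, with risk p * d
risk_mle = np.mean(np.sum((X - theta) ** 2, axis=1))

# James-Stein: delta_JS(X) = (1 - (p - 2) d / ||X||^2) X
shrink = 1.0 - (p - 2) * d / np.sum(X ** 2, axis=1, keepdims=True)
risk_js = np.mean(np.sum((shrink * X - theta) ** 2, axis=1))

print(risk_mle, risk_js)   # the JS risk comes out below the MLE risk (= p*d)
```

The gap widens as θ moves toward the origin, which is the usual picture for Stein-type shrinkage.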

In many applications, θ_1, …, θ_p are thought to follow some exchangeable prior distribution π. It is then natural to consider the compound risk function, which is the Bayes risk with respect to the prior:

(1.6) r(π, δ) = ∫ R(θ, δ) π(dθ).

Efron and Morris (1971, 1972a, 1972b, 1973) addressed this problem from both the Bayes and the empirical Bayes perspective. In particular, they considered a normal prior distribution with θ_i i.i.d. N(0, A), and used the term “ensemble risk” for the compound risk. By introducing a set of ensemble risks

(1.7)

we can define ensemble minimaxity with respect to a set of priors

(1.8)

that is, an estimator δ is said to be ensemble minimax with respect to the class of priors if

(1.9) sup_π r(π, δ) = inf_{δ′} sup_π r(π, δ′).
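Concretely, the ensemble risk r(π, δ) in (1.6) can be estimated by Monte Carlo for any candidate prior in the class. The sketch below is our illustration: it uses an i.i.d. N(0, A) prior and a Casella-style shrinkage rule whose exact form is an assumption, not the paper's (1.5):

```python
import numpy as np

def ensemble_risk(estimator, d, A, n_rep=100000, seed=0):
    """Monte Carlo Bayes (ensemble) risk of `estimator` when
    theta_i ~ N(0, A) i.i.d. and X_i | theta_i ~ N(theta_i, d_i)."""
    rng = np.random.default_rng(seed)
    p = len(d)
    theta = np.sqrt(A) * rng.standard_normal((n_rep, p))
    X = theta + np.sqrt(d) * rng.standard_normal((n_rep, p))
    return np.mean(np.sum((estimator(X, d) - theta) ** 2, axis=1))

d = np.array([0.5, 1.0, 1.5, 2.0, 3.0])   # heteroscedastic variances

mle = lambda X, d: X
# A Casella-style variant shrinking more on high-variance coordinates
# (illustrative form, not the paper's exact (1.5))
js_var = lambda X, d: (1 - (len(d) - 2) * d / np.sum(X**2, axis=1, keepdims=True)) * X

print(ensemble_risk(mle, d, A=1.0))     # = sum(d_i) = 8.0 up to MC error
print(ensemble_risk(js_var, d, A=1.0))
```

The MLE's ensemble risk equals Σ d_i for every prior, which is why it serves as the benchmark in the minimaxity comparisons that follow.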

In fact, the second author, in his unpublished manuscript Brown, Nie and Xie (2011), had already introduced the concept of ensemble minimaxity. In this article, we follow their spirit but propose a simpler and clearer approach to establishing the ensemble minimaxity of estimators.

Our article is organized as follows. In Section 2, we elaborate on the definition of ensemble minimaxity and explain Casella’s (1980) viewpoint on the conflict between minimaxity and well-conditioning. In Section 3, we show the ensemble minimaxity of various shrinkage estimators, including a variant of the James-Stein estimator

(1.10)

as well as the generalized Bayes estimator with respect to the hierarchical prior

(1.11)

which generalizes the harmonic prior to the heteroscedastic case.

2 Minimaxity, Ensemble Minimaxity and Casella’s viewpoint

If the prior were known, the resulting posterior mean would be the optimal estimator under the sum of squared error loss. However, it is typically not feasible to specify the prior exactly. One approach to avoiding excessive dependence on the choice of prior is to consider a set of priors and study the properties of estimators based on the corresponding set of ensemble risks. As in classical decision theory, there rarely exists an estimator that achieves the minimum ensemble risk uniformly over all priors in the set. A more realistic goal, pursued in this paper, is to establish the ensemble minimaxity of James-Stein-type estimators.

Recall that, with the ordinary risk R(θ, δ), an estimator δ is said to be minimax if

(2.1) sup_θ R(θ, δ) = inf_{δ′} sup_θ R(θ, δ′).

Similarly, for the ensemble risk, we have the following definition. Note that the Bayes risk of δ under the prior π is given by (1.6). The estimator δ is said to be ensemble minimax with respect to the class of priors if

(2.2) sup_π r(π, δ) = inf_{δ′} sup_π r(π, δ′).

The motivation for the above definitions comes from the use of the empirical Bayes method in simultaneous inference.

Efron and Morris (1972b) derived the James-Stein estimator through the parametric empirical Bayes model with θ_i i.i.d. N(0, A). Note that in such an empirical Bayes model, A is the unknown non-random parameter. Given this family of priors, the Bayes risk is a function of A, as follows:

(2.3)

Hence, with this family of priors, the estimator is said to be ensemble minimax if

(2.4)

which may be seen as the counterpart of ordinary minimaxity in the empirical Bayes model.
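To make the role of A in (2.3) concrete: under the empirical Bayes model θ_i ~ N(0, A), X_i | θ_i ~ N(θ_i, d), the Bayes risk of any linear shrinkage rule δ_w(X) = (1 − w)X has a simple closed form in A. This worked example is ours, with arbitrary values of A and d:

```python
import numpy as np

def bayes_risk_linear(w, A, d):
    """Per-coordinate Bayes risk of delta_w(X) = (1 - w) X when
    theta ~ N(0, A) and X | theta ~ N(theta, d):
    E[((1 - w) X - theta)^2] = w^2 * A + (1 - w)^2 * d."""
    return w**2 * A + (1 - w) ** 2 * d

A, d = 2.0, 1.0
w_opt = d / (A + d)                      # optimal shrinkage weight
print(bayes_risk_linear(w_opt, A, d))    # = A*d/(A+d) = 2/3

# Monte Carlo check of the closed form
rng = np.random.default_rng(0)
theta = np.sqrt(A) * rng.standard_normal(200000)
X = theta + np.sqrt(d) * rng.standard_normal(200000)
print(np.mean(((1 - w_opt) * X - theta) ** 2))   # close to 2/3
```

The optimal weight w* = d/(A + d) depends on the unknown A, which is exactly what the empirical Bayes step estimates from the data.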

Clearly the usual estimator δ_0(X) = X has constant risk and hence constant Bayes risk, and so is ensemble minimax. The ensemble minimaxity of a competing estimator then follows if its Bayes risk is bounded by that of δ_0 uniformly over the class of priors.

Remark 2.1.

Note that ensemble minimaxity can also be interpreted as a particular case of the Gamma-minimaxity studied in the context of robust Bayes analysis by Good (1952) and Berger (1979). However, in such studies, a “large” set consisting of many diffuse priors is usually included in the analysis. Since this is quite different from our formulation of the problem, we use the term ensemble minimaxity throughout the paper, following the Efron and Morris papers cited above.

A class of shrinkage estimators which we consider in this paper is given by

(2.5)

where with

Berger and Srinivasan (1978) showed, in their Corollary 2.7, that for given positive-definite and non-singular matrices, a necessary condition for an estimator of the form

to be admissible is that it take this form for some constant, a condition satisfied by the estimators in the class (2.5).

A version of Baranchik’s (1964) sufficient condition for ordinary minimaxity is given in Appendix A: for a given function which satisfies

the estimator given by (2.5) is ordinary minimax if

(2.6)

Berger (1976) showed that, for any given ,

which seems to be the right choice. However, from the “conditioning” viewpoint of Casella (1980), which advocates more shrinkage on higher-variance estimates, the descending order

(2.7)

is desirable, whereas Berger’s choice corresponds to the ascending order under (1.1). As Casella (1980) pointed out, ordinary minimaxity cannot be enjoyed together with the well-conditioning given by (2.7) when

for some . In fact, when and , we have

and hence the claim follows. The motivation of Casella (1980, 1985) seems to have been to provide a better treatment of this case. In fact, Brown (1975) pointed out essentially the same phenomenon from a slightly different viewpoint.

Ensemble minimaxity, based on the ensemble risk given by (1.7), provides a way of rescuing well-conditioned shrinkage estimators that are not necessarily minimax in the ordinary sense.
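The contrast between ordinary-minimax shrinkage and well-conditioned shrinkage can be made concrete numerically. The two factor forms below are illustrative stand-ins (a Berger-type factor built from D^{-1}-weighting and a Casella-type factor proportional to d_i), not the paper's exact (1.4) and (1.5):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6
d = np.array([0.2, 0.5, 1.0, 2.0, 4.0, 8.0])    # heteroscedastic variances (ascending)
x = np.sqrt(d) * rng.standard_normal(p)         # one draw with theta = 0

# Berger-type factor (illustrative form): shrinks MOST where d_i is SMALL
berger = 1.0 - (p - 2) / (d * np.sum(x**2 / d**2))

# Casella-type factor (illustrative form): shrinks MOST where d_i is LARGE
casella = 1.0 - (p - 2) * d / np.sum(x**2)

for di, bf, cf in zip(d, berger, casella):
    print(f"d_i={di:4.1f}  berger-type={bf:7.3f}  casella-type={cf:7.3f}")
```

Reading down the columns, the Berger-type factors increase with d_i (least shrinkage on the noisiest coordinates), while the Casella-type factors decrease, which is the descending-order behavior (2.7) advocates.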

3 Ensemble minimaxity

3.1 A general theorem

We have the following theorem on the ensemble minimaxity of estimators with a general shrinkage function, though we will eventually focus on the choice with the descending order as in (2.7).

Theorem 3.1.

Assume that the shrinkage function is non-negative, non-decreasing and concave, and that the accompanying function is non-increasing. Then

is ensemble minimax if

(3.1)
Proof.

Recall, for ,

Then the posterior and marginal are given by

respectively, where are mutually independent and are mutually independent. Then the Bayes risk is given by

Since the first term of the r.h.s. of the above equality is rewritten as

we have

(3.2)

Let

Then

and the two are mutually independent. With this notation, we have

and hence

Since is non-increasing and is non-decreasing, by the correlation inequality, we have

and hence

(3.3)

In the first part of the r.h.s. of the inequality (3.3), we have

(3.4)

where the first and second inequalities follow from the correlation inequality and Jensen’s inequality, respectively. For the second part of the r.h.s. of the inequality (3.3), by the inequality

we have

(3.5)

By (3.3), (3.4) and (3.5), we have

(3.6)

and, by (3.2) and (3.6),

which guarantees for all under the condition (3.1). ∎
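The correlation inequality invoked twice in the proof is Chebyshev's association inequality: if f is non-increasing and g is non-decreasing in the same scalar random variable T, then E[f(T)g(T)] ≤ E[f(T)]E[g(T)]. A quick Monte Carlo sketch (our illustration; the distribution of T is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.chisquare(df=5, size=500000)   # any nondegenerate distribution works

f = 1.0 / (1.0 + T)    # non-increasing in T
g = T / (1.0 + T)      # non-decreasing in T

lhs = np.mean(f * g)              # E[f(T) g(T)]
rhs = np.mean(f) * np.mean(g)     # E[f(T)] E[g(T)]
print(lhs, rhs)                   # lhs <= rhs by the correlation inequality
```

In this particular example g = 1 − f, so rhs − lhs equals Var(f(T)), making the gap strictly positive whenever f(T) is nondegenerate.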

Given the ordering (1.1), the choice with the descending order as in (2.7) is one of the most natural choices from Casella’s (1980) viewpoint. In this case, we have

and hence the following corollary.

Corollary 3.1.

Assume that the shrinkage function is non-negative, non-decreasing and concave, and that the accompanying function is non-increasing. Then

is ensemble minimax if

(3.7)

3.2 An ensemble minimax James-Stein variant

As an example of Corollary 3.1, we consider

(3.8)

for suitable constants; this form is motivated by Stein (1956) and James and Stein (1961). Stein (1956) suggested that there exist estimators dominating the usual estimator within the class of estimators with the function given by (3.8), for a small enough and a large enough choice of the constants. Following Stein (1956), James and Stein (1961) showed that one such choice is ordinary minimax. That choice is, however, not good since, by Corollary 3.1, the shrinkage constant cannot then be large. With a positive second constant, we can see that it can be much larger, as follows.

Note that the function given by (3.8) is non-negative, increasing and concave and that the accompanying function is decreasing. Then the sufficient condition in (3.7) is

which is equivalent to

or

Hence we have the following result.

Theorem 3.2.
  1. When

    (3.9)

    the shrinkage estimator

    is ensemble minimax.

  2. It is ordinary minimax if

Part 2 above follows from Theorem A.1.

It seems to us that one of the most interesting ensemble minimax estimators obtained from Part 1 is

(3.10)

with the constants chosen to satisfy (3.9). It is clear that the i-th shrinkage factor

is nonnegative for every observation and every coordinate, which is a nice property.
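The properties required of the shrinkage function can be verified numerically for a concrete candidate. The form φ(t) = c·t/(b + t) below is our assumed illustration of a non-negative, increasing, concave function (it is not claimed to be the paper's (3.8)); the final check shows why keeping b at least c·max_i d_i makes every factor of the form 1 − c·d_i/(b + s) nonnegative:

```python
import numpy as np

# Illustrative candidate phi(t) = c*t/(b + t); an assumption, not the paper's (3.8).
b, c = 10.0, 2.0
d = np.array([0.5, 1.0, 2.0, 5.0])   # variances, so c * max(d) = 10.0 = b

t = np.linspace(0.0, 100.0, 2001)
phi = c * t / (b + t)

assert np.all(phi >= 0)              # non-negative
assert np.all(np.diff(phi) > 0)      # increasing
assert np.all(np.diff(phi, 2) < 0)   # concave: second differences negative

# Factors 1 - c*d_i/(b + s) are nonnegative for every s >= 0
# exactly when b >= c * max_i d_i (equality is attained at s = 0).
s = np.linspace(0.0, 100.0, 2001)
factors = 1.0 - c * np.outer(d, 1.0 / (b + s))
assert factors.min() >= -1e-12       # nonnegative up to float rounding
print("all properties hold")
```

Taking b = 0 in this form recovers an unbounded shrinkage factor at s → 0, which is the pathology the positive constant avoids.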

3.3 A generalized Bayes ensemble minimax estimator

In this subsection, we provide a generalized Bayes ensemble minimax estimator. Following Strawderman (1971), Berger (1976) and Maruyama and Strawderman (2005), we consider the generalized harmonic prior

(3.11)

where satisfies . Note that for , the density of is exactly , since and

The prior is called the harmonic prior and was originally investigated by Baranchik (1964) and Stein (1974). Berger (1980) and Berger and Strawderman (1996) recommended the use of the prior (3.11) mainly because it is on the boundary of admissibility.

Following the approach of Strawderman (1971), the generalized Bayes estimator with respect to the prior is given by

(3.12)

with

where satisfies the following properties:

  1. is increasing in .

  2. is concave.

  3. .

  4. is decreasing in .

  5. The derivative of at is .

Under this choice, together with the condition of Corollary 3.1, we have the following result.

Theorem 3.3.
  1. The estimator is ensemble minimax.

  2. The estimator is ordinary minimax when

  3. The estimator is admissible in the conventional sense.

Proof.

[Part 1] Recall that the sufficient condition for ensemble minimaxity is given in Corollary 3.1. By properties 1–5 above, we have only to check (3.7) in Corollary 3.1.

For , we have

By the properties 1 and 3,

for . Hence for , it follows that

So it suffices to show

when and . By the properties 2 and 5, we have for all . Then