Ensemble minimaxity of James-Stein estimators

by Yuzo Maruyama, et al.
University of Pennsylvania

This article discusses estimation of a multivariate normal mean based on heteroscedastic observations. Under heteroscedasticity, estimators that shrink more on the coordinates with larger variances seem desirable. Although such James-Stein type estimators are not necessarily minimax in the ordinary sense, we show that they can be ensemble minimax, that is, minimax with respect to the ensemble risk, a notion related to the empirical Bayes perspective of Efron and Morris.





1 Introduction

Let X = (X_1, …, X_p)^T, where X_i ~ N(θ_i, d_i) independently for i = 1, …, p, and the variances d_1, …, d_p are known. Let us assume

d_1 ≤ d_2 ≤ ⋯ ≤ d_p.  (1.1)
We are interested in the estimation of θ = (θ_1, …, θ_p)^T with respect to the ordinary squared error loss function

L(θ, δ) = ‖δ − θ‖² = Σ_{i=1}^p (δ_i − θ_i)²,  (1.2)

where the risk of an estimator δ is R(θ, δ) = E_θ[L(θ, δ(X))]. The MLE X, with constant risk Σ_{i=1}^p d_i, is shown to be extended Bayes and hence minimax for any p and any d_1, …, d_p.

In the homoscedastic case d_1 = ⋯ = d_p = d, James and Stein (1961) showed that the shrinkage estimator

θ̂_JS = (1 − (p − 2) d/‖X‖²) X  (1.3)

dominates the MLE for p ≥ 3. There is some literature discussing the minimax properties of shrinkage estimators under heteroscedasticity. Brown (1975) showed that the James-Stein estimator (1.3) is not necessarily minimax when the variances are not equal; specifically, it fails to be minimax for any p when the largest variance is sufficiently large relative to the others. Berger (1976) showed that

δ^B(X) = X − (p − 2) D^{-1} X / (X^T D^{-2} X),  D = diag(d_1, …, d_p),  (1.4)
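The dominance of the James-Stein estimator over the MLE in the homoscedastic case is easy to verify by simulation. The following sketch (an illustration of the classical result, not code from this paper) compares Monte Carlo risks at θ = 0, where the improvement is largest:

```python
import numpy as np

# Homoscedastic case d_1 = ... = d_p = d: compare MLE vs James-Stein risk.
rng = np.random.default_rng(0)
p, d, n = 10, 1.0, 20000
theta = np.zeros(p)                # improvement over the MLE is largest at theta = 0

X = rng.normal(theta, np.sqrt(d), size=(n, p))
mle_loss = ((X - theta) ** 2).sum(axis=1)

# James-Stein shrinkage factor 1 - (p - 2) d / ||X||^2, applied per sample
shrink = 1.0 - (p - 2) * d / (X ** 2).sum(axis=1, keepdims=True)
js_loss = ((shrink * X - theta) ** 2).sum(axis=1)

mle_risk = mle_loss.mean()         # theoretical value: p * d = 10
js_risk = js_loss.mean()           # theoretical value at theta = 0: 2 * d = 2
print(f"MLE risk ~ {mle_risk:.2f}, James-Stein risk ~ {js_risk:.2f}")
```

At θ = 0 the exact risks are p·d for the MLE and 2d for James-Stein, which the simulation reproduces closely.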
is minimax for p ≥ 3 and any d_1, …, d_p. However, Casella (1980) argued that the James-Stein estimator (1.4) may not be desirable even if it is minimax. Ordinary minimax estimators, as in (1.4), typically shrink most on the coordinates with smaller variances. From Casella's (1980) viewpoint, one of the most natural James-Stein variants is

δ^C_i(X) = (1 − (p − 2) d_i/‖X‖²) X_i,  i = 1, …, p,  (1.5)

which we aim to rescue by establishing some minimax properties related to the Bayesian viewpoint.
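To make the contrast in shrinkage ordering concrete, here is a small numerical sketch. It assumes Berger's (1976) estimator takes the form δ(X) = X − (p − 2) D^{-1} X / (X^T D^{-2} X), our reading of (1.4); under that form, the fraction removed from coordinate i is proportional to 1/d_i, so the low-variance coordinates are shrunk hardest:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
d = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # ascending variances, as in (1.1)
X = rng.normal(0.0, np.sqrt(d), size=p)

# Fraction of X_i removed by the Berger-type estimator (assumed form):
# delta_i = (1 - (p - 2) / (d_i * sum_j X_j^2 / d_j^2)) * X_i
frac = (p - 2) / (d * (X ** 2 / d ** 2).sum())
print(np.round(frac, 3))
```

Since frac is proportional to 1/d_i, it is strictly decreasing in d_i: the minimax ordering is the opposite of the well-conditioned one that shrinks high-variance coordinates most.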

In many applications, θ_1, …, θ_p are thought to follow some exchangeable prior distribution π. It is then natural to consider the compound risk function, which is the Bayes risk with respect to the prior π,

r(π, δ) = ∫ R(θ, δ) π(dθ).  (1.6)
Efron and Morris (1971); Efron and Morris (1972a, b, 1973) addressed this problem from both the Bayes and empirical Bayes perspectives. In particular, they considered a prior distribution under which θ_1, …, θ_p are i.i.d. N(0, λ), and used the term “ensemble risk” for the compound risk. By introducing a set of ensemble risks

{r(π, δ) : π ∈ Γ},  (1.7)

we can define ensemble minimaxity with respect to a set of priors

Γ = {π_λ : λ ≥ 0}, where under π_λ the θ_i are i.i.d. N(0, λ);

that is, an estimator δ̂ is said to be ensemble minimax with respect to Γ if

sup_{π ∈ Γ} r(π, δ̂) = inf_δ sup_{π ∈ Γ} r(π, δ).
As a matter of fact, the second author, in his unpublished manuscript Brown, Nie and Xie (2011), had already introduced the concept of ensemble minimaxity. In this article, we follow their spirit but propose a simpler and clearer approach for establishing ensemble minimaxity of estimators.

Our article is organized as follows. In Section 2, we elaborate on the definition of ensemble minimaxity and explain Casella's (1980) viewpoint on the contradiction between minimaxity and well-conditioning. In Section 3, we show the ensemble minimaxity of various shrinkage estimators, including a variant of the James-Stein estimator


as well as the generalized Bayes estimator with respect to the hierarchical prior


which is a generalization of the harmonic prior for the heteroscedastic case.

2 Minimaxity, Ensemble Minimaxity and Casella’s viewpoint

If the prior were known, the resulting posterior mean would be the optimal estimate under the sum of squared error loss. However, it is typically not feasible to specify the prior exactly. One approach to avoiding excessive dependence on the choice of prior is to consider a set of priors Γ and study the properties of estimators based on the corresponding set of ensemble risks. As in classical decision theory, there rarely exists an estimator that achieves the minimum ensemble risk uniformly over Γ. A more realistic goal, as pursued in this paper, is to study the ensemble minimaxity of James-Stein type estimators.

Recall that, with ordinary risk R(θ, δ), an estimator δ̂ is said to be minimax if

sup_θ R(θ, δ̂) = inf_δ sup_θ R(θ, δ).

Similarly, for the case of ensemble risk, we have the following definition. Note the Bayes risk of δ under the prior π is given by (1.6). The estimator δ̂ is said to be ensemble minimax with respect to Γ if

sup_{π ∈ Γ} r(π, δ̂) = inf_δ sup_{π ∈ Γ} r(π, δ).
The motivation for the above definitions comes from the use of the empirical Bayes method in simultaneous inference.

Efron and Morris (1972b) derived the James-Stein estimator through the parametric empirical Bayes model with θ_i i.i.d. N(0, λ). Note that in such an empirical Bayes model, λ is the unknown non-random parameter. Given the family {π_λ : λ ≥ 0}, the Bayes risk is a function of λ, as follows:

r(λ, δ) = ∫ R(θ, δ) π_λ(dθ).

Hence, with Γ = {π_λ : λ ≥ 0}, the estimator δ̂ is said to be ensemble minimax with respect to Γ if

sup_{λ ≥ 0} r(λ, δ̂) = inf_δ sup_{λ ≥ 0} r(λ, δ),
which may be seen as the counterpart of ordinary minimaxity in the empirical Bayes model.
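This family of ensemble risks can be sketched numerically. The following illustration (our own, under the assumptions noted in the comments) contrasts, at one value of λ, the Bayes risk of the MLE, which equals Σ d_i for every λ, with that of the oracle posterior mean under the N(0, λ) prior, which equals Σ λ d_i/(λ + d_i):

```python
import numpy as np

rng = np.random.default_rng(2)
p, lam, n = 8, 4.0, 40000
d = np.linspace(1.0, 3.0, p)          # heteroscedastic variances d_1 <= ... <= d_p

theta = rng.normal(0.0, np.sqrt(lam), size=(n, p))  # theta_i ~ N(0, lambda), i.i.d.
X = rng.normal(theta, np.sqrt(d))

mle_risk = ((X - theta) ** 2).sum(axis=1).mean()    # constant in lambda: sum(d) = 16
post = (lam / (lam + d)) * X                        # posterior mean given lambda
bayes_risk = ((post - theta) ** 2).sum(axis=1).mean()
exact = (lam * d / (lam + d)).sum()                 # closed-form Bayes risk

print(f"MLE: {mle_risk:.2f}, posterior mean: {bayes_risk:.2f} (exact {exact:.2f})")
```

Of course the posterior mean requires knowing λ; the point of ensemble minimaxity is to control the risk over the whole family {π_λ} without that knowledge.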

Clearly the usual estimator X has constant risk Σ_{i=1}^p d_i, hence constant Bayes risk, and is therefore ensemble minimax. The ensemble minimaxity of an estimator δ then follows if

r(π, δ) ≤ Σ_{i=1}^p d_i for all π ∈ Γ.
Remark 2.1.

Note that ensemble minimaxity can also be interpreted as a particular case of Gamma minimaxity, studied in the context of robust Bayes analysis by Good (1952) and Berger (1979). However, in such studies, a “large” set consisting of many diffuse priors is usually included in the analysis. Since this is quite different from our formulation of the problem, we use the term ensemble minimaxity throughout our paper, following the Efron and Morris papers cited above.

A class of shrinkage estimators which we consider in this paper is given by


where with

Berger and Srinivasan (1978) showed, in their Corollary 2.7, that, given positive-definite and non-singular matrices, a necessary condition for an estimator of the form

to be admissible is a certain representation for some constant, which is satisfied by estimators in the class (2.5).

A version of Baranchik's (1964) sufficient condition for ordinary minimaxity is given in Appendix A; for a given φ which satisfies

given by (2.5) is ordinary minimax if


Berger (1976) showed that, for any given ,

which seems the right choice. However, from the “conditioning” viewpoint of Casella (1980), which advocates more shrinkage on higher-variance estimates, the descending order


is desirable, whereas the minimax choice corresponds to the ascending order of the variances given by (1.1). As Casella (1980) pointed out, ordinary minimaxity cannot be enjoyed together with the well-conditioning given by (2.7) when

for some . In fact, when and , we have

and hence follows. The motivation of Casella (1980, 1985) seems to have been to provide a better treatment of this case. Indeed, Brown (1975) pointed out essentially the same phenomenon from a slightly different viewpoint.

Ensemble minimaxity, based on the ensemble risk given by (1.7), provides a way of rescuing well-conditioned shrinkage estimators, estimators which are not necessarily minimax in the ordinary sense.

3 Ensemble minimaxity

3.1 A general theorem

We have the following theorem on the ensemble minimaxity of estimators with a general shrinkage structure, though we will eventually focus on the choice with the descending order as in (2.7).

Theorem 3.1.

Assume φ is non-negative, non-decreasing and concave. Also, φ(w)/w is assumed non-increasing. Then

is ensemble minimax if


Proof. Recall that, for π ∈ Γ,

Then the posterior and marginal are given by

respectively, where are mutually independent and are mutually independent. Then the Bayes risk is given by

Since the first term of the r.h.s. of the above equality is rewritten as

we have




where the relevant quantities are mutually independent. With this notation, we have

and hence

Since is non-increasing and is non-decreasing, by the correlation inequality, we have

and hence


In the first part of the r.h.s. of the inequality (3.3), we have


where the first and second inequalities follow from the correlation inequality and Jensen's inequality, respectively. In the second part of the r.h.s. of the inequality (3.3), by the inequality

we have


By (3.3), (3.4) and (3.5), we have


and, by (3.2) and (3.6),

which guarantees r(π, δ) ≤ Σ_{i=1}^p d_i for all π ∈ Γ under the condition (3.1). ∎
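The conclusion of the theorem, that the ensemble risk stays below Σ d_i, can be spot-checked by simulation. This sketch assumes an estimator of the form δ_i = (1 − c d_i/(‖X‖² + b)) X_i with illustrative constants c and b (not the constants prescribed by condition (3.1)) and normal priors of varying variance λ:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 8, 50000
d = np.linspace(1.0, 3.0, p)          # ascending variances, sum(d) = 16
c, b = p - 2.0, (p - 2.0) * d[-1]     # illustrative constants, not from the theorem

for lam in [0.0, 1.0, 4.0, 25.0]:
    theta = rng.normal(0.0, np.sqrt(lam), size=(n, p))   # theta_i ~ N(0, lambda)
    X = rng.normal(theta, np.sqrt(d))
    W = (X ** 2).sum(axis=1, keepdims=True)
    delta = (1.0 - c * d / (W + b)) * X                  # assumed shrinkage form
    risk = ((delta - theta) ** 2).sum(axis=1).mean()
    print(f"lambda = {lam:5.1f}: ensemble risk ~ {risk:.2f} (MLE constant {d.sum():.0f})")
```

For each λ on the grid the simulated ensemble risk stays below the MLE's constant Bayes risk Σ d_i, consistent with the ensemble minimaxity criterion.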

Given (1.1), the choice with the descending order is one of the most natural choices from Casella's (1980) viewpoint. In this case, we have

and hence the following corollary.

Corollary 3.1.

Assume that φ is non-negative, non-decreasing and concave. Also, φ(w)/w is assumed non-increasing. Then

is ensemble minimax if


3.2 An ensemble minimax James-Stein variant

As an example of Corollary 3.1, we consider

φ(w) = c w/(w + b)  (3.8)
for c > 0 and b ≥ 0, which is motivated by Stein (1956) and James and Stein (1961). Under homoscedasticity, Stein (1956) suggested that there exist estimators dominating the usual estimator among the class of estimators with φ given by (3.8) for sufficiently small c and large b. Following Stein (1956), James and Stein (1961) showed that the estimator with b = 0 and c = p − 2 is ordinary minimax. The choice b = 0 is, however, not good since, by Corollary 3.1, c then cannot exceed a small bound. With positive b, we can see that c can be much larger, as follows.

Note that φ given by (3.8) is non-negative, increasing and concave and that φ(w)/w is decreasing. Then the sufficient condition in (3.7) is

which is equivalent to


Hence we have the following result.

Theorem 3.2.
  1. When


    the shrinkage estimator

    is ensemble minimax.

  2. It is ordinary minimax if

Part 2 above follows from Theorem A.1.

It seems to us that one of the most interesting ensemble minimax estimators from Part 1 is


with the choice satisfying (3.9). It is clear that the i-th shrinkage factor

is nonnegative for any X and any d_1, …, d_p, which is a nice property.
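The nonnegativity claim is easy to check numerically. The following sketch assumes a shrinkage factor of the form 1 − c d_i/(‖X‖² + b) with b ≥ c d_p; the constants here are illustrative choices, not the ones prescribed by (3.9):

```python
import numpy as np

rng = np.random.default_rng(4)
p = 6
d = np.sort(rng.uniform(0.5, 4.0, size=p))  # d_1 <= ... <= d_p
c = 2.0 * (p - 2)                           # illustrative shrinkage constant
b = c * d[-1]                               # b >= c * d_p forces every factor >= 0

worst = np.inf
for _ in range(2000):
    X = rng.normal(0.0, np.sqrt(d), size=p)
    factor = 1.0 - c * d / ((X ** 2).sum() + b)
    worst = min(worst, factor.min())
print(f"smallest shrinkage factor seen: {worst:.4f}")
```

Since c d_i ≤ c d_p ≤ ‖X‖² + b for every X, each factor is nonnegative by construction; the simulation only confirms the algebra.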

3.3 A generalized Bayes ensemble minimax estimator

In this subsection, we provide a generalized Bayes ensemble minimax estimator. Following Strawderman (1971), Berger (1976) and Maruyama and Strawderman (2005), we consider the generalized harmonic prior


where satisfies . Note that for , the density of is exactly , since and

The prior π(θ) = ‖θ‖^{2−p} is called the harmonic prior and was originally investigated by Baranchik (1964) and Stein (1974). Berger (1980) and Berger and Strawderman (1996) recommended the use of the prior (3.11), mainly because it is on the boundary of admissibility.

Following the approach of Strawderman (1971), the generalized Bayes estimator with respect to the prior is given by



where φ satisfies the following properties:

  1. φ is increasing in w.

  2. φ is concave.

  3. .

  4. φ(w)/w is decreasing in w.

  5. The derivative of φ at w = 0 is .

Under the choice and with the condition of Corollary 3.1, we have the following result.

Theorem 3.3.
  1. The estimator is ensemble minimax.

  2. The estimator is ordinary minimax when

3. The estimator is admissible in the conventional sense.


[Part 1] Recall that the sufficient condition for ensemble minimaxity is given in Corollary 3.1. By properties 1–5, we have only to check (3.7) in Corollary 3.1.

For , we have

By properties 1 and 3,

for . Hence for , it follows that

So it suffices to show

when and . By properties 2 and 5, we have for all w. Then