 # A Gaussian sequence approach for proving minimaxity: A Review

This paper reviews minimax best equivariant estimation in three invariant estimation problems: a location parameter, a scale parameter and a (Wishart) covariance matrix. We briefly review the development of the best equivariant estimator as a generalized Bayes estimator relative to right invariant Haar measure in each case. Then we prove minimaxity of the best equivariant procedure by giving a least favorable prior sequence based on non-truncated Gaussian distributions. The results in this paper are all known, but we bring a fresh and somewhat unified approach by using, in contrast to most proofs in the literature, a smooth sequence of non-truncated priors. This approach leads to some simplifications in the minimaxity proofs.


## 1 Introduction

We review some results on minimaxity of best equivariant estimators from what we hope is a fresh and somewhat unified perspective. Our basic approach is to start with a general equivariant estimator and demonstrate that the best equivariant estimator is a generalized Bayes estimator with respect to an invariant prior. We then choose an appropriate sequence of Gaussian priors whose support is the entire parameter space and show that the Bayes risks converge to the constant risk of the best equivariant estimator. This implies that the best equivariant estimator is minimax. All results on best equivariance and minimaxity considered in this paper are known in the literature. However, using a sequence of Gaussian priors as a least favorable sequence simplifies the proofs and gives a fresh and unified perspective.

In this paper, we consider the following three estimation problems.

Estimation of a location parameter:

Let the density function of $\boldsymbol{X}=(X_1,\dots,X_n)$ be given by

$$f(\boldsymbol{x}-\mu)=f(x_1-\mu,\dots,x_n-\mu). \tag{1.1}$$

Consider estimation of the location parameter $\mu\in\mathbb{R}$ under the location invariant loss

$$L(\delta-\mu). \tag{1.2}$$

We study equivariant estimators under the location group, given by

$$\delta(\boldsymbol{x}-\mu)=\delta(\boldsymbol{x})-\mu. \tag{1.3}$$

Estimation of a scale parameter:

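Let the density function of $\boldsymbol{X}=(X_1,\dots,X_n)$ be given by

$$\sigma^{-n}f(\boldsymbol{x}/\sigma), \tag{1.4}$$

with scale parameter $\sigma>0$. Consider estimation of the scale $\sigma$ under the scale invariant loss

$$L(\delta/\sigma). \tag{1.5}$$

We study equivariant estimators under the scale group, given by

$$\delta(\boldsymbol{x}/\sigma)=\delta(\boldsymbol{x})/\sigma. \tag{1.6}$$

Estimation of a covariance matrix: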

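We study estimation of $\boldsymbol{\Sigma}$ based on a $p\times p$ random matrix $\boldsymbol{V}$ having a Wishart distribution $W_p(n,\boldsymbol{\Sigma})$, where the density is given in (2.3) below. An estimator $\boldsymbol{\delta}$ is evaluated by the invariant loss

$$L(\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta}). \tag{1.7}$$

We consider equivariant estimators under the lower triangular group, given by

$$\boldsymbol{\delta}(\boldsymbol{A}\boldsymbol{V}\boldsymbol{A}^{\mathsf T})=\boldsymbol{A}\boldsymbol{\delta}(\boldsymbol{V})\boldsymbol{A}^{\mathsf T}, \tag{1.8}$$

where $\boldsymbol{A}\in\mathcal{T}_+$, the set of $p\times p$ lower triangular matrices with positive diagonal entries.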

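For the first two cases, with the squared error loss $L(t)=t^2$ and the entropy loss $L(t)=t-\log t-1$, respectively, the so-called Pitman (1939) estimators

$$\hat\mu_0(\boldsymbol{x})=\frac{\int_{-\infty}^{\infty}\mu f(\boldsymbol{x}-\mu)\,\mathrm{d}\mu}{\int_{-\infty}^{\infty}f(\boldsymbol{x}-\mu)\,\mathrm{d}\mu}, \tag{1.9}$$

$$\hat\sigma_0(\boldsymbol{x})=\frac{\int_0^{\infty}\sigma^{-n-1}f(\boldsymbol{x}/\sigma)\,\mathrm{d}\sigma}{\int_0^{\infty}\sigma^{-n-2}f(\boldsymbol{x}/\sigma)\,\mathrm{d}\sigma} \tag{1.10}$$

are well known to be best equivariant and minimax. Clearly, they are generalized Bayes with respect to $\pi(\mu)=1$ and $\pi(\sigma)=1/\sigma$, respectively. Girshick and Savage (1951) gave the original proof of minimaxity. Kubokawa (2004) also gives a proof, together with further developments in the restricted parameter setting. Both use sequences of uniform distributions on expanding intervals as least favorable priors.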
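The Pitman estimators (1.9) and (1.10) are ratios of one-dimensional integrals, so for a given sample they can be evaluated by quadrature. The following Python sketch is illustrative only. It assumes independent observations with a known standard density `f0`, namely standard normal for the location case and standard exponential for the scale case. It uses a plain trapezoid rule on a finite window and, for the scale case, the substitution $\sigma=e^s$. The window widths, grid sizes and helper names are hypothetical choices made for this example. In both examples the Pitman estimator reduces to the sample mean, which gives a convenient check.

```python
import math


def _trapezoid_grid(a, b, m):
    """Equally spaced nodes on [a, b] and the matching trapezoid weights."""
    h = (b - a) / m
    nodes = [a + i * h for i in range(m + 1)]
    weights = [h] * (m + 1)
    weights[0] = weights[-1] = h / 2.0
    return nodes, weights


def pitman_location(x, log_f0, half_width=20.0, m=20000):
    """Approximate (1.9) when f(x - mu) = prod_i f0(x_i - mu)."""
    center = sum(x) / len(x)
    nodes, weights = _trapezoid_grid(center - half_width, center + half_width, m)
    logs = [sum(log_f0(xi - mu) for xi in x) for mu in nodes]
    top = max(logs)  # common rescaling factor; it cancels in the ratio
    dens = [w * math.exp(v - top) for w, v in zip(weights, logs)]
    return sum(mu * d for mu, d in zip(nodes, dens)) / sum(dens)


def pitman_scale(x, log_f0, half_width=15.0, m=20000):
    """Approximate (1.10) when f(x / sigma) = prod_i f0(x_i / sigma), using sigma = e^s.

    With sigma = e^s, sigma^{-n-1} f(x / sigma) d(sigma) = e^{-n s} f(x e^{-s}) ds,
    and the denominator integrand carries one extra factor e^{-s}.
    """
    n = len(x)
    center = math.log(sum(abs(xi) for xi in x) / n)
    nodes, weights = _trapezoid_grid(center - half_width, center + half_width, m)
    logs = [-n * s + sum(log_f0(xi * math.exp(-s)) for xi in x) for s in nodes]
    top = max(logs)
    dens = [w * math.exp(v - top) for w, v in zip(weights, logs)]
    numerator = sum(dens)
    denominator = sum(d * math.exp(-s) for s, d in zip(nodes, dens))
    return numerator / denominator


def log_std_normal(t):
    return -0.5 * t * t - 0.5 * math.log(2.0 * math.pi)


def log_std_exponential(t):
    return -t if t > 0 else -math.inf


print(pitman_location([0.3, -1.2, 2.0, 0.7, 1.1], log_std_normal))   # about 0.58
print(pitman_scale([0.5, 1.5, 2.0, 0.2, 0.8], log_std_exponential))  # about 1.0
```

Working on the log scale and subtracting the maximum before exponentiating keeps the sketch from underflowing for larger samples. The common factor cancels in each ratio.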

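For the last case, James and Stein (1961) show that, under Stein's (1956) loss (3.1), the best equivariant estimator is given by

$$\boldsymbol{\delta}_0(\boldsymbol{V})=\boldsymbol{T}\boldsymbol{D}\boldsymbol{T}^{\mathsf T},\qquad \boldsymbol{D}=\mathrm{diag}(d_1,\dots,d_p), \tag{1.11}$$

where $\boldsymbol{T}\in\mathcal{T}_+$ is from the Cholesky decomposition $\boldsymbol{V}=\boldsymbol{T}\boldsymbol{T}^{\mathsf T}$ and $d_i=1/(n+p-2i+1)$ for $i=1,\dots,p$. Note that the group of lower triangular matrices with positive diagonal entries is solvable, so the result of Kiefer (1957) implies the minimaxity of $\boldsymbol{\delta}_0$. Tsukuma and Kubokawa (2015) give, as a sequence of least favorable priors, the invariant prior truncated to a sequence of expanding sets.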
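A minimal sketch of the James and Stein (1961) estimator $\boldsymbol{T}\boldsymbol{D}\boldsymbol{T}^{\mathsf T}$ with $d_i=1/(n+p-2i+1)$ follows. It assumes that $\boldsymbol{V}$ is symmetric positive definite and that $n\ge p$, so every $d_i$ is positive. The Cholesky routine is written out by hand only to keep the example dependency-free, and the function names are hypothetical.

```python
import math


def cholesky_lower(v):
    """Lower triangular T with positive diagonal such that v = T T^T."""
    p = len(v)
    t = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = v[i][j] - sum(t[i][k] * t[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("v must be symmetric positive definite")
                t[i][i] = math.sqrt(s)
            else:
                t[i][j] = s / t[j][j]
    return t


def james_stein_estimator(v, n):
    """T D T^T with D = diag(d_1, ..., d_p), d_i = 1 / (n + p - 2i + 1), i = 1..p.

    Assumes n >= p so that every d_i is positive.
    """
    p = len(v)
    t = cholesky_lower(v)
    d = [1.0 / (n + p - 2 * i + 1) for i in range(1, p + 1)]  # d[0] is d_1, etc.
    return [[sum(t[r][k] * d[k] * t[c][k] for k in range(p)) for c in range(p)]
            for r in range(p)]


print(james_stein_estimator([[4.0, 2.0], [2.0, 5.0]], n=10))
```

For $p=1$ the sketch reduces to $V/n$. For the $2\times 2$ example, $\boldsymbol{T}$ has rows $(2,0)$ and $(1,2)$, and $d_1=1/11$, $d_2=1/9$.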

In each case, the sequence of priors we employ is based on a Gaussian sequence of possibly transformed parameters. This is in contrast to most proofs in the literature which use truncated versions of the invariant prior. As a consequence, the resulting proofs are less complicated.

Section 2 is devoted to developing the best equivariant estimator as a generalized Bayes estimator with respect to a right invariant (Haar measure) prior in each case. The general approach is basically that of Hora and Buehler (1966). Section 3 provides minimaxity proofs of the best equivariant procedure by giving a least favorable prior sequence based on (possibly transformed) Gaussian priors in each case. We give some concluding remarks in Section 4.

## 2 Establishing best equivariant procedures

All results in this section are well known. Our proofs of best equivariance for $\hat\mu_0$, $\hat\sigma_0$ and $\boldsymbol{\delta}_0$ follow Hora and Buehler (1966). The reader is referred to Hora and Buehler (1966) for further details on their general development of a best equivariant estimator as the generalized Bayes estimator relative to right invariant Haar measure.

### 2.1 Estimation of location parameter

Consider an equivariant estimator $\delta$ which satisfies (1.3). Then we have the following result.

###### Theorem 2.1.

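Let $\boldsymbol{X}$ have distribution (1.1) and let the loss be given by (1.2). The generalized Bayes estimator with respect to the invariant prior $\pi(\mu)=1$, $\mu\in\mathbb{R}$, is best equivariant under the location group, that is,

$$\hat\mu_0(\boldsymbol{x})=\operatorname*{argmin}_{\delta}\int_{-\infty}^{\infty}L(\delta(\boldsymbol{x})-\mu)f(\boldsymbol{x}-\mu)\,\mathrm{d}\mu.$$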
###### Proof.

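The risk of the equivariant estimator (1.3) is written as

$$
\begin{aligned}
R(\delta,\mu)
&=\int_{\mathbb{R}^n}L(\delta(\boldsymbol{x})-\mu)f(\boldsymbol{x}-\mu)\,\mathrm{d}\boldsymbol{x}
=\int_{\mathbb{R}^n}L(\delta(\boldsymbol{x}-\mu))f(\boldsymbol{x}-\mu)\,\mathrm{d}\boldsymbol{x}\\
&=\int_{\mathbb{R}^n}L(\delta(\boldsymbol{z}))f(\boldsymbol{z})\,\mathrm{d}\boldsymbol{z} \qquad (2.1)\\
&=\int_{\mathbb{R}^{n-1}}\int_{-\infty}^{\infty}L(\delta(\boldsymbol{z}_{n-1},z_n))f(\boldsymbol{z}_{n-1},z_n)\,\mathrm{d}z_n\,\mathrm{d}\boldsymbol{z}_{n-1}\\
&=\int_{\mathbb{R}^{n-1}}\int_{-\infty}^{\infty}L(\delta(\boldsymbol{z}_{n-1},u_n-\theta))f(\boldsymbol{z}_{n-1},u_n-\theta)\,\mathrm{d}\theta\,\mathrm{d}\boldsymbol{z}_{n-1}
\qquad(z_n=u_n-\theta;\ u_n\text{ constant},\ \theta\text{ variable})\\
&=\int_{\mathbb{R}^{n-1}}\int_{-\infty}^{\infty}L(\delta(\boldsymbol{u}_{n-1}-\theta,u_n-\theta))f(\boldsymbol{u}_{n-1}-\theta,u_n-\theta)\,\mathrm{d}\theta\,\mathrm{d}\boldsymbol{u}_{n-1}
\qquad(z_i=u_i-\theta,\ i=1,\dots,n-1)\\
&=\int_{\mathbb{R}^{n-1}}\left(\int_{-\infty}^{\infty}L(\delta(\boldsymbol{u})-\theta)f(\boldsymbol{u}-\theta)\,\mathrm{d}\theta\right)\mathrm{d}\boldsymbol{u}_{n-1}.
\end{aligned}
$$

Here $\boldsymbol{z}_{n-1}=(z_1,\dots,z_{n-1})$, $\boldsymbol{u}_{n-1}=(u_1,\dots,u_{n-1})$, $\boldsymbol{u}=(\boldsymbol{u}_{n-1},u_n)$, and the last step uses (1.3) again. Since $u_n$ is an arbitrary fixed constant and the inner integral depends on $\delta$ only through the value $\delta(\boldsymbol{u})$, the risk is minimized by minimizing the inner integral separately for each $\boldsymbol{u}$. Then the best equivariant estimator is

$$\hat\mu_0(\boldsymbol{x})=\operatorname*{argmin}_{\delta}\int_{-\infty}^{\infty}L(\delta(\boldsymbol{x})-\mu)f(\boldsymbol{x}-\mu)\,\mathrm{d}\mu.$$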
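Theorem 2.1 turns the search for the best equivariant estimator into a one-dimensional minimization for each observed $\boldsymbol{x}$. The sketch below performs that minimization numerically. It assumes standard normal errors, a trapezoid rule on a finite window, and golden-section search over $\delta$, which is adequate here because the objective is convex in $\delta$ for the convex losses used. Under squared error, the minimizer should match the ratio (1.9). Under absolute error $L(t)=|t|$, it is the median of the normalized integrand, which for this symmetric example also equals the sample mean. All names are illustrative.

```python
import math


def make_location_objective(x, loss, log_f0, half_width=20.0, m=4000):
    """Return delta -> trapezoid approximation of int L(delta - mu) f(x - mu) d(mu),
    up to a positive constant factor (a rescaling that avoids underflow and does
    not change the minimizer). Here f(x - mu) = prod_i f0(x_i - mu)."""
    center = sum(x) / len(x)
    h = 2.0 * half_width / m
    nodes = [center - half_width + i * h for i in range(m + 1)]
    logs = [sum(log_f0(xi - mu) for xi in x) for mu in nodes]
    top = max(logs)
    weights = [(0.5 if i in (0, m) else 1.0) * h * math.exp(v - top)
               for i, v in enumerate(logs)]

    def objective(delta):
        return sum(w * loss(delta - mu) for mu, w in zip(nodes, weights))

    return objective


def golden_section_min(g, lo, hi, tol=1e-9):
    """Minimize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    gc, gd = g(c), g(d)
    while b - a > tol:
        if gc < gd:  # the minimizer lies in [a, d]
            b, d, gd = d, c, gc
            c = b - inv_phi * (b - a)
            gc = g(c)
        else:  # the minimizer lies in [c, b]
            a, c, gc = c, d, gd
            d = a + inv_phi * (b - a)
            gd = g(d)
    return 0.5 * (a + b)


def log_normal(t):
    """N(0, 1) log density up to an additive constant."""
    return -0.5 * t * t


x = [0.3, -1.2, 2.0, 0.7, 1.1]
center = sum(x) / len(x)
for name, loss in [("squared", lambda t: t * t), ("absolute", abs)]:
    delta = golden_section_min(make_location_objective(x, loss, log_normal),
                               center - 5.0, center + 5.0)
    print(name, delta)  # both close to the sample mean 0.58 in this example
```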

### 2.2 Estimation of scale

Consider an equivariant estimator $\delta$ which satisfies (1.6). Then we have the following result.

###### Theorem 2.2.

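Let $\boldsymbol{X}$ have distribution (1.4) and let the loss be given by (1.5). Then the generalized Bayes estimator with respect to the prior $\pi(\sigma)=1/\sigma$, $\sigma>0$, is best equivariant under the scale group, that is,

$$\hat\sigma_0(\boldsymbol{x})=\operatorname*{argmin}_{\delta}\int_0^{\infty}L(\delta/\sigma)\frac{f(\boldsymbol{x}/\sigma)}{\sigma^n}\,\frac{\mathrm{d}\sigma}{\sigma}.$$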
###### Proof.

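The risk of the equivariant estimator is written as

$$
\begin{aligned}
R(\delta,\sigma)
&=\int_{\mathbb{R}^n}L(\delta(\boldsymbol{x})/\sigma)\sigma^{-n}f(\boldsymbol{x}/\sigma)\,\mathrm{d}\boldsymbol{x}
=\int_{\mathbb{R}^n}L(\delta(\boldsymbol{x}/\sigma))\sigma^{-n}f(\boldsymbol{x}/\sigma)\,\mathrm{d}\boldsymbol{x}\\
&=\int_{\mathbb{R}^n}L(\delta(\boldsymbol{z}))f(\boldsymbol{z})\,\mathrm{d}\boldsymbol{z} \qquad (2.2)\\
&=\int_{\mathbb{R}^{n-1}}\left(\int_{-\infty}^{0}+\int_0^{\infty}\right)L(\delta(\boldsymbol{z}_{n-1},z_n))f(\boldsymbol{z}_{n-1},z_n)\,\mathrm{d}z_n\,\mathrm{d}\boldsymbol{z}_{n-1}\\
&=\int_{\mathbb{R}^{n-1}}\sum_{j\in\{-1,1\}}\int_0^{\infty}L(\delta(\boldsymbol{z}_{n-1},jz_n))f(\boldsymbol{z}_{n-1},jz_n)\,\mathrm{d}z_n\,\mathrm{d}\boldsymbol{z}_{n-1}\\
&=\int_{\mathbb{R}^{n-1}}\sum_{j\in\{-1,1\}}\int_0^{\infty}L(\delta(\boldsymbol{z}_{n-1},ju_n/w))f(\boldsymbol{z}_{n-1},ju_n/w)\frac{u_n}{w^2}\,\mathrm{d}w\,\mathrm{d}\boldsymbol{z}_{n-1}
\qquad(z_n=u_n/w;\ u_n>0\text{ constant},\ w\text{ variable})\\
&=\int_{\mathbb{R}^{n-1}}\sum_{j\in\{-1,1\}}\int_0^{\infty}L(\delta(\boldsymbol{u}_{n-1}/w,ju_n/w))\frac{1}{w^{n-1}}\frac{u_n}{w^2}f(\boldsymbol{u}_{n-1}/w,ju_n/w)\,\mathrm{d}w\,\mathrm{d}\boldsymbol{u}_{n-1}
\qquad(z_i=u_i/w,\ i=1,\dots,n-1;\ u_i\text{ variable},\ w\text{ fixed})\\
&=\int_{\mathbb{R}^{n-1}}u_n\sum_{j\in\{-1,1\}}\left\{\int_0^{\infty}L(\delta(\boldsymbol{u}_{n-1},ju_n)/w)\frac{f(\boldsymbol{u}_{n-1}/w,ju_n/w)}{w^{n+1}}\,\mathrm{d}w\right\}\mathrm{d}\boldsymbol{u}_{n-1}.
\end{aligned}
$$

The last step uses (1.6) again. As in the location case, the braced integral depends on $\delta$ only through the value $\delta(\boldsymbol{u}_{n-1},ju_n)$, so it can be minimized separately at each point. Then the best equivariant estimator is

$$\hat\sigma_0(\boldsymbol{x})=\operatorname*{argmin}_{\delta}\int_0^{\infty}L(\delta/\sigma)\sigma^{-n-1}f(\boldsymbol{x}/\sigma)\,\mathrm{d}\sigma.$$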
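The scale analogue can be sketched the same way. The sketch minimizes the integral in Theorem 2.2 over $\delta$ after the substitution $\sigma=e^s$. It uses the entropy loss $L(t)=t-\log t-1$ and standard exponential data, an assumption made only for the example, for which the minimizer should agree with (1.10), that is, the sample mean. Rescaling the data by $a=3$ should rescale the minimizer by $3$, in line with (1.6). The golden-section helper is repeated so that the block runs on its own, and the search interval $[0.01,100]$ is an arbitrary choice.

```python
import math


def make_scale_objective(x, loss, log_f0, half_width=15.0, m=6000):
    """Return delta -> trapezoid approximation (in s = log sigma) of
    int_0^inf L(delta / sigma) sigma^{-n-1} f(x / sigma) d(sigma), up to a constant factor.
    With sigma = e^s, sigma^{-n-1} d(sigma) = e^{-n s} ds."""
    n = len(x)
    center = math.log(sum(abs(xi) for xi in x) / n)
    h = 2.0 * half_width / m
    nodes = [center - half_width + i * h for i in range(m + 1)]
    logs = [-n * s + sum(log_f0(xi * math.exp(-s)) for xi in x) for s in nodes]
    top = max(logs)
    weights = [(0.5 if i in (0, m) else 1.0) * h * math.exp(v - top)
               for i, v in enumerate(logs)]
    inv_sigma = [math.exp(-s) for s in nodes]

    def objective(delta):
        return sum(w * loss(delta * r) for w, r in zip(weights, inv_sigma))

    return objective


def golden_section_min(g, lo, hi, tol=1e-9):
    """Minimize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    gc, gd = g(c), g(d)
    while b - a > tol:
        if gc < gd:
            b, d, gd = d, c, gc
            c = b - inv_phi * (b - a)
            gc = g(c)
        else:
            a, c, gc = c, d, gd
            d = a + inv_phi * (b - a)
            gd = g(d)
    return 0.5 * (a + b)


def entropy_loss(t):
    return t - math.log(t) - 1.0


def log_std_exponential(t):
    return -t if t > 0 else -math.inf


def best_equivariant_scale(x):
    g = make_scale_objective(x, entropy_loss, log_std_exponential)
    return golden_section_min(g, 1e-2, 1e2)


x = [0.5, 1.5, 2.0, 0.2, 0.8]
print(best_equivariant_scale(x))                      # close to 1.0, the sample mean
print(best_equivariant_scale([3.0 * xi for xi in x]))  # close to 3.0
```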

### 2.3 Estimation of covariance matrix

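Let $\boldsymbol{V}$ have a Wishart distribution $W_p(n,\boldsymbol{\Sigma})$. Let $\mathcal{T}_+$ be the set of $p\times p$ lower triangular matrices with positive diagonal entries. By the Cholesky decomposition, $\boldsymbol{\Sigma}^{-1}$ and $\boldsymbol{V}$ can be written as

$$\boldsymbol{\Sigma}^{-1}=\boldsymbol{\Theta}^{\mathsf T}\boldsymbol{\Theta}\quad\text{and}\quad\boldsymbol{V}=\boldsymbol{T}\boldsymbol{T}^{\mathsf T}$$

for $\boldsymbol{\Theta}\in\mathcal{T}_+$ and $\boldsymbol{T}\in\mathcal{T}_+$. As in Theorem 7.2.1 of Anderson (2003), the probability density function of $\boldsymbol{T}$ is

$$f_W(\boldsymbol{T}\mid\boldsymbol{\Theta})\,\gamma(\mathrm{d}\boldsymbol{T})=\frac{1}{C(p,n)}|\boldsymbol{\Theta}\boldsymbol{T}|^{n}\exp\left[-\frac12\operatorname{tr}\{(\boldsymbol{\Theta}\boldsymbol{T})(\boldsymbol{\Theta}\boldsymbol{T})^{\mathsf T}\}\right]\gamma(\mathrm{d}\boldsymbol{T}), \tag{2.3}$$

where $C(p,n)$ is a normalizing constant given by

$$C(p,n)=2^{p(n-2)/2}\pi^{p(p-1)/4}\prod_{i=1}^{p}\Gamma\left(\frac{n+1-i}{2}\right) \tag{2.4}$$

and $\gamma$ is the left-invariant Haar measure on $\mathcal{T}_+$ given by

$$\gamma(\mathrm{d}\boldsymbol{T})=\prod_{i=1}^{p}t_{ii}^{-i}\,\mathrm{d}\boldsymbol{T}. \tag{2.5}$$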
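With $\boldsymbol{\Theta}=\boldsymbol{I}$, the exponent in (2.3) is minus half the sum of squares of the entries of $\boldsymbol{T}$, so the total mass factorizes entrywise. Each diagonal entry contributes $\int_0^\infty t^{n-i}e^{-t^2/2}\,\mathrm{d}t$, where $t^n$ comes from $|\boldsymbol{T}|^n$ and $t^{-i}$ comes from (2.5). Each of the $p(p-1)/2$ off-diagonal entries contributes $\sqrt{2\pi}$. The sketch below compares that product, computed with a simple trapezoid rule, against the closed form (2.4). It is an illustrative check with ad hoc grid settings and assumes $n\ge p$ so that every exponent $n-i$ is nonnegative.

```python
import math


def c_closed_form(p, n):
    """C(p, n) from (2.4), computed on the log scale."""
    log_c = (p * (n - 2) / 2.0) * math.log(2.0) + (p * (p - 1) / 4.0) * math.log(math.pi)
    log_c += sum(math.lgamma((n + 1 - i) / 2.0) for i in range(1, p + 1))
    return math.exp(log_c)


def half_line_moment(k, upper=40.0, m=40000):
    """Trapezoid approximation of int_0^upper t^k exp(-t^2 / 2) dt for integer k >= 0."""
    h = upper / m
    total = 0.0
    for j in range(m + 1):
        t = j * h
        w = 0.5 if j in (0, m) else 1.0
        total += w * t ** k * math.exp(-0.5 * t * t)
    return total * h


def c_by_factorization(p, n):
    """int over T_+ of |T|^n exp(-tr(T T^T) / 2) gamma(dT), evaluated entry by entry."""
    diagonal = 1.0
    for i in range(1, p + 1):
        diagonal *= half_line_moment(n - i)  # t_ii^n from |T|^n times t_ii^(-i) from gamma
    off_diagonal = math.sqrt(2.0 * math.pi) ** (p * (p - 1) // 2)
    return diagonal * off_diagonal


for p, n in [(1, 4), (2, 5), (3, 7)]:
    print(p, n, c_closed_form(p, n), c_by_factorization(p, n))
```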

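An estimator $\boldsymbol{\delta}$ of $\boldsymbol{\Sigma}$ is evaluated by the invariant loss function given by

$$L(\boldsymbol{\Theta}\boldsymbol{\delta}\boldsymbol{\Theta}^{\mathsf T}). \tag{2.6}$$

Denote the risk function by

$$R(\boldsymbol{\delta},\boldsymbol{\Sigma})=\int_{\mathcal{T}_+}L(\boldsymbol{\Theta}\boldsymbol{\delta}\boldsymbol{\Theta}^{\mathsf T})f_W(\boldsymbol{T}\mid\boldsymbol{\Theta})\,\gamma(\mathrm{d}\boldsymbol{T}).$$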

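For all $\boldsymbol{A}\in\mathcal{T}_+$, the group transformation with respect to $\boldsymbol{A}$ on a random matrix $\boldsymbol{T}$ and a parameter matrix $\boldsymbol{\Theta}$ is defined by $\boldsymbol{T}\mapsto\boldsymbol{A}\boldsymbol{T}$ and $\boldsymbol{\Theta}\mapsto\boldsymbol{\Theta}\boldsymbol{A}^{-1}$. The group operating on $\mathcal{T}_+$ is transitive. Any equivariant estimator of

$$\boldsymbol{\Sigma}=(\boldsymbol{\Theta}^{\mathsf T}\boldsymbol{\Theta})^{-1}=\boldsymbol{\Theta}^{-1}(\boldsymbol{\Theta}^{-1})^{\mathsf T}$$

under the lower triangular group satisfies

$$\boldsymbol{\delta}(\boldsymbol{A}\boldsymbol{T})=\boldsymbol{A}\boldsymbol{\delta}(\boldsymbol{T})\boldsymbol{A}^{\mathsf T}. \tag{2.7}$$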
###### Theorem 2.3.

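Let $\boldsymbol{V}\sim W_p(n,\boldsymbol{\Sigma})$ and let the loss be as in (2.6). Then the generalized Bayes estimator with respect to the prior

$$\pi(\mathrm{d}\boldsymbol{\Theta})=\gamma(\mathrm{d}\boldsymbol{\Theta}), \tag{2.8}$$

$\boldsymbol{\Theta}\in\mathcal{T}_+$, is best equivariant under the lower triangular group, that is,

$$\boldsymbol{\delta}_0(\boldsymbol{T})=\operatorname*{argmin}_{\boldsymbol{\delta}}\int_{\mathcal{T}_+}L(\boldsymbol{\Theta}\boldsymbol{\delta}\boldsymbol{\Theta}^{\mathsf T})f_W(\boldsymbol{T}\mid\boldsymbol{\Theta})\,\gamma(\mathrm{d}\boldsymbol{\Theta}). \tag{2.9}$$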

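Note that $\gamma$ is the “left” invariant measure, which seems to contradict the general theory of Hora and Buehler (1966). However, this seeming anomaly is due to our parameterization $\boldsymbol{\Sigma}^{-1}=\boldsymbol{\Theta}^{\mathsf T}\boldsymbol{\Theta}$, and

$$\boldsymbol{\Sigma}=\boldsymbol{\Theta}^{-1}(\boldsymbol{\Theta}^{-1})^{\mathsf T}. \tag{2.10}$$

The general theory implies that

$$\nu(\mathrm{d}\boldsymbol{\Theta}^{-1})=\gamma(\mathrm{d}\boldsymbol{\Theta}), \tag{2.11}$$

where $\nu$ is the right invariant Haar measure on $\mathcal{T}_+$ given by

$$\nu(\mathrm{d}\boldsymbol{Z})=\prod_{i=1}^{p}z_{ii}^{-(p-i+1)}\,\mathrm{d}\boldsymbol{Z}. \tag{2.12}$$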
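The invariance of $\gamma$ in (2.5) and of $\nu$ in (2.12) reduces to Jacobians of linear maps on the $p(p+1)/2$ free entries of a lower triangular matrix. The map $\boldsymbol{T}\mapsto\boldsymbol{A}\boldsymbol{T}$ has Jacobian $\prod_i a_{ii}^{i}$, and $\boldsymbol{T}\mapsto\boldsymbol{T}\boldsymbol{A}$ has Jacobian $\prod_i a_{ii}^{p-i+1}$. Since $(\boldsymbol{A}\boldsymbol{T})_{ii}=a_{ii}t_{ii}$, these are exactly the factors that cancel in $\gamma(\mathrm{d}(\boldsymbol{A}\boldsymbol{T}))$ and $\nu(\mathrm{d}(\boldsymbol{T}\boldsymbol{A}))$. The sketch below builds each Jacobian matrix column by column, which is exact because the maps are linear, and evaluates its determinant by Gaussian elimination. The matrix `A` is an arbitrary example and the helper names are hypothetical.

```python
def lower_indices(p):
    """Free coordinates (i, j), j <= i, of a p x p lower triangular matrix."""
    return [(i, j) for i in range(p) for j in range(i + 1)]


def matmul(a, b):
    p = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(p)]
            for i in range(p)]


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    size = len(m)
    det = 1.0
    for c in range(size):
        piv = max(range(c, size), key=lambda r: abs(m[r][c]))
        if m[piv][c] == 0.0:
            return 0.0
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= factor * m[c][k]
    return det


def jacobian_det(linear_map, p):
    """Jacobian determinant of a linear map of lower triangular matrices,
    in the coordinates listed by lower_indices(p)."""
    idx = lower_indices(p)
    jac = [[0.0] * len(idx) for _ in idx]
    for col, (i0, j0) in enumerate(idx):
        basis = [[0.0] * p for _ in range(p)]
        basis[i0][j0] = 1.0
        image = linear_map(basis)
        for row, (i, j) in enumerate(idx):
            jac[row][col] = image[i][j]
    return determinant(jac)


A = [[2.0, 0.0, 0.0], [0.5, 3.0, 0.0], [-1.0, 0.7, 0.5]]  # an arbitrary element of T_+
p = len(A)
left = jacobian_det(lambda t: matmul(A, t), p)   # T -> A T
right = jacobian_det(lambda t: matmul(t, A), p)  # T -> T A
print(left, 2.0 ** 1 * 3.0 ** 2 * 0.5 ** 3)   # prod a_ii^i = 2.25
print(right, 2.0 ** 3 * 3.0 ** 2 * 0.5 ** 1)  # prod a_ii^(p-i+1) = 36.0
```

The off-diagonal entries of `A` do not affect either determinant, which matches the fact that both Haar densities involve only the diagonal entries.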

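In the proof below, in addition to the left invariance of $\gamma$ and the right invariance of $\nu$, we use the fact that

$$f_W(\boldsymbol{T}\mid\boldsymbol{\Theta})=f_W(\boldsymbol{\Theta}\boldsymbol{T}\mid\boldsymbol{I})=f_W(\boldsymbol{I}\mid\boldsymbol{\Theta}\boldsymbol{T}). \tag{2.13}$$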
###### Proof of Theorem 2.3.

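By (2.3) and (2.6), the risk of an equivariant estimator can be expressed as

$$
\begin{aligned}
R(\boldsymbol{\delta},\boldsymbol{\Sigma})
&=\int_{\mathcal{T}_+}L(\boldsymbol{\Theta}\boldsymbol{\delta}(\boldsymbol{T})\boldsymbol{\Theta}^{\mathsf T})f_W(\boldsymbol{T}\mid\boldsymbol{\Theta})\,\gamma(\mathrm{d}\boldsymbol{T})
=\int_{\mathcal{T}_+}L(\boldsymbol{\delta}(\boldsymbol{\Theta}\boldsymbol{T}))f_W(\boldsymbol{T}\mid\boldsymbol{\Theta})\,\gamma(\mathrm{d}\boldsymbol{T})\\
&=\int_{\mathcal{T}_+}L(\boldsymbol{\delta}(\boldsymbol{Z}))f_W(\boldsymbol{Z}\mid\boldsymbol{I})\,\gamma(\mathrm{d}\boldsymbol{Z})
\qquad(\boldsymbol{Z}=\boldsymbol{\Theta}\boldsymbol{T},\ \text{left invariance of }\gamma)\\
&=\int_{\mathcal{T}_+}L(\boldsymbol{\delta}(\boldsymbol{Z}))f_W(\boldsymbol{I}\mid\boldsymbol{Z})\prod_{i=1}^{p}z_{ii}^{-i}\,\mathrm{d}\boldsymbol{Z}
\qquad(\text{by the form of }f_W)\\
&=\int_{\mathcal{T}_+}L(\boldsymbol{\delta}(\boldsymbol{Z}))f_W(\boldsymbol{I}\mid\boldsymbol{Z})\prod_{i=1}^{p}z_{ii}^{p-2i+1}\,\nu(\mathrm{d}\boldsymbol{Z})\\
&=\int_{\mathcal{T}_+}L(\boldsymbol{\delta}(\boldsymbol{W}\boldsymbol{S}))f_W(\boldsymbol{S}\mid\boldsymbol{W})\prod_{i=1}^{p}(w_{ii}s_{ii})^{p-2i+1}\,\nu(\mathrm{d}\boldsymbol{W})
\qquad(\boldsymbol{Z}=\boldsymbol{W}\boldsymbol{S},\ \text{right invariance of }\nu)\\
&=\prod_{i=1}^{p}s_{ii}^{p-2i+1}\int_{\mathcal{T}_+}L(\boldsymbol{W}\boldsymbol{\delta}(\boldsymbol{S})\boldsymbol{W}^{\mathsf T})f_W(\boldsymbol{S}\mid\boldsymbol{W})\,\gamma(\mathrm{d}\boldsymbol{W})
\qquad(\text{by (2.7) and the form of }\gamma),
\end{aligned}
$$

where $\boldsymbol{S}\in\mathcal{T}_+$ is an arbitrary fixed matrix. Since the left-hand side does not depend on $\boldsymbol{S}$, and the last integral depends on $\boldsymbol{\delta}$ only through $\boldsymbol{\delta}(\boldsymbol{S})$, the risk is minimized by minimizing that integral over $\boldsymbol{\delta}(\boldsymbol{S})$. Then the best equivariant estimator with respect to the group can be written as

$$\boldsymbol{\delta}_0(\boldsymbol{T})=\operatorname*{argmin}_{\boldsymbol{\delta}}\int_{\mathcal{T}_+}L(\boldsymbol{\Theta}\boldsymbol{\delta}\boldsymbol{\Theta}^{\mathsf T})f_W(\boldsymbol{T}\mid\boldsymbol{\Theta})\,\gamma(\mathrm{d}\boldsymbol{\Theta}).$$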

## 3 Minimaxity

In this section, we choose an appropriate sequence of priors whose support is the entire parameter space and show that the Bayes risks converge to the constant risk of the best equivariant estimator. By a well-known standard result (see, e.g., Lehmann and Casella (1998)), this implies minimaxity of the best equivariant estimator. In order to deal with explicit expressions for minimax estimators, as well as for somewhat technical reasons, in this section we specify the loss functions to be standard choices in the literature. For the location and scale problems, the squared error loss and the entropy loss

$$L(\delta-\mu)=(\delta-\mu)^2,\qquad L(\delta/\sigma)=\delta/\sigma-\log(\delta/\sigma)-1$$

are used, respectively. For estimation of the covariance matrix, the so-called Stein (1956) loss function given by

$$L(\boldsymbol{\Theta}\boldsymbol{\delta}\boldsymbol{\Theta}^{\mathsf T})=\operatorname{tr}\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta}-\log|\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta}|-p=\operatorname{tr}(\boldsymbol{\Theta}\boldsymbol{\delta}\boldsymbol{\Theta}^{\mathsf T})-\log|\boldsymbol{\Theta}\boldsymbol{\delta}\boldsymbol{\Theta}^{\mathsf T}|-p \tag{3.1}$$

is used.
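The diagonal constants $d_i=1/(n+p-2i+1)$ of the James and Stein (1961) estimator can be motivated by a short calculation under (3.1). Purely for illustration, restrict attention to estimators $\boldsymbol{T}\boldsymbol{D}\boldsymbol{T}^{\mathsf T}$ with a fixed diagonal $\boldsymbol{D}=\mathrm{diag}(d_1,\dots,d_p)$. These are equivariant in the sense of (2.7), so their risk is constant and may be evaluated at $\boldsymbol{\Sigma}=\boldsymbol{I}$. By Bartlett's decomposition, under $W_p(n,\boldsymbol{I})$ the entries of $\boldsymbol{T}$ are independent with $t_{ii}^2\sim\chi^2_{n-i+1}$ and $t_{ij}\sim N(0,1)$ for $i>j$. Hence

$$
\begin{aligned}
\mathrm{E}\operatorname{tr}(\boldsymbol{T}\boldsymbol{D}\boldsymbol{T}^{\mathsf T})
&=\sum_{i=1}^{p}d_i\,\mathrm{E}\Big(t_{ii}^2+\sum_{k=i+1}^{p}t_{ki}^2\Big)=\sum_{i=1}^{p}d_i\,(n+p-2i+1),\\
\mathrm{E}\log|\boldsymbol{T}\boldsymbol{D}\boldsymbol{T}^{\mathsf T}|
&=\sum_{i=1}^{p}\log d_i+\sum_{i=1}^{p}\mathrm{E}\log t_{ii}^2 .
\end{aligned}
$$

Up to terms not involving $\boldsymbol{D}$, the risk under (3.1) is therefore $\sum_{i=1}^{p}\{d_i(n+p-2i+1)-\log d_i\}$, and each term is minimized at $d_i=1/(n+p-2i+1)$. This calculation does not show that the optimum over all equivariant estimators has this diagonal form. That step is part of the James and Stein result.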

### 3.1 Estimation of location

In this section, we show the minimaxity of $\hat\mu_0$, the best location equivariant estimator under squared error loss. In contrast to most proofs in the literature, we use a smooth sequence of Gaussian densities, which simplifies the proof. The approach also extends easily to the multivariate location family (see Remark 3.1).

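Recall that the Bayes estimator corresponding to a (generalized) prior $\pi$ is given by

$$
\begin{aligned}
\delta_\pi(\boldsymbol{x})
&=\operatorname*{argmin}_{\delta}\int_{-\infty}^{\infty}L(\delta-\mu)f(\boldsymbol{x}-\mu)\pi(\mu)\,\mathrm{d}\mu &&(3.2)\\
&=\frac{\int\mu f(\boldsymbol{x}-\mu)\pi(\mu)\,\mathrm{d}\mu}{\int f(\boldsymbol{x}-\mu)\pi(\mu)\,\mathrm{d}\mu}\quad\text{under }L(t)=t^2. &&(3.3)
\end{aligned}
$$

Hence, by Theorem 2.1, the best equivariant estimator is given by

$$\hat\mu_0(\boldsymbol{x})=\frac{\int\mu f(\boldsymbol{x}-\mu)\,\mathrm{d}\mu}{\int f(\boldsymbol{x}-\mu)\,\mathrm{d}\mu}. \tag{3.4}$$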
###### Theorem 3.1.

Let $\boldsymbol{X}$ have distribution (1.1) and let the loss be given by $L(\delta-\mu)=(\delta-\mu)^2$. Then the best equivariant estimator $\hat\mu_0$, given by (3.4), is minimax, and the minimax constant risk is given by

$$R_0=\int L(\hat\mu_0(\boldsymbol{x}))f(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}=\int\{\hat\mu_0(\boldsymbol{x})\}^2f(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}.$$

Under the squared error loss, the Bayes estimator is explicitly written as (3.3). However, in the following proof, the implicit expression (3.2) is mainly used, to indicate possible extensions to more general loss functions. For the same reason, $L(t)$ is written instead of $t^2$.

###### Proof of Theorem 3.1.

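Let

$$\phi(\mu)=\frac{1}{\sqrt{2\pi}}\exp(-\mu^2/2)\quad\text{and}\quad\phi_k(\mu)=\frac1k\phi(\mu/k).$$

The Bayes risk of an estimator $\delta$ under the prior $\phi_k$ is given by

$$r_k(\phi_k,\delta)=\iint L(\delta(\boldsymbol{x})-\mu)f(\boldsymbol{x}-\mu)\phi_k(\mu)\,\mathrm{d}\mu\,\mathrm{d}\boldsymbol{x}.$$

Also, the corresponding Bayes estimator is given by

$$\delta_{\phi_k}(\boldsymbol{x})=\operatorname*{argmin}_{\delta}\int_{-\infty}^{\infty}L(\delta-\mu)f(\boldsymbol{x}-\mu)\phi_k(\mu)\,\mathrm{d}\mu.$$

Clearly,

$$r_k(\phi_k,\delta_{\phi_k})\le r_k(\phi_k,\hat\mu_0)=R_0,$$

and therefore, to show $\lim_{k\to\infty}r_k(\phi_k,\delta_{\phi_k})=R_0$, it suffices to prove

$$\liminf_{k\to\infty}r_k(\phi_k,\delta_{\phi_k})\ge R_0.$$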

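Making the transformation $\boldsymbol{z}=\boldsymbol{x}-\mu$ yields

$$r_k(\phi_k,\delta_{\phi_k})=\iint L(\delta_{\phi_k}(\boldsymbol{z}+\mu)-\mu)f(\boldsymbol{z})\phi_k(\mu)\,\mathrm{d}\mu\,\mathrm{d}\boldsymbol{z},$$

where

$$\delta_{\phi_k}(\boldsymbol{z}+\mu)=\operatorname*{argmin}_{\delta}\int_{-\infty}^{\infty}L(\delta-\theta)f(\boldsymbol{z}+\mu-\theta)\phi_k(\theta)\,\mathrm{d}\theta.$$

Now, make the transformation $t=\theta-\mu$. We then have

$$\delta_{\phi_k}(\boldsymbol{z}+\mu)=\operatorname*{argmin}_{\delta}\int_{-\infty}^{\infty}L(\delta-\mu-t)f(\boldsymbol{z}-t)\phi_k(t+\mu)\,\mathrm{d}t,$$

or equivalently

$$\delta^*_k(\boldsymbol{z},\mu):=\delta_{\phi_k}(\boldsymbol{z}+\mu)-\mu=\operatorname*{argmin}_{\delta}\int_{-\infty}^{\infty}L(\delta-t)f(\boldsymbol{z}-t)\phi_k(\mu+t)\,\mathrm{d}t.$$

Hence, by the change of variables $\mu\mapsto k\mu$, we have

$$
\begin{aligned}
r_k(\phi_k,\delta_{\phi_k})
&=\iint L(\delta^*_k(\boldsymbol{z},\mu))f(\boldsymbol{z})\phi_k(\mu)\,\mathrm{d}\mu\,\mathrm{d}\boldsymbol{z}\\
&=\iint L(\delta^*_k(\boldsymbol{z},k\mu))f(\boldsymbol{z})\phi(\mu)\,\mathrm{d}\mu\,\mathrm{d}\boldsymbol{z}.
\end{aligned}
$$

Note also that $k\phi_k(k\mu+t)=\phi(t/k+\mu)$ and

$$
\begin{aligned}
\delta^*_k(\boldsymbol{z},k\mu)
&=\operatorname*{argmin}_{\delta}\int_{-\infty}^{\infty}L(\delta-t)f(\boldsymbol{z}-t)\,k\phi_k(k\mu+t)\,\mathrm{d}t\\
&=\frac{\int_{-\infty}^{\infty}tf(\boldsymbol{z}-t)\phi(t/k+\mu)\,\mathrm{d}t}{\int_{-\infty}^{\infty}f(\boldsymbol{z}-t)\phi(t/k+\mu)\,\mathrm{d}t}\qquad(\text{for squared error loss }L(t)=t^2).
\end{aligned}
$$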

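Since $\phi$ is bounded and $\lim_{k\to\infty}\phi(t/k+\mu)=\phi(\mu)$ for any $t$ and $\mu$, the dominated convergence theorem implies

$$\lim_{k\to\infty}\delta^*_k(\boldsymbol{z},k\mu)=\hat\mu_0(\boldsymbol{z}) \tag{3.5}$$

and hence

$$\lim_{k\to\infty}L(\delta^*_k(\boldsymbol{z},k\mu))=\lim_{k\to\infty}\{\delta^*_k(\boldsymbol{z},k\mu)\}^2=\{\hat\mu_0(\boldsymbol{z})\}^2=L(\hat\mu_0(\boldsymbol{z})). \tag{3.6}$$

Hence, by Fatou's lemma, we obtain

$$
\begin{aligned}
\liminf_{k\to\infty}r_k(\phi_k,\delta_{\phi_k})
&=\liminf_{k\to\infty}\iint L(\delta^*_k(\boldsymbol{z},k\mu))f(\boldsymbol{z})\phi(\mu)\,\mathrm{d}\mu\,\mathrm{d}\boldsymbol{z}\\
&\ge\iint\liminf_{k\to\infty}L(\delta^*_k(\boldsymbol{z},k\mu))f(\boldsymbol{z})\phi(\mu)\,\mathrm{d}\mu\,\mathrm{d}\boldsymbol{z}\\
&=\iint L(\hat\mu_0(\boldsymbol{z}))f(\boldsymbol{z})\phi(\mu)\,\mathrm{d}\mu\,\mathrm{d}\boldsymbol{z}=R_0. \qquad (3.7)
\end{aligned}
$$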
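The proof can be seen in action in a hypothetical special case: $n$ independent $N(\mu,1)$ observations. Then $f(\boldsymbol{z}-t)\propto\exp\{-n(t-\bar z)^2/2\}$ and $\hat\mu_0(\boldsymbol{z})=\bar z$. Under squared error, $\delta^*_k(\boldsymbol{z},k\mu)$ is the ratio $\int tf(\boldsymbol{z}-t)\phi(t/k+\mu)\,\mathrm{d}t\big/\int f(\boldsymbol{z}-t)\phi(t/k+\mu)\,\mathrm{d}t$. This is a Gaussian mean, $(n\bar z-\mu/k)/(n+k^{-2})$, which tends to $\bar z$ as in (3.5). Also, the Bayes risk under $\phi_k$ is the posterior variance $k^2/(nk^2+1)$, which increases to $R_0=1/n$. The sketch below checks the closed form against direct quadrature of the ratio. The data and grid settings are arbitrary.

```python
import math


def delta_star_quadrature(z, mu, k, half_width=30.0, m=20000):
    """Trapezoid evaluation of
    int t f(z - t) phi(t/k + mu) dt / int f(z - t) phi(t/k + mu) dt
    for f = product of N(0, 1) densities (normalizing constants cancel)."""
    zbar = sum(z) / len(z)
    h = 2.0 * half_width / m
    nodes = [zbar - half_width + i * h for i in range(m + 1)]
    logs = [-0.5 * sum((zi - t) ** 2 for zi in z) - 0.5 * (t / k + mu) ** 2
            for t in nodes]
    top = max(logs)
    dens = [(0.5 if i in (0, m) else 1.0) * math.exp(v - top)
            for i, v in enumerate(logs)]
    return sum(t * d for t, d in zip(nodes, dens)) / sum(dens)


def delta_star_closed_form(z, mu, k):
    """Gaussian-model closed form (n zbar - mu / k) / (n + 1 / k^2)."""
    n = len(z)
    zbar = sum(z) / n
    return (n * zbar - mu / k) / (n + 1.0 / k ** 2)


def bayes_risk(n, k):
    """Bayes risk k^2 / (n k^2 + 1) of the Bayes rule under the N(0, k^2) prior."""
    return k ** 2 / (n * k ** 2 + 1.0)


z = [0.3, -1.2, 2.0, 0.7, 1.1]  # hypothetical data; zbar = 0.58
for k in (1.0, 10.0, 1000.0):
    # the last column increases toward R_0 = 1/n = 0.2
    print(k, delta_star_quadrature(z, 1.0, k), delta_star_closed_form(z, 1.0, k),
          bayes_risk(len(z), k))
```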

###### Remark 3.1.

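In the multivariate case, suppose that $\boldsymbol{\mu}=(\mu_1,\dots,\mu_p)^{\mathsf T}\in\mathbb{R}^p$ and

$$\{\boldsymbol{x}_1,\dots,\boldsymbol{x}_p\}\sim f(\boldsymbol{x}_1-\mu_1,\dots,\boldsymbol{x}_p-\mu_p).$$

Let the loss be the quadratic loss $\|\boldsymbol{\delta}-\boldsymbol{\mu}\|^2$. Then the Pitman estimator of $\boldsymbol{\mu}$, the generalized Bayes estimator with respect to $\pi(\boldsymbol{\mu})=1$, is

$$\hat{\boldsymbol{\mu}}(\boldsymbol{x}_1,\dots,\boldsymbol{x}_p)=\frac{\int_{\mathbb{R}^p}\boldsymbol{\mu}f(\boldsymbol{x}_1-\mu_1,\dots,\boldsymbol{x}_p-\mu_p)\,\mathrm{d}\boldsymbol{\mu}}{\int_{\mathbb{R}^p}f(\boldsymbol{x}_1-\mu_1,\dots,\boldsymbol{x}_p-\mu_p)\,\mathrm{d}\boldsymbol{\mu}}. \tag{3.8}$$

Using

$$\pi_k(\boldsymbol{\mu})=\prod_{i=1}^{p}\phi_k(\mu_i)=\frac{1}{(2\pi k^2)^{p/2}}\exp\left(-\frac{\|\boldsymbol{\mu}\|^2}{2k^2}\right)$$

as the least favorable sequence of priors, the same argument gives the minimaxity of (3.8) under quadratic loss.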

### 3.2 Estimation of scale

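In this section, we show the minimaxity of the scale Pitman estimator under the entropy loss given by

$$L(\delta/\sigma)=\delta/\sigma-\log(\delta/\sigma)-1. \tag{3.9}$$

Recall that the Bayes estimator corresponding to a (generalized) prior $\pi$, under the entropy loss (3.9), is given by

$$
\begin{aligned}
\delta_\pi(\boldsymbol{x})
&=\operatorname*{argmin}_{\delta}\int_0^{\infty}L(\delta/\sigma)\sigma^{-n}f(\boldsymbol{x}/\sigma)\pi(\sigma)\,\mathrm{d}\sigma &&(3.10)\\
&=\frac{\int\sigma^{-n}f(\boldsymbol{x}/\sigma)\pi(\sigma)\,\mathrm{d}\sigma}{\int\sigma^{-n-1}f(\boldsymbol{x}/\sigma)\pi(\sigma)\,\mathrm{d}\sigma}. &&(3.11)
\end{aligned}
$$

Hence the generalized Bayes estimator under $\pi(\sigma)=1/\sigma$, which is best equivariant as shown in Theorem 2.2, is given by

$$\hat\sigma_0(\boldsymbol{x})=\frac{\int\sigma^{-n-1}f(\boldsymbol{x}/\sigma)\,\mathrm{d}\sigma}{\int\sigma^{-n-2}f(\boldsymbol{x}/\sigma)\,\mathrm{d}\sigma}. \tag{3.12}$$

We have the following minimaxity result.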

###### Theorem 3.2.

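Let $\boldsymbol{X}$ have distribution (1.4) and let the loss be given by (3.9). Then the best equivariant estimator $\hat\sigma_0$, given by (3.12), is minimax, and the minimax constant risk is given by

$$R_0=\int L(\hat\sigma_0(\boldsymbol{x}))f(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}=\int\{\hat\sigma_0(\boldsymbol{x})-\log\hat\sigma_0(\boldsymbol{x})-1\}f(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}.$$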
###### Proof.

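Assume $\log\sigma\sim N(0,k^2)$, or equivalently

$$\pi_k(\sigma)=\frac1k\phi\left(\frac{\log\sigma}{k}\right)\frac1\sigma,$$

where $\phi$ is the pdf of $N(0,1)$. Then the Bayes estimator $\delta_{\pi_k}$ satisfies

$$\delta_{\pi_k}=\delta_{\pi_k}(\boldsymbol{x})=\operatorname*{argmin}_{\delta}\int_0^{\infty}L(\delta/\sigma)\sigma^{-n}f(\boldsymbol{x}/\sigma)\pi_k(\sigma)\,\mathrm{d}\sigma,$$

and the Bayes risk is given by

$$r_k(\pi_k,\delta_{\pi_k})=\iint L(\delta_{\pi_k}(\boldsymbol{x})/\sigma)\sigma^{-n}f(\boldsymbol{x}/\sigma)\pi_k(\sigma)\,\mathrm{d}\sigma\,\mathrm{d}\boldsymbol{x}.$$

Clearly,

$$r_k(\pi_k,\delta_{\pi_k})\le r_k(\pi_k,\hat\sigma_0)=R_0,$$

and therefore, to show $\lim_{k\to\infty}r_k(\pi_k,\delta_{\pi_k})=R_0$, it suffices to prove

$$\liminf_{k\to\infty}r_k(\pi_k,\delta_{\pi_k})\ge R_0.$$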

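Making the transformation $\boldsymbol{x}=\sigma\boldsymbol{l}$ yields

$$r_k(\pi_k,\delta_{\pi_k})=\iint L(\delta_{\pi_k}(\sigma\boldsymbol{l})/\sigma)f(\boldsymbol{l})\pi_k(\sigma)\,\mathrm{d}\sigma\,\mathrm{d}\boldsymbol{l},$$

where

$$\delta_{\pi_k}(\sigma\boldsymbol{l})=\operatorname*{argmin}_{\delta}\int_0^{\infty}L(\delta/z)z^{-n}f(\sigma\boldsymbol{l}/z)\pi_k(z)\,\mathrm{d}z.$$

Now, make the transformation $z=\sigma y$. We then have

$$\delta_{\pi_k}(\sigma\boldsymbol{l})=\operatorname*{argmin}_{\delta}\int_0^{\infty}L(\delta/(y\sigma))y^{-n}f(\boldsymbol{l}/y)\pi_k(\sigma y)\,\mathrm{d}y,$$

or equivalently

$$\delta^*_k(\boldsymbol{l},\sigma):=\frac{\delta_{\pi_k}(\sigma\boldsymbol{l})}{\sigma}=\operatorname*{argmin}_{\delta}\int_0^{\infty}L(\delta/y)y^{-n}f(\boldsymbol{l}/y)\pi_k(\sigma y)\,\mathrm{d}y.$$

Hence

$$
\begin{aligned}
r_k(\pi_k,\delta_{\pi_k})
&=\iint L(\delta^*_k(\boldsymbol{l},\sigma))f(\boldsymbol{l})\pi_k(\sigma)\,\mathrm{d}\sigma\,\mathrm{d}\boldsymbol{l}\\
&=\iint L(\delta^*_k(\boldsymbol{l},\eta^k))f(\boldsymbol{l})\pi_1(\eta)\,\mathrm{d}\eta\,\mathrm{d}\boldsymbol{l},
\end{aligned}
$$

where $\sigma=\eta^k$, and $\delta^*_k(\boldsymbol{l},\eta^k)$ is explicitly given (when the loss is (3.9)) by

$$\delta^*_k(\boldsymbol{l},\eta^k)=\frac{\int y^{-n}f(\boldsymbol{l}/y)\pi_k(\eta^ky)\,\mathrm{d}y}{\int y^{-n-1}f(\boldsymbol{l}/y)\pi_k(\eta^ky)\,\mathrm{d}y}. \tag{3.13}$$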

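Note that

$$k\pi_k(\eta^ky)=\frac{1}{\eta^ky}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(\log\eta^k+\log y)^2}{2k^2}\right).$$

Since

$$\lim_{k\to\infty}k\eta^k\pi_k(\eta^ky)=\frac1y\phi(\log\eta)$$

for any $y>0$, and $k\eta^k\pi_k(\eta^ky)\le(2\pi)^{-1/2}y^{-1}$, the dominated convergence theorem implies

$$\lim_{k\to\infty}\delta^*_k(\boldsymbol{l},\eta^k)=\hat\sigma_0(\boldsymbol{l}). \tag{3.14}$$

Also, the continuity of $L$ implies

$$\lim_{k\to\infty}L(\delta^*_k(\boldsymbol{l},\eta^k))=L(\hat\sigma_0(\boldsymbol{l})). \tag{3.15}$$

Hence, by Fatou's lemma, we obtain

$$
\begin{aligned}
\liminf_{k\to\infty}r_k(\pi_k,\delta_{\pi_k})
&=\liminf_{k\to\infty}\iint L(\delta^*_k(\boldsymbol{l},\eta^k))f(\boldsymbol{l})\pi_1(\eta)\,\mathrm{d}\eta\,\mathrm{d}\boldsymbol{l}\\
&\ge\iint\liminf_{k\to\infty}L(\delta^*_k(\boldsymbol{l},\eta^k))f(\boldsymbol{l})\pi_1(\eta)\,\mathrm{d}\eta\,\mathrm{d}\boldsymbol{l}\\
&=\iint L(\hat\sigma_0(\boldsymbol{l}))f(\boldsymbol{l})\pi_1(\eta)\,\mathrm{d}\eta\,\mathrm{d}\boldsymbol{l}=R_0. \qquad (3.16)
\end{aligned}
$$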
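The convergence (3.14) can also be checked numerically. The sketch is illustrative only and assumes standard exponential data, $f(\boldsymbol{l})=\exp(-\sum_i l_i)$ for positive $\boldsymbol{l}$, for which $\hat\sigma_0(\boldsymbol{l})$ is the sample mean. It evaluates (3.13) after the substitution $y=e^s$. Under that substitution, the factor $1/y$ in $\pi_k(\eta^ky)$ cancels against $\mathrm{d}y=y\,\mathrm{d}s$, and the constant $1/(k\eta^k)$ cancels in the ratio, leaving the weight $\phi(\log\eta+s/k)$. The values of $\eta$ and $k$, the data and the grid are arbitrary choices.

```python
import math


def delta_star_scale(l, eta, k, half_width=15.0, m=20000):
    """(3.13) for standard exponential f(l) = exp(-sum(l)), l > 0, via y = e^s.

    The numerator integrand becomes e^{-n s} f(l e^{-s}) phi(log eta + s/k), up to a
    factor that cancels in the ratio; the denominator has one extra factor e^{-s}.
    """
    n = len(l)
    total = sum(l)
    center = math.log(total / n)
    h = 2.0 * half_width / m
    nodes = [center - half_width + i * h for i in range(m + 1)]
    logs = [-n * s - total * math.exp(-s) - 0.5 * (math.log(eta) + s / k) ** 2
            for s in nodes]
    top = max(logs)
    dens = [(0.5 if i in (0, m) else 1.0) * math.exp(v - top)
            for i, v in enumerate(logs)]
    return sum(dens) / sum(d * math.exp(-s) for s, d in zip(nodes, dens))


l = [0.5, 1.5, 2.0, 0.2, 0.8]  # hypothetical data; hat sigma_0(l) = mean(l) = 1.0
for k in (1.0, 10.0, 100.0, 1000.0):
    print(k, delta_star_scale(l, eta=2.0, k=k))  # approaches 1.0 as k grows
```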

###### Remark 3.2.

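In the same way, we can consider the estimation of $\sigma^c$ and obtain the corresponding result:

$$\hat\sigma_{0c}(\boldsymbol{x})=\frac{\int\sigma^{-n-1+c}f(\boldsymbol{x}/\sigma)\,\mathrm{d}\sigma}{\int\sigma^{-n-2+c}f(\boldsymbol{x}/\sigma)\,\mathrm{d}\sigma}$$

is minimax and best equivariant for estimating $\sigma^c$ under the entropy loss

$$L(\delta/\sigma^c)=\delta/\sigma^c-\log(\delta/\sigma^c)-1.$$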

### 3.3 Estimation of covariance matrix

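As mentioned at the beginning of this section, we use the so-called Stein (1956) loss function given by

$$L(\boldsymbol{\Theta}\boldsymbol{\delta}\boldsymbol{\Theta}^{\mathsf T})=\operatorname{tr}\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta}-\log|\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta}|-p=\operatorname{tr}(\boldsymbol{\Theta}\boldsymbol{\delta}\boldsymbol{\Theta}^{\mathsf T})-\log|\boldsymbol{\Theta}\boldsymbol{\delta}\boldsymbol{\Theta}^{\mathsf T}|-p. \tag{3.17}$$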

James and Stein (1961), in their Section 5, show that the best equivariant estimator is given by

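$$\boldsymbol{\delta}_0(\boldsymbol{T})=\boldsymbol{T}\boldsymbol{D}\boldsymbol{T}^{\mathsf T}, \tag{3.18}$$

where $\boldsymbol{T}$ is from the Cholesky decomposition $\boldsymbol{V}=\boldsymbol{T}\boldsymbol{T}^{\mathsf T}$ of $\boldsymbol{V}$ and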