# Robust and Efficient Empirical Bayes Confidence Intervals using γ-Divergence

Although parametric empirical Bayes confidence intervals of multiple normal means are fundamental tools for compound decision problems, their performance can be sensitive to the misspecification of the parametric prior distribution (typically normal distribution), especially when some strong signals are included. We suggest a simple modification of the standard confidence intervals such that the proposed interval is robust against misspecification of the prior distribution. Our main idea is using well-known Tweedie's formula with robust likelihood based on γ-divergence. An advantage of the new interval is that the interval lengths are always smaller than or equal to those of the parametric empirical Bayes confidence interval so that the new interval is efficient and robust. We prove asymptotic validity that the coverage probability of the proposed confidence intervals attain a nominal level even when the true underlying distribution of signals is contaminated, and the coverage accuracy is less sensitive to the contamination ratio. The numerical performance of the proposed method is demonstrated through simulation experiments and a real data application.

## Authors

• 8 publications
• 6 publications
• 26 publications
04/07/2020

### Robust Empirical Bayes Confidence Intervals

We construct robust empirical Bayes confidence intervals (EBCIs) in a no...
02/07/2019

### Bias-Aware Confidence Intervals for Empirical Bayes Analysis

In an empirical Bayes analysis, we use data from repeated sampling to im...
12/08/2019

### Improved Multiple Confidence Intervals via Thresholding Informed by Prior Information

Consider a statistical problem where a set of parameters are of interest...
11/16/2021

### Bayesian, frequentist and fiducial intervals for the difference between two binomial proportions

Estimating the difference between two binomial proportions will be inves...
05/10/2019

### Confidence intervals with maximal average power

In this paper, we propose a frequentist testing procedure that maintains...
11/15/2019

### Assessing the uncertainty in statistical evidence with the possibility of model misspecification using a non-parametric bootstrap

Empirical evidence, e.g. observed likelihood ratio, is an estimator of t...
06/17/2018

### Nonparametric Empirical Bayes Simultaneous Estimation for Multiple Variances

The shrinkage estimation has proven to be very useful when dealing with ...

## Code Repositories

### GDCI

Robust empirical Bayes confidence intervals using gamma-divergence

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Empirical Bayes techniques to increase estimation accuracy by “borrowing strength” of information each other have been widely adopted in many scientific fields such as small area estimation (Rao and Molina, 2015), meta-analysis (DerSimonian and Laird, 2015), disease risk estimate (Lawson, 2018) and gene expression analysis (Efron et al., 2001)

. Although there are now many available techniques of empirical Bayes estimation, effective ways for uncertainty quantification of the empirical Bayes estimates are not fully discussed, especially when the assumed parametric prior (latent) distribution is subject to misspecification. In particular, when representative outlying subjects (so-called “hotspot”) exist whose signals are extremely larger than the other subjects, the standard normality assumption for the prior distribution is severely misspecified. In this case, the standard parametric confidence interval is highly affected even when the proportion of such outliers is small and fails to provide an efficient inference result.

To overcome this issue, we employ Tweedie’s formula (e.g. Efron, 2011)

and robust divergence techniques to produce a novel robust empirical Bayes confidence interval when the underlying distribution is contaminated. Our basic consideration is that the form of marginal likelihood fully characterizes the standard parametric empirical confidence intervals; both posterior mean and variance are obtained from the derivatives of the marginal likelihood through Tweedie’s formula, and the model parameters are estimated by maximizing the marginal likelihood. This observation gives us a natural way to propose a robust confidence interval by replacing the marginal likelihood with robust alternatives. Specifically, we here employ

-divergence (Jones et al., 2001; Fujisawa and Eguchi, 2008) which is known to hold strong robustness against outliers and produces a closed-form expression of the objective function under normality. The -divergence has a tuning parameter that controls its robustness. We propose a simple selection method by minimizing the average of posterior variances, ensuring that the proposed interval produces shorter intervals than the standard parametric intervals. For the theoretical justification, we first prove that the selection method for is consistent in the sense that the probability of , corresponding to the standard parametric confidence interval, is asymptotically one when the underlying distribution is not contaminated. We also prove that the strictly positive value of is selected asymptotically when the underlying distribution is contaminated, indicating an adaptive property of the proposed interval. Further, we show that the coverage probability of the proposed interval is approximately equal to the nominal level even when the true underlying distribution is subject to contamination. The proposed method will also be numerically shown to provide reasonable confidence intervals whose empirical coverage is close to the nominal level under a wide range of scenarios for the underlying distribution of signals.

We here review some relevant works. Empirical Bayes confidence intervals that are robust against misspecification of the prior (random effect) distribution have been studied since Kiefer and Wolfowitz (1956). Koenker and Mizera (2014) and Efron et al. (2019) proposed semi-parametric techniques to estimate underlying prior distribution. More recently, Armstrong et al. (2020)

developed a robust empirical Bayes confidence interval using only higher moment information. In the context of small area estimation, there exist many works concerned with accurate and efficient empirical Bayes confidence intervals and comprehensive asymptotic theory

(e.g. Chatterjee et al., 2008; Hall and Maiti, 2006; Yoshimori and Lahiri, 2014). However, all the results are based on the assumption that the parametric prior distribution is correctly specified; thereby, our framework in which the true prior distribution is subject to contamination is completely different from the conventional one. Hence, we need to develop a new theory for the proposed method. Tweedie’s formula is known to be a useful tool to characterize posterior moments (e.g. De Luca et al., 2021), and robust empirical Bayes techniques based on Tweedie’s formula has been considered in Gu and Koenker (2017), Liu et al. (2020), and Sugasawa (2020). Nevertheless, there have been no attempt to construct a robust empirical Bayes confidence interval using Tweedie’s formula.

This paper is organized as follows. In Section 2, we introduce new confidence intervals through Tweedie’s formula and -divergence. In Section 3, we provide asymptotic theory regarding the selection performance of the tuning parameter and coverage probability when the underlying distribution is contaminated. The numerical performance of the proposed methods is demonstrated through simulation studies and data analysis in Section 4 and 5, respectively. Conclusions and discussions are given in Section 6. All the technical proofs are given in the Appendix. The R code implementing the proposed method is available at GitHub repository (https://github.com/sshonosuke/GDCI).

## 2 Robust and efficient empirical Bayes confidence intervals

### 2.1 Two-stage normal models and the standard parametric confidence intervals

Suppose we observe independent normally distributed estimates

 yi|θi,σi∼N(θi,σ2i),    i=1,…,n, (1)

of the parameter vector

. Although is usually the estimated asymptotic variance of the estimate , we condition in what follows, following the existing works (e.g. Armstrong et al., 2020; Efron, 2016). Further, we often suppress the explicit dependency on if there is no confusion. A standard approach to improve the raw estimator is to assume that ’s are independent realizations from a normal population, namely,

 θi∼N(μ,τ2),    i=1,…,n, (2)

where and are unknown location and variance parameters, respectively. This setting results in a well-known two-stage normal model for .

Under the two-stage model, the posterior (conditional) distribution of given and the true parameters is , where

 ˜θi(μ,τ2)=yi−σ2iτ2+σ2i(yi−μ),      ˜s2i(μ,τ2)=τ2σ2iτ2+σ2i. (3)

Then, the empirical Bayes confidence interval for with nominal level is given by

 ˆCi,1−α=(ˆθi−zα/2ˆsi,ˆθi+zα/2ˆsi), (4)

where and , and is the upper

-quantile of the standard normal distribution. Here

and are estimators of and based on the observations ’s. The most standard approach to estimating these unknown parameters is to use the marginal likelihood of , given by

 L(μ,τ2)=−12n∑i=1log{2π(τ2+σ2i)}−12n∑i=1(yi−μ)2τ2+σ2i. (5)

The maximum marginal likelihood estimators of and are defined as the maximizer of (5). When the assumed normal prior (2) is approximately correct, the standard interval (4) provides an efficient interval estimation whose interval length is shorter than that of the crude confidence interval

 ˆCci,1−α=(yi−zα/2σi,yi+zα/2σi) (6)

based on the first stage model (1) (e.g. Morris, 1983).

However, when the true distribution for is contaminated, that is, there exist some representative outliers that cannot be captured by the assumed normal distribution (2), the standard parametric interval (4) is no more reasonable. We first demonstrate that the standard interval (4) is approaching to the crude interval (6) when the outlier location gets more extreme. To simplify the situation, we consider a situation that the absolute value of the first observation goes to infinity, namely, , while other observed values are fixed. Such framework is widely adopted in signal estimation (e.g. Carvalho et al., 2010) to investigate the robustness of shrinkage estimation. Under the framework, we prove the following property:

###### Proposition 1.

Assume that is fixed and let and be the maximum likelihood estimator based on the marginal likelihood (5). Then as , it holds that and for all , so the standard confidence interval (4) approaches to the crude confidence interval (6).

Proposition 1 implies that if there are some outliers in the observations, every standard empirical Bayes confidence intervals are affected by the outliers, and the intervals are close to the crude confidence intervals for all . This means that the resulting confidence intervals are inefficient by failing “borrowing strength” from the information of other subjects. This leads to poor performance of the confidence intervals infinite sample, as we can see in our simulation study. Our aim is to establish a new confidence interval that can successfully borrow information of other information among non-outlying subjects to produce an efficient interval.

### 2.2 New robust confidence intervals

We first note that the standard empirical Bayes confidence interval (4) is characterized by the posterior mean and variance of , and Tweedie’s formula (Efron, 2011) admit the following expressions based on the marginal distribution:

 ˜θi=yi+σ2iddyilogm(yi),      ˜s2i=σ2i+σ4id2dy2ilogm(yi), (7)

where is the marginal density of . Moreover, the estimator and is obtained through the marginal distribution of , which indicates that the marginal likelihood of completely characterizes the parametric interval (4).

When the distribution of is misspecified, the marginal distribution of is also misspecified, and this would be the main reason for the undesirable property of the standard interval (4) as demonstrated in Proposition 1. To overcome the problem, our proposal is very simple; we replace the marginal log-likelihood with a robust alternative against outliers. Specifically, we use an objective function based on -divergence (Jones et al., 2001; Fujisawa and Eguchi, 2008) given by

 Dγ(yi;ψ)≡1γϕ(yi;μ,τ2+σ2i)γ{∫ϕ(t;μ,τ2+σ2i)1+γdt}−γ/(1+γ)=1γϕ(yi;μ,τ2+σ2i)γcγ(τ2), (8)

where and is the density function of the normal distribution and is a tuning parameter related to the robustness for which a selection procedure will be discussed later. Note that , that is, the objective function is very similar the marginal log-likelihood under small . Hence, the objective function (8) is regarded as a natural generalization of the marginal log-likelihood.

Our main idea is to replace with in the Tweedie’s formula (7) to obtain robust posterior mean and variance. A straightforward calculation shows that

 ˜θ(γ)i≡yi+σ2iddyiDγ(yi;ψ)=yi−wγ(yi)σ2iτ2+σ2i(yi−μ),˜s2(γ)i≡σ2i+σ4id2dy2iDγ(yi;ψ)=σ2i+wγ(yi)σ4i(τ2+σ2i)2{γ(yi−μ)2−(τ2+σ2i)}, (9)

where . It can be easily checked that and under , so that the expression in (7) can be seen as robust generalization of (3). For the unknown parameters , we define the robust estimator as

 ˆψ(γ)=argmaxψn∑i=1Dγ(yi;ψ), (10)

where is given in (8). Note that reduces to the maximum likelihood estimator as the maximizer of (5) as . Since the above objective function is an analytic form of , it can be easily optimized, for example, by the Newton-Raphson algorithm. Then, we proposed a robust empirical Bayes confidence interval as

 ˆC(γ)i,1−α=(ˆθ(γ)i−zα/2ˆs(γ)i,ˆθ(γ)i+zα/2ˆs(γ)i), (11)

here and are obtained by replacing with in and , respectively. A notable property of the new interval (11) is that not only but depends on the observed value , which results in the following quite different property from one given in Proposition 1.

###### Proposition 2.

Let . Assume that is fixed. As , it holds that and while and for , where and are defined by replacing with , which is the maximizer of .

Proposition 2 implies two adaptive properties of the proposed confidence interval. For non-outlying signals, the proposed intervals produce efficient intervals by borrowing the strength of information from other non-outlying signals. In other word, the proposed interval can automatically identify the outlying subjects and successfully eliminate such data to provide a stable interval for non-outlying observations. The proposed interval reduces to the crude interval for outlying signals, which makes sense because there would be no information to borrow strength from non-outlying observations. Such property can be recognized as an extension of “tail robustness” (Carvalho et al., 2010) for the shrinkage estimation.

### 2.3 Tuning parameter selection

The choice of in (11) controls the trade-off between statistical efficiency without outliers and robustness against outliers. We here propose selecting a suitable value of to obtain intervals with minimum posterior variances. Let be a set of candidate values for , where . Then, we propose selecting minimizing the total robust variances, that is, we define the optimal as

 γop=argminγ∈Γn∑i=1aiˆs2(γ)i, (12)

where is a fixed weight, for example, . We set in our theoretical argument for simplicity, but the extension is quite straightforward.

### 2.4 Some extensions

We here discuss two important extensions of the proposed method. The first issue is that we have some auxiliary vector associated with . In this case, the prior distribution will be changed to , and all the results can be rewritten by replacing with . The asymptotic properties that will be developed in the subsequent section still hold as long as standard conditions for (e.g. Chatterjee et al., 2008; Yoshimori and Lahiri, 2014) are satisfied, and the dimension of is fixed. An example with auxiliary information is considered in Section 5.

The second issue is related to the zero-estimate of under which empirical Bayes confidence intervals cannot be obtained because of the zero-estimate of the posterior variance. To avoid such undesirable estimates, we can employ the adjustment maximum likelihood (Li and Lahiri, 2010). This can be easily implemented by adding to the objective function given in (10). Since the contribution of the adjustment term is negligible under , this modification does not affect the asymptotic properties given in the subsequent section.

## 3 Theoretical properties

In this section, we provide the asymptotic properties of the proposed method. In particular, we reveal selection performance for based on (12) and asymptotic coverage probability of the robust confidence intervals. For theoretical development, we consider two scenarios where the underlying distribution of is correctly specified or contaminated. We first discuss the former case (correctly specified case) in Section 3.1, corresponding to the justification under the conventional settings (e.g. Morris, 1983), and consider the latter case (contamination case) in Section 3.2, which is the first attempt to discuss coverage probability under contamination. Throughout this section, we assume that the first stage model (1) is correctly specified, that is, contamination does not occur in the first stage. This situation is very likely in practice because is an estimator or summary statistics of some quantities, and the normality assumption is approximately true. Throughout this section, we assume the following regularity conditions:

###### Assumption 1.

(Regularity conditions)

• is a sequence of independent and identically distributed random variables and there exist positive constants

and such that .

• is a compact parameter space of , where is the true parameter, and .

The uniform boundedness of in (C1) is widely adopted in this context (e.g. Armstrong et al., 2020; Yoshimori and Lahiri, 2014). The assumption that is a random variable in (C1) reflects the practical situation where ’s are estimated variance of ’s. The latter part in (C2) is satisfied when the lower bound of is not small. For example, if can be regarded -score of some estimand of interest, then for all and the condition is clearly satisfied.

### 3.1 Properties under correctly specified case

###### Theorem 1.

Suppose that , that is, the model (2) is correctly specified. Then, under Assumption 1, it holds that as .

Theorem 1 indicates that if the distribution of is correctly specified, namely, , then converge in probability to as

. This implies that our method is asymptotically consistent with the standard parametric empirical Bayes confidence intervals when the observations are not contaminated with some (non-Gaussian) distribution or do not include outliers.

We next consider asymptotic coverage probabilities of the proposed interval when the assumed model is correctly specified.

###### Theorem 2.

Suppose that . Then, under Assumption 1, we have

 P(θi∈ˆC(γop)i,1−α)=1−α+o(1) as n→∞.

Theorem 2 implies that if the data generating process is correctly specified, then our robust empirical Bayes confidence intervals using selected via a data-dependent manner using (12) have asymptotically correct coverage probabilities.

###### Remark 1.

Let be i.i.d. standard normal random variables, and for , let satisfy . Define by replacing with . If we can take for some , then applying almost the same argument in the proof of Theorem 2, we can also show as .

###### Remark 2.

Although we have not shown in detail, the coverage accuracy of the proposed interval is roughly , which can be improved by using, for instance, parametric bootstrap (e.g. Chatterjee et al., 2008; Hall and Maiti, 2006) to make the coverage error when is not very large.

### 3.2 Properties under contaminated case

We next consider a contaminated case. Suppose that are drawn from the density given by

 fψ∗(θi) =(1−ω)ϕ(θi;μ∗,τ2∗)+ωδ(θi).

where is a contamination ratio and is a contamination distribution. We define the following quantity:

 ρ ≡maxk∈{0,1,2}supσi∈[σL,σU]∫fδ(yi)ϕ(yi;μ∗,τ2∗+σ2i)γ|yi−μ∗|kdyi≡ρ(ψ∗,γ,δ), (13)

where

 fδ(yi)=∫ϕ(yi;θi,σ2i)δ(θi)dθi

is the marginal distribution of when . Note that (13) measures separability between the genuine distribution and contamination distribution ; is almost 0 when the two distributions are well separated, that is, the density mostly lies on the tail of the density . Let be the expectation with respect to the model that are drawn from the density , and let and denote the expectation with respect to and .

Define . Assume that and let be a root of , and let be a root of the estimating equation . As a result of Lemmas A4 and A5 in Appendix A, we can show

 ˆψ(γ)n =ψ(γ)†+Op(n−1/2)=ψ∗+O(ωρ)+Op(n−1/2).

This enables us to establish the results on the asymptotic properties of (Theorem 3) and the proposed confidence intervals (Theorem 4).

We first show the selection performance of .

###### Theorem 3.

Define

 Q(γ;ψ):=Eψ∗[wγ(yi)σ4i(τ2+σ2i)2{γ(yi−μ)2−(τ2+σ2i)}].

Assume that with , , , and . Further, suppose that Conditions (a)-(e) in Lemma A4 hold and Assumption 1 is satisfied. If there exists such that and is sufficiently small, then we have as .

From Theorem 3, the probability that a positive value is selected approaches 1 as if the true distribution of is misspecified. The condition implies that if is small, then should be close to . Practically, it is sufficient to take the number of candidates in sufficiently large. In order to make sufficiently small when is close to , the distribution should be well-separated from . Note that the condition is also related to the separability.

Next, we provide the asymptotic property of the proposed confidence intervals. In the following argument, we set to indicate that depends on .

###### Theorem 4.

Suppose that Conditions (a)-(e) in Lemma A4 hold and Assumption 1 is satisfied. Let . Assume that , as for some and . Then for , we have

 P(θi∈ˆC(γ)i,1−α)=1−α+O((ωρψ∗)1/4+γ) as n→∞.

The term in the approximation error of the confidence interval comes from the bias for estimating under the contaminated model . If is small, that is, the contamination density mostly lies on the tail of the density , then the term can be small even if is not small. In this case, the observations could be heavily contaminated. When is not small, can be small if the contamination ratio is small. The term reflects the effect of robustification using the -divergence for constructing instead of , the standard parametric empirical Bayes confidence interval. Hence, the term is likely to be positive since the posterior variance increases with (when ) and the resulting interval has wider length than the parametric interval. This means that the term is thought to work in the direction of increasing the coverage probability. Moreover, we will demonstrate that the effect of such coverage error is very limited in our simulation studies as long as the genuine distribution and contamination distribution are well separated.

## 4 Simulation study

We compare the performance of the proposed interval estimation with some existing methods through simulation studies. Throughout this study, we generate observations from with , where is the true signal. For the underlying generating process for , we adopted the following four scenarios:

 (i)  θi∼N(0,V),         (ii)  θi∼0.95N(0,V)+0.05N(7,V), (iii)  θi∼LN(0,V),       (% iv)  θi∼0.9N(0,(0.2)2)+0.1N(10V,(0.2)2),

where

denotes a log-normal distribution with log-mean

and log-variance . In each scenario, four values of are considered, , following Koenker (2020). Remark that scenario (i) corresponds to the situation where there are no outliers, and the assumed model is correctly specified to check the efficiency loss of the proposed method compared with the standard empirical Bayes confidence intervals. On the other hand, the underlying distribution of is quite different from a normal distribution in the other scenarios. In particular, signals in scenarios (iii) and (iv) produce some outlying values, which would make the standard empirical Bayes approach inefficient as the crude interval.

Based on the simulated dataset, we computed confidence intervals of using the proposed robust empirical Bayes confidence interval using -divergence (GD) and the standard parametric empirical Bayes confidence interval (EB). We also employed confidence intervals via empirical Bayes deconvolution by Efron (2016) and recently proposed robust confidence intervals by Armstrong et al. (2020), which are denoted by EF and AKM, respectively. We implemented AKM by using R packages “ebci” (Kolesar, 2021). Using 1000 simulated datasets, we computed coverage probability (CP) and average length (AL) of confidence intervals, averaged over all the subjects. The results under are shown in Table 1, and additional simulation results are provided in the Appendix. The averaged value of the selected in the proposed method is also reported in Table 1.

It is observed that the proposed GD interval provides CP around the nominal level (95%) and smaller AL than EB and AKM in almost all the scenarios, clearly showing the efficiency and robustness of GD. In scenario (i), the average value of is , so that the proposed GD method does not lose any efficiency when the assumed distribution is correct, which is consistent with our theoretical result given in Theorem 1. On the other hand, once the underlying distribution of is misspecified or contains outlying values, the optimal value of tends to be strictly positive, as proved in Theorem 3. In most scenarios, it is also observed that AKM tends to be inefficient, even more than EB. On the other hand, EF produces a very efficient confidence interval under a relatively simple underlying structure like scenario (ii), but its CP can be small under some scenarios such as (i) and (iv).

## 5 Example: crime count in Tokyo

We demonstrate the proposed methods using a dataset of the number of police-recorded crimes in the Tokyo metropolitan area, obtained from the University of Tsukuba and publicly available online (“GIS database of the number of police-recorded crime at O-aza, chome in Tokyo, 2009-2017”, available at https://commons.sk.tsukuba.ac.jp/data_en). This study focuses on the number of violent crimes in local towns in the Tokyo metropolitan area. The number of violent crimes per year is observed for nine years (from 2009 to 2017) in each town. We first excluded areas where no violent crime has occurred in the nine years, resulting in areas used in the analysis. We first computed the average number of crimes per unit km (denoted by ) and its sampling variances (denoted by ) for . The dataset contains five area-level auxiliary information, entire population density (PD), day-time population density (DPD), the density of foreign people (FD), percentage of single-person households (SH), and average year of living (AYL). Letting be a six-dimensional vector of intercept and the five covariates, we employ the following model:

 yi|θi∼N(θi,σ2i),    θi∼N(x⊤iβ,τ2),    i=1,…,n,

where is the true average number of crimes in the th area.

We first applied the proposed robust interval with -divergence (GD), where we searched the optimal value of from

 γ∈{0,0.005,0.01,…,0.195,0.200}.

The selection criterion (12) with weight for each value of is shown in the left panel in Figure 1. We can observe that the criterion has a unique minimizer and found that (bounded away from 0), indicating that the dataset contains some outliers and the normality assumption for the distribution of seems to be violated. The estimated values of the regression coefficients and the variance parameter () based on the GD method with the optimal value of are reported in Table 2, where the estimates from the empirical Bayes (EB) method are also reported for comparison. From Table 2, it can be seen that the estimates of and in EB tend to be larger than those in GD, because of the existence of outliers. In particular, the estimate of in EB is 20 times larger than that in GD, and such a large value of leads to inefficient interval estimation. For comparison, we also considered advanced approaches to robust confidence intervals. Since most existing robust methods such as Efron et al. (2019) are not directly applicable under the existence of covariates, we here employed the AMK method by Armstrong et al. (2020) using a weight in the parameter estimation. The estimated parameters of AMK are relatively similar to those of GD, except for , as shown in Table 2.

We computed confidence intervals of based on the three methods (GD, EB, and AKM) and computed its length . To see the efficiency of the interval estimation, we calculated average length, , which are reported in Table 2. The result shows that GD provides the most efficient interval among the three methods, and the order of interval length is consistent with that of the estimates of . To see more details of the obtained confidence intervals, we computed ratios of the interval of the three methods to that of the crude maximum likelihood (ML) confidence interval, , which are shown in the right panel in Figure 2. The figure shows that the three methods tend to produce more efficient confidence intervals than ML. Especially, the proposed GD method produces the most efficient intervals when the observed values are moderate, i.e., . On the other hand, the interval lengths of EB and AKM in areas with large observed values (i.e.,

) are much smaller than those of ML and GD, while the lengths of GD are almost the same as those of ML. However, this phenomenon of excessively short lengths of EB and AKM is caused by so-called “over-shrinkage” rather than the efficiency of empirical Bayes methods. To see this, we show

confidence intervals in areas having the top 30 largest (i.e., outlying areas), in the left panel of Figure 2, which reveals the undesirable properties of both EB and AKM; not only the interval length but also the interval location of EB and AKM are much smaller than the observation . On the other hand, the confidence intervals of GD are almost identical to those of ML, owing to the robustness property given in Proposition 2, while there still exist efficiency gain through “borrowing strength” from other areas in non-outlying areas, as shown in the right panel in Figure 2. Finally, we remark that the computation time of GD, including the selection step for was about 8 seconds while AKM using the R package “ebci” takes about 50 seconds on a standard laptop computer. This shows the computational efficiency of GD under large-scale situations.

## 6 Conclusions and Discussion

In this work, we developed a novel robust empirical Bayes confidence interval when the assumed prior distribution is subject to contamination. The proposed method uses Tweedie’s formula to compute posterior mean and variance where the marginal likelihood is replaced with a robust alternative based on -divergence. Under some regularity conditions, we gave an asymptotic evaluation of the coverage probability of the proposed interval when the assumed prior is correctly specified or contaminated.

Although we employed the -divergence as a robust alternative of the standard likelihood, it would be possible to adopt other robust divergences such as density power divergence Basu et al. (1998) and -divergence (Cichocki et al., 2011). The main reason to use -divergence is its strong robustness against contamination; that is, the robust estimator converges to a quantity with a small (latent) bias against the true value, which was the basis of our theoretical development. However, it would be interesting to compare several robust divergences in interval estimation, which leaves an interesting future study.

## Acknowledgement

This work is partially supported by the Japan Society for Promotion of Science (KAKENHI) grant numbers 20K13468 and 21H00699.

Appendix

## Appendix A1 Proofs for Section 2

### a1.1 Proof of Proposition 1

Note that the maximum likelihood estimators and based on the marginal likelihood (5) solve the equations

 n∑i=1yiτ2+σ2i =μn∑i=11τ2+σ2i, (A1) n∑i=1(yi−μ)2(τ2+σ2i)2 =n∑i=11τ2+σ2i. (A2)

From (A1), we have

Replacing with in (A2), we have

 n∑i=1(yi−~μ)2(τ2+σ2i)2 =y21(τ2+σ21)2⎛⎝1−(n∑i=11τ2+σ2i)−1(1τ2+σ21)⎞⎠2 +y21(τ2+σ2i)2(n∑i=21(τ2+σ2i)2)(n∑i=11τ2+σ2i)−2 −2y1τ2+σ21(n∑i=11τ2+σ2i)−1n∑i=2yi(τ2+σ2i)2+n∑i=2y2i(τ2+σ2i)2 =y21(τ2+σ21)2⎧⎨⎩1−(n∑i=11τ2+σ2i)−2n∑i=11(τ2+σ2i)2−2(n∑i=11τ2+σ2i)−11(τ2+σ21)⎫⎬⎭ +O(|y1|τ4)

as . In this case, (A2) implies

 (n∑i=11τ2+σ2i)3 =y21(τ2+σ21)2⎧⎨⎩(n∑i=11τ2+σ2i)2−n∑i=11(τ2+σ2i)2−2(n∑i=11τ2+σ2i)1τ2+σ21⎫⎬⎭ +O(|y1|τ4)(n∑i=11τ2+σ21)2.

Since the left hand side is and the right hand side is , we have . Therefore,

 ˆθi−yi =−σ2iˆτ2+σ2i(yi−ˆμ) =−σ2iˆτ2+σ2i⎧⎨⎩yi−(n∑i=11τ2+σ2i)−1(y1τ2+σ21)(1+o(1))⎫⎬⎭→0

and <