## 1 Introduction

Bayesian meta-analysis, which synthesizes the evidence from several already published studies, is an indispensable tool for evidence-based medicine. Usually, Bayesian meta-analysis is based on a Bayesian normal-normal hierarchical model (NNHM) (friede_roever_wandel_neuenschwander_OrphanDiseases_2017; friede_roever_wandel_neuenschwander_TwoStudies_2017; roever_2018; bender_friede_koch_kuss_schlattmann_schwarzer_skipka_2018). This is the simplest Bayesian hierarchical model (BHM), but presents problems typical in more complex BHMs (vehtari_gelman_simpson_carpenter_buerkner_2021).

If different values of the parameters in a likelihood map to a single data model, then these values are indeterminate, or in other words non-identified, and are not estimable from the likelihood alone (poirier_98; gustafson_15; lewbel_2019). The Bayesian approach is known to mitigate this problem, because a likelihood combined with the proper priors for the parameters that are not likelihood identifiable still yields posterior estimates (kadane_1975; gelfand_sahu_99; eberly_carlin_00; gustafson_15). This raises a legitimate question: to what extent are Bayesian posterior parameter estimates determined by the data?

In classical statistical models, non-identified parameters in the likelihood are known to cause problems (Skrondal_RabeHesketh_2004; lele_dennis_lutscher_2007; lele_2010; lele_nadeem_schmuland_2010; solymos_2010; lewbel_2019). One method that detects non-identified parameters in classical statistical models is data cloning (lele_dennis_lutscher_2007; lele_2010; lele_nadeem_schmuland_2010; solymos_2010)

. Data cloning increases the total sample size by artificially replicating the data and focuses on the standard errors of the parameters. If the standard error of a parameter does not decrease at a rate of

with increasing sample size , such a parameter is deemed to be non-identified (lele_dennis_lutscher_2007; lele_2010; lele_nadeem_schmuland_2010; solymos_2010; gustafson_15).In contrast, non-identification of parameters is not a problem for complex BHMs, because proper priors lead to proper posteriors (gelfand_sahu_99; eberly_carlin_00; gustafson_15). However, it is unknown to what extent the posterior estimates of such parameters are determined by the data. In complex BHMs, it can happen that the posterior is not determined by the data at all but is completely determined by the prior. Because the detection of non-identified parameters in BHMs is very challenging, carlin_louis_1996 and eberly_carlin_00 suggest that one should avoid BHMs for which the identifiability issues have not been clarified. For this reason, a formal method applicable to Bayesian NNHM and BHMs is needed that can quantify the empirical determinacy of the parameters, thus answering the question of to what extent the posterior estimates are determined by the data.

Recently, roos_hunanyan_bakka_rue_2020 considered Bayesian NNHM and proposed a two-dimensional method for sensitivity assessment based on the second derivative of the Bhattacharyya coefficient (BC). In one dimension, they used a factor

to scale the within-study standard deviation provided by the data. This scaling perturbed the total number of individual patient data provided for Bayesian meta-analysis, which is related to data cloning

(lele_nadeem_schmuland_2010; solymos_2010). Alternatively, perturbation of the total number of observations could be expressed by a weighted likelihood with (bissiri_holmes_walker_2016; holmes_walker_2017). An open question remains whether the applicability of the scaling proposed by roos_hunanyan_bakka_rue_2020 could be generalized to weighting other likelihoods, or extended to general-purpose Bayesian estimation techniques such as the Bayesian numerical approximation INLA (rue_martino_chopin_09), and Bayesian MCMC sampling with JAGS (plummer_jags_16).We expect that likelihood weighting will affect both the location and the spread of the marginal posterior distributions. Because the method proposed by roos_hunanyan_bakka_rue_2020 quantifies the total impact of the scaling on the posteriors without focusing on location and spread, a refined method is needed. Ideally, such a refined method should quantify what proportion of the total impact of the data on the posterior affects the location and what proportion affects the spread.

To provide a refinement, we consider the second derivative of the BC induced by the weighted likelihood, which quantifies the total empirical determinacy (TED) for each parameter. Based on the properties of both the BC and the rules for the computation of the second derivative, we split TED into two parts: one for location (EDL) and one for the spread (EDS). This enables the quantification of pEDL, the proportion of EDL to TED, and pEDS, the proportion of EDS to TED. This method is implemented in the R package ed4bhm and works with JAGS and INLA. The method proposed quantifies how much weighted data impact posterior parameter estimates and whether location or spread is more affected by the weighting.

To demonstrate how the proposed method works in applications, we consider two case studies and one simulation study. We use INLA and JAGS for the estimation. We consider both centered and non-centered parametrizations for Bayesian NNHM and demonstrate how TED, pEDL, and pEDS depend on the amount of pooling induced by the heterogeneity prior. Moreover, we consider both the Bayesian NNHM applied to

and a logit model. To demonstrate that the proposed method is useful in applications, for clarifying the empirical determinacy of parameters, we consider three variants of the NNHM combined with different prior assumptions in a simulation study.

In Section 2, we review the theoretical background behind our method and its implementation through R package ed4bhm. In Section 3, we review the case studies and the models assumed. Then, in Section 4 we present the results which demonstrate the performance of the measures TED, pEDL and pEDS in applications. We conclude with a discussion in Section 5.

## 2 Methods

In this section, we review the Bayesian NNHM, present the idea of weighting the likelihood, and define the TED measure, which quantifies the total empirical determinacy of posterior parameters. Moreover, we define the proportion of empirical determinacy for posterior location (pEDL) and spread (pEDS). In addition, we specify the implementation of TED, pEDL and pEDS in INLA and JAGS. Finally, a short description of the R package ed4bhm is provided.

### 2.1 Bayesian meta-analysis

The Bayesian normal-normal hierarchical model (NNHM) consists of three levels: the sampling model, the latent random-effects field, and the priors. We consider the Bayesian NNHM with data , within-study standard deviations and between-study standard deviation

(1) |

In this model, are fixed, is the overall effect, is the heterogeneity, and are the random effects. For case studies, we assume a weakly informative normal prior and either a half-normal (HN) or a half-Cauchy (HC) prior . We use HN and HC with domain on

that emerge after taking absolute value of normal and Cauchy distributions located at 0 with domain on

. We estimate marginal posterior distributions for all the parameters with both INLA (rue_martino_chopin_09) and JAGS (plummer_jags_16).### 2.2 Likelihood weighting

Likelihood weighting is applicable to BHMs with various primary outcomes. A base model is the model without weighting (). Posterior distribution in the base model can be obtained as

(2) |

where is the likelihood, is the latent field, is the assumed prior distribution, and is the set of all the parameters in the model except the random effects .

The posterior distribution in a model with weighted likelihood can be written as

(3) |

Note that, gives more weight to the likelihood (corresponds to the case of having more data) and gives less weight to the likelihood (corresponds to the case of having less data). For example, for the Bayesian NNHM given by (1), the posterior distribution from a weighted model reads as

where are random effects.

### 2.3 Affinity measure Bhattacharyya coefficient

Following roos_hunanyan_bakka_rue_2020, we consider the Bhattacharyya coefficient (BC) (bhattacharyya_43)

between two probability density functions, which is given by

(4) |

where . BC attains values in [0, 1]. Moreover, BC is 1 if and only if the two densities completely overlap and is 0 when the domains of the two densities have no overlap. BC has convenient numerical properties and is invariant under any one-to-one transformation (for example, logarithmic) (jeffreys_61; roos_held_11; roos_martins_held_rue_15; roos_hunanyan_bakka_rue_2020). Furthermore, BC is directly connected to the Hellinger distance (H), by .

For two normal densities, BC defined in Equation (4) reads

(5) |

where is the density of a distribution with

. In general, any density can be approximated to the first order by a normal distribution

(johnson_kotz_balakrishnan_94; hjort_glad_95). Following roos_hunanyan_bakka_rue_2020, we approximate the marginal posterior distribution for byand apply moment matching together with the Equation (

5) to obtain(6) |

For posteriors of precision parameters, we apply this approximation to log-transformed posteriors. This approach is justified by the invariance of BC.

### 2.4 Quantification of the total empirical determinacy

Following roos_hunanyan_bakka_rue_2020, we define the TED() measure as the negative second derivative of the Bhattacharyya coefficient (BC) between the base and weighted posterior distributions for each parameter of the model. Here,

(7) |

where is the marginal posterior distribution from the model with weighted likelihood with . By , TED evaluates the second derivative of the squared Hellinger distance . Note that, denotes the base model with the original likelihood, which induces the base marginal posterior density . We quantify the total empirical determinacy (TED) of the marginal posterior of the parameter. TED can be used to compare the empirical determinacy between different parameters and by computing a ratio TED/TED (roos_hunanyan_bakka_rue_2020).

Note that TED in Equation (7) measures the impact of likelihood weighting on the whole marginal posterior distribution. We refine this approach and consider the impact of likelihood weighting on location (L) and spread (S) of the marginal posterior distribution in the next section.

### 2.5 Empirical determinacy of location and spread

In this section, we show how to quantify the impact of likelihood weighting on posterior location (L) and spread (S). This gives rise to empirical determinacy of location (EDL) and spread (EDS).

Given the BC between two normal densities in Equation (5), we consider , where

and

Equations (2)–(5) of the Supplementary Material show that

(8) |

which is equivalent to

(9) |

where

(10) |

and

(11) |

for . Equation (9) facilitates computation of the proportion of the empirical determinacy of the location (pEDL), which is the proportion of EDL to TED, and the empirical determinacy of the spread (pEDS), which is the proportion of the EDS to TED (Equation (12))

(12) |

where .

### 2.6 Computational issues

In practice, we approximate the derivatives numerically by the second-order central difference quotient formula. Note that . Thus, for Equation (7), we obtain

(13) | ||||

(14) |

For computations, we consider weights , with . For precisions, we conduct all computations on the logarithmic scale.

#### 2.6.1 Implementation of TED in INLA

In order to implement in INLA, we use the INLA object from the base model (). The function inla.rerun fits an INLA model with weighted likelihood. We use this function for and . Moreover, we extract the summary statistics (mean and standard deviation) of marginal posterior distributions directly from INLA objects. For log-precision, we extract the estimates on internal scale provided in the INLA object.

#### 2.6.2 Implementation of TED in JAGS

For JAGS, we derive a general formula for weighting the likelihood in BHMs fit by MCMC sampling. The posterior distributions for the base and weighted models are given by Equations (2) and (3). From Equation (2) we obtain

(15) |

We then plug in the formula (15) into (3) to get

(16) |

Then,

(17) |

To estimate the posterior from a weighted model without re-running MCMC sampling we use Equation (17). To evaluate the likelihood values in JAGS, we activate the dic module, which monitors

(18) |

where . Hence,

(19) |

The MCMC samples from and values can be extracted from the JAGS object provided by JAGS. To estimate the mean and standard deviation of , we compute

(20) |

and

(21) |

where = (held_sabanesbove_2020). For log-precisions, we take the logarithmic transformation of the MCMC sample for the precision parameter and apply Equations (20) and (21) on that transformed sample.

### 2.7 Relative latent model complexity

In the Bayesian NNHM, the interplay between the within-study standard deviation values provided by the data from studies and the between-study standard deviation can be characterized by the effective number of parameters in the model (spiegelhalter_best_carlin_linde_02). Like roos_hunanyan_bakka_rue_2020, we consider the standardized ratio, the relative latent model complexity (RLMC)

(22) |

RLMC defined in Equation (22) attains values between 0 and 1 and expresses the amount of pooling induced by a heterogeneity prior (gelman_hill_07).

In Section 4.2, a grid of scale parameter values (4.1, 10.4, 18, 31.2, 78.4) for the heterogeneity prior

induced by the grid of RLMC values fixed at 0.05, 0.25, 0.5, 0.75, 0.95. Whereas HN(4.1), which corresponds to RLMC=0.05, assigns much probability mass to

values close to 0 and induces much pooling, HN(78.4), which corresponds to RLMC=0.95, assigns less probability mass to values of close to 0 and induces little pooling.### 2.8 Bayesian computation and convergence diagnostics

The MCMC simulations performed in this paper are based on four parallel chains, with a length of iterations. In each chain, we removed the first half of iterations as a burn-in period and from the remaining iterations we kept every 100th iteration in each of the four chains. Our choice of these parameters was guided by raftery.diag for Model C1 (Section 4.4) and supported by vehtari_gelman_simpson_carpenter_buerkner_2021.

To assess the convergence to a stationary distribution, we applied Convergence Diagnostics and Output Analysis implemented in the package coda (plummer_best_cowles_vines_96). Moreover, we implemented the rank plots proposed by vehtari_gelman_simpson_carpenter_buerkner_2021, which are histograms of posterior draws ranked over all chains and plotted separately for each chain. Nearly uniform rank plots for each chain indicate good mixing of chains. In addition, we used the function n.eff from the package stableGR (vats_knudson_2021)

, which calculates the effective sample size using the lugsail variance estimators and determines whether Markov chains have converged.

### 2.9 The R package ed4bhm

The open-access R package ed4bhm *Empirical determinacy for Bayesian hierarchical models* (https://github.com/hunansona/ed4bhm) uses the proposed method to quantify the empirical determinacy of BHMs implemented in INLA and in JAGS. The two main functions in this package are called ed.inla and ed.jags. These functions were used to generate the results reported in Sections 4.1 – 4.4, and in Sections 3 – 4 and 7 of the Supplementary Material.

## 3 Data and models

In this section, we review data and models for two case studies. Moreover, we review the design of a simulation study.

### 3.1 Eight schools

To quantify the effect of a coaching program on SAT-V (Scholastic Aptitude Test-Verbal) scores in eight high schools (Table 3 on page 18 of the Supplementary Material), data from a randomized study was pre-analyzed and used for a Bayesian meta-analysis. The data from these eight schools has been used to study the performance of the Bayesian NNHM and to demonstrate typical issues which arise for BHMs (gelman_hill_07; gelman_carlin_stern_dunson_vehtari_rubin_14; vehtari_gelman_simpson_carpenter_buerkner_2021).

We consider two parametrizations of the Bayesian NNHM: the centered and non-centered parametrizations (vehtari_gelman_simpson_carpenter_buerkner_2021). The model with centered parametrization is defined as

(23) |

where . This parametrization is used for both INLA and JAGS. On the other hand, the model with the non-centered parametrization (gelman_carlin_stern_dunson_vehtari_rubin_14; vehtari_gelman_simpson_carpenter_buerkner_2021) reads

(24) |

for , which we implemented in JAGS.

### 3.2 Respiratory tract infections

In this section, we review the meta-analysis data set including 22 randomized, controlled clinical trials on the prevention of respiratory tract infections (RTI) by selective decontamination of the digestive tract in intensive care unit patients (bodnar_etal_17) that are presented in Table 4 on page 19 of the Supplementary Material.

For the RTI data set, we consider two different models. First, a Bayesian NNHM model

(25) |

and second, a binomial model with logistic transformation

(26) |

where

is a stacked vector attaining value 1 for the treatment and 0 for the control group,

, , where f.t, f.c, n.t and f.c are vectors of length 22 (see in Table 4 of the Supplementary Material). Originally, the weakly informative HC(1) prior assumption for was suggested by bodnar_etal_17. ott_plummer_roos_2020 suggested a prior N(0, 16) for and used the HC(1) prior for . We assume priors for and similar to the prior for .### 3.3 Simulation study

Our simulation study extends the original simulation suggested by solymos_2010. We simulate a sample of random observations of size under NNHM with parameters . See Section 7 of the Supplementary Material for further details.

For the analysis of the simulated data, we use three types of models A, B, and C. Model A is a normal model that does not assume random effects. Models B and C assume a Bayesian NNHM with random effects. Model B assumes that the within-study standard deviation is known and assigns a prior to the between-study standard deviation . This defines the usual Bayesian NNHM. In contrast, model C assigns priors to both the within-study () and between-study () standard deviations. Note that the parameters and in model C are non-identified, because only the sum of the within-study and between-study variances is identified by the likelihood (bayarri_berger_04; lele_2010; solymos_2010; gelman_carlin_stern_dunson_vehtari_rubin_14).

For the parameter , all models A, B, and C assume a zero mean normal prior with variance fixed at . For model B, three different priors are assigned to the between-study standard deviation: exp(N(0, )) for B1 (solymos_2010), SqrtIG(0.001, 0.001) for B2, and SqrtIG(4, 1) for B3. For model C3, a SqrtIG(150, 6) prior is assigned to the within-study standard deviation . Models C1 and C3 assume identical priors for and . Table 5 on page 22 of the Supplementary Material reports the properties of all priors assumed for the standard deviations. Whereas exp(N(0, )) and SqrtIG(0.001, 0.001) show very large medians, SqrtIG(4, 1) and SqrtIG(150, 6) show medians close to both true parameters and chosen for the simulation.

## 4 Results

### 4.1 Motivating examples

Sections 3 and 4 of the Supplementary Material consider two motivating examples. Both examples use a normal likelihood with identified parameters. One example considers the posterior of the mean. The other example focuses on the posterior of the precision. Because the parameters in the likelihoods of both motivating examples are identified, the rate of the decrease of the sample size is close to , which matches the result provided by data cloning (lele_dennis_lutscher_2007; lele_2010; lele_nadeem_schmuland_2010; solymos_2010). In both motivating examples, the estimates of the pEDS are close to 1. This indicates that likelihood weighting affects mostly the spread of the parameters. In fact, both examples motivate that the rate used by cloning translates into the properties of the pEDS measure.

### 4.2 Eight schools example

param | method | ESS | mean | sd | q0.025 | q0.5 | q0.975 | TED | EDL | EDS | pEDL | pEDS |

INLA | 3.58 | 2.95 | -2.24 | 3.58 | 9.34 | 0.11 | 0.09 | 0.02 | 0.82 | 0.18 | ||

JAGSc | 1.5e+04 | 3.58 | 2.94 | -2.30 | 3.56 | 9.36 | 0.10 | 0.08 | 0.02 | 0.82 | 0.18 | |

JAGSnc | 1.8e+04 | 3.59 | 2.96 | -2.18 | 3.56 | 9.36 | 0.11 | 0.09 | 0.02 | 0.83 | 0.17 | |

INLA | -1.60 | 2.18 | -4.49 | -2.07 | 4.00 | 7e-04 | 7e-04 | 3e-05 | 0.96 | 0.04 | ||

JAGSc | 1.6e+04 | -1.60 | 2.22 | -4.48 | -2.07 | 4.06 | 7e-04 | 5e-04 | 2e-04 | 0.69 | 0.31 | |

JAGSnc | 1.6e+04 | -1.59 | 2.18 | -4.44 | -2.07 | 4.06 | 6e-04 | 5e-04 | 7e-05 | 0.88 | 0.12 |

We analyze the data from the eight schools (Table 3 on page 18 of the Supplementary Material) with INLA and JAGS and show the posterior descriptive statistics and estimates of the empirical determinacy for the parameters

and in Table 1 on page 1. For JAGS, both the centered (JAGSc, Equation (23)) and non-centered (JAGSnc, Equation (24)) parametrizations showed high values of ESS. Although the values of the posterior descriptive statistics provided by INLA, JAGSc, and JAGSnc are similar, it is unknown to what extent the posteriors of and are impacted by the data.This issue is addressed by the estimates of both the total empirical determinacy (TED) and the within-parameter empirical determinacy (EDL and EDS). For example, the values of TED() are larger than those of TED(). This indicates that the data impacts the posterior of more than the posterior of . pEDL and pEDS provide further details and demonstrate that for both and the location of the marginal posterior distribution is more impacted by the data than its spread. Whereas the estimates of pEDS() provided by INLA, JAGSc and JAGSnc are close to 0.18, those of pEDS() differ depending on the centered (JAGSc and INLA) and non-centered (JAGSnc) parametrization. This result indicates that the parametrization used for MCMC sampling may affect the impact of the data on the parameter estimates.

Figure 1 puts the results of Table 1 on page 1 in a broader context and shows a transition phase plot for the data of the eight schools (Table 3 on page 18 of the Supplementary Material) fit by INLA, JAGSc, and JAGSnc across a grid of HN heterogeneity priors with scale parameters equal to 4.1, 10.4, 18, 31.2, 78.4. As explained in Section 2.7, this grid specifies an RLMC grid (0.05, 0.25, 0.5, 0.75, 0.95) of pooling induced by the heterogeneity prior. The top panel of Figure 1 demonstrates that TED is always larger for (red) than for (blue) across both the grid of RLMC values and the estimation techniques (INLA, JAGSc, and JAGSnc). Independently of the amount of pooling induced by the heterogeneity prior, the posterior of is more determined by the data than is the posterior of .

The proportion of the estimates of the within-parameter empirical determinacy (pEDS) for the scale of (middle panel) and (bottom panel) of Figure 1 provide more insight. For HN heterogeneity priors with values of the scale parameter equal to 4.1, 10.4, 18, 31.2, 78.4 and for three estimation techniques (INLA, JAGSc, JAGSnc), the values of pEDS() remain at a low level of at most 20%. This indicates that the data determine the spread of the posterior of less than its location. In contrast, the estimates of pEDS() highly depend on the amount of pooling induced by the heterogeneity prior. The estimates of pEDS() increase from 10% to 90% across the values of RLMC. This means that heterogeneity priors that induce little pooling lead to posteriors of with spreads more determined by the data than for heterogeneity priors that induce much pooling. Although the estimates of pEDS() are similar for the three estimation techniques (INLA, JAGSc, JAGSnc), the values of pEDS() depend on the estimation technique.

### 4.3 RTI example

param | method | model | ESS | mean | sd | q0.025 | q0.5 | q0.975 | TED | EDL | EDS | pEDL | pEDS |

NNHM | INLA | -1.29 | 0.21 | -1.74 | -1.28 | -0.91 | 0.39 | 0.39 | 7e-04 | 1.00 | 2e-03 | ||

JAGS | 1.6e+04 | -1.29 | 0.21 | -1.74 | -1.28 | -0.91 | 0.40 | 0.40 | 6e-04 | 1.00 | 2e-03 | ||

INLA | 0.79 | 0.57 | -0.27 | 0.76 | 2.00 | 1.04 | 0.78 | 0.27 | 0.74 | 0.26 | |||

JAGS | 1.6e+04 | 0.79 | 0.59 | -0.27 | 0.76 | 2.04 | 1.23 | 0.78 | 0.45 | 0.63 | 0.37 | ||

logit | INLA | -0.59 | 0.26 | -1.11 | -0.60 | -0.07 | 0.02 | 0.02 | 0.01 | 0.75 | 0.25 | ||

JAGS | 1e+04 | -0.60 | 0.26 | -1.11 | -0.60 | -0.08 | 0.02 | 0.01 | 6e-03 | 0.61 | 0.39 | ||

logit | INLA | -1.50 | 0.38 | -2.25 | -1.49 | -0.76 | 0.03 | 0.03 | 3e-03 | 0.90 | 0.10 | ||

JAGS | 1e+04 | -1.49 | 0.38 | -2.26 | -1.48 | -0.74 | 0.02 | 0.02 | 3e-03 | 0.89 | 0.11 | ||

INLA | -0.30 | 0.27 | -0.85 | -0.30 | 0.21 | 0.33 | 0.32 | 6e-03 | 0.98 | 0.02 | |||

JAGS | 1.6e+04 | -0.31 | 0.27 | -0.86 | -0.30 | 0.21 | 0.31 | 0.30 | 0.01 | 0.96 | 0.04 |

Table 2 on page 2 provides posterior descriptive statistics and estimates of the empirical determinacy for the RTI data (Table 4 of the Supplementary Material) fit by INLA and JAGS. We considered two different models: a Bayesian NNHM defined in Equation (25) with a normal primary outcome and a Bayesian logit model defined in Equation (26) with a binomial primary outcome. The MCMC chains provided by JAGS show high values of ESS (Table 2 on page 2). The marginal posteriors provided by INLA and JAGS match well and provide similar descriptive statistics (see Figures 6 and 7 of the Supplementary Material). The descriptive statistics for the parameter differ between the NNHM and logit models. For both the NNHM and logit models, the estimates of TED indicate that the posterior of is more impacted by the data than are the posteriors of , , and . NNHM provides lower values of pEDS() than the values of pEDS() provided by the logit model. This indicates that the structure of the primary outcome and the model can have a great impact on the empirical determinacy of the parameters.

### 4.4 Simulation: empirical determinacy

0.15 | 0.16 | 6e-08 | 1.00 | 0.14 | 0.86 | |||||||

0.15 | 0.15 | 5e-04 | 1.00 | 0.09 | 0.91 | |||||||

1e-06 | 0.21 | 0.00 | 1.00 | 0.89 | 0.11 | |||||||

7e-04 | 0.21 | 0.35 | 0.65 | 0.92 | 0.08 | |||||||

8e-06 | 0.24 | 0.00 | 1.00 | 0.90 | 0.10 | |||||||

3e-05 | 0.27 | 0.05 | 0.95 | 0.89 | 0.11 | |||||||

2e-04 | 0.13 | 6e-08 | 1.00 | 0.93 | 0.07 | |||||||

3e-03 | 0.12 | 0.05 | 0.95 | 0.95 | 0.05 | |||||||

0.45 | 3e+03 | 2e+03 | 6e-08 | 1.00 | 0.90 | 0.12 | 0.88 | 0.13 | ||||

182.48 | 1e+04 | 9e+03 | 0.70 | 0.25 | 0.57 | 0.65 | 0.2 | 0.92 | ||||

0.01 | 1e+04 | 1e+04 | 6e-08 | 1.00 | 0.83 | 0.37 | 0.94 | 0.12 | ||||

0.06 | 6e+02 | 2e+02 | 0.8 | 0.2 | 0.67 | 0.34 | 0.87 | 0.13 | ||||

3e-04 | 1.02 | 0.18 | 6e-08 | 1.00 | 1.00 | 0.00 | 0.92 | 0.08 | ||||

4e-03 | 1.13 | 0.2 | 1e-03 | 1.00 | 0.99 | 0.01 | 0.97 | 0.03 |

A | 16000.00 | 16000.00 | ||

B1 | 16000.00 | 16000.00 | ||

B2 | 16000.00 | 16000.00 | ||

B3 | 16000.00 | 16000.00 | ||

C1 | 78.65 | 2203.84 | 2826.11 | |

C2 | 15429.11 | 9314.01 | 9079.20 | |

C3 | 16000.00 | 16000.00 | 15678.21 |

Table 3 on page 3 provides estimates of the empirical determinacy for the simulation study described in Section 3.3, which considers three types of NNHM models (A, B and C), which are fit by INLA and JAGS. The results for JAGS are based on MCMC samples with ESS reported in Table 4 on page 4.

For model A, the estimates of TED() and TED() in Table 3 on page 3 are similar. Moreover, pEDS() and pEDS() are high. For example, pEDS demonstrates that the spread of the posterior of is highly impacted by the data.

For models B1, B2 and B3, the values of TED() are larger than the values of TED() in Table 3 on page 3 and indicate that the posterior of is more impacted by the data than is the posterior of

. Similarly to the phase transition in the example of the eight schools discussed in Section

4.2, the values of TED() depend on the heterogeneity prior. Moreover, the large estimates of pEDS() demonstrate that a large proportion of the change in the posterior distribution of is due to the change in spread rather than in location. In contrast, the low estimates of pEDS() indicate that the posterior spread is not much determined by the data.Models of type C assume priors on the between-study standard deviation and on the within-study standard deviation. The estimation of models C1, C2 and C3 is very challenging. For example, the ESS of MCMC samples for model C1 provided by JAGS (Table 4 on page 4) are very small. Although the parameters and are non-identified in models C2 and C3, ESS is reasonably high. Table 3 on page 3 shows that TED of the posteriors of and is larger than that of the parameter . Again, the values of TED() and TED() depend on heterogeneity priors. For C1 and C2, the estimates of pEDS() differ between INLA and JAGS attaining large values for INLA and low values for JAGS. This indicates that numerical Bayesian approximation (INLA) and MCMC sampling (JAGS) can react differently to models with a non-identified likelihood. For model C1, pEDS() and pEDS() values shown by INLA and JAGS disagree. This is due to the small values of the ESS value of the MCMC samples provided by JAGS. In contrast, these estimates of pEDS agree well for models C2 and C3. The difference between the values of pEDS() and pEDS() for models C2 and C3 indicates that the prior assumptions impact the values of pEDS.

## 5 Discussion

We considered two case studies and one simulation study. For the well-known eight-school example we applied Bayesian NNHM, considering INLA analysis and both centered and non-centered parametrizations for JAGS. Moreover, we provided a transition phase plot for the estimates of TED, pEDL and pEDS across a grid of heterogeneity prior scale parameters, which govern the amount of pooling induced by the heterogeneity prior. This provided insights into how TED, pEDL and pEDS change depending on the properties of the heterogeneity prior. For the RTI data set, we used both the Bayesian NNHM applied to log(OR) and a logit model providing insights into how TED, pEDL and pEDS change depending on the primary outcome. To challenge the method proposed, we relaxed the assumption that the within-study standard deviation is known and assumed priors for both the within-study and between-study standard deviation. This provided an insight into how TED, pEDL and pEDS perform when the underlying model has two non-identified parameters for both INLA and JAGS. The proposed method provided novel insight and proved useful in clarifying the empirical determinacy of the parameters in the Bayesian NNHM.

The analysis of two simple motivating examples, normal mean and normal precision, translated the results provided by data cloning, which are based on the criterion, to the unified pEDS measure. They showed that for identified parameters of non-hierarchical likelihoods the spread of the posterior is mainly affected by the likelihood weighting and leads to pEDS estimates close to 1. We prefer the use of pEDS, because the application of the criterion to BHMs is not well justified (jiang_2017; lewbel_2019).

This method considerably refines and generalizes the original method proposed by roos_hunanyan_bakka_rue_2020. We proposed a unified method that quantifies what proportion of the total impact of likelihood weighting on the posterior is due to the change in the location (pEDL) and what proportion is due to the change in the spread (pEDS). This was achieved by matching posteriors moments with those of a normal distribution for the computation of the BC. Note that the normal distribution encapsulates the information about its location and spread in two unrelated parameters. This property proved useful for the definition of pEDL and pEDS. Because likelihood weighting affects both the location and the spread of the marginal posterior distributions, the proposed method is better suited for the quantification of empirical determinacy than other methods that are focused only on the total impact (roos_hunanyan_bakka_rue_2020; kallioinen_paananen_buerkner_vehtari_2021).

We successfully applied the proposed method to a non-normal likelihood. This shows that the proposed method can also be applied to other BHMs with different primary outcomes. Moreover, we implemented this method in general-purpose Bayesian estimation programs, such as INLA and JAGS. For JAGS, we developed and applied a technique for the fast and efficient computation of likelihood weighting, which dispenses with re-runs of MCMC chains. All these refinements and generalizations enable future extensions of the method proposed to complex BHMs and to other general-purpose Bayesian estimation programs, such as Stan (Stan_development_team_Manual_16). The estimates of the empirical determinacy will help researchers understand to what extent posterior estimates are determined by the data in complex BHMs.

Currently, the weighting approach focuses on weights that are very close to 1. This is very useful to study the impact of the total number of patients, which is induced by within-study sample size changes, on the posteriors of parameters in Bayesian NNHM. However, the data cloning approach, which changes the total number of studies included for the Bayesian meta-analysis, indicates that integer weights that are larger than 1 could also be very useful in assessing the impact of the data on BHMs. More work is necessary to extend the proposed method to cope with integer weights.

The proposed method is based on estimates of the mean and standard deviation for location and spread. These posterior descriptive statistics are provided by default by general-purpose software for Bayesian computation. Moreover, they are used by other modern approaches to quantify the impact of priors on posteriors (reimherr_meng_nicolae_2021). However, for stability reasons, vehtari_gelman_simpson_carpenter_buerkner_2021

recommend the use of location and spread estimates based on quantiles. In addition,

vehtari_gelman_gabry_2017 recommend Pareto smoothed importance sampling (PSIS) to regularize importance weights, which flow into the computation of the location and spread of the weighted posteriors. Therefore, anchoring the method proposed on quantiles and the use of PSIS are two additional goals that need to be addressed by our future research.This paper focuses mainly on Bayesian NNHM. However, the application of the method proposed to other complex BHMs would provide a further insight into the empirical determinacy of posterior parameter estimates in other applications. The openly accessible R package ed4bhm (https://github.com/hunansona/ed4bhm) conveniently facilitates the application of the proposed method in other settings.

## Acknowledgments

Support by the Swiss National Science Foundation (no. 175933) granted to Małgorzata Roos is gratefully acknowledged.