A Robust Bayesian Copas Selection Model for Quantifying and Correcting Publication Bias

The validity of conclusions from meta-analysis is potentially threatened by publication bias. Most existing procedures for correcting publication bias assume normality of the between-study heterogeneity. However, this assumption may not be valid, and the performance of these procedures may be highly sensitive to departures from normality. Further, there exist few measures to quantify the magnitude of publication bias based on selection models. In this paper, we address both of these issues. First, we introduce the robust Bayesian Copas (RBC) selection model. This model serves as a default prior that requires minimal problem-specific tuning, offers robustness to strong assumptions about the distribution of heterogeneity, and facilitates automatic inference of the unknown parameters. Second, we develop a new measure to quantify the magnitude of publication bias. Our measure is easy to interpret and takes advantage of the natural estimation uncertainty afforded by the posterior distribution. We illustrate our proposed approach through simulation studies and analysis of real data sets. Our methods are implemented in the publicly available R package RobustBayesianCopas.



There are no comments yet.


page 1

page 2

page 3

page 4


Copas' method is sensitive to different mechanisms of publication bias

Copas' method corrects a pooled estimate from an aggregated data meta-an...

Modelling publication bias and p-hacking

Publication bias and p-hacking are two well-known phenomena which strong...

Testing for publication bias in meta-analysis under Copas selection model

In meta-analyses, publication bias is a well-known, important and challe...

Infinite Diameter Confidence Sets in a Model for Publication Bias

There is no confidence set of guaranteed finite diameter for the mean an...

Using clinical trial registries to inform Copas selection model for publication bias in meta-analysis

Prospective registration of study protocols in clinical trial registries...

A Bayesian Selection Model for Correcting Outcome Reporting Bias With Application to a Meta-analysis on Heart Failure Interventions

Multivariate meta-analysis (MMA) is a powerful tool for jointly estimati...

Template shape estimation: correcting an asymptotic bias

We use tools from geometric statistics to analyze the usual estimation p...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Meta-analysis is a powerful technique for combining statistical evidence from multiple related studies. By synthesizing information from multiple studies, meta-analysis often has higher statistical power and precision than a single study [22]. A standard model in meta-analysis is the random effects model [19, 34], which is specified as


where and are independent and distributed as for all . In (1.1), is the reported treatment effect for the th study, is the population treatment effect of interest, the ’s are random effects for the between-study heterogeneity, and the ’s are within-study errors, further scaled by each

th study’s reported standard error

. The parameter quantifies the amount of between-study heterogeneity.

The validity of meta-analysis is greatly compromised by the potential presence of publication bias, or the tendency for journals to publish studies showing significant results [44, 20]. In the presence of such bias, the studies in the published literature are a biased selection of the research in that area, resulting in biased estimation and misleading inference about [21]. A great deal of effort has gone into developing statistical methods to detect and correct publication bias. Graphical methods, such as the funnel plot, are one popular approach to this problem. Funnel plots assess the asymmetry of a scatter plot of the treatment effects from individual studies against their corresponding precisions. The presence of asymmetry in a funnel plot indicates potential publication bias [9, 39, 38]. Statistical tests to formally detect scatter plot asymmetry, such as regression tests [1, 9, 39, 27] and rank-based tests [8], have also been introduced. Duval and Tweedie [8] further develop a “trim and fill” method for estimating and adjusting for the number and outcomes of missing studies in meta-analysis.

As an alternative to graph-based methods, selection models have also been developed. Unlike graphical methods, selection models directly address the issue of missing studies. The idea behind this approach is that the observed sample of studies is a biased sample of research done in a particular area, which was produced by a specific selection process. For example, Hedges [16], Givens et al. [13], and Rufibach [36] model the likelihood of a study being selected (i.e. published) as a function of the

-value obtained under the hypothesis that there is not a significant treatment effect. Copas and colleagues further introduced a flexible framework in which the probability of selection is modeled as a function of

both the effect size and its standard error [5, 3, 4, 6]. In addition to correcting for potentially biased estimates of in (1.1), several statistical tests for the Copas selection model have been developed, e.g. goodness of fit tests [4] and score tests [7]. A detailed description of the Copas selection model is provided in Section 2.

There is strong empirical evidence in support of using the Copas selection model for correcting publication bias. Based on 157 meta-analyses with binary outcomes, Schwarzer et al. [37] showed that the Copas selection model gave superior performance over the trim-and-fill method, with better results among the 22 meta-analyses with evidence of selection bias. Carpenter et al. [2] also showed that the Copas selection model gave a clear interpretation in 80 percent of the meta-analyses.

In spite of its benefits, the Copas selection model relies on the assumption that the heterogeneous random effects

, are distributed as standard normal. This is actually a rather strong assumption, and it cannot be justified using the Central Limit Theorem even when the number of studies is large

[18, 23, 43]. In Section 6

, we illustrate that the results from selection models that assume normally distributed heterogeneity can be highly sensitive to violations of normality. In addition, existing non-Bayesian inference procedures for the Copas selection model typically rely on asymptotic arguments to construct confidence intervals for the parameters or to perform tests for publication bias

[4, 33, 7]. These asymptotic approaches to inference can be potentially problematic in practice, because often times, meta-analysis is performed with a small number of studies, sometimes fewer than five [28]. For example, an analysis of 14,886 meta-analyses from the Cochrane Database of Systematic Reviews found that over 90% of meta-analyses featured fewer than 10 studies [42, 24]. Thus, many of the tests developed for graphical approaches or selection models may not have sufficient power for small samples.

Given these issues, Bayesian approaches for correcting publication bias have a distinct advantage. First, Bayesian methods can provide further robustness through a robust prior on the between-study heterogeneity. Second, Bayesian methods automatically allow for nonasymptotic inference about unknown parameters through their marginal posterior distributions. In this work, we introduce the robust Bayesian Copas selection (RBC) model which specifically addresses the issue of robustness in the Copas selection model. Unlike previous work on Bayesian selection models, e.g. Mavridis et al. [29], the RBC model also utilizes a default set of noninformative priors on all the unknown parameters. This allows RBC to serve as a good default prior with minimal need for tuning by the practitioner.

In standard meta-analysis (1.1) and without addressing the issue of publication bias, a number of authors have also raised concerns about the normality assumption for the heterogeneity (see, e.g. [18, 23, 43, 32] and references therein). In [32], the normal assumption on the ’s in (1.1

) is replaced by a skew-normal distribution to provide additional modeling flexibility. In this work, we similarly place heavy-tailed Cauchy priors on the

’s, except our results are placed within the context of correcting publication bias, not merely standard meta-analysis.

Finally, we are not aware of any methods to quantify publication bias using selection models. For graphical methods, such as the funnel plot, there have been several measures proposed, including Egger’s regression intercept [9] and the skewness of the collected studies’ distribution [25]. See Lin et al. [26] for a more detailed review. However, skewness and asymmetry in funnel plots can arise from causes unrelated to study selection, such as induced correlation between the effect size and standard error arising from clinical or methodological differences between studies [40, 7].

In this paper, we develop a new measure for quantifying publication bias based on the Copas selection model. Our measure quantifies the dissimilarity between estimates obtained under the Copas selection model (2.1) and those obtained under a standard meta-analysis model (1.1). A key benefit of our approach is that it not only takes the difference between two point estimates, but it also takes into account the estimation uncertainty afforded by the full posterior distribution. Our measure lies between zero and one, with smaller values indicating negligible publication bias and values close to one indicating a very strong magnitude of publication bias. Thus, our approach has a clear and intuitive interpretation.

The rest of this paper is structured as follows. In Section 2, we describe the Copas selection model. In Section 3, we introduce the RBC model. In Section 4, we introduce the measure for quantifying publication bias based on the RBC model. In Section 5, we discuss practical implementation of our model. In Section 6, we illustrate the robustness of our approach through simulation studies. In Section 7, we apply the proposed methodology to real data sets.

2 The Copas Selection Model

The Copas selection model [3, 6] is specified as follows. For all ,


where and are marginally distributed as and and are independent. By (2.1), the th study is assumed to be published only if an associated latent variable (also known as the propensity score) is greater than zero. The propensity score, i.e. the propensity to publish, is characterized by two parameters . The parameter controls the overall likelihood of a study being published, while the parameter characterizes how the chance of publication depends on sample size. In general, is positive so that studies with larger sample sizes are more likely to be published. The reported effects and the propensity scores are assumed to be correlated through , which controls how the probability of publication is influenced by the effect size of the study. When publication bias is present, i.e. , standard meta-analysis will lead to biased estimation of . On the other hand, if , then there is no publication bias, and the model (2.1) reduces to the standard random effects model (1.1).

For practitioners, the main parameter of interest is the population treatment effect , although there are five total unknown parameters in (2.1). In [6, 33], the unknown parameters are estimated using maximum likelihood estimation (MLE), conditionally on a given pair . Copas and Shi [6] recommend choosing using a grid search. Copas and Shi [4] also developed a goodness-of-fit test for for a given choice of values . On the other hand, Ning et al. [33] assume that the biased data generating mechanism contains an additional latent variable that accounts for the additional unpublished studies. They treat these variables as missing data and develop an EM algorithm to simultaneously obtain the MLEs for . Recently, Duan et al. [7] developed a score-based hypothesis test to formally test for the presence of publication bias under the Copas selection model (i.e. testing ).

In the Bayesian framework, Mavridis et al. [29] estimate the parameters in (2.1) by placing noninformative priors on and informative priors on the lower and upper bounds for , which act as a proxy for priors on . To derive informative priors on the bounds for , Mavridis et al. [29] recommend using “both external data and an elicitation process of expert opinion.” However, this could be potentially difficult to implement for meta-analyses when such prior information or expertise is difficult or impossible to attain. In our view, it is desirable to devise a default prior that can work well for a variety of problems and that avoids the need for problem-specific tuning by the practitioner. The present manuscript thus differs from the work of [29] in several ways. First, we place non-informative priors on directly. Second, unlike [29], we also dispense with the assumption that the random effects are normally distributed. Finally, we introduce a new measure for quantifying publication bias, an issue which has not been addressed by previous works on selection models, Bayesian or frequentist.

3 The Robust Bayesian Copas Selection Model

3.1 The RBC Prior Specification

Throughout this section and the rest of the paper, we let , , and . Our objective is to formulate a robust, default Bayesian model for (2.1

) by placing appropriate priors on the unknown parameters. Specifically, we aim to make the priors noninformative so that a default set of hyperparameters will work well for many different problems and situations.

In the RBC model, we first endow the population mean effect in (2.1) with a normal prior,


where is set to be a large value so that the prior on is fairly noninformative. As a default, we set .

To model the between-study heterogeneity in (2.1), we endow the variance with the inverse gamma prior,


where the hyperparameters are set to be small values in order to make the prior on fairly noninformative. We recommend as a default. The reason that we place the prior on rather than is described in Appendix A. To summarize briefly, endowing with the prior allows for closed form updates for

in the Markov Chain Monte Carlo (MCMC) algorithm, from which one can easily estimate

using the MCMC draws of .

Next, we consider the priors for the heterogeneity in (2.1). In conventional meta-analysis, we have . If one has strong a priori knowledge that the normality assumption holds, then our model can be implemented with standard normal priors on the random effects. As discussed earlier, however, this assumption may be inappropriate. To relax this assumption, we endow each of the ’s with the standard Cauchy prior,


The standard Cauchy distribution with density

is equivalent to the Student’s

-distribution with one degree of freedom. Thus, it has heavy tails and is appropriate to use in scenarios where the normality assumption may be difficult to justify

[18]. In the robust Bayesian modeling literature, the Cauchy distribution is frequently used as an alternative to the normal distribution. See, e.g. [11, 10, 12]. In this paper, we extend its use to selection models in meta-analysis. Because of its heavy tails, the prior (3.3

) can help to mitigate the effects of outliers, skewness, and other departures from normality in meta-analysis. At the same time, this prior is also robust in the sense that it gives good estimates

even when the random effects are truly distributed as standard normal, as assumed by [6]. In our R package RobustBayesianCopas, we provide software to implement our model using either the conventional normality assumption or the robust Cauchy prior (3.3). As a default, however, we recommend (3.3).

We note that we choose to model only the between-study heterogeneity with the Cauchy distribution and not the within-study random errors . This is because there is rarely enough information in the sample of collected studies for the meta-analysis practitioner to model the within-study errors for any individual study. On the other hand, the sample of collected studies typically does contain enough information to model the between-study heterogeneity.

Next, we endow the correlation parameter in (2.1) with the noninformative uniform prior,


Finally, we consider the priors for and in (2.1), which control the probability of publication. We will ultimately place noninformative uniform priors on these two quantities. To determine appropriate values for the hyperparameters in the priors on , we first note that for some values between zero and one, our model should satisfy

Letting and denote the smallest and largest reported standard errors respectively, Mavridis et al. [29] showed that when , the above inequality translates to


denotes the inverse cumulative distribution function (cdf) for the standard normal density. Since the standard normal density places most of its mass in the interval

, this suggests the following priors for :




This ensures that most of the mass for the ’s will lie between , leading to selection probabilities from 2.5% to 99.7%. As noted earlier, our prior specification (3.5)-(3.6) differs from the Bayesian selection model introduced by Mavridis et al. [29], who placed informative priors on the lower and upper bounds of the selection probabilities . Here, we place noninformative priors on directly, with the primary aim of avoiding the need to perform problem-specific tuning of hyperparameters.

3.2 Inference Under the RBC Model

Under the independent priors (3.1)-(3.6) on , the RBC model produces a posterior distribution over these unknowns. Using MCMC, we can easily approximate the marginal posteriors , , , , and . We can then use these marginal densities to construct nonasymptotic

posterior credible intervals for each of these parameters. This allows us to automatically quantify uncertainty in a coherent manner, in addition to obtaining point estimates using either the posterior mean or median.

Rather than conducting statistical tests for (as in [4]) or (as in [7]), inference about these parameters can also be conducted directly through their marginal posteriors. Instead of testing , for instance, the posterior can be used to deduce the presence of non-negligible publication bias. For example, as shown in the two plots on the left of Figure 4, the posteriors are peaked around and respectively. These figures suggest non-negligible publication bias. On the other hand, if is concentrated near zero, this suggests that there is negligible publication bias. Although the marginal posterior cannot be used to formally test if is true, we view uncertainty quantification for as a more critical task. Moreover, we can still infer if from the posterior density . The measure for quantifying publication bias that we introduce in Section 4 can also be used to infer if .

4 Quantifying Publication Bias with the RBC Model

4.1 The Measure

Apart from inference about , we may be interested in quantifying the magnitude of publication bias. To the best of our knowledge, this issue has not been addressed before for selection models, although there exist several procedures based on funnel plots [26]. Since the Copas selection model (2.1) explicitly models the selection mechanism, a natural measure for quantifying publication bias is the dissimilarity between the estimate of under the standard random effects model (1.1) (i.e. fixing in (2.1)) and the estimate for under the RBC model where is also estimated from the data. If in (2.1), then there is no publication bias, and the estimates for under the standard meta-analysis model (1.1) and the Copas selection model (2.1) should theoretically be the same. On the other hand, if in (2.1), then the estimate of under (1.1) is a biased estimate, whereas the estimator for under (2.1) corrects this bias.

In practice, due to the need to use numerical methods to estimate the parameters, the estimates for produced under the standard meta-analysis model (1.1) and the Copas selection model (2.1) are unlikely to be exactly identical. Nevertheless, when the true is close to zero, the estimation discrepancy for between models (1.1) and (2.1) should still be small. Thus, the dissimilarity between these two estimates allows us to quantify how publication bias affects our results in meta-analysis.

Let be the posterior for under the complete RBC model with priors (3.1)-(3.6), Let be the posterior for under the RBC model where we fix . When , the posterior does not depend on , and thus, it is the same as the posterior distribution under the standard random effects model (1.1) with only priors (3.1)-(3.3) on .

To utilize the posterior for in our quantification of publication bias, we propose using the Hellinger distance between and . The Hellinger distance between two densities and is defined as


The Hellinger distance is symmetric and is bounded by zero and one. The magnitude of the Hellinger distance also has a clear interpretation. Values close to zero indicate that and are nearly identical distributions, while values close to one indicate that the majority of the probability mass in does not overlap with that of .

In the present context, we may estimate the posterior for and by using MCMC samples of

(after a burn-in period) to obtain kernel density estimates,

and . We then use numerical integration to estimate the Hellinger distance (4.1) between and . In short, our measure for the magnitude of publication bias, based on the full posterior, is


Smaller values of () indicate a small or negligible magnitude of publication bias, while larger values of () indicates a strong magnitude of publication bias. We note that can also be used to quantify the publication bias in the heterogeneity parameter by computing the Hellinger distance between and . However, as meta-analysis practitioners are mainly interested in the treatment effect, we focus on .

Figure 1: The posterior distributions (solid line) and (dashed line) from two experiments where the true heterogeneity , is distributed as . In the top panel, the true and . In the bottom panel, the true and .

In Figure 1, we illustrate the benefits of using as a measure of publication bias magnitude. These were taken from two simulations in the empirical study in Appendix B where the heterogeneity , is distributed as . In the top panel, we have plotted the posterior for (the solid line) against the posterior for (the dashed line) when (or very low publication bias). We see that there is significant overlap between the two distributions, and thus, . On the other hand, we see in the bottom panel that when (i.e. significant publication bias), the posteriors for and give more distinctive estimates of . Moreover, gives greater uncertainty about than , since the posterior is for is taller and thinner. Here, we obtain .

Since the magnitude of usually depends on the magnitude of the correlation parameter , one may wonder what the advantage of using is over simply using the posterior to quantify publication bias. In our view, is a more intuitive measure since it directly characterizes the amount of publication bias in the mean treatment effect . As shown in Figure 1, quantifies how much the posterior for changes after a bias correction has been made by our RBC model.

Unlike measures based on funnel plot asymmetry, such as the one proposed by Lin and Chu [25], our proposed

measure cannot determine the direction of the potential publication bias. However, our approach has several advantages. First, as a divergence measure between two probability distributions,

automatically takes into account the estimation discrepancy and the variability in . Second, since is always bounded between zero and one, it has a clear interpretation. Finally, our measure quantifies the change in that can be solely attributed to selection bias (or how much the posterior changes because of ), whereas asymmetry may be due to factors unrelated to selection, such as methodological differences between studies [40]

. If the direction of the bias is of particular interest, the practitioner can examine scatter plots or the sign (positive or negative) of the sample skewness of the standardized deviates

, where are point estimates under the usual random effects model (1.1) [25, 31].

4.2 Interpreting the Measure

Unfortunately, the posteriors and are analytically intractable and therefore have to be approximated using MCMC. Using kernel density estimation and numerical integration to evaluate the Hellinger distance (4.1) also introduces round-off errors. Due to these approximations, we will not, in general, be able to obtain (which would indicate the complete absence of publication bias, or that and are identical). However, even MLE approaches to estimating the parameters in (2.1) and (1.1) are typically unable to produce exactly identical estimates for , because we need to perform numerical optimization on two different likelihood functions. We believe it is much more important to determine whether publication bias is negligible, as opposed to completely absent.

We recommend the following guidelines for interpreting .

Range Interpretation
Negligible publication bias
Moderate publication bias
High publication bias
Very high publication bias

As demonstrated by our analysis of real data sets in Section 7 and an empirical study of 2000 simulated meta-analyses detailed in Appendix B, these guidelines are often reasonable in practice. Nevertheless, we caution that the actual value of may vary depending on many factors, such as the magnitudes of , , and in (2.1), or the magnitude and direction of the reported effect sizes . Therefore, our recommendations for interpreting above should be used as rough guidelines, and “acceptable” or “unacceptable” values for should be determined within the context of the problem being studied. We also recommend that practitioners examine plots of against (such as the ones in Figures 1 and 4) to visualize the extent of the bias correction made by the RBC method.

5 Implementation of the RBC Model

Having specified the priors on through (3.1)-(3.6) with our stated choices for default hyperparameters, we implement the RBC model (3) using Gibbs sampling. Our model is implemented using the JAGS software and is available in the R package, RobustBayesianCopas.

To implement the RBC model efficiently, let , and let . An alternative way to write model (2.1) is then


where is a truncated bivariate normal distribution. To sample from in (5.1), we follow the approximation introduced in [29], where we first sample , and then we sample . With this approximation for (5.1) and the reparametrization of our model in terms of , we can easily implement a Gibbs sampler. We defer further details to Appendix A.

The R package RobustBayesianCopas also provides a Gibbs sampler for the RBC model when is fixed as . Since the observed treatment effects are independent of the propensity scores in (2.1) when , we do not need to estimate , and the model (2.1) reduces to the standard random effects model (1.1). Under this standard meta-analysis model, we place the priors (3.1)-(3.3) on and again reparametrize , where . Details for computing the posterior with Gibbs sampling are provided in Appendix A.

6 Evaluation of the Robustness of the RBC Model

6.1 Methodology

We evaluate the robustness of the RBC method under a variety of distributions for the heterogeneity. Here, we are mainly concerned with how sensitive our method’s performance is to departures from the standard normality assumption for . For a detailed simulation of the measure (4.2), we refer the reader to Appendix B. We compared four selection models, benchmarked against the standard meta-analysis model.

  1. RBC: the complete model with priors (3.1)-(3.6) on under (2.1)

  2. RBC-conv: the RBC model for conventional meta-analysis with standard normal errors for the random effects

  3. Copas: the original (frequentist) Copas selection model (2.1)

  4. CLS: the Copas-like selection model of [33]

  5. SMA: the standard meta-analysis model (1.1) that does not account for publication bias

RBC-conv, Copas, CLS, and SMA all assume that the heterogeneity is normally distributed. In particular, the RBC-conv model uses the same priors on as those for RBC, but replaces the Cauchy priors (3.3) on with normal priors . For RBC and RBC-conv, we ran the Gibbs sampling algorithms described in Section 5 and Appendix A for 20,000 iterations, discarding the first 10,000 samples as burn-in. We used the posterior mean for as the point estimate for . The posteriors for were also used to obtain 95% nonasymptotic posterior credible intervals for .

The Copas selection model was implemented using the R function copas in the R package metasens and uses a grid search for tuning . We bootstrapped the residuals to obtain an estimate of the standard error (s.e.) of . The CLS model uses an EM algorithm to compute the MLEs for simultaneously. To estimate the s.e.’s for under CLS, we used the inverse Hessian matrix. For both Copas and CLS, we constructed the 95% confidence intervals as . The SMA method uses MLE to obtain estimates for under model (1.1) without accounting for publication bias. Similarly to CLS, we used the inverse Hessian matrix to estimate standard errors and 95% confidence intervals for SMA. Functions to implement CLS and SMA are provided in the R package RobustBayesianCopas.

We considered four simulation settings for true distribution of the heterogeneity in (2.1):

  • Experiment 1 (heavy-tailed):

  • Experiment 2 (standard normal):

  • Experiment 3 (several outliers): ,

  • Experiment 4 (skewed right): , where denotes the asymmetric Laplace distribution with scale and asymmetry parameter

In Figure 7 of Appendix B, we provide a plot for these four different distributions for the heterogeneity.

Figure 2: The average bias and coverage for Experiment 1 (heavy-tailed heterogeneity) and Experiment 2 (standard normal heterogeneity) for the methods: RBC (), RBC-conv (), Copas (), CLS (), and SMA (). In Experiment 1, RBC has the lowest bias and the highest CP. In Experiment 2, RBC-conv has the lowest bias and the highest CP.

6.2 Simulation Results

For our synthetic experiments, we simulated a meta-analysis of studies under the model (2.1). We generated the within-study standard errors , from . We set , , , and . We varied the correlation , so that we could evaluate the methods in Section 6.1 under varying degrees of publication bias. For the four experiments detailed in Section 6.1, we repeated this for 100 replications and computed the average bias, , and the coverage probability (CP) (or the proportion of times the 95% posterior credible or confidence intervals contained ).

Figure 3: The average bias and coverage for Experiment 3 (several outliers) and Experiment 4 (skewed right) for the methods: RBC (), RBC-conv (), Copas (), CLS (), and SMA (). In these experiments, RBC has lower bias and much higher CP than the other methods. This demonstrates that RBC is the most robust to departures from normality.

In Figure 2, we plot the average bias and CP for Experiment 1 (heavy-tailed heterogeneity) and Experiment 2 (standard normal heterogeneity) for the five methods we described in 6.1. We see that in Experiment 1, the RBC method has the lowest bias and the highest coverage. In Experiment 2, RBC-conv has the lowest bias and highest CP. However, the default RBC method with Cauchy priors on the heterogeneity performs quite similarly to the other selection methods that assume normally distributed heterogeneity. Moreover, the default RBC model with Cauchy priors on maintains excellent coverage properties even when is truly distributed as standard normal. Unsurprisingly, SMA has the highest bias and the lowest CP as increases, because SMA fails to take into account the publication bias.

The advantage of the RBC method over the other methods becomes even more pronounced when there is heavy departure from normality for the heterogeneity. We see this in our results for Experiment 3, where follows a mixture of normal distributions that results in several outliers, and in Experiment 4, where the distribution of is skewed right. In Figure 3, we plot the average bias and CP for Experiment 3 (several outliers) and Experiment 4 (skewed right) for the five methods we described in 6.1. Under these scenarios, RBC not only has lower bias, but it also has significantly better coverage than the other methods. Our results illustrate that the default RBC method is much less sensitive to the presence of outliers or skewness of the heterogeneity. It is also worth noting that even though RBC-conv resulted in higher average bias, its CP was still better than Copas, CLS, or SMA. This suggests that for selection models, Bayesian posterior credible intervals can often produce better uncertainty quantification than frequentist confidence intervals constructed using asymptotic arguments or bootstrapping.

7 Real Data Applications

7.1 Relationship Between Second-Hand Tobacco Smoke and Lung Cancer

We first applied the proposed RBC method to a meta-analysis of studies on the relationship between second-hand tobacco smoke and lung cancer. Hackshaw et al. [14] previously analyzed the results from 37 studies that evaluated the risk of developing lung cancer in women who were lifelong non-smokers but whose husbands smoked, compared to women whose husbands had never smoked. In particular, Hackshaw et al. [14] fit a random effects meta-analysis model (1.1

), resulting in a pooled odds ratio (OR) of 1.24 and a 95% confidence interval of (1.13, 1.36).

Hackshaw et al. [14] concluded that married, non-smoker women who were exposed to secondhand smoke by their smoker husbands were 24% more likely to develop lung cancer than those whose husbands did not smoke.

Previous analysis of this data by [33] suggested some evidence of publication bias. We used the RBC model to estimate , the log-odds ratio for developing lung cancer. We first performed inference about the presence of publication bias using the posterior distribution for . The top-left panel of Figure 4 shows that is concentrated about 0.5, with a posterior median of . This suggests the presence of publication bias. Note that for , we use the posterior median as a point estimate rather than the mean, since the posterior for is often skewed (and hence the posterior mean is heavily influenced by a few extreme values near -1 and 1). While the posterior is useful for assessing the presence of publication bias, it is not as informative as the measure in quantifying how much the posterior changes once we have corrected publication bias.

Figure 4: Results from our meta-analyses in Sections 7.1 and 7.2. In the top-left panel, we plot the posterior distribution for the meta-analysis on the risk of developing lung cancer from second-hand smoke. In the top-right panel, we plot the posterior distributions for the log-odds ratio of developing lung cancer, (solid line) and (dashed line). In the bottom-left panel, we plot the posterior distribution for the meta-analysis on antidepressents. In the bottom-right panel, we plot the posterior distributions for the mean improvement in depression symptoms, (solid line) vs. (dashed line).

Next, we estimated . In the top-right panel of Figure 4, we plot the posterior distribution for (solid line) against the posterior for (dashed line). We computed a measure of , which suggests a moderate magnitude of publication bias. In order to compare our results to those of [14], we computed the odds ratio as , where is the posterior mean for . For uncertainty quantification, we took the 95% posterior credible interval , where is the 95% equal-tailed posterior credible interval for . The RBC model gave a posterior mean OR of 1.19, with a 95% credible interval of (1.03, 1.35). In short, our analysis suggests that married, non-smoker women who were exposed to second-hand smoke by their husbands still had a significant risk of developing lung cancer, albeit a slightly lower risk than previously concluded (i.e. about 19% more likely, as opposed to 24% more likely [14]).

7.2 The Efficacy of Antidepressants

Although antidepressents are among the world’s most widely prescribed drugs, there has been considerable controversy about their effectiveness. In 2008, Turner et al. [41] presented a comparison of effectiveness data on depressants published in journals with the corresponding results from trials submitted to the Food and Drug Administration (FDA) between 1987 and 2004 for licensing. Turner et al. [41] found evidence of bias towards results favoring active intervention. In particular, there were 73 studies with results as reported to the FDA (74 originally but two of them were subsequently combined), but only 50 (69%) of these studies were subsequently published.

We applied the RBC model to the meta-analysis of these 50 published studies. In these studies, the outcome is a quantitative measure for improvement in depression symptoms. Since studies reported their outcomes on different scales, effect sizes were all expressed as standardized mean differences by means of Hedges’ scores, accompanied by corresponding variances [30, 15]. This data set is available in the R package RobustBayesianCopas.

We first used the RBC model to perform inference about . In the bottom-left panel of Figure 4, we plot the posterior distribution for . We see that is highly concentrated on large values, with a posterior median of , which suggests very strong publication bias.

Next, we considered estimates for under the RBC model, compared to those obtained from a standard meta-analysis (1.1). Under the standard model, we estimated the MLE with a 95% confidence interval of . The posterior mean effectiveness under the RBC model, on the other hand, was only with a 95% posterior credible interval of . Our results suggest that the mean improvement in depression symptoms from antidepressents may be lower than previously reported.

In the bottom-right panel of Figure 4, we plot (solid line) against (dashed line). This plot shows that once we have corrected for the publication bias with the RBC model, we obtain significantly lower estimates of with greater uncertainty and very little overlap with the non-bias-corrected posterior. We computed an measure of , indicating a very high magnitude of publication bias. The contrast between the two plots on the right in Figure 4 shows that is a very useful measure for quantifying uncertainty bias.

7.3 The Prevalence of Publication Bias

Though the RBC model has shown promising performance in simulation studies and the meta-analyses done in Sections 7.1 and 7.2, one may be interested in assessing how prevalent the issue of publication bias is across multiple meta-analyses. In [17], Cochrane’s measure was developed as a measure of consistency of the results of studies in meta-analyses. Higgins et al. [17] evaluated the performance of on 509 meta-analyses of dichotomous outcomes in the Cochrane Database of Systematic Reviews.

Inspired by this, we computed the measure for the population treatment effect for a set of 1500 randomly selected meta-analyses of dichotomous outcomes from the Cochrane Database of Systematic Reviews where each meta-analysis contained at least eight studies. The sample sizes of these meta-analyses varied from to studies. In Figure 5, we plot the empirical histogram for the measure. We found that 984 (65.6%) of these meta-analyses had negligible publication bias (), 411 (27.4%) had moderate publication bias (), 91 (6.1%) had high publication bias (, and 14 (0.9%) had very high publication bias ().

Figure 5: The empirical histogram of the measure evaluated for 1500 meta-analyses of dichotomous outcomes in the Cochrane Database of Systematic Reviews. 984 (65.6%) of the meta-analyses had negligible publication bias, 411 (27.4%) had moderate publication bias, 91 (6.1%) had high publication bias, and 14 (0.9%) had very high publication bias.

8 Discussion

In this paper, we have introduced the robust Bayesian Copas (RBC) selection model for correcting publication bias in meta-analysis. Our method combines robust Cauchy priors on the heterogeneity with a set of default, noninformative priors on all the unknown parameters in the Copas selection model (2.1). This affords us greater modeling flexibility, avoids the need for problem-specific tuning of hyperparameters, and facilitates nonasymptotic inference. We also introduced the measure (4.2) for quantifying the magnitude of publication bias. Specifically, quantifies the amount of dissimilarity between a standard random effects meta-analysis (1.1) and a meta-analysis done with the Copas selection model (2.1). We illustrated that our method performs very well in a variety of simulation and real data settings. We have provided an R package RobustBayesianCopas to implement out method.

In this paper, we have focused only on univariate meta-analysis. However, there is an increasing need to explore multivariate and network meta-analysis methods, which simultaneously analyze multiple treatments or comparisons between different treatments. See e.g. [35, 22] for a review of different motivating applications and methods. For multivariate and network meta-analyses, the presence of publication bias will also lead to biased estimates of the treatment effects or differences between treatment effects. We are currently actively working to extend the RBC model to multivariate and network settings.


The bulk of this work was completed when the first author was a postdoc at the Perelman School of Medicine, University of Pennsylvania, under the supervision of the last two authors. Dr. Ray Bai and Dr. Mary Boland were funded in part by generous funding from the Perelman School of Medicine, University of Pennsylvania. Dr. Ray Bai and Dr. Yong Chen were funded by NIH grants 1R01AI130460 and 1R01LM012607.


  • Begg and Mazumdar [1994] Begg, C. B. and Mazumdar, M. (1994). Operating characteristics of a rank correlation test for publication bias. Biometrics, 50(4):1088–1101.
  • Carpenter et al. [2009] Carpenter, J. R., Schwarzer, G., Rücker, G., and Künstler, R. (2009). Empirical evaluation showed that the copas selection model provided a useful summary in 80% of meta-analyses. Journal of Clinical Epidemiology, 62(6):624 – 631.
  • Copas [1999] Copas, J. (1999). What works?: Selectivity models and meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society), 162(1):95–109.
  • Copas and Shi [2000] Copas, J. and Shi, J. Q. (2000). Meta-analysis, funnel plots and sensitivity analysis. Biostatistics, 1(3):247–262.
  • Copas and Li [1997] Copas, J. B. and Li, H. G. (1997). Inference for non-random samples. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59(1):55–95.
  • Copas and Shi [2001] Copas, J. B. and Shi, J. Q. (2001). A sensitivity analysis for publication bias in systematic reviews. Statistical Methods in Medical Research, 10(4):251–265.
  • Duan et al. [2019] Duan, R., Piao, J., Marks-Anglin, A., Chu, H., Ning, J., and Chen, Y. (2019). Testing for publication bias in meta-analysis under Copas selection model. Biometrics (under revision).
  • Duval and Tweedie [2000] Duval, S. and Tweedie, R. (2000). A nonparametric “trim and fill” method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95(449):89–98.
  • Egger et al. [1997] Egger, M., Davey Smith, G., Schneider, M., and Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315(7109):629–634.
  • Fúquene et al. [2009] Fúquene, J. A., Cook, J. D., and Pericchi, L. R. (2009). A case for robust Bayesian priors with applications to clinical trials. Bayesian Analysis, 4(4):817–846.
  • Gelman et al. [2008] Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4):1360–1383.
  • Ghosh et al. [2018] Ghosh, J., Li, Y., and Mitra, R. (2018).

    On the use of Cauchy prior distributions for Bayesian logistic regression.

    Bayesian Analysis, 13(2):359–383.
  • Givens et al. [1997] Givens, G. H., Smith, D. D., and Tweedie, R. L. (1997). Publication bias in meta-analysis: a Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate. Statistical Science, 12(4):221–250.
  • Hackshaw et al. [1997] Hackshaw, A. K., Law, M. R., and Wald, N. J. (1997). The accumulated evidence on lung cancer and environmental tobacco smoke. BMJ, 315(7114):980–988.
  • Hedges [1982] Hedges, L. V. (1982). Estimation of effect size from a series of independent experiments. Psychological Bulletin, 92(2):490–499.
  • Hedges [1992] Hedges, L. V. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7(2):246–255.
  • Higgins et al. [2003] Higgins, J. P. T., Thompson, S. G., Deeks, J. J., and Altman, D. G. (2003). Measuring inconsistency in meta-analyses. BMJ, 327(7414):557–560.
  • Higgins et al. [2009] Higgins, J. P. T., Thompson, S. G., and Spiegelhalter, D. J. (2009). A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(1):137–159.
  • Higgins et al. [2001] Higgins, J. P. T., Whitehead, A., Turner, R. M., Omar, R. Z., and Thompson, S. G. (2001). Meta-analysis of continuous outcome data from individual patients. Statistics in Medicine, 20(15):2219–2241.
  • Hopewell et al. [2009] Hopewell, S., Loudon, K., Clarke, M. J., Oxman, A. D., and Dickersin, K. (2009). Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database of Systematic Reviews, John Wiley & Sons, Ltd.
  • Jackson [2007] Jackson, D. (2007). Assessing the implications of publication bias for two popular estimates of between-study variance in meta-analysis. Biometrics, 63(1):187–193.
  • Jackson et al. [2011] Jackson, D., Riley, R., and White, I. R. (2011). Multivariate meta-analysis: Potential and promise. Statistics in Medicine, 30(20):2481–2498.
  • Jackson and White [2018] Jackson, D. and White, I. R. (2018). When should meta-analysis avoid making hidden normality assumptions? Biometrical Journal, 60(6):1040–1058.
  • Kontopantelis et al. [2013] Kontopantelis, E., Springate, D. A., and Reeves, D. (2013). A re-analysis of the Cochrane Library data: The dangers of unobserved heterogeneity in meta-analyses. PLOS ONE, 8(7):1–14.
  • Lin and Chu [2018] Lin, L. and Chu, H. (2018). Quantifying publication bias in meta-analysis. Biometrics, 74(3):785–794.
  • Lin et al. [2020] Lin, L., Shi, L., Chu, H., and Murad, M. H. (2020). The magnitude of small-study effects in the Cochrane Database of Systematic Reviews: an empirical study of nearly 30 000 meta-analyses. BMJ Evidence-Based Medicine, 25(1):27–32.
  • Macaskill et al. [2001] Macaskill, P., Walter, S. D., and Irwig, L. (2001). A comparison of methods to detect publication bias in meta-analysis. Statistics in Medicine, 20(4):641–654.
  • Mathes and Kuss [2018] Mathes, T. and Kuss, O. (2018). A comparison of methods for meta-analysis of a small number of studies with binary outcomes. Research Synthesis Methods, 9(3):366–381.
  • Mavridis et al. [2013] Mavridis, D., Sutton, A., Cipriani, A., and Salanti, G. (2013). A fully Bayesian application of the copas selection model for publication bias extended to network meta-analysis. Statistics in Medicine, 32(1):51–66.
  • Moreno et al. [2011] Moreno, S. G., Sutton, A. J., Ades, A., Cooper, N. J., and Abrams, K. R. (2011). Adjusting for publication biases across similar interventions performed well when compared with gold standard data. Journal of Clinical Epidemiology, 64(11):1230–1241.
  • Murad et al. [2018] Murad, M. H., Chu, H., Lin, L., and Wang, Z. (2018). The effect of publication bias magnitude and direction on the certainty in evidence. BMJ Evidence-Based Medicine, 23(3):84–86.
  • Negeri and Beyene [2020] Negeri, Z. F. and Beyene, J. (2020). Skew-normal random-effects model for meta-analysis of diagnostic test accuracy (DTA) studies. Biometrical Journal (to appear).
  • Ning et al. [2017] Ning, J., Chen, Y., and Piao, J. (2017). Maximum likelihood estimation and EM algorithm of Copas-like selection model for publication bias correction. Biostatistics, 18(3):495–504.
  • Riley et al. [2011] Riley, R. D., Higgins, J. P. T., and Deeks, J. J. (2011). Interpretation of random effects meta-analyses. BMJ, 342:d549.
  • Riley et al. [2017] Riley, R. D., Jackson, D., Salanti, G., Burke, D. L., Price, M., Kirkham, J., and White, I. R. (2017). Multivariate and network meta-analysis of multiple outcomes and multiple treatments: rationale, concepts, and examples. BMJ, 358:j3932.
  • Rufibach [2011] Rufibach, K. (2011). Selection models with monotone weight functions in meta analysis. Biometrical Journal, 53(4):689–704.
  • Schwarzer et al. [2010] Schwarzer, G., Carpenter, J., and Rücker, G. (2010). Empirical evaluation suggests copas selection model preferable to trim-and-fill method for selection bias in meta-analysis. Journal of Clinical Epidemiology, 63(3):282–288.
  • Sterne et al. [2000] Sterne, J. A., Gavaghan, D., and Egger, M. (2000). Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature. Journal of Clinical Epidemiology, 53(11):1119–1129.
  • Sterne et al. [2001] Sterne, J. A. C., Egger, M., and Smith, G. D. (2001). Investigating and dealing with publication and other biases in meta-analysis. BMJ, 323(7304):101–105.
  • Sterne et al. [2011] Sterne, J. A. C., Sutton, A. J., Ioannidis, J. P. A., Terrin, N., Jones, D. R., Lau, J., Carpenter, J., Rücker, G., Harbord, R. M., Schmid, C. H., Tetzlaff, J., Deeks, J. J., Peters, J., Macaskill, P., Schwarzer, G., Duval, S., Altman, D. G., Moher, D., and Higgins, J. P. T. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ, 343:d4002.
  • Turner et al. [2008] Turner, E. H., Matthews, A. M., Linardatos, E., Tell, R. A., and Rosenthal, R. (2008). Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358(3):252–260.
  • Turner et al. [2012] Turner, R. M., Davey, J., Clarke, M. J., Thompson, S. G., and Higgins, J. P. (2012). Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. International Journal of Epidemiology, 41(3):818–827.
  • Wang and Lee [2019] Wang, C.-C. and Lee, W.-C. (2019). Evaluation of the normality assumption in meta-analyses. American Journal of Epidemiology (to appear).
  • Whittington et al. [2004] Whittington, C. J., Kendall, T., Fonagy, P., Couttrell, D., Cotgrove, A., and Boddington, E. (2004). Selective serotonin reuptake inhibitors in childhood depression: systematic review of published versus unpublished data. Lancet, 363:1341–1345.

Appendix A Additional Computational Details for the RBC Model

Under the reparametrization in model (2.1) and using the approximation of the truncated bivariate normal density (5.1) by [29] to sample , our (approximate) model is:


Since the Cauchy distribution belongs to the location-scale family, the induced prior on under (3.3) is . That is, is a generalized Cauchy distribution with mean and scaling parameter . The density of is . Noting that can be rewritten as a scale-mixture density i.e. , our induced prior hierarchy under the RBC model (3.1)-(3.6) is:


From (A.1)-(A.2), it is clear that the parameters in our Gibbs sampling algorithm can be updated in closed form. In particular, the full conditional for each is a truncated normal density, the full conditionals for , and are normal densities, and the full conditionals for and are inverse-gamma densities. To update at each iteration, we can use either slice sampling or Metropolis-Hastings with a proposal density bounded between , e.g. a rescaled beta density whose mode is the previous MCMC draw for .

To approximate the posterior , note that when , the ’s are independent of the ’s in (A.1), and thus, the model (A.1) reduces to


As before, for all . Using the scale-mixture representation of the generalized Cauchy distribution, we now only need to place priors on as follows:


From (A.3)-(A.4), all the parameters can be updated in closed form, conditional on the others, in the Gibbs sampler.

Appendix B Empirical Study for the Measure

We use an empirical study to determine whether or not our guidelines for interpreting the measure (4.2) given in Section 4.2 of the main manuscript are reasonable. To summarize briefly, we randomly simulated 2000 meta-analyses, each with different sample sizes , different values for , and different distributions for the heterogeneity . To ensure that we covered a wide range of values for , we simulated 100 observations for in each of the 20 intervals .

Figure 6: Plot of the measures against the true for 2000 simulated meta-analyses.

In Figure 6, we plot against for our empirical study. Figure 6 confirms that larger values of are correlated with larger values of , whereas values of close to zero correspond to smaller values of . For values of , is generally less than 0.25. For values of , most values of are less than 0.5.

We used the below steps to generate the 2000 meta-analyses for our empirical study:

  1. Generate 100 values of in each of the intervals , , .

  2. For each of the 2000 ’s:

    1. Sample the meta-analysis size uniformly from and generate the within-study standard errors , from .

    2. Randomly generate , , , .

    3. Randomly choose one of the four distributions for the heterogeneity used in the experiments in Section 6.1 of the main manuscript and depicted in Figure 7.

    4. Generate the observed treatment effects according to the Copas selection model, with the given and choice of distribution for .

    5. Compute the posteriors and and the measure.

Figure 7: Plots of the four distributions for the heterogeneity considered in our experiments.

We caution that while we expect for to generally follows the v-shape given in Figure 6 (i.e. increases as increases), the actual magnitude of shows considerable variability in each of the intervals , , , . This is likely because also depends on things such as the magnitudes of or the magnitude and direction of the effect sizes . Therefore, we reiterate that our recommendations for interpreting in Section 4.2 should be used as rough guidelines, and it is best for to be interpreted within the context of the problem being studied. Nevertheless, our empirical results and Figure 6 offer some empirical evidence that our chosen guidelines for determine negligible vs. moderate vs. strong publication bias are reasonable.

To determine the guidelines for interpreting given in Section 4.2 of the main manuscript, we looked at the empirical histograms for for and , given in Figure 8. When , we see that is usual less than 0.25. When , we see that is usually less than 0.5. Based on these empirical histograms, we recommend using the following guidelines: means negligible publication bias, means moderate publication bias, and means high publication bias.

Figure 8: The empirical histograms for the measure when in the left panel, and when in the right panel.