Meta-analysis is the quantitative combination of results from different studies (lipsey2001practical). The purpose of a meta-analysis is to pool estimates across studies in order to reduce sampling error. Publication bias occurs when the published research literature is not representative of all the research that has actually been done (rothstein2006publication). An important case of publication bias is when publication decisions are made based on p-values (sterling1959publication). Such p-value based publication bias (and its sister phenomenon, p-hacking (simmons2011false)) can cause seriously biased conclusions when it is not accounted for (simmons2011false; moss2019modelling).
One of the most important models for meta-analysis is the normal random effects model with normal likelihoods where the standard deviations are assumed to be known (hedges1998fixed). The likelihood for an observation $x_i$ in this model is $\phi(x_i; \theta, \sqrt{\tau^2 + \sigma_i^2})$, where $\phi(x; \mu, \sigma)$ is the normal density with mean $\mu$ and standard deviation $\sigma$. The parameter $\tau$ is the standard deviation of the random effects distribution and is known as the heterogeneity parameter, while the $\sigma_i$ are the study specific standard errors and $\theta$ is the mean of the effect size distribution. A modification of the random effects meta-analysis model essentially due to hedges1984estimation allows us to account for p-value based publication bias.
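Let $u_i = 1 - \Phi(x_i/\sigma_i)$ be the standard one-sided p-value for the null hypothesis of no effect, where $\Phi$ is the standard normal cumulative distribution function, and let $\rho(u) \in [0, 1]$ be a probability for each $u \in [0, 1]$. The selection function meta-analysis model (hedges1984estimation; hedges1992modeling) based on the p-value is

$$f(x_i; \theta, \tau, \sigma_i) = \frac{\phi(x_i; \theta, \sqrt{\tau^2 + \sigma_i^2})\,\rho(u_i)}{\int \phi(x; \theta, \sqrt{\tau^2 + \sigma_i^2})\,\rho(1 - \Phi(x/\sigma_i))\,dx}.$$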
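As a numerical illustration of the selection function model, the Python sketch below evaluates the density for an arbitrary publication probability function $\rho$ by normalizing $\phi(x; \theta, \sqrt{\tau^2 + \sigma^2})\rho(u)$ numerically. It is a minimal sketch, assuming that a midpoint rule on a wide fixed interval is accurate enough; the function names `p_value` and `selection_density` are hypothetical and not taken from any package.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_value(x, sigma):
    """One-sided p-value u = 1 - Phi(x / sigma)."""
    return 1.0 - STD_NORMAL.cdf(x / sigma)


def selection_density(x, theta, tau, sigma, rho, lower=-50.0, upper=50.0, steps=20_000):
    """Selection model density at x for a publication probability function rho.

    The numerator is phi(x; theta, sqrt(tau^2 + sigma^2)) * rho(u(x)); the
    normalizing integral is approximated with a midpoint rule on [lower, upper].
    The integration range and grid size are illustrative choices, not tuned.
    """
    base = NormalDist(theta, math.sqrt(tau ** 2 + sigma ** 2))
    h = (upper - lower) / steps
    midpoints = (lower + (k + 0.5) * h for k in range(steps))
    const = h * sum(base.pdf(t) * rho(p_value(t, sigma)) for t in midpoints)
    return base.pdf(x) * rho(p_value(x, sigma)) / const


# Example: a smooth publication probability that favours small p-values.
density_at_one = selection_density(1.0, theta=0.2, tau=0.3, sigma=0.5, rho=lambda u: 1.0 - 0.9 * u)
```

With $\rho \equiv 1$ the density reduces to the normal random effects likelihood, which is a convenient sanity check.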
The model is also known as the weighting function publication bias model. Here is an interpretation of it:
Alice is an editor who receives a study with p-value $u$. Her publication decision is a random function of this p-value. That is, she will publish the study with probability $\rho(u)$. Every study you will ever read in Alice’s journal has survived this selection mechanism; the rest are lost forever.
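This essentially describes a rejection sampling (von1951various; flury1990acceptance) procedure. The publication probability function $\rho$ is likely to be well approximated by a step function, with cutoffs such as $0.025$ and $0.05$ being especially important. To model this, let $\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_J)$ be a vector with $0 = \alpha_0 < \alpha_1 < \cdots < \alpha_J = 1$ and let $\pi = (\pi_1, \ldots, \pi_J)$ be a non-zero vector in $[0, 1]^J$. Now define $\rho(u) = \sum_{j=1}^J \pi_j 1[\alpha_{j-1} < u \leq \alpha_j]$. This is a step function where the value $\pi_j$ of $\rho$ on each step is the probability of acceptance for a study with a p-value falling inside the interval $(\alpha_{j-1}, \alpha_j]$. I will call the density proportional to $\phi(x; \theta, \sqrt{\tau^2 + \sigma^2})\rho(u)$ the step function publication bias model and denote it $f(x; \theta, \tau, \sigma, \alpha, \pi)$.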
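Alice’s editorial process can be mimicked directly by rejection sampling. The sketch below, assuming independent studies that share a single standard error $\sigma$, draws submitted estimates from the random effects distribution and keeps each one with probability $\rho(u)$ given by the step function. The helpers `rho_step` and `sample_published` are hypothetical names chosen for illustration, and the cutoffs and acceptance probabilities in the example are arbitrary.

```python
import bisect
import math
import random
from statistics import NormalDist


def rho_step(u, alpha, pi):
    """Step function rho(u) = pi_j for alpha_{j-1} < u <= alpha_j.

    alpha = [alpha_0, ..., alpha_J] with 0 = alpha_0 < ... < alpha_J = 1 and
    pi = [pi_1, ..., pi_J] with entries in [0, 1].
    """
    j = bisect.bisect_left(alpha, u)  # smallest j with alpha[j] >= u
    j = min(max(j, 1), len(pi))       # put u = 0 in the first step
    return pi[j - 1]


def sample_published(n, theta, tau, sigma, alpha, pi, rng):
    """Draw n published estimates by rejection sampling, mimicking Alice.

    Each submitted estimate is x ~ N(theta, tau^2 + sigma^2); it is published
    with probability rho(u), where u = 1 - Phi(x / sigma).
    """
    if max(pi) <= 0:
        raise ValueError("at least one pi_j must be positive")
    s = math.sqrt(tau ** 2 + sigma ** 2)
    std = NormalDist()
    published = []
    while len(published) < n:
        x = rng.gauss(theta, s)               # a submitted study
        u = 1.0 - std.cdf(x / sigma)          # its one-sided p-value
        if rng.random() < rho_step(u, alpha, pi):
            published.append(x)               # survives the selection
    return published


rng = random.Random(2024)
alpha = [0.0, 0.025, 0.05, 1.0]
pi = [1.0, 0.7, 0.1]
studies = sample_published(100, theta=0.0, tau=0.1, sigma=0.2, alpha=alpha, pi=pi, rng=rng)
```

Rejection sampling can be slow when $\theta$ lies far below the significance cutoffs, since most submissions are then rejected.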
The step function publication bias model was first used by hedges1984estimation, who used a different distribution instead of the normal distribution and a single step in $\rho$. iyengar1988selection studied other choices of $\rho$, while citkowicz2017parsimonious used a beta density publication probability function instead of a step function. hedges1992modeling is an accessible paper about the model.
Frequentist estimation of the step function publication bias model is problematic, as noted by, for instance, mcshane2016adjusting. The purpose of this note is to formalize and prove exactly how problematic frequentist estimation of this model can be. I do this by proving that there are no confidence sets of guaranteed finite diameter for the mean parameter $\theta$ and the heterogeneity parameter $\tau$ for any coverage $1 - \alpha > 0$. This is a problematic result for two reasons: (i) It would be hopeless to report confidence sets for $\theta$ such as $(-\infty, \infty)$, as they have no practical value. (ii) It shows that automatic confidence set procedures that are guaranteed to yield finite confidence sets of some positive nominal coverage, such as bootstrapped confidence sets, likelihood-ratio based confidence sets, and subsampling confidence sets, never have true coverage greater than $0$ (see gleser996bootstrap).
The main result of the paper is the following.

Theorem 1. Let $f$ be the density of $n$ independent observations from a step function publication bias model. Then $f$ has no almost surely finite diameter confidence set for $\theta$ or $\tau$ of non-zero coverage.
2 Definitions and Proofs
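Let $\{P_\theta : \theta \in \Theta\}$ be a family of probability measures dominated by a measure $\nu$, with densities $f_\theta$. Let $\gamma : \Theta \to \Gamma$ for some set $\Gamma$ equipped with a non-negative function $d : \Gamma \times \Gamma \to [0, \infty)$. Here $\gamma$ maps $\theta$ to the parameter we wish to form a confidence set for. The function $d$ is a distance measure and could be a norm if $\Gamma$ is a vector space. Below it is the absolute value $d(\gamma_1, \gamma_2) = |\gamma_1 - \gamma_2|$. A random set $C(X)$ is a confidence set for $\gamma$ with coverage probability $1 - \alpha$ if $P_\theta(\gamma(\theta) \in C(X)) \geq 1 - \alpha$ for all $\theta \in \Theta$. An acceptance set $A(\theta)$ with level $\alpha$ has the property that $P_\theta(X \in A(\theta)) \geq 1 - \alpha$. There is a well known duality between confidence sets and acceptance sets: for each confidence set $C$ there is a collection of acceptance sets satisfying $x \in A(\theta)$ if and only if $\gamma(\theta) \in C(x)$, namely $A(\theta) = \{x : \gamma(\theta) \in C(x)\}$. The classical reference for these concepts is (lehmann2006testing, chapter 3.5).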
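The duality between acceptance sets and confidence sets can be seen in a toy setting that is not the publication bias model: a single observation $X \sim N(\theta, 1)$ with acceptance sets $A(\theta) = [\theta - z, \theta + z]$, where $z$ is the standard normal $0.975$ quantile. The sketch below inverts these acceptance sets over a finite grid of $\theta$ values; the grid and the function names are assumptions made for illustration.

```python
from statistics import NormalDist


def acceptance_interval(theta, level=0.95):
    """A(theta) = [theta - z, theta + z] for a single X ~ N(theta, 1)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return theta - z, theta + z


def invert_acceptance_sets(x, grid, level=0.95):
    """C(x) = {theta in grid : x in A(theta)}, the confidence set dual to A."""
    kept = []
    for theta in grid:
        low, high = acceptance_interval(theta, level)
        if low <= x <= high:
            kept.append(theta)
    return kept


grid = [k / 1000.0 for k in range(-5000, 5001)]   # theta values in [-5, 5]
conf_set = invert_acceptance_sets(0.5, grid)
print(min(conf_set), max(conf_set))               # roughly 0.5 - 1.96 and 0.5 + 1.96
```

Up to grid resolution, the recovered set is the familiar interval $x \pm 1.96$.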
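We will need terminology for the diameter of the confidence set according to the function $d$. The diameter is a random variable, $\operatorname{diam} C(X) = \sup\{d(\gamma_1, \gamma_2) : \gamma_1, \gamma_2 \in C(X)\}$. A confidence set has infinite diameter at $\theta$ with positive probability if $P_\theta(\operatorname{diam} C(X) = \infty) > 0$. If $P_\theta(\operatorname{diam} C(X) = \infty) > 0$ for all $\theta \in \Theta$, we will say that $C$ has infinite diameter with positive probability.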
The main ingredient in the proof of theorem 1 is theorem 3, an extension and reformulation of the Gleser-Hwang theorem (gleser1987nonexistence), a result they used to prove the non-existence of confidence sets with guaranteed finite diameter for models such as the errors-in-variables regression model and Fieller’s ratio of means problem. berger1999integrated is a similar extension of the Gleser-Hwang theorem.
The problem of non-existence of confidence sets with guaranteed finite diameter for finite-dimensional parameters has not received much attention in the statistical literature, as lamented by gleser996bootstrap. Some references in this area are bahadur1956nonexistence, who studied infinitely large confidence sets in non-parametric estimation of the mean, romano2004non’s (2004) extension of their results, Donoho1988-hg and Pfanzagl1998-fe.
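Lemma 2. Let $C$ be a confidence set and $A(\theta)$ its associated acceptance sets. If there exists a sequence $\theta_n$ with $d(\gamma(\theta_n), \gamma(\theta_1)) \to \infty$ such that $\lim_{n \to \infty} P_\theta\left(X \in \bigcup_{m \geq n} A(\theta_m)\right) > 0$, then the diameter of $C$ at $\theta$ is infinite with positive probability.

Proof. Let $B_n = \bigcup_{m \geq n} A(\theta_m)$. Since $B_n$ is a decreasing sequence of sets, continuity of measure gives $P_\theta(X \in \bigcap_n B_n) = \lim_n P_\theta(X \in B_n) > 0$. Every $x \in \bigcap_n B_n$ lies in $A(\theta_m)$ for infinitely many $m$, so $C(x)$ contains $\gamma(\theta_m)$ for infinitely many $m$. As $d(\gamma(\theta_m), \gamma(\theta_1)) \to \infty$, the triangle inequality gives $\operatorname{diam} C(x) = \infty$. Thus $P_\theta(\operatorname{diam} C(X) = \infty) \geq P_\theta(X \in \bigcap_n B_n) > 0$. ∎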
This is the extension of the Gleser-Hwang theorem.
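Theorem 3. If there exists a sequence $\theta_n$ with $d(\gamma(\theta_n), \gamma(\theta_1)) \to \infty$ such that $f_{\theta_n} \to g$ pointwise, where $g$ is a probability density with probability measure $G$, and $G \ll P_\theta$ for each $\theta$, then every confidence set with coverage $1 - \alpha > 0$ has infinite diameter with positive probability.

Proof. Let $C$ be such a confidence set, let $A(\theta)$ be its acceptance sets, and let $B_n = \bigcup_{m \geq n} A(\theta_m)$. By a variant of the dominated convergence theorem (billingsley1995probability, exercise 16.4a), $\int |f_{\theta_n} - g| \, d\nu \to 0$, and $|P_{\theta_n}(X \in B) - G(B)| \leq \int |f_{\theta_n} - g| \, d\nu$ for every measurable $B$. Since $P_{\theta_n}(X \in A(\theta_n)) \geq 1 - \alpha$, we get $G(B_n) \geq G(A(\theta_n)) \geq 1 - \alpha - \int |f_{\theta_n} - g| \, d\nu$ for all $n$, thus $G(\bigcap_n B_n) = \lim_n G(B_n) \geq 1 - \alpha > 0$. As $G \ll P_\theta$, it follows that $\lim_n P_\theta(X \in B_n) = P_\theta(X \in \bigcap_n B_n) > 0$ for every $\theta$. The result follows from lemma 2. ∎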
This is an extension of theorem 3 to mixture distributions.
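Proposition 4. Let $f_{1,\theta}, f_{2,\theta}, \ldots$ be a finite or infinite sequence of densities, $p = (p_1, p_2, \ldots)$ a countable probability vector and $f_\theta = \sum_k p_k f_{k,\theta}$ a mixture distribution. Assume there is an index $k$ with $p_k > \alpha$ and a sequence $\theta_n$ with $d(\gamma(\theta_n), \gamma(\theta_1)) \to \infty$ such that $f_{k,\theta_n} \to g$ pointwise, where $g$ is a probability density with probability measure $G$, and $G \ll P_{k,\theta}$ for each $\theta$. Then the mixture admits no almost surely finite diameter confidence set of coverage $1 - \alpha$ for $\gamma$.

Proof. Let $C$ be a confidence set for $\gamma$ with coverage $1 - \alpha$ under the mixture. Since $p_k P_{k,\theta}(\gamma(\theta) \in C(X)) + (1 - p_k) \geq 1 - \alpha$ and $p_k > \alpha$ by assumption, $C$ is a confidence set for $\gamma$ under the component family $f_{k,\theta}$ with coverage at least $1 - \alpha/p_k > 0$. But by theorem 3, $C$ has infinite diameter with positive probability under $f_{k,\theta}$ for all $\theta$. Since $f_\theta \geq p_k f_{k,\theta}$, $C$ has infinite diameter with positive probability under $f_\theta$ too. ∎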
The following lemma essentially proves theorem 1 for the special case when no non-significant studies are published. Let $\phi_I(x; \mu, \sigma)$ denote the density of a normal variable with mean $\mu$ and standard deviation $\sigma$ truncated to the interval $I$.
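Lemma 5. Fix $\lambda > 0$ and $c \in \mathbb{R}$, let $\sigma_n \to \infty$ and $\mu_n = -\lambda \sigma_n^2$, and let $\phi_{[c,\infty)}(x; \mu_n, \sigma_n)$ be the normal density truncated to $[c, \infty)$ with underlying mean $\mu_n$ and standard deviation $\sigma_n$. Then $\phi_{[c,\infty)}(x; \mu_n, \sigma_n)$ converges pointwise to $\lambda e^{-\lambda(x - c)} 1[x \geq c]$, the density of a shifted exponential.

Proof. For $x \geq c$, the formula for $\phi_{[c,\infty)}(x; \mu, \sigma)$ is $\phi(x; \mu, \sigma)/[1 - \Phi((c - \mu)/\sigma)]$. The normal density part equals

$$\phi(x; \mu_n, \sigma_n) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left(-\frac{x^2}{2\sigma_n^2} - \lambda x - \frac{\lambda^2 \sigma_n^2}{2}\right).$$

When $\sigma_n$ is large compared to $|x|$ and $|c|$, the term $x^2/(2\sigma_n^2)$ is negligible, hence

$$\phi_{[c,\infty)}(x; \mu_n, \sigma_n) \approx \frac{\exp(-\lambda x - \lambda^2 \sigma_n^2/2)}{\sqrt{2\pi}\,\sigma_n\,[1 - \Phi(c/\sigma_n + \lambda \sigma_n)]},$$

which equals $\lambda e^{-\lambda x}\,\phi(\lambda\sigma_n) / \{\lambda \sigma_n [1 - \Phi(c/\sigma_n + \lambda\sigma_n)]\}$, where $\phi$ without further arguments denotes the standard normal density. Since $1 - \Phi(t) \sim \phi(t)/t$ as $t$ grows (see e.g. borjesson1979simple), and $\phi(\lambda\sigma_n)/\phi(c/\sigma_n + \lambda\sigma_n) = \exp(\lambda c + c^2/(2\sigma_n^2)) \to e^{\lambda c}$, we end up with $\lambda e^{-\lambda(x - c)}$. ∎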
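Lemma 5 is easy to check numerically. The sketch below evaluates the truncated normal density along the sequence $\mu_n = -\lambda\sigma_n^2$ and compares it with the shifted exponential limit at a few points; $\lambda = 1$, $c = 1.96$ and the evaluation points are arbitrary choices. Tail probabilities are computed on the log scale with `math.erfc` to avoid cancellation, which works only for moderate values of $(c - \mu_n)/\sigma_n$ (below roughly 38 in double precision, where `erfc` underflows). The helper names are hypothetical.

```python
import math


def log_upper_tail(t):
    """log(1 - Phi(t)), computed with erfc to avoid cancellation for large t."""
    return math.log(0.5 * math.erfc(t / math.sqrt(2.0)))


def truncated_normal_pdf(x, c, mu, sigma):
    """Density at x of N(mu, sigma^2) truncated to [c, infinity)."""
    if x < c:
        return 0.0
    log_phi = -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return math.exp(log_phi - log_upper_tail((c - mu) / sigma))


def shifted_exponential_pdf(x, c, lam):
    """lam * exp(-lam * (x - c)) on [c, infinity), zero elsewhere."""
    return lam * math.exp(-lam * (x - c)) if x >= c else 0.0


def max_gap(sigma, lam, c, xs):
    """Largest pointwise gap on xs between the lemma 5 density and its limit."""
    mu = -lam * sigma ** 2  # the mean sequence mu_n = -lambda * sigma_n^2
    return max(abs(truncated_normal_pdf(x, c, mu, sigma) - shifted_exponential_pdf(x, c, lam))
               for x in xs)


lam, c = 1.0, 1.96
xs = [c, c + 0.5, c + 1.0, c + 2.0, c + 4.0]
gaps = [max_gap(sigma, lam, c, xs) for sigma in (2.0, 5.0, 20.0)]
```

The gaps shrink as $\sigma_n$ grows, although slowly: the leading error terms are of order $1/\sigma_n^2$.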
The following lemma gives a mixture representation of the step function publication bias model. The proof is omitted.
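Lemma 6. Let $s = \sqrt{\tau^2 + \sigma^2}$, $c_0 = \infty$, $c_J = -\infty$ and $c_j = \sigma\Phi^{-1}(1 - \alpha_j)$ for $0 < j < J$, so that $\alpha_{j-1} < u \leq \alpha_j$ if and only if $x \in [c_j, c_{j-1})$. The density of an observation from the step function publication bias model with parameters $(\theta, \tau, \sigma, \alpha, \pi)$ is

$$f(x; \theta, \tau, \sigma, \alpha, \pi) = \sum_{j=1}^J p_j\, \phi_{[c_j, c_{j-1})}(x; \theta, s), \qquad p_j = \frac{\pi_j P_j}{\sum_{k=1}^J \pi_k P_k}, \tag{2.1}$$

where $P_j = \Phi((c_{j-1} - \theta)/s) - \Phi((c_j - \theta)/s)$.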
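The mixture representation (2.1) translates directly into code. The sketch below, assuming a single study with standard error $\sigma$, evaluates the density as a weighted sum of truncated normals; since the intervals $[c_j, c_{j-1})$ partition the real line, only one component is non-zero at each $x$. The names `step_model_pdf` and `cdf` are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def cdf(dist, v):
    """CDF of dist that also accepts +/- infinity."""
    if v == math.inf:
        return 1.0
    if v == -math.inf:
        return 0.0
    return dist.cdf(v)


def step_model_pdf(x, theta, tau, sigma, alpha, pi):
    """Density (2.1): a mixture of normals truncated to [c_j, c_{j-1}).

    alpha = [alpha_0, ..., alpha_J], pi = [pi_1, ..., pi_J]; c_0 = +inf,
    c_J = -inf and c_j = sigma * Phi^{-1}(1 - alpha_j) otherwise.
    """
    J = len(pi)
    base = NormalDist(theta, math.sqrt(tau ** 2 + sigma ** 2))
    cuts = [math.inf] + [sigma * STD.inv_cdf(1.0 - a) for a in alpha[1:J]] + [-math.inf]
    probs = [cdf(base, cuts[j - 1]) - cdf(base, cuts[j]) for j in range(1, J + 1)]  # P_j
    norm = sum(p * q for p, q in zip(pi, probs))
    for j in range(1, J + 1):
        if cuts[j] <= x < cuts[j - 1]:                  # x lies in interval j
            weight = pi[j - 1] * probs[j - 1] / norm    # mixture weight p_j
            return weight * base.pdf(x) / probs[j - 1]  # p_j times truncated normal
    return 0.0
```

Multiplying out the weight and the truncated density gives $\pi_j\phi(x; \theta, s)/\sum_k \pi_k P_k$, so the sketch agrees with the definition of the model as the normalized product $\phi\rho$.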
Proof of theorem 1.
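Using the representation (2.1) of lemma 6 and proposition 4, it is enough to show the result for the component $\phi_{[c_1, \infty)}(x; \theta, \sqrt{\tau^2 + \sigma^2})$, where $c_1 = \sigma\Phi^{-1}(1 - \alpha_1)$. To this end, fix $\lambda > 0$, let $\tau_n \to \infty$ and let $\theta_n = -\lambda(\tau_n^2 + \sigma^2)$. Then $\theta_n \to -\infty$ and $\tau_n \to \infty$, and this sequence does the trick by lemma 5, since the limiting shifted exponential density is positive exactly on $[c_1, \infty)$, as is every density in the component family. For the case with more than one observation, observe that

$$\prod_{i=1}^n f(x_i; \theta, \tau, \sigma_i, \alpha, \pi) = \prod_{i=1}^n \sum_{j=1}^J p_{ij}\, \phi_{[c_{ij}, c_{i,j-1})}(x_i; \theta, \sqrt{\tau^2 + \sigma_i^2})$$

is a mixture model, where $p_{ij}$ and $c_{ij}$ are the quantities of lemma 6 for study $i$. (Just expand it.) The component belonging to the mixture probability $\prod_{i=1}^n p_{i1}$ fulfills the demands of proposition 4 for coverage $1 - \alpha$ when $\prod_{i=1}^n p_{i1} > \alpha$, as its density is a product of densities converging to shifted exponentials. ∎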
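One way to see why likelihood-based confidence sets struggle here is to evaluate the log-likelihood along the sequence used in the proof. The sketch below assumes the special case $\pi = (1, 0, \ldots, 0)$ with a common standard error $\sigma = 1$, cutoff $\alpha_1 = 0.025$ and $\lambda = 0.5$, and uses a few hypothetical published estimates; the function names are illustrative only. It shows that the log-likelihood at $\theta_n = -\lambda(\tau_n^2 + \sigma^2)$ approaches the finite shifted exponential log-likelihood instead of diverging to $-\infty$.

```python
import math
from statistics import NormalDist


def log_trunc_normal(x, c, mu, s):
    """log density at x >= c of N(mu, s^2) truncated to [c, infinity)."""
    log_phi = -0.5 * ((x - mu) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
    log_tail = math.log(0.5 * math.erfc((c - mu) / (s * math.sqrt(2.0))))
    return log_phi - log_tail


def loglik_on_ray(xs, c, sigma, tau, lam):
    """Log-likelihood when only significant studies are published (pi = e_1),
    evaluated at theta = -lam * (tau^2 + sigma^2)."""
    s2 = tau ** 2 + sigma ** 2
    theta = -lam * s2
    return sum(log_trunc_normal(x, c, theta, math.sqrt(s2)) for x in xs)


def exponential_loglik(xs, c, lam):
    """Log-likelihood of the shifted exponential limit from lemma 5."""
    return sum(math.log(lam) - lam * (x - c) for x in xs)


sigma, lam = 1.0, 0.5
c = sigma * NormalDist().inv_cdf(0.975)   # cutoff for alpha_1 = 0.025
xs = [2.1, 2.5, 3.0, 2.2]                 # hypothetical published estimates, all >= c
gaps = [abs(loglik_on_ray(xs, c, sigma, tau, lam) - exponential_loglik(xs, c, lam))
        for tau in (1.0, 5.0, 20.0, 40.0)]
```

Along this path the likelihood therefore does not penalize arbitrarily negative $\theta$ and large $\tau$ without bound, which is in line with the non-existence result; the check is illustrative and not part of the proof.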
I am grateful to Riccardo De Bin and Emil Aas Stoltenberg for reading through the manuscript and giving helpful comments.