Our interest lies in the effects that simulation design choices may have on conclusions about the comparative merits of various methods, taking as an example meta-analysis of odds ratios. The basic data from $K$ studies involve binomial variables, $X_{ij} \sim \mathrm{Bin}(n_{ij}, p_{ij})$, for $i = 1, \ldots, K$ and $j = C$ or $T$ (for the Control or Treatment arm); those data underlie the odds ratios for the meta-analysis.
A design specifies the number of studies, $K$; the sample sizes, $n_{ij}$; the nuisance parameters (control-arm probabilities, $p_{iC}$, or, equivalently, their logits, $\mathrm{logit}(p_{iC})$); the overall log-odds-ratio, $\theta$; and the between-study variance, $\tau^2$. For each situation the simulation uses $M$ replications, where $M$ is typically large, say 10,000.
For simplicity, we consider equal arm-level sample sizes, $n_{iC} = n_{iT} = n_i$. The control probabilities or their logits can be constant or generated from some distribution. Normal and uniform distributions are the typical choices. As in our previous report (arXiv_LOR_simulation_equal_sample_sizes), we consider five possible generation mechanisms for control-arm probabilities and log-odds-ratios under the random-effects model of meta-analysis.
We consider two fixed-intercept random-effects models (FIM1 and FIM2) and two random-intercept random-effects models (RIM1 and RIM2), as in bakbergenuly2018GLMM. These models are equivalent to Models 2 and 4 (for FIM) and Models 3 and 5 (for RIM), respectively, of jackson2018comparison. Briefly, the FIMs include fixed control-arm effects (log-odds of the control-arm probabilities), and the RIMs replace these fixed effects with random effects. We also consider a model with uniformly distributed control-arm probabilities (URIM1).
Studies also vary in how they specify the sample sizes $n_i$. In our previous report (arXiv_LOR_simulation_equal_sample_sizes) we set $n_i = n$ in all replications. Here we investigate the use of normal and uniform distributions to generate a new set of $n_i$ in each replication.
2 Generation of sample sizes
Several authors (Cheng2016; bakbergenuly2018GLMM) use constant study-level sample sizes, either equal or unequal, in all replications. More often, however, authors generate sample sizes from a uniform or normal distribution. jackson2018comparison draw sample sizes from a discrete uniform distribution. Langan_2018_RSM_1316 use either constant and equal sample sizes within and across studies or sample sizes from uniform distributions; sidik2007 and AboZaid2013 also use uniform distributions. viechtbauer2007confidence generates study-level sample sizes from a normal distribution. In an extensive simulation study for sparse data, kuss2015statistical uses FIM1 and the corresponding model with , along with a large number of fitting methods; he generates both the number of studies and their sample sizes from log-normal distributions: LN(0.65, 1.2) and LN(3.05, 0.97) for $K$ and LN(4.615, 1.1) for sample sizes.
In general, if mutually independent random variables $X_1, X_2, \ldots$ have a common distribution $F$, and $N$ is independent of the $X_i$, the sum $S_N = \sum_{i=1}^{N} X_i$ has a compound distribution (compound), with $E(S_N) = E(N)\,E(X)$ and $\mathrm{Var}(S_N) = E(N)\,\mathrm{Var}(X) + \mathrm{Var}(N)\,[E(X)]^2$. This variance is larger than $E(N)\,\mathrm{Var}(X)$, the variance of the sum when the number of terms is fixed at $E(N)$. Therefore, random generation of sample sizes produces an overdispersed binomial (compound Bernoulli) distribution for the control arm, and may also inflate, though in a more complicated way, the variance in the treatment arm.
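In particular, when the summands are Bernoulli($p$) and the sample size $N$ is normal with mean $n$ and variance $\sigma^2$, the compound Bernoulli distribution has variance $np(1-p) + \sigma^2 p^2$. And when $N$ is uniform on an interval of width $\Delta$ centered at $n$, the variance is $np(1-p) + \Delta^2 p^2/12$.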
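The overdispersion described above can be checked numerically. The sketch below relies on the standard identity for a random sum, Var(S) = E(N) p (1 - p) + Var(N) p^2 for Bernoulli(p) summands, and compares it with a small simulation. The function names and the illustration values (sample sizes uniform on the integers 50 to 150, p = 0.2) are assumptions made for this example, not settings from the report.

```python
import random
import statistics


def compound_bernoulli_variance(mean_n, var_n, p):
    """Variance of a sum of N Bernoulli(p) draws when N is random:
    E(N) p (1 - p) + Var(N) p^2 (standard random-sum identity)."""
    return mean_n * p * (1 - p) + var_n * p ** 2


def simulate_counts(low, high, p, reps, seed=0):
    """Draw N uniformly on the integers low..high, then count successes
    in N Bernoulli(p) trials; repeat `reps` times."""
    rng = random.Random(seed)
    counts = []
    for _ in range(reps):
        n = rng.randint(low, high)
        counts.append(sum(rng.random() < p for _ in range(n)))
    return counts


# Hypothetical illustration values (not taken from the report).
low, high, p = 50, 150, 0.2
mean_n = (low + high) / 2
var_n = ((high - low + 1) ** 2 - 1) / 12  # variance of a discrete uniform

theory = compound_bernoulli_variance(mean_n, var_n, p)
binomial_only = mean_n * p * (1 - p)  # variance with N fixed at its mean
empirical = statistics.pvariance(simulate_counts(low, high, p, reps=20000))

print("compound (theory):  ", theory)
print("fixed-n binomial:   ", binomial_only)
print("compound (simulated):", empirical)
```

With these values the random sample size contributes more to the variance of the count than the binomial sampling itself, which is the sense in which the compound distribution is overdispersed.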
3 Variances of estimated log-odds-ratios for random sample sizes
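The (conditional, given $p_{ij}$ and $n_{ij}$) variance of the estimated log-odds-ratio $\hat\theta_i$, derived by the delta method, is
$$v_i^2 = \mathrm{Var}(\hat\theta_i) = \frac{1}{n_{iT}\,p_{iT}(1-p_{iT})} + \frac{1}{n_{iC}\,p_{iC}(1-p_{iC})}, \tag{3.1}$$
estimated by substituting $\hat p_{ij}$ for $p_{ij}$. (We follow the particular method's procedure for calculating $\hat p_{ij}$.)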
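As a small illustration of the delta-method variance, the sketch below evaluates 1/(n_T p_T (1 - p_T)) + 1/(n_C p_C (1 - p_C)) and a plug-in estimate. The text leaves the choice of estimated probabilities to each method; the version here assumes the common correction of adding 0.5 to each cell, which is only one possibility. All function names are hypothetical.

```python
def lor_variance(p_t, n_t, p_c, n_c):
    """Delta-method variance of a log-odds-ratio for known arm probabilities."""
    return 1.0 / (n_t * p_t * (1.0 - p_t)) + 1.0 / (n_c * p_c * (1.0 - p_c))


def estimated_lor_variance(x_t, n_t, x_c, n_c, add=0.5):
    """Plug-in estimate, ASSUMING `add` is added to every cell of the 2x2 table.

    With p_hat = (x + add) / (n + 2 * add) and effective size n + 2 * add,
    this equals the familiar sum of reciprocal corrected cell counts.
    """
    p_t_hat = (x_t + add) / (n_t + 2 * add)
    p_c_hat = (x_c + add) / (n_c + 2 * add)
    return lor_variance(p_t_hat, n_t + 2 * add, p_c_hat, n_c + 2 * add)


# Hypothetical example: 100 subjects per arm, p_C = 0.2, p_T = 0.4.
v_true = lor_variance(0.4, 100, 0.2, 100)
v_hat = estimated_lor_variance(40, 100, 20, 100)
print(v_true, v_hat)
```

Because each term is proportional to 1/n, the behaviour of E(1/n) under random sample sizes drives any inflation of this variance.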
Under the binomial-normal random-effects model (REM), the true study-level effects, $\theta_i$, follow a normal distribution: $\theta_i \sim N(\theta, \tau^2)$.
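A minimal sketch of generating one meta-analysis under a binomial-normal model follows. It assumes a constant control-arm probability and a treatment-arm logit equal to the control-arm logit plus the study-level effect; this is an illustrative parameterization and is not claimed to reproduce any of the five mechanisms exactly. The function and parameter names are hypothetical.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def binomial_draw(n, p, rng):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def generate_meta_analysis(k, n, p_c, theta, tau2, rng):
    """Return k tuples (x_c, x_t, n) with equal arm sizes n.

    ASSUMPTION: fixed control probability p_c and
    logit(p_t) = logit(p_c) + theta_i, with theta_i ~ N(theta, tau2).
    """
    studies = []
    for _ in range(k):
        theta_i = rng.gauss(theta, math.sqrt(tau2))
        p_t = expit(logit(p_c) + theta_i)
        studies.append((binomial_draw(n, p_c, rng), binomial_draw(n, p_t, rng), n))
    return studies


rng = random.Random(1)
studies = generate_meta_analysis(k=5, n=100, p_c=0.2, theta=0.5, tau2=0.1, rng=rng)
for x_c, x_t, n in studies:
    print(x_c, x_t, n)
```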
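To calculate the variance of $\hat\theta_i$ when sample sizes are random, we use the law of total variance:
$$\mathrm{Var}(\hat\theta_i) = E\big[\mathrm{Var}(\hat\theta_i \mid n_i)\big] + \mathrm{Var}\big[E(\hat\theta_i \mid n_i)\big].$$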
The second term is , and the first term is obtained by substituting and in an expression for the variance of under fixed sample sizes.
For a random sample size $n$, using the delta method,
$$E\!\left(\frac{1}{n}\right) \approx \frac{1}{E(n)}\left(1 + CV^2\right), \tag{3.2}$$
where $CV$ is the coefficient of variation (i.e., the ratio of the standard deviation of $n$ to its mean). Therefore, to order $1/E(n)$, random generation of sample sizes inflates the variance of $\hat\theta_i$ if and only if the coefficient of variation of the distribution of sample sizes is of order 1. In the simulations of viechtbauer2007confidence, where , , so the variance is not inflated. In contrast, generating sample sizes from would result in and would inflate variance. (Use of such a combination of mean and variance, however, is unlikely to produce realistic sets of sample sizes, and the probability of generating a negative sample size exceeds 2%.)
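The trade-off between variance inflation and implausible sample sizes can be tabulated for normally distributed sample sizes. The sketch below uses the second-order delta-method approximation E(1/n) ≈ (1 + CV^2)/E(n), so the inflation factor is 1 + CV^2, and the fact that a normal variable with coefficient of variation CV is negative with probability Φ(-1/CV). The CV values in the loop are hypothetical illustrations, not settings from any cited study.

```python
from statistics import NormalDist


def variance_inflation_factor(cv):
    """Approximate multiplier of 1/E(n) when n is random: 1 + CV^2."""
    return 1.0 + cv ** 2


def prob_negative_sample_size(cv):
    """P(n < 0) for a normal sample size with coefficient of variation cv."""
    return NormalDist().cdf(-1.0 / cv)


# Hypothetical CV values for illustration.
for cv in (0.1, 0.3, 0.5):
    print(cv, variance_inflation_factor(cv), prob_negative_sample_size(cv))
```

A CV large enough to inflate the variance noticeably also makes negative draws non-negligible, which is why a normal distribution with a large CV is an awkward generator of sample sizes.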
The variance of a uniform distribution on an interval of width $\Delta$ centered at $n$ is $\Delta^2/12$, and its CV is $\Delta/(\sqrt{12}\,n)$. Therefore, the CV is of order 1 whenever the width of the interval is of the same order as its center. Hence, variance is inflated in the simulations by jackson2018comparison, Langan_2018_RSM_1316, sidik2007, and AboZaid2013, who all use wide intervals for the sample sizes.
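For a continuous uniform distribution the expectation of 1/n has a closed form, log(b/a)/(b - a) on the interval [a, b], so the delta-method approximation can be checked directly. The sketch below does this for a hypothetical interval of width 100 centered at 100; the function names are illustrative.

```python
import math


def uniform_cv(center, width):
    """CV of a uniform distribution on an interval of the given width."""
    return width / (math.sqrt(12.0) * center)


def exact_mean_reciprocal_uniform(center, width):
    """Exact E(1/n) for n uniform on [center - width/2, center + width/2]."""
    a = center - width / 2.0
    b = center + width / 2.0
    return math.log(b / a) / (b - a)


def delta_mean_reciprocal(center, cv):
    """Second-order delta-method approximation (1 + CV^2) / center."""
    return (1.0 + cv ** 2) / center


# Hypothetical interval: width equal to the center.
center, width = 100.0, 100.0
cv = uniform_cv(center, width)
exact = exact_mean_reciprocal_uniform(center, width)
approx = delta_mean_reciprocal(center, cv)
print(cv, 1.0 / center, approx, exact)
```

Here the squared CV is 1/12, so the approximation puts the inflation of 1/n at roughly 8%, and the exact value is slightly larger.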
4 Design of the simulations for randomly distributed sample sizes
Our simulations keep the arm-level sample sizes equal and the control-arm probabilities and the log-odds-ratios independent. Table 1 shows the components of the simulations for normally and uniformly distributed sample sizes: parameters, data-generation mechanisms, and estimation targets. Our first report (arXiv_LOR_simulation_equal_sample_sizes) provides more details. We included the DerSimonian-Laird (DL), restricted maximum-likelihood (REML), Mandel-Paule (MP), and Kulinskaya-Dollinger (KD) estimators of $\tau^2$ with corresponding inverse-variance-weighted estimators of $\theta$ and confidence intervals with critical values from the normal distribution. Bakbergenuly2020 studied those inverse-variance-weighted estimators in detail. We also included the SSW point estimator of $\theta$, whose weights depend only on the studies' arm-level sample sizes, and a corresponding confidence interval, which uses the SSW estimate as the midpoint, an estimate of $\tau^2$ in the estimate of its variance, and critical values from the $t$ distribution on $K-1$ degrees of freedom. Among the estimators, FIM2 and RIM2 denote the estimators in the corresponding GLMMs.
We generated the arm-level sample sizes, $n_i$, from a normal or a uniform distribution centered at 40, 100, 250, or 1000.
In generating sample sizes from a normal distribution, we want negative sample sizes to have reasonably small probability. For our choice of this probability is . Unfortunately, we were still getting a small number of values below zero out of thousands of simulated values, so we additionally truncate the values generated from a normal distribution at 10. Truncation happens with probability .
To make uniform distributions of sample sizes comparable to the normal distributions, we centered them at the same value, $n$, and equated their variances. If a normal distribution has variance $\sigma^2$, a uniform distribution with the same variance has interval width $2\sqrt{3}\,\sigma$. We set , resulting in and a squared CV of . Therefore, by Equation (3.2), our simulations with random sample sizes inflate variances and covariances by in comparison with simulations with constant sample sizes. Wider intervals for the sample sizes would inflate variances more, but in generating sample sizes from a corresponding normal distribution, we wanted negative sample sizes to have reasonably small probability. For our choice of this probability is .
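The two generators of sample sizes can be sketched as follows. A uniform distribution matches a normal variance sd^2 when its half-width is sqrt(3) times sd. Several details are assumptions made for this example: the standard deviation of 30 is a hypothetical value, draws are rounded to integers, and truncation at 10 is implemented by setting smaller values to 10 (resampling would be an alternative reading).

```python
import math
import random


def normal_sample_sizes(k, center, sd, rng, floor=10):
    """k sample sizes from N(center, sd^2), rounded.

    ASSUMPTION: 'truncation at 10' is read as setting values below 10 to 10.
    """
    return [max(floor, round(rng.gauss(center, sd))) for _ in range(k)]


def uniform_sample_sizes(k, center, sd, rng):
    """k sample sizes from a uniform distribution with the same center and
    the same variance sd^2 (half-width sqrt(3) * sd), rounded."""
    half_width = math.sqrt(3.0) * sd
    return [round(rng.uniform(center - half_width, center + half_width))
            for _ in range(k)]


rng = random.Random(2024)
center, sd = 100, 30.0  # hypothetical values, not the report's settings
n_normal = normal_sample_sizes(10, center, sd, rng)
n_uniform = uniform_sample_sizes(10, center, sd, rng)
print(n_normal)
print(n_uniform)
```

A new set of sample sizes would be drawn in this way for each replication before the binomial counts are generated.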
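|Number of studies, $K$||5, 10, 30|
|Center of the distribution of sample sizes, $n$||40, 100, 250, 1000|
|Overall log-odds-ratio, $\theta$||0, 0.5, 1, 1.5, 2|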
|Generation of and|
|FIM1||Fixed intercept models:|
|RIM1||Random intercept models:|
|bias in estimating $\tau^2$||DL, REML, MP, KD, FIM2, RIM2|
|bias in estimating $\theta$||DL, REML, MP, KD, FIM2, RIM2, SSW|
|coverage of $\theta$||DL, REML, MP, KD, FIM2, RIM2, SSW (with an estimate of $\tau^2$ and $t$ critical values)|
5 Summary of the results
Our simulations explored two main components of design: the data-generation mechanism and the distribution of study-level sample sizes. Results of our simulations with normally distributed sample sizes are provided in Appendix A, and those with uniformly distributed sample sizes in Appendix B.
The five data-generation mechanisms (FIM1, FIM2, RIM1, RIM2, and URIM1) often produced different results for at least one of the measures of performance (bias of estimators of $\tau^2$, bias of estimators of $\theta$, and coverage of confidence intervals for $\theta$). In the most frequent pattern, FIM2 and RIM2 yield similar results, and FIM1, RIM1, and URIM1 yield results that are similar to one another but different from those of FIM2 and RIM2. In some situations URIM1 stands apart.
We also expected the coverage of confidence intervals for $\theta$ to suffer because random sample sizes increase the variance of generated log-odds-ratios. However, generation of sample sizes from normal and uniform distributions had essentially no impact, as can be seen by comparing the results from this report with those from our report (arXiv_LOR_simulation_equal_sample_sizes) on the simulations with constant sample sizes.
The explanation may lie in our choice of variance (not large enough) for the normal and uniform distributions of the sample sizes, causing an increase of just 10% in the variance of LORs, or in the rather low coverage, even under constant sample sizes, resulting from considerable biases of estimators of .