Exploring Consequences of Simulation Design for Apparent Performance of Statistical Methods. 2: Results from simulations with normally and uniformly distributed sample sizes

07/07/2020 ∙ by Elena Kulinskaya, et al. ∙ 0

This report continues our investigation of effects a simulation design may have on the conclusions on performance of statistical methods. In the context of meta-analysis of log-odds-ratios, we consider five generation mechanisms for control probabilities and log-odds-ratios. Our first report (Kulinskaya et al. 2020) considered constant sample sizes. Here we report on the results for normally and uniformly distributed sample sizes.




1 Introduction

Our interest lies in effects that simulation design choices may have on conclusions about the comparative merits of various methods, taking as an example meta-analysis of odds ratios. The basic data from studies involve binomial variables, $X_{ij} \sim \mathrm{Bin}(n_{ij}, p_{ij})$, for $i = 1, \ldots, K$ and $j = C$ or $T$ (for the Control or Treatment arm); those data underlie the odds-ratios for the meta-analysis.

A design specifies the number of studies, $K$; the sample sizes, $n_{ij}$; the nuisance parameters (control-arm probabilities, $p_{iC}$, or, equivalently, their logits, $\alpha_i$); the overall log-odds-ratio, $\theta$; and the between-study variance, $\tau^2$. For each situation the simulation uses $M$ replications, where $M$ is typically large, say 10,000.

For simplicity, we consider equal arm-level sample sizes, $n_{iC} = n_{iT} = n_i$. The control probabilities or their logits can be constant or generated from some distribution. Normal and uniform distributions are the typical choices. As in our previous report (arXiv_LOR_simulation_equal_sample_sizes), we consider five possible generation mechanisms for control-arm probabilities and log-odds-ratios under the random-effects model of meta-analysis.

We consider two fixed-intercept random-effects models (FIM1 and FIM2) and two random-intercept random-effects models (RIM1 and RIM2), as in bakbergenuly2018GLMM. These models are equivalent to Models 2 and 4 (for FIM) and Models 3 and 5 (for RIM), respectively, of jackson2018comparison. Briefly, the FIMs include fixed control-arm effects (log-odds of the control-arm probabilities), and the RIMs replace these fixed effects with random effects. We also consider a model with uniformly distributed control-arm probabilities (URIM1).

Studies also vary in how they specify the sample sizes. In our previous report (arXiv_LOR_simulation_equal_sample_sizes) we kept the sample sizes constant in all replications. Here we investigate the use of normal and uniform distributions to generate a new set of sample sizes in each replication.

2 Generation of sample sizes

Several authors (Cheng2016; bakbergenuly2018GLMM) use constant study-level sample sizes, either equal or unequal, in all replications. More often, however, authors generate sample sizes from a uniform or normal distribution.

jackson2018comparison use (mostly with ) sample sizes from discrete . Langan_2018_RSM_1316 use either constant and equal sample sizes within and across studies, or sample sizes from and ; sidik2007 use ; and AboZaid2013 use and . viechtbauer2007confidence generates study-level sample sizes () from ( is the variance) with . In an extensive simulation study for sparse data, kuss2015statistical uses FIM1 and the corresponding model with , along with a large number of fitting methods; he generates both the number of studies and their sample sizes from log-normal distributions: LN(0.65, 1.2) and LN(3.05, 0.97) for the number of studies, and LN(4.615, 1.1) for sample sizes.

In general, if mutually independent random variables $X_1, X_2, \ldots$ have a common distribution $F$, and $N$ is independent of the $X_i$, the sum $S_N = X_1 + \cdots + X_N$ has a compound distribution (compound). A binomial distribution with a random number of trials is a compound Bernoulli distribution. With success probability $p$, the first two moments of such a distribution are $\mathrm{E}(S_N) = p\,\mathrm{E}(N)$ and $\mathrm{Var}(S_N) = p(1-p)\,\mathrm{E}(N) + p^2\,\mathrm{Var}(N)$. This variance is larger than the variance of the $\mathrm{Bin}(\mathrm{E}(N), p)$ distribution. Therefore, random generation of sample sizes produces an overdispersed Binomial (compound Bernoulli) distribution for the control arm, and may also inflate, though in a more complicated way, the variance in the treatment arm.

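In particular, when the number of trials $N$ has a normal distribution with mean $n_0$ and variance $\sigma^2$, the compound Bernoulli distribution with success probability $p$ has variance $n_0\,p(1-p) + p^2\sigma^2$. And when $N$ has a uniform distribution on an interval of width $w$ centered at $n_0$, the variance is $n_0\,p(1-p) + p^2 w^2/12$.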
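The overdispersion can be checked numerically. The following is a minimal sketch, not part of the report's simulations: the uniform range (50 to 150), the seed, and the number of draws are arbitrary illustrative choices, and the helper name `compound_binomial_variance` is hypothetical. It draws a number of trials from a discrete uniform distribution, generates a binomial count given that number, and compares the simulated variance with the compound-distribution formula and with the fixed-sample-size binomial variance.

```python
import numpy as np


def compound_binomial_variance(p, mean_n, var_n):
    """Variance of X when X | N ~ Binomial(N, p) and N is random:
    p(1-p)E(N) + p^2 Var(N)."""
    return p * (1.0 - p) * mean_n + p**2 * var_n


rng = np.random.default_rng(2020)  # arbitrary seed, for reproducibility only
p = 0.4                            # a control-arm probability
lower, upper = 50, 150             # hypothetical range for N (inclusive)

# Draw N from a discrete uniform distribution, then X | N ~ Binomial(N, p).
n_draws = rng.integers(lower, upper + 1, size=200_000)
x_draws = rng.binomial(n_draws, p)

mean_n = (lower + upper) / 2
var_n = ((upper - lower + 1) ** 2 - 1) / 12  # variance of a discrete uniform

theory = compound_binomial_variance(p, mean_n, var_n)
fixed_n = p * (1 - p) * mean_n               # variance of Binomial(E(N), p)
empirical = x_draws.var()

print(f"fixed-n variance:     {fixed_n:.1f}")
print(f"compound (formula):   {theory:.1f}")
print(f"compound (simulated): {empirical:.1f}")
```

With a range this wide relative to its center, the term $p^2\,\mathrm{Var}(N)$ dominates, so the compound variance is several times the binomial variance at the mean sample size.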

3 Variances of estimated log-odds-ratios for random sample sizes

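The (conditional, given the probabilities $p_{ij}$ and the sample sizes $n_{ij}$) variance of the estimated log-odds-ratio $\hat\theta_i$, derived by the delta method, is

$$v_i^2 = \mathrm{Var}(\hat\theta_i) = \frac{1}{n_{iT}\,p_{iT}(1-p_{iT})} + \frac{1}{n_{iC}\,p_{iC}(1-p_{iC})},$$

estimated by substituting $\hat p_{ij}$ for $p_{ij}$. (We follow the particular method’s procedure for calculating $\hat p_{ij}$.)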
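For concreteness, the delta-method variance and its plug-in estimate can be sketched as follows. The function names are hypothetical, and the `correction` argument (a constant added to every cell of the 2×2 table, 0.5 by default) is only one common convention assumed here; each estimation method may calculate the estimated probabilities differently.

```python
import math


def lor_variance(n_t, p_t, n_c, p_c):
    """Delta-method variance of the estimated log-odds-ratio for known
    arm-level probabilities and sample sizes."""
    return 1.0 / (n_t * p_t * (1.0 - p_t)) + 1.0 / (n_c * p_c * (1.0 - p_c))


def log_odds_ratio(x_t, n_t, x_c, n_c, correction=0.5):
    """Estimated log-odds-ratio and its plug-in variance for one study.

    `correction` is added to all four cells (an assumed convention,
    not necessarily the one used by a given method).
    """
    a, b = x_t + correction, n_t - x_t + correction  # treatment events / non-events
    c, d = x_c + correction, n_c - x_c + correction  # control events / non-events
    theta_hat = math.log(a / b) - math.log(c / d)
    var_hat = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d
    return theta_hat, var_hat


theta_hat, var_hat = log_odds_ratio(x_t=20, n_t=100, x_c=10, n_c=100)
print(f"theta_hat = {theta_hat:.3f}, estimated variance = {var_hat:.3f}")
```

The sum of reciprocal cell counts is the same quantity as the displayed formula, because $1/(n\hat p) + 1/(n(1-\hat p)) = 1/(n\hat p(1-\hat p))$.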

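Under the binomial-normal random-effects model (REM), the true study-level effects, $\theta_i$, follow a normal distribution with mean equal to the overall log-odds-ratio $\theta$ and variance equal to the between-study variance $\tau^2$: $\theta_i \sim N(\theta, \tau^2)$.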
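One way to generate data under a binomial-normal model is sketched below. It is an illustration under stated assumptions rather than a reproduction of any of the five mechanisms in this report: the control-arm probability is held fixed, the treatment-arm logit is taken to be the control-arm logit plus the study-level effect, and the function name `generate_studies` is hypothetical.

```python
import numpy as np


def generate_studies(K, n, theta, tau2, p_control, rng):
    """Simulate K two-arm studies with equal arm-level sample sizes.

    Assumed mechanism: theta_i ~ N(theta, tau2); the control-arm logit is
    fixed at logit(p_control); the treatment-arm logit adds theta_i.
    """
    n = np.broadcast_to(np.asarray(n), (K,))            # scalar or length-K sizes
    theta_i = rng.normal(theta, np.sqrt(tau2), size=K)  # study-level effects
    alpha = np.log(p_control / (1.0 - p_control))       # control-arm logit
    p_treat = 1.0 / (1.0 + np.exp(-(alpha + theta_i)))  # inverse logit
    x_c = rng.binomial(n, p_control)                    # control-arm events
    x_t = rng.binomial(n, p_treat)                      # treatment-arm events
    return x_t, x_c, theta_i


rng = np.random.default_rng(1)  # arbitrary seed
x_t, x_c, theta_i = generate_studies(K=5, n=100, theta=0.5, tau2=0.1,
                                     p_control=0.4, rng=rng)
print("treatment events:", x_t)
print("control events:  ", x_c)
```

Passing an array of length `K` for `n` allows a new set of sample sizes to be supplied in each replication.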

To calculate the variance of when sample sizes are random, we use the law of total variance:

The second term is , and the first term is obtained by substituting and in an expression for the variance of under fixed sample sizes.

For a random sample size $n$, using the delta method,

$$\mathrm{E}\left(\frac{1}{n}\right) \approx \frac{1}{\mathrm{E}(n)}\left(1 + \mathrm{CV}^2\right), \tag{3.2}$$

where CV is the coefficient of variation (i.e., the ratio of the standard deviation of $n$ to its mean). Therefore, to order $1/n$, random generation of sample sizes inflates the variance of $\hat\theta_i$ if and only if the coefficient of variation of the distribution of sample sizes is of order 1. In the simulations of viechtbauer2007confidence, where , , so the variance is not inflated. In contrast, generating sample sizes from would result in and would inflate variance. (Use of such a combination of mean and variance, however, is unlikely to produce realistic sets of sample sizes, and the probability of generating a negative sample size exceeds 2%.)

The variance of a uniform distribution on an interval of width $w$ centered at $n_0$ is $w^2/12$, and its CV is $w/(\sqrt{12}\,n_0)$. Therefore, the CV is of order 1 whenever the width of the interval is of the same order as its center. Hence, variance is inflated in simulations by jackson2018comparison, Langan_2018_RSM_1316, sidik2007, and AboZaid2013, who all use wide intervals for the sample sizes.
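For a continuous uniform distribution the expectation of $1/n$ is available in closed form, $\ln(b/a)/(b-a)$ on the interval $(a, b)$, which gives a quick check of the second-order delta-method approximation $(1 + \mathrm{CV}^2)/\mathrm{E}(n)$. The sketch below is illustrative only: the center of 100 and the interval widths are hypothetical values, not those of any study cited here, and the function names are made up for the example.

```python
import math


def exact_inverse_mean_uniform(center, width):
    """Exact E(1/n) for n uniform on (center - width/2, center + width/2)."""
    lower, upper = center - width / 2, center + width / 2
    return math.log(upper / lower) / width


def delta_inverse_mean(center, cv2):
    """Second-order delta-method approximation: E(1/n) ~ (1 + CV^2) / E(n)."""
    return (1 + cv2) / center


center = 100.0
for width in (10.0, 50.0, 100.0):          # hypothetical interval widths
    cv2 = width**2 / (12 * center**2)      # squared CV of the uniform distribution
    exact = exact_inverse_mean_uniform(center, width)
    approx = delta_inverse_mean(center, cv2)
    print(f"width={width:5.0f}  CV^2={cv2:.4f}  "
          f"exact inflation={exact * center:.4f}  "
          f"delta-method inflation={approx * center:.4f}")
```

The inflation factor is negligible for a narrow interval and becomes appreciable only when the width is comparable to the center, in line with the argument above.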

4 Design of the simulations for randomly distributed sample sizes

Our simulations keep the arm-level sample sizes equal and the control-arm probabilities and the log-odds-ratios independent. Table 1 shows the components of the simulations for normally and uniformly distributed sample sizes: parameters, data-generation mechanisms, and estimation targets. Our first report (arXiv_LOR_simulation_equal_sample_sizes) provides more details. We included the DerSimonian-Laird (DL), restricted maximum-likelihood (REML), Mandel-Paule (MP), and Kulinskaya-Dollinger (KD) estimators of the between-study variance $\tau^2$, with corresponding inverse-variance-weighted estimators of the overall log-odds-ratio $\theta$ and confidence intervals with critical values from the normal distribution.

Bakbergenuly2020 studied those inverse-variance-weighted estimators in detail. We also included the SSW point estimator of $\theta$, whose weights depend only on the studies’ arm-level sample sizes, and a corresponding confidence interval, which uses the SSW estimate as the midpoint, an estimate of $\tau^2$ in the estimate of its variance, and critical values from the $t$ distribution on $K-1$ degrees of freedom, where $K$ is the number of studies. Among the estimators, FIM2 and RIM2 denote the estimators in the corresponding GLMMs.
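As an illustration of the inverse-variance-weighted approach, the sketch below implements the standard method-of-moments DerSimonian-Laird estimator with a normal-based confidence interval. It is a textbook version written for this example (the function name and argument layout are assumptions), not the code used in the simulations, and it requires at least two studies.

```python
import math
from statistics import NormalDist


def dersimonian_laird(y, v, level=0.95):
    """DL estimate of tau^2, inverse-variance-weighted estimate of theta,
    and a confidence interval with normal critical values.

    y: study-level estimates; v: their within-study variances (len >= 2).
    """
    K = len(y)
    w = [1.0 / vi for vi in v]                         # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    Q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (Q - (K - 1)) / c)                 # truncated at zero

    w_star = [1.0 / (vi + tau2) for vi in v]           # random-effects weights
    sw_star = sum(w_star)
    theta_hat = sum(wi * yi for wi, yi in zip(w_star, y)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    z = NormalDist().inv_cdf(0.5 + level / 2)          # e.g. about 1.96 for 95%
    return tau2, theta_hat, (theta_hat - z * se, theta_hat + z * se)


tau2, theta_hat, ci = dersimonian_laird([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(f"tau2 = {tau2:.2f}, theta_hat = {theta_hat:.2f}, "
      f"CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the weights depend on the estimated within-study variances, any change in how sample sizes are generated feeds directly into both the point estimate and the interval width.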

We generated the arm-level sample sizes from a normal or a uniform distribution centered at 40, 100, 250, and 1000.

In generating sample sizes from a normal distribution, we want negative sample sizes to have reasonably small probability. For our choice of this probability is . Unfortunately, we were still getting a small number of values below zero out of thousands of simulated values, so we additionally truncate the values generated from a normal distribution at 10. Truncation happens with probability .
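The two probabilities mentioned above are simple to compute for any normal distribution of sample sizes. The sketch below uses a hypothetical coefficient of variation of 0.3, chosen only for illustration and not taken from this report, together with the four centers used in the simulations; the function name `prob_below` is an assumption.

```python
from statistics import NormalDist


def prob_below(threshold, center, cv):
    """P(n < threshold) when n ~ Normal(center, (cv * center)^2)."""
    return NormalDist(mu=center, sigma=cv * center).cdf(threshold)


cv = 0.3  # hypothetical coefficient of variation, for illustration only
for center in (40, 100, 250, 1000):
    print(f"center={center:5d}  "
          f"P(n < 0)={prob_below(0, center, cv):.2e}  "
          f"P(n < 10)={prob_below(10, center, cv):.2e}")
```

For a fixed coefficient of variation, the probability of a negative value does not depend on the center, whereas the probability of falling below a fixed threshold such as 10 is largest for the smallest center.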

To make uniform distributions of sample sizes comparable to the normal distributions, we centered them at the same value, $n_0$, and equated their variances. If a normal distribution has variance $\sigma^2$, a uniform distribution with the same variance has interval width $\sqrt{12}\,\sigma$. We set , resulting in and a squared CV of . Therefore, by Equation (3.2), our simulations with random sample sizes inflate variances and covariances by about 10% in comparison with simulations with constant sample sizes. Wider intervals would inflate variances more, but in generating sample sizes from a corresponding normal distribution, we wanted negative sample sizes to have reasonably small probability. For our choice of this probability is .
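The matching of the two distributions can be sketched as follows. Several details are assumptions made for the example rather than statements about our implementation: generated values are rounded to integers, truncation at 10 is implemented by raising smaller values to 10 (redrawing would be an alternative), the standard deviation of 30 around a center of 100 is hypothetical, and the function names are made up.

```python
import numpy as np


def normal_sample_sizes(center, sigma, K, rng, floor=10):
    """Normal sample sizes, rounded to integers and truncated below at `floor`
    (assumed here to mean that smaller values are set to `floor`)."""
    n = np.rint(rng.normal(center, sigma, size=K)).astype(int)
    return np.maximum(n, floor)


def uniform_sample_sizes(center, sigma, K, rng):
    """Uniform sample sizes with the same center and variance as
    Normal(center, sigma^2): the interval width is sqrt(12) * sigma."""
    half_width = np.sqrt(12.0) * sigma / 2.0
    n = rng.uniform(center - half_width, center + half_width, size=K)
    return np.rint(n).astype(int)


rng = np.random.default_rng(7)      # arbitrary seed
center, sigma = 100, 30.0           # hypothetical values, for illustration
n_norm = normal_sample_sizes(center, sigma, 10, rng)
n_unif = uniform_sample_sizes(center, sigma, 10, rng)
print("normal: ", n_norm)
print("uniform:", n_unif)
```

Unlike the normal distribution, the matched uniform distribution has bounded support, so it cannot produce negative values as long as half the interval width is smaller than the center.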

Parameter Values
5, 10, 30
40, 100, 250, 1000
0, 0.5, 1, 1.5, 2
.1, .4
0.1, 0.4
Generation of
Normal(, )
Generation of and
FIM1 Fixed intercept models:
RIM1 Random intercept models:
Estimation targets    Estimators
bias in estimating $\tau^2$    DL, REML, MP, KD, FIM2, RIM2
bias in estimating $\theta$    DL, REML, MP, KD, FIM2, RIM2, SSW
coverage of $\theta$    DL, REML, MP, KD, FIM2, RIM2, SSW (with an estimate of $\tau^2$ and $t_{K-1}$ critical values)
Table 1: Components of the simulations for log-odds-ratio

5 Summary of the results

Our simulations explored two main components of design: the data-generation mechanism and the distribution of study-level sample sizes. Results of our simulations with normally distributed sample sizes are provided in Appendix A, and those with uniformly distributed sample sizes in Appendix B.

The five data-generation mechanisms (FIM1, FIM2, RIM1, RIM2, and URIM1) often produced different results for at least one of the measures of performance (bias of estimators of $\tau^2$, bias of estimators of $\theta$, and coverage of confidence intervals for $\theta$). In the most frequent pattern, FIM2 and RIM2 yield similar results, and FIM1, RIM1, and URIM1 yield results that are similar to one another but different from those of FIM2 and RIM2. In some situations URIM1 stands apart.

We also expected the coverage of $\theta$ to suffer because random sample sizes increase the variance of generated log-odds-ratios. However, generation of sample sizes from normal and uniform distributions had essentially no impact, as can be seen by comparing the results from this report with those from our report (arXiv_LOR_simulation_equal_sample_sizes) on the simulations with constant sample sizes. The explanation may lie in our choice of a variance for the normal and uniform distributions of the sample sizes that was not large enough, causing an increase of just 10% in the variance of LORs, or in the rather low coverage, even under constant sample sizes, resulting from considerable biases of the estimators.