# Two-sample aggregate data meta-analysis of medians

We consider the problem of meta-analyzing two-group studies that report the median of the outcome. Often, these studies are excluded from meta-analysis because there are no well-established statistical methods to pool the difference of medians. To include these studies in meta-analysis, several authors have recently proposed methods to estimate the sample mean and standard deviation from the median, sample size, and several commonly reported measures of spread. Researchers frequently apply these methods to estimate the difference of means and its variance for each primary study and pool the difference of means using inverse variance weighting. In this work, we develop several methods to directly meta-analyze the difference of medians. We conduct a simulation study evaluating the performance of the proposed median-based methods and the competing transformation-based methods. The simulation results show that the median-based methods outperform the transformation-based methods when meta-analyzing studies that report the median of the outcome, especially when the outcome is skewed. Moreover, we illustrate the various methods on a real-life data set.


## 1 Introduction

Meta-analysis is a statistical method for combining data from several scientific studies addressing a common research question. Researchers conduct meta-analyses to obtain more precise statistical estimates of the outcome of interest, investigate sources of heterogeneity, and generalize conclusions across studies. Over the past few decades, meta-analysis has gained considerable popularity in research synthesis and is widely considered to be the gold standard of evidence for systematic reviews.

Researchers often meta-analyze two-group studies that evaluate the change in an outcome across two groups. One such study design is the clinical trial with a treatment and a control group. Frequently, only summary measures of the outcome variable in each group are available from the primary studies, typically a point estimate of the outcome and its variance. Data analysts must then decide on the effect measure to use in the analysis, which quantifies the change of the outcome between the two groups. Standard meta-analytic methods estimate a pooled effect measure by a weighted mean of the study-specific effect measures, where the weight assigned to each study is inversely proportional to the variance of its point estimate. These methods are referred to as inverse variance approaches [4].

In this paper, we consider the case where the outcome variable is continuous and is measured on a scale that is meaningful and consistent across studies. In this context, two-group studies usually report the sample mean and standard deviation of the outcome in each group, and researchers typically use the difference of means as the effect measure of the primary studies. Then, researchers meta-analyze the difference of means in the two groups. Meta-analytic methods for pooling the difference of means are well-established in the literature [8].

However, when the distribution of the outcome variable is skewed, authors often report the sample median of the outcome to better represent the center of the data. In this case, to describe the spread of the data, authors frequently report the first and third quartiles, the minimum and maximum values, or both sets of measures (e.g., see [21, 25]). Although the difference of medians is the most natural effect measure to use in this context for two-group studies, there are no available statistical methods to meta-analyze the difference of medians. The challenge of applying the inverse variance approach to pool the difference of medians is that the variance of the difference of medians is not reported by the primary studies. Furthermore, this quantity is difficult to estimate based on the summary measures provided by a primary study because it depends on the underlying distribution.

Consequently, studies that report the median of the outcome are either left out of meta-analyses (e.g., see [12]) or researchers transform the outcome so that well-established meta-analytic methods can be applied. More specifically, several authors have recently proposed methods to estimate the sample mean and standard deviation from the median, sample size, and several commonly reported measures of spread [26, 15, 2, 18, 17, 19]. These methods are used to estimate the difference of means and its variance for each primary study that reports medians. Then, data analysts pool the estimated differences of means using the inverse variance method. Although these methods are widely used in practice, they suffer from several limitations. First, this analysis uses an estimate of the study-specific effect measure (i.e., the estimated difference of means) rather than one that can be directly calculated (i.e., the difference of medians), which introduces an additional source of error. Moreover, the performance of these methods for estimating the sample mean and standard deviation has been shown to be heavily influenced by the skewness of the outcome distribution [18], and the outcome is likely to be highly skewed when authors report medians. Finally, regardless of the accuracy of these methods for estimating the sample mean and standard deviation, sample medians may better represent the center of the data for skewed outcomes and therefore may be the preferred effect measure in meta-analysis of skewed continuous outcomes.

In this work, we develop methods to directly meta-analyze the difference of medians. We first consider several simple methods that are not based on the classical inverse variance approach. Specifically, we extend the methods proposed by McGrath et al [20] for meta-analyzing one-group studies that report the median to the two-sample context, and we consider applying linear quantile mixed models. In order to weight studies optimally, we develop methods to estimate the variance of the median in order to pool the difference of medians using the inverse variance approach.

We describe the existing and proposed methods to meta-analyze two-group studies that report medians in Section 2. In Section 3, we present the results of a simulation study evaluating the performance of the existing and proposed methods. In Section 4, we illustrate these approaches on a real-life data set. We conclude with a discussion in Section 5.

## 2 Methods

### 2.1 Standard Fixed Effect and Random Effects Models

We first describe the standard fixed effect and random effects models for the meta-analysis of continuous outcomes. For study $i$, where $i = 1, \ldots, k$, let $y_i$ denote the estimated effect measure and let $v_i$ be its estimated variance. For instance, the effect measure may be the difference of means or difference of medians of the outcome in two arms of a trial. Note that although $v_i$ is estimated from the data, it is typically assumed to be a known quantity in meta-analysis. The fixed effect model posits that a single true effect measure, $\theta$, underlies all studies. Under standard assumptions,

$$y_i = \theta + \epsilon_i,$$

where $\epsilon_i$ is random within-study variation with mean zero and variance $v_i$. The pooled effect measure is then estimated by a weighted mean of the $y_i$,

$$\hat{\theta}_{FE} = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i}, \qquad \widehat{\mathrm{Var}}\left(\hat{\theta}_{FE}\right) = \frac{1}{\sum_{i=1}^{k} w_i}.$$

The weights for the estimates above are given by $w_i = 1/v_i$.

The random effects model, on the other hand, presumes that different studies have different true effect measures. The true effect measure of study $i$ is sampled from a normal distribution with mean $\theta$ and variance $\tau^2$. That is,

$$y_i = \theta + \theta_i + \epsilon_i,$$

where $\theta_i \sim N(0, \tau^2)$ is random between-study variation, $\epsilon_i$ is random within-study variation with mean zero and variance $v_i$, and $\theta_i$ and $\epsilon_i$ are independent. The pooled effect measure is estimated by

$$\hat{\theta}_{RE} = \frac{\sum_{i=1}^{k} w_i^* y_i}{\sum_{i=1}^{k} w_i^*}, \qquad \widehat{\mathrm{Var}}\left(\hat{\theta}_{RE}\right) = \frac{1}{\sum_{i=1}^{k} w_i^*}.$$

The weights are given by $w_i^* = 1/(v_i + \hat{\tau}^2)$, where $\hat{\tau}^2$ is an estimate of $\tau^2$. The method of moments estimator of DerSimonian and Laird [9] is commonly used to estimate $\tau^2$, which is given by

$$\hat{\tau}^2_{DL} = \max\left\{0,\; \frac{Q - (k - 1)}{\sum_{i=1}^{k} w_i - \sum_{i=1}^{k} w_i^2 \big/ \sum_{i=1}^{k} w_i}\right\},$$

where

$$Q = \sum_{i=1}^{k} w_i \left(y_i - \hat{\theta}_{FE}\right)^2.$$

Under the typical assumption that the within-study variances $v_i$ are known, the fixed effect and random effects estimators are minimum variance unbiased estimators [22].
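The inverse variance pooling described above can be sketched in a few lines. This is a Python illustration (the analyses in the paper were carried out in R), and the function name is ours:

```python
import numpy as np

def pool_inverse_variance(y, v):
    """Inverse variance pooling: fixed effect estimate plus a
    DerSimonian-Laird random effects estimate.

    y : study-specific effect estimates (e.g., differences of medians)
    v : their within-study variances, treated as known
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    k = len(y)

    # Fixed effect: weights w_i = 1 / v_i
    w = 1.0 / v
    theta_fe = np.sum(w * y) / np.sum(w)
    var_fe = 1.0 / np.sum(w)

    # Cochran's Q and the DerSimonian-Laird moment estimator of tau^2
    q_stat = np.sum(w * (y - theta_fe) ** 2)
    tau2 = max(0.0, (q_stat - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

    # Random effects: weights w*_i = 1 / (v_i + tau^2)
    w_star = 1.0 / (v + tau2)
    theta_re = np.sum(w_star * y) / np.sum(w_star)
    var_re = 1.0 / np.sum(w_star)
    return theta_fe, var_fe, theta_re, var_re, tau2
```

When the studies are homogeneous, $Q \le k - 1$, the estimate $\hat{\tau}^2_{DL}$ truncates at zero, and the random effects estimate coincides with the fixed effect one.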

### 2.2 Existing Methods: Transformation-Based Approaches

For group $j$ ($j = 1, 2$) in study $i$, we use the following notation for summary measures that might be reported. Let $n_{ij}$ be the number of subjects, $y_{ij}$ be the median of the outcome, $q_{1,ij}$ and $q_{3,ij}$ be the first and third quartiles, respectively, and let $a_{ij}$ and $b_{ij}$ be the minimum and maximum values, respectively. Furthermore, let $\bar{x}_{ij}$ be the sample mean and $s_{ij}$ be the sample standard deviation.

We consider the following sets of summary measures that studies may report. In scenario 1 ($S_1$), a study reports the median, minimum and maximum values, and number of subjects. In scenario 2 ($S_2$), a study reports the median, first and third quartiles, and number of subjects. Lastly, in scenario 3 ($S_3$), a study reports the median, minimum and maximum values, first and third quartiles, and number of subjects.

We estimate the sample mean and standard deviation of the outcome in each group in each study. Then we estimate a fixed effect and random effects pooled difference of means and its 95% confidence interval (CI) from the estimated differences of means and their variances [13]. For the random effects methods, we estimate $\tau^2$ using the method of DerSimonian and Laird [9]. We call these approaches transformation-based approaches. In the following subsections, we describe the methods for estimating the sample mean and standard deviation under scenarios $S_1$, $S_2$, and $S_3$.

#### 2.2.1 Method of Wan et al

Wan et al [26] give an overview of methods for estimating the sample mean and standard deviation under scenarios $S_1$, $S_2$, and $S_3$, and recommend the following methods based on their simulation study. For $S_1$, Wan et al [26] suggest using Hozo’s method [15] to estimate the sample mean and recommend using their proposed method to estimate the standard deviation. These estimates are given by:

$$\bar{x}_{ij} \approx \begin{cases} \dfrac{a_{ij} + 2y_{ij} + b_{ij}}{4} & \text{if } n_{ij} \le 25 \\ y_{ij} & \text{if } n_{ij} > 25 \end{cases} \qquad s_{ij} \approx \frac{b_{ij} - a_{ij}}{2\,\Phi^{-1}\!\left(\frac{n_{ij} - 0.375}{n_{ij} + 0.25}\right)},$$

where $\Phi^{-1}$ denotes the inverse of the standard normal cumulative distribution function. For $S_2$, Wan et al [26] develop the following methods to estimate the sample mean and standard deviation:

$$\bar{x}_{ij} \approx \frac{q_{1,ij} + y_{ij} + q_{3,ij}}{3}, \qquad s_{ij} \approx \frac{q_{3,ij} - q_{1,ij}}{2\,\Phi^{-1}\!\left(\frac{0.75\,n_{ij} - 0.125}{n_{ij} + 0.25}\right)}.$$

Lastly, in $S_3$, Wan et al [26] suggest using the method of Bland [2] to estimate the sample mean and recommend using their method to estimate the standard deviation, which are given by:

$$\bar{x}_{ij} \approx \frac{a_{ij} + 2q_{1,ij} + 2y_{ij} + 2q_{3,ij} + b_{ij}}{8}, \qquad s_{ij} \approx \frac{b_{ij} - a_{ij}}{4\,\Phi^{-1}\!\left(\frac{n_{ij} - 0.375}{n_{ij} + 0.25}\right)} + \frac{q_{3,ij} - q_{1,ij}}{4\,\Phi^{-1}\!\left(\frac{0.75\,n_{ij} - 0.125}{n_{ij} + 0.25}\right)}.$$

A brief outline of the derivation of these methods is given below. First, we consider estimating the sample mean in $S_1$ and $S_3$. Hozo et al [15] and Bland [2] place upper and lower bounds on the ordered observations with the reported sample quantiles. By summing the inequalities and dividing by the sample size, one obtains bounds for the sample mean in $S_1$ and $S_3$. Then, one estimates the sample mean by the average of the upper and lower bounds. Note that these methods make no distributional assumptions for the outcome variable.

To estimate the sample mean in $S_2$, Wan et al [26] assume that the outcome distribution is normal. Wan et al [26] show that the mean of the normal distribution can be expressed in terms of the expected values of the sample median and first and third quartiles. Then, Wan et al [26] estimate the sample mean by replacing the expected values of the sample median and first and third quartiles with the observed sample values.

Similarly, Wan et al [26] estimate the sample standard deviation in $S_1$, $S_2$, and $S_3$ under the assumption that the outcome distribution is normal. Wan et al [26] show that the standard deviation of the normal distribution can be expressed in terms of the expected values of sample quantiles. One can then estimate the sample standard deviation by replacing the expected values of the sample quantiles with the observed sample quantiles.
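As a concrete illustration, the $S_1$ and $S_2$ estimators of Wan et al can be written as short functions. This is a Python sketch rather than the authors' own code, and the function names are ours:

```python
from scipy.stats import norm

def wan_mean_sd_s1(a, med, b, n):
    """Scenario S1: Hozo's mean estimate and Wan et al.'s SD estimate
    from the minimum (a), median, maximum (b), and sample size (n)."""
    xbar = (a + 2.0 * med + b) / 4.0 if n <= 25 else med
    s = (b - a) / (2.0 * norm.ppf((n - 0.375) / (n + 0.25)))
    return xbar, s

def wan_mean_sd_s2(q1, med, q3, n):
    """Scenario S2: Wan et al.'s mean and SD estimates from the
    quartiles (q1, q3), median, and sample size (n)."""
    xbar = (q1 + med + q3) / 3.0
    s = (q3 - q1) / (2.0 * norm.ppf((0.75 * n - 0.125) / (n + 0.25)))
    return xbar, s
```

For large $n$, the $S_2$ denominator approaches $2\Phi^{-1}(0.75) \approx 1.35$, recovering the familiar IQR/1.35 rule of thumb for the standard deviation of a normal distribution.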

#### 2.2.2 Method of Luo et al

We also consider the methods more recently recommended by Luo et al [19]. Under the assumption that the outcome variable is normally distributed, the methods of Luo et al [19] optimize the sample mean estimates recommended by Wan et al [26]. The formulas for estimating the sample mean in $S_1$, $S_2$, and $S_3$ are given below:

$$\bar{x}_{ij} \approx \begin{cases} \left(\dfrac{4}{4 + n_{ij}^{0.75}}\right)\dfrac{a_{ij} + b_{ij}}{2} + \left(\dfrac{n_{ij}^{0.75}}{4 + n_{ij}^{0.75}}\right) y_{ij} & \text{in } S_1 \\[2ex] \left(0.7 + \dfrac{0.39}{n_{ij}}\right)\dfrac{q_{1,ij} + q_{3,ij}}{2} + \left(0.3 - \dfrac{0.39}{n_{ij}}\right) y_{ij} & \text{in } S_2 \\[2ex] \left(\dfrac{2.2}{2.2 + n_{ij}^{0.75}}\right)\dfrac{a_{ij} + b_{ij}}{2} + \left(0.7 - \dfrac{0.72}{n_{ij}^{0.55}}\right)\dfrac{q_{1,ij} + q_{3,ij}}{2} + \left(0.3 + \dfrac{0.72}{n_{ij}^{0.55}} - \dfrac{2.2}{2.2 + n_{ij}^{0.75}}\right) y_{ij} & \text{in } S_3 \end{cases}$$

To estimate the standard deviation, Luo et al [19] recommend the methods previously described by Wan et al [26].
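The Luo et al mean estimator under $S_2$, for instance, is a weighted combination of the quartile midpoint and the median whose weights sum to one. A minimal Python sketch (the helper name is ours):

```python
def luo_mean_s2(q1, med, q3, n):
    """Luo et al.'s optimized sample mean estimate under S2: a convex
    combination of the quartile midpoint and the median, with weights
    (0.7 + 0.39/n) and (0.3 - 0.39/n) that sum to one."""
    return (0.7 + 0.39 / n) * (q1 + q3) / 2.0 + (0.3 - 0.39 / n) * med
```

Because the weights sum to one, the estimate reproduces the common value exactly whenever the quartile midpoint and the median agree.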

### 2.3 Proposed Methods

We propose several methods to directly meta-analyze the difference of medians when the primary studies present summary statistics of $S_1$, $S_2$, or $S_3$. The target parameter in this case is the population difference of medians. These approaches, which we call median-based approaches, fall into two categories. In the first category, we consider simple approaches that do not apply the standard inverse variance method. The second category of approaches estimates the variance of the difference of medians and then pools the studies using the standard inverse variance method described in Section 2.1.

#### 2.3.1 Median of the Difference of Medians

McGrath et al [20] proposed methods for pooling one-group studies that report the median of the outcome. These methods use the median of the study-specific medians as the pooled estimate of the median and invert the sign test to construct a CI around the pooled estimate. We extend the methods proposed by McGrath et al [20] to pooling two-group studies as follows. We take the median of the study-specific differences of medians as the pooled estimate, and we invert the sign test to construct the corresponding CI [3]. Note that this method is nonparametric and the CI does not have a coverage probability of exactly 95% because of the discreteness of the sign test. Instead, the coverage probability of the interval is slightly higher than 95%, with the exact value depending on the number of studies included in the meta-analysis. We call this method the median of the difference of medians (MDM) method.

If the study-specific differences of medians are drawn independently from a continuous distribution with median equal to the population difference of medians, the proposed method consistently estimates the target parameter. Moreover, the pooled estimator is a Hodges–Lehmann estimator in this case.
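The MDM procedure can be sketched as follows, using the exact binomial (sign test) CI for a median based on order statistics. This is a Python illustration with names of our choosing:

```python
import numpy as np
from scipy.stats import binom

def mdm(diffs, level=0.95):
    """Median of the study-specific differences of medians, with a CI
    obtained by inverting the sign test: the CI endpoints are the order
    statistics x_(c+1) and x_(k-c), where c is the largest integer with
    P(Bin(k, 1/2) <= c) <= (1 - level)/2. The actual coverage is
    therefore slightly above the nominal level."""
    x = np.sort(np.asarray(diffs, dtype=float))
    k = len(x)
    alpha = 1.0 - level
    c = -1
    while binom.cdf(c + 1, k, 0.5) <= alpha / 2.0:
        c += 1
    if c < 0:
        raise ValueError("too few studies for a sign-test CI at this level")
    return np.median(x), (x[c], x[k - 1 - c])
```

With $k = 10$ studies, for example, $c = 1$, the interval runs from the second-smallest to the second-largest difference of medians, and the exact coverage is about 97.9%.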

#### 2.3.2 Linear Quantile Mixed Models

Quantile regression models estimate the conditional quantile of the outcome variable. Linear quantile mixed models extend quantile regression models to include random effects [11]. We apply linear quantile mixed models (LQMM) to meta-analyze the studies, where the study-specific differences of medians enter the model as random intercepts. The estimated fixed effect intercept and its 95% CI are then used as the pooled estimate and its corresponding CI.

The LQMM models were fit using the ‘lqmm’ package in the R Statistical Software. Several numerical integration methods are available for model fitting, and different numerical integration methods correspond to different distributional assumptions on the random intercepts. Because the study-specific differences of medians are assumed to be normally distributed, Gauss–Hermite quadrature was chosen for the numerical integration of the likelihood [11].

#### 2.3.3 Methods Based on Estimating the Variance of the Difference of Medians

We estimate the variance of the difference of medians of a given study using the asymptotic variance of the difference of medians. For group $j$ in study $i$, let $f_{ij}$ be the probability density function of the underlying distribution, $m_{ij}$ be the study-population median, $y_{ij}$ be the sample median, and $n_{ij}$ be the number of subjects. The asymptotic distribution of $y_{ij}$ is normal with mean $m_{ij}$ and variance $1 / \left(4 n_{ij} f_{ij}(m_{ij})^2\right)$. It then follows that the asymptotic distribution of the difference of medians $y_i = y_{i1} - y_{i2}$ is

$$y_i \sim N\!\left(m_{i1} - m_{i2},\; \frac{1}{4 n_{i1} f_{i1}(m_{i1})^2} + \frac{1}{4 n_{i2} f_{i2}(m_{i2})^2}\right).$$

We estimate the difference of medians, $m_{i1} - m_{i2}$, by replacing these quantities with the respective sample medians. To estimate the variance of the difference of medians, we use

$$\widehat{\mathrm{Var}}(y_i) = \frac{1}{4}\left(\frac{1}{n_{i1} \hat{f}_{i1}(m_{i1})^2} + \frac{1}{n_{i2} \hat{f}_{i2}(m_{i2})^2}\right), \tag{1}$$

where $\hat{f}_{i1}(m_{i1})$ and $\hat{f}_{i2}(m_{i2})$ are estimates of $f_{i1}(m_{i1})$ and $f_{i2}(m_{i2})$, respectively. In Section 2.3.3.1, we propose a method to estimate these quantities for each study. After estimating the difference of medians and its variance for each study, we estimate a pooled difference of medians using the standard meta-analytic methods presented in Section 2.1. As with the transformation-based methods, we use the method of DerSimonian and Laird [9] to estimate heterogeneity.

For notational simplicity in the following section, we fix study $i$ and group $j$ and consequently drop the subscripts $i$ and $j$ from all relevant variables and functions. For instance, we denote the density function of the underlying distribution evaluated at the study-population median of group $j$ of study $i$ as $f(m)$ instead of $f_{ij}(m_{ij})$.

##### 2.3.3.1 Quantile Estimation

Given an independent group in a primary study that reports summary statistics of $S_1$, $S_2$, or $S_3$, we propose the following method based on least squares to estimate the density function of the underlying distribution evaluated at the study-population median (i.e., $f(m)$). First, we select a model parametrized by $\theta$ as the underlying distribution, which we denote as $P_\theta$. Then, we fit the parameters of the distribution as follows. We define $S_P(\theta)$ to be the sum of squares of the discrepancy between the theoretical quantiles of $P_\theta$ and the observed sample quantiles. That is, in $S_1$, we use

$$S_P(\theta) = \left(F^{-1}(1/n) - a\right)^2 + \left(F^{-1}(0.50) - y\right)^2 + \left(F^{-1}(1 - 1/n) - b\right)^2,$$

where $a$, $y$, and $b$ denote the observed sample minimum, median, and maximum, respectively. Moreover, $n$ denotes the sample size and $F$ denotes the cumulative distribution function of $P_\theta$. In $S_2$, $S_P(\theta)$ is expressed as

$$S_P(\theta) = \left(F^{-1}(0.25) - q_1\right)^2 + \left(F^{-1}(0.50) - y\right)^2 + \left(F^{-1}(0.75) - q_3\right)^2,$$

where $q_1$ and $q_3$ denote the observed first quartile and third quartile, respectively. Lastly, in $S_3$,

$$S_P(\theta) = \left(F^{-1}(1/n) - a\right)^2 + \left(F^{-1}(0.25) - q_1\right)^2 + \left(F^{-1}(0.50) - y\right)^2 + \left(F^{-1}(0.75) - q_3\right)^2 + \left(F^{-1}(1 - 1/n) - b\right)^2.$$

We then fit the parameters of $P_\theta$ by minimizing $S_P(\theta)$ with respect to $\theta$,

$$\hat{\theta} = \operatorname*{arg\,min}_\theta S_P(\theta).$$

We consider that the underlying parametric family of distributions may be the normal, log-normal, Weibull, or gamma distribution. These distributions were chosen to reflect a wide variety of possible outcome distributions. For each parametric family of distributions considered, we fit the parameters by minimizing $S_P(\theta)$. The parametric family of distributions yielding the smallest value of $S_P(\hat{\theta})$ is chosen to be the underlying parametric family of distributions, thereby allowing us to obtain $\hat{f}(m)$. We call this method quantile estimation (QE).

To minimize $S_P(\theta)$ with respect to $\theta$, we use the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm with box constraints, denoted by L-BFGS-B, which is a quasi-Newton algorithm [5]. For the log-normal and normal distributions, the study-population median was constrained to be between the minimum and maximum values in $S_1$ and between the first and third quartiles in $S_2$ and $S_3$. We allowed a wide range of values for the scale parameter for the log-normal and normal distributions. Additionally, for the gamma and Weibull distributions, we considered a wide range of values for the parameters. Table 1 summarizes the parameter constraints in $S_1$, $S_2$, and $S_3$ for all distributions.

We imposed suitable constraints on the parameter values in the L-BFGS-B algorithm (e.g., not allowing negative variance). The specific values for the constraints were chosen based on the work of Kwon and Reis [18]. Specifically, in the Bayesian transformation-based method of Kwon and Reis, the authors used these values for the uniform prior bounds when fitting the normal, log-normal, and Weibull distributions in $S_1$, $S_2$, and $S_3$. For the gamma distribution, we set the constraints for the rate parameter based on the uniform prior bounds of Kwon and Reis for fitting the exponential distribution in $S_1$, $S_2$, and $S_3$, and we allowed a wide interval for the shape parameter.

We implemented the described L-BFGS-B algorithm using the optim function available in the ‘stats’ package in the R Statistical Software. Convergence of the L-BFGS-B algorithm was defined as when the reduction of the objective function is within a factor of $10^7$ of machine tolerance (the default for L-BFGS-B in optim), which corresponds to a tolerance of approximately $10^{-8}$. In all replications in the simulation study, the L-BFGS-B algorithm converged for at least one distribution. For each distribution, changes to the specific values of the constraints did not affect the solution provided the algorithm converged.
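The QE fitting step can be sketched as follows. This Python version (the paper uses R's optim) fits only two of the four candidate families under $S_2$, and its parameter bounds are illustrative placeholders rather than the constraints of Table 1:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, gamma

def qe_density_at_median(q1, med, q3):
    """Sketch of quantile estimation (QE) under S2: fit candidate
    distributions by least squares on theoretical vs. observed quartiles
    using L-BFGS-B, keep the family with the smallest S_P, and return
    that family's name and its density at the fitted median."""
    probs = np.array([0.25, 0.50, 0.75])
    obs = np.array([q1, med, q3])

    # name -> (quantile function, density at fitted median, start, bounds)
    candidates = {
        "normal": (
            lambda t: norm.ppf(probs, loc=t[0], scale=t[1]),
            lambda t: norm.pdf(t[0], loc=t[0], scale=t[1]),
            [med, (q3 - q1) / 1.35],
            [(q1, q3), (1e-3, 1e3)],  # median constrained to [q1, q3]
        ),
        "gamma": (
            lambda t: gamma.ppf(probs, t[0], scale=t[1]),
            lambda t: gamma.pdf(gamma.ppf(0.5, t[0], scale=t[1]), t[0], scale=t[1]),
            [2.0, max(med, 1e-3) / 2.0],
            [(1e-3, 1e3), (1e-3, 1e3)],
        ),
    }

    best = None
    for name, (qfun, dfun, t0, bounds) in candidates.items():
        # S_P(theta): sum of squared quantile discrepancies
        res = minimize(lambda t: np.sum((qfun(t) - obs) ** 2), t0,
                       method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best[0]:
            best = (res.fun, name, dfun(res.x))
    return best[1], best[2]
```

Extending the sketch to the full method amounts to adding the log-normal and Weibull families, the $S_1$ and $S_3$ objective functions, and the paper's parameter constraints.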

##### 2.3.3.2 Best-Case Scenario for Quantile Estimation

As a best-case scenario for the quantile estimation method, we use the true value of $f(m)$ to estimate the variance of the difference of medians. That is, we estimate the variance of the difference of medians for each study using equation (1), where $\hat{f}_{i1}(m_{i1})$ and $\hat{f}_{i2}(m_{i2})$ are replaced with the true values of $f_{i1}(m_{i1})$ and $f_{i2}(m_{i2})$, respectively. We denote this method by QE-BC.

### 2.4 Summary of Methods

We provide a brief summary of all methods considered to meta-analyze two-group studies that report the median of the outcome of interest. All methods fall into one of two categories: methods that meta-analyze the difference of means in order to estimate the population difference of means, and methods that meta-analyze the differences of medians in order to estimate the population difference of medians.

• Methods estimating the population difference of means:

• Wan et al: The mean and SD are estimated in each group in each study using the methods recommended by Wan et al [26]. Then, we meta-analyze the estimated difference of means using the inverse variance method described in Section 2.1.

• Luo et al: We use the method of Luo et al [19] to estimate the mean and SD in each group in each study. We then pool the estimated difference of means using the inverse variance method.

• Methods estimating the population difference of medians:

• Median of the Difference of Medians (MDM): We take the median of the study-specific differences of medians as the pooled estimate, and we invert the sign test to construct a CI around it.

• Linear Quantile Mixed Models (LQMM): We fit a linear quantile mixed model using the study-specific differences of medians as random intercepts. The estimated fixed effect intercept and its 95% CI are used as the pooled estimate and the corresponding CI.

• Quantile Estimation (QE): We estimate the variance of the difference of medians in each study using equation (1). To estimate the density of the underlying distribution, we fit several distributions by minimizing the distance between the observed and theoretical quantiles and select the distribution with the best fit according to the proposed model selection procedure. Then, we use the inverse variance method to meta-analyze the difference of medians.

• Best-Case Scenario for Quantile Estimation (QE-BC): We estimate the variance of the difference of medians in each study using (1), where the density of the underlying distribution is known. Then, we use the inverse variance method to pool the difference of medians.

In Appendix A, we describe a Bayesian method for density estimation and show the simulation results for this method. In Appendix B, we describe several modifications to the proposed median-based methods and display their performance in the simulation study. These methods were excluded from the main paper because, in the simulation study, they did not considerably outperform the methods described in this section.

## 3 Simulation Study

### 3.1 Data Generation

We systematically compared the performance of the proposed median-based methods with the transformation-based methods via simulation study. We simulated data for meta-analyzing two-group studies by varying the number of studies, number of subjects per study, underlying outcome distribution, and between-study heterogeneity.

The number of studies was set to or . For a given study, the number of subjects was set to be equal across the two groups, which we denote as . The value for the number of subjects was drawn from a log-normal distribution with median either or and . When the median number of subjects was set to , we constrained the number of subjects to be between 10 and 500 by resampling until a value fell in the desired interval. Similarly, when the median number of subjects was set to , we constrained the number of subjects to be between 50 and 2,500.

We considered a total of three scenarios for the outcome distribution. In the first two scenarios, the outcomes for both groups were normally distributed and only differed by a location shift. For each study, we sampled values from the normal distribution with and two times independently, which we call the outcome values for groups 1 and 2. An effect size, denoted by , was then added to the outcome values in group 1 for all studies. We considered a null and a moderate effect size. The moderate effect size was chosen to obtain power of 0.60 in a two-sample z-test of the difference of means with . For the third scenario, the group 1 outcome followed a moderately skewed mixture of normal distributions and the group 2 outcome followed a normal distribution with and for all studies in the meta-analysis. The mixture of normal distributions had approximately a mean of 41, a median of 39, and a variance of 60. No additional effect size was added to the group 1 outcome values. Figure 1 displays the densities of the normal and mixture of normal distributions and gives the parameter values used for the mixture of normal distributions.

Lastly, we considered that the outcome either had no heterogeneity (i.e., a fixed effect model) or we added heterogeneity, which we generated from a normal distribution with mean $0$ and variance $\tau^2$. When heterogeneity was included, we considered two levels of $\tau^2$ in order to obtain $I^2$ equal to 0.25 and 0.75, typically characterized as low and high levels of heterogeneity, respectively [14]. For each study in a meta-analysis, we sampled a value of heterogeneity and added it to the group 1 outcome values.

There were a total of 36 combinations of data generation parameters, summarized in Table 2. For each of the 36 combinations, we simulated data for 1,000 meta-analyses.

In our simulations, all primary studies in a given meta-analysis reported the same set of summary measures. In the primary analysis, the primary studies reported summary measures of the outcome in the form of $S_1$, $S_2$, and $S_3$, as described in Section 2.2.

### 3.2 Sensitivity Analysis: A Mix of Means and Medians

When meta-analyzing real-life data, some primary studies may report the median of the outcome and others may report the sample mean. We incorporated this scenario in our simulations as follows. We considered that studies reported the median of the outcome in both groups if the Shapiro–Wilk normality test [23] was rejected at a given significance level in at least one group, and reported the mean in both groups otherwise. Additionally, we mimicked the scenario where authors do not consider the skewness of the outcome when choosing to report a mean or a median. Specifically, a random 25% of the primary studies reported the median and the remaining studies reported the mean of the outcome. When studies reported the median of the outcome, the first and third quartiles and the number of subjects were also reported (i.e., $S_2$). When the mean of the outcome was reported, the standard deviation and number of subjects were reported as well. For completeness, we also considered the scenario where all studies reported the sample mean, standard deviation, and number of subjects. This represents the best-case scenario for the transformation-based methods and the worst-case scenario for the median-based methods.

We applied the transformation-based approaches and median-based approaches in these scenarios as follows. For the transformation-based approaches, the sample mean and standard deviation were only estimated from studies that reported medians. When studies reported the sample mean, standard deviation, and number of subjects, these values were used to calculate the difference of means and its variance.

The quantile estimation method was applied as described in Section 2.3.3.1 when a study reported the median. When a study reported a mean, the method was applied in the following way. We assumed that the outcome distribution was normal when studies reported the mean and fit the parameters of the normal distribution using the maximum likelihood estimates (i.e., the sample mean and sample standard deviation). Then, we estimated the variance of the difference of medians using equation (1).
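Under this normal assumption, the plugged-in density at the median has a closed form, since the median of a normal distribution equals its mean. A minimal sketch (the helper name is ours):

```python
import math

def fhat_at_median_from_mean_sd(xbar, s):
    """Normal plug-in for studies reporting the mean and SD: the median
    of N(xbar, s^2) is xbar, where the density is 1 / (s * sqrt(2*pi))."""
    return 1.0 / (s * math.sqrt(2.0 * math.pi))
```

The resulting value can be passed directly into equation (1) alongside the density estimates obtained for median-reporting studies.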

For the MDM and LQMM methods, the effect measure of a study was taken to be the difference of means for studies reporting means and the difference of medians for studies reporting medians. For the MDM method, we then took the median of the study-specific effect measures as the pooled estimate and constructed the corresponding CI. For the LQMM method, we applied the model as described in Section 2.3.2, where the random intercepts are the study-specific effect measures. Note that in this scenario, these approaches assume that the sample means approximate the medians well.

### 3.3 Performance Measures

We estimated the relative error (RE), variance, and coverage of the CIs of the pooled estimates. Letting $\hat{\theta}$ denote the estimate of the pooled effect measure and $\theta$ denote the true pooled effect measure, we define the relative error of $\hat{\theta}$ as

$$\mathrm{RE}(\hat{\theta}) := \frac{\hat{\theta} - \theta}{\theta} \times 100.$$

We use the relative error of the pooled estimate as a performance measure rather than bias because, when the outcome follows a mixture of normals, the value of the target parameter (i.e., the population difference of means for the transformation-based methods and the population difference of medians for the median-based methods) depends on the method used. The variance was calculated over the 1,000 meta-analyses for each of the 36 combinations of data generation parameters.

Additionally, we calculated the bias of the estimate of heterogeneity. Letting $\hat{\tau}^2$ and $\tau^2$ denote the estimate and true value of heterogeneity, respectively, the bias was defined as $\hat{\tau}^2 - \tau^2$.

We note that in the scenario where both group outcomes were normally distributed and the added effect size was equal to zero, the true pooled effect measure was equal to zero. In this scenario, we used bias instead of relative error as a performance measure.

### 3.4 Results

The results of the simulation study are given for the case where the number of studies was 10, the median number of subjects per study was 50, the between-study heterogeneity was fixed at one of its simulated levels, and the added effect size in the normal distribution case was moderate. We found in preliminary analyses that these factors did not considerably affect the performance of the methods, and similar results hold for other fixed levels of these factors.

In Appendix C, we present the figures corresponding to the sensitivity analysis described in Section 3.2. Moreover, we note that in some contexts data analysts prefer to conduct fixed effect analyses. Therefore, we present the results of the primary analysis where we apply the inverse variance methods in a fixed effect analysis in Appendix D.

#### 3.4.1 Relative Error

Figure 2 displays the relative error of the pooled estimates by the outcome distribution and the summary measures reported in the primary analysis. The median-based methods have median relative error of nearly zero, regardless of the outcome distribution or summary measures given. The transformation-based methods also have median relative error of nearly zero for the normal distribution. However, the transformation-based methods display high relative error for the mixture of normals case. Although the method of Luo et al performs better than the method of Wan et al in some reporting scenarios when considering relative error, it does not offer a considerable improvement in the others when the outcome distribution follows a mixture of normals.

Figure C.1 is the corresponding figure for the sensitivity analysis. When studies report the mean or median based on the skewness of the distribution, the relative error of all the approaches is similar to that of the primary analysis. In the scenarios where (i) studies randomly report the mean or median of the outcome and (ii) all studies report the mean, all methods perform well when the outcome is normally distributed, and the transformation-based methods outperform the median-based methods in the mixture case.

#### 3.4.2 Variance

Table 3 gives the variance of the methods by the outcome distribution and the summary measures reported. We first consider the primary analysis. The QE method has the smallest variance amongst the median-based methods, nearly equal to the variance when using the true density (i.e., QE-BC). Moreover, the variance of the QE method is not highly affected by the summary measures reported. The transformation-based methods have variance comparable to the QE method in the investigated scenarios. Although the methods of Luo et al and Wan et al have similar variance in most scenarios, the method of Luo et al outperforms that of Wan et al in regards to variance in some scenarios when the outcome distribution follows a mixture of normals.

In all scenarios considered in the sensitivity analysis, the QE method has similar variance to the transformation-based methods as well as the QE-BC method. The other median-based methods (i.e., MDM and LQMM) yielded higher values for the variance. Moreover, the variance of all approaches was not considerably affected by the summary measures reported in the sensitivity analysis.

#### 3.4.3 Coverage

Figure 3 gives the coverage of the CIs of the methods by the outcome distribution and the summary measures reported in the primary analysis. Note that the coverage probability of the CI for the MDM method depends on the number of studies, and the coverage probability of this method is approximately 97.85% for 10 studies. The median-based methods have nominal or near-nominal coverage in all scenarios. For the transformation-based methods, coverage is approximately 90%-92% in most scenarios where the outcome distribution was normal. However, the transformation-based methods have poor coverage when the outcome distribution is a mixture of normals. The method of Luo et al only offers a considerable improvement over that of Wan et al when studies present certain summary measures and the outcome distribution follows a mixture of normals.

Next, we consider the coverage of the methods in the sensitivity analysis, which is given in Figure C.2. In the scenario where studies report a mean or median based on the Shapiro-Wilk normality test, the coverage of all methods is similar to that of the primary analysis. When studies randomly report the mean or median, the transformation-based methods outperform the median-based methods in regards to coverage when the outcome is skewed. Specifically, when the outcome distribution is generated from the mixture of normals, the median-based methods have low coverage (i.e., less than 40%) but the transformation-based methods obtain coverage of approximately 88%. The same conclusions hold for the scenario where all studies report sample means.

#### 3.4.4 Bias for Estimating Heterogeneity

Figure 4 illustrates the bias for estimating heterogeneity by the outcome distribution and the summary measures reported. Note that the MDM method is not included because it is not a random effects approach. The QE-BC method performs better than the QE method for estimating heterogeneity when the outcome follows a mixture of normals in some reporting scenarios, and the two methods perform comparably otherwise. Similarly, for the transformation-based approaches, the method of Luo et al performs better than the method of Wan et al when the outcome follows a mixture of normals in those scenarios. When comparing the transformation-based methods to the median-based methods, the method of Luo et al performs either better than or comparably to the LQMM and QE methods for estimating heterogeneity in all scenarios in Figure 4.

Figure C.3 displays the results for the bias for estimating heterogeneity in the sensitivity analysis. When studies report a mix of means and medians, the Luo et al and LQMM methods are preferable for estimating heterogeneity when the outcome is normal and the QE method performs best in the mixture case. When all studies report means, the method directly pooling the difference of means estimates heterogeneity better than the transformation-based methods, as expected.

## 4 Example

To illustrate these approaches, we reanalyzed the meta-analysis conducted as part of the dissertation of Sohn [25]. The goal of the meta-analysis is to evaluate the impact of novel diagnostic tests on reducing diagnostic delays for patients diagnosed with drug-susceptible or drug-resistant tuberculosis (TB). Diagnostic delay is defined in [25] as the length of time between an individual’s first TB-specific visit to a health care provider and the time of availability of diagnostic test results for clinical decision-making. The primary studies compared the delays experienced by two groups of patients, each receiving a different type of diagnostic test. One group received the Xpert MTB/RIF test, a cartridge-based nucleic acid amplification test that can be performed at lower levels of the health system. Patients in the comparator group were diagnosed based on sputum smear microscopy, a century-old test widely used as the primary diagnostic test for TB.

The relevant data set was extracted from nine studies, all of which reported the median, first quartile, and third quartile of the diagnostic delay along with the sample sizes in the Xpert and smear groups. Since diagnostic delay was measured in days, patients who received same-day test results were recorded as having a diagnostic delay of 0 days. In order to avoid complications for the quantile estimation method when fitting parameters of distributions with strictly positive support, we added a value of 0.5 to all diagnostic delay summary data. Table 4 displays the relevant study-specific summary data.

In the original analysis, the author estimated the sample mean and standard deviation from the summary data using the methods recommended by Wan et al [26]. Then, they calculated the difference of means and its variance for each study, and they meta-analyzed the difference of means in a random effects analysis.

In our analysis, we first apply transformation-based approaches to meta-analyze the data. We compare the methods recommended by Wan et al [26] with the method of Luo et al [19] for estimating the difference of means. We display the estimated study-specific difference of means and their 95% CIs in Table 5. We observe that the estimated difference of means and their 95% CIs obtained by the two methods are nearly identical.

We next considered using the difference of medians as the effect measure. In this case, we applied the MDM, LQMM, and QE methods to pool the difference of medians, as described in Section 2.3. Table 5 displays the study-specific differences of medians along with their estimated 95% CIs using the QE method.

We display the pooled estimates and their 95% CIs of all considered methods in Table 6. We also report the $I^2$ statistic [14] and the $p$-value of Cochran's $Q$ test of heterogeneity [6] for the methods that pool studies with the inverse variance method (i.e., the Wan et al [26], Luo et al [19], and QE methods). Moreover, we report the estimate of heterogeneity $\hat{\tau}^2$ for the random effects methods (i.e., the Wan et al [26], Luo et al [19], LQMM, and QE methods).
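The quantities reported in Table 6 follow from standard inverse variance pooling. A minimal sketch computing the pooled estimate, Cochran's $Q$, the $I^2$ statistic, and the DerSimonian-Laird estimate of $\tau^2$, assuming the usual textbook formulas (the function name is hypothetical):

```python
def dersimonian_laird(y, v):
    """Random effects inverse variance pooling with Cochran's Q, I^2, and
    the DerSimonian-Laird estimate of the heterogeneity tau^2.

    y: study-specific effect estimates; v: their within-study variances.
    """
    k = len(y)
    w = [1.0 / vi for vi in v]                     # fixed-effect weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    Q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    I2 = max(0.0, (Q - (k - 1)) / Q) * 100 if Q > 0 else 0.0
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (Q - (k - 1)) / c)             # DL estimator [9]
    w_re = [1.0 / (vi + tau2) for vi in v]         # random-effects weights
    mu_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = (1.0 / sum(w_re)) ** 0.5
    return mu_re, se_re, Q, I2, tau2
```

Setting `tau2 = 0` (i.e., using `mu_fe` and `1/sqrt(sw)`) recovers the fixed effect analysis of Appendix D.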

The estimated pooled difference of means of the two transformation-based methods are very similar and are centered at 2.1 days. The median-based methods estimate a pooled difference of medians of approximately 1 day. Moreover, the inverse variance methods all estimate significant heterogeneity.

## 5 Discussion

We proposed several methods to meta-analyze two-group studies that report the median of the outcome along with the sample size and several measures of spread. The proposed methods directly pool the difference of medians, whereas the existing transformation-based methods [26, 15, 2, 18, 17, 19] are commonly used to pool estimates of the difference of means.

Based on the results of the simulation study, we make the following suggestions to data analysts meta-analyzing two-group studies that report the median of the outcome. If all or nearly all studies report the median of the outcome, the median-based methods are expected to perform better than or comparably to the transformation-based methods. Similarly, if all or nearly all studies report the mean, the transformation-based methods should be used. In the case that some studies report the mean and others report the median, several additional considerations must be made. If the study-level outcome distribution is approximately normal, then all the methods considered in this paper are expected to perform well. If the study-level outcome distribution is highly skewed in some of the primary studies, then the transformation-based methods may be better suited. In practice, the skewness of the outcome distribution may be evaluated by Bowley's coefficient of skewness [16], provided that the primary studies report the appropriate summary measures, or authors may make assumptions of skewness based on a priori domain knowledge of the outcome.

Amongst the median-based methods, we recommend the quantile estimation method for most analyses. We found that this method performs better than the median of the difference of medians method and the linear quantile mixed model method in nearly all scenarios of the simulation study. Although we also implemented the Approximate Bayesian Computation method to improve density estimation by offering more well-established model selection and model averaging techniques (see Appendix A), we found that this approach performs nearly identically to the quantile estimation method in all scenarios in the simulation study. Therefore, in practice, we recommend the quantile estimation method over the Approximate Bayesian Computation method because of its simplicity and low computational cost. Moreover, we note that the error induced by density estimation does not considerably affect the performance of the quantile estimation method, which can be seen from the fact that it performs similarly to the approach using the true underlying density. The quantile estimation method offers several other advantages because it uses the inverse variance method to pool studies. First, under the standard assumption that the variances are known, the corresponding estimator is a minimum-variance unbiased estimator [22]. Second, it allows data analysts to perform standard follow-up analyses (such as heterogeneity modelling and cumulative meta-analyses) as well as graphical assessments (such as forest plots and funnel plots).

In the one-sample case for meta-analyzing medians studied by McGrath et al [20], they generated a skewed outcome in their simulations using a log-normal distribution. When generating the outcome in this manner, they found that meta-analytic methods using inverse variance weighting yielded highly biased estimates due to the correlation between the effect measures and their variances, which is a well-documented problem in the literature (e.g., see [24, 10]). Therefore, we chose to use a mixture of normal distributions to generate a skewed outcome in this work so that the effect measures are independent of their variances. When meta-analyzing data where this correlation is present, the median of the difference of medians and linear quantile mixed model methods may be the preferred median-based approaches because these methods weight studies equally. In future work, we intend to study more thoroughly the effect of correlation between effect sizes and their variances on the proposed median-based methods.

We also considered applying weighted versions of the median of the difference of medians and linear quantile mixed model methods, where studies were weighted proportionally to their sample size. As found in the one-sample case of McGrath et al [20], these weighted variations did not considerably improve the performance of the methods in the simulation study. Therefore, we excluded these approaches for parsimony.

Several authors have recently developed methods to estimate the sample mean and standard deviation from the median, sample size, and several measures of spread [26, 15, 2, 18, 17, 19]. We decided to use the methods recommended by Wan et al [26] and the subsequently optimized sample mean estimators of Luo et al [19] in our work because they have been shown to clearly outperform the methods of Hozo [15] and Bland [2]. No known comparisons of the method of Luo et al [19] with those of Kwon and Reis [18, 17] have been performed. When comparing the methods recommended by Wan et al [26] with those of Kwon and Reis [18, 17], the skewness of the outcome distribution and the summary measures reported have a stronger influence on the performance of the estimators than the choice of method. Since the methods recommended by Wan et al [26] are more widely used in practice compared to those of Kwon and Reis [18, 17], we opted for the methods recommended by Wan et al [26] in this work.

Lastly, we revisit the data set originally collected and analyzed by Sohn [25]. Since all the primary studies reported the median of the outcome, the results of the simulation study suggest that the median-based methods are most suitable for meta-analyzing the data. The preferred quantile estimation method estimates a pooled difference of medians of approximately 1 day [95% CI: 0.2, 1.9] between the Xpert and smear groups. This method also indicates that the primary studies included in the meta-analysis are highly heterogeneous, consistent with the conclusions obtained in the original analysis.

## References

• [1] Mark A. Beaumont, Wenyang Zhang, and David J. Balding. Approximate bayesian computation in population genetics. Genetics, 162(4):2025–2035, 2002.
• [2] Martin Bland. Estimating mean and standard deviation from the sample size, three quartiles, minimum, and maximum. International Journal of Statistics in Medical Research, 4(1):57–64, 2014.
• [3] M.V. Boldin, G.I. Simonova, and Yu.N. Tyurin. Sign-based Methods in Linear Statistical Models. Translations of Mathematical Monographs. American Mathematical Society, 1997.
• [4] Michael Borenstein, Larry V. Hedges, Julian P.T. Higgins, and Hannah R. Rothstein. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods, 1(2):97–111, 2010.
• [5] Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208, 1995.
• [6] William G. Cochran. The combination of estimates from different experiments. Biometrics, 10(1):101–129, 1954.
• [7] W.J. Conover. Practical nonparametric statistics. Wiley series in probability and mathematical statistics: Applied probability and statistics. Wiley, 1980.
• [8] Jonathan J Deeks, Douglas G Altman, and Michael J Bradburn. Statistical Methods for Examining Heterogeneity and Combining Results from Several Studies in Meta-Analysis, pages 285–312. BMJ Publishing Group, 2008.
• [9] Rebecca DerSimonian and Nan Laird. Meta-analysis in clinical trials. Controlled clinical trials, 7(3):177–188, 1986.
• [10] John D. Emerson, David C. Hoaglin, and Frederick Mosteller. A modified random-effect procedure for combining risk difference in sets of 2×2 tables from clinical trials. Journal of the Italian Statistical Society, 2(3):269–290, Dec 1993.
• [11] Marco Geraci and Matteo Bottai. Linear quantile mixed models. Statistics and computing, 24(3):461–479, 2014.
• [12] Magnus Andersson Hagiwara, Anders Bremer, Andreas Claesson, Christer Axelsson, Gabriella Norberg, and Johan Herlitz. The impact of direct admission to a catheterisation lab/ccu in patients with st-elevation myocardial infarction on the delay to reperfusion and early risk of death: results of a systematic review including meta-analysis. Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, 22(1):67, Nov 2014.
• [13] Larry V Hedges and Jack L Vevea. Fixed-and random-effects models in meta-analysis. Psychological methods, 3(4):486, 1998.
• [14] Julian Higgins and Simon G Thompson. Quantifying heterogeneity in a meta-analysis. Statistics in medicine, 21(11):1539–1558, 2002.
• [15] Stela Pudar Hozo, Benjamin Djulbegovic, and Iztok Hozo. Estimating the mean and variance from the median, range, and the size of a sample. BMC medical research methodology, 5(1):13, 2005.
• [16] John Francis Kenney. Mathematics of Statistics, Part 1. Van Nostrand, 3 edition, 1962.
• [17] D. Kwon and I. M. Reis. Approximate Bayesian computation (ABC) coupled with Bayesian model averaging method for estimating mean and standard deviation. ArXiv e-prints, July 2016.
• [18] Deukwoo Kwon and Isildinha M Reis. Simulation-based estimation of mean and standard deviation for meta-analysis via approximate bayesian computation (abc). BMC medical research methodology, 15(1):61, 2015.
• [19] Dehui Luo, Xiang Wan, Jiming Liu, and Tiejun Tong. Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range. Statistical methods in medical research, page 0962280216669183, 2016.
• [20] S. McGrath, X. Zhao, Z. Zhen Qin, R. Steele, and A. Benedetti. One-sample aggregate data meta-analysis of medians. ArXiv e-prints, September 2017.
• [21] Zhi Zhen Qin. Delays in diagnosis and treatment of pulmonary tuberculosis, and patient care-seeking pathways in China: a systematic review and meta-analysis. PhD thesis, McGill University, October 2016.
• [22] Andrew L Rukhin. Estimating heterogeneity variance in meta-analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3):451–469, 2013.
• [23] Samuel Sanford Shapiro and Martin B Wilk. An analysis of variance test for normality (complete samples). Biometrika, 52(3/4):591–611, 1965.
• [24] Jonathan J Shuster. Empirical vs natural weighting in random effects meta-analysis. Stat Med, 29(12):1259–1265, May 2010.
• [25] Hojoon Sohn. Improving tuberculosis diagnosis in vulnerable populations: impact and cost-effectiveness of novel, rapid molecular assays. PhD thesis, McGill University, February 2016.
• [26] Xiang Wan, Wenqian Wang, Jiming Liu, and Tiejun Tong. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC medical research methodology, 14(1):135, 2014.

## Appendix A Density Estimation via Approximate Bayesian Computation

In this section, we propose a Bayesian method for density estimation, which offers more well-established model selection and model averaging techniques. Using the same notation as in Section 2.3.3.1, we apply approximate Bayesian computation (ABC) to estimate $f(m)$ from the reported summary measures. In standard Bayesian analyses, the posterior distribution is expressed as

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta),$$

where $p(y \mid \theta)$ denotes the likelihood and $p(\theta)$ denotes the prior distribution. The main idea behind ABC is that a random sample drawn from a correctly specified likelihood should be close to the observed data. In ABC, one samples a parameter value $\theta^*$ from $p(\theta)$ and then draws a sample from $p(y \mid \theta^*)$. We consider $\theta^*$ to be a draw from the approximate posterior distribution if the distance between the observed data and the generated pseudodata is sufficiently small. Specifically, we implement the following algorithm, adapted from the methods of Kwon and Reis [18, 17].

We begin by specifying $k$ candidate parametric families of distributions for the outcome variable. Let model $i$ be parametrized by $\theta_i$ for $i = 1, \dots, k$. We select a parametric family of distributions from a multinomial distribution with probabilities $p_1, \dots, p_k$, where initially $p_i = 1/k$ for each $i$. Suppose model $i$ is chosen for some fixed $i$. We sample $\theta_i^*$ from the specified prior distributions corresponding to model $i$. Then we simulate a data set, $y^*$, with $n$ observations from the likelihood and compute summary statistics in the same form as those of the original study. If the distance between the summary statistics of $y^*$ and the study-specific summary statistics is sufficiently small, we retain $\theta_i^*$. After every 1,000 iterations, we update the vector $(p_1, \dots, p_k)$ so that $p_i$ is the percentage of accepted parameter values belonging to model $i$. After $N$ repetitions for some large $N$, the distribution of the accepted $\theta_i^*$ approximates the posterior distribution of the parameters of model $i$. We fit $\theta_1, \dots, \theta_k$ using the means of their respective posterior distributions.
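The core accept/reject step can be sketched for a single candidate family. The code below assumes a normal family with hypothetical flat priors, crude order-statistic quantiles as summaries, and the acceptance-rate variant (keeping the closest fraction of draws); the full algorithm additionally cycles over the $k$ families and updates the model probabilities.

```python
import random

def abc_fit_normal(obs_summaries, n, n_iter=2000, accept_rate=0.01, seed=1):
    """Rejection-ABC sketch for one candidate family (normal).

    Samples (mu, sigma) from flat priors (hypothetical ranges), simulates a
    data set of size n, computes the same summaries as the study (here a
    crude q1, median, q3), and keeps the draws whose summaries are closest
    to the observed ones. Returns posterior means of the kept draws.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        mu = rng.uniform(-10.0, 10.0)
        sigma = rng.uniform(0.1, 10.0)
        data = sorted(rng.gauss(mu, sigma) for _ in range(n))
        sim = (data[n // 4], data[n // 2], data[3 * n // 4])
        dist = sum((s - o) ** 2 for s, o in zip(sim, obs_summaries)) ** 0.5
        draws.append((dist, mu, sigma))
    draws.sort()                                  # smallest distances first
    kept = draws[: max(1, int(accept_rate * n_iter))]
    return (sum(d[1] for d in kept) / len(kept),  # posterior mean of mu
            sum(d[2] for d in kept) / len(kept))  # posterior mean of sigma
```

Feeding it the population quartiles of a standard normal should recover parameters near (0, 1), up to simulation noise in the accepted draws.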

We consider applying ABC with (i) single distribution selection and (ii) Bayesian model averaging to estimate $f(m)$, following the two approaches used by Kwon and Reis [17].

For single distribution selection, we use Bayes' factor for model selection. That is, we select the distribution with the highest posterior model probability. Kwon and Reis [18] show that the posterior model probability of model $i$ is well-approximated by the percentage of accepted parameter values for that model, $p_i$. After selecting the parametric family of distributions with the highest posterior model probability, we estimate $f(m)$ using the density function and distribution median of the fitted distribution.

For Bayesian model averaging, let $\hat{f}(m)^{(i)}$ denote the probability density function of the fitted distribution evaluated at its distribution median under model $i$, obtained by applying the ABC algorithm. Then

$$\hat{f}(m) = \sum_{i=1}^{k} p_i\, \hat{f}(m)^{(i)},$$

where $p_i$ is the estimated posterior model probability of model $i$.

Consistent with the methods of Kwon and Reis [18, 17], the distance between the summary statistics of the simulated and observed data is measured by a norm of their difference. Moreover, in standard practice, one often uses a small acceptance rate of sampled parameter values rather than a minimum distance threshold to select candidate parameter values [1]. Specifically, we fix the acceptance rate of sampled parameters at 0.1% and set the number of iterations to 20,000 for a given parametric family of distributions, as recommended in this context by Kwon and Reis [18, 17]. That is, the 20 sampled parameter values yielding summary statistics closest to the observed data were used to model the posterior distributions. As with the least squares method, we consider the following candidate distributions: the normal, log-normal, Weibull, and gamma distributions. Table A.1 displays the prior distributions under each of the summary-measure scenarios. As with the box constraints used in the L-BFGS-B algorithm, these prior distributions were selected based on the ABC algorithms of Kwon and Reis [18, 17].

Because of the high computation cost of the ABC methods, we apply these methods to the first 500 (out of the 1,000) simulated data sets for each combination of data generation parameters. In Table A.2, we display the relative error of the pooled estimates, variance of the pooled estimates, coverage of the 95% CIs of the pooled estimates, and bias of the heterogeneity estimates when the ABC methods were applied to the data generated in the primary analysis of the simulation study. In all scenarios, the ABC methods perform comparably to the QE method and its best-case scenario (i.e., the QE-BC method).

## Appendix B Modifications to the Proposed Methods

In the following subsections, we discuss several modifications to the proposed median-based methods and evaluate the performance of the methods under these modifications.

### b.1 Median of the Difference of Medians

In the main paper, we invert the sign test to construct a CI around the pooled estimate of the MDM method. When the number of studies is sufficiently large, one can use the normal approximation of the binomial distribution to construct an approximate 95% CI around the pooled estimate. Specifically, let $K$ denote the number of studies and $z_{0.975}$ denote the 0.975 quantile of the standard normal distribution. The order statistics of the study-specific differences of medians at ranks of approximately $(K - z_{0.975}\sqrt{K})/2$ and $(K + z_{0.975}\sqrt{K})/2$ are the lower and upper limits, respectively, of the approximate 95% CI of the MDM method [7]. We denote the MDM method with the normal approximation as MDM-N.
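The normal-approximation CI can be sketched as follows. The rounding convention below (round the lower rank, mirror it for the upper rank) is one common choice, e.g. in Conover [7]; the paper's exact rule may differ, and the function name is hypothetical.

```python
from math import sqrt
from statistics import median

def mdm_normal_ci(effects, z=1.959964):
    """MDM-N sketch: CI for the median of K study-specific effects using
    the normal approximation to the Bin(K, 1/2) distribution.

    Lower 1-based rank r = round((K - z*sqrt(K)) / 2); upper rank K + 1 - r.
    """
    x = sorted(effects)
    K = len(x)
    r = max(1, round((K - z * sqrt(K)) / 2.0))
    return median(x), (x[r - 1], x[K - r])
```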

We applied the MDM-N method to the data generated in our primary simulation study, and we compare its coverage and CI lengths to those of the MDM method. Theoretically, the coverage probability of the MDM method is approximately 97.85% for 10 studies and 95.72% for 30 studies. In Table B.1, we present the coverage and mean length of the CIs of these methods. In all scenarios considered, the MDM-N method had nearly equal coverage and smaller mean CI length when compared to the MDM method.

### b.2 Modifications for Density Estimation Methods

In some contexts, authors may assume that the underlying outcome distributions of the two groups in a primary study differ only by a location shift. In this case, one can estimate the variance of the difference of medians more precisely by averaging the density estimates over the two groups. This method is described in the following paragraph.

Using the notation of Section 2.3.3, we assume that $f_{i2}(x) = f_{i1}(x - \Delta_i)$ for some shift $\Delta_i$, which implies $f_{i1}(m_{i1}) = f_{i2}(m_{i2})$. Under the location-shift assumption, a new estimate of the variance of the difference of medians is given by

$$\widehat{\mathrm{Var}}(y_i) = \frac{1}{4\hat{f}(m)_i^{\,2}}\left(\frac{1}{n_{i1}} + \frac{1}{n_{i2}}\right) \qquad (2)$$

where $\hat{f}(m)_i$ is given by

$$\hat{f}(m)_i = \frac{n_{i1}\hat{f}_{i1}(m_{i1}) + n_{i2}\hat{f}_{i2}(m_{i2})}{n_{i1} + n_{i2}}.$$
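The location-shift variance estimate is a direct transcription of equation (2): pool the two groups' density estimates at their medians with sample-size weights, then plug the pooled density into the variance formula. Function names are hypothetical.

```python
def pooled_density_at_median(f1_at_m1, n1, f2_at_m2, n2):
    """Sample-size-weighted average of the two groups' density estimates
    at their respective medians (the pooled density in equation (2))."""
    return (n1 * f1_at_m1 + n2 * f2_at_m2) / (n1 + n2)

def var_diff_medians_shift(f1_at_m1, n1, f2_at_m2, n2):
    """Equation (2): Var = (1 / (4 f(m)^2)) * (1/n1 + 1/n2)."""
    f = pooled_density_at_median(f1_at_m1, n1, f2_at_m2, n2)
    return (1.0 / (4.0 * f ** 2)) * (1.0 / n1 + 1.0 / n2)
```

When the two groups have equal density at the median, this reduces to the sum of the usual per-group median variances.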

However, we found in our simulations that using (2) did not considerably improve the accuracy of the pooled estimates of the density estimation methods (data not shown).

## Appendix C Sensitivity Analysis: A Mix of Means and Medians

In this section, we provide the figures corresponding to the sensitivity analysis. Figures C.1, C.2, and C.3 give the relative error of the pooled estimates, coverage of the 95% CIs of the pooled estimates, and the bias of the heterogeneity estimates, respectively.

## Appendix D Fixed Effect Methods

In this section, we apply the inverse variance methods in fixed effect analyses. This includes both the transformation-based methods as well as the QE and QE-BC methods. Table D.1 gives the relative error, variance, and coverage of the 95% CIs of the inverse variance methods with fixed effect pooling. The results are similar to those of the random effects analyses, and the same overall conclusions and recommendations hold in the fixed effect case.