1 Introduction
Meta-analysis is a statistical method for combining data from several scientific studies addressing a common research question. Researchers conduct meta-analyses to obtain more precise statistical estimates of the outcome of interest, investigate sources of heterogeneity, and generalize conclusions across studies. Over the past few decades, meta-analysis has gained considerable popularity in research synthesis and is widely considered the gold standard of evidence in systematic reviews.
Researchers often meta-analyze two-group studies, which evaluate the change in an outcome across two groups. One such study design is the clinical trial with a treatment and a control group. Frequently, only summary measures of the outcome variable in each group are available from the primary studies, typically a point estimate of the outcome and its variance. Data analysts must then decide on the effect measure to use in the analysis, which quantifies the change in the outcome between the two groups. Standard meta-analytic methods estimate a pooled effect measure by a weighted mean of the study-specific effect measures, where the weight assigned to each study is inversely proportional to the variance of its point estimate. These methods are referred to as inverse variance approaches [4].
In this paper, we consider the case where the outcome variable is continuous and is measured on a scale that is meaningful and consistent across studies. In this context, two-group studies usually report the sample mean and standard deviation of the outcome in each group, and researchers typically use the difference of means as the effect measure of the primary studies. Then, researchers meta-analyze the difference of means between the two groups. Meta-analytic methods for pooling the difference of means are well established in the literature [8].
However, when the distribution of the outcome variable is skewed, authors often report the sample median of the outcome to better represent the center of the data. In this case, to describe the spread of the data, authors frequently report the first and third quartiles, the minimum and maximum values, or both sets of measures (e.g., see
[21, 25]). Although the difference of medians is the most natural effect measure to use in this context for two-group studies, there are no available statistical methods to meta-analyze the difference of medians. The challenge in applying the inverse variance approach to pool the difference of medians is that the variance of the difference of medians is not reported by the primary studies. Furthermore, this quantity is difficult to estimate from the summary measures provided by a primary study because it depends on the underlying distribution. Consequently, studies that report the median of the outcome are either left out of meta-analyses (e.g., see [12]) or researchers transform the outcome so that well-established meta-analytic methods can be applied. More specifically, several authors have recently proposed methods to estimate the sample mean and standard deviation from the median, sample size, and several commonly reported measures of spread [26, 15, 2, 18, 17, 19]. These methods are used to estimate the difference of means and its variance for each primary study that reports medians. Then, data analysts pool the estimated differences of means using the inverse variance method. Although these methods are widely used in practice, they suffer from several limitations. First, in this analysis, one uses an estimate of the study-specific effect measure (i.e., the estimated difference of means) rather than one that can be directly calculated (i.e., the difference of medians), which introduces an additional source of error into the analysis. Moreover, the performance of these methods for estimating the sample mean and standard deviation has been shown to be heavily influenced by the skewness of the outcome distribution [18], which is likely to be highly skewed when authors report medians.
Finally, regardless of the accuracy of these methods for estimating the sample mean and standard deviation, sample medians may better represent the center of the data for skewed outcomes, and the difference of medians may therefore be the preferred effect measure in meta-analyses of skewed continuous outcomes.
In this work, we develop methods to directly meta-analyze the difference of medians. We first consider several simple methods that are not based on the classical inverse variance approach. Specifically, we extend the methods proposed by McGrath et al [20] for meta-analyzing one-group studies that report the median to the two-sample context, and we consider applying linear quantile mixed models. In order to weight studies optimally, we also develop methods to estimate the variance of the median so that the difference of medians can be pooled using the inverse variance approach.
We describe the existing and proposed methods to meta-analyze two-group studies that report medians in Section 2. In Section 3, we present the results of a simulation study evaluating the performance of the existing and proposed methods. In Section 4, we illustrate these approaches on a real-life data set. We conclude with a discussion in Section 5.
2 Methods
2.1 Standard Fixed Effect and Random Effects Models
We first describe the standard fixed effect and random effects models for the meta-analysis of continuous outcomes. For study $i$, where $i = 1, \dots, k$, let $\hat{\theta}_i$ denote the estimated effect measure and let $v_i$ be its estimated variance. For instance, the effect measure may be the difference of means or the difference of medians of the outcome in the two arms of a trial. Note that although $v_i$ is estimated from the data, it is typically assumed to be a known quantity in meta-analysis. The fixed effect model posits that a single true effect measure, $\theta$, underlies all studies. Under standard assumptions,
$$\hat{\theta}_i = \theta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, v_i),$$
where $\varepsilon_i$ is random within-study variation. The pooled effect measure is then estimated by a weighted mean of the $\hat{\theta}_i$,
$$\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i}.$$
The weights for the estimates above are given by $w_i = 1/v_i$.
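To make the pooling step above concrete, the fixed effect inverse-variance estimator can be sketched as follows. This is a minimal Python sketch, not the implementation used in the paper; the function and variable names are our own.

```python
import math

def fixed_effect_pool(estimates, variances):
    """Inverse-variance (fixed effect) pooling of study-specific estimates.

    Returns the pooled estimate and its 95% confidence interval, using
    weights w_i = 1 / v_i and Var(pooled) = 1 / sum(w_i)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    pooled_var = 1.0 / total
    half = 1.959964 * math.sqrt(pooled_var)  # 95% CI half-width
    return pooled, (pooled - half, pooled + half)
```

For example, two studies with equal variances receive equal weight, so the pooled estimate is simply their average.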
The random effects model, on the other hand, presumes that different studies have different true effect measures. The true effect measure of study $i$ is sampled from a normal distribution with mean $\theta$ and variance $\tau^2$. That is,
$$\hat{\theta}_i = \theta + \delta_i + \varepsilon_i, \qquad \delta_i \sim N(0, \tau^2), \quad \varepsilon_i \sim N(0, v_i),$$
where $\delta_i$ is random between-study variation, $\varepsilon_i$ is random within-study variation, and $\delta_i$ and $\varepsilon_i$ are independent. The pooled effect measure is estimated by
$$\hat{\theta} = \frac{\sum_{i=1}^{k} w_i^* \hat{\theta}_i}{\sum_{i=1}^{k} w_i^*}.$$
The weights are given by $w_i^* = 1/(v_i + \hat{\tau}^2)$, where $\hat{\tau}^2$ is an estimate of $\tau^2$. The method of moments estimator of DerSimonian and Laird [9] is commonly used to estimate $\tau^2$, which is given by
$$\hat{\tau}^2 = \max\left\{0,\; \frac{Q - (k - 1)}{\sum_{i=1}^{k} w_i - \sum_{i=1}^{k} w_i^2 \big/ \sum_{i=1}^{k} w_i}\right\},$$
where
$$Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}\right)^2$$
and $w_i = 1/v_i$ and $\hat{\theta}$ are the fixed effect weights and pooled estimate, respectively.
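The DerSimonian-Laird step and the resulting random effects pooling can be sketched as follows (a Python sketch with illustrative names; the paper's analyses use standard R tooling):

```python
def dersimonian_laird(estimates, variances):
    """Method-of-moments (DerSimonian-Laird) estimate of the between-study
    variance tau^2, followed by random effects inverse-variance pooling."""
    k = len(estimates)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    # fixed effect pooled estimate and Cochran's Q statistic
    theta_fe = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - theta_fe) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)      # truncated at zero
    # random effects weights and pooled estimate
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return tau2, pooled
```

When the studies are perfectly homogeneous, $Q = 0$ and the estimator truncates $\hat{\tau}^2$ to zero, reducing the pooling to the fixed effect estimator.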
Under the typical assumption that the variances are known, the fixed effect and random effects estimators are minimum variance unbiased estimators [22].

2.2 Existing Methods: Transformation-Based Approaches
For group $j$ ($j = 1, 2$) in study $i$, we use the following notation for summary measures that might be reported. Let $n_{ij}$ be the number of subjects, $m_{ij}$ be the median of the outcome, $q_{1,ij}$ and $q_{3,ij}$ be the first and third quartiles, respectively, and let $a_{ij}$ and $b_{ij}$ be the minimum and maximum values, respectively. Furthermore, let $\bar{x}_{ij}$ be the sample mean and $s_{ij}$ be the sample standard deviation.
We consider the following sets of summary measures that studies may report. In scenario 1 ($S_1$), a study reports the median, minimum and maximum values, and number of subjects. In scenario 2 ($S_2$), a study reports the median, first and third quartiles, and number of subjects. Lastly, in scenario 3 ($S_3$), a study reports the median, minimum and maximum values, first and third quartiles, and number of subjects.
We estimate the sample mean and standard deviation of the outcome in each group in each study. Then we estimate a fixed effect and random effects pooled difference of means and its 95% confidence interval (CI) from the estimated differences of means and their variances [13]. For the random effects methods, we estimate $\tau^2$ using the method of DerSimonian and Laird [9]. We call these approaches transformation-based approaches. In the following subsections, we describe the methods for estimating the sample mean and standard deviation under scenarios $S_1$, $S_2$, and $S_3$.

2.2.1 Method of Wan et al
Wan et al [26] give an overview of methods for estimating the sample mean and standard deviation under scenarios $S_1$, $S_2$, and $S_3$, and recommend the following methods based on their simulation study. For $S_1$, Wan et al [26] suggest using Hozo's method [15] to estimate the sample mean and recommend their own proposed method to estimate the standard deviation. Dropping the study and group subscripts, these estimates are given by
$$\bar{x} \approx \frac{a + 2m + b}{4}, \qquad s \approx \frac{b - a}{2\,\Phi^{-1}\!\left(\frac{n - 0.375}{n + 0.25}\right)},$$
where $\Phi^{-1}$ denotes the inverse of the standard normal cumulative distribution function. For $S_2$, Wan et al [26] develop the following methods to estimate the sample mean and standard deviation:
$$\bar{x} \approx \frac{q_1 + m + q_3}{3}, \qquad s \approx \frac{q_3 - q_1}{2\,\Phi^{-1}\!\left(\frac{0.75n - 0.125}{n + 0.25}\right)}.$$
Lastly, in $S_3$, Wan et al [26] suggest using the method of Bland [2] to estimate the sample mean and recommend their own method to estimate the standard deviation, which are given by
$$\bar{x} \approx \frac{a + 2q_1 + 2m + 2q_3 + b}{8}, \qquad s \approx \frac{b - a}{4\,\Phi^{-1}\!\left(\frac{n - 0.375}{n + 0.25}\right)} + \frac{q_3 - q_1}{4\,\Phi^{-1}\!\left(\frac{0.75n - 0.125}{n + 0.25}\right)}.$$
A brief outline of the derivation of these methods is given below. First, we consider estimating the sample mean in $S_1$ and $S_3$. Hozo et al [15] and Bland [2] place upper and lower bounds on the ordered observations using the reported sample quantiles. By summing the inequalities and dividing by $n$, one obtains bounds for the sample mean in $S_1$ and $S_3$. Then, one estimates the sample mean by the average of the upper and lower bounds. Note that these methods make no distributional assumptions about the outcome variable.
To estimate the sample mean in $S_2$, Wan et al [26] assume that the outcome distribution is normal. They show that the mean of the normal distribution can be expressed in terms of the expected values of the median and the first and third quartiles. Then, they estimate the sample mean by replacing the expected values of the sample median and first and third quartiles with the observed sample values.
Similarly, Wan et al [26] estimate the sample standard deviation in $S_1$, $S_2$, and $S_3$ under the assumption that the outcome distribution is normal. They show that the standard deviation of the normal distribution can be expressed in terms of the expected values of sample quantiles. One can then estimate the sample standard deviation by replacing the expected values of the sample quantiles with the observed sample quantiles.
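To make the $S_2$ recipe concrete, the quartile-based mean and standard deviation estimators in the style of Wan et al can be sketched in Python (a sketch for illustration, assuming normality for the SD step; names are our own):

```python
from statistics import NormalDist

def wan_s2_mean_sd(q1, med, q3, n):
    """Wan et al.-style S2 estimates of the sample mean and SD from the
    median, first and third quartiles, and sample size."""
    # mean: simple average of the three reported quantiles
    mean = (q1 + med + q3) / 3.0
    # SD: IQR scaled by the expected normal quantile spread for sample size n
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2.0 * z)
    return mean, sd
```

For quartiles of a standard normal sample ($q_1 \approx -0.6745$, $m = 0$, $q_3 \approx 0.6745$) and a large $n$, the estimates are close to the true mean 0 and SD 1.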
2.2.2 Method of Luo et al
We also consider the methods more recently recommended by Luo et al [19]. Under the assumption that the outcome variable is normally distributed, the methods of Luo et al [19] optimize the sample mean estimators recommended by Wan et al [26]. The formulas for estimating the sample mean in $S_1$, $S_2$, and $S_3$ are
$$\bar{x} \approx \frac{4}{4 + n^{0.75}} \cdot \frac{a + b}{2} + \frac{n^{0.75}}{4 + n^{0.75}}\, m,$$
$$\bar{x} \approx \left(0.7 + \frac{0.39}{n}\right) \frac{q_1 + q_3}{2} + \left(0.3 - \frac{0.39}{n}\right) m,$$
$$\bar{x} \approx \frac{2.2}{2.2 + n^{0.75}} \cdot \frac{a + b}{2} + \left(0.7 - \frac{0.72}{n^{0.55}}\right) \frac{q_1 + q_3}{2} + \left(0.3 + \frac{0.72}{n^{0.55}} - \frac{2.2}{2.2 + n^{0.75}}\right) m,$$
respectively. To estimate the standard deviation, Luo et al [19] recommend the methods of Wan et al [26] described above.
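The $S_2$ estimator of Luo et al combines the mid-quartile range and the median with sample-size-dependent weights that sum to one; a sketch (illustrative Python, not the authors' implementation):

```python
def luo_s2_mean(q1, med, q3, n):
    """Luo et al.-style S2 sample mean estimate: a weighted combination of
    the mid-quartile range (q1 + q3)/2 and the median, with the weight on
    the mid-quartile range approaching 0.7 as n grows."""
    w = 0.7 + 0.39 / n
    return w * (q1 + q3) / 2.0 + (1.0 - w) * med
```

For a symmetric sample (mid-quartile range equal to the median), both terms agree and the estimate equals their common value regardless of $n$.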
2.3 Proposed Methods
We propose several methods to directly meta-analyze the difference of medians when the primary studies present summary statistics of $S_1$, $S_2$, or $S_3$. The target parameter in this case is the population difference of medians. These approaches, which we call median-based approaches, fall into two categories. In the first category, we consider simple approaches that do not apply the standard inverse variance method. The second category of approaches estimates the variance of the difference of medians and then pools the studies using the standard inverse variance method described in Section 2.1.
2.3.1 Median of the Difference of Medians
McGrath et al [20] proposed methods for pooling one-group studies that report the median of the outcome. These methods use the median of the study-specific medians as the pooled estimate of the median and invert the sign test to construct a CI around the pooled estimate. We extend the methods proposed by McGrath et al [20] to pool two-group studies as follows. We take the median of the study-specific differences of medians as the pooled estimate, and we invert the sign test to construct the corresponding CI [3]. Note that this method is nonparametric and that, because of the discreteness of the sign test, the CI does not have a coverage probability of exactly 95%; the coverage probability of the interval is slightly higher than 95%, with the exact value depending on the number of studies included in the meta-analysis. We call this method the median of the difference of medians (MDM) method.

If the study-specific differences of medians are drawn independently from a continuous distribution whose median equals the population difference of medians, the proposed method consistently estimates the target parameter. Moreover, the pooled estimator is a Hodges-Lehmann estimator in this case.
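The MDM pooling step can be sketched as follows. This is a minimal Python sketch of one standard sign-test CI construction based on order statistics of the study-specific differences; names are illustrative.

```python
from math import comb
from statistics import median

def mdm_pool(diffs, level=0.95):
    """Median of the study-specific differences of medians, with a CI from
    inverting the sign test: the interval (x_(r), x_(k-r+1)) of order
    statistics, where r is the largest rank such that
    P(Binomial(k, 1/2) <= r - 1) <= (1 - level) / 2."""
    x = sorted(diffs)
    k = len(x)
    alpha = (1.0 - level) / 2.0
    cdf, r = 0.0, 0
    for j in range(k):                   # j plays the role of r - 1
        cdf += comb(k, j) / 2.0 ** k     # P(Binomial(k, 1/2) <= j)
        if cdf <= alpha:
            r = j + 1
        else:
            break
    if r == 0:
        raise ValueError("too few studies for the requested level")
    return median(x), (x[r - 1], x[k - r])
```

For example, with 10 studies the interval runs from the 2nd to the 9th ordered difference, whose exact coverage is about 97.9%, consistent with the above-nominal coverage noted in the text.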
2.3.2 Linear Quantile Mixed Models
Quantile regression models estimate the conditional quantile of the outcome variable. Linear quantile mixed models extend quantile regression models to include random effects [11]. We apply linear quantile mixed models (LQMM) to meta-analyze the studies, where the study-specific differences of medians are random intercepts in the model. The estimated fixed effect intercept and its 95% CI are then used as the pooled estimate and its corresponding CI.
The LQMM models were fit using the 'lqmm' package in the R Statistical Software. Several numerical integration methods are available for model fitting, and different numerical integration methods lead to different distributional assumptions for the random intercepts. Because the study-specific differences of medians are assumed to be normally distributed, Gauss-Hermite quadrature was chosen for the numerical integration of the likelihood [11].
2.3.3 Methods Based on Estimating the Variance of the Difference of Medians
We estimate the variance of the difference of medians of a given study using the asymptotic variance of the difference of medians. For group $j$ ($j = 1, 2$) in study $i$, let $f_{ij}$ be the probability density function of the underlying distribution, $m_{ij}$ be the study-population median, $\hat{m}_{ij}$ be the sample median, and $n_{ij}$ be the number of subjects. The asymptotic distribution of $\hat{m}_{ij}$ is normal with mean $m_{ij}$ and variance $1 / \left[4 n_{ij} f_{ij}(m_{ij})^2\right]$. It then follows that the asymptotic distribution of $\hat{m}_{i1} - \hat{m}_{i2}$ is
$$N\!\left(m_{i1} - m_{i2},\; \frac{1}{4 n_{i1} f_{i1}(m_{i1})^2} + \frac{1}{4 n_{i2} f_{i2}(m_{i2})^2}\right).$$
We estimate the difference of medians, $m_{i1} - m_{i2}$, by the difference of the respective sample medians. To estimate the variance of the difference of medians, we use
$$\widehat{\mathrm{Var}}\left(\hat{m}_{i1} - \hat{m}_{i2}\right) = \frac{1}{4 n_{i1} \hat{f}_{i1}(\hat{m}_{i1})^2} + \frac{1}{4 n_{i2} \hat{f}_{i2}(\hat{m}_{i2})^2}, \qquad (1)$$
where $\hat{f}_{i1}(\hat{m}_{i1})$ and $\hat{f}_{i2}(\hat{m}_{i2})$ are estimates of $f_{i1}(m_{i1})$ and $f_{i2}(m_{i2})$, respectively. In Section 2.3.3.1, we propose a method to estimate these quantities for each study. After estimating the difference of medians and its variance for each study, we estimate a pooled difference of medians using the standard meta-analytic methods presented in Section 2.1. As with the transformation-based methods, we use the method of DerSimonian and Laird [9] to estimate heterogeneity.
For notational simplicity in the following section, we fix study $i$ and group $j$ and consequently drop the subscripts $i$ and $j$ from all relevant variables and functions. For instance, we denote the density function of the underlying distribution evaluated at the study-population median of group $j$ of study $i$ as $f(m)$ instead of $f_{ij}(m_{ij})$.
2.3.3.1 Quantile Estimation
Given an independent group in a primary study that reports summary statistics of $S_1$, $S_2$, or $S_3$, we propose the following method, based on least squares, to estimate the density function of the underlying distribution evaluated at the study-population median (i.e., $f(m)$). First, we select a model parametrized by $\theta$ as the underlying distribution, which we denote as $F_\theta$. Then, we fit the parameters of the distribution as follows. We define $g(\theta)$ to be the sum of squares of the discrepancy between the theoretical quantiles of $F_\theta$ and the observed sample quantiles. That is, in $S_1$, we use
$$g(\theta) = \left[a - F_\theta^{-1}(\alpha_n)\right]^2 + \left[m - F_\theta^{-1}(0.5)\right]^2 + \left[b - F_\theta^{-1}(1 - \alpha_n)\right]^2,$$
where $a$, $m$, and $b$ denote the observed sample minimum, median, and maximum, respectively, and $\alpha_n$ is a small tail probability level that depends on the sample size $n$. Moreover, $F_\theta^{-1}$ denotes the quantile function corresponding to the cumulative distribution function $F_\theta$. In $S_2$, $g(\theta)$ is expressed as
$$g(\theta) = \left[q_1 - F_\theta^{-1}(0.25)\right]^2 + \left[m - F_\theta^{-1}(0.5)\right]^2 + \left[q_3 - F_\theta^{-1}(0.75)\right]^2,$$
where $q_1$ and $q_3$ denote the observed first quartile and third quartile, respectively. Lastly, in $S_3$,
$$g(\theta) = \left[a - F_\theta^{-1}(\alpha_n)\right]^2 + \left[q_1 - F_\theta^{-1}(0.25)\right]^2 + \left[m - F_\theta^{-1}(0.5)\right]^2 + \left[q_3 - F_\theta^{-1}(0.75)\right]^2 + \left[b - F_\theta^{-1}(1 - \alpha_n)\right]^2.$$
We then fit the parameters of $F_\theta$ by minimizing $g(\theta)$ with respect to $\theta$,
$$\hat{\theta} = \operatorname*{arg\,min}_{\theta}\; g(\theta).$$
We consider that the underlying parametric family of distributions may be the normal, log-normal, Weibull, or gamma distribution. These distributions were chosen to reflect a wide variety of possible outcome distributions. For each parametric family of distributions considered, we fit the parameters by minimizing $g(\theta)$. The parametric family yielding the smallest value of $g(\hat{\theta})$ is chosen to be the underlying parametric family of distributions, thereby allowing us to obtain $\hat{f}(\hat{m})$. We call this method quantile estimation (QE).

To minimize $g(\theta)$ with respect to $\theta$, we use the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm with box constraints, denoted L-BFGS-B, which is a quasi-Newton algorithm [5]. For the log-normal and normal distributions, the study-population median was constrained to lie between the minimum and maximum values in $S_1$ and between the first and third quartiles in $S_2$ and $S_3$, and we allowed a wide range of values for the remaining parameter. Additionally, for the gamma and Weibull distributions, we considered a wide range of values for the parameters. Table 1 summarizes the parameter constraints in $S_1$, $S_2$, and $S_3$ for all distributions.
We imposed suitable constraints on the parameter values in the L-BFGS-B algorithm (e.g., not allowing a negative variance). The specific values of the constraints were chosen based on the work of Kwon and Reis [18]. Specifically, in the Bayesian transformation-based method of Kwon and Reis, the authors used these values as the uniform prior bounds when fitting the normal, log-normal, and Weibull distributions in $S_1$, $S_2$, and $S_3$. For the gamma distribution, we set the constraints for the rate parameter based on the uniform prior bounds of Kwon and Reis for fitting the exponential distribution in $S_1$, $S_2$, and $S_3$, and we allowed a wide interval for the shape parameter.

We implemented the described L-BFGS-B algorithm using the optim function available in the 'stats' package of the R Statistical Software. Convergence of the L-BFGS-B algorithm was declared when the reduction of the objective function was within a factor of $10^7$ of machine tolerance, which corresponds to a tolerance of approximately $10^{-8}$. In all replications in the simulation study, the L-BFGS-B algorithm converged for at least one distribution. For each distribution, changes to the specific values of the constraints did not affect the solution, provided the algorithm converged.
2.3.3.2 BestCase Scenario for Quantile Estimation
As a best-case scenario for the quantile estimation method, we use the true value of $f(m)$ in each group to estimate the variance of the difference of medians. That is, we estimate the variance of the difference of medians for each study using equation (1), where $\hat{f}_{i1}(\hat{m}_{i1})$ and $\hat{f}_{i2}(\hat{m}_{i2})$ are replaced with the true values $f_{i1}(m_{i1})$ and $f_{i2}(m_{i2})$, respectively. We denote this method by QE-BC.
2.4 Summary of Methods
We provide a brief summary of all methods considered to meta-analyze two-group studies that report the median of the outcome of interest. All methods fall into one of two categories: methods that meta-analyze the difference of means in order to estimate the population difference of means, and methods that meta-analyze the difference of medians in order to estimate the population difference of medians.

Methods estimating the population difference of means:

Luo et al: We use the method of Luo et al [19] to estimate the mean and SD in each group in each study. We then pool the estimated differences of means using the inverse variance method.

Methods estimating the population difference of medians:

Median of the Difference of Medians (MDM): We take the median of the study-specific differences of medians as the pooled estimate, and we invert the sign test to construct a CI around it.

Linear Quantile Mixed Models (LQMM): We fit a linear quantile mixed model using the study-specific differences of medians as random intercepts. The estimated fixed effect intercept and its 95% CI are used as the pooled estimate and the corresponding CI.

Quantile Estimation (QE): We estimate the variance of the difference of medians in each study using equation (1). To estimate the density of the underlying distribution, we fit several distributions by minimizing the distance between the observed and theoretical quantiles and select the distribution with the best fit according to the proposed model selection procedure. Then, we use the inverse variance method to meta-analyze the difference of medians.

Best-Case Scenario for Quantile Estimation (QE-BC): We estimate the variance of the difference of medians in each study using equation (1), where the density of the underlying distribution is known. Then, we use the inverse variance method to pool the difference of medians.

In Appendix A, we describe a Bayesian method for density estimation and show the simulation results for this method. In Appendix B, we describe several modifications to the proposed median-based methods and display their performance in the simulation study. These methods were excluded from the main paper because, in the simulation study, they did not considerably outperform the methods described in this section.
3 Simulation Study
3.1 Data Generation
We systematically compared the performance of the proposed median-based methods with that of the transformation-based methods via a simulation study. We simulated data for meta-analyzing two-group studies by varying the number of studies, the number of subjects per study, the underlying outcome distribution, and the between-study heterogeneity.
The number of studies per meta-analysis was varied across two levels. For a given study, the number of subjects was set to be equal across the two groups, which we denote as $n_i$. The value for the number of subjects was drawn from a log-normal distribution with median either 50 or 250. When the median number of subjects was set to 50, we constrained the number of subjects to be between 10 and 500 by resampling until a value fell in the desired interval. Similarly, when the median number of subjects was set to 250, we constrained the number of subjects to be between 50 and 2,500.

We considered a total of three scenarios for the outcome distribution. In the first two scenarios, the outcomes for both groups were normally distributed and differed only by a location shift. For each study, we sampled $n_i$ values from the same normal distribution two times independently, which we call the outcome values for groups 1 and 2. An effect size, denoted by $\Delta$, was then added to the outcome values in group 1 for all studies. We considered a null and a moderate effect size; the moderate effect size was chosen to obtain power of 0.60 in a two-sample z-test of the difference of means. For the third scenario, the group 1 outcome followed a moderately skewed mixture of normal distributions and the group 2 outcome followed a normal distribution with the same mean and variance for all studies in the meta-analysis. The mixture of normal distributions had approximately a mean of 41, a median of 39, and a variance of 60. No additional effect size was added to the group 1 outcome values. Figure 1 displays the densities of the normal and mixture of normal distributions and gives the parameter values used for the mixture of normal distributions.
Lastly, we considered that the outcome either had no heterogeneity (i.e., a fixed effect model) or we added heterogeneity, denoted by $\delta_i$, which we generated from a normal distribution with mean 0 and variance $\tau^2$. When heterogeneity was included, we considered two levels of $\tau^2$ in order to obtain $I^2$ values of 0.25 and 0.75, typically characterized as low and high levels of heterogeneity, respectively [14]. For each study in a meta-analysis, we sampled a value of heterogeneity and added it to the group 1 outcome values.
There were a total of 36 combinations of data generation parameters, summarized in Table 2. For each of the 36 combinations, we simulated data for 1,000 meta-analyses.
In our simulations, all primary studies in a given meta-analysis reported the same set of summary measures. In the primary analysis, the primary studies reported summary measures of the outcome in the form of $S_1$, $S_2$, and $S_3$, as described in Section 2.2.
3.2 Sensitivity Analysis: A Mix of Means and Medians
When meta-analyzing real-life data, some primary studies may report the median of the outcome and others may report the sample mean. We incorporated this scenario in our simulations as follows. We considered that studies reported the median of the outcome in both groups if the Shapiro-Wilk normality test [23] was rejected in at least one group, and reported the mean in both groups otherwise. Additionally, we mimicked the scenario where authors do not consider the skewness of the outcome when choosing to report a mean or a median. Specifically, a random 25% of the primary studies reported the median and the remaining studies reported the mean of the outcome. When studies reported the median of the outcome, the first and third quartiles and the number of subjects were also reported (i.e., $S_2$). When the mean of the outcome was reported, the standard deviation and number of subjects were reported as well. For completeness, we also considered the scenario where all studies reported the sample mean, standard deviation, and number of subjects. This represents the best-case scenario for the transformation-based methods and the worst-case scenario for the median-based methods.
We applied the transformation-based approaches and median-based approaches in these scenarios as follows. For the transformation-based approaches, the sample mean and standard deviation were only estimated from studies that reported medians. When studies reported the sample mean, standard deviation, and number of subjects, these values were used to calculate the difference of means and its variance.
The quantile estimation method was applied as described in Section 2.3.3.1 when a study reported the median. When a study reported a mean, the method was applied in the following way. We assumed that the outcome distribution was normal and fit the parameters of the normal distribution by their maximum likelihood estimates (i.e., setting the normal mean and standard deviation to the reported sample mean and standard deviation). Then, we estimated the variance of the difference of medians using equation (1).
For the MDM and LQMM methods, the effect measure of a study was taken to be the difference of means for studies reporting means and the difference of medians for studies reporting medians. For the MDM method, we then took the median of the study-specific effect measures as the pooled estimate and computed its corresponding CI. For the LQMM method, we applied the model as described in Section 2.3.2, where the random intercepts are the study-specific effect measures. Note that in this scenario, these approaches make the assumption that the sample means well approximate the medians.
3.3 Performance Measures
We estimated the relative error (RE), variance, and coverage of the CIs of the pooled estimates. Letting $\hat{\theta}$ denote the estimate of the pooled effect measure and $\theta$ denote the true pooled effect measure, we define the relative error of $\hat{\theta}$ as
$$\mathrm{RE} = \frac{\hat{\theta} - \theta}{\theta}.$$
We use the relative error of the pooled estimate as a performance measure rather than bias because, when the outcome follows a mixture of normals, the value of the target parameter (i.e., the population difference of means for the transformation-based methods and the population difference of medians for the median-based methods) depends on the method used. The variance was calculated over the 1,000 meta-analyses for each of the 36 combinations of data generation parameters.
Additionally, we calculated the bias of the estimate of heterogeneity. Letting $\hat{\tau}^2$ and $\tau^2$ denote the estimate and true value of heterogeneity, respectively, the bias was defined as $\hat{\tau}^2 - \tau^2$.
We note that in the scenario where both group outcomes were normally distributed and the added effect size was equal to zero, the true pooled effect measure was equal to zero. In this scenario, we used bias instead of relative error as a performance measure, as the relative error is undefined when the true value is zero.
3.4 Results
The results of the simulation study are given for the case where the number of studies was 10, the median number of subjects per study was 50, the heterogeneity was fixed at a single level, and the added effect size in the normal distribution case was moderate. We found in preliminary analyses that these factors did not considerably affect the performance of the methods, and similar results hold for other fixed levels of these factors.
In Appendix C, we present the figures corresponding to the sensitivity analysis described in Section 3.2. Moreover, we note that in some contexts data analysts prefer to conduct fixed effect analyses. Therefore, we present the results of the primary analysis where we apply the inverse variance methods in a fixed effect analysis in Appendix D.
3.4.1 Relative Error
Figure 2 displays the relative error of the pooled estimates by the outcome distribution and the summary measures reported in the primary analysis. The median-based methods have median relative error of nearly zero, regardless of the outcome distribution or summary measures given. The transformation-based methods also have median relative error of nearly zero for the normal distribution. However, the transformation-based methods display high relative error in the mixture of normals case. Although the method of Luo et al performs better than the method of Wan et al in some reporting scenarios when considering relative error, it does not offer a considerable improvement in the remaining scenarios when the outcome distribution follows a mixture of normals.
Figure C.1 is the corresponding figure for the sensitivity analysis. When studies report the mean or median based on the skewness of the distribution, the relative error of all the approaches is similar to that in the primary analysis. In the scenarios where (i) studies randomly report the mean or median of the outcome and (ii) all studies report the mean, all methods perform well when the outcome is normally distributed, and the transformation-based methods outperform the median-based methods in the mixture case.
3.4.2 Variance
Table 3 gives the variance of the methods by the outcome distribution and the summary measures reported. We first consider the primary analysis. The QE method has the smallest variance amongst the median-based methods, nearly equal to the variance when using the true density (i.e., QE-BC). Moreover, the variance of the QE method is not highly affected by the summary measures reported. The transformation-based methods have variance comparable to the QE method in the investigated scenarios. Although the methods of Luo et al and Wan et al have similar variance in most scenarios, the method of Luo et al outperforms that of Wan et al in regards to variance in some reporting scenarios when the outcome distribution follows a mixture of normals.
In all scenarios considered in the sensitivity analysis, the QE method has variance similar to that of the transformation-based methods as well as the QE-BC method. The other median-based methods (i.e., MDM and LQMM) yielded higher values for the variance. Moreover, the variance of all approaches was not considerably affected by the summary measures reported in the sensitivity analysis.
3.4.3 Coverage
Figure 3 gives the coverage of the CIs of the methods by the outcome distribution and the summary measures reported in the primary analysis. Note that the coverage probability of the CI for the MDM method depends on the number of studies; it is approximately 97.85% for 10 studies. The median-based methods have nominal or near-nominal coverage in all scenarios. For the transformation-based methods, coverage is approximately 90%-92% in most scenarios where the outcome distribution was normal. However, the transformation-based methods have poor coverage when the outcome distribution is a mixture of normals. The method of Luo et al only offers a considerable improvement over that of Wan et al in certain reporting scenarios when the outcome distribution follows a mixture of normals.
Next, we consider the coverage of the methods in the sensitivity analysis, which is given in Figure C.2. In the scenario where studies report a mean or median based on the Shapiro-Wilk normality test, the coverage of all methods is similar to that in the primary analysis. When studies randomly report the mean or median, the transformation-based methods outperform the median-based methods in regards to coverage when the outcome is skewed. Specifically, when the outcome distribution is generated from the mixture of normals, the median-based methods have low coverage (i.e., less than 40%) but the transformation-based methods obtain coverage of approximately 88%. The same conclusions hold for the scenario where all studies report sample means.
3.4.4 Bias for Estimating Heterogeneity
Figure 4 illustrates the bias for estimating heterogeneity by the outcome distribution and the summary measures reported. Note that the MDM method is not included because it is not a random effects approach. The QEBC method performs better than the QE method for estimating heterogeneity when the outcome follows a mixture of normals in some scenarios, and both methods perform comparably otherwise. Similarly, for the transformation-based approaches, the method of Luo et al performs better than the method of Wan et al when the outcome follows a mixture of normals in those scenarios. When comparing the transformation-based methods to the median-based methods, the method of Luo et al performs either better than or comparably to the LQMM and QE methods for estimating heterogeneity in all scenarios in Figure 4.
Figure C.3 displays the results for the bias for estimating heterogeneity in the sensitivity analysis. When studies report a mix of means and medians, the Luo et al and LQMM methods are preferable for estimating heterogeneity when the outcome is normal, and the QE method performs best in the mixture case. When all studies report means, the method directly pooling the difference of means estimates heterogeneity better than the transformation-based methods, as expected.
4 Example
To illustrate these approaches, we reanalyzed the meta-analysis conducted as part of the dissertation of Sohn [25]. The goal of the meta-analysis is to evaluate the impact of novel diagnostic tests on reducing diagnostic delays for patients diagnosed with drug-susceptible or drug-resistant tuberculosis (TB). Diagnostic delay is defined in [25] as the length of time between an individual's first TB-specific visit to a health care provider and the time of availability of diagnostic test results for clinical decision-making. The primary studies compared the delays experienced by two groups of patients, each receiving a different type of diagnostic test. One group received the Xpert MTB/RIF test, a cartridge-based nucleic acid amplification test that can be performed at lower levels of the health system. Patients in the comparator group were diagnosed based on sputum smear microscopy, a century-old test widely used as the primary diagnostic test for TB.
The relevant data set was extracted from nine studies, all of which reported the median, first quartile, and third quartile of the diagnostic delay along with the sample sizes in the Xpert and smear groups. Since diagnostic delay was measured in days, patients who received same-day test results were recorded as having a diagnostic delay of 0 days. To avoid complications for the quantile estimation method when fitting parameters of distributions with strictly positive support, we added a value of 0.5 to all diagnostic delay summary data. Table 4 displays the relevant study-specific summary data.
In the original analysis, the author estimated the sample mean and standard deviation from the summary data using the methods recommended by Wan et al [26], calculated the difference of means and its variance for each study, and meta-analyzed the difference of means in a random effects analysis.
In our analysis, we first apply the transformation-based approaches to meta-analyze the data. We compare the methods recommended by Wan et al [26] with the method of Luo et al [19] for estimating the difference of means. We display the estimated study-specific differences of means and their 95% CIs in Table 5. The estimates and their 95% CIs obtained by the two methods are nearly identical.
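For the scenario where a study reports the median, first and third quartiles, and sample size, the two transformation-based estimators can be written in a few lines. The sketch below uses only the Python standard library; the function names are ours, and the formulas are the published estimators of Wan et al [26] and Luo et al [19] for this reporting scenario.

```python
from statistics import NormalDist

def wan_mean_sd(q1, med, q3, n):
    """Wan et al. estimators of the mean and SD from the median,
    first and third quartiles, and sample size."""
    mean = (q1 + med + q3) / 3.0
    # Expected width of the IQR (in SD units) for a normal sample of size n
    eta = 2.0 * NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / eta
    return mean, sd

def luo_mean(q1, med, q3, n):
    """Luo et al.'s optimized sample-mean estimator for the same scenario."""
    w = 0.7 + 0.39 / n  # weight on the mid-quartile range
    return w * (q1 + q3) / 2.0 + (1.0 - w) * med

def diff_of_means(group1, group2, estimator=luo_mean):
    """Estimated difference of means between two groups, each given as
    a (q1, median, q3, n) tuple of reported summary statistics."""
    return estimator(*group1) - estimator(*group2)
```

For symmetric summaries both estimators coincide with the median; they diverge as the reported quartiles become asymmetric around it.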
We next consider using the difference of medians as the effect measure. In this case, we apply the MDM, LQMM, and QE methods to pool the difference of medians, as described in Section 2.3. Table 5 displays the study-specific differences of medians along with their estimated 95% CIs using the QE method.
We display the pooled estimates and their 95% CIs for all considered methods in Table 6. We also report the I² statistic [14] and the p-value of the Q test of heterogeneity [6] for the methods that pool studies with the inverse variance method (i.e., the Wan et al [26], Luo et al [19], and QE methods). Moreover, we report the estimate of the between-study heterogeneity for the random effects methods (i.e., the Wan et al [26], Luo et al [19], LQMM, and QE methods).
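The quantities reported in Table 6 can be computed from the study-specific estimates and variances alone. The following sketch implements standard DerSimonian-Laird random effects pooling [9] together with Cochran's Q [6], the I² statistic [14], and the method-of-moments estimate of the between-study variance; it is a generic illustration under these standard formulas, not the exact code used for our analyses.

```python
import math

def dl_pool(y, v):
    """DerSimonian-Laird random effects pooling of study estimates y
    with within-study variances v. Returns the pooled estimate, its
    95% CI, Cochran's Q, I^2 (in %), and the tau^2 estimate."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q statistic and the I^2 heterogeneity index
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    k = len(y)
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    # Method-of-moments estimate of the between-study variance tau^2
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random effects weights, pooled estimate, and 95% CI
    wr = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(wr, y)) / sum(wr)
    se = math.sqrt(1.0 / sum(wr))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), q, i2, tau2
```

When tau^2 is estimated as zero, the random effects weights reduce to the inverse variance weights of a fixed effect analysis.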
The estimated pooled differences of means of the two transformation-based methods are very similar and are centered at 2.1 days. The median-based methods estimate a pooled difference of medians of approximately 1 day. Moreover, the inverse variance methods all estimate significant heterogeneity.
5 Discussion
We proposed several methods to meta-analyze two-group studies that report the median of the outcome along with the sample size and several measures of spread. The proposed methods directly pool the difference of medians, whereas the existing transformation-based methods [26, 15, 2, 18, 17, 19] are commonly used to pool estimates of the difference of means.
Based on the results of the simulation study, we make the following suggestions to data analysts meta-analyzing two-group studies that report the median of the outcome. If all or nearly all studies report the median of the outcome, the median-based methods are expected to perform better than or comparably to the transformation-based methods. Similarly, if all or nearly all studies report the mean, the transformation-based methods should be used. In the case that some studies report the mean and others report the median, several additional considerations apply. If the study-level outcome distribution is approximately normal, then all the methods considered in this paper are expected to perform well. If the study-level outcome distribution is highly skewed in some of the primary studies, then the transformation-based methods may be better suited. In practice, the skewness of the outcome distribution may be evaluated by Bowley's coefficient of skewness [16] (provided that the primary studies report the appropriate summary measures), or authors may make assumptions about skewness based on a priori domain knowledge of the outcome.
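Bowley's coefficient can be computed directly from the three quartiles that many primary studies already report. A minimal sketch (the function name is ours):

```python
def bowley_skewness(q1, med, q3):
    """Bowley's quartile coefficient of skewness, bounded in [-1, 1].
    Positive values indicate right skew, negative values left skew."""
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("quartiles are equal; coefficient is undefined")
    return (q3 + q1 - 2.0 * med) / iqr
```

A value near zero suggests approximate symmetry at the study level, in which case all the methods considered here are expected to perform well.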
Amongst the median-based methods, we recommend the quantile estimation method for most analyses. We found that this method performs better than the median of the difference of medians method and the linear quantile mixed model method in nearly all scenarios of the simulation study. Although we also implemented the Approximate Bayesian Computation method to improve density estimation by offering more well-established model selection and model averaging techniques (see Appendix A), we found that this approach performs nearly identically to the quantile estimation method in all scenarios of the simulation study. Therefore, in practice, we recommend the quantile estimation method over the Approximate Bayesian Computation method because of its simplicity and low computational cost. Moreover, we note that the error induced by density estimation does not considerably affect the performance of the quantile estimation method, as it performs similarly to the approach using the true underlying density. The quantile estimation method offers several other advantages because it uses the inverse variance method to pool studies. First, under the standard assumption that the variances are known, the corresponding estimator is a minimum-variance unbiased estimator [22]. Second, it allows data analysts to perform standard follow-up analyses (such as heterogeneity modelling and cumulative meta-analyses) as well as graphical assessments (such as forest plots and funnel plots).
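The inverse variance weights used by the quantile estimation method rest on the standard large-sample variance of a sample median, Var(m) ≈ 1/(4 n f(m)²). As an illustration of how the estimated density heights enter the study weights, assuming this asymptotic formula (the function name is ours, and the density heights would come from the fitted parametric densities):

```python
def median_diff_variance(f1_at_m1, n1, f2_at_m2, n2):
    """Asymptotic variance of a difference of sample medians.

    f1_at_m1 and f2_at_m2 are the (estimated) density heights at each
    group's median; n1 and n2 are the group sample sizes. The variance
    of each median is 1 / (4 n f(m)^2), and the two terms add because
    the groups are independent."""
    v1 = 1.0 / (4.0 * n1 * f1_at_m1 ** 2)
    v2 = 1.0 / (4.0 * n2 * f2_at_m2 ** 2)
    return v1 + v2
```

The reciprocal of this variance would then serve as the study's weight in an inverse variance analysis.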
McGrath et al [20], who studied the one-sample case of meta-analyzing medians, generated a skewed outcome in their simulations using a log-normal distribution. When generating the outcome in this manner, they found that meta-analytic methods using inverse variance weighting yielded highly biased estimates due to the correlation between the effect measures and their variances, a well-documented problem in the literature (e.g., see [24, 10]). Therefore, we chose to use a mixture of normal distributions to generate a skewed outcome in this work so that the effect measures are independent of their variances. When meta-analyzing data where this correlation is present, the median of the difference of medians and linear quantile mixed model methods may be the preferred median-based methods because they weight studies equally. In future work, we intend to study more thoroughly the effect of correlation between effect sizes and their variances on the proposed median-based methods.
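As an illustration of how a skewed outcome can be generated from a normal mixture, the sketch below draws from a two-component mixture; the particular component means, standard deviations, and mixing probability are placeholders, not the values used in our simulation study.

```python
import random

def sample_mixture(n, p=0.3, mu1=0.0, sd1=1.0, mu2=4.0, sd2=1.0, seed=None):
    """Draw n values from a two-component normal mixture.

    With probability p a value comes from N(mu2, sd2^2) and otherwise
    from N(mu1, sd1^2). Shifting one component produces a skewed
    marginal distribution while keeping all moments finite."""
    rng = random.Random(seed)
    return [rng.gauss(mu2, sd2) if rng.random() < p else rng.gauss(mu1, sd1)
            for _ in range(n)]
```

Fixing the mixture parameters across studies keeps the study-level effect measures free of the mean-variance coupling that a log-normal outcome induces.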
We also considered applying weighted versions of the median of the difference of medians and linear quantile mixed model methods, where studies are weighted proportionally to their sample size. As found in the one-sample case by McGrath et al [20], these weighted variations did not considerably improve the performance of the methods in the simulation study. Therefore, we excluded these approaches for parsimony.
Several authors have recently developed methods to estimate the sample mean and standard deviation from the median, sample size, and several measures of spread [26, 15, 2, 18, 17, 19]. We decided to use the methods recommended by Wan et al [26] and the subsequently optimized sample mean estimators of Luo et al [19] in our work because they have been shown to clearly outperform the methods of Hozo [15] and Bland [2]. No known comparisons of the method of Luo et al [19] with those of Kwon and Reis [18, 17] have been performed. When comparing the methods recommended by Wan et al [26] with those of Kwon and Reis [18, 17], the skewness of the outcome distribution and the summary measures reported have a stronger influence on the performance of the estimators than the choice of method. Since the methods recommended by Wan et al [26] are more widely used in practice than those of Kwon and Reis [18, 17], we opted for the methods recommended by Wan et al [26] in this work.
Lastly, we revisit the data set originally collected and analyzed by Sohn [25]. Since all the primary studies reported the median of the outcome, the results of the simulation study suggest that the median-based methods are most suitable for meta-analyzing the data. The preferred quantile estimation method estimates a pooled difference of medians of approximately 1 day [95% CI: 0.2, 1.9] between the Xpert and smear groups. This method also indicates that the primary studies included in the meta-analysis are highly heterogeneous, consistent with the conclusions obtained in the original analysis.
References
 [1] Mark A. Beaumont, Wenyang Zhang, and David J. Balding. Approximate Bayesian computation in population genetics. Genetics, 162(4):2025–2035, 2002.
 [2] Martin Bland. Estimating mean and standard deviation from the sample size, three quartiles, minimum, and maximum. International Journal of Statistics in Medical Research, 4(1):57–64, 2014.
 [3] M.V. Boldin, G.I. Simonova, and I.U.N. Tyurin. Sign-based Methods in Linear Statistical Models. Translations of Mathematical Monographs. American Mathematical Society, 1997.
 [4] Michael Borenstein, Larry V. Hedges, Julian P.T. Higgins, and Hannah R. Rothstein. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods, 1(2):97–111, 2010.
 [5] Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208, 1995.
 [6] William G. Cochran. The combination of estimates from different experiments. Biometrics, 10(1):101–129, 1954.
 [7] W.J. Conover. Practical Nonparametric Statistics. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, 1980.
 [8] Jonathan J Deeks, Douglas G Altman, and Michael J Bradburn. Statistical Methods for Examining Heterogeneity and Combining Results from Several Studies in Meta-Analysis, pages 285–312. BMJ Publishing Group, 2008.
 [9] Rebecca DerSimonian and Nan Laird. Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3):177–188, 1986.
 [10] John D. Emerson, David C. Hoaglin, and Frederick Mosteller. A modified random-effect procedure for combining risk difference in sets of 2×2 tables from clinical trials. Journal of the Italian Statistical Society, 2(3):269–290, Dec 1993.
 [11] Marco Geraci and Matteo Bottai. Linear quantile mixed models. Statistics and Computing, 24(3):461–479, 2014.
 [12] Magnus Andersson Hagiwara, Anders Bremer, Andreas Claesson, Christer Axelsson, Gabriella Norberg, and Johan Herlitz. The impact of direct admission to a catheterisation lab/CCU in patients with ST-elevation myocardial infarction on the delay to reperfusion and early risk of death: results of a systematic review including meta-analysis. Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, 22(1):67, Nov 2014.
 [13] Larry V Hedges and Jack L Vevea. Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4):486, 1998.
 [14] Julian Higgins and Simon G Thompson. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11):1539–1558, 2002.
 [15] Stela Pudar Hozo, Benjamin Djulbegovic, and Iztok Hozo. Estimating the mean and variance from the median, range, and the size of a sample. BMC Medical Research Methodology, 5(1):13, 2005.
 [16] John Francis Kenney. Mathematics of Statistics, Part 1. Van Nostrand, 3 edition, 1962.
 [17] D. Kwon and I. M. Reis. Approximate Bayesian computation (ABC) coupled with Bayesian model averaging method for estimating mean and standard deviation. arXiv e-prints, July 2016.
 [18] Deukwoo Kwon and Isildinha M Reis. Simulation-based estimation of mean and standard deviation for meta-analysis via approximate Bayesian computation (ABC). BMC Medical Research Methodology, 15(1):61, 2015.
 [19] Dehui Luo, Xiang Wan, Jiming Liu, and Tiejun Tong. Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range. Statistical Methods in Medical Research, page 0962280216669183, 2016.
 [20] S. McGrath, X. Zhao, Z. Zhen Qin, R. Steele, and A. Benedetti. One-sample aggregate data meta-analysis of medians. arXiv e-prints, September 2017.
 [21] Zhi Zhen Qin. Delays in diagnosis and treatment of pulmonary tuberculosis, and patient care-seeking pathways in China: a systematic review and meta-analysis. PhD thesis, McGill University, October 2016.
 [22] Andrew L Rukhin. Estimating heterogeneity variance in metaanalysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3):451–469, 2013.
 [23] Samuel Sanford Shapiro and Martin B Wilk. An analysis of variance test for normality (complete samples). Biometrika, 52(3/4):591–611, 1965.
 [24] Jonathan J Shuster. Empirical vs natural weighting in random effects meta-analysis. Statistics in Medicine, 29(12):1259–1265, May 2010.
 [25] Hojoon Sohn. Improving tuberculosis diagnosis in vulnerable populations: impact and cost-effectiveness of novel, rapid molecular assays. PhD thesis, McGill University, February 2016.
 [26] Xiang Wan, Wenqian Wang, Jiming Liu, and Tiejun Tong. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Medical Research Methodology, 14(1):135, 2014.
Appendix A Density Estimation via Approximate Bayesian Computation
In this section, we propose a Bayesian method for density estimation, which offers more well-established model selection and model averaging techniques. Using the same notation as in Section 2.3.3.1, we apply approximate Bayesian computation (ABC) to estimate the density at the median from the reported summary measures. In standard Bayesian analyses, the posterior distribution is expressed as

p(θ | x) ∝ p(x | θ) p(θ),

where p(x | θ) denotes the likelihood and p(θ) denotes the prior distribution. The main idea behind ABC is that a random sample drawn from a correctly specified likelihood should be close to the observed data. In ABC, one samples a parameter value θ* from the prior p(θ) and then draws a sample x* from the likelihood p(x | θ*). We consider θ* to be a draw from the approximate posterior distribution if the distance between the observed data and the generated pseudo-data x* is sufficiently small. Specifically, we implement the following algorithm, adapted from the methods of Kwon and Reis [18, 17].

We begin by specifying K candidate parametric families of distributions for the outcome variable. Let model j be parametrized by θ_j for j = 1, …, K. We select a parametric family of distributions from a multinomial distribution with probabilities p = (p_1, …, p_K), where initially p_j = 1/K for all j. Suppose model j is chosen. We sample θ_j from the specified prior distributions corresponding to model j. Then we simulate a data set x* from the likelihood and compute summary statistics in the same form as those of the original study. If the distance between the summary statistics of x* and the study-specific summary statistics is sufficiently small, we retain θ_j. After every 1,000 iterations, we update the vector p so that p_j is the percentage of accepted parameter values for model j. After a large number of repetitions, the distribution of the accepted θ_j approximates the posterior distribution under model j. We fit each model using the means of the respective posterior distributions. We consider applying ABC with (i) single distribution selection and (ii) Bayesian model averaging to estimate the density at the median. Following Kwon and Reis [17], we refer to these as the ABC methods with single distribution selection and with Bayesian model averaging, respectively.
For single distribution selection, we use Bayes' factor for model selection. That is, we select the distribution with the highest posterior model probability. Kwon and Reis [18] show that the posterior model probability of model j is well approximated by the percentage of accepted parameter values for that model, p_j. After selecting the parametric family of distributions with the highest posterior model probability, we estimate the density at the median using the density function and distribution median of the fitted distribution. For Bayesian model averaging, the estimate is instead a weighted average over all K models of the fitted probability density function evaluated at its distribution median, where the estimate under model j is weighted by the estimated posterior model probability p_j.
Consistent with the methods of Kwon and Reis [18, 17], the distance between the summary statistics of the simulated and observed data is measured in the L2 norm. Moreover, in standard practice, one often uses a small acceptance rate of sampled parameter values rather than a minimum distance threshold to select candidate parameter values [1]. Specifically, we fix the acceptance rate of sampled parameters at 0.1% and set the number of iterations to 20,000 for a given parametric family of distributions, as recommended in this context by Kwon and Reis [18, 17]. That is, the 20 sampled parameter values yielding summary statistics closest to the observed data are used to model the posterior distributions. As with the least squares method, we consider the following candidate distributions: the normal, log-normal, Weibull, and gamma distributions. Table A.1 displays the prior distributions for each reporting scenario. As with the box constraints used in the L-BFGS-B algorithm, these prior distributions were selected based on the ABC algorithms of Kwon and Reis [18, 17].
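The acceptance-rate variant of rejection ABC described above can be sketched generically as follows; `simulate` and `prior_draw` are user-supplied callables standing in for the study-specific likelihood and priors (their names are ours), and the default settings mirror the 20,000 iterations and 0.1% acceptance rate used here.

```python
import math
import random

def abc_reject(observed, simulate, prior_draw, n_iter=20000,
               accept_rate=0.001, seed=0):
    """Rejection ABC with a fixed acceptance rate.

    observed   : list of observed summary statistics.
    simulate   : callable (theta, rng) -> list of simulated summaries.
    prior_draw : callable (rng) -> a parameter value theta.
    Keeps the prior draws whose simulated summaries fall closest
    (L2 distance) to the observed summaries."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        theta = prior_draw(rng)
        sim = simulate(theta, rng)
        dist = math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, observed)))
        draws.append((dist, theta))
    draws.sort(key=lambda t: t[0])
    keep = max(1, int(n_iter * accept_rate))
    return [theta for _, theta in draws[:keep]]
```

As a toy check, recovering a location parameter from a single noisy summary concentrates the accepted draws around the true value.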
Because of the high computational cost of the ABC methods, we apply these methods to the first 500 (out of the 1,000) simulated data sets for each combination of data generation parameters. In Table A.2, we display the relative error of the pooled estimates, the variance of the pooled estimates, the coverage of the 95% CIs of the pooled estimates, and the bias of the heterogeneity estimates when the ABC methods were applied to the data generated in the primary analysis of the simulation study. In all scenarios, the ABC methods perform comparably to the QE method and its best-case counterpart (i.e., the QEBC method).
Table A.1: Prior distributions for the candidate likelihoods under each reporting scenario.

Scenario           Likelihood distribution
First scenario     Normal; Log-normal; Gamma; Weibull
Other scenarios    Normal; Log-normal; Gamma; Weibull
Scenario  Method                  | Relative Error (Normal / Mixture)          | Variance (N / M) | Coverage (N / M) | Bias for Estimating Heterogeneity (Normal / Mixture)
1         QE                      | 0.77 (11.80, 13.13) / 0.92 (7.72, 9.64)    | 0.32 / 0.29      | 0.94 / 0.96      | 0.35 (0.49, 0.18) / 0.49 (0.61, 0.35)
1         ABC (single selection)  | 1.14 (11.79, 12.42) / 0.55 (7.64, 9.90)    | 0.33 / 0.29      | 0.92 / 0.95      | 0.26 (0.45, 0.61) / 0.45 (0.58, 0.22)
1         ABC (model averaging)   | 1.09 (11.71, 12.34) / 0.58 (7.91, 9.92)    | 0.34 / 0.29      | 0.92 / 0.95      | 0.26 (0.45, 0.60) / 0.46 (0.58, 0.22)
1         QEBC                    | 0.90 (11.71, 11.80) / 1.47 (7.61, 10.68)   | 0.32 / 0.28      | 0.93 / 0.93      | 0.20 (0.44, 0.65) / 0.16 (0.50, 0.63)
2         QE                      | 1.85 (12.07, 12.05) / 0.44 (8.93, 9.26)    | 0.34 / 0.29      | 0.93 / 0.92      | 0.05 (0.41, 0.90) / 0.17 (0.48, 0.58)
2         ABC (single selection)  | 1.54 (12.30, 11.09) / 0.27 (9.08, 9.45)    | 0.34 / 0.29      | 0.93 / 0.92      | 0.11 (0.41, 0.84) / 0.24 (0.50, 0.53)
2         ABC (model averaging)   | 1.62 (12.08, 10.99) / 0.24 (9.10, 9.42)    | 0.34 / 0.29      | 0.93 / 0.92      | 0.10 (0.41, 0.85) / 0.24 (0.50, 0.56)
2         QEBC                    | 0.90 (11.71, 11.80) / 1.47 (7.61, 10.68)   | 0.32 / 0.28      | 0.93 / 0.93      | 0.20 (0.44, 0.65) / 0.16 (0.50, 0.63)
3         QE                      | 0.84 (11.51, 12.77) / 0.78 (7.78, 9.80)    | 0.32 / 0.29      | 0.94 / 0.96      | 0.34 (0.49, 0.27) / 0.48 (0.61, 0.34)
3         ABC (single selection)  | 1.26 (11.77, 12.35) / 0.20 (7.84, 9.97)    | 0.34 / 0.29      | 0.92 / 0.95      | 0.18 (0.42, 0.76) / 0.42 (0.56, 0.02)
3         ABC (model averaging)   | 1.36 (11.72, 12.30) / 0.15 (7.61, 9.93)    | 0.34 / 0.29      | 0.92 / 0.95      | 0.18 (0.41, 0.78) / 0.42 (0.56, 0.01)
3         QEBC                    | 0.90 (11.71, 11.80) / 1.47 (7.61, 10.68)   | 0.32 / 0.28      | 0.93 / 0.93      | 0.20 (0.44, 0.65) / 0.16 (0.50, 0.63)
Appendix B Modifications to the Proposed Methods
In the following subsections, we discuss several modifications to the proposed median-based methods and evaluate the performance of the methods under these modifications.
B.1 Median of the Difference of Medians
In the main paper, we invert the sign test to construct a CI around the pooled estimate of the MDM method. When the number of studies is sufficiently large, one can instead use the normal approximation to the binomial distribution to construct an approximate 95% CI around the pooled estimate. Specifically, let n denote the number of studies and z denote the 0.975 quantile of the standard normal distribution. The 1/2 − z/(2√n) and 1/2 + z/(2√n) quantiles of the study-specific differences of medians are the lower and upper limits, respectively, of the approximate 95% CI of the MDM method [7]. We denote the MDM method with the normal approximation as MDMN. We applied the MDMN method to the data generated in our primary simulation study, and we compare its coverage and CI lengths to those of the MDM method. Theoretically, the coverage probability of the MDM method is approximately 97.85% for 10 studies and 95.72% for 30 studies. In Table B.1, we present the coverage and mean length of the CIs of these methods. In all scenarios considered, the MDMN method had nearly equal coverage and smaller mean CI length compared to the MDM method.
B.2 Modifications for Density Estimation Methods
In some contexts, authors may assume that the underlying outcome distributions of the two groups in a primary study differ only by a location shift. In this case, one can gain greater precision in estimating the variance of the difference of medians by averaging the density estimates over the two groups. This method is described in the following paragraph.
Using the notation of Section 2.3.3, we assume that the outcome distributions of the two groups differ only by a location shift, so that the density height at the median is the same in both groups. Under the location shift assumption, a new estimate of the variance of the difference of medians is given by

Var(m̂_1 − m̂_2) ≈ (1 / (4 f̄²)) (1/n_1 + 1/n_2),   (2)

where f̄ denotes the density estimate at the median averaged over the two groups. However, we found in our simulations that using (2) did not considerably improve the accuracy of the pooled estimates of the density estimation methods (data not shown).
Table B.1: Coverage and mean length of the CIs of the MDM and MDMN methods (Normal and Mixture outcome distributions).

N. Studies  Method  Coverage (Normal / Mixture)  Mean CI Length (Normal / Mixture)
10          MDM     0.95 / 0.94                  3.37 / 3.24
10          MDMN    0.95 / 0.94                  2.82 / 2.66
30          MDM     0.95 / 0.96                  1.62 / 1.51
30          MDMN    0.95 / 0.95                  1.58 / 1.47
Appendix C Sensitivity Analysis: A Mix of Means and Medians
Appendix D Fixed Effect Methods
In this section, we apply the inverse variance methods in fixed effect analyses. This includes the transformation-based methods as well as the QE and QEBC methods. Table D.1 gives the relative error, variance, and coverage of the 95% CIs of the inverse variance methods with fixed effect pooling. The results are similar to those of the random effects analyses, and the same overall conclusions and recommendations hold in the fixed effect case.
Scenario  Method     | Relative Error (Normal / Mixture)            | Variance (N / M) | Coverage (N / M)
1         Wan et al  | 2.28 (13.02, 9.76) / 32.22 (38.37, 26.49)    | 0.32 / 0.29      | 0.82 / 0.04
1         Luo et al  | 0.18 (10.37, 12.53) / 15.40 (21.59, 8.55)    | 0.29 / 0.32      | 0.81 / 0.40
1         QE         | 0.13 (10.52, 12.62) / 0.83 (8.01, 9.34)      | 0.33 / 0.29      | 0.91 / 0.95
1         QEBC       | 0.10 (10.33, 11.80) / 1.05 (7.96, 9.84)      | 0.31 / 0.28      | 0.90 / 0.89
2         Wan et al  | 0.15 (9.84, 10.30) / 21.73 (27.52, 16.67)    | 0.26 / 0.27      | 0.83 / 0.10
2         Luo et al  | 0.16 (9.96, 10.49) / 21.16 (27.00, 15.95)    | 0.26 / 0.27      | 0.84 / 0.11
2         QE         | 0.14 (10.02, 11.22) / 0.18 (9.16, 8.83)      | 0.31 / 0.30      | 0.88 / 0.88
2         QEBC       | 0.10 (10.33, 11.80) / 1.05 (7.96, 9.84)      | 0.31 / 0.28      | 0.90 / 0.89
3         Wan et al  | 0.02 (10.81, 11.57) / 11.55 (4.81, 18.67)    | 0.29 / 0.41      | 0.80 / 0.51
3         Luo et al  | 0.17 (9.69, 10.30) / 11.88 (17.59, 6.65)     | 0.24 / 0.27      | 0.85 / 0.49
3         QE         | 0.08 (10.47, 12.60) / 0.72 (8.13, 9.24)      | 0.32 / 0.29      | 0.91 / 0.95
3         QEBC       | 0.10 (10.33, 11.80) / 1.05 (7.96, 9.84)      | 0.31 / 0.28      | 0.90 / 0.89