Improving the Accuracy of Confidence Intervals and Regions in Multivariate Random-effects Meta-analysis

06/20/2019 ∙ by Tsubasa Ito, et al. ∙ 0

Multivariate random-effects meta-analyses have been widely applied in evidence synthesis for various types of medical studies. However, standard inference methods usually underestimate statistical errors and possibly provide highly overconfident results under realistic situations since they ignore the variability in the estimation of variance parameters. In this article, we propose new improved inference methods without any repeated calculations such as Bootstrap or Monte Carlo methods. We employ distributional properties of ordinary least squares and residuals under random-effects models, and provide relatively simple and closed form expressions of confidence intervals and regions whose coverages are more accurate than conventional methods such as restricted maximum likelihood methods. As specific applications, we consider two multivariate meta-analysis: bivariate meta-analysis of diagnostic test accuracy and multiple treatment comparisons via network meta-analysis. We also illustrate the practical effectiveness of these methods via real data applications and simulation studies.



There are no comments yet.


This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In evidence-based medicine, meta-analysis is recognized as an essential tool for quantitatively summarizing multiple studies and producing integrated evidence. In general, the treatment effects from different sources for studies are heterogeneous due to various factors, which should be adequately addressed to avoid underestimation of statistical errors and misleading conclusions (Higgins and Green, 2011). To this end, random-effects models are widely adopted in most medical meta-analyses. The applications cover various types of systematic reviews, for example, conventional univariate meta-analysis (DerSimonian and Laird, 1986; Whitehead and Whitehead, 1991), bivariate meta-analysis of diagnostic test accuracy (Reitsma et al., 2005), network meta-analysis for comparing the effectiveness of multiple treatments (Salanti, 2012), and individual participant meta-analysis (Riley et al., 2010).

In random-effects meta-analyses, standard inference methods depend on large sample approximations for the number of studies synthesized, e.g., the DerSimonian-Laird methods (DerSimonian and Laird, 1986) and its extension (Chen et al., 2012; Jackson et al., 2010; Jackson et al., 2018) and restricted maximum likelihood (REML) estimation (Jackson et al., 2011)

, but the numbers of trials are often moderate or small. In this situation, validities of the inference methods can be violated, which would lead over-confidence results, that is, coverage probabilities of the confidence regions or intervals cannot retain their nominal confidence levels and also the type-I error probabilities of the corresponding tests can be inflated. Such problem with random-effects models was well recognized in the context of both univariate and multivariate meta-analysis, even when the models are completely specified

(Veroniki et al., 2019). Recently, several refined methods have been proposed to overcome this issue (e.g. Jackson and Riley, 2014; Hartung and Knapp, 2001; Noma et al., 2018; Noma et al., 2019; Sugasawa and Noma, 2019), but most existing methods are computationally intensive based on Monte Carlo or Bootstrap methods or lack of theoretical justification. On the other hand, Noma (2011) proposed Bartlett-type corrections and provided closed form expressions of refined confidence intervals whose coverage is second order accurate. However, this method is only applicable to the univariate meta-analysis and the derivation of such confidence intervals or regions under more complicated multivariate random-effects models would be quite difficult. Although Zucker et al. (2000)

proposed an improved likelihood test in linear mixed models based on asymptotic expansions of the (restricted) maximum likelihood estimators, the expression of the test statistics includes tedious algebraic expressions, which is not useful in practice.

In this paper, we consider a new improved inference method in general multivariate meta-analysis without using repeated calculations such as Monte Carlo and Bootstrap methods, so that the method has low computational complexity. Moreover, we pursue two additional properties, that is, the method has theoretical justification and has relatively simple expressions for confidence intervals or regions. To this end, we employ the distributional properties between the ordinary least squares estimator and residuals, and define a class of estimators of variance parameters in random-effects models. Then, we find a relatively simple formula for asymptotic approximation of the coverage probability of the crude Wald-type confidence intervals and regions, and construct a second order accurate confidence intervals and regions. For specific applications, we provide refined confidence regions in bivariate meta-analysis for diagnostic test accuracy and refined confidence intervals in network meta-analysis for multiple treatment comparison. In both cases, we carry out simulation studies to evaluate the finite sample performance of the proposed methods compared with the existing standard methods, and demonstrate the practical usefulness using datasets.

This paper is set out as follows. In Section 2, we describe the proposed confidence intervals and regions under general multivariate random-effects models. In Sections 3 and 4, the general results are applied to bivariate meta-analysis and network meta-analysis, in which simulation studies and applications to real detests are given. We conclude with a short discussion in Section 5.

2 Improved Inference in Random-effects Meta-analysis

2.1 Multivariate Random-effects models

We first consider the general random-effects model for multivariate meta-analysis. Let denote an estimator of the th outcome measure in the th study (;

). Typically, mean difference, risk difference, odds ratios and hazard ratios are used for effect measures, and the ratio measures are usually log-transformed to allow normal approximations. Let

be the

-dimensional vector of true overall average effects, and

be the corresponding random effects. In practical situation, each study often reports only a subset of outcomes, that is, a subset of is observed. Let be the number of observed outcomes and be the -dimensional vector of observed outcomes. Further, we define two design matrices and showing which outcomes are observed in the th study. The details will be discussed in specific applications given in Sections 3 and 4.

Here we consider the following multivariate random-effects model:


where , with known , and is a vector of unknown parameters in . Throughout the paper, we assume that the elements of ’s are uniformly bounded. The model (1) can be expressed as the following matrix form , where , , and and are defined in the same way.

Under the formulation, one is interested in inference on with a -dimensional vector or with matrix . For instance, when , , which is the average effect in the first element of . Since is a scalar and is a vector, confidence intervals and regions, respectively, are typically used for statistical inference on these parameters. We consider Wald-Type confidence intervals and regions with higher order accuracy.

The variance parameter in is a nuisance parameter, so that the variability of the estimation should be adequately taken into account for valid inference on and . Here we restrict the class for estimators of that satisfies the following three conditions:

  • is an even function of and translation invariant, that is, , and for any .

  • is -consistent and is second-order unbiased, namely and .

  • is a function of with .

The first condition (C1) is typically satisfied by typical estimators including (restricted) maximum likelihood estimator and moment-based estimators. The

-consistency in (C2) is also a standard condition, but second order unbiasedness of is not always satisfied. For example, the maximum likelihood (ML) estimator does not hold the property. The condition (C3) requires that the estimator should be function of residuals based on ordinary least squares estimator of , which is a key assumption of the proposed confidence intervals and regions in this paper. The condition enables to get a relatively simple form of corrected confidence intervals and regions. Note that the typical estimators (e.g. REML) does not satisfy the condition (C3). The detailed estimator which satisfies all the conditions are discussed later, which leads to a new estimator of . For given , can be estimated by the generalized lead squares estimator . We consider inference on or based on the estimator and with satisfying conditions (C1)(C3).

2.2 Improved confidence intervals and regions

We first consider confidence intervals of . Let be the variance of , that is, , noting that . Then, the Wald-type confidence interval of is given by for some , which we call naive confidence interval. For example, if with being the upper

-quantile of

, the coverage probability of the confidence intervals is approximately , so that the confidence interval is asymptotically valid. However, the coverage error is as demonstrated in the proof of Theorem 1, which cannot be negligible when is small or moderate as common in practical meta-analysis. To overcome the difficulty, we modify the confidence intervals by adequately calibrating . Such calibration can be done based on the asymptotic expansion of the coverage probability of the naive confidence interval, which usually has a tedious algebraic expression. On the other hand, it is shown that and are mutually independent under (C3) and the asymptotic bias of is negligible under condition (C2), both of which make the asymptotic formula for the coverage probability considerably simple. Based on the asymptotic expansion, we can define the corrected confidence interval


where and , noting that . The coverage probability of the corrected confided interval is correct up to the second order as shown in the following Theorem, where the proof is given in the Supplementary Material.

Theorem 1.

Under the conditions (C1)(C3), .

For using the corrected confided interval, we need to derive the analytical expression of , which can be calculated based on the (asymptotic) expression of up to . The detailed derivation is demonstrated in network meta-analysis in Section 4.

We next consider confidence regions of . The naive confidence region of can be defined as , and , the upper -quantile of distribution, leads to the approximate confidence region with the nominal level . However, it has considerable coverage error of as demonstrated in the proof of Theorem 2, we consider corrected coverage regions. To this end, we define the following three quantities:


where . Note that and are . Then, the proposed corrected confidence region is given by


where is the dimension of and

The coverage accuracy of the corrected confidence region is given in the following Theorem.

Theorem 2.

Under the conditions (C1)(C3), .

For using the confidence region (4), we need to find the approximation formula for and , which can be done in specific settings. We demonstrate the proposed confidence region in bivariate meta-analysis in Section 3, and provide detailed approximated expressions of the three quantities.

3 Bivariate meta-analysis for diagnostic test accuracy

3.1 Model settings

There has been increasing interest in systematic reviews and meta-analyses of data from diagnostic accuracy studies. For this purpose, a bivariate random-effect model (Reitsma et al., 2005; Harbord et al., 2007) is widely used. Following Reitsma et al. (2005), we define and

as the logit-transformed true sensitivity and specificity, respectively, in the

th study. Let and be the observed logit-transformed sensitivity and specificity, and and

are associated standard errors. The bivariate model assumes that


follow bivariate normal distributions:


where is a vector of the average logit-transformed sensitivity and specificity, and . The model (5) is a special case with in (1). Here is unstructured, so that it allows correlation between and . The unknown parameters are and .

3.2 Confidence region

For summarizing the results of the meta-analysis, we consider CRs of since sensitivity and specificity might be highly correlated. Reitsma et al. (2005) suggested the joint CR for as the interior points of the ellipse defined as


where for is the generalized least squares estimator of , is the variance-covariance matrix of , is the restricted maximum likelihood estimator and is the upper point of the distribution with degrees of freedom. The joint CR (6) is approximately valid, that is, the coverage error converges to as the number of studies goes to infinity. However, when is not sufficiently large, the coverage error is not negligible, and the region (6) would under-cover the true .

We consider the following moment-based estimator of :

for . Let be the non-negative definite bias corrected moment estimator, that is, with , which satisfies all the conditions (C1)(C3). Based on the simple form of , we can derive the approximation of , and in (3), that is, we can obtain which satisfy , where


for . The detailed derivation is given in the Supplementary Material. Using , the corrected confidence region is obtained from (4) with and .

3.3 Simulation study

We assessed the finite sample performance of the proposed confidence region (4) together with the approximate confidence region (6) by Reitsma et al. (2005). We set and . We used the between study variances of , and , and the between study correlations of and . Following, Jackson and Riley (2014), for each simulation, two sets of

within-study variances were simulated from a scaled chi-squared distribution with 1 degree of freedom, multiplied by 0.25, and truncated to lie within the interval

. We changed the number of studies over 8,12 and 16, and set the nominal level to

. In the 1000 simulations, we evaluated empirical coverage probabilities for 95% confidence regions of the true parameters. For simplicity, we evaluated coverage rates assessing rejection rates of the test of null hypothesis for the true parameters. Since areas of the corrected confidence region is approximately

times larger than those of naive ones, we also computed median values of .

The results of the simulations are shown in Table 1. The simulated coverage probabilities of the standard method, ACR, are seriously smaller than the nominal level (), especially in the case with the small number of studies (). Such undesirable results would come from the crude approximation in (6). On the other hand, the simulated coverage probabilities of the proposed CCR are around the nominal level in all the scenarios. Under high correlation (), the proposed CCR tends to be conservative, but it provides reasonable credible regions judging from the reported values of . Also it is reasonable to observe that decreases as the number of samples increases.

8 74.8 97.2 1.23 74.2 98.8 1.39 75.1 99.5 1.98
0.5 12 77.1 96.8 0.65 76.9 96.2 0.76 76.3 97.9 1.53
16 80.4 95.4 0.46 80.0 96.0 0.55 77.6 98.1 1.29
8 85.8 97.1 0.81 85.3 97.3 0.90 82.3 99.3 1.62
0.75 12 88.5 96.1 0.49 85.7 94.9 0.53 82.8 98.8 1.20
16 87.3 94.7 0.35 88.0 95.4 0.38 85.4 98.3 0.80
8 89.0 95.8 0.67 87.0 96.2 0.74 85.2 98.3 1.34
1 12 91.1 95.7 0.42 88.4 94.6 0.45 88.4 98.3 0.81
16 91.2 95.3 0.31 89.5 94.7 0.33 90.4 98.0 0.58
Table 1: Simulated coverage probabilities (%) the proposed corrected confidence region (CCR), and naive confidence region (NCR) with confidence level.

3.4 Example: screening test accuracy for alcohol problems

Here we provide a re-analysis of the dataset given in Kriston et al. (2008), including studies regarding a short screening test for alcohol problems. Following Reitsma et al. (2005), we used logit-transformed values of sensitivity and specificity, denoted by and , respectively, and associated standard errors and . For the bivariate summary data, we fitted the bivariate models (5), and computed CRs of based on NCR (6) given in Reitsma et al. (2005) and the proposed CCR. Following Reitsma et al. (2005), the obtained two CRs of were transformed to the scale , where and are the sensitivity and false positive rate, respectively. The obtained two CRs are presented in Figure 1 with a plot of the observed data, summary points , and the summary receiver operating curve. The approximate CR is smaller than the proposed CR, which may indicate that the approximation method underestimates the variability of estimating nuisance variance parameters.

Figure 1: The approximate and corrected CRs and summary receiver operating characteristics (SROC) curve.

4 Network meta-analysis

4.1 Model settings

Suppose there are treatments in contract to a reference treatment, and let be an estimator of relative treatment effect for the th treatment in the th study. In network meta-analysis, each study contains only treatments ( usually ranges from 2 to 5); thereby, several components in cannot be obtained. When the corresponding treatments are not involved in the th study, the corresponding components in and the within-study variance-covariance matrix are shrunk to the sub-vector and sub-matrix, respectively. Moreover, when the references treatment is not involved in the th study, we can adopt the data argumentation approach of White et al. (2012), in which a quasi-small data set is added to the reference arm, e.g. 0.001 events for 0.01 patients. To clarify the setting in which and are shrunk to the sub-vector and sub-matrix, respectively, we introduce an index , , representing the treatment estimates that are available in the th study, and define the -dimensional vector of ’s, excluding the th element that is equal to . Moreover, we define , and and are the shrunken -dimensional vector and matrix of and , respectively. Then the model can be expressed as the multivariate random effects model given in (1). This model is known as the contrast-based model (Salanti et al., 2008; Dias and Ades, 2016), which is commonly used in practice. Regarding the structure of between study variance , since there are rarely enough studies to identify the unstructured model of , the compound symmetry structure is used in most cases (White, 2015), where is a matrix with all diagonal elements equal to 1 and all off-diagonal elements equal to .

4.2 Confidence interval

Since and with . Then the crude moment estimator of is given by , where , and with and . When th study contains the quasi-small data, diagonal elements of have very large values, so that the corresponding diagonal element of is very likely to be negative and it would produce a negative value of . To avoid this problem, we introduce for matrix with being ’s diagonal element. Then, we use the following estimator:


Note that is a function of , so that satisfies the condition (C3). On the other hand, the second-order bias of is given by

where . Then the bias corrected estimator of is given by , which satisfies all the conditions (C1)(C3). Using the expression of , we can derive the following approximation of which satisfies :


The details of the derivation is given in the Supplementary Material. Using the expression (9), we can employ the corrected confidence interval of developed in Section 2.2.

4.3 Simulation study

We investigate the performance of the proposed Monte Carlo (MC) method under practical network meta-analysis scenarios. We compared the coverage probabilities of the MC method with those of widely used standard methods: the Wald-type confidence intervals based on REML estimates, the LR-based confidence interval. Throughout the experiments, we set the nominal level to . Following Noma et al. (2018), we considered a quadrangular network comparing treatments (A, B, C, and D, regarding A as a reference). The numbers of trials were set to and and the detailed designs of trials are presented in Table 2. To approximate practical situations of medical meta-analyses, we mimicked the simulation settings considered by Sidik and Jonkman (2007). We first generated binomial data from , where , and corresponds to the treatments A, B, C, and D, respectively. The response rate of treatment ,

, was generated from a continuous uniform distribution on

and we set for and 3, which means that is odds ratio (ORs) to the reference treatment A, i.e. . Also, the OR parameters were generated from a multivariate normal distribution , where is a vector of the true average treatment effects set to . The sample sizes were set to equal one another, for any and were drawn from a discrete uniform distribution on 20 and 200. From the generated binomial data ’s, we calculated trial-specific summary statistics and in the standard manner (Higgins and Green, 2011). In the 5000 simulations, we evaluated empirical coverage probabilities and average lengths of 95% confidence intervals of the true parameters.

The results of the simulations are shown in Table 3. In general, the coverage probabilities of the REML confidence intervals are sightly better than the LR confidence intervals. However, they showed under-coverage properties under moderate number of studies () and large heterogeneity (). On the other hand, the coverage probabilities of the proposed MC method were generally around the nominal level () in most cases. Under the small number of studies and large heterogeneity (), the coverage rates were relatively low, but even under these scenarios, they performed better than the ML and REML methods.

A vs. B 1 2 2
A vs. C 3 4 6
A vs. D 1 2 3
B vs. C 1
B vs. D 1 1
C vs. D 1 1 1
A vs. C vs. D 1 1 1
B vs. C vs. D 1 1 1
Table 2: Numbers of trials for each study design included in the simulation studies in Section 4.3.
Coverage Probability Average Length
0.3 92.3 90.8 97.8 1.22 1.14 1.79
0.3 92.0 90.3 97.7 0.76 0.71 1.13
0.3 92.5 90.7 97.8 0.92 0.86 1.37
0.4 90.9 88.8 96.4 1.41 1.30 1.88
8 0.4 90.9 89.2 96.7 0.88 0.82 1.19
0.4 90.8 88.9 96.8 1.07 0.99 1.43
0.6 90.8 88.3 93.6 1.85 1.69 2.09
0.6 90.8 88.5 93.7 1.17 1.07 1.33
0.6 90.4 88.1 93.7 1.41 1.29 1.60
0.3 92.3 91.2 98.0 0.89 0.85 1.28
0.3 92.5 91.5 98.2 0.67 0.64 0.97
0.3 92.6 91.3 98.1 0.76 0.72 1.08
0.4 92.0 90.7 97.1 1.05 0.99 1.36
12 0.4 90.7 89.5 96.3 0.79 0.75 1.02
0.4 92.6 91.2 96.8 0.88 0.84 1.15
0.6 92.3 90.9 94.0 1.39 1.31 1.51
0.6 92.4 91.1 94.1 1.04 0.99 1.14
0.6 92.9 91.3 94.2 1.17 1.11 1.28
0.3 93.0 92.2 97.9 0.81 0.78 1.14
0.3 92.5 91.6 98.1 0.57 0.55 0.80
0.3 92.9 92.3 98.3 0.68 0.65 0.96
0.4 92.2 90.8 96.8 0.95 0.91 1.20
16 0.4 92.6 91.9 97.0 0.67 0.64 0.84
0.4 92.9 91.8 97.1 0.80 0.77 1.01
0.6 93.0 91.9 94.5 1.25 1.21 1.34
0.6 93.0 91.9 94.3 0.88 0.85 0.95
0.6 92.5 91.7 94.4 1.05 1.01 1.13
Table 3: Simulated coverage probabilities (%) and average length of confidence intervals from the proposed CCI, REML and LR methods.

4.4 Example: Smoking cessation data

We considered network meta-analysis based on 24 studies that compared 4 smoking cessation counseling programs (no contact, self-help, individual counseling, and group counseling), which was used in Lu and Ades (2006) and Noma et al. (2018). The outcome was successful smoking cessation at 6 to 12 months, and the comparative efficacy was assessed using odds ratios for the response rates based on each treatment arm. We applied the standard methods using ML and REML as well as the proposed methods with multivariate random-effects model. The reference treatment was set to be “no contact.” In Table 4, we present confidence intervals based on the proposed method (denoted by CCI) as well as the REML method and the likelihood ratio (LR) method. It is observed that the proposed methods produce uniformly wider confidence intervals than the other methods, which shows that the proposed CCI method would adequately quantify the statistical errors as confirmed in the simulation study.

no contact v.s. REML LR CCI
self-help (-0.383, 0.951) (-0.365, 0.939) (-0.556, 1.21)
individual counseling (0.288, 1.09) (0.297, 1.09) (0.217, 1.27)
group counseling (-0.174, 1.35) (-0.157, 1.34) (-0.297, 1.65)
Table 4: confidence intervals based on the proposed method (denoted by CCI) and REML and likelihood ratio (LR) methods in the application to network-meta analysis of smoking cessation data.

5 Discussion

In this paper, we presented new inference methods for multivariate meta-analysis without using repeated calculations such as Monte Carlo or Bootstrap methods. The proposed confidence intervals and regions have relatively simple form and they are shown to have second order accurate coverage probabilities while the standard inference methods (e.g. REML and LR) have significant coverage errors. In simulation studies, we demonstrated that possible undercoverage properties of the standard methods under the small number of studies to be synthesized while the proposed method provides reasonable coverage properties.

Although we provided a general expression of the refined confidence intervals and regions in section 2, the expressions depend on the specific forms of estimators of variance parameters satisfying three conditions given in section 2. In Sections 3 and 4, we employed simple bias corrected estimators of variance parameters, but there might be another useful estimator satisfying the three conditions. However, the detailed investigation would extend the scope of this paper and we left it for a future study.

A possible limitation of the proposed method might be that the coverage accuracy still depends on the number of studies to be synthesized. On the other hand, inference methods that does not rely on large sample approximation have been recently proposed (e.g. Noma et al., 2019; Sugasawa and Noma, 2019), which are computationally intensive, so they would not be necessarily practical. Then, the proposed method would be regarded as a reasonable compromise between methods with exact empirical coverage and computational efficiency.


This research was supported by Japan Society of Promotion of Science KAKENHI (grant number: 18K12757).


  • Chen et al. (2012) Chen, H., A. K. Manning, and J. Dupuis (2012). A method of moments estimator for random effect multivariate meta-analysis. Biometrics 68, 1278–1284.
  • DerSimonian and Laird (1986) DerSimonian, R. and N. M. Laird (1986). Meta-analysis in clinical trials. Controlled Clinical Trials 7, 177–188.
  • Dias and Ades (2016) Dias, S. and A. E. Ades (2016). Absolute or relative effects? arm-based synthesis of trial data. Research Synthesis Methods 7, 23–28.
  • Harbord et al. (2007) Harbord, R. M., J. J. Deeks, H. Egger, P. Whiting, and J. A. C. Sterne (2007). A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 8, 239–251.
  • Hartung and Knapp (2001) Hartung, J. and G. Knapp (2001). A refined method for the meta-analysis of controlled clinical trials with binary outcome. Statistics in Medicine 20, 3875–3889.
  • Higgins and Green (2011) Higgins, J. P. T. and S. Green (2011). Cochrane Handbook for Systematic Reviews of Interventions, Version 5.1.0. The Cochrane Collaboration.
  • Jackson et al. (2018) Jackson, D., S. Bujkiewicz, M. Law, R. D. Riley, and I. R. White (2018). A matrix-based method of moments for fitting multivariate network meta-analysis models with multiple outcomes and random inconsistency effects. Biometrics 74, 548–556.
  • Jackson et al. (2011) Jackson, D., R. Riley, and I. R. White (2011). Multivariate meta-analysis: potential and promise. Statistics in Medicine 30, 2481–2498.
  • Jackson and Riley (2014) Jackson, D. and R. D. Riley (2014). A refined method for multivariate meta-analysis and meta-regression. Statistics in Medicine 33, 541–554.
  • Jackson et al. (2010) Jackson, D., I. R. White, and S. G. Thompson (2010). xtending dersimonian and laird’s methodology to perform multivariate random effects meta-analyses. Statistics in Medicine 29, 1282–1297.
  • Kriston et al. (2008) Kriston, L., L. Höelzel, A. Weiser, M. Berner, and M. Haerter (2008). Meta-analysis: Are 3 questions enough to detect unhealthy alcohol use? Annals of Internal Medicine 149, 879–888.
  • Lu and Ades (2006) Lu, G. and A. E. Ades (2006). Assessing evidence consistency in mixed treatment comparisons. Journal of the American Statistical Association 101, 447–459.
  • Noma (2011) Noma, H. (2011). Confidence intervals for a random-effects meta-analysis based on bartlett-type corrections. Statistics in Medicine 30, 3304–3312.
  • Noma et al. (2019) Noma, H., K. Nagashima, and T. Furukawa (2019). Permutation inference methods for multivariate meta-analysis. arXiv:1809.03935.
  • Noma et al. (2018) Noma, H., K. Nagashima, K. Maruo, M. Gosho, and T. A. Furukawa (2018). Bartlett-type corrections and bootstrap adjustments of likelihood-based inference methods for network meta-analysis. Statistics in Medicine 37, 1178–1190.
  • Reitsma et al. (2005) Reitsma, J. B., A. S. Glas, A. W. S. Rutjes, R. J. P. M. Scholten, P. M. Bossuyt, and A. H. Zwinderman (2005). Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology 58, 982–990.
  • Riley et al. (2010) Riley, R. D., P. C. Lambert, and G. Abo-Zaid (2010). Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ 340, c221.
  • Salanti (2012) Salanti, G. (2012). Indirect and mixed-treatment comparison, network, or multiple- treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Research Synthesis Methods 3, 80–97.
  • Salanti et al. (2008) Salanti, G., J. P. Higgins, A. Ades, and J. P. Ioannidis (2008). Evaluation of networks of randomized trials. Statistical Methods in Medical Research 17, 279–301.
  • Sidik and Jonkman (2007) Sidik, K. and J. N. Jonkman (2007). A comparison of heterogeneity variance estimators in combining results of studies. Statistics in Medicine 26, 1964–1981.
  • Sugasawa and Noma (2019) Sugasawa, S. and H. Noma (2019). A unified method for improved inference in random effect meta-analysis. Biostatistics to appear.
  • Veroniki et al. (2019) Veroniki, A. A., D. Jackson, R. Bender, O. Kuss, D. Langan, J. P. T. Higgins, G. Knapp, and G. Salanti (2019). Methods to calculate uncertainty in the estimated overall effect size from a random-effects meta-analysis. Research Synthesis Methods 10, 23–43.
  • White (2015) White, I. R. (2015). Network meta-analysis. The State Journal 15, 951–985.
  • White et al. (2012) White, I. R., J. K. Barrett, D. Jackson, and J. P. Higgins (2012). Consistency and inconsistency in network meta-analysis: model estimation using multivariate meta-regression. Research Synthesis Methods 3, 111–125.
  • Whitehead and Whitehead (1991) Whitehead, A. and J. Whitehead (1991). A general parametric approach to the meta-analysis of randomized clinical trials. Statistics in Medicine 10, 1665–1677.
  • Zucker et al. (2000) Zucker, D. M., O. Lieberman, and O. Manor (2000). Improved small sample inference in the mixed linear model: Bartlett correction and adjusted likelihood. Journal of the Royal Statistical Society: Series B 62, 827–838.

S1 Key lemmas

We first introduce lemmas which play important roles in the proofs of Theorems 1 and Theorem 2. The first lemma is used for deriving the conditional distribution of and .

Lemma S1.

Under the conditions (C1)-(C3) given in the main document, is independent of for . Also, is a function of , and independent of .


The covariance of and is

which implies that is independent of from the normality assumption.

Now, we write as and as . Since and from (C3), we have

which implies that is invariance with respect to the translation . Moreover, is maximal invariant with respect to the translation since and implies that for . Then, is a function of from Theorem 2 in Berger (1985), p.403. ∎

In the next lemma, we show the first order bias of the plug-in estimator is approximately the same as the negative covariance of .

Lemma S2.

Under the conditions (C1)-(C3), it holds that


We will show the Lemma by directly comparing both sides of the equation in the Lemma. Noting that and for some non-singular matrices and , we have

Since and from the condition (C2), we have


Then, for we have


where the last equality holds since

is a second-order unbiased estimator of


Next, we evaluate the first term of the right side of the equation in the Lemma. We can write as

In order to approximate the covariance of up to the order , we expand and as

The straightforward calculation shows that

thereby we have