# A Unified Method for Exact Inference in Random-effects Meta-analysis via Monte Carlo Conditioning

Random-effects meta-analyses have been widely applied in evidence synthesis for various types of medical studies to adequately address between-studies heterogeneity. However, standard inference methods for average treatment effects (e.g., restricted maximum likelihood estimation) usually underestimate statistical errors and possibly provide highly overconfident results under realistic situations; for instance, coverage probabilities of confidence intervals can be substantially below the nominal level. The main reason is that these inference methods rely on large sample approximations even though the number of synthesized studies is usually small or moderate in practice. Also, random-effects models typically include nuisance parameters, and these methods ignore variability in the estimation of such parameters. In this article we solve this problem using a unified inference method without large sample approximation for broad application to random-effects meta-analysis. The developed method provides accurate confidence intervals with coverage probabilities that are exactly the same as the nominal level. The exact confidence intervals are constructed based on the likelihood ratio test for an average treatment effect, and the exact p-value can be defined based on the conditional distribution given the maximum likelihood estimator of the nuisance parameters via the Monte Carlo conditioning method. As specific applications, we provide exact inference procedures for three types of meta-analysis: conventional univariate meta-analysis for pairwise treatment comparisons, meta-analysis of diagnostic test accuracy, and multiple treatment comparisons via network meta-analysis. We also illustrate the practical effectiveness of these methods via real data applications.

## Authors

• 21 publications
• 14 publications
• ### Improving the Accuracy of Confidence Intervals and Regions in Multivariate Random-effects Meta-analysis

Multivariate random-effects meta-analyses have been widely applied in ev...
06/20/2019 ∙ by Tsubasa Ito, et al. ∙ 0

• ### Permutation inference methods for multivariate meta-analysis

Multivariate meta-analysis is gaining prominence in evidence synthesis r...
09/11/2018 ∙ by Hisashi Noma, et al. ∙ 0

• ### Exponential Random Graph Models with Big Networks: Maximum Pseudolikelihood Estimation and the Parametric Bootstrap

With the growth of interest in network data across fields, the Exponenti...
08/08/2017 ∙ by Christian S. Schmid, et al. ∙ 0

• ### Statistical evaluation of in-vivo bioassays in regulatory toxicology considering males and females

The separate evaluation for males and females is the recent standard in ...
11/10/2020 ∙ by Ludwig A. Hothorn, et al. ∙ 0

• ### Prediction intervals for random-effects meta-analysis: a confidence distribution approach

For the inference of random-effects models in meta-analysis, the predict...
04/03/2018 ∙ by Kengo Nagashima, et al. ∙ 0

• ### Conditional Information and Inference in Response-Adaptive Allocation Designs

Response-adaptive allocation designs refer to a class of designs where t...
09/18/2019 ∙ by Adam Lane, et al. ∙ 0

• ### Identification and Estimation of Average Marginal Effects in Fixed Effects Logit Models

This article considers average marginal effects (AME) in a panel data fi...
05/03/2021 ∙ by Laurent Davezies, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

In evidence-based medicine, meta-analysis has been an essential tool for quantitatively summarizing multiple studies and producing integrated evidence. In general, the treatment effects from different sources of evidence are heterogeneous due to various factors such as study designs, participating subjects, and treatment administration. Therefore, such heterogeneity should be adequately addressed when combining evidence sources, otherwise statistical errors may be seriously underestimated and possibly result in misleading conclusions (Higgins and Green, 2011). The random-effects model has been widely used to address these heterogeneities in most medical meta-analyses. The applications cover various types of systematic reviews, for example, conventional univariate meta-analysis (DerSimonian and Laird, 1986; Whitehead and Whitehead, 1991), meta-analysis of diagnostic test accuracy (Reitsuma et al., 2005), network meta-analysis for comparing the effectiveness of multiple treatments (Salanti, 2012), and individual participant meta-analysis (Riley et al., 2010).

However, in random effect meta-analysis, most existing standard inference methods for average treatment effect parameters underestimate statistical errors under realistic situations of medical meta-analysis, for example, the coverage probabilities of standard inference methods are usually smaller than the nominal confidence levels, even when the model is completely specified (Brockwell and Gordon, 2001; 2007). This may lead to highly overconfident conclusions. The main reason for this notable problem is that random-effects models typically include heterogeneity variance-covariance parameters other than the average treatment effect, and most inference methods depend on large sample approximations for the number of trials to be synthesized, which is usually small or moderate in medical meta-analysis. In other words, the variability of the estimation of nuisance parameters is possibly ignored and the total statistical error can be underestimated when constructing confidence intervals. Recently, several confidence intervals that aim to improve the undercoverage property have been developed, for example, by Brockwell and Gordon (2007), Henmi and Copas (2010), Jackson and Bowden (2009), Jackson and Riley (2014), Noma (2011), Noma et al. (2017), Guolo (2012), and Sidik and Jonkman (2002). Although coverage properties are relatively improved under realistic situations, the validity of these methods is substantially guaranteed using large sample approximations. Moreover, most of these methods were developed in the context of traditional direct pairwise comparisons; therefore, the methods have limited applicability in recent, more advanced types of meta-analysis that use the complicated multivariate models noted above.

In this paper, we develop a unified method for constructing accurate confidence intervals for random effect models so that their coverage probabilities would equal to the nominal level regardless of the number of studies. To effectively circumvent the effects of nuisance parameters, we consider the likelihood ratio test (LRT) for the average treatment effect, and we define its

-value based on the conditional distribution given the maximum likelihood estimator of the nuisance parameters rather than the unconditional distribution of the test statistic. For computing the

-value, we adopt the Monte Carlo conditioning technique proposed by Lindqvist and Taraldsen (2005), and the confidence interval can be derived by inverting the LRT. As a result, the derived confidence intervals are shown to have accurate coverage probabilities and the proposed method can be generally applied to various types of meta-analysis involving complicated multivariate random effect models.

In Section 2, we first provide an algorithm for computing the -value of the LRT and derive an accurate confidence interval under general statistical models. In Section 3, the proposed method is applied to the simplest univariate random-effects meta-analysis for direct pairwise comparisons. We conduct simulations for evaluating the empirical performances of the proposed confidence interval, and illustrate our method using the well-known meta-analysis of intravenous magnesium for suspected acute myocardial infarction (Teo et al., 1991). In Sections 4 and 5, we discuss the applications to meta-analysis for diagnostic test accuracy and network meta-analysis, respectively, with illustrations using real datasets. Discussion is provided in Section 6.

## 2 Algorithm for confidence interval

We suppose are independent and each has the density or probability mass function with parameter of interest and nuisance parameter . For example, in the univariate meta-analysis described in Section 3, and correspond to the number of studies and estimated treatment effect in the th study, respectively, and we use the model with known to estimate the average treatment effect , so that and in this case. In general, , and could be multivariate, but we assume in this section that all of them are one-dimensional in order to make our presentation simpler. Multivariate cases are considered in Sections 4 and 5

. The likelihood ratio test (LRT) statistic for testing null hypothesis

is

 Tϕ0(Y)=−2{maxψL(Y,ϕ0,ψ)−maxϕ,ψL(Y,ϕ,ψ)},

where , and is the log-likelihood function. Under some regularity conditions, the asymptotic distribution of under is as . However, when the sample size is not large as is often the case in meta-analysis, the approximation is not accurate enough. The main reason is that there is an unknown nuisance parameter and its estimation error is not ignorable when is not large. To overcome this problem, we calculate the -value of the statistic based on the conditional distribution , where is the maximum likelihood estimator of under . For computing the -value, we adopt the Monte Carlo conditioning developed in Lindqvist and Taraldsen (2005).

To describe the general methodology, we further assume that can be expressed as for some function

whose distribution is completely known. For example, when , it holds that with . Now, the maximum likelihood estimator under satisfies the following equation:

 Lψ(Y,ϕ0,ˆψ)=0,

where is the partial derivative of the likelihood function with respect to the nuisance parameter . Under , can be expressed as ; thereby, the above equation can be rewritten as

 δ(U,ˆψ,ψ)≡Lψ(H(U,ϕ0,ψ),ϕ0,ˆψ)=0.

We define as the solution of the above equation with respect to . Using the result in Lindqvist and Taraldsen (2005), the -value can be expressed as

 E[I{T(Y)≥t}|ˆψ]=E[I{T(H(U,ϕ0,ψ∗(U)))≥t}w(U)]E[w(U)], (1)

where the expectation is taken with respect to the distribution of , and

 w(U)=∣∣∣π(ψ)∂ˆψ/∂ψ∣∣∣ψ=ψ∗(U) (2)

with some function of . As noted in Lindqvist and Taraldsen (2005), the choice of controls the efficiency of the Monte Carlo approximation in (1). However, the detailed discussion of this issue would extend of the scope of this paper; thus, we consider in this paper only . The algorithm for computing the -value for testing H is given as follows.

###### Algorithm 1.

(Monte Carlo method for the -value of LRT)

• Compute the LRT and the constrained maximum likelihood estimator of under .

• For with large , generate a random sample, , and compute , and from (2).

• The Monte Carlo approximation of the -value is given by

 ∑Bb=1I{Tϕ0(Y(b)∗)≥Tϕ0(Y)}w(U(b))∑Bb=1w(U(b)). (3)

Using the -value of the LRT of , the confidence interval of with nominal level can be constructed as the set of such that the -value of the LRT of is larger than . Although the confidence limits cannot be expressed in closed form, they can be computed by simple numerical methods, for example the bisectional method that repeatedly bisects an interval and selects a subinterval in which a root exists until the process converges numerically, see Section 2 in Burden and Faires (2010). When or

are multivariate (vector-valued) parameters, this method can be easily extended to the case. When

is multivariate, which is typical in many applications, the absolute value symbol in the weight (2) should be recognized as determinant since is a matrix. On the other hand, when is multivariate, we need to construct a confidence region (CR) rather than a confidence interval. In this case, the bisectional method cannot be directly applied, and methods for CRs would depend on each setting. In Section 4, we present a diagnostic meta-analysis in which a CR is traditionally used, and provide a feasible algorithm to compute a CR.

## 3 Univariate random-effects meta-analysis

### 3.1 The random-effects model

The univariate random-effects model has been widely used in meta-analysis due to its parametric simplicity. However, the accuracy of the inference is poor when the number of studies is small. We consider solving this problem using the LRT and confidence interval introduced in Section 2. We assume that there are clinical trials and that are the estimated treatment effects. We consider the random-effect model:

 yi=θi+ei,    θi=μ+εi,   i=1,…,n, (4)

where is the true effect size of the th study, and is the average treatment effect. Here and are independent error terms within and across studies, respectively, assumed to be distributed as and . The within-studies variances s are usually assumed to be known and fixed to their valid estimates calculated from each study. On the other hand, the across variance is an unknown parameter representing the heterogeneity between studies. Under these settings, Hardy and Thompson (1996) considered the likelihood-based approach for estimating the average treatment effect .

### 3.2 Exact confidence intervals of model parameters

We first consider a confidence interval of by using Algorithm 1 in the previous section. To begin with, we consider the null hypothesis with nuisance parameter . Since under the model (4), the LRT statistic can be defined as

 Tμ0(Y)=minμ,τ2L(μ,τ2)−minτ2L(μ0,τ2),

where

 L(μ,τ2)=n∑i=1log(τ2+σ2i)+n∑i=1(yi−μ)2τ2+σ2i. (5)

The minimization can be achieved in standard ways such as the iterative method in Hardy and Thompson (1996).

From (5), the constrained maximum likelihood estimator of under H satisfies the following equation:

 n∑i=11ˆτ2c+σ2i−n∑i=1(yi−μ0)2(ˆτ2c+σ2i)2=0.

Let be random variables that which independently follow , then can be expressed as under H. Substituting the expression for in the above equation, we have

 G(U,τ2,ˆτ2c)≡n∑i=11ˆτ2c+σ2i−n∑i=1(τ2+σ2i)u2i(ˆτ2c+σ2i)2=0,

where . The above equation can be easily solved with respect to and the solution is given by

 τ2∗(U)={n∑i=1u2i(ˆτ2c+σ2i)2}−1{n∑i=1ˆτ2c+σ2i(1−u2i)(ˆτ2+σ2i)2}.

Regarding the weight (2), using the implicit function theorem, it holds that

 w(U) =∣∣∣∂G(U,τ2,ˆτ2c)/∂ˆτ2c∂G(U,τ2,ˆτ2c)/∂τ2∣∣∣τ2=τ2∗(U) ={n∑i=1u2i(ˆτ2c+σ2i)2}−1∣∣ ∣∣n∑i=12{τ2∗(U)+σ2i}u2i−(ˆτ2c+σ2i)(ˆτ2c+σ2i)3∣∣ ∣∣,

and thereby we can compute the -value of the LRT for H from Algorithm 1, and the confidence interval of by inverting the LRT.

A confidence interval of can be derived as well. By a similar derivation to that for , the -values of the LRT of H can be computed from Algorithm 1 with

 μ∗(U)=ˆμ−(n∑i=11τ20+σ2i)−1n∑i=1ui√τ20+σ2i,    and,   w(U)=1,

so that the confidence interval of can be similarly constructed.

### 3.3 Simulation study of coverage accuracy

We evaluated the finite sample performances of the proposed confidence interval of via Monte Carlo (MC) algorithm together with existing methods widely used in practice. We considered the restricted maximum likelihood (REML) method, the DirSimonian and Laird (DL) method (DirSimonian and Laird, 1986), and the likelihood ratio (LR) method (Hardy and Thompson, 1996). When implementing the MC method, we used Monte Carlo samples to compute the -value. We fixed the true average treatment effect at , and the heterogeneity variance at and . Following Rockwell and Gordon (2001), within-study variances

s were generated from a scaled chi-squared distribution with

degree of freedom, multiplied by , and truncated to lie within the interval . We changed the number of studies over and , and set the nominal level to . Based on simulation runs, we calculated the coverage probabilities (CP) and average lengths (AL) of the four confidence intervals. The results, shown in Table 1, indicate that the confidence intervals from the three methods other than MC are too liberal to achieve the appropriate nominal level even when . On the other hand, the MC method produces reasonable confidence intervals with appropriate coverage probabilities even when is .

### 3.4 Example: treatment of suspected acute myocardial infarction

Here we applied the proposed method to a meta-analysis of the treatment of suspected acute myocardial infarction with intravenous magnesium (Teo et al., 1991), which is well-known as it yielded conflicting results between meta-analyses and large clinical trials (LeLorier et al., 1997). For the dataset, we constructed a confidence interval of the average treatment effect of intravenous magnesium using the proposed MC method (with Monte Carlo samples in evaluating the -value) as well as the REML, DL and LR methods considered in the previous section. Moreover, we also applied Peto’s fixed effect (PFE) method (Yusuf, 1985). The results are given in Table 2. The confidence intervals from the DL, LR, REML, and PFE methods were narrower than that of the MC method since the first four methods cannot adequately account for the additional variability in estimating the between study variance . Therefore, the confidence intervals from the first four methods did not cover , which does not change the interpretation of the results. On the other hand, the proposed MC method produced a longer confidence interval than the other four methods while also covering , that is, the corresponding test for was not significant with a significant level.

## 4 Bivariate Meta-analysis of Diagnostic Test Accuracy

### 4.1 Bivariate random-effects model

There has been increasing interest in systematic reviews and meta-analyses of data from diagnostic accuracy studies. For this purpose, a bivariate random-effect model (Reitsma et al., 2005 and Harbord et al., 2007) is widely used. Following Reitsma et al. (2005), we define and

as the logit-transformed true sensitivity and specificity, respectively, in the

th study. The bivariate model assumes that

follows a bivariate normal distribution:

 (μAiμBi)∼N((μAμB),Σ)    with   Σ=(σ2AρσAσBρσAσBσ2B), (6)

where and are the average logit-transformed sensitivity and specificity, and and

are standard deviations of

and , respectively. Here the parameter allows correlation between and . The unknown parameters are and . Let and be the observed logit-transformed sensitivity and specificity, and we assume that

 (yAiyBi)∼N((μAiμBi),Ci)    with   Ci=(s2Ai00s2Bi). (7)

For summarizing the results of the meta-analysis, the CR of would be more valuable than separate confidence intervals since sensitivity and specificity might be highly correlated. Reitsma et al. (2005) suggested the joint CR for as the interior points of the ellipse defined as

 μA=ˆμA+cαˆsAcost,    μB=ˆμB+cαˆsBcos(t+arccosˆρ),    t∈[0,2π), (8)

where and are the maximum likelihood estimates of and , and

are estimated standard errors of

and , respectively, and is the square root of the upper point of the distribution with degrees of freedom. The joint CR (8) is approximately valid; specifically, the coverage error converges to as the number of studies goes to infinity. However, when is not sufficiently large, the coverage error is not negligible, and the region (8) undercovers the true .

### 4.2 Exact CR of sensitivity and specificity

We consider a CR of under the models (6) and (7) based on the Monte Carlo method given in Section 2. Let be a vector of nuisance parameters, and write rather than to clarify the dependence of . From (6) and (7), the LRT statistic of the null hypothesis H is given by

 Tμ0(Y)=minμ,ψL(μ,ψ)−minψL(μ0,ψ),

where

 L(μ,ψ)=n∑i=1log|Vi(ψ)|+n∑i=1(yi−μ)tVi(ψ)−1(yi−μ),

with .

Under H, the constrained maximum likelihood estimator satisfies the following equations:

 n∑i=1tr{Vi(ˆψc)−1Jk}−n∑i=1(yi−μ0)tVi(ˆψc)−1JkVi(ˆψc)−1(yi−μ)=0,    k=1,2,3,

where

 J1=(1000),    J2=(0001),    J3=(0110).

Under H, the observed data can be expressed as with and being the Cholesky decomposition of , that is, . Then the above equation can be rewritten as

 n∑i=1tr{Vi(ˆψc)−1Jk}−n∑i=1utiTi(ψ)tVi(ˆψc)−1JkVi(ˆψc)−1Ti(ψ)ui=0,    k=1,2,3,

and we define as the solution of the above equation with respect to . The solution can be numerically obtained by minimizing the sum of squared values of three equations with respect to . Moreover, concerning the weight (2), we may use the numerical derivative given . Hence, we can compute the -value of the LRT of H from Algorithm 1.

From the LRT of H, the CR of can be defined as , where denotes the -value of the test statistic . Since is two-dimensional in this case, the computing boundary is not straightforward. The most feasible procedure is to approximate the boundary with a sufficiently large numbers of points. To this end, we first divide the interval by points . For each , we compute satisfying

 p(ˆμ+(rmcostm,rmsintm))=α,

which can be carried out via numerical methods (e.g. the bisectional method).

### 4.3 Example: screening test accuracy for alcohol problems

Here we provide a re-analysis of the dataset given in Kriston et al. (2008), including studies regarding a short screening test for alcohol problems. Following Reitsuma et al. (2005), we used logit-transformed values of sensitivity and specificity, denoted by and , respectively, and associated standard errors and . For the bivariate summary data, we fitted the bivariate models (6) and (7), and computed CRs of based on the approximated CR of the form (8) given in Reitsma et al. (2005). Moreover, we computed the proposed CR with Monte Carlo samples for calculating -values of the LRT, and evaluation points that were smoothed by a 7-point moving average for the CR boundary. Following Reitsuma et al. (2005), the obtained two CRs of were transformed to the scale , where and are the sensitivity and false positive rate, respectively. The obtained two CRs are presented in Figure 1 with a plot of the observed data, summary points , and the summary receiver operating curve. The approximate CR is smaller than the proposed CR, which may indicate that the approximation method underestimates the variability of estimating nuisance variance parameters.

## 5 Network Meta-analysis

### 5.1 Multivariate random-effects model

Suppose there are treatments in contract to a reference treatment, and let be an estimator of relative treatment effect for the th treatment in the th study. We consider the following multivariate random-effects model:

 yi∼N(θi,Si),     θi∼N(β,Σ),     i=1,…,n. (9)

where , is a vector of true treatment effects in the th study, is a vector of average treatment effects, and is the within-study variance-covariance matrix. Here we focus on the model (9) known as the contrast-based model (Salanti et al., 2008; Dias and Ades, 2016), which is commonly used in practice.

In network meta-analysis, each study contains only treatments ( usually ranges from 2 to 5); thereby, several components in cannot be defined. When the corresponding treatments are not involved in the th study, the corresponding components in and are shrunk to the sub-vector and sub-matrix, respectively, in the model (9). Moreover, when the references treatment is not involved in the th study, we can adopt the data argumentation approach of White et al. (2012), in which a quasi-small data set is added to the reference arm, e.g. 0.001 events for 0.01 patients. To clarify the setting in which and are shrunk to the sub-vector and sub-matrix, respectively, we introduce an index , , representing the treatment estimates that are available in the th study, and define the -dimensional vector of ’s, excluding the th element that is equal to . Moreover, we define , and and are the shrunken -dimensional vector and matrix of and , respectively. The model (9) can be rewritten as

 yi∼N(Xiθi,Si),     θi∼N(β,Σ),    i=1,…,n. (10)

Regarding the structure of between study variance , since there are rarely enough studies to identify the unstructured model of , the compound symmetry structure is used in most cases (White, 2015), where is a matrix with all diagonal elements equal to 1 and all off-diagonal elements equal to .

We define , , , and with independently for each . The hierarchical model (10) can be expressed as the following random-effects model:

 y=Xβ+Zu+ε, (11)

where with . The unknown parameters in (11) are and . The log-likelihood of the model (11) is given by

 L(β,τ2)=−12log|V(τ2)|−12(y−Xβ)tV(τ2)−1(y−Xβ),

where and denotes the Kronecker product. Note that is an -dimensional vector and is the total number of comparisons.

### 5.2 Confidence interval of the average treatment effect

In network meta-analysis, we are interested in not only the average treatment effects in contrast to the reference treatment, but also the treatment differences . To handle these issues in a unified manner, we focus on the linear combination with known vector . Define a full-rank matrix such that the first element of is . The parameter is equivalent to when we use instead of in the model (11), so that it is sufficient to consider a confidence interval of .

Define and to be and matrices such that , and . The model (11) can be rewritten as

 y=W1β1+W2ω+Zu+ε. (12)

We first consider testing of H, noting that and are nuisance parameters. The LRT statistics can be defined as

 Tβ10(Y)=minβ1,ω,τ2L(β1,ω,τ2)−minω,τ2L(β10,ω,τ2),

where

 L(β1,ω,τ2)=log|V(τ2)|+(y−W1β1−W2ω)tV(τ2)−1(y−W1β1−W2ω).

Under H, the constrained maximum likelihood estimator and satisfy the following equations:

 Wt2V(ˆτ2c)−1r(β10,ˆωc)=0tr{V(ˆτ2c)−1Q}−r(β10,ˆωc)tV(ˆτ2c)−1QV(ˆτ2c)−1r(β10,ˆωc)=0, (13)

where and . Under H, it holds that for and is the Cholesky decomposition of such that . The equation (13) can be rewritten as

 G1(ˆωc,ˆτ2c,ω,τ2,u)≡Wt2V(ˆτ2c)−1r(ω,ˆωc,τ2,u)=0G2(ˆωc,ˆτ2c,ω,τ2,u)      ≡tr{V(ˆτ2c)−1Q}−r(ω,ˆωc,τ2,u)tV(ˆτ2c)−1QV(ˆτ2c)−1r(ω,ˆωc,τ2,u)=0, (14)

with . The solution of the first equation with respect to is given by

 ω∗(u,τ2)=ˆωc−{Wt2V(ˆτ2c)−1W2}−1Wt2V(ˆτ2c)−1A(τ2)u.

By replacing with in the second equation in (14), we obtain the following equation for :

 tr{V(ˆτ2c)−1Q}−utA(τ2)tB(ˆτ2c)V(ˆτ2c)−1QV(ˆτ2c)−1B(ˆτ2c)A(τ2)u=0,

where , and we define be the solution of the above equation with respect to . Hence, the solutions of (14) with respect to and are given by and , respectively. Concerning the weight function , from the implicit function theorem, it follows that

 w(U)=∣∣∣det(∂G/∂ˆωtc,∂G/∂ˆτ2)det(∂G/∂ωt,∂G/∂τ2)∣∣∣ω=ω∗(u),τ2=τ2∗(u),

where . From (14), we have

 ∂G1∂ωt=−∂G1∂ˆωtc=Wt2V(ˆτ2c)−1W2 ∂G2∂ωt=−∂G2∂ˆωtc=Wt2V(ˆτ2c)−1QV(ˆτ2c)−1{W2(ω−ˆωc)+A(τ2)u}.

On the other hand, because derivation of analytical expressions of the partial derivatives with respect to or requires tedious algebraic calculation, we can use numerical derivatives instead. Therefore, we can carry out Algorithm 1 in Section 2 to compute the -value of LRT, and the confidence interval can be obtained as well by inverting the LRT.

### 5.3 Example: Schizophrenia data

Ades et al. (2010) carried out a network meta-analysis of antipsychotic medication for prevention of relapse of schizophrenia; this analysis includes 15 trials comparing eight treatments with placebo. In each trial, the outcomes available were the four outcome states at the end of follow-up: relapse, discontinuation of treatment due to intolerable side effects and other reasons, not reaching any of the three endpoints, and still in remission. We here considered the last outcome and adopted the odds ratio as the treatment effect measure.

We applied the multivariate random-effects model (11) with a compound symmetry structure of . The reference treatment was set to “Placebo”. The estimates of between-studies standard deviation were 0.28 for the ML and 0.52 for the REML estimation methods, respectively, which shows that there is substantial heterogeneity between studies.

In Table 3 we present the results of three confidence intervals based on the MC method, the LR-based method with -value calculated by the asymptotic distribution, and the REML method. The number of Monte Carlo samples in applying the MC method was consistently set to . In this analysis, the confidence intervals of the MC method were much wider than those of the LR method, and indicated larger uncertainty in the inference of average treatment effects. On the other hand, the confidence intervals of the REML method were wider than those of the MC method in the last three treatments although the REML method produced narrower intervals than the MC method in the first five treatments; this may be due to the difference between the ML and REML estimates of between study standard deviation (0.28 for ML and 0.52 for REML).

## 6 Discussions

We developed a unified method for constructing confidence intervals of the average treatment effects in random-effects meta-analysis. The proposed confidence intervals are based on the LRT, and we proposed a Monte Carlo method to compute its -value. In terms of specific applications, we discussed three types of meta-analysis, univariate meta-analysis, diagnostic meta-analysis, and network meta-analysis, and demonstrated the usefulness of the proposed method. The developed inference method would be adapted to a variety of applications, e.g., the multivariate individual participant data meta-analysis (Burke et al., 2016). A limitation of the proposed methods might be rigorous justification of the exactness of the proposed inference methods. Although the maximum likelihood estimator of the variance parameters are sufficient statistics in the case that all the within-study variances are the same, this property might not hold rigorously, under general conditions. Although there were no theoretical proofs concerning the sufficiency of this estimator under general conditions, but we could clearly demonstrate the proposed methods could provide almost exact confidence intervals in the simulation studies. We guess the sufficiency property would be hold under more general settings, and the theoretical justifications would be one of relevant further issues.

In addition, the numerical results from our simulations and the illustrative examples suggest that statistical methods in the random-effects models should be selected carefully in practice. Historically, there have been many discrepant results between meta-analyses and subsequent large randomized clinical trials (LeLorier et al, 1997), and in these cases the meta-analyses have typically tended to provide false results as in the magnesium example in Section 3.4. Many systematic biases, for example, publication bias (Begg and Berlin, 1988; Easterbrook et al., 1991), might be important sources of these discrepancies, but we should also be aware of the risk of providing overconfident and misleading interpretations caused by the statistical methods based on large sample approximations. Considering these risks, accurate inference methods would be preferred in practice. Although there have not been any accurate inference methods that can be broadly applied in random-effects meta-analyses, our approach in this article may provide an explicit solution to this relevant problem.

Finally, methodological research on extensions of random-effects meta-analyses to more complicated statistical models are still in progress (e.g., multivariate network meta-analyses, Riley et al., 2017), and the small sample problems generally exist in most of these applications. Our methods are applicable to these complicated models as well as more advanced approaches that might arise in future research. The developed methods should be effective tools as a unified methodological framework to obtain accurate solutions in medical evidence synthesis.

Acknowledgement    This research was partially supported by JST CREST (grant number: JPMJCR1412) and JSPS KAKENHI (grant number: 17K19808, 15K15954, 18K12757).

## References

• [1]
• [2] Ades, A. E., Mavranezouli, I., Dias, S., Welton, N. J., Whittington, C. and Kendall, T. (2010). Network meta-analysis with competing risk outcomes. Value in Health, 13, 976-983.
• [3]
• [4] Begg, C. and Berlin, J. A. (1988). Publication bias: a problem in interpreting medical data. Journal of the Royal Statistical Society: Series A, 151, 419-463.
• [5]
• [6] Brockwell, S. E. and Gordon, I. R. (2001). A comparison of statistical methods for meta-analysis. Statistics in Medicine, 20, 825-840.
• [7]
• [8] Brockwell, S. E. and Gordon, I. R. (2007). A simple method for inference on an overall effect in meta-analysis. Statistics in Medicine, 26, 4531-4543.
• [9]
• [10] Burden, R. L. and Faires, J. D. (2010). Numerical Analysis, 9th Edition, Brooks-Cole Publishing.
• [11]
• [12] Burke, D. L., Ensor, J. and Riley, R. D. (2016). Meta-analysis using individual participant data: one-stage and two-stage approaches, and why they may differ. Statistics in Medicine, 36, 855-875.
• [13]
• [14] DerSimonian, R and Laird, N. M. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7, 177-188.
• [15]
• [16] Dias, S. and Ades, A. E. (2016). Absolute or relative effects? Arm-based synthesis of trial data. Research Synthesis Methods, 7, 23-28.
• [17]
• [18] Easterbrook, P. J., Gopalan, R., Berlin, J. A. and Matthews, D. R. (1991). Publication bias in clinical research. The Lancet, 337, 867-872.
• [19]
• [20] Guolo, A. (2012). Higher-order likelihood inference in meta-analysis and meta-regression. Statistics in Medicine, 31, 313-327.
• [21]
• [22] Harbord, R. M., Deeks, J. J., Egger, H., Whiting, P. and Sterne. J. A. C. (2007). A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics, 8, 239-251.
• [23]
• [24] Hardy, R. J. and Thompson, S.G. (1996). A likelihood approach to meta-analysis with random effects. Statistics in Medicine, 15, 619-629.
• [25]
• [26] Henmi, M and Copas J. (2010). Confidence intervals for random effects meta-analysis and robustness to publication bias. Statistics in Medicine, 29, 2969-2983.
• [27]
• [28] Higgins, J. P. T. and Green, S. (2011) Cochrane Handbook for Systematic Reviews of Interventions, Version 5.1.0 (The Cochrane Collaboration, Oxford).
• [29]
• [30]

Jackson, D and Bowden, J. (2009). A re-evaluation of the ‘quantile approximation method’ for random effects meta-analysis.

Statistics in Medicine, 28, 338-348.
• [31]
• [32] Jackson, D. and Riley, R. D. (2014). A refined method for multivariate meta-analysis and meta-regression. Statistics in Medicine, 33, 541-554.
• [33]
• [34] Kriston, L., Höelzel, L., Weiser, A., Berner, M., and Haerter, M. (2008). Meta-analysis: Are 3 Questions Enough to Detect Unhealthy Alcohol Use? Annals of Internal Medicine, 149, 879-888.
• [35]
• [36] LeLorier, J., Gregoire, G., Benhaddad, A., Lapierre, J. and Derderian, F. (1997). Discrepancies between meta-analyses and subsequent large randomized, controlled trials. The New England Journal of Medicine, 337, 536-542.
• [37]
• [38] Lindqvist, B. H. and Taraldsen, G. (2005). Monte Carlo conditioning on a sufficient statistic. Biometrika, 92, 451-464.
• [39]
• [40] Lockhart, R. A., O’reilly, F. J. and Stephens, M. A. (2007). Use of the Gibbs sampler to obtain conditional tests, with applications. Biometrika, 94, 992-998.
• [41]
• [42] Lu, G. and Ades, A. E. (2006). Assessing evidence inconsistency in mixed treatment comparisons. Journal of the American Statistical Association, 101, 447-459.
• [43]
• [44] Noma, H. (2011). Confidence intervals for a random-effects meta-analysis based on Bartlett-type corrections. Statistics in Medicine, 30, 3304-3312.
• [45]
• [46] Noma, H., Nagashima, K., Maruo, K., Gosho, M. and Furukawa, T. A. (2017). Bartlett-type corrections and bootstrap adjustments of likelihood-based inference methods for network meta-analysis. Statistics in Medicine, to appear.
• [47]
• [48] Reitsma, J. B., Glas, A. S., Rutjes, A. W. S., Scholten, R. J. P. M., Bossuyt, P. M. and Zwinderman, A. H. (2005). Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology, 58, 982-90.
• [49]
• [50] Riley, R. D., Jackson, D., Salanti, G., Burke, D. L., Price, M., Kirkham, J. and White, I. R. (2017). Multivariate and network meta-analysis of multiple outcomes and multiple treatments: rationale, concepts, and examples BMJ, 358, j3932.
• [51]
• [52] Riley, R. D., Lambert, P. C. and Abo-Zaid, G. (2010). Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ, 340, c221.
• [53]
• [54] Salanti, G. (2012). Indirect and mixed-treatment comparison, network, or multiple- treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Research Synthesis Methods, 3, 80-97.
• [55]
• [56] Salanti, G, Higgins, J. P., Ades, A. and Ioannidis, J. P. (2008). Evaluation of networks of randomized trials. Statistical Methods in Medical Research, 17, 279-301.
• [57]
• [58] Sidik, K and Jonkman, J. N. (2002). A simple confidence interval for meta-analysis. Statistics in Medicine, 21, 3153-3159.
• [59]
• [60] Teo, K. K., Yusuf, S., Collins, R., Held, P. H. and Peto, R. (1991). Effects of intravenous magnesium in suspected acute myocardial infarction: overview of randomized trials. British Medical Journal, 303, 1499-1503.
• [61]
• [62] White, I. R. (2015). Network meta-analysis. The State Journal, 15, 951-985.
• [63]
• [64] White, I. R., Barrett, J. K., Jackson, D. and Higgins, J. P. T. (2012). Consistency and inconsistency in network meta-analysis: model estimation using multivariate meta-regression. Research Synthesis Methods, 3, 111-125.
• [65]
• [66] Whitehead, A. and Whitehead, J. (1991). A general parametric approach to the meta-analysis of randomized clinical trials. Statistics in Medicine, 10, 1665-1677.
• [67]
• [68] Yusuf, S., Peto, R., Lewis, J., Collins, R. and Sleight, P. (1985). Beta blockade during and after myocardial infarction: an overview of the randomized trials. Progress in Cardiovascular Diseases, 27, 355-371.
• [69]