:exclamation: This is a read-only mirror of the CRAN R package repository. pimeta — Prediction Intervals for Random-Effects Meta-Analysis
For the inference of random-effects models in meta-analysis, the prediction interval was proposed as a summary measure of the treatment effects that explains the heterogeneity in the target population. The Higgins-Thompson-Spiegelhalter (HTS) plug-in-type prediction interval, in which the heterogeneity parameter is replaced with its point estimate, has been widely used, but its validity depends on a large sample approximation. Most meta-analyses, however, include fewer than 20 studies. It has been revealed that the validity of the HTS method is not assured under realistic situations, but no solution to this problem has been proposed in the literature. Therefore, in this article, we describe our proposed prediction interval. Instead of using the plug-in scheme, we developed a bootstrap approach using an exact confidence distribution to account for the uncertainty in estimation of the heterogeneity parameter. Compared to the HTS method, the proposed method provides an accurate prediction interval that adequately explains the heterogeneity of treatment effects and the statistical error. Simulation studies demonstrated that the HTS method had poor coverage performance; by contrast, the coverage probabilities for the proposed method satisfactorily retained the nominal level. Applications to three published random-effects meta-analyses are presented.
Prediction Intervals for Random-Effects Meta-Analysis
Meta-analysis is an important tool in scientific research for combining the results of multiple related studies. Two major approaches (i.e., fixed-effect models and random-effects models) have been widely applied. One frequently important objective of meta-analysis is to estimate the overall mean effect and its confidence interval.
Fixed-effect models assume that the true treatment effects are equal for all studies. The common treatment effect parameter estimate and its confidence interval provide valuable information for applying the results to other subpopulations. By contrast, random-effects models assume that the true treatment effects differ for each study. The average treatment effect across all studies and its confidence interval have been used together with heterogeneity measures that are also very important in terms of generalizability. For instance, the $I^2$-statistic [2, 3] has been widely used as a heterogeneity measure. However, researchers have interpreted summary results from random-effects models as an estimate of the average treatment effect rather than the common treatment effect [4, 5], which means that they tend to ignore the heterogeneity. Subsequently, Higgins et al. proposed a prediction interval for a treatment effect in a future study. It can be interpreted as the range of the predicted treatment effect in a new study, given the data. A prediction interval naturally takes into account the heterogeneity, and helps us apply the results to other subpopulations. Riley et al. recommended that a prediction interval be reported along with a confidence interval and a heterogeneity measure.
The invalidity problem (i.e., the under-coverage property) of confidence intervals in random-effects meta-analysis has been well studied in the literature [6, 7, 8], since the number of synthesized studies is fewer than 20 in most meta-analyses in medical research and a large sample approximation is not generally assured. By contrast, the small-sample problem of prediction intervals has not been examined as thoroughly, and there has been rising concern regarding this issue in meta-analysis. Recently, Partlett and Riley revealed that the same problem occurs with prediction intervals. They showed that prediction intervals could have serious under-coverage properties under the general settings of medical meta-analyses, and it was considered that the ordinary methods of constructing prediction intervals, including the Higgins–Thompson–Spiegelhalter (HTS) prediction interval, are no longer valid. However, no explicit solution to this problem has been obtained thus far.
The HTS prediction interval has a fundamental problem. It can be regarded as a plug-in estimator that replaces the heterogeneity parameter $\tau^2$ with its point estimate $\hat{\tau}^2$. The $t$ distribution with $K-2$ degrees of freedom is used to approximately account for the uncertainty of $\hat{\tau}^2$, where $K$ is the number of studies. The replacement with the $t$-approximation has a detrimental impact on the coverage probability, especially under a small number of studies. This is of particular concern since most meta-analyses include fewer than 20 studies. Thus, the HTS prediction interval can have severe under-coverage, as will be shown in Section 3. It is necessary to more precisely account for the uncertainty of $\hat{\tau}^2$.
In this article, to solve this problem, we develop a new prediction interval that is valid under more general and realistic settings of meta-analyses in medical research, including those whose $K$ is especially small. To avoid using a plug-in estimator, we propose a parametric bootstrap approach using a confidence distribution to account for the uncertainty of $\hat{\tau}^2$ with an exact distribution estimator of $\tau^2$ [11, 12, 13, 14, 15]. A confidence distribution, like a Bayesian posterior, is considered as a distribution function to estimate the parameter of interest in frequentist inference.
This article is organized as follows. In Section 2, we first briefly review the random-effects meta-analysis and the HTS prediction interval, then we provide the new method to construct an accurate prediction interval. In Section 3, we assess the performance of the HTS prediction interval and the proposed prediction interval via simulations. In Section 4, we apply the developed method to three meta-analysis data sets. Finally, we conclude the paper with a brief discussion.
Let the random variable $Y_k$ ($k = 1, 2, \ldots, K$) be an effect size estimate from the $k$-th study. The random-effects model can be defined as

$$Y_k = \theta_k + \epsilon_k, \qquad \theta_k = \mu + u_k, \tag{1}$$

where $\theta_k$ is the true effect size of the $k$-th study, $\mu$ is the grand mean parameter of the average treatment effect, $\epsilon_k$ is the random error within a study, and $u_k$ is the random error across the studies. It is assumed that $\epsilon_k$ and $u_k$ are independent, with $\epsilon_k \sim N(0, \sigma_k^2)$ and $u_k \sim N(0, \tau^2)$, where the within-studies variances $\sigma_k^2$ are known and replaced by their valid estimates [20, 21], and the across-studies variance $\tau^2$ is an unknown parameter that reflects the treatment effects heterogeneity.

Under Condition 1, the marginal distribution of $Y_k$ is a normal distribution with the mean $\mu$ and the variance $\sigma_k^2 + \tau^2$.
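Random-effects meta-analyses generally estimate $\mu$ to evaluate the average treatment effect and $\tau^2$ to evaluate the treatment effects heterogeneity. The average treatment effect $\mu$ is estimated by $\hat{\mu} = \sum_{k=1}^{K} (\sigma_k^2 + \hat{\tau}^2)^{-1} Y_k \big/ \sum_{k=1}^{K} (\sigma_k^2 + \hat{\tau}^2)^{-1}$, where $\hat{\tau}^2$ is an estimator of the heterogeneity parameter $\tau^2$. Estimators of $\tau^2$, such as the DerSimonian and Laird estimator, have been proposed by a number of researchers. In this paper, we shall discuss prediction intervals using the DerSimonian and Laird estimator that is defined as $\hat{\tau}^2_{DL} = \max[0, \hat{\tau}^2_{UDL}]$, with its untruncated version defined as $\hat{\tau}^2_{UDL} = \{Q - (K-1)\}/(S_1 - S_2/S_1)$, where $Q = \sum_{k=1}^{K} v_k (Y_k - \bar{Y})^2$ is Cochran’s $Q$ statistic, $v_k = \sigma_k^{-2}$, $\bar{Y} = \sum_{k=1}^{K} v_k Y_k \big/ \sum_{k=1}^{K} v_k$, and $S_r = \sum_{k=1}^{K} v_k^r$ for $r = 1, 2$. Under Condition 1, Biggerstaff and Jackson derived the exact distribution function of $Q$, $F_Q(q; \tau^2)$, to obtain confidence intervals for $\tau^2$. Cochran’s $Q$ is a quadratic form that can be written $Y^{T} A Y$, where $Y = (Y_1, \ldots, Y_K)^{T}$, $A = V - v v^{T}/v_{+}$, $V = \mathrm{diag}(v_1, \ldots, v_K)$, $v = (v_1, \ldots, v_K)^{T}$, $v_{+} = \sum_{k=1}^{K} v_k$, and the superscript ‘T’ denotes matrix transposition. Here and subsequently, , , , , , and .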
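The HTS prediction interval was proposed by Higgins et al. Suppose $\tau^2$ is known, $\hat{\mu} \sim N(\mu, \mathrm{SE}[\hat{\mu}]^2)$, and the observation in a future study $\theta_{new} \sim N(\mu, \tau^2)$, where $\mathrm{SE}[\hat{\mu}] = 1\big/\sqrt{\sum_{k=1}^{K} w_k}$ is a standard error of $\hat{\mu}$ given $\tau^2$, and $w_k = (\sigma_k^2 + \tau^2)^{-1}$. Assuming independence of $\theta_{new}$ and $\hat{\mu}$ given $\mu$, $\theta_{new} - \hat{\mu} \sim N(0, \tau^2 + \mathrm{SE}[\hat{\mu}]^2)$. Since $\tau^2$ is unknown, it should be replaced by an estimator $\hat{\tau}^2$. If $(K-2)(\hat{\tau}^2 + \widehat{\mathrm{SE}}[\hat{\mu}]^2)/(\tau^2 + \mathrm{SE}[\hat{\mu}]^2)$ is approximately distributed as $\chi^2(K-2)$, then $(\theta_{new} - \hat{\mu})\big/\sqrt{\hat{\tau}^2 + \widehat{\mathrm{SE}}[\hat{\mu}]^2} \sim t(K-2)$, where $\widehat{\mathrm{SE}}[\hat{\mu}] = 1\big/\sqrt{\sum_{k=1}^{K} \hat{w}_k}$ is the standard error estimator of $\hat{\mu}$, and $\hat{w}_k = (\sigma_k^2 + \hat{\tau}^2)^{-1}$. By this approximation, the HTS prediction interval is obtained by

$$\left[\hat{\mu} - t_{K-2}^{\alpha}\sqrt{\hat{\tau}^2_{DL} + \widehat{\mathrm{SE}}[\hat{\mu}]^2},\ \ \hat{\mu} + t_{K-2}^{\alpha}\sqrt{\hat{\tau}^2_{DL} + \widehat{\mathrm{SE}}[\hat{\mu}]^2}\right],$$

where $t_{K-2}^{\alpha}$ is the $100(1-\alpha/2)$ percentile of the $t$ distribution with $K-2$ degrees of freedom. However, the $t$-approximation is clearly inappropriate, and has a detrimental impact on the coverage probability.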
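As a minimal sketch of the quantities described above, the following Python code computes Cochran's Q, the DerSimonian and Laird estimate (truncated and untruncated), and the HTS plug-in interval. The function names and the toy data are assumptions made for illustration and are not taken from the pimeta package; the $t$ critical value is passed in by the caller rather than computed, so that only the standard library is needed.

```python
import math

def dersimonian_laird(y, sigma2):
    """Return (tau2_dl, tau2_udl, q) for effect estimates y and within-study variances sigma2."""
    k = len(y)
    v = [1.0 / s for s in sigma2]                      # v_k = 1 / sigma_k^2
    s1 = sum(v)
    s2 = sum(vk * vk for vk in v)
    y_bar = sum(vk * yk for vk, yk in zip(v, y)) / s1  # inverse-variance weighted mean
    q = sum(vk * (yk - y_bar) ** 2 for vk, yk in zip(v, y))  # Cochran's Q
    tau2_udl = (q - (k - 1)) / (s1 - s2 / s1)          # untruncated estimator
    return max(0.0, tau2_udl), tau2_udl, q

def hts_interval(y, sigma2, t_crit):
    """HTS plug-in interval; t_crit is the upper alpha/2 point of t with K - 2 df."""
    tau2, _, _ = dersimonian_laird(y, sigma2)
    w = [1.0 / (s + tau2) for s in sigma2]             # random-effects weights
    w_sum = sum(w)
    mu_hat = sum(wk * yk for wk, yk in zip(w, y)) / w_sum
    half_width = t_crit * math.sqrt(tau2 + 1.0 / w_sum)  # 1 / w_sum is the squared SE of mu_hat
    return mu_hat - half_width, mu_hat + half_width

# Hypothetical toy data: three studies with unit within-study variance.
y = [0.0, 2.0, 4.0]
sigma2 = [1.0, 1.0, 1.0]
tau2_dl, tau2_udl, q = dersimonian_laird(y, sigma2)
# 12.706 is the 97.5th percentile of the t distribution with K - 2 = 1 df.
lower, upper = hts_interval(y, sigma2, t_crit=12.706)
print(q, tau2_dl, lower, upper)
```

With only three studies the $t$ critical value on one degree of freedom dominates the interval width, which illustrates how strongly the HTS interval depends on the degrees-of-freedom approximation when $K$ is small.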
Several HTS-type prediction intervals following restricted maximum likelihood (REML) estimation of $\tau^2$ have been proposed by Partlett and Riley. For example, they discussed an HTS-type prediction interval following REML with the Hartung–Knapp variance estimator (HTS-HK) that is defined as
and a HTS-type prediction interval following REML with the Sidik–Jonkman bias-corrected variance estimator  (HTS-SJ) that is defined as
, , the Hartung–Knapp variance estimator is defined as
the Sidik–Jonkman bias-corrected variance estimator
and . The HTS-HK and HTS-SJ prediction intervals can perform better than the other methods discussed in Partlett and Riley for a large heterogeneity variance and .
However, the HTS prediction intervals could have severe under-coverage under certain conditions (see Section 3 and Partlett and Riley). The uncertainty of the estimator of $\tau^2$ should be more precisely taken into account. Therefore, we consider a new prediction interval that is valid under a small number of studies.
As an alternative approach to address the issue discussed in Section 2.2, we propose a new prediction interval. The proposed method starts with assumptions that differ from those of Higgins et al.  in order to address a small number of studies, and accounts for the uncertainty of via a parametric bootstrap with the exact distribution of by using a confidence distribution (see Section 2.4).
From now on we make the following assumptions: Let the observation in a future study , given and , and and are independent.
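In Hartung and Hartung and Knapp, it was shown that assuming normality of $Y_k$, $(\bar{\mu} - \mu)/\mathrm{SE}_H$ is $t$-distributed with $K-1$ degrees of freedom, and $\mathrm{SE}_H^2$ is stochastically independent of $\bar{\mu}$, where $\bar{\mu} = \sum_{k=1}^{K} w_k Y_k / w_{+}$, $\mathrm{SE}_H^2 = \frac{1}{K-1}\sum_{k=1}^{K} \frac{w_k}{w_{+}} (Y_k - \bar{\mu})^2$, and $w_{+} = \sum_{k=1}^{K} w_k$ with $w_k = (\sigma_k^2 + \tau^2)^{-1}$. By replacing $\tau^2$ in $(\bar{\mu} - \mu)/\mathrm{SE}_H$ with an appropriate estimate $\hat{\tau}^2$, $(\hat{\mu} - \mu)/\widehat{\mathrm{SE}}_H$ is approximately $t$-distributed with $K-1$ degrees of freedom, where $\widehat{\mathrm{SE}}_H^2 = \frac{1}{K-1}\sum_{k=1}^{K} \frac{\hat{w}_k}{\hat{w}_{+}} (Y_k - \hat{\mu})^2$, and $\hat{w}_{+} = \sum_{k=1}^{K} \hat{w}_k$ with $\hat{w}_k = (\sigma_k^2 + \hat{\tau}^2)^{-1}$.

The above assumptions and results lead to a system of equations,

$$\frac{\theta_{new} - \mu}{\tau} = Z, \qquad \frac{\bar{\mu} - \mu}{\mathrm{SE}_H} = t_{K-1}, \tag{2}$$

where $Z \sim N(0, 1)$ and $t_{K-1} \sim t(K-1)$. Solving for $\theta_{new}$ in (2) yields

$$\theta_{new} = \bar{\mu} + Z\tau - t_{K-1}\,\mathrm{SE}_H, \tag{3}$$

and the prediction distribution has the same distribution as the statistic $\bar{\mu} + Z\tau - t_{K-1}\,\mathrm{SE}_H$. By replacing $\tau^2$ in (3) with an appropriate estimator $\hat{\tau}^2_{UDL}$ (not an estimate), we have

$$\hat{\theta}_{new} = \hat{\mu} + Z\hat{\tau}_{UDL} - t_{K-1}\,\widehat{\mathrm{SE}}_H,$$

and an approximate prediction distribution can be given by the distribution of $\hat{\theta}_{new}$.

We use the untruncated estimator $\hat{\tau}^2_{UDL}$ here, because we do not need the truncation to consider the distribution of an estimator of $\tau^2$. Hence, the prediction distribution can be evaluated by the distribution of $\hat{\theta}_{new}$.

Since $\hat{\theta}_{new}$ includes three random components, $\hat{\tau}^2_{UDL}$, $Z$, and $t_{K-1}$, this gives the following algorithm for the proposed prediction interval.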
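An algorithm for the proposed prediction interval.

Step 1. Generate $B$ bootstrap samples $\tilde{\tau}_b^2$ ($b = 1, \ldots, B$) that are drawn from the exact distribution of $\hat{\tau}^2_{UDL}$, $z_b$ that are drawn from $N(0, 1)$, and $t_b$ that are drawn from $t(K-1)$.

Step 2. Calculate $\bar{\mu}_b = \sum_{k=1}^{K} \tilde{w}_{bk} Y_k / \tilde{w}_{b+}$, and $\hat{\theta}_{new,b} = \bar{\mu}_b + z_b \tilde{\tau}_b - t_b\,\widetilde{\mathrm{SE}}_{H,b}$, where $\tilde{w}_{bk} = (\sigma_k^2 + \tilde{\tau}_b^2)^{-1}$, $\widetilde{\mathrm{SE}}_{H,b}^2 = \frac{1}{K-1}\sum_{k=1}^{K} \frac{\tilde{w}_{bk}}{\tilde{w}_{b+}} (Y_k - \bar{\mu}_b)^2$, and $\tilde{w}_{b+} = \sum_{k=1}^{K} \tilde{w}_{bk}$.

Step 3. Calculate the prediction limits $c_l$ and $c_u$ that are the $100 \times \alpha/2$ and $100 \times (1 - \alpha/2)$ percentage points of $\hat{\theta}_{new,b}$, respectively.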
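The three steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the pimeta implementation: the function names are hypothetical, the draws of the heterogeneity variance are taken as an input (sampling them from the exact distribution is a separate problem), and the toy data and the `tau2_draws` values are made up purely to exercise the code.

```python
import math
import random

def t_variate(rng, df):
    """Draw from Student's t with df degrees of freedom as Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) is Gamma(shape df/2, scale 2)
    return z / math.sqrt(chi2 / df)

def bootstrap_prediction_interval(y, sigma2, tau2_draws, alpha=0.05, seed=1):
    """Percentile limits of theta_new,b = mu_b + z_b * tau_b - t_b * SE_H,b over the draws."""
    k = len(y)
    rng = random.Random(seed)
    theta_new = []
    for tau2_b in tau2_draws:                   # one bootstrap replicate per tau^2 draw
        w = [1.0 / (s + tau2_b) for s in sigma2]
        w_sum = sum(w)
        mu_b = sum(wk * yk for wk, yk in zip(w, y)) / w_sum
        # Hartung-type variance estimator evaluated at the drawn tau^2
        se_h2 = sum(wk * (yk - mu_b) ** 2 for wk, yk in zip(w, y)) / ((k - 1) * w_sum)
        z_b = rng.gauss(0.0, 1.0)
        t_b = t_variate(rng, k - 1)
        theta_new.append(mu_b + z_b * math.sqrt(tau2_b) - t_b * math.sqrt(se_h2))
    theta_new.sort()
    b = len(theta_new)
    lo_idx = int(math.floor((alpha / 2.0) * (b - 1)))
    hi_idx = int(math.ceil((1.0 - alpha / 2.0) * (b - 1)))
    return theta_new[lo_idx], theta_new[hi_idx]

# Hypothetical inputs: five studies and made-up, non-negative tau^2 draws.
y = [0.1, 0.3, 0.35, 0.6, 0.8]
sigma2 = [0.04] * 5
tau2_draws = [0.0, 0.02, 0.05, 0.1] * 250
lower, upper = bootstrap_prediction_interval(y, sigma2, tau2_draws)
print(lower, upper)
```

The draws must be non-negative because their square root is taken; in the proposed method this is ensured by truncating negative samples to zero.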
However, an algorithm for sampling from the exact distribution of has not been studied. We will discuss below the exact distribution of and a sampling method from the exact distribution.
In frequentist inference, a distribution function like a Bayesian posterior is sometimes needed to estimate a parameter of interest. A confidence distribution is an appropriate solution in this situation. A confidence distribution is a distribution estimator that can be defined and interpreted in a frequentist framework in which the parameter is a non-random quantity. A confidence distribution of the parameter of interest, as described below, can be easily defined by the cumulative distribution function of a statistic, which includes the parameter of interest. The confidence distribution has a theoretical relationship to the fiducial approach, and recent developments [11, 12, 13, 14, 15] provide useful statistical tools that are more widely applicable than the previous method. For example, Efron’s bootstrap distribution is a confidence distribution and a distribution estimator of a parameter of interest. In meta-analysis, the $Q$-profile method for an approximate confidence interval for $\tau^2$ can be considered as an application of the confidence distribution. In this section, we propose the exact distribution of $\hat{\tau}^2_{UDL}$, which is a distribution function for estimating the parameter $\tau^2$ using a confidence distribution, and then develop a method of sampling from the exact distribution. A useful theorem (Theorem 1) is proved that provides conditions in the case of a statistic with a continuous cumulative distribution function.
We use the following definition of a confidence distribution. In the definition, $\Theta$ is the parameter space of the unknown parameter of interest $\theta$, $Y$ is a random vector, and $\mathcal{Y}$ is the sample space corresponding to sample data $y$.

(R1) A function $H(\cdot) = H(y, \theta)$ on $\mathcal{Y} \times \Theta \to [0, 1]$ is called a confidence distribution for a parameter $\theta$; (R2) If for each given $y \in \mathcal{Y}$, $H(\cdot)$ is a cumulative distribution function on $\Theta$; (R3) At the true parameter value $\theta = \theta_0$, $H(\theta_0) \equiv H(Y, \theta_0)$, as a function of the sample $Y$, follows the uniform distribution $U(0, 1)$.
If a cumulative distribution function of a statistic, $T(Y)$, is $F_T(T(y); \theta)$, and $F_T$ is a continuous and strictly monotonic (without loss of generality, assume it is decreasing) function in $\theta$ with the parameter space $\Theta$ for each sample $y$, then $H(\theta) = 1 - F_T(T(y); \theta)$ is a confidence distribution for $\theta$ that satisfies the requirements in Definition 1.
Under Condition 1, $H(\tau^2) = 1 - F_Q(q_{obs}; \tau^2)$ is a confidence distribution for $\tau^2$.

It follows from the above that we propose an algorithm of sampling from the confidence distribution, $H(\tau^2) = 1 - F_Q(q_{obs}; \tau^2)$, where $q_{obs}$ is the observed value of $Q$. By applying Lemma 2 and the inverse transformation method, if $U$ is distributed as $U(0, 1)$ then $H^{-1}(U)$ follows the distribution $H(\tau^2)$. A sample $\tilde{\tau}^2 = H^{-1}(u)$ can be computed by a numerical inversion of $H(\tilde{\tau}^2) = u$, where $u$ is an observed value of the random variable $U$. If $H(0) > u$, then the sample is truncated to zero ($\tilde{\tau}^2 = 0$). It follows from Lemma 1 that $F_Q(q; \tau^2)$ is the distribution function of a positive linear combination of $\chi^2$ random variables. It can be calculated with Farebrother’s algorithm.
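The inversion step can be illustrated with a short Python sketch. Note the substitution: instead of Farebrother's algorithm, the distribution function of Cochran's Q is approximated here by plain Monte Carlo simulation with common random numbers, which is far less accurate but needs only the standard library. All function names, the upper search bound, and the toy inputs are assumptions made for illustration.

```python
import math
import random

def simulate_pivots(k, n_sim, seed=1):
    """Pre-draw standard normal vectors that are reused for every tau^2 value."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_sim)]

def cochran_q(y, sigma2):
    v = [1.0 / s for s in sigma2]
    y_bar = sum(vk * yk for vk, yk in zip(v, y)) / sum(v)
    return sum(vk * (yk - y_bar) ** 2 for vk, yk in zip(v, y))

def q_cdf(q_obs, sigma2, tau2, pivots):
    """Monte Carlo estimate of F_Q(q_obs; tau2) when Y_k ~ N(mu, sigma_k^2 + tau2)."""
    sd = [math.sqrt(s + tau2) for s in sigma2]
    hits = 0
    for z in pivots:
        # Q does not depend on mu, so mu = 0 is used for the simulated effects.
        y_sim = [sdk * zk for sdk, zk in zip(sd, z)]
        if cochran_q(y_sim, sigma2) <= q_obs:
            hits += 1
    return hits / len(pivots)

def sample_tau2(q_obs, sigma2, u, pivots, upper=100.0, tol=1e-4):
    """Solve H(tau2) = 1 - F_Q(q_obs; tau2) = u by bisection, truncating at zero."""
    def h(tau2):
        return 1.0 - q_cdf(q_obs, sigma2, tau2, pivots)
    if h(0.0) >= u:          # the solution would be non-positive: truncate to zero
        return 0.0
    if h(upper) < u:         # outside the (assumed) search range: cap at the bound
        return upper
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if h(mid) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical setting: five studies with unit within-study variance, observed Q = 4.
sigma2 = [1.0] * 5
pivots = simulate_pivots(k=5, n_sim=2000)
rng = random.Random(2)
draws = [sample_tau2(4.0, sigma2, rng.random(), pivots) for _ in range(5)]
print(draws)
```

Reusing the same simulated vectors for every value of $\tau^2$ keeps the estimated distribution function stable across bisection steps; a production implementation would replace `q_cdf` with an exact method for linear combinations of $\chi^2$ variables.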
We assessed the empirical properties of the HTS and the proposed prediction intervals via simulation studies.
Simulation data were generated by the random-effects model (1), assuming independent normal errors $\epsilon_k \sim N(0, \sigma_k^2)$ and $u_k \sim N(0, \tau^2)$. We conducted two sets of simulations described below.
, parameter settings that mimic meta-analyses for estimating an overall mean log odds-ratio were determined in simulation (i). The average treatment effectwas fixed at , because the coverage probability is not dependent on the value of . The across-studies variance was set to , or [38, 39]. The within-studies variances were generated from a scaled distribution with one degree of freedom, multiplied by , then truncated to lie within . The number of studies was set to , or .
In reference to Partlett and Riley, parameter settings were determined to evaluate the empirical performance of prediction intervals under various relative degrees of heterogeneity scenarios in simulation (ii). The within-studies variances were generated from , an average within-study variance was set to , and the study sample size was set to , where is a random number from a distribution with degrees of freedom. The degree of heterogeneity is controlled using the ratio . The heterogeneity parameter was set to , or , which corresponds to , or . The average treatment effect was fixed at . The number of studies was set to , or .
For each setting, we simulated 25 000 replications. For each method, two-tailed 95% prediction intervals were calculated. The number of bootstrap samples was set to 5 000. The coverage probability was estimated by the proportion of simulated prediction intervals containing the result of a future study that was generated from a normal distribution .
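The coverage estimate described above can be sketched for the HTS interval as follows. This is a simplified illustration, not the simulation code behind the reported results: the within-study variances, the replication count, and the function names are assumptions, and the $t$ critical value is supplied by the caller.

```python
import math
import random

def hts_interval(y, sigma2, t_crit):
    """HTS plug-in interval based on the DerSimonian and Laird estimate."""
    k = len(y)
    v = [1.0 / s for s in sigma2]
    s1 = sum(v)
    s2 = sum(vk * vk for vk in v)
    y_bar = sum(vk * yk for vk, yk in zip(v, y)) / s1
    q = sum(vk * (yk - y_bar) ** 2 for vk, yk in zip(v, y))
    tau2 = max(0.0, (q - (k - 1)) / (s1 - s2 / s1))
    w = [1.0 / (s + tau2) for s in sigma2]
    w_sum = sum(w)
    mu_hat = sum(wk * yk for wk, yk in zip(w, y)) / w_sum
    half_width = t_crit * math.sqrt(tau2 + 1.0 / w_sum)
    return mu_hat - half_width, mu_hat + half_width

def estimate_coverage(mu, tau2, sigma2, t_crit, n_rep, seed=1):
    """Proportion of replications whose interval contains a new effect from N(mu, tau2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # Marginal model: Y_k ~ N(mu, sigma_k^2 + tau2)
        y = [rng.gauss(mu, math.sqrt(s + tau2)) for s in sigma2]
        theta_new = rng.gauss(mu, math.sqrt(tau2))   # true effect in a future study
        lower, upper = hts_interval(y, sigma2, t_crit)
        if lower <= theta_new <= upper:
            hits += 1
    return hits / n_rep

# Hypothetical setting with K = 5; 3.182 is the 97.5th percentile of t with 3 df.
sigma2 = [0.05, 0.1, 0.2, 0.3, 0.4]
coverage = estimate_coverage(mu=0.0, tau2=0.1, sigma2=sigma2, t_crit=3.182, n_rep=2000)
print(coverage)
```

Comparing the printed proportion with the nominal 0.95 gives a rough check of under-coverage in a single setting; the number of replications here is much smaller than the 25 000 used in the study, so the estimate is correspondingly noisier.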
The results of simulation (i) are presented in Figure 1. The HTS prediction interval could not achieve the nominal level of 95%, and the coverage probabilities were around 90%. Since most meta-analyses include fewer than 20 studies, the coverage performance of the HTS prediction interval is therefore insufficient. The under-coverage of the HTS prediction interval reflects the rough $t$-approximation; in other words, the uncertainty of $\hat{\tau}^2$ is ignored in the HTS prediction interval. The results show that the HTS-HK and HTS-SJ prediction intervals have similar performance. The coverage probabilities for the HTS-HK and HTS-SJ prediction intervals almost retained the nominal level except in situations where the relative degree of heterogeneity is small or moderate. For example, the coverage probabilities of the HTS-HK prediction interval were 86.0%–93.3% for , 91.0%–94.1% for , and 92.5%–94.3% for ; the coverage probabilities of the HTS-SJ prediction interval were 84.1%–93.3% for , 90.1%–94.1% for , and 92.5%–94.4% for . By contrast, the coverage probabilities for the proposed prediction interval almost always retained the nominal level. The only exception was when and , where the coverage probability for the proposed prediction interval was 93.6%, which was slightly below the nominal level. However, in this case, the coverage probabilities for the HTS, HTS-HK, and HTS-SJ prediction intervals were even smaller, at 91.1%, 86.0%, and 84.1%, respectively. Analyses using very small numbers of studies () pose problems for random-effects models, as discussed by Higgins et al. Nevertheless, the proposed method performed well even when . It should be noted that the nominal level was attained for almost all values of the heterogeneity parameter in the proposed prediction interval, and this parameter had very little effect on the interval’s performance.
The results of simulation (ii) are presented in Figure 2. The results show that all HTS prediction intervals have similar performance except for . The coverage probabilities for all HTS prediction intervals almost retained the nominal level for and . The coverage probabilities were too large for and too small for and . In the case of , the coverage probabilities of the HTS-HK and HTS-SJ prediction intervals were too small for , and the coverage probability of the HTS prediction interval was too small for . By contrast, the coverage probabilities for the proposed prediction interval almost always retained the nominal level. The only exception was when , where the coverage probabilities for the proposed prediction interval were 93.3%–95.5%, which were slightly below the nominal level at the lower end.
In summary, the HTS prediction intervals had poor coverage performance except in situations where the relative degree of heterogeneity is large, and may show severe under-coverage under realistic meta-analysis settings involving medical research, possibly providing misleading results and interpretations. By contrast, since the proposed prediction interval could mostly achieve the nominal level, it can be recommended in practice.
We applied the methods to three published random-effects meta-analyses. The three data sets were
Systolic blood pressure (SBP) data: Riley et al.  analyzed a hypothetical meta-analysis with a random-effects model. They supposed that the data included 10 studies of the same antihypertensive drug.
These data sets are reproduced in Figure 3. The number of bootstrap samples was set to 50 000.
| | Set-shifting data | Pain data | SBP data |
|---|---|---|---|
| 95% CI (DL) | [0.19, 0.53] | [-0.55, -0.30] | [-0.48, -0.18] |
| $p$-value for heterogeneity | 0.209 | 0.012 | 0.001 |
| 95% PI, proposed | [-0.13, 0.85] | [-0.89, 0.02] | [-0.88, 0.23] |
| 95% PI, HTS | [-0.02, 0.74] | [-0.84, -0.02] | [-0.76, 0.09] |
| 95% PI, HTS-HK | [0.05, 0.67] | [-0.78, -0.06] | [-0.99, 0.33] |
| 95% PI, HTS-SJ | [0.06, 0.67] | [-0.77, -0.07] | [-0.98, 0.33] |
Table 1 presents the estimated results for the average treatment effect and its confidence interval, heterogeneity measures, the -value for the test of heterogeneity, the proposed prediction interval, and the HTS prediction intervals. None of the confidence intervals for the average treatment effect included (Set-shifting data: ; Pain data: ; SBP data: ). This means that on average the interventions are significantly effective. However, substantial between-studies heterogeneities were observed in the three data sets (Set-shifting data: , ; Pain data: , ; SBP data: , ). Accounting for the heterogeneities, prediction intervals would provide additional relevant statistical information.
As shown in Figure 3 and summarized in Table 1, the proposed prediction intervals (Set-shifting data: , length = ; Pain data: , length =; SBP data: , length = ) were consistently wider than the HTS prediction intervals (Set-shifting data: , length = ; Pain data: , length = ; SBP data: , length = ). The lengths of the proposed prediction intervals were , , and wider than the HTS prediction intervals for the Set-shifting data, Pain data, and SBP data, respectively. The HTS-HK (Set-shifting data: , length = ; Pain data: , length =; SBP data: , length = ) and HTS-SJ (Set-shifting data: , length = ; Pain data: , length =; SBP data: , length = ) prediction intervals give similar results.
The prediction intervals may lead to different interpretations of the results. In particular, for the Pain data, the HTS prediction intervals did not include 0, meaning that the intervention may be beneficial in most subpopulations. On the other hand, the proposed prediction interval included 0, which indicates that the intervention may not be beneficial in some subpopulations. The simulation results in Section 3 suggest that the HTS prediction intervals could have under-coverage in situations where the relative degree of heterogeneity is small or moderate. Since of three data sets and of Set-shifting and Pain data were small (), the HTS prediction intervals may be too narrow under realistic situations and may provide misleading results. By contrast, our proposed method enables adequate evaluations of the statistical error in the predictive inference.
For the random-effects model in meta-analysis, the average treatment effect and its confidence interval have been used with heterogeneity measures such as the $I^2$-statistic and $\tau^2$. However, results from random-effects models have sometimes been misinterpreted. Thus, the concept of a prediction interval was proposed, which is useful in applying the results to other subpopulations and in decision making. The HTS prediction intervals have a theoretical problem, namely that their rough $t$-approximation could have a detrimental impact on the coverage probability. Therefore, we have presented a prediction interval that accounts for the uncertainty of $\hat{\tau}^2$ by using a confidence distribution. We also proved a useful theorem for applying the confidence distribution.
Simulation studies showed that the HTS prediction intervals could have severe under-coverage for realistic meta-analysis settings and might lead to misleading results and interpretations. The simulation results suggested that the HTS prediction interval may be too narrow when considering a small number of studies. This interval would be valid if , but such a large number of studies can rarely be expected in common meta-analysis settings. The HTS-HK and HTS-SJ prediction intervals may be too narrow when the relative degree of heterogeneity is small. By contrast, the coverage probabilities for the proposed prediction interval satisfactorily retained the nominal level. Although Higgins et al.  cautioned that the random-effects model may not work well under very small numbers of studies (), the proposed method performed well even when . Since the heterogeneity parameter had very little effect on the performance of the proposed prediction interval, the method would be valid regardless of the value of the heterogeneity parameter.
Applications to the three published random-effects meta-analyses concluded that substantially different results and interpretations might be obtained from the prediction intervals. Since the HTS prediction interval is always narrower and the HTS-HK and HTS-SJ prediction intervals are narrower when the heterogeneity parameter is small or moderate, we should be cautious in using and interpreting these approaches.
In conclusion, we showed that the proposed prediction interval works well and can be recommended for random-effects meta-analysis in practice. As shown in the three illustrative examples, quite different results and interpretations might be obtained with our new method. Extensions of these results to other complicated models such as network meta-analysis are now warranted.
(R1) Since is a continuous distribution function, is continuous on . (R2) By the continuity of , a derivative, , exists, and . By (R1) and the monotone decreasingness of , and . Therefore, can be written as . Writing , we find . Thus, is clearly a cumulative distribution function on . (R3) At the true parameter value , it follows that . Thus, by Definition 1, is a confidence distribution for the parameter , and is a confidence density function for . ∎
From Lemma 1, the cumulative distribution function of is . is a continuous and strictly decreasing function in . Note that we have considered the untruncated version of an estimator of with the parameter space , and can be negative. Applying Theorem 1, we easily show that is a confidence distribution for . ∎
This study was supported by CREST from the Japan Science and Technology Agency (Grant number: JPMJCR1412).