The Optimal Design of Clinical Trials with Potential Biomarker Effects, A Novel Computational Approach

05/21/2020 ∙ by Yitao Lu, et al. ∙ University of Victoria 0

As a future trend of healthcare, personalized medicine tailors medical treatments to individual patients. It requires to identify a subset of patients with the best response to treatment. The subset can be defined by a biomarker (e.g. expression of a gene) and its cutoff value. Topics on subset identification have received massive attention. There are over 2 million hits by keyword searches on Google Scholar. However, how to properly incorporate the identified subsets/biomarkers to design clinical trials is not trivial and rarely discussed in the literature, which leads to a gap between research results and real-world drug development. To fill in this gap, we formulate the problem of clinical trial design into an optimization problem involving high-dimensional integration, and propose a novel computational solution based on Monte-Carlo and smoothing methods. Our method utilizes the modern techniques of General-Purpose computing on Graphics Processing Units for large-scale parallel computing. Compared to the standard method in three-dimensional problems, our approach is more accurate and 133 times faster. This advantage increases when dimensionality increases. Our method is scalable to higher-dimensional problems since the precision bound is a finite number not affected by dimensionality. Our software will be available on GitHub and CRAN, which can be applied to guide the design of clinical trials to incorporate the biomarker better. Although our research is motivated by the design of clinical trials, the method can be used widely to solve other optimization problems involving high-dimensional integration.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

Personalized medicine is the future trend of healthcare, which tailors medical treatments to individual patients. When treatments are given to the correct subset of patients who have positive responses, patients can receive the most suitable treatment according to their personal characteristics, and pharmaceutical companies can increase the chance of success in their clinical trials. The subset of patients who respond to a drug is usually defined by a predictive biomarker (e.g. expression level of a gene or a protein) and its threshold value. Biomarkers are typically molecular (e.g. expression level of a gene) related to increased benefit (or toxicity) from a particular drug [3]. For example, when HER2 expression level exceeds a threshold (i.e. over-expressed), patients respond well to multiple cancer drugs, including Trastuzumab [5]. Clinical trials with patient selection based on the expression level of HER2 typically show a positive effect in approximately 20% of metastatic breast cancer patients [5, 17]. Similarly, the Pembrolizumab is a new immune-oncology drug, which is only prescribed to cancer patients with high PD-L1 levels [7]. On the other hand, here is an example of failed drug development due to its ignorance of predictive biomarkers. In 2003, the US Food & Drug Administration (FDA) gave accelerated approval to Gefitinib for lung cancer patients who failed with platinum-based chemotherapy. This approval allows the drug to be sold to patients before its Phase III trial, but its formal approval still requests upon a successful Phase III trial. However, in 2005, FDA withdrew the approval because of lack of evidence of drug efficacy in the subsequent Phase III trial. The post-trial analysis found that only patients with gain-of-function mutations in the ATP-binding domain of the receptor respond to Gefitinib [13]. In 2015, the FDA granted final approval Gefitinib to patients with EFGR gene mutations as the biomarker. In summary, using biomarkers to select subpopulations for clinical trials not only can increase the chance of success but also decrease the costs (time and money) in drug development. More importantly, we can save lives by forcing non-responders to use alternative treatments.

In the community of clinical and biology research, ‘subset identification’ received massive attention. As of May 01, 2020, there are about hits by searching this keyword on Google Scholar. However, there is a gap between the research of subset/biomarker identification and its application in real-world drug development. To fill in this gap, we need to pay extra attention as it is challenging to incorporate the identified subsets/biomarkers into the design of Phase III clinical trials. In drug development, Phase III clinical trials are the most important trials, since their purposes are to confirm the drug efficacy and also get approval to the market from regulatory agencies (e.g. FDA). The challenge of using subsets in real-world drug development is that, before conducting Phase III studies, we hardly ever have enough data to confirm the correctness of identified subsets or biomarkers. Hence, it is hard to decide whether or not to use them in trial design, because there is a considerable risk associated with either decision. On the other hand, it is also not practical to use the post-trial data to confirm the subsets or biomarkers, because regulatory agencies can not directly accept the post-study-identified biomarkers or subsets. FDA requests an additional Phase III trial to confirm the results, which leads to a substantial financial loss of multi-million dollars and the delay of multiple years in highly competitive competitions of drug development. Such significant losses are not acceptable for most companies.

Before conducting Phase III trials, most often, we know a potential predictive biomarker (e.g. the expression level of a gene or a protein) but need to figure out its threshold value to identify the subset. Trying multiple threshold values of a biomarker leads to considering drug efficacy in multiple subsets of patients and the entire population. At the design stage, we can seldom know which subset (or entire population) is the true responder set. The current standard approach of the pharmaceutical industry is to migrate the risks by testing drug efficacy both on the entire population and all candidate subsets. When multiple tests are conducted in a single clinical trial, regulatory agencies require controlling the Family-Wised Error Rates (FWER) defined as the probability of false positive in any of the conducted tests. To control FWER, Freidlin et al.

[6] and Jiang et al. [11] proposed designs for alpha-allocation, which enforce the sum of the p-value thresholds of all tests being equal to . As a generalization of Bonferroni correction, Alpha-allocation keeps the assumption that all tests are independent. However, these tests are not independent since they are conducted on the ‘nested’ populations. Therefore, an adjusted method is proposed in Chen et al. [2] to incorporate such dependency. And it allows controlling FWER level at , while the sum of all p-value thresholds exceeds . The authors formulated the problem of the clinical trial design into a constrained optimization problem, which maximizes the study power with the constraint of controlling FWER no more than . They provided a computational solution to find the best p-value thresholds for the tests on two nested populations (‘one’ subset and the entire population). They proposed the optimization problem for two-subset (i.e. three nested population) design problem, but did not provide a computational solution and application examples. In real-world applications, the major challenge in multi-subsets design is the heavy computation load in solving such optimization problems involving higher dimensional integrations. Chen et al. [4] provided a computationally feasible strategy to solve multi-subsets clinical design problems. However, this approach over-simplified the optimization problem by ignoring uncertainty in the design parameters (i.e. replacing the prior distributions by fixed values).

In this paper, we consider the more realistic optimization problems proposed in Chen et al. [2]

and propose a novel computational approach to find the optimal solution for multi-subsets problems in a reasonable amount of time. To solve the optimization problems with high-dimensional integrations, we use on the Monte-Carlo and smoothing methods. Due to the popularity of deep learning, the software and hardware support in General-Purpose computing using Graphics Processing Units (GPU) have exploded recently. We design our algorithm to be compatible with GPU computing, which is

times faster than a standard algorithm using CPU to solve optimization problems of three nested populations. The effect of speed boost increases when dimensionality increases. Our method is also scalable to higher-dimensional problems since the precision bound is a finite number, which is not affected by dimensionality.

In the rest of the paper, in Section 2, we provide notations, introduce problems and challenges in the design of multi-subpopulation clinical trials, and propose a novel computational approach to the optimal design; in Section 3, we illustrate the applications of our method to design multiple clinical trials and compare their speed and precision with the optimal designs obtained by the standard algorithm. Section 4 and 5 contain discussions and conclusions. We provide all mathematical proofs and derivations in the Appendix.

2. Method

2.1. Notations and the Optimization Problem

In Phase III clinical trials, the primary outcome is a time-to-event variable, such as overall survival or progression-free survival in oncology trials. The drug efficacy in the study is tested based on the fitted Cox regression model, specifically based on the Z-statistics defined by the treatment effect on the drug versus placebo. Such tests are conducted on the patients entire population and their multiple subsets simultaneously. We keep the setting used in [2] and extend the problem to the higher dimensional spaces. Therefore, we assume the potential predictive biomarker is a continuous variable (e.g. the expression level of a gene or a protein). A threshold value of this biomarker is needed to distinguish responders and non-responders, but this value is unknown.

We assume different threshold values will be tested, which correspond to nested sub-populations of patients. In the notation below, we order the nested populations by their sample size from large to small. We denote the -th subpopulation contains proportion of patients in the entire population, for and . As defined in [2], the total information units are denoted by , which is of the total number of events expected to be observed in the entire population, hence the information units for the -th subset are . The value of is decided by the sample size of the clinical trial, and it is considered as fixed input in our optimization problem. We denote to be the Z-statistics of treatment effects in these nested populations.

We compare the Z-statistic against the threshold value , which is the quantile of standard normal distribution. If the test is significant in any population , the drug is considered as effective, and the largest effective subpopulation is defined as responders. The task of clinical trial design is to decide the values of the significant levels

, which maximize study power and keep overall type I error ( i.e. FWER) under control.

Under the null hypothesis, the drug is not effective in any of these nested populations, hence

for all

’s. In Section A.1, we derive that the joint distribution of these Z-statistics is a multivariate normal as


We denote as the desired upper bound of overall type I error, whose value is often set as for one-sided test or for two-sided test. Since the drug is considered as effective if it is significant in any tested populations, the FWER should be the probability of making any false positive call in these tests. So the constraint of optimization is



is the cumulative distribution function of the multivariate normal distribution defined in Formula (


Under the alternative hypothesis, the drug is effective in at least one population. To calculate the power, we need to specify the drug effect in each population. We describe the drug effect using , where is the hazard reduction of the survival model in the -th population. As a trivial extension of [10, 2], the joint alternative distribution of is


Investigating the actual effect size of study drug is always the primary goal of clinical trials. Hence, the exact value of is always unknown at the design stage of clinical trials. To incorporate the uncertainty of real effect size, we use the prior joint distribution (instead of fixed values) of drug effect in all these nested populations as an input parameter of study design, whose density function is denoted as . Such prior knowledge of effect sizes ’s are usually obtained from the analysis of the results of early phase (i.e. Phase I or II) trials of the same drug or relevant literature of similar studies.

The expected power with respect to the prior distribution of is given below in Formula (4)


In summary, the optimal design problem of clinical trials is to find the best values of the vector

which can be written as a constrained optimal problem below, that maximizes the expected power defined in Formula (4) subjected to the constraint of Formula (2),


In this problem, () are fixed-value input parameters, whose values are determined by the proportion relationship of nested subpopulations. The is also a fixed-value parameter whose value is determined by sample size of the entire population. The prior distribution of expected drug effect is learned from early phase studies of the same drug or literature.

To solve problem (5), we first re-parametrize this -dimensional constrained problem into a dimensional unconstrained problem by solving as a function of from equation (2). Then we apply the variant of the quasi-Newton method of limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm for bound-constrained optimization(L–BFGS–B) to find the solution. This algorithm approximates the BFGS using a limited amount of computer memory [1, 19].

When , problem (5) is the same as in [2]. The extension of problem (5) from two dimensional to high dimensional is not hard. The real challenge is how to overcome the heavy computational load to find the solution for the optimization problem involving higher dimensional integrations in every iteration of the quasi-Newton method. Next, we will propose our novel computation algorithms to address this challenge.

2.2. Calculate the High-Dimensional Integration by Monte Carlo and GPU Computing

Solving problem (5) requires to evaluate a large number of expected power defined in Formula (4) for various levels of

. Evaluation of the expected power is computing a high-dimensional integration, which is time-consuming and is the major computation challenge in solving our optimization problem. This high dimensional integration can not be further reduced by analytic and closed-form solution; hence it needs to be evaluated using a computational approach. The standard approach approximates such integration by fine-grids sum. The integration region is subdivided and then the sum of the area of each small region is the estimation of Formula  (


). There are two concerns of using standard approach in our problem. First, it is computationally impractical for high dimensional problems due to the curse of dimensionality. For example, to approximate a four-dimensional integration using only

grids in each dimension of , we need to calculate quantities and calculate their sum. Second, using hyper-rectangle or trapezoid to approximate the area under curve in each small region may lead to systematic bias. In some publications, a fixed number is multiplied to the approximation, and the number is ad-hocly chosen, but the real bias is way more complex than that.

To address these two issues, we propose a better approximation algorithm based on Monte Carlo and large-scale parallel computing using GPU as follow. The precision of our method can be further improved by utilizing smoothing technique, which will be discussed in the next subsection.

We randomly generate large number () of samples of from the prior distribution with density function , and then approximate the expected power by


Note the Formula (6) requires evaluating functions ’s, which can be processed in parallel. This type of large scale parallel computing is suitable for GPU computing. The GPU computing allows parallel process tens of thousands of ‘simple’ jobs at a time, which is much more than that of CPU computing can handle (e.g. Tesla V100 can handle 81,920 threads at most one time).

Although calculating the CDF of a multivariate normal distribution is an easy task for CPU, it is not implemented in GPU. Hence we need to design our algorithm by further decomposing each CDF computing task to smaller and handleable jobs by GPU computing. Since CDF is a semidefinite integral, it can be approximated as the sum of results of many simpler operations. We approximate this two-layer integration (inner-layer of CDF evaluation and outer-layer of calculating power) using a two-layer sampling algorithm utilizing GPU computing, as described below.

To evaluate the functions ’s, we generate a large number () inner-layer samples of vector from the multivariate normal distribution given by Formula  (1). For each outer-layer sample vector , we approximate the CDF function as


where is an indicator function that takes value 1 or 0 according to inside expression being true or false, respectively.

From the results of Formulas (6) and (7), the expected power defined in Formula (4) can be approximated by


Formula (7) decomposites the evaluation of CDF into the sum of a lot of comparison operations, which is handleable by GPU. These comparison jobs can be handled by GPU in parallel, and then the sum operations can also be done in parallel by GPU computing.

The sample size needed in the two-layer sampling can be determined by the desired precision of the estimated power, i.e. the bound of

. In Appendix A.2, we derive a bound of this variance, which is given as


For example, if we set in Monte Carlo, the precision of estimated power can be bounded by .

We summarize the procedures discussed in this subsection in Algorithm 1 and Algorithm 2. The Algorithm1 is for calculating the expected power . The Algorithm 2 is for calculating the CDF’s , which is called inside Algorithm 1.



2.3. Improve Accuracy by Smoothing

To find the optimal value of that maximize the expected power, in each iteration of the Newton method, it requires calculating not only power for a given value of , but also its derivative to decide the value of in the next iteration. Since the derivatives cannot be calculated analytically, we have to estimate then numerically in the quasi-Newton method. The accuracy of numerical derivatives affects whether the algorithm is searching the solution in the correct direction. However small bias in estimated power can lead to large errors in estimated derivatives , which makes Newton algorithm hard to converge. To fix this problem, we propose to use a smoothing technique to improve accuracy. In Appendix A.3, we prove that is an (n-1)-variate infinitely differentiable function with respect to . Such smoothness property enables us to borrow information from neighboring points on the surface, and hence remove noise and improve accuracy.

We propose to evaluate at many () different values of using Algorithm 1, and then fit a smooth surface (denoted as ) to these points using Thin Plate Splines (TPS) [8, 14]. The TPS model is well known to be versatile to model any shape of high dimensional smooth surface, and it is more efficient than other functional bases like polynomial splines. We use the optimal solution on the fitted surface as approximate solution of our original optimization problem (5), since this solution can be easily and reliably found using the Newton method for three reasons. First, we can specify good initial point to largely reduce the number of Newton iterations needed to converge. Specifically, among the points on the fitted surface, we select the one with the largest value of as initial point. Second, for any value of , we can quickly evaluate on the fitted TPS with ‘no’ error, hence the derivatives can be precisely estimated with low computational cost. Third, the TPS surface enables borrowing information among neighbor points of ’s to correct their errors via smoothing.

The points of values can be selected using the procedure as follows. For two-sided test, we set , hence for . The solution of re-parametrized and unconstrained optimization problem should be in the -dimensional hyper rectangular . We select the points as grid spanned by equally spaced points in every dimension. Then we solve for each spanned grid point with the constraint function in Formula (2), and identify the invalid grid points using . From the valid grid points we randomly select points of . Based on our experiments, we suggest using for 3-dimensional problems with , and using for 4-dimensional problems with . But user can revise this setting according to their needs.

2.4. Design the nested subpopulations

The discussion above assumes the nested subpopulations are pre-selected, but selecting subpopulations is also an important task of clinical trial design. Selecting subpopulations of is an optimization problem respect to parameters , which could be optimized (together with ) in problem (5). In the design of clinical trials, both the p-value thresholds and the sizes of subpopulations are controllable parameters. However, we deliberately avoid optimizing in problem (5) for the following reason. The choice of is a pure mathematics problem to maximize the study power, which is a common task for designing all clinical trials. On the other hand, the is related to not only potential drug efficacy on final target subpopulation but also future revenue. Whether selling the drug to a slightly larger patient population while scarifying some efficacy involves business decisions, which can be affected by many factors, such as the competitors in the market and the cost-revenue-ratio etc. Hence selecting can be guided by different utility functions in the real-world design of clinical trials. Our modeling strategy is to make selecting as a blackbox component and selecting as a component need specific users input on utility functions, which requires separate the two components in modeling.

For simplicity of discussion, we use the power as the utility function in the rest of this paper to illustrate the method of selecting best values of , but users can change it to any other study-specific utility functions involving both power and other factors. To identify the best values of , we use the following procedure. First, we specify many settings of values of , calculate their corresponding optimal powers solved from problem (5). Second, we fit TPS of optimal power as functions of . At last, we find optimal solution of on the fitted TPS using the same procedure as for the TPS discussed in the last subsection. Note, the smoothing technique can be applied for finding the optimal value of , since it is easy to prove that the function defined in Formula (4) is also an indefinite differentiable function of .

The complexity of our algorithm for selecting is approximately equal to the complexity of solving optimization problem (5) multiplied by the number of settings of used to fit TPS, since fitting TPS and finding optimal point on TPS surfaces can be done very quickly. To fit the TPS of with decent quality, we suggest using at least a hundred of points for the three-nested-population problems, and increase the number for higher dimensional problems. Such procedure requires solving many optimization problems of (5), which makes speed boosting more desired.

Finally, the optimal design is defined by the optimal value of together with optimal solution of under the setting of .

3. Applications in Design of Clinical Trials

Using examples of clinical trial designs, we illustrate the strength of our algorithm. To demonstrate the advantage of our algorithm, we compare our results and speed with a traditional algorithm. In the traditional algorithm, we use the Newton method with L-BFGS-B algorithm, but we approximate the integrations by grid sums.

In these examples, we design a Phase III clinical trial with a potential biomarker effect. The biomarker is a continuous variable, and it has an unknown threshold value for distinguishing responder patients and nonresponders. We consider the design with two candidate threshold values and three candidate threshold values, which correspond to three nested populations and four nested-populations, respectively. To design these trials, we need to solve optimization problems with 3 and 4 dimensional integrations. For input parameters, we use the same setting as described in [2], but extend the structure in higher dimensional problems. We set the total number of events in the entire population of patients in Phase III estimated by


where and

denotes the type-I and type-II error rates, respectively, which are usually set as

( false positive rate for two sided tests) and ( power), and stands for the hazard reduction of the entire population. In the setting of no biomarker effect, we set the drug efficacy to be a constant . Hence, the information unit estimation is 127, indicating that the total number of patients’ events is 508, which is typical for oncology clinical trials. In the setting of continuous biomarker effect, we set the drug efficacy to be in the entire population. Hence, the information unit is 211, and the total number of patients’ events is 844.

The prior distribution of effect sizes in these nested populations (learned from Phase I and II) is assumed to follow a multivariate normal distribution, given by,


where , is the point estimate of hazard reduction for the -th population, and we set to represent the fact that it is harder to get a precise estimation in the literature for the smaller proportion subpopulation.

In the examples of clinical trial design, we need to compare our method with the standard method. To run the standard method within reasonable computation time, we only consider the situation of nested populations. This leads to consider the investigation of two threshold values of biomarker. Since the sizes of subpopulations are controllable parameters in clinical trial design, we need to find the optimal setting of that provides the best optimal power. To fit the TPS power surface of , from the grid spanned points generated by equally spaced points in with step size 0.05, we select all () valid pairs satisfying . Assuming that the drug efficacy is a linear function of the subpopulation’s size, we consider three conditions of biomarker effects: (a) No biomarker effect with drug efficacy as a constant value ; (b) Weak biomarker effect that is continuous as , which linearly increase from to when decreases from to ; and (c) Strong biomarker effect that is continuous as , which linearly increases from to as decreases from to . In summary, this leads to investigating the design of clinical trials under different settings. In deriving the optimal clinical designs, we use equally spaced grids of every dimension of to calculate integrations in the traditional grid sum method. For our novel algorithm, we use random selected values of to fit the TPS , while for each of these points, we calculate the integration using Monte Carlo setting of .

Figure 1 shows the optimal designs in various settings of . The top row shows the optimal values, while the bottom row shows their corresponding optimal powers. From the bottom row, we can learn the optimal setting of which achieve the maximum power. The rotatable 3D plots of these subfigures are available, whose link is given in Appendix A.4. We find two common patterns of relationships among optimal values of , , and . First, when increases to be close to , increases sharply to reach the value of . We find similar relationship between the and . We explain this phenomenon as follows: the subpopulations become identical to the entire population when and approaching . Hence, the three optimal ’s should become identical as well. Second, the sum of our optimal design can be larger than the desired FWER threshold ( for two-sided tests), which is as expected. The alpha-splitting approaches assume all tests are independents, and have to enforce . In contrast, we assume the tests conducted in nested populations are related as modeled in Formula (1) and hence allow more powerful designs as while keeping FWER controlled at .

Figure 1(a) shows the results of no biomarker effect conditions. The best power is , which is achieved at the setting of ( ). This suggests only consider the entire population without subsets, which reduces our optimization problem into one dimension. In this optimal setting, the corresponding optimal p-value thresholds are . Then , are not approaching (i.e. subsets are not identical to the entire population), the optimal always allocates large to the entire population, while and are nearly equal to 0. This decision almost ignored all subsets, which is reasonable since the drug efficacy is a constant and the entire population has a larger sample size than the subsets. Figure 1(b) shows the results of weak biomarker effect conditions. The largest optimal power is , achieved at . This suggests only consider one subpopulation instead of two, which reduces our optimization problem into two dimensions. In this optimal setting, the optimal p-value thresholds are . The subpopulation comprises patient population. Figure 1(c) shows the results of strong biomarker effect conditions. The largest optimal power is , achieved at , which corresponds to the optimal p-value thresholds . This suggests considering two subpopulations, which comprise and patient population, respectively. Note the stronger the biomarker effect, the larger range that drug efficacy can change, and the more subsets suggested by our optimal design. Such a decision trend agrees with our intuition.


Figure 1. Three dimensional optimal design for (a) no biomarker effect, (b) weak biomarker effect , and (c) strong biomarker effect. The rotatable 3D plots of these subfigures are available, whose link is given in Appendix A.4. Each column is generated from the solutions of optimization problems (5) using different settings of and . The top row shows the optimal values, and the bottom row shows their corresponding optimal powers. The optimal setting of can be found on the peak of power surface in the bottom row, which guide us to define the subsets of patients.

Table 3 shows the summary of computation times needed to solve a single optimization problem using each method. This summary is based on all optimization problems solved for generating Figure 1. The speed of our novel method (using Nvidia Tesla V100 GPU on Compute Canada server Beluga) is 133 times faster than the traditional method (using the CPU on the same server). Without the help of the GPU algorithm the speed Monte Carlo approach is only 3.2 times faster than the standard grid sum approach, which illustrates the importance of utilizing GPU computing. Note the standard method uses at most seconds to solve a single optimization problem, which seems to be acceptable. However, in the application of clinical trial design, we always need to understand the relationship of power to the settings of and , which requires solving a large number of optimization problems. Solving all optimization problems used to generate Figure 1 requires hours to run our algorithm and over hours to run the standard grid sum algorithm. The time required could be much more if we consider the optimization problems of more than 3 dimensions. Hence, the speed boost offered by our algorithm is critical in real applications. We will discuss this in more detail in Section 4.


To compare the difference of optimal obtained from our method and standard method, we calculate their relative difference defined as for . Where and denotes optimal solution of obtained by the standard method and the novel method, respectively. Figure 2 shows the relative difference for all alpha values estimated for the optimization problems. Except for no biomarker conditions, most relative differences are well bounded by . The optimal obtained from two methods are not only similar in magnitude, but also statistically (with all non-significant p-values from pairwise Wilcoxon tests).

In no biomarker effect conditions, the solutions of and are very close to . Hence the relative difference should be interpreted differently. In solutions of our method, most values of are less than , hence relative difference bounded by 2 (as shown in boxplot) means both methods suggest almost ignoring the smallest subset. In addition, the Wilcoxon test p value for is non-significant (). The relative difference of is always positive (as seen in Figure 2 ), and the relative difference of is always negative (not enough resolution to show in Figure 2 since the absolute value of optimal is big). Such a strong trend indicates that, compared with the standard method, our method consistently suggests spending more -allocation on the entire population. This a wise suggestion, since in no biomarker condition, we should only test drug efficacy on the entire population to achieve better power.


Figure 2. Boxplot of relative differences between the two methods. The relative difference is defined as for . Where and denotes optimal solution of obtained by the standard method and the novel method, respectively. Except for no biomarker effect conditions, the relative errors are well bounded by the two vertical lines at , which means their relative difference is small.  In addition, the pairwise Wilcoxon test shows no significant difference in two methods, except for and in no biomarker effect conditions. Interpreting the results of no biomarker conditions are more complicated and are discussed in the main text of this paper.

To compare the precision of estimated optimal power between our novel algorithm and the standard algorithm, we need to compare them against the ‘gold standard’. For each optimal setting of values in the left hand side panel of Figure 1, we calculate the expected power using a fine-grid sum approach with grid points for each dimension. The values calculated using fine grid sum approach should be more precise than both standard and novel method discussed above, so we use them as ‘gold standard’ in the precision comparison. We denote the estimated power using the standard method, our novel method, and the fine grid ‘gold standard’, and denoted as ,, and , respectively. For each optimization problem we solved for Figure 1, we calculate precision difference . Positive value of statistics indicates the solution of the novel method is closer to the truth. Figure 3 shows all Q statistics obtained from the optimization problems solved for generating Figure 1. There is only one Q statistic slightly below the horizontal line of , which indicates that our method consistently provides more precise solutions than the traditional method. Wilcoxon test p-values corresponding to all three boxes/conditions are less than . We also find the performance difference between our method and the standard method is not always similar for different kinds of optimization problems. For example, the weak biomarker effect problems have much more difference than strong biomarker effect problems.


Figure 3. Boxplot of precision difference statistics , defined as . The , and denote the optimal power obtained by ”gold standard”, novel, and standard method, respectively.  Positive Q values indicate that our method is more accurate than the standard method. Each box in this figure is made from the solutions of optimization problems in a biomarker effect condition. Almost all data points are above the horizontal line of , which shows that our novel method consistently outperforms the standard method. 

4. Discussion

To generate Figure 1, we need to solve optimization problems, which takes over hours to run the standard method, and hours to run our novel method. This illustrates the importance of speed-boosting in real-world applications. In higher dimensional design, we can speed up more significantly. Based on our limited amount of experiments in the situation of nested populations (results not shown), our method is over times faster than the standard method. The following are the details about expected speed-change in dimensional problems. Compared with dimensional problems, the standard method need times more time to solve an optimization problem, because of the one more dimension of sum with grid size and other related calculations. In comparison, our novel method is about times slower because of the doubled value of and other related calculations. We use to fit -dimensional TPS of , since higher dimensional surface needs more points to achieve a similar level of good fit.

We are not the first algorithm using GPU to evaluate integrals. For example, VEGAS [18] is one of the most popular integration packages in Python. To improve accuracy without much time consumption, the algorithms in VEGAS are combined with an adaptive multi-channel sampling method [15]. BASES [16] is another popular algorithm. It deals with the integration of singular functions. Typical computer clusters can take months to perform the computations that calculate high dimensional integration, but GPU accelerated solutions can massively speed up these computations [9, 12]. However, these methods are mostly developed for general purpose integration. General methods are not efficiently designed to utilize properties of our specific problem, and some methods (such as ZMCIntegral[18]) cannot handle our complex integral functions in our problem. This motivated us to implement our integral functions. In addition, the most contribution of our method is incorporating a smoothing technique to improve precision and solve the converging issue of the Newton methods, which is not proposed in previous studies.

The bound of variance Formula (9) leads to many advantages of sampling-based methods over the standard sum-of-grid-value method. First, because the upper bound of estimation variance is a finite number regardless of dimensionality, our method is scalable to obtain the solution of higher dimensional problems with reasonable precision and reasonable time. In contrast, to keep the same precision level, the complexity of the ‘sum of grid values approximation’ method increases exponentially with dimension. Second, our method can reach any desired precision level by increasing the number of samples (i.e. and ) accordingly, whereas the sum-of-grid approach introduce a systematic error by approximating area under a curve using trapezoid shape.

5. Conclusion

In this work, we proposed a novel computational solution for optimization problems involving high dimensional integration, and applied it to the design of clinical trials. Our algorithm utilizes GPU parallel computing to accelerate computation by Monte Carlo. It incorporates a smoothing technique to not only fix the convergence issue in the Newton method but also improve the precision of estimates. Using examples, we illustrated that our novel algorithm outperforms the standard method in the real- world design of clinical trials. Our method not only boosts the computing speed to solve the design problem within a reasonable amount of time, but also improves the accuracy of estimations. We implement our algorithm using R language with Python functions called inside R. We will make our software available via GitHub and CRAN after the manuscript is accepted, so that all researchers can use to design their clinical trials.

This research is motivated by the problem of the design of clinical trials. Our software is developed to help the design of clinical trials. However, our method of using Monte Carlo, GPU computing, and smoothing can solve other general optimization problems with high-dimensional integration.


The research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants (JZ and XZ) and Canada Research Chair Grant (YL and XZ). This research was enabled in part by support provided by WestGrid ( and Compute Canada (

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


A.1 Derivation of the Covariance Matrix of Normal Distribution in Formula (1)

As discussed in [2], the correlation efficient of Z-statistics between entire population and the -th subpopulation is

Consider any two (the -th and the -th) nested subpopulations for . The -th subpopulation is a subset of the -th subpopulation and the proportion is . We treat as the new ‘full’ population, so we have

A.2 Derivation of the Upper bound of Variance for the Estimated Power


Then, we have, for ,

From Formula (8), we obtain

A.3 Proof of the Smoothness property of P()

Without loss of generality, we prove is a smooth (infinitely differentialble) function of for . From Formula (4), we have


where is the density function of the stadard normal distribution. It is clear that is a continuous function of for all . Similarly, we could abtain the second, third and fourth derivation of , which are all continous functions of . Thus, is a smooth function of , for all .

A.4 The links of rotatable 3D plots corresponding to subfigures in Figure 1

No biomarker effect:
Fitted TPS Surface of the optimal power :
Fitted TPS Surface of the optimal :

Weak biomarker effect:
Fitted TPS Surface of the optimal power :
Fitted TPS Surface of the optimal :

Strong biomarker effect:
Fitted TPS Surface of the optimal power :
Fitted TPS Surface of the optimal :


  • [1] RH Byrd, PH Lu, J Nocedal, and C Zhu. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput, 16(5):1190–1208, 1995.
  • [2] C Chen and RA Beckman. Hypothesis testing in a confirmatory phase iii trial with a possible subset effect. Statistics in Biopharmaceutical Research, 1(4):431–440, 2009.
  • [3] C Chen and RA Beckman. Efficient, adaptive clinical validation of predictive biomarkers in cancer therapeutic development. Advances in Experimental Medicine and Biology, 867:81–90, 2015.
  • [4] C Chen, Nicole Li, Y Shentu, L Pang, and RA Beckman. Adaptive informational design of confirmatory phase iii trials with an uncertain biomarker effect to improve the probability of success. Statistics in Biopharmaceutical Research, 8(3):237–247, 2016.
  • [5] MA Cobleigh, CL Vogel, and D Tripathy. Multinational study of the efficacy and safety of humanized anti-her2 monoclonal antibody in women who have her2-overexpressing metastatic breast cancer that has progressed after chemotherapy for metastatic disease. Journal of Clinical Oncology, 17(9):2639–2648, 1999.
  • [6] B Freidlin and R Simon. Adaptive signature design: An adaptive clinical trial design for generating and prospectively testing a gene expression signature for sensitive patients. Clinical Cancer Research, 11(21):7872–7878, 2005.
  • [7] EB Garon, NA Rizvi, and R Hui. Pembrolizumab for the treatment of non-small-cell lung cancer. New England Journal of Medicine, 372(21):2018–2028, 2015.
  • [8] PJ Green and BW Silverman. Nonparametric Regression and Generalized Linear Models: A roughness penalty approach. Chapman and Hall/CRC, 1993.
  • [9] M Hernández, GD Guerrero, and JM Cecilia. Accelerating fibre orientation estimation from diffusion weighted magnetic resonance imaging using gpus. PLOS ONE, 8(4):e61892, 2013.
  • [10] Christopher Jennison and BW Turnbull. Group Sequential Methods with Applications to Clinical Trials. Chapman and Hall/CRC, 1999.
  • [11] W Jiang, B Freidlin, and R Simon. Biomarker-adaptive threshold design: A procedure for evaluating treatment with possible biomarker-defined subset effect. Journal of National Cancer Institute, 99(13):1036–1043, 2007.
  • [12] P Klus, S Lam, and D Lyberg. Barracuda-a fast short read sequence aligner using graphics processing units. BMC Res Notes 5, 5(1), 2012.
  • [13] TJ Lynch, DW Bell, R Sordella, and S Gurubhagavatula. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-smallcell lung cancer to gefitinib. New England Journal Medicine, 350(21):2129–2139, 2004.
  • [14] D Nychka, R Furrer, J Paige, and S Sain. fields: Tools for spatial data, 2017. R package version 10.0.
  • [15] T Ohl. Vegas revisited: Adaptive monte carlo integration beyong factorization. Computer Physics Communications, 120(1):13–19, 1999.
  • [16] Kawabata Setsuya. A new version of the multi-dimensional integration and event generation package bases/spring. Computer Physics Communications, 88:309–326, 1995.
  • [17] DJ Slamon, B Leyland-Jones, and S Shak. Use of chemotherapy plus a monoclonal antibody against her2 for metastatic breast cancer that overexpresses her2. New England Journal of Medicine, 344(11):783–792, 2001.
  • [18] Hong-Zhong Wu, JunJie Zhang, Long-Gang Pang, and Qun Wang. Zmcintegral: A package for multi-dimensional monte carlo integration on multi-gpus. Computer Physics Communications, 248:106962, 2020.
  • [19] C Zhu, RH Byrd, Peihuang Lu, and Jorge Nocedal. L-bfgs-b: Algorithm 778: L-bfgs-b, fortran routines for large scale bound constrained optimization. ACM Transactions on Mathematical Software, 23(4):550–560, 1997.