CAPITAL: Optimal Subgroup Identification via Constrained Policy Tree Search

10/11/2021
by   Hengrui Cai, et al.
NC State University
Merck

Personalized medicine, a paradigm of medicine tailored to a patient's characteristics, is an increasingly attractive field in health care. An important goal of personalized medicine is to identify a subgroup of patients, based on baseline covariates, that benefits more from the targeted treatment than other comparative treatments. Most of the current subgroup identification methods only focus on obtaining a subgroup with an enhanced treatment effect without paying attention to subgroup size. Yet, a clinically meaningful subgroup learning approach should identify the maximum number of patients who can benefit from the better treatment. In this paper, we present an optimal subgroup selection rule (SSR) that maximizes the number of selected patients, and in the meantime, achieves the pre-specified clinically meaningful mean outcome, such as the average treatment effect. We derive two equivalent theoretical forms of the optimal SSR based on the contrast function that describes the treatment-covariates interaction in the outcome. We further propose a ConstrAined PolIcy Tree seArch aLgorithm (CAPITAL) to find the optimal SSR within the interpretable decision tree class. The proposed method is flexible to handle multiple constraints that penalize the inclusion of patients with negative treatment effects, and to address time to event data using the restricted mean survival time as the clinically interesting mean outcome. Extensive simulations, comparison studies, and real data applications are conducted to demonstrate the validity and utility of our method.

1 Introduction

Personalized medicine, a paradigm of medicine tailored to a patient’s characteristics, is an increasingly attractive field in health care (Kosorok and Laber, 2019). Its ultimate goal is to optimize the outcome of interest by assigning the right treatment to the right patients. To ensure the success of personalized medicine, it is important to identify a subgroup of patients, based on baseline covariates, that benefits more from the targeted treatment than other comparative treatments (Loh et al., 2019). The resulting identification strategy is referred to as a subgroup selection rule (SSR). Subgroup analysis, if properly used, can lead to better-informed clinical decisions and improved demonstration of the efficacy of the treatment.

Though various data-driven methods for subgroup identification (Song and Pepe, 2004; Su et al., 2009; Foster et al., 2011; Cai et al., 2011; Sivaganesan et al., 2011; Imai and Ratkovic, 2013; Loh et al., 2015; Fu et al., 2016) have been developed during the recent decade (see the comprehensive review in Lipkovich et al., 2017), these works focus only on obtaining a subgroup with an enhanced treatment effect or identifying patients who benefit more from the new treatment, and usually yield a smaller and thus less satisfactory group of selected patients. To see this, we apply the virtual twins (VT) method (Foster et al., 2011) to identify the subgroup in a simulated scenario (Scenario 1; see the detailed setting in Section 5.1) for an illustration. The desired average treatment effect is 1.0, with a corresponding optimal subgroup sample proportion of 50%, i.e., half of the population should be selected into the subgroup. Yet, as summarized in Table 1 over 200 replications, the selected sample proportion under the VT method is less than 30% even at the largest sample size, with an overestimated average treatment effect. Identifying the largest possible subgroup of patients that benefit from a given treatment at or above some clinically meaningful threshold can be critical both for the success of a new treatment, and most importantly for the patients who may rely on a treatment for their health and survival. When too small of a subgroup is selected, the erroneously unselected patients may suffer from suboptimal treatments. For a test treatment, this reduced subgroup size can further lead to problems with regulatory approvals or drug reimbursements that in extreme cases may even halt compound development and availability. In the above example, where less than 30% of patients are selected as benefitting from the new treatment, a drug approval may be unlikely, though in truth half of all subjects do show substantial improvement in health from the new treatment. Post-approval accessibility can also be hindered by a lackluster subgroup size, especially in countries with all-or-nothing reimbursement markets where the seemingly low proportion of benefiting patients leads to low reimbursements that may not be financially sustainable for continued treatment manufacturing. A subgroup learning approach that selects as many patients as possible with evidence of a clinically meaningful benefit from treatment is thus desired so that more patients can receive the better treatment.

In this paper, we aim to solve the subgroup optimization problem of finding the optimal SSR that maximizes the number of selected patients while achieving a pre-specified clinically desired mean outcome, such as the average treatment effect. There are two major difficulties in developing the optimal SSR. First, there is a trade-off between the size of the selected subgroup and its corresponding average treatment effect: the more patients selected, the lower the average treatment effect we can achieve. To optimize the size of the subgroup while maintaining an enhanced treatment effect, constrained optimization is required. Second, most existing optimization approaches with constraints (see, e.g., Wang et al., 2018 and Zhou et al., 2021) used complex decision rules and thus were hard to interpret. In this paper, we focus on tree-based decision rules to develop an interpretable optimal SSR.

Our contributions can be summarized as follows. First, we derive two equivalent theoretical forms of the optimal SSR based on the contrast function that describes the treatment-covariates interaction in the outcome. Second, according to the theoretical optimal SSR, we propose a ConstrAined PolIcy Tree seArch aLgorithm (CAPITAL) to optimize the subgroup size and achieve the pre-specified clinical threshold. Specifically, we transform the loss function of the constrained optimization into individual rewards defined at the patient level. This enables us to identify the patients with a large mean outcome and develop a decision tree to generate an interpretable subgroup. For instance, recall the toy example at the beginning of this section: in contrast to the current subgroup identification methods, the selected sample proportion under the proposed method is nearly optimal and its average treatment effect under the estimated SSR is close to the truth; see Table 1 and Section 5.1 for details. Third, we extend our proposed method to the framework with multiple constraints that penalize the inclusion of patients with negative treatment effects, and to time-to-event data, using the restricted mean survival time as the clinically interesting mean outcome. Extensive simulations, comparison studies, and real data applications are conducted to demonstrate the validity and utility of our method. The source code is publicly available in our repository at https://github.com/HengruiCai/CAPITAL, implemented in the R language.

Method | Metric | Results (three increasing sample sizes)
Virtual Twins | Selected Sample Proportion | 0.21(0.13) 0.24(0.10) 0.26(0.07)
Virtual Twins | Average Treatment Effect | 1.19(0.21) 1.37(0.18) 1.45(0.13)
CAPITAL | Selected Sample Proportion | 0.46(0.16) 0.48(0.09) 0.50(0.06)
CAPITAL | Average Treatment Effect | 0.90(0.27) 1.00(0.15) 0.99(0.11)
Table 1: Empirical results of subgroup identification (using Virtual Twins (Foster et al., 2011)) and subgroup optimization using CAPITAL under Scenario 1, with the desired average treatment effect of 1.0 and the optimal subgroup sample proportion of 50%. The three columns correspond to increasing sample sizes. The results are averaged over 200 replications with standard deviations presented in the parentheses.

1.1 Related Works

There are numerous data-driven methods proposed for subgroup identification. Song and Pepe (2004) considered using the selection impact curve to evaluate treatment policies for a binary outcome based on a single baseline covariate. Foster et al. (2011) developed a virtual twins method which first predicts the counterfactual outcome for each individual under both the test and control treatments, and then uses tree-based methods to infer the subgroups with an enhanced treatment effect. Cai et al. (2011) proposed using parametric scoring systems based on multiple baseline covariates to rank treatment effects and then identified patients who benefit more from the new treatment using the ranked effect sizes. A useful tutorial and preliminary literature review for commonly used subgroup identification methods is provided in Lipkovich et al. (2017). Yet, all these methods focus on subgroup identification but not subgroup optimization, potentially leading to a greatly reduced number of selected patients. More details can be found in our comparison studies (Section 5.1).

Recently, a number of approaches have been developed to handle constrained optimization problems. Wang et al. (2018) proposed an individualized optimal decision rule that maximizes the clinical benefit for patients and controls the risk of adverse events, based on outcome weighted learning. Guan et al. (2020) estimated the optimal dynamic treatment regime under a constraint on the cost function by leveraging nonparametric Bayesian dynamics modeling with policy search algorithms. To handle the trade-off between the primary event of interest and the time to severe side effects of treatment in competing risks data, Zhou et al. (2021) derived a restricted optimal treatment regime based on the penalized value search method. However, none of the cited works are applicable to our problem, as they only focus on optimizing the mean outcome of interest while we also consider the size of the subgroup. In addition, since the loss functions in both outcome weighted learning and value search methods are defined based on the whole sample, it is infeasible to search the interpretable class of decision trees using these methods.

The rest of this paper is organized as follows. We first formulate our problem in Section 2. In Section 3, we establish the theoretical optimal SSR that achieves our objective, and then propose CAPITAL to solve the optimal SSR. We extend our work to multiple constraints and survival data in Section 4. Simulation and comparison studies are conducted to evaluate our methods in Section 5, followed by the real data analysis in Section 6. In Section 7, we conclude our paper. All the technical proofs and additional simulation results are provided in the appendix.

2 Problem Formulation

Let X denote a p-dimensional vector containing an individual's baseline covariates, with support 𝒳 ⊂ ℝ^p, and let A ∈ {0, 1} denote the binary treatment an individual receives. After a treatment is assigned, we observe the outcome of interest Y with support 𝒴 ⊂ ℝ. Let Y*(0) and Y*(1) denote the potential outcomes that would be observed after an individual receives treatment 0 or 1, respectively. Define the propensity score function as the conditional probability of receiving treatment 1 given baseline covariates X = x, denoted as π(x) = P(A = 1 | X = x). Denote n as the sample size. The sample consists of observations {(X_i, A_i, Y_i) : i = 1, ..., n}, independent and identically distributed (i.i.d.) across i.

As standard in the causal inference literature (Rubin, 1978), we make the following assumptions:

(A1). Stable Unit Treatment Value Assumption (SUTVA): Y = A Y*(1) + (1 − A) Y*(0).

(A2). Ignorability: {Y*(0), Y*(1)} are independent of A given X.

(A3). Positivity: 0 < π(x) < 1 for all x in the support of X.

Based on assumptions (A1) and (A2), we define the contrast function as

C(x) = E{ Y*(1) − Y*(0) | X = x } = E(Y | X = x, A = 1) − E(Y | X = x, A = 0),

which describes the treatment-covariates interaction in the outcome. Under assumptions (A1) to (A3), the contrast function is estimable from the observed data.
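One way to obtain such estimates in practice is with a flexible machine learning method; the minimal sketch below uses a causal forest from the grf package, which is our assumed choice of estimator (not necessarily the one used in the paper). Here X, A, and Y denote the covariate matrix, treatment vector, and outcome vector.

    # Minimal sketch: estimating the contrast function C(x) = E{Y*(1) - Y*(0) | X = x}
    # (grf::causal_forest is an assumed estimator; X, A, Y are assumed inputs)
    library(grf)

    cf    <- causal_forest(X = X, Y = Y, W = A)   # forest targeting the treatment contrast
    C_hat <- predict(cf)$predictions              # out-of-bag estimates of C(X_i)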

Define the subgroup selection rule (SSR) as D(·): 𝒳 → {0, 1}, which assigns a patient with baseline covariates x to the subgroup (D(x) = 1) or not (D(x) = 0). Denote the class of SSRs as Π. The goal is to find an optimal SSR that maximizes the size of the subgroup and also maintains a desired mean outcome such as the average treatment effect (ATE), i.e.,

D^opt = argmax_{D ∈ Π} E[ I{D(X) = 1} ]   subject to   E{ Y*(1) − Y*(0) | D(X) = 1 } ≥ δ,     (1)

where δ is a pre-specified threshold of clinically meaningful average treatment effect.

3 Method

In this section, we first establish the theoretical optimal SSR that achieves our objective, and then propose CAPITAL to solve the optimal SSR.

3.1 Theoretical Optimal SSR

We first derive the theoretical optimal SSR that solves the objective in (1). Based on assumptions (A1) and (A2), the constraint in (1) can be represented by

E{ C(X) | D(X) = 1 } ≥ δ.

Given the pre-specified threshold δ, we denote a cut point η_δ associated with the contrast function such that the expectation of the contrast function larger than η_δ achieves δ, i.e.,

E{ C(X) | C(X) ≥ η_δ } = δ.     (2)

By introducing η_δ, when we are maximizing the subgroup size, the treatment effect of each patient in the subgroup is ensured to meet the minimum acceptable beneficial effect size. We illustrate the density function of the contrast function C(X) with the cut point η_δ for the pre-specified threshold δ in Figure 1. The yellow area in Figure 1 contains the patients whose contrast functions are larger than η_δ and thus satisfy (2).

Figure 1: Illustration of the density function of the contrast function C(X) with the cut point η_δ for the pre-specified threshold δ.

Intuitively, the theoretical optimal SSR should choose the patients whose contrast functions fall into the yellow area in Figure 1, i.e., those whose treatment effects are larger than η_δ, to maximize the size of the subgroup. Without loss of generality, we consider the class of theoretical SSRs

Π_η = { D : D(x) = I{C(x) ≥ η}, η ∈ ℝ }.

Here, for a given η, the SSR selects a patient into the subgroup if his or her contrast function is larger than η. The following theorem gives the theoretical optimal SSR.

Theorem 3.1

(Theoretical Optimal SSR) Assuming (A1) and (A2), the optimal subgroup selection rule is

D^opt(x) = I{ C(x) ≥ η_δ }.     (3)

Equivalently, the optimal subgroup selection rule is

D^opt(x) = I[ E{ C(X) | C(X) ≥ C(x) } ≥ δ ].     (4)

The proof of Theorem 3.1 consists of two parts. First, we show that the optimal SSR is D(x) = I{C(x) ≥ η_δ}, where η_δ satisfies (2), within the class Π_η. Second, we derive the equivalence between (3) and (4). See the detailed proof of Theorem 3.1 provided in the appendix.

From Theorem 3.1 and the definition of the cut point η_δ, the optimal SSR can be found based on the density of the contrast function. Since the density function is usually unknown in practice, we use the estimated contrast function for each patient, i.e., the estimated individual treatment effect, to approximate the density function. A constrained policy tree search algorithm to solve for the optimal SSR is provided in the next section.
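As a purely hypothetical numerical check of the cut-point definition in (2), the sketch below assumes a uniform contrast distribution and δ = 1; these choices are assumptions, picked only because they reproduce the 50% optimal proportion quoted for Scenario 1 in the introduction.

    # Hypothetical check of (2): assume C(X) ~ Uniform(-2, 2) and delta = 1
    set.seed(1)
    C_sim     <- runif(1e5, -2, 2)
    delta     <- 1
    eta_grid  <- seq(-2, 2, by = 0.01)
    cond_mean <- sapply(eta_grid, function(e) mean(C_sim[C_sim >= e]))
    eta_delta <- eta_grid[which(cond_mean >= delta)[1]]   # smallest cut point meeting delta
    eta_delta                 # approximately 0
    mean(C_sim >= eta_delta)  # approximately 0.5, i.e., half of the population is selected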

3.2 Constrained Policy Tree Search Algorithm

In this section, we formally present CAPITAL. First, we transform the constrained optimization in (1) into individual rewards defined at the patient level. This enables us to identify patients more likely to benefit from treatment. Then, we develop a decision tree to partition these patients into the subgroups based on the policy tree algorithm proposed by Athey and Wager (2021).

We focus on the SSR in the class of finite-depth decision trees. Specifically, for any depth L ≥ 1, a depth-L decision tree is specified via a splitting variable X_j, a threshold Δ, and two depth-(L − 1) decision trees tree_left and tree_right, such that D(x) = tree_left(x) if x_j < Δ, and D(x) = tree_right(x) otherwise. Denote the class of decision trees as Π_DT. We illustrate a simple decision tree with splitting variables X1 and X2 in Figure 2. This decision tree assigns D(x) by first splitting on X1 and then splitting on X2 within each branch.

Figure 2: Illustration of a simple decision tree with splitting variables X1 and X2.

Define R(X) = C(X) − δ as the difference between the contrast function and the desired average treatment effect δ. Under (A1)-(A3), we can estimate the contrast function, denoted as Ĉ(·), using the random forest method with out-of-bag prediction (see, e.g., Lu et al., 2018). Define R̂(X_i) = Ĉ(X_i) − δ. It is immediate from Figure 1 that a patient with a larger R̂(X_i) is more likely to be selected into the subgroup. We sort the estimates in decreasing order as

R̂_(1) ≥ R̂_(2) ≥ ... ≥ R̂_(n).

This sequence gives an approximation of the density of R(X).

We further define the cumulative mean based on the above sequence as

γ_i = (1/i) Σ_{j=1}^{i} R̂_(j),   i = 1, ..., n.

With a sufficiently large sample size, γ_i converges to the average treatment effect minus the desired effect δ within the selected patients whose contrast function is larger than the (i/n)-upper quantile of the density of C(X), i.e.,

γ_i → E{ C(X) − δ | C(X) ≥ q_{i/n} },

where q_{i/n} is the (i/n)-upper quantile of the density of C(X), as n goes to infinity.
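A minimal sketch of this construction is given below, assuming C_hat holds the estimated contrast for each patient (e.g., from the forest sketched in Section 2) and delta is the desired average treatment effect.

    # Sort the differences R_hat = C_hat - delta and form the cumulative means gamma_i
    R_hat  <- C_hat - delta
    ord    <- order(R_hat, decreasing = TRUE)
    R_sort <- R_hat[ord]                          # R_(1) >= R_(2) >= ... >= R_(n)
    gamma  <- cumsum(R_sort) / seq_along(R_sort)  # gamma_i: mean of the top-i values
    rank_i <- match(seq_along(R_hat), ord)        # rank rho(i) of each patient in the sorted sequence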

As long as γ_i is larger than zero, the selected subgroup satisfies the constraint in (1), based on the theoretical optimal SSR in (4) from Theorem 3.1. Therefore, to solve (1) we need to select the patients with positive γ and maximize the subgroup size. To do this, we define the reward of the i-th individual based on the sign of γ as follows:

Reward 1:

r_i = I{D(X_i) = 1} · sign(γ_{ρ(i)}),     (5)

where ρ(i) is the rank of R̂(X_i) in the sorted sequence R̂_(1) ≥ ... ≥ R̂_(n) (equivalently, the index of the corresponding γ), and 'sign' is the sign operator such that sign(x) = 1 if x > 0, sign(x) = −1 if x < 0, and sign(x) = 0 if x = 0. Given γ_{ρ(i)} is positive, the reward is 1 if the patient is selected to be part of the subgroup, and is 0 otherwise. Likewise, supposing γ_{ρ(i)} is negative, the reward is −1 if the patient is selected to be in the subgroup, i.e., D(X_i) = 1, and is 0 otherwise. This is in accordance with the intuition that we should select patients whose γ is larger than zero.

To encourage the decision tree to include patients who have a larger treatment effect, we also propose the following reward choice based on the value of γ directly:

Reward 2:

r_i = I{D(X_i) = 1} · γ_{ρ(i)}.     (6)

The optimal SSR is searched within the decision tree class Π_DT to maximize the sum of the individual rewards defined in (5) or (6). Specifically, the decision tree allocates each patient to the subgroup or not, and receives the corresponding reward. We use exhaustive search to estimate the optimal SSR that maximizes the total reward, using the policy tree algorithm proposed in Athey and Wager (2021). The simulation studies (Section 5) show that the performance is very similar under these two reward choices.
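A sketch of this search step follows, continuing the objects from the previous sketch; 'policytree' is the package cited in the paper, while the two-column reward layout and the depth choice are our assumptions about how the "select" / "do not select" actions are encoded.

    library(policytree)

    # Reward 1 (eq. (5)): sign of the cumulative mean at each patient's rank;
    # Reward 2 (eq. (6)) would use gamma[rank_i] itself.
    reward_selected <- sign(gamma[rank_i])
    Gamma <- cbind(not_selected = 0, selected = reward_selected)  # n x 2 reward matrix

    ssr_tree <- policy_tree(X, Gamma, depth = 2)   # exhaustive search over depth-2 trees
    subgroup <- predict(ssr_tree, X) == 2          # action 2 corresponds to "selected"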

We denote the estimated optimal SSR that maximizes the size of the subgroup while maintaining the desired average treatment effect as D̂. The proposed algorithm not only yields an interpretable SSR (see more discussion in Section 5), but is also flexible enough to handle multiple constraints and survival data, as discussed in detail in the next section.

4 Extensions

In this section, we discuss two main extensions of CAPITAL for solving (1). We first address multiple constraints on the average treatment effect in Section 4.1, and then handle the time to event data with the restricted mean survival time as the clinically interesting mean outcome in Section 4.2.

4.1 Extension to Multiple Constraints

In addition to the main constraint described in (1), in reality there may exist secondary constraints of interest. For instance, besides a desired average treatment effect, the individual treatment effect of each selected patient should be greater than some minimum beneficial value. Under such multiple constraints, the optimal SSR is defined by

D^opt = argmax_{D ∈ Π} E[ I{D(X) = 1} ]   subject to   E{ Y*(1) − Y*(0) | D(X) = 1 } ≥ δ   and   C(x) ≥ τ for all x with D(x) = 1,     (7)

where τ is a pre-specified minimum beneficial value. In the rest of this paper, we focus on the case with τ = 0, that is, the individual treatment effect of each selected patient should be nonnegative so that the treatment is beneficial to the patients in the selected group.

The above objective function can be solved by modifying CAPITAL as presented in Section 3.2. Specifically, we define the reward of the i-th individual based on (7) and (6) as follows.

Reward 3:

r_i = I{D(X_i) = 1} · [ γ_{ρ(i)} + λ Ĉ(X_i) I{Ĉ(X_i) < 0} ],     (8)

where λ is a nonnegative penalty parameter that represents the trade-off between the first and the second constraints. When λ = 0, the reward defined in (8) reduces to (6). Here, we only add the penalty to the reward when the estimated contrast function is negative, i.e., Ĉ(X_i) < 0. This prevents the method from selecting patients with a negative estimated individual treatment effect.
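A minimal sketch of the penalized reward, continuing the earlier sketches; the additive form below is our reading of (8), and lambda is a user-chosen nonnegative penalty.

    # Penalized reward (eq. (8)): penalize selected patients with a negative estimated effect
    lambda <- 1
    reward_selected_pen <- gamma[rank_i] + lambda * pmin(C_hat, 0)   # pmin(., 0) = C_hat * I{C_hat < 0}
    Gamma_pen    <- cbind(not_selected = 0, selected = reward_selected_pen)
    ssr_tree_pen <- policytree::policy_tree(X, Gamma_pen, depth = 2)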

4.2 Extension to Survival Data

We next consider finding the optimal SSR for a survival endpoint. Let T and C̃ denote the survival time of interest and the censoring time, respectively. Assume that T and C̃ are independent given the baseline covariates and the treatment. Then the observed dataset consists of n independent and identically distributed observations {(X_i, A_i, U_i, Δ_i)}, where U_i = min(T_i, C̃_i) is the observed time and Δ_i = I(T_i ≤ C̃_i) is the event indicator. The goal is to maximize the size of the subgroup under a pre-specified clinically desired effect δ on the restricted mean survival time, i.e.,

D^opt = argmax_{D ∈ Π} E[ I{D(X) = 1} ]   subject to   E{ min(T*(1), t*) − min(T*(0), t*) | D(X) = 1 } ≥ δ,     (9)

where T*(0) and T*(1) are the potential survival times and t* is the maximum follow-up time, which is pre-specified or can be estimated from the observed data.

Denote μ_0(x) = ∫_0^{t*} S_0(t | x) dt and μ_1(x) = ∫_0^{t*} S_1(t | x) dt as the restricted mean survival times under treatment 0 and 1, respectively, given baseline covariates x, where S_0(· | x) and S_1(· | x) are the survival functions in the control and treatment groups, respectively. To estimate μ_0 and μ_1, we first fit random forests for the survival functions in the control and treatment groups, respectively, and obtain the estimates Ŝ_0 and Ŝ_1. Then, the estimated restricted mean survival times for treatment 0 and 1, denoted μ̂_0 and μ̂_1, are calculated by integrating the estimated survival functions up to the minimum of the maximum observed times over the two arms. Define R̂(X_i) = μ̂_1(X_i) − μ̂_0(X_i) − δ to capture the distance from the estimated contrast to the desired difference in restricted mean survival time for the i-th individual. It is immediate that an individual with a larger R̂(X_i) is more likely to be selected into the subgroup. We sort the estimates as R̂_(1) ≥ ... ≥ R̂_(n) and define the cumulative mean γ_i = (1/i) Σ_{j ≤ i} R̂_(j). The reward for the constrained policy tree search can then be defined following similar arguments as in (5) and (6).
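A sketch of the restricted-mean-survival-time contrast is given below; the use of grf::survival_forest as the per-arm survival estimator and the simple Riemann-sum integration are our assumptions, with U the observed time, event the event indicator, A the treatment, and delta the desired difference.

    library(grf)

    # Restricted mean survival time from estimated survival curves, integrated up to t_star
    rmst_from_forest <- function(sf, X_new, t_star) {
      pred  <- predict(sf, X_new)                   # survival curves on a grid of failure times
      times <- pmin(pred$failure.times, t_star)     # truncate the grid at t_star
      apply(pred$predictions, 1, function(s) sum(diff(c(0, times)) * s))  # Riemann-sum approximation
    }

    t_star <- min(max(U[A == 0]), max(U[A == 1]))   # minimum of the maximum observed times over arms
    sf0 <- survival_forest(X[A == 0, ], U[A == 0], event[A == 0])
    sf1 <- survival_forest(X[A == 1, ], U[A == 1], event[A == 1])

    C_hat_surv <- rmst_from_forest(sf1, X, t_star) - rmst_from_forest(sf0, X, t_star)
    R_hat_surv <- C_hat_surv - delta                # then proceed exactly as in Section 3.2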

5 Simulation Studies

5.1 Evaluation and Comparison with Average Treatment Effect

Suppose the baseline covariates X, the treatment A, and the outcome Y are generated from the following model:

Y = b(X) + A · C(X) + ε,     (10)

where b(X) is the baseline function of the outcome, C(X) is the contrast function, and ε is a random error. We fix the dimension of the covariates and consider the following three scenarios, which differ in their contrast functions.

Scenario 1:

Scenario 2:

Scenario 3:

The true average treatment effect is 0 under all scenarios. We illustrate the density of C(X) for Scenarios 2 and 3 in Figure 3; the density of C(X) for Scenario 1 is simply a uniform distribution on an interval centered at zero. Based on Figure 3, we consider three clinically meaningful treatment effect thresholds δ for each scenario, with the corresponding optimal subgroup sample proportions listed in Table 3. The total sample size n is chosen from three increasing values, corresponding to the three columns reported for each scenario in Tables 3 and 4.
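For concreteness, a hypothetical data-generating sketch for model (10) is shown below; the baseline function, contrast function, covariate dimension, and sample size are placeholders, since the scenario-specific formulas are not reproduced here.

    # Illustrative data from model (10) with hypothetical b(X) and C(X)
    set.seed(1)
    n <- 1000; p <- 10
    X <- matrix(runif(n * p, -1, 1), n, p)
    A <- rbinom(n, 1, 0.5)                        # randomized treatment assignment
    b_fun <- function(x) x[1] + x[2]              # hypothetical baseline function
    C_fun <- function(x) 2 * x[1]                 # hypothetical contrast: C(X) ~ Uniform(-2, 2)
    Y <- apply(X, 1, b_fun) + A * apply(X, 1, C_fun) + rnorm(n, sd = 0.5)
    # With this hypothetical choice, delta = 1 gives an optimal subgroup proportion of 50%,
    # mirroring the numbers quoted for Scenario 1 in the introduction.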

Figure 3: Left panel: the density function of C(X) for simulation Scenario 2. Right panel: the density function of C(X) for simulation Scenario 3.

We apply CAPITAL to find the optimal SSR. The policy is searched within the decision tree class using the R package 'policytree' (Athey and Wager, 2021; Zhou et al., 2018). For better demonstration, we focus on depth-2 decision trees. To illustrate the interpretability of the resulting SSR, we show more specific results from three particular simulation replicates (replicates No.1, No.2, and No.3) under Scenario 2 using the reward choice in (6). The estimated SSRs for these three replicates are shown in Figure 4, with the splitting variables and their splitting thresholds reported in Table 2. We summarize the selected sample proportion under the estimated SSR, the average treatment effect under the estimated SSR, and the rate of making correct subgroup decisions by the estimated SSR, using Monte Carlo approximations. Finally, we visualize the density function of C(X) within the subgroup selected by the estimated SSR, with comparison to that of the unselected patients, for the three replicates in Figure 5.

Figure 4: The estimated optimal subgroup selection trees by CAPITAL under Scenario 2. Upper left panel: replicate No.1. Upper right panel: replicate No.2. Lower middle panel: replicate No.3.
Simulation | Replicate No.1 | Replicate No.2 | Replicate No.3
Selected Sample Proportion | 44.5% | 49.2% | 55.0%
Average Treatment Effect | 1.11 | 1.00 | 0.90
Rate of Correct Decision | 91.85% | 92.01% | 94.45%
Split Variable (Split Value) | see Figure 4
(Left) Split Variable (Split Value) | see Figure 4
(Right) Split Variable (Split Value) | see Figure 4
Table 2: Results of the estimated optimal subgroup selection tree for three particular replicates under Scenario 2 (where the optimal subgroup sample proportion is about 50%) under CAPITAL.

Over 200 replicates, the rate of correctly identifying the two important features under the estimated SSRs is 70.8% at the smallest sample size, increasing to 95.8% at the intermediate sample size and 100.0% at the largest, under Scenario 2. It can be seen from both Figure 4 and Table 2 that the estimated SSRs under the proposed method identify the important features that determine the outcome for all three replicates. In Scenario 2, the two features have identical roles in the contrast function, so the resulting optimal tree can use either of them as the first splitting variable. Replicate No.3 over-selects the subgroup and therefore yields a lower average treatment effect, while replicate No.1 under-selects the subgroup and achieves a higher average treatment effect, as shown in Table 2. This finding is in line with the trade-off between the size of the selected subgroup and its corresponding average treatment effect discussed in the introduction. Moreover, all three replicates have a high rate (above 90%) of making correct subgroup decisions under the estimated SSRs, as supported by both Table 2 and Figure 5.

Figure 5: The density function of C(X) within and outside the selected subgroup under Scenario 2. Left panel: replicate No.1. Middle panel: replicate No.2. Right panel: replicate No.3.

In addition, we compare the proposed method with the VT method (Foster et al., 2011). Though the VT method can theoretically be used for both binary and continuous outcomes, the current R package 'aVirtualTwins' only handles binary outcomes in a two-armed clinical trial. To address the continuous outcomes in Scenarios 1-3, following the VT method (Foster et al., 2011), we fit the estimated individual treatment effects on the features via a regression tree. We next consider two subgroup selection rules based on the VT method; a code sketch follows the two definitions below.

VT-A: Denote the average treatment effect within a terminal node as the node mean of the estimated individual treatment effects. The final subgroup is formed as the union of the terminal nodes whose predicted values are greater than δ.

VT-C: Denote the indicator of whether each patient's estimated individual treatment effect exceeds δ. Then each terminal node is classified into the subgroup based on a majority vote of these indicators within the node. The final subgroup is defined as the union of the terminal nodes classified into the subgroup.
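A minimal sketch of the two VT rules is given below, with rpart as the regression-tree fitter and the majority-vote indicator as our reading of VT-C; C_hat denotes the estimated individual treatment effects and delta the desired effect.

    library(rpart)

    vt_fit    <- rpart(z ~ ., data = data.frame(z = C_hat, X))  # regression tree of effects on X
    node_pred <- predict(vt_fit)                  # terminal-node mean of the estimated effects

    vt_a_subgroup <- node_pred > delta            # VT-A: nodes with mean effect above delta

    vote <- ave(as.numeric(C_hat > delta), vt_fit$where)  # within-node vote on I(effect > delta)
    vt_c_subgroup <- vote > 0.5                   # VT-C: nodes where the majority votes "yes"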

We apply the proposed method and the VT-A and VT-C methods to Scenarios 1-3 with 200 replications. We summarize the selected sample proportion and the average treatment effect under the estimated SSR, the rate of making correct subgroup decisions by the estimated SSR (RCD; the number of correct subgroup decisions divided by the total sample size), and the rate of positive individual treatment effects within the selected subgroup (RPI; the number of positive individual treatment effects divided by the size of the selected subgroup), aggregated over 200 replications using Monte Carlo approximations, with standard deviations presented. Since the VT-A and VT-C methods have nearly identical results, and the performance of our method with reward (5) is similar to that with reward (6), for a clearer comparison we report the empirical results in Table 3 for the proposed method with reward (5) and the VT-A method, and in Table 4 for the proposed method with reward (6) and the VT-C method.

Method Scenario 1 Scenario 2 Scenario 3
(Within each scenario, the three columns correspond to increasing sample sizes. For each method, the three blocks of rows correspond to the three thresholds δ in increasing order; within each block, the unlabeled row below 'Proportion' reports the average treatment effect under the estimated SSR.)
CAPITAL Proportion
0.62(0.16) 0.63(0.08) 0.65(0.05) 0.42(0.23) 0.51(0.11) 0.56(0.05) 0.72(0.15) 0.74(0.08) 0.77(0.05)
0.66(0.28) 0.72(0.17) 0.69(0.10) 0.72(0.47) 0.96(0.20) 0.86(0.11) 0.66(0.34) 0.67(0.18) 0.61(0.11)
RCD 0.83(0.10) 0.91(0.05) 0.93(0.03) 0.62(0.15) 0.81(0.08) 0.87(0.03) 0.83(0.08) 0.89(0.03) 0.90(0.01)
RPI 0.78(0.13) 0.80(0.09) 0.78(0.06) 0.74(0.15) 0.88(0.08) 0.86(0.06) 0.67(0.10) 0.67(0.06) 0.65(0.04)
Proportion
0.46(0.16) 0.48(0.09) 0.50(0.06) 0.21(0.17) 0.32(0.12) 0.40(0.06) 0.56(0.16) 0.59(0.09) 0.62(0.06)
0.90(0.27) 1.00(0.15) 0.99(0.11) 0.83(0.63) 1.31(0.27) 1.17(0.11) 1.02(0.37) 1.00(0.20) 0.94(0.15)
RCD 0.84(0.11) 0.91(0.05) 0.94(0.03) 0.62(0.12) 0.79(0.11) 0.88(0.05) 0.79(0.07) 0.85(0.03) 0.87(0.01)
RPI 0.88(0.11) 0.94(0.06) 0.94(0.05) 0.75(0.19) 0.95(0.05) 0.97(0.03) 0.78(0.11) 0.78(0.06) 0.77(0.05)
Proportion
0.30(0.16) 0.31(0.11) 0.34(0.08) 0.09(0.09) 0.14(0.10) 0.25(0.09) 0.41(0.15) 0.44(0.09) 0.48(0.06)
1.05(0.33) 1.28(0.18) 1.29(0.14) 0.66(0.73) 1.58(0.59) 1.48(0.24) 1.36(0.40) 1.38(0.24) 1.29(0.15)
RCD 0.81(0.10) 0.88(0.07) 0.92(0.04) 0.67(0.07) 0.74(0.08) 0.82(0.06) 0.78(0.08) 0.83(0.03) 0.86(0.02)
RPI 0.93(0.12) 0.99(0.02) 1.00(0.01) 0.69(0.21) 0.91(0.13) 0.95(0.04) 0.86(0.10) 0.89(0.06) 0.88(0.04)
VT-A Proportion
0.31(0.12) 0.34(0.09) 0.35(0.08) 0.15(0.10) 0.19(0.09) 0.22(0.08) 0.29(0.10) 0.30(0.06) 0.30(0.06)
1.11(0.20) 1.27(0.17) 1.30(0.15) 0.85(0.61) 1.46(0.38) 1.53(0.32) 1.76(0.36) 1.82(0.23) 1.81(0.21)
RCD 0.66(0.12) 0.69(0.09) 0.70(0.08) 0.43(0.08) 0.51(0.09) 0.55(0.09) 0.54(0.10) 0.55(0.06) 0.55(0.06)
RPI 0.97(0.06) 0.99(0.03) 1.00(0.01) 0.77(0.17) 0.95(0.09) 0.97(0.08) 0.95(0.07) 0.98(0.03) 0.98(0.03)
Proportion
0.21(0.13) 0.24(0.10) 0.26(0.07) 0.07(0.06) 0.09(0.07) 0.14(0.07) 0.23(0.09) 0.24(0.06) 0.25(0.05)
1.19(0.21) 1.37(0.18) 1.45(0.13) 1.01(0.74) 1.67(0.49) 1.78(0.38) 1.94(0.34) 2.02(0.23) 2.00(0.18)
RCD 0.70(0.12) 0.74(0.10) 0.76(0.07) 0.54(0.06) 0.59(0.07) 0.64(0.07) 0.60(0.08) 0.62(0.06) 0.62(0.05)
RPI 0.98(0.05) 1.00(0.02) 1.00(0.00) 0.81(0.20) 0.96(0.09) 0.98(0.08) 0.97(0.05) 0.99(0.02) 0.99(0.01)
Proportion
0.12(0.11) 0.11(0.11) 0.16(0.11) 0.03(0.04) 0.03(0.04) 0.07(0.05) 0.17(0.09) 0.18(0.06) 0.20(0.05)
1.25(0.23) 1.43(0.18) 1.50(0.12) 1.11(0.81) 1.81(0.61) 1.98(0.42) 2.12(0.37) 2.24(0.23) 2.19(0.20)
RCD 0.74(0.09) 0.76(0.11) 0.81(0.11) 0.65(0.03) 0.66(0.04) 0.69(0.05) 0.65(0.09) 0.67(0.06) 0.69(0.05)
RPI 0.99(0.04) 1.00(0.01) 1.00(0.00) 0.83(0.21) 0.95(0.13) 0.98(0.07) 0.99(0.03) 1.00(0.01) 1.00(0.00)
Table 3: Empirical results of subgroup analysis under the estimated optimal SSR by CAPITAL with reward in (5) and the VT-A method.
Method Scenario 1 Scenario 2 Scenario 3
(Within each scenario, the three columns correspond to increasing sample sizes. For each method, the three blocks of rows correspond to the three thresholds δ in increasing order; within each block, the unlabeled row below 'Proportion' reports the average treatment effect under the estimated SSR.)
CAPITAL Proportion
0.63(0.16) 0.63(0.08) 0.65(0.05) 0.44(0.24) 0.52(0.11) 0.57(0.06) 0.72(0.15) 0.75(0.07) 0.77(0.04)
0.67(0.30) 0.72(0.17) 0.70(0.11) 0.71(0.48) 0.94(0.20) 0.85(0.11) 0.67(0.35) 0.66(0.17) 0.60(0.10)
RCD 0.84(0.10) 0.91(0.05) 0.93(0.03) 0.63(0.15) 0.82(0.08) 0.87(0.03) 0.83(0.08) 0.89(0.03) 0.91(0.01)
RPI 0.78(0.13) 0.80(0.09) 0.78(0.06) 0.74(0.16) 0.88(0.09) 0.85(0.07) 0.67(0.10) 0.67(0.06) 0.65(0.04)
Proportion
0.46(0.16) 0.48(0.08) 0.50(0.05) 0.21(0.18) 0.32(0.12) 0.41(0.05) 0.56(0.16) 0.60(0.09) 0.63(0.07)
0.91(0.28) 1.01 (0.15) 0.99(0.10) 0.76(0.66) 1.32(0.27) 1.16(0.10) 1.03(0.39) 0.99(0.21) 0.93(0.16)
RCD 0.85(0.11) 0.92(0.05) 0.94(0.03) 0.62(0.12) 0.79(0.10) 0.88(0.05) 0.79(0.08) 0.85(0.03) 0.87(0.01)
RPI 0.89(0.12) 0.94(0.06) 0.95(0.05) 0.74(0.19) 0.96(0.03) 0.97(0.03) 0.78(0.11) 0.78(0.07) 0.76(0.06)
Proportion
0.30(0.16) 0.32(0.11) 0.34(0.08) 0.09(0.09) 0.14(0.10) 0.25(0.09) 0.41(0.16) 0.44(0.09) 0.48(0.06)
1.05(0.35) 1.27(0.17) 1.29(0.14) 0.71(0.76) 1.57(0.63) 1.50(0.25) 1.34(0.42) 1.40(0.25) 1.29(0.17)
RCD 0.81(0.10) 0.89(0.07) 0.92(0.04) 0.67(0.07) 0.74(0.08) 0.82(0.06) 0.77(0.08) 0.83(0.04) 0.86(0.02)
RPI 0.93(0.13) 0.99(0.03) 1.00 (0.01) 0.70(0.21) 0.92(0.14) 0.97(0.02) 0.86(0.10) 0.90(0.06) 0.88(0.05)
VT-C Proportion
0.31(0.12) 0.34(0.09) 0.35(0.08) 0.15(0.10) 0.19(0.09) 0.22(0.08) 0.29(0.10) 0.30(0.06) 0.30(0.06)
1.11(0.20) 1.27(0.17) 1.30(0.15) 0.85(0.61) 1.46(0.38) 1.53(0.32) 1.76(0.36) 1.82(0.23) 1.81(0.21)
RCD 0.66(0.12) 0.69(0.09) 0.70(0.08) 0.43(0.08) 0.51(0.09) 0.55(0.09) 0.54(0.10) 0.55(0.06) 0.55(0.06)
RPI 0.97(0.06) 0.99(0.03) 1.00(0.01) 0.77(0.17) 0.95(0.09) 0.97(0.08) 0.95(0.07) 0.98(0.03) 0.98(0.03)
Proportion
0.21(0.13) 0.24(0.10) 0.26(0.07) 0.07(0.06) 0.09(0.07) 0.14(0.07) 0.23(0.09) 0.24(0.06) 0.25(0.05)
1.19(0.21) 1.37(0.18) 1.45(0.13) 1.01(0.74) 1.67(0.49) 1.78(0.38) 1.94(0.34) 2.02(0.23) 2.00(0.18)
RCD 0.70(0.12) 0.74(0.10) 0.76(0.07) 0.54(0.06) 0.59(0.07) 0.64(0.07) 0.60(0.08) 0.62(0.06) 0.62(0.05)
RPI 0.98(0.05) 1.00(0.02) 1.00(0.00) 0.81(0.20) 0.96(0.09) 0.98(0.08) 0.97(0.05) 0.99(0.02) 0.99(0.01)
Proportion
0.12(0.11) 0.11(0.11) 0.16(0.11) 0.03(0.04) 0.03(0.04) 0.07(0.05) 0.17(0.09) 0.18(0.06) 0.20(0.05)
1.25(0.23) 1.43(0.18) 1.50(0.12) 1.11(0.81) 1.81(0.61) 1.98(0.42) 2.12(0.37) 2.24(0.23) 2.19(0.20)
RCD 0.74(0.09) 0.76(0.11) 0.81(0.11) 0.65(0.03) 0.66(0.04) 0.69(0.05) 0.65(0.09) 0.67(0.06) 0.69(0.05)
RPI 0.99(0.04) 1.00(0.01) 1.00(0.00) 0.83(0.21) 0.95(0.13) 0.98(0.07) 0.99(0.03) 1.00(0.01) 1.00(0.00)
Table 4: Empirical results of subgroup analysis under the estimated optimal SSR by CAPITAL with reward in (6) and the VT-C method.

Based on Tables 3 and 4, it is clear that the proposed method has better performance than the VT methods in all cases. To be specific, in Scenario 1 at the largest sample size, our method achieves a selected sample proportion of 65% for the smallest threshold (the optimal is 65%), 50% for the middle threshold (the optimal is 50%), and 34% for the largest threshold (the optimal is 35%), with corresponding average treatment effects close to the true values. The selected sample proportion under Scenario 2 is slightly underestimated because the density function of C(X) is concentrated around 0, as illustrated in the left panel of Figure 3. In addition, the proposed method performs well at small sample sizes, with a slightly lower selected sample proportion, and improves as the sample size increases. In contrast, the VT methods can hardly achieve half of the desired optimal subgroup size in most cases. Lastly, comparing Table 3 with Table 4, the simulation results are very similar under the two reward choices.

5.2 Evaluation of Multiple Constraints

In this section, we further investigate the performance of the proposed method under multiple constraints. Specifically, we aim to solve the objective in (7) with the penalized reward defined in (8). We set the penalty parameter λ to four increasing values, where λ = 0 corresponds to the reward in (6).

We use the same setting as described in Section 5.1 with the smallest threshold δ under Scenarios 1 to 3 and apply CAPITAL to find the optimal SSR within the decision tree class. The empirical results over 200 replications are reported in Table 5 for the different penalty values. It can be observed from Table 5 that, as the penalty λ increases, the rate of positive individual treatment effects within the selected subgroup increases, while the rate of making correct subgroup decisions slightly decreases, in all cases. This reflects the trade-off between the two constraints in our objective (7).

Scenario 1 Scenario 2 Scenario 3
(Within each scenario, the three columns correspond to increasing sample sizes. The four blocks of rows correspond to increasing values of the penalty λ, the first being λ = 0, i.e., the reward in (6); within each block, the rows report the selected sample proportion, the average treatment effect, RCD, and RPI.)
Proportion
0.63(0.16) 0.63(0.08) 0.65(0.05) 0.44(0.24) 0.51(0.11) 0.57(0.06) 0.72(0.15) 0.75(0.07) 0.77(0.04)
0.67(0.30) 0.72(0.17) 0.70(0.11) 0.71(0.48) 0.95(0.20) 0.85(0.11) 0.67(0.35) 0.66(0.17) 0.60(0.10)
RCD 0.84(0.10) 0.91(0.05) 0.93(0.03) 0.62(0.15) 0.81(0.08) 0.87(0.03) 0.83(0.08) 0.89(0.03) 0.91(0.01)
RPI 0.78(0.13) 0.80(0.09) 0.78(0.06) 0.74(0.16) 0.88(0.09) 0.85(0.07) 0.67(0.10) 0.67(0.06) 0.65(0.04)
0.55(0.12) 0.56(0.06) 0.57(0.04) 0.39(0.21) 0.48(0.10) 0.53(0.05) 0.63(0.13) 0.65(0.07) 0.66(0.05)
0.83(0.23) 0.86(0.11) 0.86(0.08) 0.77(0.48) 1.01(0.17) 0.93(0.10) 0.89(0.30) 0.88(0.16) 0.86(0.11)
RCD 0.84(0.09) 0.90(0.05) 0.91(0.03) 0.61(0.15) 0.79(0.08) 0.85(0.04) 0.81(0.09) 0.86(0.04) 0.87(0.03)
RPI 0.86(0.11) 0.88(0.07) 0.88(0.05) 0.76(0.15) 0.91(0.07) 0.90(0.05) 0.74(0.09) 0.74(0.06) 0.74(0.04)
0.52(0.11) 0.54(0.05) 0.54(0.04) 0.37(0.20) 0.46(0.09) 0.51(0.05) 0.57(0.13) 0.60(0.07) 0.61(0.05)
0.88(0.20) 0.91(0.11) 0.91(0.07) 0.79(0.48) 1.05(0.16) 0.97(0.10) 1.00(0.29) 0.99(0.16) 0.98(0.12)
RCD 0.83(0.09) 0.88(0.05) 0.89(0.04) 0.60(0.15) 0.78(0.08) 0.83(0.05) 0.78(0.10) 0.83(0.05) 0.84(0.04)
RPI 0.88(0.09) 0.90(0.06) 0.91(0.05) 0.77(0.15) 0.92(0.06) 0.92(0.05) 0.78(0.09) 0.78(0.05) 0.78(0.04)
0.49(0.11) 0.52(0.05) 0.52(0.04) 0.33(0.19) 0.43(0.09) 0.48(0.05) 0.52(0.12) 0.55(0.07) 0.55(0.05)
0.93(0.19) 0.95(0.11) 0.96(0.07) 0.83(0.51) 1.10(0.15) 1.03(0.10) 1.12(0.30) 1.11(0.16) 1.11(0.12)
RCD 0.81(0.10) 0.86(0.05) 0.87(0.04) 0.58(0.15) 0.76(0.08) 0.81(0.05) 0.74(0.10) 0.78(0.06) 0.79(0.04)
RPI 0.91(0.09) 0.92(0.06) 0.94(0.05) 0.78(0.16) 0.94(0.05) 0.94(0.04) 0.81(0.09) 0.82(0.05) 0.83(0.04)
Table 5: Empirical results of optimal subgroup selection tree by CAPITAL with the penalized reward in (8).

5.3 Evaluation of Survival Data

The data are generated from a model similar to (10), with the survival time defined as a transformation of the generated response. We keep the same covariate dimension as in Section 5.1 and consider the following scenario:

Scenario 4:

Here, for the random noise component ε we consider three cases: (i) Case 1 (normal): ε follows a normal distribution; (ii) Case 2 (logistic): ε follows a logistic distribution; (iii) Case 3 (extreme): ε follows an extreme value distribution.

Figure 6: The density function of the restricted-mean-survival-time contrast for Scenario 4 under different noise distributions and censoring levels.

The censoring times are generated from a uniform distribution on an interval (0, c), where c is chosen to yield the desired censoring levels of 15% and 25%, respectively, each applied to the three choices of noise distributions, for a total of six settings. We illustrate the density of the restricted-mean-survival-time contrast in Figure 6. The clinically meaningful differences in restricted mean survival time are summarized in Table 6; each setting was selected to yield an optimal selected sample proportion of 50%. We report the empirical results in Table 6 with the second reward choice in (6), including the selected sample proportion under the estimated SSR, the average treatment effect under the estimated SSR, and the rate of making correct subgroup decisions by the estimated SSR, over 200 replications, using Monte Carlo approximations with standard deviations presented in the parentheses.

Censoring Level 15% | Censoring Level 25%
(within each censoring level, the two columns correspond to increasing sample sizes)
Case 1 (normal): True δ 1.07 | 0.86
Proportion 0.45(0.17) 0.47(0.12) | 0.46(0.16) 0.48(0.11)
ATE 1.07(0.31) 1.11(0.24) | 0.87(0.22) 0.87(0.16)
RCD 0.84(0.11) 0.88(0.07) | 0.84(0.09) 0.90(0.06)
Case 2 (logistic): True δ 1.34 | 0.87
Proportion 0.57(0.26) 0.56(0.18) | 0.52(0.24) 0.52(0.18)
ATE 0.94(0.49) 1.06(0.36) | 0.63(0.31) 0.75(0.24)
RCD 0.72(0.13) 0.80(0.10) | 0.74(0.13) 0.82(0.09)
Case 3 (extreme): True δ 0.73 | 0.54
Proportion 0.44(0.18) 0.46(0.12) | 0.41(0.18) 0.44(0.12)
ATE 0.76(0.21) 0.78(0.15) | 0.57(0.15) 0.58(0.11)
RCD 0.84(0.11) 0.89(0.08) | 0.83(0.12) 0.88(0.08)
Table 6: Empirical results of the optimal subgroup selection tree by CAPITAL for the survival data under Scenario 4 (where the optimal subgroup sample proportion is 50%).

Table 6 shows that the proposed method performs reasonably well under all three considered noise distributions. Both the selected sample proportion and the average treatment effect under the estimated SSR get closer to the truth, and the rate of making correct subgroup decisions increases, as the sample size increases. The selected sample proportion is slightly underestimated for Cases 1 and 3, where the restricted-mean-survival-time contrast has a more concentrated density, and marginally overestimated for Case 2, where the contrast has a more spread-out density. All these findings are in accordance with our conclusions in Section 5.1.

6 Real Data Analysis

In this section, we illustrate our proposed method by application to the AIDS Clinical Trials Group Protocol 175 (ACTG 175) data as described in Hammer et al. (1996) and to a Phase III clinical trial in patients with hematological malignancies from Lipkovich et al. (2017).

6.1 Case 1: ACTG 175 data

There were 1046 HIV-infected subjects enrolled in ACTG 175, randomized to two competing antiretroviral regimens (Hammer et al., 1996): zidovudine (ZDV) + zalcitabine (zal) (denoted as treatment 0), and ZDV + didanosine (ddI) (denoted as treatment 1). Patients were randomized in equal proportions, with 524 patients randomized to treatment 0 and 522 patients to treatment 1, giving a constant propensity score of approximately 0.5. We consider 12 baseline covariates: 1) four continuous variables: age (years), weight (kg), CD4 count (cells/mm3) at baseline, and CD8 count (cells/mm3) at baseline; and 2) eight categorical variables: hemophilia (0=no, 1=yes), homosexual activity (0=no, 1=yes), history of intravenous drug use (0=no, 1=yes), Karnofsky score (4 levels on the scale of 0-100, as 70, 80, 90, and 100), race (0=white, 1=non-white), gender (0=female, 1=male), antiretroviral history (0=naive, 1=experienced), and symptomatic status (0=asymptomatic, 1=symptomatic). The outcome of interest Y is the CD4 count (cells/mm3) at 20 ± 5 weeks; a higher CD4 count usually indicates a stronger immune system. We normalize Y by its mean and standard deviation. Our goal is to find the optimal subgroup selection rule that optimizes the size of the selected subgroup and achieves the desired average treatment effect.

Figure 7: The density function of the estimated contrast function for the ACTG 175 data.

The density of the estimated contrast function for the ACTG 175 data is illustrated in Figure 7; the mean contrast difference is 0.228. Based on Figure 7, we consider two clinically meaningful average treatment effect thresholds. We apply the proposed CAPITAL method in comparison to the virtual twins method (VT-C) (Foster et al., 2011) (the VT-A and VT-C methods have nearly identical performance, as shown in the simulation studies), using the same procedure as described in Section 5.1. The estimated SSRs under the proposed method are shown in Figure 8. To evaluate the proposed method and the VT-C method on the ACTG 175 data, we randomly split the whole data, with 70% of the data as a training sample to find the SSR and 30% as a testing sample to evaluate its performance; a sketch of this procedure follows below. Here, we consider CAPITAL without penalty, with a small penalty, and with a large penalty on negative individual treatment effects, respectively. The penalty term is chosen from a small set of increasing values (reported in Table 7), where a larger penalty encourages a positive average treatment effect in the selected group. In Table 7, we summarize the selected sample proportion, the average treatment effect within the estimated subgroup, the average treatment effect outside the subgroup, the difference between the two, and the rate of positive individual treatment effects within the selected subgroup (RPI), aggregated over 200 replications with standard deviations presented in the parentheses, under the different thresholds for the two methods.
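A sketch of this train/test evaluation is given below, continuing the earlier sketches (variable names are assumptions); because the trial is randomized with a roughly constant propensity score, the test-set treatment effects are computed as simple differences in arm means.

    # 70/30 split: fit CAPITAL on the training rows, evaluate on the held-out rows
    idx  <- sample(seq_len(nrow(X)), size = floor(0.7 * nrow(X)))
    # ... estimate C_hat and the policy tree ssr_tree using rows idx only (Section 3.2) ...
    sel  <- predict(ssr_tree, X[-idx, ]) == 2      # subgroup membership on the test set
    Y_te <- Y[-idx]; A_te <- A[-idx]

    prop_selected <- mean(sel)                                              # selected proportion
    ate_in  <- mean(Y_te[sel  & A_te == 1]) - mean(Y_te[sel  & A_te == 0])  # ATE within subgroup
    ate_out <- mean(Y_te[!sel & A_te == 1]) - mean(Y_te[!sel & A_te == 0])  # ATE outside subgroup
    diff_in_out <- ate_in - ate_out
    rpi <- mean(C_hat[-idx][sel] > 0)   # rate of positive estimated individual effects (test rows)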

Figure 8: The estimated optimal subgroup selection trees using CAPITAL for the ACTG 175 data, for the two considered thresholds (left and right panels).
Threshold | Smaller threshold δ | Larger threshold δ
CAPITAL without penalty
Selected sample proportion 92.8% (0.023) | 82.8% (0.029)
ATE within subgroup 0.250 (0.015) | 0.270 (0.016)
ATE outside subgroup -0.107 (0.069) | 0.004 (0.038)
Difference (within minus outside) 0.357 (0.068) | 0.266 (0.038)
RPI 83.0% (0.021) | 85.1% (0.022)
CAPITAL with small penalty (penalty term 4 | 20)
Selected sample proportion 52.7% (0.052) | 34.2% (0.034)
ATE within subgroup 0.327 (0.022) | 0.385 (0.021)
ATE outside subgroup 0.113 (0.021) | 0.142 (0.017)
Difference (within minus outside) 0.214 (0.027) | 0.243 (0.026)
RPI 91.5% (0.029) | 96.2% (0.017)
CAPITAL with large penalty (penalty term 20 | 100)
Selected sample proportion 35.6% (0.035) | 19.5% (0.051)
ATE within subgroup 0.381 (0.021) | 0.414 (0.032)
ATE outside subgroup 0.139 (0.017) | 0.180 (0.017)
Difference (within minus outside) 0.242 (0.025) | 0.234 (0.033)
RPI 95.9% (0.017) | 96.9% (0.025)
Virtual Twins
Selected sample proportion 22.1% (0.063) | 10.5% (0.029)
ATE within subgroup 0.462 (0.043) | 0.556 (0.050)
ATE outside subgroup 0.159 (0.021) | 0.187 (0.014)
Difference (within minus outside) 0.302 (0.037) | 0.368 (0.047)
RPI 97.8% (0.019) | 99.6% (0.010)
Table 7: Evaluation results of the subgroup optimization using CAPITAL and the subgroup identification (using Virtual Twins (Foster et al., 2011)) for the ACTG 175 data.

As illustrated in Figure 8, the estimated SSRs based on the proposed method under both thresholds rely on the weight and age of the patients. For instance, for a desired average treatment effect of 0.35, younger patients who weigh less than 91.2 kg, together with a weight-defined group of older patients, may not benefit from treatment 1 (ZDV+ddI) and thus are not selected into the subgroup, while the remaining, mostly older, patients are included in the subgroup and have enhanced effects from treatment with ZDV+ddI. From Table 7, it is clear that the selected sample proportion under our method is much larger than that under the VT method in all cases. Specifically, without penalty our method yields selected sample proportions of 92.8% and 82.8% under the two thresholds. Under a penalty on negative individual treatment effects, the size of the identified subgroup is reduced to 52.7% with the small penalty and 35.6% with the large penalty under the smaller threshold, and to 34.2% and 19.5%, respectively, under the larger threshold. With a large penalty, our proposed method achieves average treatment effects of 0.381 and 0.414 under the two thresholds, meeting the desired levels. In contrast, the VT method identifies less than a quarter of the patients (22.1%) under the smaller threshold and roughly a tenth under the larger threshold, with overestimated average treatment effects of 0.462 and 0.556, respectively. These results imply that the proposed method can substantially increase the number of benefitting patients selected into the subgroup while also maintaining the desired clinically meaningful threshold.

6.2 Case 2: Phase III Trial for Hematological Malignancies

Next, we consider a Phase III randomized clinical trial in 599 patients with hematological malignancies (Lipkovich et al., 2017). We exclude 7 subjects with missing records and use the remaining 592 complete records, consisting of 301 patients receiving the experimental therapy plus best supporting care (as treatment 1) and 291 patients only receiving the best supporting care (as treatment 0). We use the same baseline covariates selected by Lipkovich et al. (2017): 1) twelve categorical variables: gender (1=Male, 2=Female), race (1=Asian, 2=Black, 3=White), Cytogenetic markers 1 through 9 (0=Absent, 1=Present), and outcome of the patient's prior therapy (1=Failure, 2=Progression, 3=Relapse); and 2) two ordinal variables: Cytogenetic category (1=Very good, 2=Good, 3=Intermediate, 4=Poor, 5=Very poor), and prognostic score for myelodysplastic syndromes risk assessment (IPSS) (1=Low, 2=Intermediate, 3=High, 4=Very high). These baseline covariates contain demographic and clinical information related to baseline disease severity and cytogenetic markers. The primary endpoint in the trial was overall survival time. Our goal is to find the optimal subgroup selection rule that maximizes the size of the selected group while achieving the desired clinically meaningful difference in restricted mean survival time in the survival data.

Figure 9: The density function of the estimated contrast function for the hematological malignancies data.

The density of the estimated contrast function for the hematological malignancies data is provided in Figure 9, with a mean treatment difference of 44.1 days. Based on Figure 9, we consider two clinically meaningful differences in restricted mean survival time (in days) as thresholds. We apply the proposed method and the virtual twins method (Foster et al., 2011) using the procedures described in Sections 5.3 and 6.1. The estimated SSRs under the proposed method are shown in Figure 10, and the evaluation results for the hematological malignancies data are summarized in Table 8 for varying penalty terms under the proposed method and for the virtual twins method. Our estimated SSRs in Figure 10 both use the IPSS score and the outcome of the patient's prior therapy as splitting features in the decision tree. For one of the two thresholds, patients who had a relapse during prior therapy and an IPSS score larger than 3, or who had no relapse and an IPSS score larger than 2, are selected into the subgroup with an enhanced treatment effect of the experimental therapy plus best supporting care. In addition, from Table 8 we observe that our proposed method performs much better than the virtual twins method. Specifically, the selected sample proportion under the proposed method is much larger than that under the virtual twins method in all cases, with estimated treatment effect sizes that move closer to, and above, the desired clinically meaningful difference in restricted mean survival time as the penalty term increases. All these findings conform with the results in Section 6.1.

Figure 10: The estimated optimal subgroup selection trees using CAPITAL for the hematological malignancies data, for the two considered thresholds (left and right panels).
Smaller threshold δ (days) | Larger threshold δ (days)
CAPITAL without penalty
Selected sample proportion 79.3% (0.031) | 43.2% (0.057)
RMST difference within subgroup 69.5 (5.0) | 101.2 (9.8)
RMST difference outside subgroup -53.7 (17.7) | 1.2 (7.2)
Difference (within minus outside) 123.2 (16.9) | 100.0 (9.3)
RPI 87.0% (0.028) | 94.5% (0.034)
CAPITAL with small penalty (penalty term 2 | 2)
Selected sample proportion 71.7% (0.061) | 33.9% (0.060)
RMST difference within subgroup 74.6 (6.5) | 108.4 (9.9)
RMST difference outside subgroup -34.1 (15.8) | 11.5 (8.7)
Difference (within minus outside) 108.7 (13.2) | 96.8 (9.0)
RPI 89.2% (0.027) | 97.2% (0.034)
CAPITAL with large penalty (penalty term 4 | 4)
Selected sample proportion 51.9% (0.119) | 30.8% (0.032)
RMST difference within subgroup 87.2 (13.2) | 112.6 (7.0)
RMST difference outside subgroup -2.6 (15.9) | 13.9 (6.5)
Difference (within minus outside) 89.9 (10.9) | 98.7 (8.9)
RPI 92.2% (0.039) | 99.1% (0.015)
Virtual Twins
Selected sample proportion 38.1% (0.043) | 12.9% (0.117)
RMST difference within subgroup 113.8 (6.2) | 151.4 (29.2)
RMST difference outside subgroup 1.4 (7.2) | 29.7 (13.9)
Difference (within minus outside) 112.4 (7.9) | 121.7 (21.4)
RPI 99.5% (0.010) | 99.9% (0.003)
Table 8: Evaluation results of the subgroup optimization using CAPITAL and the subgroup identification (using Virtual Twins (Foster et al., 2011)) for the hematological malignancies data.

7 Conclusion

In this paper we proposed a constrained policy tree search method, CAPITAL, to address the subgroup optimization problem. This approach identifies the theoretically optimal subgroup selection rule that maximizes the number of selected patients under the constraint of a pre-specified clinically desired effect. Our proposed method is flexible, easy to implement in practice, and interpretable. Extensive simulation studies show the improved performance of our proposed method over the popular virtual twins subgroup identification method, with larger selected benefitting subgroups and estimated treatment effect sizes closer to the truth, and demonstrate the broad usage of our method across multiple use cases, outcome types, and constraint conditions.

There are several possible extensions to consider in future work. First, we only consider two treatment options in this paper, while in clinical trials it is not uncommon to have more than two treatments available for patients. A more general method applicable to multiple treatments, or even continuous treatment domains, is therefore desirable. Second, we only provide the theoretical form of the optimal SSR. It would be of interest to establish the asymptotic properties of the estimated SSR, such as its convergence rate.

References

  • S. Athey and S. Wager (2021) Policy learning with observational data. Econometrica 89 (1), pp. 133–161. Cited by: §3.2, §3.2, §5.1.
  • T. Cai, L. Tian, P. H. Wong, and L. Wei (2011) Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics 12 (2), pp. 270–282. Cited by: §1.1, §1.
  • J. C. Foster, J. M. Taylor, and S. J. Ruberg (2011) Subgroup identification from randomized clinical trial data. Statistics in medicine 30 (24), pp. 2867–2880. Cited by: §1.1, Table 1, §1, §5.1, §6.1, §6.2, Table 7, Table 8.
  • H. Fu, J. Zhou, and D. E. Faries (2016) Estimating optimal treatment regimes via subgroup identification in randomized control trials and observational studies. Statistics in medicine 35 (19), pp. 3285–3302. Cited by: §1.
  • Q. Guan, B. J. Reich, E. B. Laber, and D. Bandyopadhyay (2020) Bayesian nonparametric policy search with application to periodontal recall intervals. Journal of the American Statistical Association 115 (531), pp. 1066–1078. Cited by: §1.1.
  • S. M. Hammer, D. A. Katzenstein, M. D. Hughes, H. Gundacker, R. T. Schooley, R. H. Haubrich, W. K. Henry, M. M. Lederman, J. P. Phair, M. Niu, et al. (1996) A trial comparing nucleoside monotherapy with combination therapy in hiv-infected adults with cd4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine 335 (15), pp. 1081–1090. Cited by: §6.1, §6.
  • K. Imai and M. Ratkovic (2013) Estimating treatment effect heterogeneity in randomized program evaluation. The Annals of Applied Statistics 7 (1), pp. 443–470. Cited by: §1.
  • M. R. Kosorok and E. B. Laber (2019) Precision medicine. Annual review of statistics and its application 6, pp. 263–286. Cited by: §1.
  • I. Lipkovich, A. Dmitrienko, and R. B D’Agostino Sr (2017) Tutorial in biostatistics: data-driven subgroup identification and analysis in clinical trials. Statistics in medicine 36 (1), pp. 136–196. Cited by: §1.1, §1, §6.2, §6.
  • W. Loh, L. Cao, and P. Zhou (2019) Subgroup identification for precision medicine: a comparative review of 13 methods. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (5), pp. e1326. Cited by: §1.
  • W. Loh, X. He, and M. Man (2015) A regression tree approach to identifying subgroups with differential treatment effects. Statistics in medicine 34 (11), pp. 1818–1833. Cited by: §1.
  • M. Lu, S. Sadiq, D. J. Feaster, and H. Ishwaran (2018) Estimating individual treatment effect in observational data using random forest methods. Journal of Computational and Graphical Statistics 27 (1), pp. 209–219. Cited by: §3.2.
  • D. B. Rubin (1978) Bayesian inference for causal effects: the role of randomization. The Annals of Statistics 6, pp. 34–58. Cited by: §2.
  • S. Sivaganesan, P. W. Laud, and P. Müller (2011) A bayesian subgroup analysis with a zero-enriched polya urn scheme. Statistics in medicine 30 (4), pp. 312–323. Cited by: §1.
  • X. Song and M. S. Pepe (2004) Evaluating markers for selecting a patient’s treatment. Biometrics 60 (4), pp. 874–883. Cited by: §1.1, §1.
  • X. Su, C. Tsai, H. Wang, D. M. Nickerson, and B. Li (2009) Subgroup analysis via recursive partitioning. Journal of Machine Learning Research 10 (2). Cited by: §1.
  • Y. Wang, H. Fu, and D. Zeng (2018) Learning optimal personalized treatment rules in consideration of benefit and risk: with an application to treating type 2 diabetes patients with insulin therapies. Journal of the American Statistical Association 113 (521), pp. 1–13. Cited by: §1.1, §1.
  • J. Zhou, J. Zhang, W. Lu, and X. Li (2021) On restricted optimal treatment regime estimation for competing risks data. Biostatistics 22 (2), pp. 217–232. Cited by: §1.1, §1.
  • Z. Zhou, S. Athey, and S. Wager (2018) Offline multi-action policy learning: generalization and optimization. arXiv preprint arXiv:1810.04778. Cited by: §5.1.

Appendix A Proof of Theorem 1

The proof of Theorem 3.1 consists of two parts. First, we show that the optimal subgroup selection rule is D(x) = I{C(x) ≥ η_δ}, where η_δ satisfies (2). Second, we derive the equivalence between (3) and (4). Without loss of generality, we focus on the class of SSRs

Π_η = { D : D(x) = I{C(x) ≥ η}, η ∈ ℝ }.

Part One: To show that D(x) = I{C(x) ≥ η_δ} is the optimal SSR that solves (1), it suffices to show that this SSR satisfies the constraint in (1) and maximizes the size of the subgroup.

First, based on assumptions (A1) and (A2), the average treatment effect under the SSR indexed by a cut point η can be represented by

E{ Y*(1) − Y*(0) | C(X) ≥ η } = E{ C(X) | C(X) ≥ η },

which is a non-decreasing function of the cut point η. Given the definition in (2) that E{ C(X) | C(X) ≥ η_δ } = δ, any cut point η ≥ η_δ satisfies the constraint in (1).

Second, the probability of falling into the subgroup under the SSR,

P{ C(X) ≥ η },

is a non-increasing function of the cut point η.

To maximize the size of the subgroup, we need to select the smallest cut point from its feasible range [η_δ, +∞). Thus, the optimal cut point is η_δ, which gives the optimal SSR D(x) = I{C(x) ≥ η_δ} as the solution of (1). This completes the proof of (3).


Part Two: We next focus on proving the optimal SSR in (3) is equivalent to the SSR in (4). Based on the definition in (4), we have