1 Introduction
Personalized medicine, a paradigm of medicine tailored to a patient's characteristics, is an increasingly attractive field in health care (Kosorok and Laber, 2019). Its ultimate goal is to optimize the outcome of interest by assigning the right treatment to the right patients. To ensure the success of personalized medicine, it is important to identify a subgroup of patients, based on baseline covariates, that benefits more from the targeted treatment than from other comparative treatments (Loh et al., 2019). The resulting identification strategy is referred to as a subgroup selection rule (SSR). Subgroup analysis, if properly used, can lead to better-informed clinical decisions and improved demonstration of the efficacy of the treatment.
Though various data-driven methods for subgroup identification (Song and Pepe, 2004; Su et al., 2009; Foster et al., 2011; Cai et al., 2011; Sivaganesan et al., 2011; Imai and Ratkovic, 2013; Loh et al., 2015; Fu et al., 2016) have been developed during the recent decade (see a comprehensive review in Lipkovich et al. (2017)), these works focus only on obtaining a subgroup with an enhanced treatment effect or identifying patients who benefit more from the new treatment, and usually yield a smaller and thus less satisfactory group of selected patients. To see this, we apply the virtual twins (VT) method (Foster et al., 2011) to identify the subgroup in a simulated scenario (Scenario 1; see the detailed setting in Section 5.1) for an illustration. The desired average treatment effect is 1.0, with a corresponding optimal subgroup sample proportion of 50%, i.e., half of the population should be selected into the subgroup. Yet, as summarized in Table 1 over 200 replications, the selected sample proportion under the VT method is less than 30% even under the largest sample size, with an overestimated average treatment effect of 1.19 to 1.45 across sample sizes. Identifying the largest possible subgroup of patients that benefit from a given treatment at or above some clinically meaningful threshold can be critical both for the success of a new treatment and, most importantly, for the patients who may rely on a treatment for their health and survival. When too small a subgroup is selected, the erroneously unselected patients may suffer from suboptimal treatments. For a test treatment, this reduced subgroup size can further lead to problems with regulatory approvals or drug reimbursements that in extreme cases may even halt compound development and availability. In the above example, where less than 30% of patients are selected as benefitting from the new treatment, a drug approval may be unlikely, though in truth half of all subjects show substantial improvement in health from the new treatment.
Post-approval accessibility can also be hindered by an insufficient subgroup size, especially in countries with all-or-nothing reimbursement markets, where the seemingly low proportion of benefiting patients leads to low reimbursements that may not be financially sustainable for continued treatment manufacturing. A subgroup learning approach that selects as many patients as possible with evidence of a clinically meaningful benefit from treatment is thus desired, so that more patients can receive the better treatment.
In this paper, we aim to solve the subgroup optimization problem of finding the optimal SSR that maximizes the number of selected patients while achieving a pre-specified, clinically desired mean outcome, such as the average treatment effect. There are two major difficulties in developing the optimal SSR. First, there is a trade-off between the size of the selected subgroup and its corresponding average treatment effect: the more patients selected, the lower the average treatment effect that can be achieved. To optimize the size of the subgroup while maintaining the enhanced treatment effect, constrained optimization is required. Second, most existing constrained optimization approaches (see e.g., Wang et al., 2018; Zhou et al., 2021) use complex decision rules and are thus hard to interpret. In this paper, we focus on tree-based decision rules to develop an interpretable optimal SSR.
Our contributions can be summarized as follows. First, we derive two equivalent theoretical forms of the optimal SSR based on the contrast function that describes the treatment-covariates interaction in the outcome. Second, according to the theoretical optimal SSR, we propose a ConstrAined PolIcy Tree seArch aLgorithm (CAPITAL) to optimize the subgroup size while achieving the pre-specified clinical threshold. Specifically, we transform the loss function of the constrained optimization into individual rewards defined at the patient level. This enables us to identify the patients with a large mean outcome and develop a decision tree to generate an interpretable subgroup. For instance, recall the toy example at the beginning of this section. In contrast to the current subgroup identification methods, the selected sample proportion under the proposed method is nearly optimal at 0.50, and the average treatment effect under the estimated SSR is close to the truth at 0.99 (both under the largest sample size); see details in Table 1 and Section 5.1. Third, we extend our proposed method to the framework with multiple constraints that penalize the inclusion of patients with negative treatment effects, and to time-to-event data, using the restricted mean survival time as the clinically interesting mean outcome. Extensive simulations, comparison studies, and real data applications are conducted to demonstrate the validity and utility of our method. The source code, implemented in R, is publicly available at https://github.com/HengruiCai/CAPITAL.

Method  Results  (columns correspond to three increasing sample sizes)
Virtual Twins  Selected Sample Proportion  0.21(0.13)  0.24(0.10)  0.26(0.07) 
Average Treatment Effect  1.19(0.21)  1.37(0.18)  1.45(0.13)  
CAPITAL  Selected Sample Proportion  0.46(0.16)  0.48(0.09)  0.50(0.06) 
Average Treatment Effect  0.90(0.27)  1.00(0.15)  0.99(0.11) 
Table 1: Comparison of the Virtual Twins method and CAPITAL under Scenario 1. The results are averaged over 200 replications with standard deviations presented in the parentheses.
1.1 Related Works
There are numerous data-driven methods proposed for subgroup identification. Song and Pepe (2004) considered using the selection impact curve to evaluate treatment policies for a binary outcome based on a single baseline covariate. Foster et al. (2011) developed the virtual twins method, which first predicts the counterfactual outcome for each individual under both the test and control treatments, and then uses tree-based methods to infer the subgroups with an enhanced treatment effect. Cai et al. (2011) proposed using parametric scoring systems based on multiple baseline covariates to rank treatment effects and then identified patients who benefit more from the new treatment using the ranked effect sizes. A useful tutorial and preliminary literature review for commonly used subgroup identification methods is provided in Lipkovich et al. (2017). Yet, all these methods focus on subgroup identification rather than subgroup optimization, potentially leading to a greatly reduced number of selected patients. More details can be found in our comparison studies (Section 5.1).
Recently, a number of approaches have been developed to handle constrained optimization problems. Wang et al. (2018) proposed an individualized optimal decision rule that maximizes the clinical benefit for patients and controls the risk of adverse events, based on outcome weighted learning. Guan et al. (2020) estimated the optimal dynamic treatment regime under a constraint on the cost function by leveraging nonparametric Bayesian dynamics modeling with policy search algorithms. To handle the trade-off between the primary event of interest and the time to severe side effects of treatment in competing risks data, Zhou et al. (2021) derived a restricted optimal treatment regime based on the penalized value search method. However, none of the cited works are applicable to our problem, as they only focus on optimizing the mean outcome of interest while we also consider the size of the subgroup. In addition, since the loss functions in both outcome weighted learning and value search methods are defined on the whole sample, it is infeasible to search the interpretable class of decision trees using these methods.
The rest of this paper is organized as follows. We first formulate our problem in Section 2. In Section 3, we establish the theoretical optimal SSR that achieves our objective, and then propose CAPITAL to solve the optimal SSR. We extend our work to multiple constraints and survival data in Section 4. Simulation and comparison studies are conducted to evaluate our methods in Section 5, followed by the real data analysis in Section 6. In Section 7, we conclude our paper. All the technical proofs and additional simulation results are provided in the appendix.
2 Problem Formulation
Let $X$ denote a $p$-dimensional vector containing an individual's baseline covariates with support $\mathcal{X}$, and let $A \in \{0, 1\}$ denote the binary treatment an individual receives. After a treatment is assigned, we observe the outcome of interest $Y$ with support $\mathcal{Y}$. Let $Y^*(0)$ and $Y^*(1)$ denote the potential outcomes that would be observed after an individual receives treatment 0 or 1, respectively. Define the propensity score function as the conditional probability of receiving treatment 1 given baseline covariates $X = x$, denoted as $\pi(x) = \mathrm{pr}(A = 1 \mid X = x)$. Denote $n$ as the sample size. The sample consists of observations $\{(X_i, A_i, Y_i)\}_{1 \leq i \leq n}$, independent and identically distributed (i.i.d.) across $i$. As standard in the causal inference literature (Rubin, 1978), we make the following assumptions:
(A1). Stable Unit Treatment Value Assumption (SUTVA): $Y = A Y^*(1) + (1 - A) Y^*(0)$.
(A2). Ignorability: $\{Y^*(0), Y^*(1)\} \perp A \mid X$.
(A3). Positivity: $0 < \pi(x) < 1$ for all $x \in \mathcal{X}$.
Based on assumptions (A1) and (A2), we define the contrast function as
$$\tau(x) = E\{Y^*(1) - Y^*(0) \mid X = x\} = E(Y \mid X = x, A = 1) - E(Y \mid X = x, A = 0),$$
which describes the treatment-covariates interaction in the outcome. Under assumptions (A1) to (A3), the contrast function is estimable from the observed data.
Define the subgroup selection rule (SSR) as $D(\cdot): \mathcal{X} \to \{0, 1\}$ that assigns the patient with baseline covariates $x$ to the subgroup ($D(x) = 1$) or not ($D(x) = 0$). Denote the class of SSRs as $\mathcal{D}$. The goal is to find an optimal SSR that maximizes the size of the subgroup and also maintains a desired mean outcome such as the average treatment effect (ATE), i.e.,
$$\max_{D \in \mathcal{D}} \ E\{D(X)\}, \quad \text{subject to } E\{Y^*(1) - Y^*(0) \mid D(X) = 1\} \geq \delta, \qquad (1)$$
where $\delta$ is a pre-specified threshold of clinically meaningful average treatment effect.
3 Method
In this section, we first establish the theoretical optimal SSR that achieves our objective, and then propose CAPITAL to solve the optimal SSR.
3.1 Theoretical Optimal SSR
We first derive the theoretical optimal SSR that solves the objective in (1). Based on assumptions (A1) and (A2), the constraint in (1) can be represented by
$$E\{\tau(X) \mid D(X) = 1\} \geq \delta.$$
Given the pre-specified threshold $\delta$, we denote a cut point $c_\delta$ associated with the contrast function such that the expectation of the contrast function above $c_\delta$ achieves $\delta$, i.e.,
$$E\{\tau(X) \mid \tau(X) \geq c_\delta\} = \delta. \qquad (2)$$
By introducing $c_\delta$, when we maximize the subgroup size, the selected patients are ensured to meet the minimum acceptable beneficial effect size on average. We illustrate the density function of the contrast function with the cut point $c_\delta$ for the pre-specified threshold $\delta$ in Figure 1. The yellow area in Figure 1 contains the patients whose contrast functions are larger than $c_\delta$ and thus satisfy (2).
Intuitively, the theoretical optimal SSR should choose the patients whose contrast functions fall into the yellow area in Figure 1, i.e., those whose treatment effects are larger than $c_\delta$, to maximize the size of the subgroup. Without loss of generality, we consider the class of theoretical SSRs as
$$\mathcal{D} = \big\{ D_c : D_c(x) = \mathbb{1}\{\tau(x) \geq c\}, \ c \in \mathbb{R} \big\}.$$
Here, for a given $c$, the SSR $D_c$ selects a patient into the subgroup if his/her contrast function is larger than $c$. The following theorem gives the theoretical optimal SSR.
Theorem 3.1
(Theoretical Optimal SSR) Assuming (A1) and (A2), the optimal subgroup selection rule is
$$D^{opt}(x) = \mathbb{1}\{\tau(x) \geq c_\delta\}. \qquad (3)$$
Equivalently, the optimal subgroup selection rule is
$$D^{opt}(x) = \mathbb{1}\big[ E\{\tau(X) - \delta \mid \tau(X) \geq \tau(x)\} \geq 0 \big]. \qquad (4)$$
The proof of Theorem 3.1 consists of two parts. First, we show that the optimal SSR within the class $\mathcal{D}$ is $D_{c_\delta}$, where $c_\delta$ satisfies (2). Second, we derive the equivalence between (3) and (4). See the detailed proof of Theorem 3.1 in the appendix.
From Theorem 3.1 and the definition of the cut point $c_\delta$, the optimal SSR can be found based on the density of the contrast function. Since this density is usually unknown in practice, we use the estimated contrast function for each patient, i.e., the estimated individual treatment effect, to approximate it. A constrained policy tree search algorithm that solves for the optimal SSR is presented in the next section.
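To make the cut-point construction in (2) concrete, $c_\delta$ can be approximated from a sample of estimated contrast values: sort them in decreasing order and find the largest prefix whose running mean still meets $\delta$. The sketch below is illustrative only (function and variable names are ours, not from the paper's R implementation):

```python
import numpy as np

def empirical_cut_point(tau_hat, delta):
    """Approximate the cut point c_delta from estimated contrast values.

    Sort the contrasts in decreasing order and take the largest prefix
    whose running mean still meets the threshold delta; the boundary
    value is the empirical cut point. Returns (cut point, selected
    proportion), or (None, 0.0) if no subgroup meets delta.
    """
    tau_sorted = np.sort(tau_hat)[::-1]                      # decreasing order
    running_mean = np.cumsum(tau_sorted) / np.arange(1, len(tau_sorted) + 1)
    ok = np.where(running_mean >= delta)[0]                  # prefixes meeting delta
    if len(ok) == 0:
        return None, 0.0
    k = ok[-1]                                               # largest qualifying prefix
    return tau_sorted[k], (k + 1) / len(tau_sorted)
```

Because the values are sorted in decreasing order, the running mean is non-increasing, so the last index satisfying the threshold identifies the largest admissible subgroup.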
3.2 Constrained Policy Tree Search Algorithm
In this section, we formally present CAPITAL. First, we transform the constrained optimization in (1) into individual rewards defined at the patient level. This enables us to identify patients more likely to benefit from treatment. Then, we develop a decision tree to partition these patients into subgroups based on the policy tree algorithm proposed by Athey and Wager (2021).
We focus on SSRs in the class of finite-depth decision trees. Specifically, for any depth $L \geq 1$, a depth-$L$ decision tree $D_L$ is specified via a splitting variable $X_j$, a threshold $t$, and two depth-$(L-1)$ decision trees $D_{L-1,\text{left}}$ and $D_{L-1,\text{right}}$, such that $D_L(x) = D_{L-1,\text{left}}(x)$ if $x_j \leq t$, and $D_L(x) = D_{L-1,\text{right}}(x)$ otherwise. Denote the class of such decision trees as $\mathcal{D}_{\text{tree}}$. We illustrate a simple depth-2 decision tree with two splitting variables in Figure 2; its mathematical form follows the recursive definition above.
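The recursive definition above can be evaluated with a few lines of code. The sketch below is ours (the nested-dictionary representation is a hypothetical encoding, not the 'policytree' data structure): an internal node routes left when the splitting covariate is at most the threshold, and a leaf returns the subgroup decision.

```python
def tree_predict(x, node):
    """Evaluate a finite-depth decision tree stored as nested dicts.

    Internal node: {"var": j, "cut": t, "left": subtree, "right": subtree}
    Leaf:          {"leaf": 0 or 1}   (patient selected into subgroup or not)
    The left branch is taken when x[j] <= t, mirroring the splitting rule
    in the text.
    """
    while "leaf" not in node:
        node = node["left"] if x[node["var"]] <= node["cut"] else node["right"]
    return node["leaf"]
```

For example, a depth-2 tree that splits on $x_1$ and then $x_2$ is one dictionary nested inside another, and evaluating it walks at most two levels.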
Define $\Delta(x) = \tau(x) - \delta$ as the difference between the contrast function and the desired average treatment effect $\delta$. Under (A1)-(A3), we can estimate the contrast function, denoted as $\widehat{\tau}(\cdot)$, using the random forest method and out-of-bag prediction (see e.g., Lu et al., 2018). Define $\widehat{\Delta}_i = \widehat{\tau}(x_i) - \delta$. It is immediate that a patient with larger $\widehat{\Delta}_i$ is more likely to be selected into the subgroup based on Figure 1. We sort the estimates in decreasing order as $\widehat{\Delta}_{(1)} \geq \widehat{\Delta}_{(2)} \geq \cdots \geq \widehat{\Delta}_{(n)}$. This sequence gives an approximation of the density of $\Delta(X)$.
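The paper estimates $\tau(\cdot)$ with random forests and out-of-bag prediction (Lu et al., 2018); as a dependency-free stand-in, the sketch below swaps in a k-nearest-neighbour regression per arm, which conveys the same plug-in idea (all names are ours; this is not the authors' estimator):

```python
import numpy as np

def knn_contrast(X, A, Y, k=10):
    """Illustrative plug-in estimate of tau(x) = E[Y|x,A=1] - E[Y|x,A=0].

    For each patient, average the outcomes of the k nearest neighbours
    (Euclidean distance) within each treatment arm and take the
    difference. A simple stand-in for the random-forest estimator used
    in the paper.
    """
    tau_hat = np.empty(len(Y))
    for i, x in enumerate(X):
        dist = np.linalg.norm(X - x, axis=1)
        arm_mean = {}
        for a in (0, 1):
            idx = np.where(A == a)[0]                 # patients in arm a
            nearest = idx[np.argsort(dist[idx])[:k]]  # k closest in that arm
            arm_mean[a] = Y[nearest].mean()
        tau_hat[i] = arm_mean[1] - arm_mean[0]
    return tau_hat
```

Any consistent estimator of the two arm-specific regression functions can be substituted here; the downstream reward construction only needs the resulting $\widehat{\tau}(x_i)$ values.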
We further define the cumulative mean based on the above sequence as
$$\bar{\Delta}_{(i)} = \frac{1}{i} \sum_{k=1}^{i} \widehat{\Delta}_{(k)}.$$
With a sufficiently large sample size, $\bar{\Delta}_{(i)}$ converges to the average treatment effect minus the desired effect $\delta$ within the selected patients whose contrast function exceeds the upper $(i/n)$th quantile of the density of $\tau(X)$, i.e.,
$$\bar{\Delta}_{(i)} \to E\{\tau(X) - \delta \mid \tau(X) \geq q_{i/n}\},$$
where $q_{i/n}$ is the upper $(i/n)$th quantile of the density of $\tau(X)$, as $n$ goes to infinity.
As long as $\bar{\Delta}_{(i)}$ is larger than zero, the selected subgroup satisfies the condition in (1), based on the theoretical optimal SSR in (4) from Theorem 3.1. Therefore, to solve (1), we need to select patients with positive $\bar{\Delta}$ and maximize the subgroup size. To do this, we define the reward of the $i$th individual based on the sign of $\bar{\Delta}$ as follows:
Reward 1:
$$r_i = D(x_i) \times \mathrm{sign}\{\bar{\Delta}_{(k_i)}\}, \qquad (5)$$
where $k_i$ is the rank of $\widehat{\Delta}_i$ in the sorted sequence $\widehat{\Delta}_{(1)} \geq \cdots \geq \widehat{\Delta}_{(n)}$, and 'sign' is the sign operator such that $\mathrm{sign}(u) = 1$ if $u > 0$, $\mathrm{sign}(u) = -1$ if $u < 0$, and $\mathrm{sign}(0) = 0$. Given $\bar{\Delta}_{(k_i)}$ is positive, the reward is 1 if the patient is selected into the subgroup, and is 0 otherwise. Likewise, supposing $\bar{\Delta}_{(k_i)}$ is negative, the reward is $-1$ if the patient is selected into the subgroup, i.e., $D(x_i) = 1$, and is 0 otherwise. This is in accordance with the intuition that we should select patients whose $\bar{\Delta}$ is larger than zero.
To encourage the decision tree to include patients who have a larger treatment effect, we also propose the following reward choice based on the value of $\bar{\Delta}$ directly:
Reward 2:
$$r_i = D(x_i) \times \bar{\Delta}_{(k_i)}. \qquad (6)$$
The optimal SSR is searched within the decision tree class to maximize the sum of the individual rewards defined in (5) or (6). Specifically, the decision tree allocates each patient to the subgroup or not, and receives the corresponding reward. We use exhaustive search to estimate the optimal SSR that maximizes the total reward, using the policy tree algorithm proposed in Athey and Wager (2021). The simulation studies (Section 5) show that the performances under these two reward choices are very similar.
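The per-patient rewards feeding the tree search can be sketched as follows; in the authors' R implementation these are passed to 'policytree', which we do not reproduce here. The function name and interface are ours; each patient receives the cumulative mean at his or her own rank, with Reward 1 keeping only its sign and Reward 2 the value itself:

```python
import numpy as np

def capital_rewards(tau_hat, delta):
    """Per-patient rewards for the policy tree search, per (5) and (6).

    Delta_i = tau_hat_i - delta is sorted in decreasing order, and
    cum_mean[k] is the mean of the top-(k+1) values. Each patient's
    reward is the cumulative mean at his/her own rank: Reward 2 uses
    the value (6), Reward 1 only its sign (5).
    """
    delta_i = tau_hat - delta
    order = np.argsort(-delta_i)                      # indices in decreasing order
    cum_mean = np.cumsum(delta_i[order]) / np.arange(1, len(delta_i) + 1)
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))               # rank of each patient
    reward2 = cum_mean[rank]                          # value-based reward (6)
    reward1 = np.sign(reward2)                        # sign-based reward (5)
    return reward1, reward2
```

To use these with an exhaustive tree search, one would score a candidate tree by summing the reward of every selected patient (unselected patients contribute 0), matching the $D(x_i)$ factor in (5) and (6).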
We denote the estimated optimal SSR that maximizes the size of the subgroup while maintaining the desired average treatment effect as $\widehat{D}(\cdot)$. The proposed algorithm not only results in an interpretable SSR (see more discussion in Section 5), but is also flexible enough to handle multiple constraints and survival data, as discussed in detail in the next section.
4 Extensions
In this section, we discuss two main extensions of CAPITAL for solving (1). We first address multiple constraints on the treatment effect in Section 4.1, and then handle time-to-event data, with the restricted mean survival time as the clinically interesting mean outcome, in Section 4.2.
4.1 Extension to Multiple Constraints
In addition to the main constraint described in (1), in reality there may exist secondary constraints of interest. For instance, besides a desired average treatment effect, the individual treatment effect for each patient should be greater than some minimum beneficial value. Under such multiple constraints, the optimal SSR is defined by
$$\max_{D \in \mathcal{D}} \ E\{D(X)\}, \quad \text{subject to } E\{\tau(X) \mid D(X) = 1\} \geq \delta \ \text{ and } \ \tau(x) \geq \eta \text{ for all } x \text{ with } D(x) = 1, \qquad (7)$$
where $\eta$ is a pre-specified minimum beneficial value. In the rest of this paper, we focus on the case with $\eta = 0$, that is, the individual treatment effect for each patient should be nonnegative so that the treatment is beneficial to the patients in the selected group.
The above objective function can be solved by modifying CAPITAL as presented in Section 3.2. Specifically, we define the reward of the $i$th individual based on (7) and (6) as follows.
Reward 3:
$$r_i = D(x_i) \times \big[ \bar{\Delta}_{(k_i)} - \lambda\, \mathbb{1}\{\widehat{\tau}(x_i) < 0\} \big], \qquad (8)$$
where $\lambda \geq 0$ is the penalty parameter that represents the trade-off between the first and the second constraint. When $\lambda = 0$, the reward defined in (8) reduces to (6). Here, we only add the penalty when the estimated contrast function is negative, i.e., $\widehat{\tau}(x_i) < 0$. This discourages the method from selecting patients with a negative individual treatment effect.
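A minimal sketch of the penalized reward in (8), with the rank-based cumulative mean recomputed inline so the function is self-contained (names are ours, not from the R implementation):

```python
import numpy as np

def penalized_reward(tau_hat, delta, lam):
    """Reward 3 in (8): the value-based reward of (6) minus a penalty lam
    applied only to patients whose estimated contrast is negative.
    lam = 0 recovers the unpenalized reward in (6)."""
    delta_i = tau_hat - delta
    order = np.argsort(-delta_i)                      # decreasing order
    cum_mean = np.cumsum(delta_i[order]) / np.arange(1, len(delta_i) + 1)
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))
    reward2 = cum_mean[rank]                          # reward (6)
    return reward2 - lam * (tau_hat < 0)              # penalize negative ITE only
```

As $\lambda$ grows, a tree that selects a patient with $\widehat{\tau}(x_i) < 0$ pays an increasing price, which is the trade-off studied empirically in Section 5.2.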
4.2 Extension to Survival Data
We next consider finding the optimal SSR for a survival endpoint. Let $T$ and $C$ denote the survival time of interest and the censoring time, respectively. Assume that $T$ and $C$ are independent given the baseline covariates and the treatment. The observed dataset then consists of $n$ independent and identically distributed triplets $\{(X_i, \widetilde{T}_i, \Gamma_i)\}_{1 \leq i \leq n}$, where $\widetilde{T}_i = \min(T_i, C_i)$ and $\Gamma_i = \mathbb{1}(T_i \leq C_i)$ is the censoring indicator. The goal is to maximize the size of the subgroup with a pre-specified clinically desired effect $\delta$ on the restricted mean survival time, i.e.,
$$\max_{D \in \mathcal{D}} \ E\{D(X)\}, \quad \text{subject to } E\big[ \min\{T^*(1), t_0\} - \min\{T^*(0), t_0\} \,\big|\, D(X) = 1 \big] \geq \delta, \qquad (9)$$
where $T^*(0)$ and $T^*(1)$ are the potential survival times, and $t_0$ is the maximum follow-up time, which is pre-specified or can be estimated from the observed data.
Denote $\mu_0(x) = \int_0^{t_0} S_0(t \mid x)\, dt$ and $\mu_1(x) = \int_0^{t_0} S_1(t \mid x)\, dt$ as the restricted mean survival times given baseline covariates $X = x$ in the groups with treatment 0 and 1, respectively, where $S_0(\cdot \mid x)$ and $S_1(\cdot \mid x)$ are the conditional survival functions in the control and treatment groups. To estimate $\mu_0$ and $\mu_1$, we first fit a random forest to the survival functions in the control and treatment groups, respectively, yielding estimates $\widehat{S}_0$ and $\widehat{S}_1$. The estimated restricted mean survival times, denoted as $\widehat{\mu}_0$ and $\widehat{\mu}_1$, are then calculated by integrating the estimated survival functions up to the minimum of the maximum follow-up times over the two arms. Define $\widehat{\Delta}_i = \widehat{\mu}_1(x_i) - \widehat{\mu}_0(x_i) - \delta$ to capture the distance from the estimated contrast to the desired difference in restricted mean survival time for the $i$th individual. It is immediate that an individual with larger $\widehat{\Delta}_i$ is more likely to be selected into the subgroup. We sort the estimates in decreasing order as $\widehat{\Delta}_{(1)} \geq \cdots \geq \widehat{\Delta}_{(n)}$ and define the cumulative mean as $\bar{\Delta}_{(i)} = i^{-1} \sum_{k=1}^{i} \widehat{\Delta}_{(k)}$. The reward for the constrained policy tree search can then be defined following arguments similar to (5) and (6).
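The integration step can be sketched numerically: given an estimated survival curve on a time grid, the restricted mean survival time is the area under the curve up to $t_0$. The function below (names are ours; the paper's random survival forest fit is not reproduced) uses the trapezoidal rule:

```python
import numpy as np

def rmst_from_curve(times, surv, t_max):
    """Restricted mean survival time: integrate an estimated survival
    curve S(t) from 0 to t_max with the trapezoidal rule.

    `times` is an increasing time grid and `surv` the estimated survival
    probabilities on that grid; S(0) = 1 is prepended, and the curve is
    interpolated at t_max.
    """
    mask = times <= t_max
    t = np.concatenate(([0.0], times[mask], [t_max]))
    s = np.concatenate(([1.0], surv[mask], [np.interp(t_max, times, surv)]))
    return float(0.5 * np.sum((s[1:] + s[:-1]) * np.diff(t)))
```

Applying this to each arm's estimated curve for patient $i$ gives $\widehat{\mu}_1(x_i)$ and $\widehat{\mu}_0(x_i)$, and hence $\widehat{\Delta}_i = \widehat{\mu}_1(x_i) - \widehat{\mu}_0(x_i) - \delta$.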
5 Simulation Studies
5.1 Evaluation and Comparison with Average Treatment Effect
Suppose the baseline covariates $X$, the treatment $A$, and the outcome $Y$ are generated from the following model:
$$Y = b(X) + A\,\tau(X) + \epsilon, \qquad (10)$$
where $b(\cdot)$ is the baseline function of the outcome, $\tau(\cdot)$ is the contrast function, and $\epsilon$ is the random error. We fix the dimension of the covariates and consider the following three scenarios, respectively.
Scenario 1:
Scenario 2:
Scenario 3:
The true average treatment effect is 0 under all scenarios. We illustrate the density of $\tau(X)$ for Scenarios 2 and 3 in Figure 3; for Scenario 1 the density of $\tau(X)$ is simply uniform. Based on Figure 3, we consider three clinically meaningful treatment effect thresholds for each scenario, with the corresponding optimal subgroup sample proportions listed in Table 3. The total sample size is chosen from a set of three increasing values. We apply CAPITAL to find the optimal SSR. The policy is searched within the decision tree class using the R package 'policytree' (Athey and Wager, 2021; Zhou et al., 2018). For better demonstration, we focus on depth-2 decision trees. To illustrate the interpretability of the resulting SSR, we show more specific results for three particular simulation replicates (No.1, No.2, and No.3) under Scenario 2 using the reward in (6). The estimated SSRs under these three replicates are shown in Figure 4, with the splitting variables and their splitting thresholds reported in Table 2. We summarize the selected sample proportion under the estimated SSR, the average treatment effect under the estimated SSR, and the rate of making correct subgroup decisions by the estimated SSR, using Monte Carlo approximations. Finally, we visualize the density function of $\tau(X)$ within the subgroup selected by the estimated SSR, with comparison to that of the unselected patients, for the three replicates in Figure 5.
Simulation  Replicate No.1  Replicate No.2  Replicate No.3
Selected Sample Proportion  44.5%  49.2%  55.0%
Average Treatment Effect  1.11  1.00  0.90
Rate of Correct Decision  91.85%  92.01%  94.45%
Split Variable (Split Value)
(Left) Split Variable (Split Value)
(Right) Split Variable (Split Value)
Table 2: Results of three particular simulation replicates under Scenario 2.
Over 200 replicates under Scenario 2, the rate of correctly identifying the important features under the estimated SSRs is 70.8% with the smallest sample size, increasing to 95.8% with the middle and 100.0% with the largest sample size. It can be seen from both Figure 4 and Table 2 that the estimated SSRs under the proposed method identify the important features that determine the outcome in all three replicates. In Scenario 2, the two important features play identical roles in the contrast function, so the resulting optimal tree can use either one as the first splitting variable. Replicate No.3 over-selects the subgroup and therefore yields a lower average treatment effect, while replicate No.1 under-selects the subgroup and achieves a higher average treatment effect, as shown in Table 2. This finding is in line with the trade-off between the size of the selected subgroup and its corresponding average treatment effect discussed in the introduction. Moreover, all three replicates attain a high rate (above 91%) of making correct subgroup decisions under the estimated SSRs, as supported by both Table 2 and Figure 5.
In addition, we compare the proposed method with the VT method (Foster et al., 2011). Though the VT method can in principle handle both binary and continuous outcomes, the current R package 'aVirtualTwins' only deals with binary outcomes in a two-armed clinical trial. To address the continuous outcomes in Scenarios 1-3, following the VT method, we fit the estimated individual treatment effects on the features via a regression tree. We next consider two subgroup selection rules based on the VT method.
VT-A: Denote the average treatment effect within a terminal node of the regression tree as its predicted value. The final subgroup is formed as the union of the terminal nodes whose predicted values are greater than the pre-specified threshold.
VT-C: Each terminal node is classified into the subgroup by a majority vote within the node on whether the estimated individual treatment effect exceeds the threshold. The final subgroup is defined as the union of the terminal nodes classified into the subgroup. We apply the proposed method and the VT-A and VT-C methods to Scenarios 1-3 with 200 replications. We summarize the selected sample proportion and the average treatment effect under the estimated SSR, the rate of making correct subgroup decisions by the estimated SSR (RCD; the number of correct subgroup decisions divided by the total sample size), and the rate of positive individual treatment effects within the selected subgroup (RPI; the number of positive individual treatment effects divided by the size of the selected subgroup), aggregated over 200 replications using Monte Carlo approximations, with standard deviations presented. Since the VT-A and VT-C methods have nearly identical results, and the performances under our method with reward (5) and with reward (6) are similar, for a better demonstration of the comparison we report the empirical results in Table 3 for the proposed method with reward (5) and the VT-A method, and in Table 4 for the proposed method with reward (6) and the VT-C method.
Method  Metric  Scenario 1  Scenario 2  Scenario 3  (columns within each scenario correspond to the three increasing sample sizes)
CAPITAL  Proportion  
0.62(0.16)  0.63(0.08)  0.65(0.05)  0.42(0.23)  0.51(0.11)  0.56(0.05)  0.72(0.15)  0.74(0.08)  0.77(0.05)  
ATE  0.66(0.28)  0.72(0.17)  0.69(0.10)  0.72(0.47)  0.96(0.20)  0.86(0.11)  0.66(0.34)  0.67(0.18)  0.61(0.11)  
RCD  0.83(0.10)  0.91(0.05)  0.93(0.03)  0.62(0.15)  0.81(0.08)  0.87(0.03)  0.83(0.08)  0.89(0.03)  0.90(0.01)  
RPI  0.78(0.13)  0.80(0.09)  0.78(0.06)  0.74(0.15)  0.88(0.08)  0.86(0.06)  0.67(0.10)  0.67(0.06)  0.65(0.04)  
Proportion  
0.46(0.16)  0.48(0.09)  0.50(0.06)  0.21(0.17)  0.32(0.12)  0.40(0.06)  0.56(0.16)  0.59(0.09)  0.62(0.06)  
ATE  0.90(0.27)  1.00(0.15)  0.99(0.11)  0.83(0.63)  1.31(0.27)  1.17(0.11)  1.02(0.37)  1.00(0.20)  0.94(0.15)  
RCD  0.84(0.11)  0.91(0.05)  0.94(0.03)  0.62(0.12)  0.79(0.11)  0.88(0.05)  0.79(0.07)  0.85(0.03)  0.87(0.01)  
RPI  0.88(0.11)  0.94(0.06)  0.94(0.05)  0.75(0.19)  0.95(0.05)  0.97(0.03)  0.78(0.11)  0.78(0.06)  0.77(0.05)  
Proportion  
0.30(0.16)  0.31(0.11)  0.34(0.08)  0.09(0.09)  0.14(0.10)  0.25(0.09)  0.41(0.15)  0.44(0.09)  0.48(0.06)  
ATE  1.05(0.33)  1.28(0.18)  1.29(0.14)  0.66(0.73)  1.58(0.59)  1.48(0.24)  1.36(0.40)  1.38(0.24)  1.29(0.15)  
RCD  0.81(0.10)  0.88(0.07)  0.92(0.04)  0.67(0.07)  0.74(0.08)  0.82(0.06)  0.78(0.08)  0.83(0.03)  0.86(0.02)  
RPI  0.93(0.12)  0.99(0.02)  1.00(0.01)  0.69(0.21)  0.91(0.13)  0.95(0.04)  0.86(0.10)  0.89(0.06)  0.88(0.04)  
VTA  Proportion  
0.31(0.12)  0.34(0.09)  0.35(0.08)  0.15(0.10)  0.19(0.09)  0.22(0.08)  0.29(0.10)  0.30(0.06)  0.30(0.06)  
ATE  1.11(0.20)  1.27(0.17)  1.30(0.15)  0.85(0.61)  1.46(0.38)  1.53(0.32)  1.76(0.36)  1.82(0.23)  1.81(0.21)  
RCD  0.66(0.12)  0.69(0.09)  0.70(0.08)  0.43(0.08)  0.51(0.09)  0.55(0.09)  0.54(0.10)  0.55(0.06)  0.55(0.06)  
RPI  0.97(0.06)  0.99(0.03)  1.00(0.01)  0.77(0.17)  0.95(0.09)  0.97(0.08)  0.95(0.07)  0.98(0.03)  0.98(0.03)  
Proportion  
0.21(0.13)  0.24(0.10)  0.26(0.07)  0.07(0.06)  0.09(0.07)  0.14(0.07)  0.23(0.09)  0.24(0.06)  0.25(0.05)  
ATE  1.19(0.21)  1.37(0.18)  1.45(0.13)  1.01(0.74)  1.67(0.49)  1.78(0.38)  1.94(0.34)  2.02(0.23)  2.00(0.18)  
RCD  0.70(0.12)  0.74(0.10)  0.76(0.07)  0.54(0.06)  0.59(0.07)  0.64(0.07)  0.60(0.08)  0.62(0.06)  0.62(0.05)  
RPI  0.98(0.05)  1.00(0.02)  1.00(0.00)  0.81(0.20)  0.96(0.09)  0.98(0.08)  0.97(0.05)  0.99(0.02)  0.99(0.01)  
Proportion  
0.12(0.11)  0.11(0.11)  0.16(0.11)  0.03(0.04)  0.03(0.04)  0.07(0.05)  0.17(0.09)  0.18(0.06)  0.20(0.05)  
ATE  1.25(0.23)  1.43(0.18)  1.50(0.12)  1.11(0.81)  1.81(0.61)  1.98(0.42)  2.12(0.37)  2.24(0.23)  2.19(0.20)  
RCD  0.74(0.09)  0.76(0.11)  0.81(0.11)  0.65(0.03)  0.66(0.04)  0.69(0.05)  0.65(0.09)  0.67(0.06)  0.69(0.05)  
RPI  0.99(0.04)  1.00(0.01)  1.00(0.00)  0.83(0.21)  0.95(0.13)  0.98(0.07)  0.99(0.03)  1.00(0.01)  1.00(0.00) 
Method  Metric  Scenario 1  Scenario 2  Scenario 3  (columns within each scenario correspond to the three increasing sample sizes)
CAPITAL  Proportion  
0.63(0.16)  0.63(0.08)  0.65(0.05)  0.44(0.24)  0.52(0.11)  0.57(0.06)  0.72(0.15)  0.75(0.07)  0.77(0.04)  
ATE  0.67(0.30)  0.72(0.17)  0.70(0.11)  0.71(0.48)  0.94(0.20)  0.85(0.11)  0.67(0.35)  0.66(0.17)  0.60(0.10)  
RCD  0.84(0.10)  0.91(0.05)  0.93(0.03)  0.63(0.15)  0.82(0.08)  0.87(0.03)  0.83(0.08)  0.89(0.03)  0.91(0.01)  
RPI  0.78(0.13)  0.80(0.09)  0.78(0.06)  0.74(0.16)  0.88(0.09)  0.85(0.07)  0.67(0.10)  0.67(0.06)  0.65(0.04)  
Proportion  
0.46(0.16)  0.48(0.08)  0.50(0.05)  0.21(0.18)  0.32(0.12)  0.41(0.05)  0.56(0.16)  0.60(0.09)  0.63(0.07)  
ATE  0.91(0.28)  1.01(0.15)  0.99(0.10)  0.76(0.66)  1.32(0.27)  1.16(0.10)  1.03(0.39)  0.99(0.21)  0.93(0.16)  
RCD  0.85(0.11)  0.92(0.05)  0.94(0.03)  0.62(0.12)  0.79(0.10)  0.88(0.05)  0.79(0.08)  0.85(0.03)  0.87(0.01)  
RPI  0.89(0.12)  0.94(0.06)  0.95(0.05)  0.74(0.19)  0.96(0.03)  0.97(0.03)  0.78(0.11)  0.78(0.07)  0.76(0.06)  
Proportion  
0.30(0.16)  0.32(0.11)  0.34(0.08)  0.09(0.09)  0.14(0.10)  0.25(0.09)  0.41(0.16)  0.44(0.09)  0.48(0.06)  
ATE  1.05(0.35)  1.27(0.17)  1.29(0.14)  0.71(0.76)  1.57(0.63)  1.50(0.25)  1.34(0.42)  1.40(0.25)  1.29(0.17)  
RCD  0.81(0.10)  0.89(0.07)  0.92(0.04)  0.67(0.07)  0.74(0.08)  0.82(0.06)  0.77(0.08)  0.83(0.04)  0.86(0.02)  
RPI  0.93(0.13)  0.99(0.03)  1.00 (0.01)  0.70(0.21)  0.92(0.14)  0.97(0.02)  0.86(0.10)  0.90(0.06)  0.88(0.05)  
VTC  Proportion  
0.31(0.12)  0.34(0.09)  0.35(0.08)  0.15(0.10)  0.19(0.09)  0.22(0.08)  0.29(0.10)  0.30(0.06)  0.30(0.06)  
ATE  1.11(0.20)  1.27(0.17)  1.30(0.15)  0.85(0.61)  1.46(0.38)  1.53(0.32)  1.76(0.36)  1.82(0.23)  1.81(0.21)  
RCD  0.66(0.12)  0.69(0.09)  0.70(0.08)  0.43(0.08)  0.51(0.09)  0.55(0.09)  0.54(0.10)  0.55(0.06)  0.55(0.06)  
RPI  0.97(0.06)  0.99(0.03)  1.00(0.01)  0.77(0.17)  0.95(0.09)  0.97(0.08)  0.95(0.07)  0.98(0.03)  0.98(0.03)  
Proportion  
0.21(0.13)  0.24(0.10)  0.26(0.07)  0.07(0.06)  0.09(0.07)  0.14(0.07)  0.23(0.09)  0.24(0.06)  0.25(0.05)  
ATE  1.19(0.21)  1.37(0.18)  1.45(0.13)  1.01(0.74)  1.67(0.49)  1.78(0.38)  1.94(0.34)  2.02(0.23)  2.00(0.18)  
RCD  0.70(0.12)  0.74(0.10)  0.76(0.07)  0.54(0.06)  0.59(0.07)  0.64(0.07)  0.60(0.08)  0.62(0.06)  0.62(0.05)  
RPI  0.98(0.05)  1.00(0.02)  1.00(0.00)  0.81(0.20)  0.96(0.09)  0.98(0.08)  0.97(0.05)  0.99(0.02)  0.99(0.01)  
Proportion  
0.12(0.11)  0.11(0.11)  0.16(0.11)  0.03(0.04)  0.03(0.04)  0.07(0.05)  0.17(0.09)  0.18(0.06)  0.20(0.05)  
ATE  1.25(0.23)  1.43(0.18)  1.50(0.12)  1.11(0.81)  1.81(0.61)  1.98(0.42)  2.12(0.37)  2.24(0.23)  2.19(0.20)
RCD  0.74(0.09)  0.76(0.11)  0.81(0.11)  0.65(0.03)  0.66(0.04)  0.69(0.05)  0.65(0.09)  0.67(0.06)  0.69(0.05)  
RPI  0.99(0.04)  1.00(0.01)  1.00(0.00)  0.83(0.21)  0.95(0.13)  0.98(0.07)  0.99(0.03)  1.00(0.01)  1.00(0.00) 
Based on Tables 3 and 4, it is clear that the proposed method has better performance than the VT methods in all cases. To be specific, in Scenario 1 under the largest sample size, our method achieves a selected sample proportion of 65% under the smallest threshold (the optimal is 65%), 50% under the middle threshold (the optimal is 50%), and 34% under the largest threshold (the optimal is 35%), with corresponding average treatment effects close to the true values. The selected sample proportion under Scenario 2 is slightly underestimated because the density of $\tau(X)$ is concentrated around 0, as illustrated in the left panel of Figure 3. In addition, the proposed method performs well under small sample sizes with a slightly lower selected sample proportion, and improves as the sample size increases. In contrast, the VT methods can hardly achieve half of the desired optimal subgroup size in most cases. Lastly, by comparing Table 3 with Table 4, the simulation results are very similar under the two reward choices.
5.2 Evaluation of Multiple Constraints
In this section, we further investigate the performance of the proposed method under multiple constraints. Specifically, we aim to solve the objective in (7) with the penalized reward defined in (8). We set the penalty term $\lambda$ to four different values, where $\lambda = 0$ corresponds to the unpenalized reward in (6).
We use the same setting as described in Section 5.1 under Scenarios 1 to 3 and apply CAPITAL to find the optimal SSR within the decision tree class. The empirical results over 200 replications are reported in Table 5 under the different penalty terms. It can be observed from Table 5 that the rate of positive individual treatment effects within the selected subgroup increases, while the rate of making correct subgroup decisions slightly decreases, as the penalty term $\lambda$ increases in all cases. This reflects the trade-off between the two constraints in our objective (7).
(The four row-blocks below correspond to the four penalty choices, in increasing order of the penalty term.)

            Scenario 1                            Scenario 2                            Scenario 3
Proportion  0.63(0.16)  0.63(0.08)  0.65(0.05)    0.44(0.24)  0.51(0.11)  0.57(0.06)    0.72(0.15)  0.75(0.07)  0.77(0.04)
ATE         0.67(0.30)  0.72(0.17)  0.70(0.11)    0.71(0.48)  0.95(0.20)  0.85(0.11)    0.67(0.35)  0.66(0.17)  0.60(0.10)
RCD         0.84(0.10)  0.91(0.05)  0.93(0.03)    0.62(0.15)  0.81(0.08)  0.87(0.03)    0.83(0.08)  0.89(0.03)  0.91(0.01)
RPI         0.78(0.13)  0.80(0.09)  0.78(0.06)    0.74(0.16)  0.88(0.09)  0.85(0.07)    0.67(0.10)  0.67(0.06)  0.65(0.04)
Proportion  0.55(0.12)  0.56(0.06)  0.57(0.04)    0.39(0.21)  0.48(0.10)  0.53(0.05)    0.63(0.13)  0.65(0.07)  0.66(0.05)
ATE         0.83(0.23)  0.86(0.11)  0.86(0.08)    0.77(0.48)  1.01(0.17)  0.93(0.10)    0.89(0.30)  0.88(0.16)  0.86(0.11)
RCD         0.84(0.09)  0.90(0.05)  0.91(0.03)    0.61(0.15)  0.79(0.08)  0.85(0.04)    0.81(0.09)  0.86(0.04)  0.87(0.03)
RPI         0.86(0.11)  0.88(0.07)  0.88(0.05)    0.76(0.15)  0.91(0.07)  0.90(0.05)    0.74(0.09)  0.74(0.06)  0.74(0.04)
Proportion  0.52(0.11)  0.54(0.05)  0.54(0.04)    0.37(0.20)  0.46(0.09)  0.51(0.05)    0.57(0.13)  0.60(0.07)  0.61(0.05)
ATE         0.88(0.20)  0.91(0.11)  0.91(0.07)    0.79(0.48)  1.05(0.16)  0.97(0.10)    1.00(0.29)  0.99(0.16)  0.98(0.12)
RCD         0.83(0.09)  0.88(0.05)  0.89(0.04)    0.60(0.15)  0.78(0.08)  0.83(0.05)    0.78(0.10)  0.83(0.05)  0.84(0.04)
RPI         0.88(0.09)  0.90(0.06)  0.91(0.05)    0.77(0.15)  0.92(0.06)  0.92(0.05)    0.78(0.09)  0.78(0.05)  0.78(0.04)
Proportion  0.49(0.11)  0.52(0.05)  0.52(0.04)    0.33(0.19)  0.43(0.09)  0.48(0.05)    0.52(0.12)  0.55(0.07)  0.55(0.05)
ATE         0.93(0.19)  0.95(0.11)  0.96(0.07)    0.83(0.51)  1.10(0.15)  1.03(0.10)    1.12(0.30)  1.11(0.16)  1.11(0.12)
RCD         0.81(0.10)  0.86(0.05)  0.87(0.04)    0.58(0.15)  0.76(0.08)  0.81(0.05)    0.74(0.10)  0.78(0.06)  0.79(0.04)
RPI         0.91(0.09)  0.92(0.06)  0.94(0.05)    0.78(0.16)  0.94(0.05)  0.94(0.04)    0.81(0.09)  0.82(0.05)  0.83(0.04)
5.3 Evaluation of Survival Data
The data are generated from a model similar to (10):
We set the dimension of covariates as , and define the survival time as . Consider the following scenario:
Scenario 4:
Here, for the random noise component we consider three cases: (i) Case 1 (normal): ; (ii) Case 2 (logistic): ; (iii) Case 3 (extreme): .
The censoring times are generated from a uniform distribution whose upper bound is chosen to yield a censoring level of 15% or 25%, respectively; each level is applied to the three choices of noise distribution, for a total of 6 settings. An illustration is provided in Figure 6. The clinically meaningful difference in restricted mean survival time is summarized in Table 6; each setting was selected to yield an optimal selected sample proportion of 50%. We report the empirical results in Table 6 with the second choice of reward in (6), including the selected sample proportion under the estimated SSR, the average treatment effect under the estimated SSR, and the rate of making correct subgroup decisions by the estimated SSR, over 200 replications, using Monte Carlo approximations with standard deviations presented in parentheses.
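The survival-data generator described above can be sketched as follows. The covariate model and coefficients here are illustrative placeholders rather than the paper's Scenario 4; only the overall recipe (a log-linear survival time with normal, logistic, or extreme-value noise, plus uniform censoring calibrated to a target censoring level) follows the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def gen_survival(n, noise="normal", target_censor=0.15):
    """Sketch: AFT-type survival times with three noise choices and
    uniform censoring calibrated by search to the target censoring level."""
    X = rng.uniform(-1, 1, size=(n, 3))          # hypothetical covariates
    A = rng.binomial(1, 0.5, size=n)             # randomized treatment
    if noise == "normal":
        eps = rng.normal(0, 0.5, size=n)
    elif noise == "logistic":
        eps = rng.logistic(0, 0.3, size=n)
    else:                                        # "extreme": Gumbel noise
        eps = rng.gumbel(0, 0.3, size=n)
    # Log survival time with a treatment-covariate interaction (illustrative)
    logT = 0.5 + 0.3 * X[:, 0] + A * X[:, 1] + eps
    T = np.exp(logT)
    # Shrink the uniform censoring bound until the target level is reached
    for c_max in np.linspace(T.max() * 5, T.max() * 0.1, 200):
        C = rng.uniform(0, c_max, size=n)
        if np.mean(T > C) >= target_censor:
            break
    obs = np.minimum(T, C)                       # observed time
    delta = (T <= C).astype(int)                 # event indicator
    return obs, delta, X, A
```

The calibration loop is a simple stand-in for however the paper actually tunes the censoring bound.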
(The two columns under each censoring level correspond to the two sample sizes.)

                               Censoring level 15%          Censoring level 25%
Case 1 (normal)    True        1.07                         0.86
                   Proportion  0.45(0.17)  0.47(0.12)       0.46(0.16)  0.48(0.11)
                   ATE         1.07(0.31)  1.11(0.24)       0.87(0.22)  0.87(0.16)
                   RCD         0.84(0.11)  0.88(0.07)       0.84(0.09)  0.90(0.06)
Case 2 (logistic)  True        1.34                         0.87
                   Proportion  0.57(0.26)  0.56(0.18)       0.52(0.24)  0.52(0.18)
                   ATE         0.94(0.49)  1.06(0.36)       0.63(0.31)  0.75(0.24)
                   RCD         0.72(0.13)  0.80(0.10)       0.74(0.13)  0.82(0.09)
Case 3 (extreme)   True        0.73                         0.54
                   Proportion  0.44(0.18)  0.46(0.12)       0.41(0.18)  0.44(0.12)
                   ATE         0.76(0.21)  0.78(0.15)       0.57(0.15)  0.58(0.11)
                   RCD         0.84(0.11)  0.89(0.08)       0.83(0.12)  0.88(0.08)
Table 6 shows that the proposed method performs reasonably well under all three considered noise distributions. Both the selected sample proportion and the average treatment effect under the estimated SSR get closer to the truth, and the rate of making correct subgroup decisions increases, as the sample size increases. The selected sample proportion is slightly underestimated for Cases 1 and 3, where the contrast function has a more concentrated density, and marginally overestimated for Case 2, where the density is more spread out. All these findings are in accordance with our conclusions in Section 5.1.
6 Real Data Analysis
In this section, we illustrate our proposed method by application to the AIDS Clinical Trials Group Protocol 175 (ACTG 175) data described in Hammer et al. (1996) and to a Phase III clinical trial in patients with hematological malignancies from Lipkovich et al. (2017).
6.1 Case 1: ACTG 175 data
There were 1046 HIV-infected subjects enrolled in ACTG 175, randomized in equal proportions to two competing antiretroviral regimens (Hammer et al., 1996): zidovudine (ZDV) + zalcitabine (zal) (denoted as treatment 0, with 524 patients) and ZDV + didanosine (ddI) (denoted as treatment 1, with 522 patients), so the propensity score is constant. We consider 12 baseline covariates: 1) four continuous variables: age (years), weight (kg), CD4 count (cells/mm3) at baseline, and CD8 count (cells/mm3) at baseline; and 2) eight categorical variables: hemophilia (0=no, 1=yes), homosexual activity (0=no, 1=yes), history of intravenous drug use (0=no, 1=yes), Karnofsky score (4 levels on the scale of 0–100: 70, 80, 90, and 100), race (0=white, 1=non-white), gender (0=female), antiretroviral history (0=naive, 1=experienced), and symptomatic status (0=asymptomatic). The outcome of interest is the CD4 count (cells/mm3) at 20±5 weeks; a higher CD4 count usually indicates a stronger immune system. We normalize the outcome by its mean and standard deviation. Our goal is to find the optimal subgroup selection rule that maximizes the size of the selected subgroup while achieving the desired average treatment effect.

The density of the estimated contrast function for the ACTG 175 data is illustrated in Figure 7; the mean contrast difference is 0.228. Based on Figure 7, we consider two clinically meaningful average treatment effect thresholds. We apply the proposed CAPITAL method in comparison to the virtual twins method (VTC) (Foster et al., 2011) (the VTA and VTC methods have nearly identical performance, as shown in the simulation studies), using the same procedure as described in Section 5.1. The estimated SSRs under the proposed method are shown in Figure 8. To evaluate the two methods on the ACTG 175 data, we randomly split the data, with 70% as a training sample to find the SSR and 30% as a testing sample to evaluate its performance. Here, we consider CAPITAL without penalty, with a small penalty, and with a large penalty on the negativity of the average treatment effect, respectively; a larger penalty value more strongly encourages a positive average treatment effect in the selected group. In Table 7, we summarize the selected sample proportion, the average treatment effect within the subgroup under the estimated SSR, the average treatment effect outside the subgroup, the difference between the two, and the rate of positive individual treatment effect within the selected subgroup (RPI), aggregated over 200 replications with standard deviations presented in parentheses, under the two thresholds for the two methods.
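The 70/30 split evaluation with a virtual-twins-style contrast estimate can be sketched as follows. The per-arm linear outcome models are a dependency-free simplification (the virtual twins method fits random forests per arm), and all names are ours:

```python
import numpy as np

def estimate_contrast(X, A, Y, seed=0):
    """Sketch: normalize the outcome, split 70/30, fit a separate (linear)
    outcome model per treatment arm on the training set, and return the
    estimated contrast (difference of arm predictions) on the test set."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    # Normalize the outcome by its mean and standard deviation
    Yn = (Y - Y.mean()) / Y.std()
    # 70/30 train/test split
    idx = rng.permutation(n)
    tr, te = idx[: int(0.7 * n)], idx[int(0.7 * n):]
    Z = np.column_stack([np.ones(n), X])         # add an intercept column
    beta1, *_ = np.linalg.lstsq(Z[tr][A[tr] == 1], Yn[tr][A[tr] == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(Z[tr][A[tr] == 0], Yn[tr][A[tr] == 0], rcond=None)
    contrast = Z[te] @ beta1 - Z[te] @ beta0     # estimated contrast on test set
    return te, contrast
```

An SSR can then be built on the estimated contrast, as in Section 3.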
                                           Threshold
CAPITAL             Proportion             92.8% (0.023)    82.8% (0.029)
without penalty     ATE within subgroup    0.250 (0.015)    0.270 (0.016)
                    ATE outside subgroup   -0.107 (0.069)   0.004 (0.038)
                    Difference             0.357 (0.068)    0.266 (0.038)
                    RPI                    83.0% (0.021)    85.1% (0.022)
CAPITAL             Penalty                4                20
with small penalty  Proportion             52.7% (0.052)    34.2% (0.034)
                    ATE within subgroup    0.327 (0.022)    0.385 (0.021)
                    ATE outside subgroup   0.113 (0.021)    0.142 (0.017)
                    Difference             0.214 (0.027)    0.243 (0.026)
                    RPI                    91.5% (0.029)    96.2% (0.017)
CAPITAL             Penalty                20               100
with large penalty  Proportion             35.6% (0.035)    19.5% (0.051)
                    ATE within subgroup    0.381 (0.021)    0.414 (0.032)
                    ATE outside subgroup   0.139 (0.017)    0.180 (0.017)
                    Difference             0.242 (0.025)    0.234 (0.033)
                    RPI                    95.9% (0.017)    96.9% (0.025)
Virtual Twins       Proportion             22.1% (0.063)    10.5% (0.029)
                    ATE within subgroup    0.462 (0.043)    0.556 (0.050)
                    ATE outside subgroup   0.159 (0.021)    0.187 (0.014)
                    Difference             0.302 (0.037)    0.368 (0.047)
                    RPI                    97.8% (0.019)    99.6% (0.010)
As illustrated in Figure 8, the estimated SSRs based on the proposed method under both thresholds rely on the weight and age of patients. For instance, for a desired average treatment effect of 0.35, younger patients who weigh less than 91.2 kg (and some patients weighing at least 91.2 kg, depending on age) may not benefit from treatment 1 (ZDV+ddI) and thus are not selected into the subgroup, while older patients should be included in the subgroup and can expect enhanced effects from treating with ZDV+ddI. From Table 7, it is clear that the selected sample proportion under our method is much larger than that under the VT method in all cases. Specifically, without penalty, our method yields selected sample proportions of 92.8% and 82.8% under the two thresholds. Under a penalty on the negativity of the average treatment effect, the size of the identified subgroup is reduced: to 52.7% with a small penalty and 35.6% with a large penalty under the first threshold, and to 34.2% with a small penalty and 19.5% with a large penalty under the second. With a large penalty, our proposed method achieves average treatment effects of 0.381 and 0.414, close to the desired thresholds. In contrast, the VT method identifies less than a quarter of the patients (22.1%) under the first threshold and barely a tenth (10.5%) under the second, with overestimated average treatment effects of 0.462 and 0.556, respectively. This implies that the proposed method can largely increase the number of benefitting patients selected into the subgroup while maintaining the desired clinically meaningful threshold.
6.2 Case 2: Phase III Trial for Hematological Malignancies
Next, we consider a Phase III randomized clinical trial in 599 patients with hematological malignancies (Lipkovich et al., 2017). We exclude 7 subjects with missing records and use the remaining 592 complete records, consisting of 301 patients receiving the experimental therapy plus best supportive care (treatment 1) and 291 patients receiving best supportive care only (treatment 0). We use the same baseline covariates selected by Lipkovich et al. (2017): 1) twelve categorical variables: gender (1=Male, 2=Female), race (1=Asian, 2=Black, 3=White), Cytogenetic markers 1 through 9 (0=Absent, 1=Present), and outcome of the patient's prior therapy (1=Failure, 2=Progression, 3=Relapse); and 2) two ordinal variables: Cytogenetic category (1=Very good, 2=Good, 3=Intermediate, 4=Poor, 5=Very poor) and the prognostic score for myelodysplastic syndromes risk assessment (IPSS) (1=Low, 2=Intermediate, 3=High, 4=Very high). These baseline covariates contain demographic and clinical information related to baseline disease severity and cytogenetic markers. The primary endpoint in the trial was overall survival time. Our goal is to find the optimal subgroup selection rule that maximizes the size of the selected group while achieving the desired clinically meaningful difference in restricted mean survival time.

The density of the estimated contrast function for the hematological malignancies data is provided in Figure 9, with a mean treatment difference of 44.1 days. Based on Figure 9, we consider two clinically meaningful average treatment effect thresholds (in days). We apply the proposed method and the virtual twins method (Foster et al., 2011) using the procedures described in Sections 5.3 and 6.1. The estimated SSRs under the proposed method are shown in Figure 10, and the evaluation results for the hematological malignancies data are summarized in Table 8 for the proposed method under varying penalty and for the virtual twins method. Our estimated SSRs, shown in Figure 10, both use the IPSS score and the outcome of the patient's prior therapy as splitting features in the decision tree. For a given desired average treatment effect, patients who had a relapse during prior therapy and IPSS larger than 3, or who had no relapse and IPSS larger than 2, are selected into the subgroup, with an enhanced effect of the experimental therapy plus best supportive care. In addition, Table 8 shows that our proposed method again performs much better than the virtual twins method. To be specific, the selected sample proportion under the proposed method is much larger than that under the virtual twins method in all cases, with estimated treatment effects approaching and then exceeding the desired clinically meaningful difference in restricted mean survival time as the penalty term increases. All these findings conform with the results in Section 6.1.
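The tree rule read off Figure 10 (relapse at prior therapy with IPSS larger than 3, or no relapse with IPSS larger than 2) can be written as a simple predicate; the function name is ours:

```python
def select_patient(ipss, prior_relapse):
    """Sketch of the estimated SSR from Figure 10: select patients with a
    relapse at prior therapy and IPSS > 3, or with no relapse and IPSS > 2.
    IPSS is on the ordinal 1-4 scale (1=Low, ..., 4=Very high)."""
    if prior_relapse:
        return ipss > 3
    return ipss > 2
```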
(Columns correspond to the two desired treatment-effect thresholds, in days.)

CAPITAL             Proportion             79.3% (0.031)    43.2% (0.057)
without penalty     ATE within subgroup    69.5 (5.0)       101.2 (9.8)
                    ATE outside subgroup   -53.7 (17.7)     1.2 (7.2)
                    Difference             123.2 (16.9)     100.0 (9.3)
                    RPI                    87.0% (0.028)    94.5% (0.034)
CAPITAL             Penalty                2                2
with small penalty  Proportion             71.7% (0.061)    33.9% (0.060)
                    ATE within subgroup    74.6 (6.5)       108.4 (9.9)
                    ATE outside subgroup   -34.1 (15.8)     11.5 (8.7)
                    Difference             108.7 (13.2)     96.8 (9.0)
                    RPI                    89.2% (0.027)    97.2% (0.034)
CAPITAL             Penalty                4                4
with large penalty  Proportion             51.9% (0.119)    30.8% (0.032)
                    ATE within subgroup    87.2 (13.2)      112.6 (7.0)
                    ATE outside subgroup   -2.6 (15.9)      13.9 (6.5)
                    Difference             89.9 (10.9)      98.7 (8.9)
                    RPI                    92.2% (0.039)    99.1% (0.015)
Virtual Twins       Proportion             38.1% (0.043)    12.9% (0.117)
                    ATE within subgroup    113.8 (6.2)      151.4 (29.2)
                    ATE outside subgroup   1.4 (7.2)        29.7 (13.9)
                    Difference             112.4 (7.9)      121.7 (21.4)
                    RPI                    99.5% (0.010)    99.9% (0.003)
7 Conclusion
In this paper we proposed a constrained policy tree search method, CAPITAL, to address the subgroup optimization problem. This approach identifies the theoretically optimal subgroup selection rule that maximizes the number of selected patients under the constraint of a prespecified, clinically desired effect. Our proposed method is flexible, easy to implement in practice, and has good interpretability. Extensive simulation studies show the improved performance of our proposed method over the popular virtual twins subgroup identification method, with larger selected benefitting subgroups and estimated treatment effects closer to the truth, and demonstrate the broad applicability of our method across multiple use cases, outcome types, and constraint conditions.
There are several possible extensions to consider in future work. First, we only considered two treatment options in this paper, while in clinical trials it is not uncommon to have more than two treatments available for patients. A more general method applicable to multiple treatments, or even continuous treatment domains, is therefore desirable. Second, we only provided the theoretical form of the optimal SSR. It would be of interest to establish the asymptotic properties of the estimated SSR, such as its convergence rate.
References
Policy learning with observational data. Econometrica 89(1), 133–161.
Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics 12(2), 270–282.
Subgroup identification from randomized clinical trial data. Statistics in Medicine 30(24), 2867–2880.
Estimating optimal treatment regimes via subgroup identification in randomized control trials and observational studies. Statistics in Medicine 35(19), 3285–3302.
Bayesian nonparametric policy search with application to periodontal recall intervals. Journal of the American Statistical Association 115(531), 1066–1078.
A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine 335(15), 1081–1090.
Estimating treatment effect heterogeneity in randomized program evaluation. The Annals of Applied Statistics 7(1), 443–470.
Precision medicine. Annual Review of Statistics and Its Application 6, 263–286.
Tutorial in biostatistics: data-driven subgroup identification and analysis in clinical trials. Statistics in Medicine 36(1), 136–196.
Subgroup identification for precision medicine: a comparative review of 13 methods. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9(5), e1326.
A regression tree approach to identifying subgroups with differential treatment effects. Statistics in Medicine 34(11), 1818–1833.
Estimating individual treatment effect in observational data using random forest methods. Journal of Computational and Graphical Statistics 27(1), 209–219.
Bayesian inference for causal effects: the role of randomization. The Annals of Statistics 6, 34–58.
A Bayesian subgroup analysis with a zero-enriched Polya urn scheme. Statistics in Medicine 30(4), 312–323.
Evaluating markers for selecting a patient's treatment. Biometrics 60(4), 874–883.
Subgroup analysis via recursive partitioning. Journal of Machine Learning Research 10(2).
Learning optimal personalized treatment rules in consideration of benefit and risk: with an application to treating type 2 diabetes patients with insulin therapies. Journal of the American Statistical Association 113(521), 1–13.
On restricted optimal treatment regime estimation for competing risks data. Biostatistics 22(2), 217–232.
Offline multi-action policy learning: generalization and optimization. arXiv preprint arXiv:1810.04778.
Appendix A Proof of Theorem 1
The proof of Theorem 1 consists of two parts. First, we show that the optimal subgroup selection rule is the one whose cut point satisfies (2). Second, we derive the equivalence between (3) and (4). Without loss of generality, we focus on the class of SSRs as
Part One: To show that this is the optimal SSR solving (1), it suffices to show that the SSR satisfies the constraint in (1) and maximizes the size of the subgroup.
First, based on assumptions (A1) and (A2), the average treatment effect under an SSR can be represented by
which is a nondecreasing function of the cut point. Given the definition in (2), the resulting SSR satisfies the constraint in (1).
Second, the probability of falling into the subgroup under the SSR,
is a nonincreasing function of the cut point.
To maximize the size of the subgroup, we therefore select the smallest cut point in the constraint range. This choice of cut point gives the optimal SSR as the solution of (1), which completes the proof of (3).
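The two monotonicity facts used in Part One can be written out explicitly. The notation below (contrast function $C(X)$, cut point $c$, threshold $\delta$) is generic and illustrative, and may differ from the paper's symbols:

```latex
% A(c): average treatment effect within the subgroup; S(c): subgroup size.
\mathcal{A}(c) = \mathbb{E}\bigl[\,C(X) \,\big|\, C(X) \ge c\,\bigr],
\qquad
\mathcal{S}(c) = \mathbb{P}\bigl(C(X) \ge c\bigr).
% A(c) is nondecreasing in c, while S(c) is nonincreasing in c.
% Hence the smallest c with A(c) >= delta satisfies the constraint
% while maximizing the subgroup size S(c).
```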