1 Introduction
With an increased emphasis on personalized medicine [1] and dynamic treatment regimes [2], there is need for statistical methods to evaluate potential effect modifiers. Understanding effect modifiers for a treatment or exposure helps medical personnel, scientists, and politicians better understand mechanisms which can lead to more effective treatment strategies or improved public policies [3]. In this paper, we introduce a flexible methodology to provide lowdimensional summaries of effect modification when there are a large number of confounders to be controlled for. We implement this through a semiparametric version of Bayesian Additive Regression Trees (BART) [4] within a generalized linear model (GLM) framework [5]
. We model the confounders (the nuisance component) using BART but allow select covariates (treatment, effect modifiers, etc.) to have parametric form. This parametric part can include interactions and typically represents the parameters that are of scientific interest. In practice, these parameters are often estimated in a GLM, but our formulation permits the nuisance component to be nonparametric, increasing efficiency when the outcome process is complex.
Summarizing effect modification can be a difficult task. An introduction to statistical methods for effect modification and interactions is found in VanderWeele [6]. In this paper, we focus on relatively simple summaries of modifiers such as an interaction between the exposure and a covariate. While there are situations where a one dimensional parameter will not adequately capture a complex association, many applications assess effect modification through simple interaction terms [7, 8]. Our goal is simply to provide a more flexible framework in which to estimate these parameters.
In addition, we show that under some assumptions including no unmeasured confounding, our model serves as a Bayesian implementation of a structural mean model. This fits neatly within the theory laid out in Vansteelandt and Goetghebeur [9]
which details estimation of SMMs when gestimation is not possible, as is the case with a logit or probit link function. We provide an R package
semibart to implement our method, available on GitHub (https://www.github.com/zeldow/semibart). When there is no effect modification or the effect modification is between binary covariates, the framework developed by Hill [10] can be used. Our model differs from Hill’s in that continuous treatments and effect modifiers are more easily interpreted.We demonstrate our methodology on data from a cohort of HIV/Hepatitis C coinfected individuals who initiate antiretroviral therapy. Certain antiretrovirals are mitochondrial toxic and known to cause liver toxicity, particularly concerning for individuals with Hepatitis C infection. In this paper, we examine twoyear death rates in subjects taking antiretroviral regimens with mitochondrial toxic drugs compared to subjects on regimens without those drugs. This analysis is based off a previous analysis that looked at rates of liver decompensation by type of antiretroviral regimen [11].
2 Background
We review some semi and nonparametric methods for predicting an output given inputs. Let be an outcome and let be predictors of . Consider models of the form , where is a known link function and is a function. A special case of this model is parametric linear regression which posits that the outcome has mean . For the remainder of this paper, we relax the assumption that , which can yield biased or inefficient estimates if this assumption is far from the truth.
In practice, the functional form of is unknown and so it is natural to think of it as a random parameter. We assign a prior distribution within an appropriate function space and estimate it with data. One such prior is the Gaussian process which induces flexibility through its covariance function [12]. Other options for modeling include the use of basis functions like splines or wavelets and placing prior distributions on the coefficients [13, 14]. Splines, in particular, have been used extensively in Bayesian nonparametric and semiparametric regression. For example, Biller (2000) presented a semiparametric GLM where one variable is modeled using splines and the remaining variables were part of a parametric linear model [15]. Holmes and Mallick (2001) developed a flexible Bayesian piecewise regression using linear splines [16]. The approach in Denison et al (1998) involved piecewise polynomials and was able to approximate nonlinearities [17]. Biller and Fahrmeir (2001) introduced a varyingcoefficient model with Bsplines with adaptive knot locations [18].
Alongside these Bayesian methods reside two of the most commonly used procedures to predict an outcome given covariates : generalized additive models (GAM) [19]
and multivariate adaptive regression splines (MARS)
[20]. GAM allows each predictor to have its own functional form using splines. However, any interactions between covariates must be specified by the analyst, which can pose difficulties in highdimensional problems with multiway interactions. Bayesian versions of GAM based on Psplines exist [21] but do not have the widespread availability in statistical software that the frequentist version does. MARS is a fully nonparametric procedure which can automatically detect nonlinearities and interactions through basis functions also based on splines. A Bayesian MARS algorithm has also been developed [22] but also lacks offtheshelf software. A third option for nonparametric estimation of given is Bayesian additive regression trees (BART), which like MARS, allows for nonlinear relationships between an outcome and covariates and interactions between covariates, while taking a Bayesian approach to estimation [4].In this paper we will extend BART to a semiparametric setting in order to meet our goal of having a parametric form for a subset of variables of interest and a flexible model for the nuisance variables. Before introducing our method, we provide a brief review of BART.
2.1 Bayesian Additive Regression Trees
Bayesian additive regression trees (BART) is an algorithm that uses sumoftrees to predict a binary or continuous outcome given predictors. For continuous outcomes, let where , and is the unknown functional relating the predictors to the outcome . For binary we use a probit link function so that , where
is the distribution function of a standard normal random variable. We write the BART sumoftrees model as
, where each is a single tree and and are the parameters that represent the tree structure and end node parameters, respectively. Each individual tree is a sequence of binary decisions based on predictors which yield predictions of within clusters of observations with similar covariate patterns. Typically, the number of trees is chosen to be large and each tree is restricted to be small through regularization priors, which restricts the influence of any single tree and allows for nonlinearities and interactions that would be not possible with any one tree. We provide an example of a BART fit to a nonlinear mean function in Figure 1 over a univariate predictor space restricted to , along with comparision to the fit of a single regression tree and linear regression.The Markov Chain Monte Carlo (MCMC) algorithm for BART incorporates Bayesian backfitting
[23], which we summarize below. Recall that whereis assumed zeromean normal with unknown variance
. The algorithm alternates between updates to the error variance and updates to the trees . To update , we find the residuals from the current fit and draw a new value for . In Chipman et al. [4] and this paper, we use a conjugate inverse  distribution for the prior of , so drawing a new value is also a draw from an inverse  distribution. Second, the trees are updated one at a time. Starting with , we compute the residuals of the outcome by subtracting off the fit of the other trees, . We then propose a modification for the tree , which is either accepted or rejected by a MetropolisHastings step. We update the trees in the same fashion. More details are available in the original BART paper [4]. In the next section, we propose a semiparametric extension of BART, called semiBART, where a small subset of covariates have linear functional form and the rest are modeled with BART’s sumoftrees.3 SemiBART Model
3.1 Notation
Suppose we have independent observations. Let denote the outcome, which can be binary or continuous. Let denote treatment, which can be binary or continuous. The remaining covariates we call . Let .
3.2 Semiparametric Generalized Linear Model
In applied research, it is common for the effects of only a few covariates to be of scientific interest, while a larger number of covariates are needed to address confounding. Our model imposes linearity on just this small subset of covariates, while retaining flexibility in modeling the rest of the covariates whose exact functional form in relation to the outcome may be considered a nuisance. We partition the predictors into two distinct subsets so that and . Here, represents nuisance covariates that we must control for but is not of primary interest and represents covariates that do have scientific interest, such as treatment and its effect modifiers. For continuous , we write , where is a linear function of its covariates in (as in linear regression) but is a function with unspecified form. The errors
are iid mean zero and normally distributed with unknown variance
. More generally, we write for a known link function . We call this the semiBART model since we estimate using BART. Note that this implies that if and , we have a nonparametric BART model. On the other hand if and we have a fully parametric regression model. While there is no restriction on the dimensionality of and , we assume that that is large enough such that BART is a reasonable choice of an algorithm and that contains only a few covariates that are of particular interest.3.3 Special Case: Structural Mean Models
We now consider a special case of our semiparametric GLM from an observational study with no unmeasured confounders, introducing additional notation specific to this section. As before, the exposure of interest is denoted and can be either binary or continuous. The counterfactual denotes the outcome that would have been observed under exposure . For the special case of binary , each individual has two counterfactual outcomes – and – but we observe at most one of the two, corresponding to the actual level of exposure received. That is,
Robins developed structural nested mean models to adjust for timevarying confounding with a longitudinal exposure [24, 25]. In the case of a point treatment, structural nested mean models are no longer nested and called structural mean models (SMMs). While timevarying confounding with point treatments is not a concern, SMMs still parameterize a useful causal contrast – the mean effect of treatment among the treated given the covariates [9, 26]. Write this as:
(1) 
where is a known link function. In this paper, we provide a Bayesian solution to (1). To do so, we impose some restrictions on , requiring that under no treatment or when there is no treatment effect the function must equal 0. That is, satisfies . Some examples of are or , when some covariate modifies to the effect of on .
While expression (1) cannot be evaluated directly due to the unobserved counterfactuals, two assumptions are needed to identify it with observed data [26].

Consistency: If , then ;

Ignorability: .
The consistency assumption asserts that we actually get to see an individual’s counterfactual corresponding to the exposure received. Ignorability ensures the exposure and the counterfactual under no treatment are independent given . Under these two assumptions together with the parametric assumption of , the contrast on the left hand side of (1) is identified, and the SMM from (1) can be rewritten using observed variables as
(2) 
where is unspecified and is a linear function of [26]. Note that the left hand side of (1) is nonparametrically identified with a third assumption, dropping the parametric assumption of . That is,

Positivity: .
The positivity assumption states that whenever
has a positive probability of occurring, there is positive probability that an individual is treated. This assumption is violated in situations where treatment is deterministic at certain levels of
.Let us return briefly to our parametric choice of , specifically let . Note that the main effect of is not present, guaranteeing that the restriction holds. Through this specification, the function from equation 2 absorbs the main effect of and any interactions with nontreatment covariates. Practically speaking, we found that modeling a covariate in both and sometimes led to bias and undercoverage of in complicated settings. Due to this, we allow the main effect of any effect modifiers to be contained in , a choice which comes with a few consequences. First, it imposes linearity on an additional covariate. Second, the restriction no longer holds necessarily. We handle this issue by considering and the term jointly and allowing to be treated separately. Third, if researchers are interested in quantifying effect modification by , they might also be interested in interpreting the main effect and can do so with semiBART.
Hill (2012) has previously estimated causal effects on the treated using BART [10]. The methods in that paper correspond to our setting in equation (2) where is the identity link function and is a scalar describing only an effect of treatment with no effect modification. We extend this setup to binary outcomes, continuousvalued treatment, or where lowdimensional summaries of effect modification are of interest, particularly with continuous effect modifiers. In settings with continuous outcomes, binary treatment, and no effect modification (or with a binary effect modifier), the methods presented in Hill (2012) are preferred, which we later explore in simulations.
3.4 Computations
The algorithm for semiBART follows the BART algorithm – briefly reviewed in Section 2.1 – with an additional step. We solve equation (2), where can be written as the sumoftrees . The parameter contains the structure of the tree; for instance, the covariates and rules on which the tree splits. The parameters contain the parameters governing the endnodes of the tree. For example, the mean of the endnode of the tree is assumed to be normally distributed with mean and variance .
For continuous outcomes, we assume independent errors distributed with unknown and proceed as follows. Initialize all values including the error variance , the parameters , and the tree structures and for and iterate through the following steps. First, update the trees one at a time. Starting with the first tree , calculate the residuals by subtracting the fit of the remaining trees at their current parameter values as well as the fit of the linear part . That is for the individual, we calculate , where indicates the fit of the trees except the first tree. As in Chipman et al. [4], a modification of the tree is now proposed. We can grow the tree (breaking one endnode into two endnodes), prune the tree (collapse two endnodes into one), change a splitting rule (for nonterminal nodes), or swap the rules between two nodes. We accept or reject this modification with a MetropolisHastings step given the residuals [27]. Once we have updated , we update in the same fashion and continue until all trees are completed.
Next we update , given a multivariate normal prior for so that , where
is the identity matrix of appropriate dimension and
is large enough so that the prior is diffuse. We calculate the residuals after subtracting off the fit of all trees so that . The posterior for is multivariate normal with covariance and mean , where is thevector of residuals
[28].Lastly, we update the error variance . We calculate the residuals given the trees and so that . With a conjugate scaled inverse distribution for (parameters and ), the posterior is an updated scaled inverse distribution with parameters and where indicates the dot product. These three steps are repeated until the posterior distributions are well approximated.
Our algorithm for binary outcomes with a probit link uses the underlying latent continuous variable formulation of Albert and Chib [29], replacing the step in the algorithm that updates the error variance . Full details of the BART portion of the algorithm are available in Chipman et al. [4], whereas our modified code for semiBART is available at https://www.github.com/zeldow/semibart.
4 Simulation
We use simulation to compare the performance of semiBART to competitor models when estimating the regression coefficient for simulated treatment along with the coefficients for its effect modifiers (main effects and interaction terms with treatment). Our competitors were BART (taken from Hill [10]), GAM, and linear regression for continuous outcomes and probit regression for binary outcomes. For all simulations, we generated 500 datasets at sample sizes of and
, and we estimated mean bias, 95% credible/confidence interval coverage, and empirical standard deviation (ESD). For GAM we used the mcgv package in R along with splines (using the s function [the function used to define smooth terms within GAM formulae] with default settings) for continuous covariates
[30]. For BART, we used the bart() function from the BayesTree package in R with default settings[31]. The linear regression/probit regression models were fit with the lm and glm functions in R. For semiBART we used 10,000 MCMC iterations including 2,500 burnin iterations and trees.4.1 Scenario 1: Continuous outcome with binary treatment and no effect modification
In the first scenario, we generated data with a continuous outcome, binary treatment, twenty continuous covariates with a block diagonal covariance structure, and four independent binary covariates. The data generating code is available in the appendix. The outcome was generated as independent normal variables with variance one and mean where and . The parameter , which encodes the treatment effect, was set to 2.
Method  Parameter  Bias  Cov.  ESD 

SemiBART  0.02  0.96  0.153  
GAM  0.02  0.94  0.371  
BART  0.02  0.94  0.153  
Regression  0.02  0.95  0.390  
SemiBART  0.00  0.95  0.036  
GAM  0.00  0.94  0.111  
BART  0.00  0.92  0.037  
Regression  0.01  0.94  0.119 
The results in Table 1 show that, at the smaller sample size of , all point estimates are slightly biased in the same direction, and the 95% coverage probabilities hovered around . Notably, the ESD was over half as small for BARTbased methods than for GAM or regression. At the bias disappeared for all methods, and the discrepancy in ESD between the BARTbased methods and nonBART methods remained.
4.2 Scenario 2: Continuous outcome with binary treatment and continuous effect modifier
We randomly generated 30 continuous covariates with mean zero from a multivariate normal distribution with an autoregressive(1) covariance with with the diagonal containing ones. That is,
4.2.1 Part a: simple treatment mechanism and nonlinear mean function
Given the covariates , the treatment was generated as a Bernoulli random variable with probability . The outcome was generated as independent random normal variables with variance one and mean where and . The true values for the parameters are , , and .
Results for these simulations are shown in Table 2. The estimated parameters are unbiased and have coverage near 95% for both sample sizes and all estimators. The ESD for all parameters is smaller with semiBART than it is with GAM or linear regression. This improvement of semiBART over GAM comes from the fact that covariate interactions are detected in the semiBART procedure, whereas they must be prespecified in this implementation of GAM.
Method  Parameter  Bias  Cov.  ESD 

SemiBART  0.01  0.94  0.123  
0.01  0.94  0.121  
0.00  0.96  0.095  
GAM  0.01  0.93  0.135  
0.01  0.94  0.127  
0.00  0.93  0.102  
Regression  0.01  0.94  0.166  
0.01  0.94  0.167  
0.00  0.94  0.127  
SemiBART  0.00  0.95  0.034  
0.00  0.94  0.033  
0.00  0.96  0.023  
GAM  0.00  0.94  0.038  
0.00  0.94  0.039  
0.00  0.96  0.031  
Regression  0.00  0.95  0.049  
0.00  0.95  0.049  
0.00  0.95  0.038 
4.2.2 Part b: complex treatment mechanism and complex mean function
We also performed these simulations with different treatment and outcome data generating functions. Here, given the covariates , the treatment was generated as a Bernoulli random variable with probability . The outcome was generated as independent random normal variables with variance one and mean where and . The true values for the parameters are , , and .
Results for these simulations are shown in Table 3. At , semiBART yielded biased estimates (average of ) for , the main effect of the effect modifier. On the other hand, GAM and linear regression were unbiased for but had varying degrees of bias for the treatment effect of and , respectively. SemiBART had slight undercoverage for all parameters – 90% to 92%. At , semiBART was unbiased for and , and the bias of attenuated ( down to ). For GAM and linear regression, the bias of persisted. Coverage rates were all around 95% save for using semiBART, which was at 89%. The ESD was notably smaller for semiBART than the competitors.
Method  Parameter  Bias  Cov.  ESD 

SemiBART  0.00  0.92  0.460  
0.01  0.92  0.459  
0.12  0.90  0.361  
GAM  0.05  0.95  0.654  
0.02  0.92  0.672  
0.00  0.93  0.520  
Regression  0.36  0.92  0.731  
0.03  0.95  0.741  
0.01  0.95  0.582  
SemiBART  0.00  0.96  0.081  
0.00  0.93  0.090  
0.03  0.89  0.071  
GAM  0.07  0.94  0.213  
0.00  0.95  0.200  
0.00  0.97  0.151  
Regression  0.39  0.60  0.230  
0.01  0.95  0.220  
0.01  0.96  0.166 
4.3 Scenario 3: Binary outcome with binary treatment and continuous effect modifier
As in scenario 2, we randomly generated 30 continuous covariates with mean zero from a multivariate normal distribution with an autoregressive(1) covariance structure with with the diagonal containing ones. The treatment was generated as a Bernoulli random variable with probability . The outcome was generated as random Bernoulli variable with probability with and . The true values for the parameters of interest are , , and .
The results for these simulations are shown in Table 4. For semiBART, there is some bias on at , but this vanishes at . Overall, bias is small and coverage good for both probit regression and semiBART. Using probit regression is slightly more efficient than semiBART at (based on ESD) but these differences mostly disappear at .
Method  Parameter  Bias  Cov.  ESD 

SemiBART  0.03  0.92  0.144  
0.00  0.94  0.140  
0.00  0.93  0.106  
Regression  0.01  0.93  0.131  
0.01  0.94  0.127  
0.01  0.94  0.101  
SemiBART  0.00  0.94  0.039  
0.00  0.95  0.039  
0.00  0.94  0.029  
Regression  0.03  0.84  0.038  
0.01  0.93  0.036  
0.01  0.93  0.029 
4.4 Scenario 4: Continuous outcome with misspecified mean function
As before, we randomly generated 30 continuous covariates with mean zero from a multivariate normal distribution with an autoregressive(1) covariance structure. The treatment was generated as a Bernoulli random variable with probability . The outcome was generated as independent random normal variables with variance one and mean with and . However, we posited the relationship . Since the effect of is actually contained in , this is a misspecified model. The true value of was 2.
The results for these simulations are shown in Table 5. All methods have no bias and good coverage for . There is a slight improvement in terms of ESD for semiBART compared to its competitors.
Method  Parameter  Bias  Cov.  ESD 

SemiBART  0.01  0.95  0.143  
GAM  0.00  0.92  0.153  
Regression  0.01  0.96  0.177  
SemiBART  0.00  0.92  0.041  
GAM  0.00  0.93  0.048  
Regression  0.00  0.95  0.060 
5 Data Application
To illustrate our method we analyzed data from the Veterans Aging Cohort Study (VACS) in the years 2002 to 2009, which is a cohort of patients being treated at Veterans Affairs facilities in the United States. Our study sample consisted of patients with HIV/Hepatitis C coinfection who were newly initiating antiretrovirals (including at least one nucleoside reverse transcriptase inhibitor [NRTI]) and had at least six months of observations recorded in VACS prior to initiation. Certain NRTIs are known to cause mitochrondial toxicity. These mitochrondial toxic NRTIs (mtNRTIs) include didanosine, stavudine, zidovudine, and zalcitabine [32]. While these drugs are no longer part of first line HIV treatment regimens, they are still used in resourcelimited settings or in salvage regimens [33].
Exposure to mtNRTIs may increase the risk of hepatic injury which in turn may increase the risk of hepatic decompensation and death [34]. The goal of this analysis was to determine if initiating an antiretroviral regimen containing a mtNRTI increased the risk of death versus antiretroviral containing a NRTI that is not a mtNRTI. VACS data contains a number of variables confounding the relationship between mtNRTI use and death including subject demographics, year of antiretroviral initiation, HIV characteristics such as CD4 count and HIV viral load, concomitant medications, and laboratory measures relating to liver function.
One of the covariates included in our analysis is Fibrosis4 (FIB4), an index that measures hepatitic fibrosis with higher values indicating larger injury. Specifically FIB4 (no units) indicates advanced hepatic fibrosis. FIB4 is calculated as [35]:
Here, AST stands for aspartate aminotransferase and ALT for alanine aminotransferase. There is some concern in that mtNRTI use in subjects with high FIB4 will result in higher risk of liver decompensation and death than in subjects who have lower FIB4. Thus, we consider FIB4 as a possible effect modifier of the effect of mtNRTIs on death.
The outcome is a binary indicator of death within a twoyear period after the subject initiated antiretroviral therapy. We considered only baseline values for this analysis. There were some missing values among the predictors that were handled through a single imputation. A previous analysis of this data used multiple imputation to handle missing covariates but found that results were very similar across imputations
[11]. All continuous covariates were centered at interpretable values. For example, age was centered around 50 years and year of study entry was centered at 2005.In the first analysis we sought to determine the effect of mtNRTI use on death without considering effect modification, and to this extent we fit a Bayesian SMM with a probit link. The estimand can be written as
(3) 
where is the indicator of death, represents whether mtNRTIs were part of the antiretroviral regimen at baseline ( if mtNRTI were included in the regimen), and all other covariates, including FIB4. In the second and third analysis, we considered FIB4 to be an effect modifier, once as a continuous covariate and once as a binary indicator which equaled 1 whenever FIB4 . This estimand can be written as
(4) 
where corresponds to the appropriate FIB4 variable.
The analysis was conducted using trees with 20,000 total iterations (5,000 burnin). The prior distribution on the parameters were independent . In the first analysis the mean estimate of the posterior distribution for
was 0.15 (95% credible interval [CI]: 0.02, 0.33). Notably the interval includes 0, but the direction of the point estimate indicates that subjects initiating antiretroviral therapy with an mtNRTI had greater risk of death within 2 years than subjects initiating therapy without an mtNRTI. We can interpret this coefficient in terms of
and from equation (3). Figure 2a shows the value of as a function of for . As an example, suppose the unknowable quantity . This means that subjects treated with a mtNRTI () with covariates would have had a probability of death of within 2 years had they been untreated (). However, given we see that if then , an increase of 4%. One can examine the change in probability for other base probabilities by examining the graph in Figure 2a.We conducted a second analysis with FIB4 as a continuous effect modifier (centered around 3.25) with the same settings as the previous one. This analysis corresponds to the contrast from equation (4). Here, the estimate for the main effect of mtNRTI was (0.00, 0.36) and the interaction between mtNRTI use and FIB4 was (0.02, 0.12). The results can be viewed in Figure 2b. Again, for illustration, consider the special case where . When FIB4 is 3.25, then . However, when FIB4 is 5.25, .
Finally we did a third analysis with FIB4 as a binary effect modifier ( vs. ). Here we found that (0.12, 0.26) and (0.07, 0.69). These results can be viewed in Figure 2c. Here, we see that if , then for subjects with FIB4 and for subjects with FIB4 .
6 Discussion
We presented a new Bayesian semiparametric model, which can be implemented with an R package semibart that is available on the author’s GitHub page (https://github.com/zeldow/semibart). Our model allows for flexible estimation of the nuisance component while being parametric for covariates that are of immediate scientific interest, which provides a viable and intuitive alternative to fully parametric regression. Under some causal assumptions, this model can as be interpreted as a SMM, which also provides the first fully Bayesian SMM. This is particularly useful in the case of binary outcomes where gestimation is not possible. Vansteelandt and Goetghebeur [9] provided approaches for estimating SMMs with binary outcomes in frequentist settings; our method is consistent with their suggestions but incorporates the added flexibility of BART.
The primary limitation of our method in the causal setting is that semiBART does not currently accommodate instrumental variables or longitudinal treatment measures, which are frequently used components of structural nested models. In simulations with binary outcomes, we also found little difference in our estimates using semiBART versus probit regression. Although it is reassuring that semiBART works as well as parametric regression, we aim to better understand the reasons why we are seeing equivalent – rather than superior – performance of semiBART (versus probit regression) for binary outcomes. Lastly, we aim to extend semiBART to handle common link functions such as logit and log.
We imagine that our work can benefit a number of different types of researchers. In particular, we hope that semiBART can be a viable alternative to the researcher who uses linear regression as the default statistical method in applications. Second, we also hope to convince researchers who prefer flexible machine learning algorithms, such as BART, but need interpretable coefficients such as a treatment effect and its modifiers that semiBART is a dependable option.
Supplementary Material
Appendix A Simulation code  data generation
a.1 Scenario 1
library(mvnfast) p1 < 20 # no. of continuous covariates p2 < 5 # no. of binary makedata < function(nsim, n, p1, p2, sd) { dat4 < array(0, dim = c(nsim, n, p1 + p2 + 1) ) for (k in 1:nsim) { cov < matrix(0, p1, p1) diag(cov) < rep(1, p1) for(i in 1:5) { for(j in 1:5) { if (i != j) cov[i,j] < cov[j, i] < 0.20 } } for(i in 6:10) { for(j in 6:10) { if (i != j) cov[i,j] < cov[j, i] < 0.15 } } for(i in 11:15) { for(j in 11:15) { if (i != j) cov[i,j] < cov[j, i] < 0.10 } } for(i in 16:20) { for(j in 16:20) { if (i != j) cov[i,j] < cov[j, i] < 0.05 } } mu < c(rep(2.0, 5), rep(1.5, 5), rep(1.0, 5), rep(0.0, 5)) cont.covs < rmvn(n, mu, sigma = cov) bin.covs < cbind(rbinom(n, size = 1, prob = 0.25), rbinom(n, size = 1, prob = 0.5), rbinom(n, size = 1, prob = 0.5), rbinom(n, size = 1, prob = 0.75), rbinom(n, size = 1, prob = 0.75)) x < cbind(bin.covs, cont.covs) ## nonlinear continuous single mu4 < 1 + 2 * x[ , 1] + 2 * x[ , 6] + sin( pi * x[ , 2] * x[ ,7] )  2 * exp( x[ , 3] * x[ , 5] ) + log( abs( cos ( pi / 2 * x[ , 8] ) ) )  1.8 * cos( x[ , 9]) + 3 * x[ , 3] * abs(x[ ,7]) ^ 1.5 y4 < rnorm(n, mu4 , sd = sd) mydat4 < cbind(x, y4) dat4[k, , ] < mydat4 } return(list( dat4 = dat4) ) } n500_sd1 < makedata(500, 500, 20, 5, 1) n5000_sd1 < makedata(500, 5000, 20, 5, 1)
a.2 Scenario 2
a.2.1 Part a
library(mvnfast) expit < function(x) exp(x) / (1 + exp(x)) makedata < function(nsim, n) { dat8 < array(0, dim = c(nsim, n, 30 + 2) ) for (k in 1:nsim) { p < 30 mu < rep(0, p) sig < matrix(0, nrow = p, ncol = p) diag(sig) < 1 rho < 0.5 for(i in 1:(p1)) { for(j in (i+1):p) { kk < j  i sig[i, j] < sig[j, i] < rho^kk } } x < rmvn(n, mu, sigma = sig) prob.a < expit(0.1 + 0.2 * x[,1]  sin(x[,3])/3  0.1 * x[,22]) a < rbinom(n, 1, prob.a) x < cbind(a, x) ## nonlinear continuous multi mu8 < 1 + 2 * x[ , 1]  1 * x[ , 1] * x[ , 2] + 2 * x[ , 2] + sin( pi * x[ , 22] * x[ ,7] )  1 * exp( x[ , 6]/5 * x[ , 5] ) + log( abs( cos ( pi / 2 * x[ , 8] ) ) )  1.8 * cos( x[ , 9]) + 0.2 * x[ , 11] * abs(x[ ,7]) ^ 1.5 y8 < rnorm(n, mu8 , sd = 1) mydat8 < cbind(x, y8) dat8[k, , ] < mydat8 } return(list( dat8 = dat8) ) } n500 < makedata(500, 500) n5000 < makedata(500, 5000)
a.2.2 Part b
library(mvnfast) expit < function(x) exp(x) / (1 + exp(x)) makedata < function(nsim, n) { dat8 < array(0, dim = c(nsim, n, 30 + 2) ) for (k in 1:nsim) { p < 30 mu < rep(0, p) sig < matrix(0, nrow = p, ncol = p) diag(sig) < 1 rho < 0.5 for(i in 1:(p1)) { for(j in (i+1):p) { kk < j  i sig[i, j] < sig[j, i] < rho^kk } } x < rmvn(n, mu, sigma = sig) prob.a < expit(0.1 + 0.2 * x[,1]  0.5*x[,2]0.1*x[,1]*x[,2] + 0.3 *x[,4]+0.1*x[,5]+ 0.7*x[,4]*x[,5]  0.4*x[,11]*x[,22]  + 0.4*x[,10]^2*x[,15]  0.1 * x[,22]) a < rbinom(n, 1, prob.a) x < cbind(a, x) ## nonlinear continuous multi mu8 < 1 + 2 * x[ , 1]  1 * x[ , 1] * x[ , 2] + 2 * x[ , 2] + 2 * sin( pi * x[ , 22] * x[ ,7] )  1 * exp( x[ , 6]/5 * x[ , 5] )  log( abs( cos ( pi / 2 * x[ , 8] ) ) )  1.8 * cos( x[ , 9]) + 1.2 * x[ , 11] * abs(x[ ,7]) ^ 1.5 mu8 < 1 + 2 * x[ , 1]  1 * x[ , 1] * x[ , 2] + 2 * x[ , 2]  1 * x[ , 3] + 2 * x[ , 4]  1.5 * x[ , 5]  0.5 * x[ , 6]  2*x[,7] + x[,4]^2  x[,7]^2 + 2*x[,4]*x[,5]  1 *x[,3]*x[,7] + 0.5 * x[,6]*x[,7]  0.2*x[,3]*x[,4]*x[,5] + x[,7]*x[,9]*x[,10]  x[,8]*x[,22]*x[,25]*x[,26] + x[,11]*x[,14]*x[,15]*x[,27]  x[,25]*x[,26]^2*x[,11] + 3*x[,4]*x[,17]^2  3*x[,5]*x[,18]^2 + x[,4]*x[,5]*x[,10]*x[,15] x[,4]*x[,5]*x[,10]*x[,15]^2+ 1.5*x[,11]*x[,22] y8 < rnorm(n, mu8 , sd = 1) mydat8 < cbind(x, y8) dat8[k, , ] < mydat8 } return(list(dat8 = dat8) ) } n500 < makedata(500, 500) n5000 < makedata(500, 5000)
a.3 Scenario 3
library(mvnfast) expit < function(x) exp(x) / (1 + exp(x)) makedata < function(nsim, n) { dat6 < array(0, dim = c(nsim, n, 30 + 2) ) for (k in 1:nsim) { p < 30 mu < rep(0, p) sig < matrix(0, nrow = p, ncol = p) diag(sig) < 1 rho < 0.5 for(i in 1:(p1)) { for(j in (i+1):p) { kk < j  i sig[i, j] < sig[j, i] < rho^kk } } x < rmvn(n, mu, sigma = sig) prob.a < expit(0.1 + 0.2 * x[,1]  sin(x[,3])/3  0.1 * x[,22]) a < rbinom(n, 1, prob.a) x < cbind(a, x) ## nonlinear binary  multi prob.y6 < pnorm(0.1 + 0.3 * x[ , 1]  0.1 * x[ , 1] * x[ , 2] + 0.1 * x[ , 2]  sin( pi / 4 * x[ , 22] * x[ , 7] ) + exp(x[,7] / 5)*x[,11]/4  0.12 * x[, 22] * x[ , 9] * x[ ,10] + 0.05*x[,8] * x[,10]*x[,11]^2) y6 < rbinom(n, size = 1, prob = prob.y6) mydat6 < cbind(x, y6) dat6[k, , ] < mydat6 } return(list( dat6 = dat6) ) } n500 < makedata(500, 500) n5000 < makedata(500, 5000)
a.4 Scenario 4
library(mvnfast) expit < function(x) exp(x) / (1 + exp(x)) makedata < function(nsim, n) { dat8 < array(0, dim = c(nsim, n, 30 + 2) ) for (k in 1:nsim) { p < 30 mu < rep(0, p) sig < matrix(0, nrow = p, ncol = p) diag(sig) < 1 rho < 0.5 for(i in 1:(p1)) { for(j in (i+1):p) { kk < j  i sig[i, j] < sig[j, i] < rho^kk } } x < rmvn(n, mu, sigma = sig) prob.a < expit(0.1 + 0.2 * x[,1]  sin(x[,3])/3  0.1 * x[,22]) a < rbinom(n, 1, prob.a) x < cbind(a, x) ## nonlinear continuous multi mu8 < 1 + 2 * x[ , 1] + sin( pi * x[ , 22] * x[ ,7] )  1 * exp( x[ , 6]/5 * x[ , 5] ) + log( abs( cos ( pi / 2 * x[ , 8] ) ) )  1.8 * cos( x[ , 9]) + 0.2 * x[ , 11] * abs(x[ ,7]) ^ 1.5 + x[,2]*x[,3]  0.5*x[,2]^2  cos(x[,2]) y8 < rnorm(n, mu8 , sd = 1) mydat8 < cbind(x, y8) dat8[k, , ] < mydat8 } return(list( dat8 = dat8) ) } n500 < makedata(500, 500) n5000 < makedata(500, 5000)
Appendix B Additional results from data analysis
Trace plots for analysis
References
 pre [2015] Fact sheet: President Obama Precision Medicine Initiative. https://www.whitehouse.gov/thepressoffice/2015/01/30/factsheetpresidentobamasprecisionmedicineinitiative, 2015. Accessed: 20161025.
 Chakraborty and Murphy [2014] Bibhas Chakraborty and Susan A Murphy. Dynamic treatment regimes. Annual review of statistics and its application, 1:447–464, 2014.
 Kan et al. [2008] Haidong Kan, Stephanie J London, Guohai Chen, Yunhui Zhang, Guixiang Song, Naiqing Zhao, Lili Jiang, and Bingheng Chen. Season, sex, age, and education as modifiers of the effects of outdoor air pollution on daily mortality in shanghai, china: The public health and air pollution in asia (papa) study. Environmental health perspectives, 116(9):1183, 2008.
 Chipman et al. [2010] H A Chipman, E I George, and R E McCulloch. Bart: Bayesian additive regression trees. The Annals of Applied Statistics, 4:266–298, 2010.
 McCullagh [1984] Peter McCullagh. Generalized linear models. European Journal of Operational Research, 16(3):285–292, 1984.
 VanderWeele [2015] Tyler VanderWeele. Explanation in causal inference: methods for mediation and interaction. Oxford University Press, 2015.
 Tuut and Hense [2001] Mariska Tuut and HansWerner Hense. Smoking, other risk factors and fibrinogen levels: evidence of effect modification. Annals of epidemiology, 11(4):232–238, 2001.
 Raschenberger et al. [2015] Julia Raschenberger, Barbara Kollerits, James Ritchie, Beverley Lane, Philip A Kalra, Eberhard Ritz, and Florian Kronenberg. Association of relative telomere length with progression of chronic kidney disease in two cohorts: effect modification by smoking and diabetes. Scientific reports, 5:11887, 2015.
 Vansteelandt and Goetghebeur [2003] Stijn Vansteelandt and Els Goetghebeur. Causal inference with generalized structural mean models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(4):817–835, 2003.
 Hill [2012] Jennifer L Hill. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 2012.
 Lo Re et al. [2017] Vincent Lo Re, Bret Zeldow, Michael J Kallan, Janet P Tate, Dena M Carbonari, Sean Hennessy, Jay R Kostman, Joseph K Lim, Matthew Bidwell Goetz, Robert Gross, et al. Risk of liver decompensation with cumulative use of mitochondrial toxic nucleoside analogues in hiv/hepatitis c virus coinfection. Pharmacoepidemiology and drug safety, 26(10):1172–1181, 2017.
 Rasmussen [2006] Carl Edward Rasmussen. Gaussian processes for machine learning. 2006.
 Müller et al. [2015] Peter Müller, Fernando Andrés Quintana, Alejandro Jara, and Tim Hanson. Bayesian nonparametric data analysis. Springer, 2015.
 Eilers and Marx [1996] Paul HC Eilers and Brian D Marx. Flexible smoothing with bsplines and penalties. Statistical science, pages 89–102, 1996.
 Biller [2000] Clemens Biller. Adaptive bayesian regression splines in semiparametric generalized linear models. Journal of Computational and Graphical Statistics, 9(1):122–140, 2000.
 Holmes and Mallick [2001] CC Holmes and BK Mallick. Bayesian regression with multivariate linear splines. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(1):3–17, 2001.
 Denison et al. [1998a] DGT Denison, BK Mallick, and AFM Smith. Automatic bayesian curve fitting. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(2):333–350, 1998a.
 Biller and Fahrmeir [2001] Clemens Biller and Ludwig Fahrmeir. Bayesian varyingcoefficient models using adaptive regression splines. Statistical Modelling, 1(3):195–211, 2001.
 Hastie and Tibshirani [1990] Trevor J Hastie and Robert J Tibshirani. Generalized additive models, volume 43. CRC Press, 1990.
 Friedman [1991] Jerome H Friedman. Multivariate adaptive regression splines. The annals of statistics, pages 1–67, 1991.
 Brezger and Lang [2006] Andreas Brezger and Stefan Lang. Generalized structured additive regression based on bayesian psplines. Computational Statistics & Data Analysis, 50(4):967–991, 2006.
 Denison et al. [1998b] David GT Denison, Bani K Mallick, and Adrian FM Smith. Bayesian mars. Statistics and Computing, 8(4):337–346, 1998b.
 Hastie et al. [2000] Trevor Hastie, Robert Tibshirani, et al. Bayesian backfitting (with comments and a rejoinder by the authors). Statistical Science, 15(3):196–223, 2000.
 Robins [1994] James M Robins. Correcting for noncompliance in randomized trials using structural nested mean models. Communications in StatisticsTheory and methods, 23(8):2379–2412, 1994.
 Robins [2000] James M Robins. Marginal structural models versus structural nested models as tools for causal inference. In Statistical models in epidemiology, the environment, and clinical trials, pages 95–133. Springer, 2000.
 Vansteelandt and Joffe [2014] S Vansteelandt and M Joffe. Structural nested models and gestimation: The partially realized promise. Statistical Science, 29:707–731, 2014.
 Chipman et al. [1998] H A Chipman, E I George, and R E McCulloch. Bayesian cart model search. Journal of the American Statistical Association, 93:935–948, 1998.
 Gelman et al. [2014] Andrew Gelman, John B Carlin, Hal S Stern, and Donald B Rubin. Bayesian data analysis, volume 2. Taylor & Francis, 2014.
 Albert and Chib [1993] James H Albert and Siddhartha Chib. Bayesian analysis of binary and polychotomous response data. Journal of the American statistical Association, 88(422):669–679, 1993.
 Wood and Wood [2015] Simon Wood and Maintainer Simon Wood. Package ‘mgcv’. R package version, pages 1–7, 2015.
 Chipman and McCulloch [2010] H Chipman and R McCulloch. Bayestree: Bayesian methods for tree based models, 2010.
 Soriano et al. [2008] Vincent Soriano, Massimo Puoti, Pilar GarciaGasco, Juergen K Rockstroh, Yves Benhamou, Pablo Barreiro, and Barbara McGovern. Antiretroviral drugs and liver injury. Aids, 22(1):1–13, 2008.
 Günthard et al. [2016] Huldrych F Günthard, Michael S Saag, Constance A Benson, Carlos Del Rio, Joseph J Eron, Joel E Gallant, Jennifer F Hoy, Michael J Mugavero, Paul E Sax, Melanie A Thompson, et al. Antiretroviral drugs for treatment and prevention of hiv infection in adults: 2016 recommendations of the international antiviral society–usa panel. Jama, 316(2):191–210, 2016.
 Scourfield et al. [2011] Andrew Scourfield, Akil Jackson, Laura Waters, Brian Gazzard, and Mark Nelson. The value of screening hivinfected individuals for didanosinerelated liver disease? Antiviral therapy, 16:941–942, 2011.
 Sterling et al. [2006] Richard K Sterling, Eduardo Lissen, Nathan Clumeck, Ricard Sola, Mendes Cassia Correa, Julio Montaner, Mark S Sulkowski, Francesca J Torriani, Doug T Dieterich, David L Thomas, et al. Development of a simple noninvasive index to predict significant fibrosis in patients with hiv/hcv coinfection. Hepatology, 43(6):1317–1325, 2006.
Comments
There are no comments yet.