SP-BART
R scripts and data sets that can be used to reproduce the results presented in the paper "Semi-parametric Bayesian Additive Regression Trees".
We propose a new semi-parametric model based on Bayesian Additive Regression Trees (BART). In our approach, the response variable is approximated by a linear predictor and a BART model, where the first component is responsible for estimating the main effects and BART accounts for the non-specified interactions and non-linearities. The novelty in our approach lies in the way we change tree generation moves in BART to deal with confounding between the parametric and non-parametric components when they have covariates in common. Through synthetic and real-world examples, we demonstrate that the performance of the new semi-parametric BART is competitive when compared to regression models and other tree-based methods. The implementation of the proposed method is available at https://github.com/ebprado/SP-BART.
Generalised Linear Models (GLMs) ^{30}, ^{26} are frequently used in many applications to predict a univariate response, due to the ease of interpretation of the parameter estimates as well as the wide availability of software that facilitates straightforward analyses. A common assumption in GLMs is that the specified covariates (including potential interaction terms) have a linear relationship with the mean of the response after transformation through the link function. Extensions such as Generalised Additive Models (GAMs) ^{16}, ^{42} require the specification of the main and interaction effects via a sum of (potentially non-linear) predictors. In GAMs, the non-linear relationship is usually captured via basis expansions of the covariates and constrained by a smoothing parameter. However, in problems where the number of covariates and/or observations is large, the linearity assumption may not hold and, more importantly, it may not be simple to specify which covariates and which of their interactions have the greatest impact on the response. Semi-parametric models ^{14} have been proposed for situations where a mixture of both linear and non-linear trends and interactions is required for the data at hand.
Bayesian Additive Regression Trees (BART) ^{4} is a statistical model based on an ensemble of trees that can be used for classification and regression problems. Through a backfitting algorithm, BART sequentially generates a set of trees that, when summed together, return the predicted values. To control how deep or shallow a tree can be, a prior is placed on the tree structure. In addition, the covariates and split-points that define the tree structure (i.e., the splitting rules) are randomly selected without the optimisation of a loss function, in contrast to Random Forests ^{3}, ^{11}. Compared to regression models, BART is more flexible in the sense that it does not assume linearity between covariates and the response, and it does not require the specification of a linear predictor. In contrast, however, BART does not provide an easy way to quantify the effect of a covariate on the response as a regression model does, which in some applications may be the main goal.

In this work, we introduce a BART extension to a semi-parametric regression setting. In this new method, which we name SP-BART, we combine a linear predictor, where the effects of covariates of primary interest can be quantified, with a BART component, which deals with covariates that are not of primary interest but may be related to the response. Our approach aims to tackle two issues found when analysing datasets with a large number of covariates using black-box algorithms. Firstly, it is difficult to quantify the effects of individual covariates on the response. Algorithms such as BART and Random Forests are flexible and can produce good predictions, but in many situations (e.g., studies in medicine ^{43}, ^{13}, ^{19}) prediction is not the most important aspect. Instead, knowing how the covariates impact the response is crucial, and this quantification is not possible with vanilla BART or Random Forests. Secondly, regression models are usually utilised to measure the effects that some covariates may have on a response, but they require the specification of the interaction terms, which is complicated to pre-specify in large or high-dimensional datasets, though it is a requirement for most standard GLM settings. In addition, they assume linearity between the response and the covariates, which is an overly rigid assumption in some cases.
In our novel semi-parametric BART, we have made structural changes in the way the trees are grown to avoid confounding between the estimates from the linear and non-linear components in the model. In particular, in the linear predictor we allow the specification of fixed and random effects, as in a Linear Mixed Model, in which the intercept and parameter estimates can vary by a grouping factor. In contrast, the interactions and non-linearities are dealt with by the BART component. Through two simulation experiments, we compare SP-BART with its main competitors and show that it is able to recover the true effects in either the presence or absence of interactions. In the real-world applications, we demonstrate that SP-BART presents excellent performance when compared to other semi-parametric models.
This paper is organised as follows. In Section 1.1, we present some related works. In Section 2, we introduce the BART model. In Section 3, we state the proposed semi-parametric BART. In Sections 4 and 5, we compare the performance of SP-BART with other algorithms on synthetic and real-world datasets. To conclude, in Section 6 we present a discussion.
BART has been used and extended in different applications, and its theoretical properties have also gained attention more recently. For instance, BART has been applied to credit risk modelling ^{44}, survival and competing risks analysis ^{39}, ^{38}, biomarker discovery ^{17}, plant-based genetics ^{37}, and causal inference ^{19}, ^{12}, ^{13}. Furthermore, it has been extended to high-dimensional data ^{23}, ^{18}, polychotomous responses ^{21}, ^{29}, zero-inflated and semi-continuous data ^{29}, ^{24}, heteroscedastic data ^{32}, and to estimate linear, smooth and monotone functions ^{31}, ^{40}, ^{5}. Regarding theoretical developments, we highlight the works of ^{35}, ^{34}, and ^{25}, who provide results related to the convergence of the posterior distribution generated by the BART model.

A work related to our novel model is the semi-parametric BART proposed by ^{43}. In their model, the design matrix is split into two disjoint subsets, X1 and X2, which contain the covariates of primary and non-primary interest, respectively. The covariates of primary interest in X1 are specified in a linear predictor and the others are exclusively used by BART. Our work differs in that i) we do not assume that X1 and X2 are disjoint, i.e., we allow the two subsets to share covariates, or even allow X1 to be contained in X2. Moreover, ii) we change the way the trees in BART are grown by introducing 'double grow' and 'double prune' moves to account for confounding, and iii) we place a prior on the variance-covariance matrix of the main effects so that we are able to analyse the covariance/correlation among them.
^{6} introduce Varying Coefficient BART (VCBART), which combines the idea of varying coefficient models ^{15} with BART and extends the work of ^{13} to a framework with multiple covariates. In VCBART, the response is modelled via a linear predictor in which the effect of each covariate is approximated by a BART model based on a set of modifiers (i.e., covariates that are not of primary interest). In addition, they provide theoretical results on the asymptotic concentration of the VCBART posterior distribution around the true covariate effects under non-i.i.d. errors. Although VCBART and SP-BART have aspects in common, such as the use of a linear predictor along with BART, our work is structurally different, as we do not estimate the parameters in the linear predictor via BART. Instead, they are obtained in the same fashion as in a Bayesian linear regression.
Introduced by ^{4}, BART is a Bayesian tree-based model that was initially proposed in the context of regression and classification problems. Unlike some regression models that assume linearity between the covariates and response, and require the specification of the main effects and interaction terms via a linear predictor, BART deals with non-linearities and interactions without any pre-specification. In BART, a univariate response y_i is approximated by a sum of m trees:

y_i = sum_{t=1}^{m} g(x_i; T_t, M_t) + e_i, with e_i ~ N(0, sigma^2),

where g(x_i; T_t, M_t) is a function that assigns the predicted value mu_{t,l} to all observations that fall into terminal node l of tree t, x_i is the i-th row of the design matrix X, T_t represents the topology of tree t, and M_t = (mu_{t,1}, ..., mu_{t,b_t}) denotes the vector with all predicted values of tree t.
In relation to the number of trees m, ^{4} recommend m = 200 as a default, although they point out that m can also be selected via cross-validation depending on the application.

Unlike other tree-based algorithms, where a loss function is minimised to define the splitting rules in the growing process, in BART the splitting rules are uniformly defined (i.e., the covariates and their split points are selected at random based on a uniform distribution). In addition, in the BART model the structure of the trees can be modified by four moves: grow, prune, change, and swap (see Figure 1). For instance, in the grow move a terminal node is randomly selected and two new terminal nodes are created below it. During a prune move, a parent of two terminal nodes is picked at random and its children are removed. In the change move, an internal node is randomly selected and its splitting rule is changed. Finally, in the swap move the splitting rules associated with two parents of terminal nodes are exchanged, with the parent nodes being selected at random.

At the beginning of the BART algorithm, it is common to set all trees as stumps. Then, within an MCMC structure, each tree is updated in turn by a grow, prune, change, or swap move and compared to its previous version, considering the partial residuals and the structure of both trees via a marginal likelihood calculation. This comparison is carried out through a Metropolis-Hastings step and is needed to retain only splitting rules that improve the final prediction, since they are proposed based on a uniform distribution. After this, all node-level parameters (mu) are generated and the residual variance (sigma^2) is updated; see Appendix A for details. As a Bayesian model, BART places priors on the parameters of interest, assuming that mu_{t,l} ~ N(0, sigma_mu^2) and sigma^2 ~ IG(nu/2, nu*lambda/2), where IG represents the Inverse Gamma distribution and sigma_mu = 0.5/(k sqrt(m)) with k in [1, 3]. In addition, a branching process prior is considered to control the depth of the trees, under which a node at depth d is internal (non-terminal) with probability alpha(1 + d)^(-beta), where alpha in (0, 1) and beta >= 0; ^{4} recommend alpha = 0.95 and beta = 2, which tends to favour shallow trees. In practice, BART is available through the R packages bartMachine ^{20}, BART ^{27}, and dbarts ^{7}.

In our novel semi-parametric approach, we separate the design matrix X into two subsets: X1 and X2. The first matrix contains the covariates we wish to include in a linear component to quantify the main effects, and X2 is a matrix that contains covariates that might contribute to predicting the response but are not of primary interest. Unlike ^{43}, who assume that X1 and X2 are disjoint, we consider that X1 and X2 may have covariates in common. In this sense, we consider a linear predictor inside the BART framework with:
y_i = x_i1' beta + sum_{t=1}^{m} g(x_i2; T_t, M_t) + e_i,

where x_i1 and x_i2 denote the i-th rows of X1 and X2, beta is the vector of linear coefficients, and e_i ~ N(0, sigma^2). In ^{4} and ^{43}, the set of trees used in BART can be modified by four moves (grow, prune, change and swap). However, to allow X1 and X2 to share covariates, we propose a change in the moves in order to minimise confounding between the linear and BART components. In this sense, we propose a 'double grow' step for any covariate that is common to both matrices. For instance, if a tree is a stump and a shared covariate, say x_j, is selected to define a splitting rule, then another covariate (e.g., x_k) will also be picked, and the proposed tree will have at least x_j and x_k in its structure. The idea of this modification is to induce interaction for those covariates that are in both components and let the linear component capture their main effects. If no double growing is done when X1 and X2 have at least one covariate in common, both components (linear predictor and BART) try to estimate the effects of the covariates that are in the linear predictor, and inevitably there is some degree of confounding. However, when we introduce the double grow move, we allow the linear component to estimate the main effects and force BART to work specifically on the interactions and non-linearities.
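The double grow move can be sketched as follows. This is an illustrative Python sketch (the actual implementation is in R, in the linked repository); the flat-list tree representation and the names `propose_grow` and `shared` are our own simplifications, not part of the package.

```python
import random

def propose_grow(tree_splits, candidates, shared):
    """Propose a grow move: pick a split covariate at random; if it is
    shared between the linear predictor and the BART design matrix,
    force a second ('double grow') split on another covariate so the
    tree encodes an interaction rather than a main effect."""
    first = random.choice(candidates)
    proposal = tree_splits + [first]
    if first in shared:
        # Double grow: add a second split so the shared covariate never
        # appears alone, leaving its main effect to the linear predictor.
        others = [c for c in candidates if c != first]
        proposal.append(random.choice(others))
    return proposal

random.seed(1)
stump = []                      # a stump has no splitting rules yet
cands = ["x1", "x2", "x3", "x4"]
new_tree = propose_grow(stump, cands, shared={"x4"})
# If "x4" was drawn, new_tree contains two splits; otherwise one.
assert len(new_tree) == (2 if new_tree[0] == "x4" else 1)
```

In the full sampler the second split is placed below the first, so the shared covariate only enters a tree jointly with another covariate.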
Equations (1) and (2) present the full conditionals for beta and for Sigma_beta, the variance-covariance matrix of the main effects. These expressions are needed due to the addition of the linear predictor into the BART model; see Appendix B for full details. An outline of the algorithm is as follows.

Update the linear predictor, with partial residuals R = y - sum_{t=1}^{m} g(X2; T_t, M_t), as

beta | R, sigma^2, Sigma_beta ~ N(A^{-1} X1' R / sigma^2, A^{-1}), with A = X1' X1 / sigma^2 + Sigma_beta^{-1}, (1)

Sigma_beta | beta ~ IW(nu_0 + 1, S_0 + beta beta'), (2)

where IW denotes the Inverse Wishart distribution and nu_0 and S_0 are its prior hyperparameters.

Then, sequentially update all trees, one at a time, via a Metropolis-Hastings step on the partial residuals R_t = y - X1 beta - sum_{j != t} g(X2; T_j, M_j).

Finally, update

sigma^2 | y ~ IG((n + nu)/2, (sum_{i=1}^{n} (y_i - y_hat_i)^2 + nu*lambda)/2), (3)

where y_hat_i = x_i1' beta + sum_{t=1}^{m} g(x_i2; T_t, M_t) is the predicted response and n is the number of observations.
In step 1, the parameter estimates and the variance-covariance matrix are updated taking into account the difference between the response and the prediction from all trees. In steps 2 and 3, each tree is modified considering the newly updated parameter estimates, and the error variance is updated. The main benefits of our approach are i) lower RMSEs when compared to GLMs and GAMs (as demonstrated below) and ii) interactions and non-linearities are captured without requiring any pre-specification. Regarding the computational cost, we have found that the proposed method adds a negligible time overhead to traditional implementations of BART, especially for small to medium datasets.
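Step 1 is the conditional posterior of a Bayesian linear regression applied to the partial residuals. A minimal numpy sketch, assuming a N(0, Sigma_beta) prior on beta and treating the BART prediction as fixed for the draw; `update_beta` is our own name (the actual implementation is in R):

```python
import numpy as np

def update_beta(X1, partial_resid, sigma2, Sigma_beta, rng):
    """Draw beta from its Gaussian full conditional, treating the
    partial residuals (response minus the BART prediction) as the
    working response of a Bayesian linear regression."""
    prec = X1.T @ X1 / sigma2 + np.linalg.inv(Sigma_beta)  # posterior precision
    cov = np.linalg.inv(prec)
    mean = cov @ (X1.T @ partial_resid) / sigma2
    return rng.multivariate_normal(mean, cov)

rng = np.random.default_rng(42)
n, p = 500, 2
X1 = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0])
y = X1 @ beta_true + rng.normal(scale=0.1, size=n)
# With the BART prediction set to zero, the partial residuals equal y.
beta_draw = update_beta(X1, y, sigma2=0.01, Sigma_beta=np.eye(p) * 100, rng=rng)
assert np.allclose(beta_draw, beta_true, atol=0.1)
```

In the sampler this draw alternates with the tree updates, so each component only explains the variation the other has left over.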
In this section, we compare our new SP-BART with a GAM, the semi-parametric BART of ^{43} (hereafter referred to as separated semi-BART), and VCBART in terms of bias using two sets of synthetic data. The results were generated using R ^{33} version 3.6.3 and the R packages mgcv ^{42}, semibart ^{43}, and VCBART ^{6}. For both our SP-BART and separated semi-BART, we use 50 trees, 2,000 MCMC iterations as burn-in, and 2,000 as post-burn-in. As GAMs require the specification of the terms that should be included in the linear predictor, for both scenarios we specify the true structure used to simulate the data. This gives the GAMs an unfair advantage over the other approaches, but does provide a baseline that the BART-based approaches can aim for.
In this first scenario, we compare SP-BART with GAM, separated semi-BART, and VCBART considering the Friedman equation ^{10}:

y_i = 10 sin(pi x_i1 x_i2) + 20 (x_i3 - 0.5)^2 + 10 x_i4 + 5 x_i5 + e_i, (4)

where x_ij ~ U(0, 1) and e_i ~ N(0, sigma^2). This equation is widely used for benchmarking tree-based methods on synthetic data, and has been used in many other papers ^{4}, ^{6}, ^{23}. In this experiment, we consider two values each for the number of covariates p and the error variance sigma^2, totalling four scenarios. To evaluate the models' performances, we use the bias of the parameter estimates as the accuracy measure, considering 10 Monte Carlo repetitions. As the Friedman equation considers only 5 covariates to generate the response, the additional p - 5 covariates are noise and have no impact on y. In this simulation, we aim to estimate the linear effects associated with x_4 and x_5 (denoted by beta_4 and beta_5, respectively) using the linear predictor, i.e., we set up X1 so that it contains only the covariates x_4 and x_5. In contrast, we let BART take care of the non-linear and interaction terms by setting X2 to contain all covariates (including x_4 and x_5).
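Data from the Friedman benchmark in equation (4) are easy to simulate. A minimal Python sketch (the paper's experiments use R; the settings n, p, and sigma below are arbitrary illustrations, not the paper's):

```python
import numpy as np

def friedman(n, p, sigma, rng):
    """Simulate from the Friedman (1991) benchmark: only the first five
    of the p covariates affect the response; the rest are pure noise."""
    X = rng.uniform(size=(n, p))
    mu = (10 * np.sin(np.pi * X[:, 0] * X[:, 1])
          + 20 * (X[:, 2] - 0.5) ** 2
          + 10 * X[:, 3]
          + 5 * X[:, 4])
    return X, mu + rng.normal(scale=sigma, size=n)

rng = np.random.default_rng(1)
X, y = friedman(n=200, p=10, sigma=1.0, rng=rng)
assert X.shape == (200, 10) and y.shape == (200,)
```

Note that x_4 and x_5 enter the mean purely linearly, which is what makes their coefficients identifiable by the linear predictor alone.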
Figure 2 shows the bias obtained from GAM, separated semi-BART, VCBART and the new SP-BART for each combination of p and sigma^2. We can see that the bias of the parameter estimates is low and SP-BART is able to recover the true effects. As GAM requires the specification of all terms that are estimated by the model, we set up the true structure of the Friedman equation so that it can be used as a reference in the comparison. Furthermore, the estimates from SP-BART and separated semi-BART are very similar. This similarity is expected and can be attributed to the fact that the effects of x_4 and x_5 have no interactions with other covariates. In this sense, the trees in SP-BART tend not to contain x_4 and x_5, as both effects are captured solely by the linear predictor.
In the scenario above, we have shown that the new SP-BART correctly estimates the main effects when they do not interact with other effects. However, in practice the covariates of primary interest may interact with other effects, and this should be taken into account. In this scenario, we compare the methods using the regression function:
(5)

where e_i ~ N(0, sigma^2), x_ij ~ U(0, 1), and the final term in (5) represents the tree structure that can be seen in Figure 3. As in the Friedman equation, the additional covariates beyond those in the regression function have no impact on the response. We are now interested in estimating the effects associated with the two covariates of primary interest (beta_1 and beta_2, say), which can be carried out by specifying X1 to contain only these two covariates. In turn, X2 contains all covariates (including those in X1).
Figure 4 shows the bias of the parameter estimates for beta_1 and beta_2. Again, we set up the true structure for the GAM, as it is unable to capture interactions that are not specified. We can see that separated semi-BART gives extremely large bias for both parameters in some settings. This bias occurs because the covariates of primary interest are not available as splitting variables to the BART component in separated semi-BART, which affects the estimation of their effects. In contrast, our new SP-BART estimates beta_1 and beta_2 with very low bias, regardless of the number of covariates (p) and the amount of variability (sigma^2). Furthermore, it can be seen that VCBART and SP-BART provide similar bias for both parameters, and match well with the baseline GAM model. However, it is worth mentioning that VCBART uses a separate BART model to estimate each parameter in the linear predictor. In this example, VCBART used 100 trees in total to estimate beta_1 and beta_2, as we set up 50 trees for each parameter. In this sense, the greater the number of parameters to be estimated in the linear predictor, the more computationally intensive VCBART becomes, since the total number of trees used to estimate all covariate effects is a function of the number of covariates in the linear predictor and of the number of trees used to approximate each effect.
In this section, we illustrate SP-BART on the Trends in International Mathematics and Science Study (TIMSS) dataset. TIMSS is an international series of assessments which takes place every four years. The TIMSS 2019 data provides students’ achievement in mathematics and science at the fourth and eighth grade level in 64 countries ^{28}, ^{8}. The dataset also contains information from surveys of students, teachers and school principals. Here, we are interested in quantifying the impact of some covariates on students’ mathematics score (variable BSMMAT01). In our analysis, we selected data from Ireland (4118 observations), students from the eighth grade level, and pre-selected 20 covariates; see Appendix C for full details.
Initially, we compare SP-BART with the Bayesian Causal Forest (BCF) proposed by ^{13}. To do so, we consider the covariate 'school discipline problems' in the linear predictor of SP-BART and as the treatment variable for BCF. Originally, this covariate is a factor with 3 levels ('hardly any problems', 'minor problems', and 'moderate to severe problems'), but we binarise it by grouping the first two categories into one, as BCF deals specifically with a binary treatment variable. The goal is to quantify the impact of discipline problems on students' mathematics scores along with another 19 covariates; see Appendix C. In Table 1, we show the posterior distributions of the parameter estimates for BCF and SP-BART. We can see that the effect of discipline problems based on both methods is negative, which means that students who study in schools with moderate to severe discipline issues tend to have lower mathematics scores than students who study in schools with hardly any or minor discipline problems. In addition, we observe that the credible interval for BCF is wider than that for SP-BART.
Method | Mean | 5th percentile | 95th percentile |
---|---|---|---|
BCF | -34.53 | -63.81 | -6.15 |
SP-BART | -48.20 | -60.18 | -38.74 |
One of BCF’s limitations is that it allows only one binary covariate as the variable of interest, which is not sensible for some problems. However, we can use VCBART as it is an extension to BCF that permits us to evaluate the effect of more covariates of interest, regardless of whether they are continuous, factor, or binary. In this sense, we can consider more covariates in the linear predictor, here parents’ education level (factor with 6 levels), minutes spent on homework (factor with 6 levels), and school discipline problems (factor with 3 levels). For the BART component, we selected 17 other covariates (see Appendix C) plus the three present in the linear predictor. As there are common variables in both components, we are able to capture interactions and non-linearity involving the covariates of primary and non-primary interest without any pre-specification. For instance, if we were to fit a linear (mixed) model to these data, it would be necessary to specify all main effects terms as well as the interactions between them.
Table 2 presents the parameter estimates and the corresponding credible intervals associated with the covariates parents' education, minutes spent on homework, and school discipline problems. We can see that, in general, the higher the parents' education level, the greater the mathematics scores. For instance, parents who studied at a 'university or higher' or did a 'post-secondary' qualification have children with higher scores, on average, than children whose parents had formal education up to secondary level. In addition, we can see that students with parents who have 'primary, secondary or no formal education' tend to have lower scores on average. Furthermore, schools with 'moderate to severe' discipline problems have students with lower scores when compared with schools with 'hardly any' or 'minor' discipline issues. As we allow the linear and BART components to have covariates in common, we verified that covariates in the linear predictor interact with covariates of both primary and non-primary interest. To detect the interaction effects, we adopted the approach of ^{20}, in which variables are considered to interact with each other if they are adjacent in the same tree. For instance, we observed in the results of SP-BART that parents' education shows interactions with minutes spent on homework. Similarly, school discipline problems appears to interact with absenteeism and minutes spent on homework. To analyse which interactions are important in VCBART, one would need to examine all trees for all covariates in the linear predictor. For instance, if there are 3 covariates in the linear predictor, it would be necessary to evaluate 3 times 50 = 150 trees (along the MCMC iterations), as the effect associated with each covariate is approximated by 50 trees (by default).
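The adjacency heuristic of ^{20} can be sketched as follows; the nested-dict tree representation and the function name are our own illustration, not the bartMachine internals:

```python
from collections import Counter

def count_adjacent_pairs(trees):
    """Count, across all trees, how often two covariates appear in
    adjacent (parent-child) internal nodes -- the interaction heuristic
    of bartMachine. Each tree is a nested dict:
    {"var": name, "left": subtree-or-None, "right": subtree-or-None},
    where None marks a terminal node."""
    pairs = Counter()

    def walk(node):
        if node is None:
            return
        for child in (node["left"], node["right"]):
            if child is not None:
                pairs[frozenset((node["var"], child["var"]))] += 1
                walk(child)

    for t in trees:
        walk(t)
    return pairs

# A tree splitting on parents' education, then minutes on homework below it:
tree = {"var": "parents_edu",
        "left": {"var": "homework_mins", "left": None, "right": None},
        "right": None}
counts = count_adjacent_pairs([tree])
assert counts[frozenset(("parents_edu", "homework_mins"))] == 1
```

Pairs with high counts across the posterior sample of trees are flagged as candidate interactions.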
| Covariate | Category | SP-BART Estimate | SP-BART 95% CI | VCBART 95% CI |
|---|---|---|---|---|
| Parents' education | University or higher | 17.59 | (-0.37; 35.34) | (10.06; 34.12) |
| | Post-secondary but not University | 17.31 | (-0.55; 34.17) | (-11.78; 23.84) |
| | Upper Secondary | -2.68 | (-21.13; 15.63) | (-15.67; 34.02) |
| | Lower Secondary | -11.02 | (-29.94; 6.99) | (-20.83; 8.81) |
| | Primary, secondary or no school | -16.69 | (-36.45; 4.24) | (-61.01; 39.74) |
| | Not informed | -4.50 | (-22.31; 12.74) | (-27.62; 8.64) |
| Minutes spent on homework | No homework | -19.21 | (-65.77; 21.53) | (-129.09; 18.44) |
| | 1 to 15 minutes | -6.33 | (-37.67; 18.4) | (-35.52; 36.87) |
| | 16 to 30 minutes | 10.49 | (-8.67; 30.15) | (-5.67; 16.44) |
| | 31 to 60 minutes | 10.14 | (-10.55; 33.82) | (4.14; 17.76) |
| | 61 to 90 minutes | 7.68 | (-13.91; 29.55) | (-1.79; 27.22) |
| | More than 90 minutes | -2.76 | (-24.87; 20.15) | (-86.68; 56.45) |
| School discipline problems | Hardly any problems | 14.86 | (-8.36; 39.95) | (-7.84; 27.42) |
| | Minor problems | 3.05 | (-21.6; 28.41) | (5.46; 11) |
| | Moderate to severe problems | -17.92 | (-43.62; 9.64) | (-30.24; 30.19) |
As another example to illustrate the applicability of SP-BART, we analyse the well-known Pima Indians Diabetes dataset from the UCI repository ^{2}, which is available in R through the mlbench package ^{22}. The goal is to predict whether or not a patient has diabetes based on blood pressure, glucose concentration, age, body mass index, and 4 other covariates. The dataset used is the corrected version of the original data, which records NA rather than zero for the covariates glucose, pressure, triceps, insulin and mass ^{22}. As the response variable is binary, we use a probit link function following the data augmentation approach of ^{1}. Here, we are interested in measuring the effects of glucose and age through the linear predictor. For the BART component, we set X2 to contain all 8 available covariates, including glucose and age.
In Table 3, we present the parameter estimates and the 95% credible intervals for glucose and age. For these data, we compare SP-BART with separated semi-BART, as the implementation of VCBART cannot deal with binary response variables. The estimates for both covariates indicate that as they increase, the probability of observing a positive diagnosis of diabetes also increases, regardless of the method being used. However, when we look at the misclassification rate, SP-BART outperforms separated semi-BART. We recall that the separated semi-BART of ^{43} assumes that the covariates of primary interest (i.e., glucose and age) are not in the BART component. However, in SP-BART we assume that the linear predictor and the BART component may have common variables, so we also ran the separated semi-BART with glucose and age in both the linear and BART components, which returned a slightly better misclassification rate. Here, we observed that the inclusion of glucose and age in the design matrix used by BART generates trees that occasionally use only glucose or age. In this case, both the linear predictor and the BART model try to estimate the effects of these covariates, which is not sensible, as it brings confounding/bias to the parameter estimates and does not contribute to the predictive power. Once again, we can see the benefits that arise from sharing covariates among the components along with the double grow move.
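The augmentation step of ^{1} draws, for each observation, a latent normal variable truncated to agree with the observed label. A minimal sketch (rejection sampling is used here for simplicity; practical implementations sample the truncated normal directly):

```python
import random

def sample_latent(mean, y):
    """Albert & Chib (1993) augmentation step: draw the latent z from
    N(mean, 1) truncated to z > 0 if y = 1, or z <= 0 if y = 0,
    via simple rejection sampling (fine away from the far tails)."""
    while True:
        z = random.gauss(mean, 1.0)
        if (z > 0) == (y == 1):
            return z

random.seed(3)
# One sweep of the augmentation for a toy data set:
means = [0.5, -0.5, 1.2]   # current linear-predictor-plus-trees fit per observation
ys = [1, 0, 1]
zs = [sample_latent(m, y) for m, y in zip(means, ys)]
assert all((z > 0) == (y == 1) for z, y in zip(zs, ys))
```

The latent z then plays the role of the continuous response in the Gibbs updates, with the residual variance fixed at 1.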
| Covariate | SP-BART Estimate | SP-BART 95% CI | separated semi-BART 95% CI |
|---|---|---|---|
| Intercept | 0.0623 | (-1.117; 1.152) | (-4.6225; -2.0874) |
| Glucose | 0.0352 | (0.027; 0.043) | (0.0164; 0.0295) |
| Age | 0.0569 | (0.020; 0.093) | (0.0006; 0.0399) |
In this work, we have extended BART to a semi-parametric framework that does not contain many of the restrictions found in other so-called 'semi-parametric' versions of BART. In our novel SP-BART, the main effects are estimated via a linear predictor and the interactions and non-linearities are dealt with by BART. The main novelty of SP-BART is the modification of the grow and prune moves to force the BART component to work on the interactions that are not covered by the linear predictor. The double growing/pruning moves aim to induce interaction between covariates that are both in the linear predictor and in the subset available for BART. In addition, via simulation studies and real data applications, we have demonstrated that SP-BART is able to estimate the main effects with low bias and, at the same time, does not require the specification of interactions. We have implemented SP-BART as an R package, which is currently available at https://github.com/ebprado/SP-BART.
In future work, we believe that other BART-based models, such as SBART ^{23}, BART for Gamma and Log-normal hurdle data ^{24}, and log-linear BART ^{29}, could be extended to a semi-parametric approach. In addition, theoretical results underlying SP-BART could be developed in order to explore the properties of the proposed method in relation to the convergence of the posterior distribution.
Estevão Prado’s work was supported by a Science Foundation Ireland Career Development Award grant number 17/CDA/4695 and SFI research centre 12/RC/2289P2. Andrew Parnell’s work was supported by: a Science Foundation Ireland Career Development Award (17/CDA/4695); an investigator award (16/IA/4520); a Marine Research Programme funded by the Irish Government, co-financed by the European Regional Development Fund (Grant-Aid Agreement No. PBA/CC/18/01); European Union’s Horizon 2020 research and innovation programme under grant agreement No 818144; SFI Centre for Research Training 18CRT/6049, and SFI Research Centre awards 16/RC/3872 and 12/RC/2289P2.
The authors declare no potential conflict of interests.
In this section, we provide the mathematical details of the BART model following ^{31} and ^{41}. First, we define the likelihood function associated with the response variable as

y | X, T, M, sigma^2 ~ N(sum_{t=1}^{m} g(X; T_t, M_t), sigma^2 I_n),

where g is the function that returns the predicted values given the design matrix X and the tree structure T_t, and M_t represents the vector that stores the predicted values of T_t. In order to obtain the full conditionals for mu and sigma^2, ^{4} assume the following priors:

mu_{t,l} ~ N(0, sigma_mu^2) and sigma^2 ~ IG(nu/2, nu*lambda/2),

where sigma_mu = 0.5/(k sqrt(m)) with k in [1, 3]. To control the tree structure/depth, ^{4} place the following prior on T_t:

p(T_t) = prod_{eta in I_t} alpha(1 + d_eta)^(-beta) prod_{eta in L_t} [1 - alpha(1 + d_eta)^(-beta)],

where d_eta is the depth of node eta, alpha and beta are hyperparameters that control the shape of the tree, and I_t and L_t denote the sets of internal and terminal nodes of tree t, respectively. Hence, the joint posterior distribution can be written as

p(T, M, sigma^2 | y) proportional to [prod_{t=1}^{m} p(y | T_t, M_t, sigma^2) p(M_t | T_t) p(T_t)] p(sigma^2), (6)

where the splitting rules that define the nodes of each tree are implicit in p(T_t). ^{4} noted that it is possible to sample from (6) in two steps, substituting the response variable y by the partial residuals R_t = y - sum_{j != t} g(X; T_j, M_j).
A new tree is proposed by a grow, prune, change, or swap step and then compared to its previous version via the marginal likelihood

p(R_t | T_t, sigma^2) = prod_{l=1}^{b_t} (2 pi sigma^2)^(-n_l/2) [sigma^2/(sigma^2 + n_l sigma_mu^2)]^(1/2) exp( -(1/(2 sigma^2)) [ sum_{i in l} R_{i,t}^2 - sigma_mu^2 (sum_{i in l} R_{i,t})^2 / (sigma^2 + n_l sigma_mu^2) ] ),

where R_t denotes the partial residuals, b_t is the number of terminal nodes of tree t, and n_l is the number of observations that belong to node l of tree t. This sampling is carried out through a Metropolis-Hastings step.
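The node-wise marginal likelihood above can be checked numerically; a small Python sketch of its log (the function name is ours):

```python
import math

def log_marginal_node(resid, sigma2, sigma_mu2):
    """Log marginal likelihood of the partial residuals in one terminal
    node, with the node parameter mu ~ N(0, sigma_mu2) integrated out."""
    n = len(resid)
    s = sum(resid)
    ss = sum(r * r for r in resid)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            + 0.5 * math.log(sigma2 / (sigma2 + n * sigma_mu2))
            - (ss - sigma_mu2 * s * s / (sigma2 + n * sigma_mu2)) / (2 * sigma2))

# Sanity check against the exact n = 1 case: z ~ N(0, sigma2 + sigma_mu2).
z, sigma2, sigma_mu2 = 0.7, 1.0, 0.25
lhs = log_marginal_node([z], sigma2, sigma_mu2)
v = sigma2 + sigma_mu2
rhs = -0.5 * math.log(2 * math.pi * v) - z * z / (2 * v)
assert abs(lhs - rhs) < 1e-12
```

Summing this quantity over the terminal nodes of the proposed and current trees gives the Metropolis-Hastings likelihood ratio.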
Since the mu_{t,l} are i.i.d., their full conditional is given by

mu_{t,l} | T_t, R_t, sigma^2 ~ N( sum_{i in l} R_{i,t} / (n_l + sigma^2/sigma_mu^2), sigma^2 / (n_l + sigma^2/sigma_mu^2) ). (7)

Finally, the full conditional of sigma^2 is given as

sigma^2 | y ~ IG( (n + nu)/2, (sum_{i=1}^{n} (y_i - y_hat_i)^2 + nu*lambda)/2 ), (8)

where y_hat_i is the predicted response for observation i. In Algorithm 1, the full structure of the BART model is presented.
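The node-level draw in equation (7) amounts to a shrunken node mean; a minimal sketch (the function name is ours):

```python
import random

def draw_mu(resid, sigma2, sigma_mu2, rng=random):
    """Draw a terminal-node parameter mu from its Gaussian full
    conditional: the mean of the partial residuals in the node,
    shrunk towards zero by the prior mu ~ N(0, sigma_mu2)."""
    n = len(resid)
    denom = n + sigma2 / sigma_mu2
    mean = sum(resid) / denom
    var = sigma2 / denom
    return rng.gauss(mean, var ** 0.5)

random.seed(7)
# With many observations and a weak prior, the draw approaches the node mean:
mu = draw_mu([1.0] * 10000, sigma2=1.0, sigma_mu2=100.0)
assert abs(mu - 1.0) < 0.1
```

With few observations in a node, sigma^2/sigma_mu2 dominates the denominator and the draw is pulled towards the prior mean of zero.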
In this section, we provide details of the implementation of SP-BART. Here, we consider that y denotes a univariate response whose likelihood is given as

y | X1, X2, beta, T, M, sigma^2 ~ N(X1 beta + sum_{t=1}^{m} g(X2; T_t, M_t), sigma^2 I_n),

where beta is the vector of linear coefficients. In addition, we define the partial residuals as R_t = y - X1 beta - sum_{j != t} g(X2; T_j, M_j). In Algorithm 2, the structure of SP-BART is presented. First, the response and the design matrices X1 and X2 are set up along with the number of trees (e.g., m = 50) and the number of MCMC iterations. All trees are initialised as stumps, and the hyperparameters of the prior distributions associated with beta, Sigma_beta, mu, and sigma^2 are set. Second, the parameter vector beta and the variance-covariance matrix Sigma_beta are updated. Then, candidate trees (T*) are sequentially proposed, one at a time, and compared with their previous versions (T) via a Metropolis-Hastings step. Later, the node-level parameters are generated and the variance sigma^2 is updated. For a large enough number of iterations, samples from the posterior distribution are obtained after convergence.
In Section 5.1, we selected students at the eighth grade level and data from Ireland. For simplicity, and to demonstrate the new methodology, we did not consider the multiple plausible values of student achievement and ignored the sampling weights. However, to rigorously evaluate the effects and generalise the results, it would be necessary to consider the five different plausible values and the sampling weights; see ^{36} and ^{9} for details. Below, we present the 20 covariates that we pre-selected to illustrate SP-BART. To select these covariates, we ran the vanilla BART on the TIMSS dataset and then chose the 20 most used variables. In the comparisons with BCF and VCBART, all 20 covariates were included in X2, which is the matrix used by the BART component in SP-BART. In the first comparison with BCF, X1 contained the binarised version of the covariate 'Does the school have discipline problems?' along with a column of ones representing the intercept. When comparing SP-BART with VCBART, X1 contained a column of ones representing the intercept and the covariates 'Parents' education level', 'Minutes spent on homework', and 'School discipline problems'.
Covariate | Label | Source |
---|---|---|
BSBG10 | How often student is absent from school | Student |
BSBM26AA | How often teacher gives homework | Student |
BSBM17A | Does the student know what the teacher expects them to do? | Student |
BSBG11B | How often student feels hungry | Student |
BSBM20I | Does the student think it is important to do well in mathematics? | Student |
BSBG05A | Does the student have a tablet or a computer at home? | Student |
BCBG13BC | Does the school have library resources? | Principal |
BSBG13D | Does the student think that teachers are fair in their school? | Student |
BCBG13AD | Does the school have heating systems? | Principal |
BCBG06C | How many instructional days in one calendar week does the school have? | Principal |
BSBG05I | Country-specific indicator of wealth | Student |
BSBM19A | Does the student do usually well in mathematics? | Student |
BSBM42BA | Minutes spent on homework | Student |
BSBM43BA | For how many of the last 12 months have the student attended extra lessons or tutoring in mathematics? | Student |
BCDGDAS | Does the school have discipline problems? | Principal |
BSBG11A | How often does the student feel tired? | Student |
BCBG13AB | Does the school have shortage of supplies? | Principal |
BCDGMRS | Are the instructions affected by the material resources shortage? | Principal |
BSDGEDUP | Parents’ highest education Level | Student |
ITSEX | Sex of students | Student |