Prediction of treatment outcome for autism from structure of the brain based on sure independence screening

10/17/2018, by Juntang Zhuang et al.

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder, and behavioral treatment interventions have shown promise for young children with ASD. However, there has been limited progress in understanding the effect of each type of treatment. In this project, we aim to detect structural changes in the brain after treatment and select structural features associated with treatment outcomes. The difficulty of building large databases of patients who have received specific treatments and the high dimensionality of medical image analysis problems are the main challenges in this work. To select predictive features and build accurate models, we use the sure independence screening (SIS) method. SIS is a theoretically and empirically validated method for ultra-high dimensional general linear models, and it achieves both predictive accuracy and correct feature selection by iterative feature selection. Compared with step-wise feature selection methods, SIS removes multiple features in each iteration and is computationally efficient. Compared with other linear models such as elastic-net regression, support vector regression (SVR) and partial least squares regression (PLSR), SIS achieves higher accuracy. We validated the superior performance of SIS in various experiments: First, we extract brain structural features with FreeSurfer, including cortical thickness, surface area, mean curvature and cortical volume. Next, we predict different measures of treatment outcomes based on structural features. We show that SIS achieves the highest correlation between prediction and measurements in all tasks. Furthermore, we report the regions selected by SIS as biomarkers for ASD.




1 Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication and repetitive behaviors [1]. Structural changes of the brain have been identified in patients with ASD in the literature: Sparks et al. found that children with ASD have significantly increased cerebral volumes compared with typically developing (TD) and developmentally delayed (DD) children [2]. Pereira et al. found changes in the cingulate gyrus, paracentral lobule and superior frontal gyrus in subjects with ASD [3].

Behavioral treatments are widely used for ASD, and Pivotal Response Treatment (PRT) is an empirically supported therapy [4] that addresses core deficits in social communication skills. Although structural changes in the brain have been studied in ASD, there has been limited progress in identifying the effect of PRT on brain structure. Therefore, predictive models for treatment outcomes based on brain structural changes are essential for understanding the mechanism of ASD. Furthermore, accurate predictive models are more robust than analytical models, which tend to overfit small datasets.

The difficulty of building large databases and the high dimensionality of medical images are the main challenges. We introduce sure independence screening (SIS) [5], a feature selection method for ultra-high dimensional general linear models. Although screening methods are widely used in genetics research, the neuroscience community tends to use simpler linear models such as elastic-net [6]. However, these simpler models cannot deal with ultra-high dimensional problems, and more straightforward step-wise feature selection methods are computationally expensive. In this paper, we demonstrate the superior performance of SIS over other models in the prediction of changes in severity scores based on structural features of the brain. Furthermore, we analyze the selected features as biomarkers for ASD.

2 Methods

2.1 Difficulties for high-dimensional problems

The high-dimensional problem refers to settings where the dimension $p$ is larger than the sample size $n$. High dimensionality causes the following problems: (a) The design matrix $X$ has more columns than rows, which causes $X^{T}X$ to be singular and large. Linear regression with no constraint will generate an infinite number of solutions that fit the training data, and it is hard to determine which is the correct model that generalizes to test data. (b) In the high-dimensional case, an unimportant variable can have a high correlation with the response variable or with predictive variables, and this adds to the difficulty of variable selection. (c) The high dimension $p$ makes step-wise feature selection methods computationally infeasible.

There are mainly three types of feature selection methods: wrapper methods, embedded methods, and filter methods. Wrapper methods such as forward selection, backward elimination, and recursive feature elimination carry a huge computational burden because each candidate subset of features has to be tested. Embedded methods with built-in feature selection, such as LASSO and elastic-net regression [6], usually cannot deal with ultra-high dimensional problems. Filter methods pre-select variables based on some importance measure before learning; however, this pre-processing step usually cannot accurately select predictive features. Zhuang et al. proposed a two-level feature selection approach specifically for brain image analysis [7]; however, the region-level feature selection in their method depends on the assumption of local smoothness of the input image, and is therefore suited for voxel-wise features but not ROI-wise features. Zhuang et al. also proposed a non-linear feature selection method based on random forest [8]; however, that method is well suited for non-linear models but not linear models. Therefore, we introduce SIS [5], a fast and accurate feature selection method for ultra-high dimensional general linear models.

2.2 Sure independence screening

Sure screening refers to the property that all the important variables survive a variable screening procedure with probability tending to 1 as the sample size increases (Theorem 3 in [5]). This asymptotic property gives a theoretical guarantee on the performance of SIS.

Suppose the design matrix $X$ is centered and normalized to unit variance for each column, and the dimension of $X$ is $n \times p$, where $n$ is the number of observations and $p$ is the number of variables. Denote the response variable as $y$, a vector of length $n$. Denote the set of true predictive variables as $\mathcal{M}_* = \{i : \beta_i \neq 0\}$, where the true model is $y = X\beta + \varepsilon$. Denote the non-sparsity rate as $s = |\mathcal{M}_*|$. A component-wise regression is defined as:

$$\omega = X^{T} y$$

Note that each column of $X$ is already normalized, so the $i$th component of $\omega$, denoted $\omega_i$, is proportional to the linear correlation between $X_i$ and $y$. For a given parameter $\gamma \in (0, 1)$, the $d = \lceil \gamma n \rceil$ largest components of $|\omega|$ are selected. The subset of preserved features is defined as:

$$\mathcal{M}_\gamma = \{\, 1 \le i \le p : |\omega_i| \text{ is among the } d \text{ largest of all} \,\}$$
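The screening step above can be sketched in a few lines of NumPy (a minimal illustration on hypothetical data; the paper's actual analysis used the R package "SIS"):

```python
import numpy as np

def sis_screen(X, y, d):
    """Component-wise screening: keep the d features with the largest |omega_i|,
    where omega = X^T y after centering and scaling each column of X."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # unit-variance columns
    omega = Xs.T @ (y - y.mean())               # proportional to corr(X_i, y)
    return set(np.argsort(np.abs(omega))[-d:].tolist())

# Hypothetical data: only features 0 and 1 drive the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 300))
y = 10.0 * X[:, 0] - 8.0 * X[:, 1] + 0.1 * rng.standard_normal(100)
selected = sis_screen(X, y, d=10)
```

The whole screening pass is a single matrix-vector product, which is why SIS scales to ultra-high dimensional problems where wrapper methods are infeasible.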
The full algorithm is summarized in Algorithm 1. The iterative SIS algorithm performs feature selection recursively until a sparsity criterion is satisfied.

1 Set $k = 1$;
2 $\omega \leftarrow X^{T} y$;
3 $\mathcal{A}_1 \leftarrow$ indices of the $d$ largest components of $|\omega|$;
4 $\mathcal{M}_1 \leftarrow$ model selected on $\mathcal{A}_1$ by penalized maximum likelihood;
5 while $|\mathcal{M}_k| < s$ and $\mathcal{M}_k \neq \mathcal{M}_{k-1}$ do
6       $k \leftarrow k + 1$; screen the remaining variables against the residual of $y$ given $\mathcal{M}_{k-1}$ to obtain $\mathcal{A}_k$;
7       Select model $\mathcal{M}_k$ on $\mathcal{M}_{k-1} \cup \mathcal{A}_k$ by penalized maximum likelihood;
8 end while
Algorithm 1 Iterative sure independence screening algorithm
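One way to read the iterative loop is sketched below in NumPy. This is an illustrative simplification, not the paper's implementation: ordinary least squares on the screened set stands in for the penalized-likelihood fit, and re-screening is done against the residual of the current model.

```python
import numpy as np

def isis(X, y, d, s_max):
    """Naive iterative screening sketch: screen, fit, re-screen on residuals.

    d     -- number of variables added per screening pass
    s_max -- stop once at least this many variables are selected
    """
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0)             # unit-variance columns
    selected, prev = set(), None
    while len(selected) < s_max and selected != prev:
        prev = set(selected)
        # Residual of y given the currently selected variables
        # (least squares stands in for the penalized GLM fit).
        if selected:
            A = Xs[:, sorted(selected)]
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
        else:
            resid = y - y.mean()
        # Screen the remaining variables against the residual.
        rest = [j for j in range(p) if j not in selected]
        if not rest:
            break
        omega = Xs[:, rest].T @ resid
        top = np.asarray(rest)[np.argsort(np.abs(omega))[-d:]]
        selected |= {int(j) for j in top}
    return selected

# Hypothetical data where only features 0 and 1 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 300))
y = 10.0 * X[:, 0] - 8.0 * X[:, 1] + 0.1 * rng.standard_normal(100)
sel = isis(X, y, d=5, s_max=10)
```

Re-screening against the residual is what lets the loop recover important variables that are only weakly marginally correlated with $y$ but become visible once the first batch of variables is accounted for.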

2.3 Penalized maximum likelihood for general linear models

Within each iteration of Algorithm 1, the algorithm fits a generalized linear model by penalized maximum likelihood. Different sparsity penalties can be applied depending on the available computational capacity, and the resulting problem can be solved efficiently with standard algorithms such as coordinate descent.

We briefly introduce several types of sparsity penalty in this paragraph. The $\ell_1$ penalty is widely used for feature selection and reduces the problem to LASSO; its variant, adaptive LASSO [9], introduces the sparsity penalty term

$$\lambda \sum_{i=1}^{p} w_i |\beta_i|,$$

where $w_i$ is the weight for the $i$th component, and $\lambda$ is a pre-defined penalty hyper-parameter.

Other model selection criteria can be applied:

(a) Akaike information criterion (AIC) [10], where the parameters are chosen as

$$\hat{\beta} = \arg\min_{\beta} \; \{-2\,\ell(\beta) + 2\,\mathrm{df}\},$$

where $\mathrm{df}$ is the degree of freedom of the fitted model, $\ell$ is the log-likelihood, and $\beta$ is the parameter to be estimated.

(b) Bayesian information criterion (BIC) [11], where the parameters are chosen as

$$\hat{\beta} = \arg\min_{\beta} \; \{-2\,\ell(\beta) + \mathrm{df}\,\log n\}.$$
(c) Extended Bayesian information criterion (EBIC) [12], where the model is determined as

$$\hat{\beta} = \arg\min_{\beta} \; \left\{-2\,\ell(\beta) + \mathrm{df}\,\log n + 2\gamma \log \binom{p}{\mathrm{df}}\right\},$$

where $\gamma \in [0, 1]$ is a pre-defined parameter, and $\binom{p}{\mathrm{df}}$ represents the number of ways to choose $\mathrm{df}$ variables from a total of $p$ variables.

(d) Minimax concave penalty (MCP) [13], where the problem is defined as:

$$\hat{\beta} = \arg\min_{\beta} \; \frac{1}{2n}\|y - X\beta\|_2^2 + \sum_{i=1}^{p} \rho(|\beta_i|; \lambda, a),$$

where $\rho$ is defined as:

$$\rho(t; \lambda, a) = \begin{cases} \lambda t - \dfrac{t^2}{2a}, & t \le a\lambda \\ \dfrac{a\lambda^2}{2}, & t > a\lambda \end{cases}$$

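The shape of the MCP is easy to verify numerically: it rises like the LASSO penalty $\lambda t$ near zero, then flattens to the constant $a\lambda^2/2$ once $t$ exceeds $a\lambda$, so large coefficients are left unshrunk. A minimal illustrative sketch (the paper itself used the R package "SIS" with MCP):

```python
def mcp(t, lam, a):
    """Minimax concave penalty rho(t; lambda, a) for t >= 0 (Zhang, 2010).

    Near zero it behaves like lam * t; beyond t = a * lam it is the
    constant a * lam**2 / 2, which removes bias on large coefficients.
    """
    if t <= a * lam:
        return lam * t - t * t / (2.0 * a)
    return a * lam * lam / 2.0
```

For example, with `lam = 1` and `a = 3`, the penalty at `t = 10` equals the plateau value 1.5, whereas the LASSO penalty would keep growing to 10.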

3 Experiments

3.1 Participants and measures

Nineteen children (13 males, 6 females, mean age = 5.87 years, s.d. = 1.09 years) with ASD participated in 16 weeks of PRT treatment. IQ was measured using the Wechsler Abbreviated Scale of Intelligence (WASI). All participants were high-functioning (full-scale IQ ≥ 70, mean IQ = 104.5, SD = 16.7). All participants met the diagnostic criteria for ASD determined by the results of the Autism Diagnostic Observation Schedule (ADOS) [14]. The regression targets ($y$) are the differences between pre- and post-treatment scores, including the ADOS calibrated severity score, ADOS social affect total score, ADOS restricted and repetitive behavior total score, and social responsiveness scale (SRS) total score [15].

3.2 Imaging acquisition and processing

Each child underwent pre-treatment and post-treatment scans on a Siemens MAGNETOM 3T Tim Trio scanner. A structural MRI series was acquired with a 12-channel head coil and a high-resolution T1-weighted MPRAGE sequence with the following imaging parameters: 176 slices, TR = 2530 ms, TE = 3.31 ms, flip angle = 7°, slice thickness = 1.0 mm, voxel size = , matrix = .

The structural MRI was processed with FreeSurfer [16] using the "recon-all" command with 31 stages of processing; the key steps include: (a) motion correction and conform, (b) non-uniform intensity normalization, (c) skull stripping and removal of the neck, (d) registration, (e) white matter segmentation, (f) cortical parcellation, and (g) cortical parcellation statistics. We used the freesurfer_stats2table_bash commands from the FreeSurfer official website to extract structural statistics. Four features for each region of interest (ROI) in the Destrieux atlas [17] were extracted, including volume, cortical thickness, surface area and mean curvature, resulting in a total of 592 features (4 features × 148 cortical ROIs).

3.3 Predictive modeling and method comparison

We predicted treatment outcomes (changes in ASD severity scores) from changes of structural information (post-treatment structural features minus pre-treatment features) extracted from FreeSurfer, and we included phenotype information such as age, gender and pre-treatment IQ in our model as confounding factors. We performed leave-one-out cross-validation (LOOCV) and measured cross-correlation and root mean squared error (RMSE) between prediction and ground-truth outcomes. We chose LOOCV instead of correlation analysis to validate the predictability of selected features and reduce the false discovery rate.

The analysis was conducted in R with default parameters unless stated otherwise. We used the "gaussian" family in the package "SIS", and set "penalty" to "MCP" and "tune" to "bic". Besides SIS, we applied other linear regression models for comparison, including: (a) elastic-net regression with nested cross-validation to select the parameters $\lambda$ (100 default values) and $\alpha$ (ranging from 0 to 1 with a step size of 0.1) with the package "glmnet", (b) support vector regression (SVR) [18] with the package "e1071", and (c) partial least squares regression (PLSR) [19] with nested cross-validation to select the number of components (ranging from 1 to the maximum number of components) with the package "pls".
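The evaluation loop can be sketched as follows (in Python rather than R, purely for illustration; `screen_then_ls` is a hypothetical stand-in model that mimics fold-internal feature screening followed by least squares, not the actual SIS fit):

```python
import numpy as np

def loocv_eval(X, y, fit_predict):
    """Leave-one-out CV: fit on n-1 samples, predict the held-out sample,
    then score predictions against ground truth with RMSE and correlation."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        preds[i] = fit_predict(X[mask], y[mask], X[i])
    rmse = float(np.sqrt(np.mean((preds - y) ** 2)))
    corr = float(np.corrcoef(preds, y)[0, 1])
    return preds, rmse, corr

def screen_then_ls(Xtr, ytr, xte, d=5):
    """Hypothetical model: screen the d most correlated features inside the
    training fold only (avoiding leakage), then fit ordinary least squares."""
    Xs = (Xtr - Xtr.mean(0)) / Xtr.std(0)
    top = np.argsort(np.abs(Xs.T @ (ytr - ytr.mean())))[-d:]
    A = np.c_[np.ones(len(ytr)), Xtr[:, top]]   # intercept + screened features
    beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return beta[0] + xte[top] @ beta[1:]

# Synthetic example with 30 subjects and 100 features.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 100))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * rng.standard_normal(30)
preds, rmse, corr = loocv_eval(X, y, screen_then_ls)
```

The key design point is that feature selection happens inside each training fold, so the held-out prediction never sees the test subject; this is what lets LOOCV control the false discovery rate better than a whole-sample correlation analysis.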

4 Results

Results of the different methods on predicting changes of SRS scores are shown in Fig. 1. Compared to the other methods, predictions from SIS lie nearest to the ideal line $y = x$. In the high-dimensional case, it is easy to select features that have a high correlation with responses on the training data but lack predictive ability on test data. Sometimes a model even produces a negative correlation, as shown in Fig. 1 for PLSR and SVR, because the maximum spurious correlation grows with dimensionality, while SIS produces predictive results in cross-validation.

Figure 1: Prediction of change of SRS score from different methods
Figure 2: Performance of different methods on different tasks. RMSE and correlation are reported. Correlations are multiplied by 10 for display purposes.
Task                                     Method        RMSE    Correlation
SRS                                      SIS           13.34    0.71
                                         PLS           18.06   -0.09
                                         Elastic-net   15.91    0.39
                                         SVR           17.92   -0.20
ADOS calibrated severity                 SIS            2.03    0.44
                                         PLS            2.10   -0.15
                                         Elastic-net    2.23   -0.36
                                         SVR            2.73   -0.64
ADOS social affect                       SIS            3.18    0.67
                                         PLS            3.85   -0.20
                                         Elastic-net    3.74    0.22
                                         SVR            4.48   -0.78
ADOS restricted and repetitive behavior  SIS            2.28    0.51
                                         PLS            2.12    0.20
                                         Elastic-net    2.40    0.11
                                         SVR            2.12    0.47
ROI                          Thick  Vol  Area  Curv
rh G                           0     1    1     0
rh S orbital                   0     1    1     0
lh G                           0     0    0     1
lh G temp                      0     0    0     1
lh S                           0     0    0     1
rh S cingul.                   0     0    0     1
lh G and S                     1     0    0     0
rh S oc.temp lat               1     0    0     0
lh G and S                     0     1    0     0
lh G and S                     0     1    0     0
lh S collat transv post        0     1    0     0
rh G Ins lg and S cent ins     0     1    0     0
rh G rectus                    0     1    0     0
Table 1: Upper table: quantitative results of the different methods. Lower table: features related to change of SRS selected by SIS using the a2009s atlas in FreeSurfer; 1 (0) represents selected (unselected).

Performance of the different methods on the various tasks is summarized in Fig. 2 and the upper part of Table 1. RMSE and correlation between prediction and ground-truth outcomes are reported; correlation is multiplied by 10 in Fig. 2 for display purposes. Compared with the other methods, SIS produces the highest correlation in all tasks, and it produces a correlation above 0.4 in the ADOS calibrated severity task, while all other methods generate negative correlations because the wrong features are selected. SIS generates the lowest RMSE in the prediction of changes in SRS, ADOS calibrated severity and ADOS social affect scores. For the ADOS restricted and repetitive behavior task, the four methods produce very similar RMSE, while SIS generates the highest correlation.

Features selected by SIS on the SRS task are reported in the lower part of Table 1. Our findings match previous literature: group differences between ASD and control subjects have been found in the anterior subcallosal gyrus [20], precuneus [21], cingulate gyrus, paracentral lobule, superior frontal gyrus and paracentral gyrus [3]. The selected features are not only correlated with treatment outcomes but also predictive of them.

5 Discussion and conclusion

High dimensionality is a major difficulty in many medical image analysis problems. We introduce SIS for general linear models in the ultra-high dimensional case, and validate its superior performance over traditional methods on different tasks. SIS selects structural changes in the brain that are predictive of treatment outcomes, which is useful for understanding the effect of behavioral treatments on the autism brain.


  • [1] Catherine Lord et al., “Autism spectrum disorders,” Neuron, 2000.
  • [2] BF Sparks et al., “Brain structural abnormalities in young children with autism spectrum disorder,” Neurology, 2002.
  • [3] Alessandra M Pereira et al., “Differences in cortical structure and functional MRI connectivity in high functioning autism,” Frontiers in Neurology, 2018.
  • [4] Robert L Koegel and Lynn Kern Koegel, Pivotal Response Treatments for Autism: Communication, Social, & Academic Development, 2006.
  • [5] Jianqing Fan and Jinchi Lv, “Sure independence screening for ultrahigh dimensional feature space,” J. Royal Stat. Soc., 2008.
  • [6] Hui Zou and Trevor Hastie, “Regularization and variable selection via the elastic net,” J. Royal Stat. Soc., 2005.
  • [7] Juntang Zhuang et al., “Prediction of severity and treatment outcome for ASD from fMRI,” in PRIME workshop, MICCAI 2018, 2018.
  • [8] Juntang Zhuang et al., “Prediction of pivotal response treatment outcome with task fMRI using random forest and variable selection,” in ISBI 2018, 2018.
  • [9] Hui Zou, “The adaptive lasso and its oracle properties,” JASA, 2006.
  • [10] Hamparsum Bozdogan, “Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions,” Psychometrika, 1987.
  • [11] Scott Chen et al., “Speaker, environment and channel change detection and clustering via the Bayesian information criterion,” 1998.
  • [12] Jiahua Chen and Zehua Chen, “Extended Bayesian information criteria for model selection with large model spaces,” Biometrika, 2008.
  • [13] Cun-Hui Zhang, “Nearly unbiased variable selection under minimax concave penalty,” The Annals of Statistics, 2010.
  • [14] Catherine Lord et al., “Autism diagnostic observation schedule: A standardized observation of communicative and social behavior,” J Autism Dev Disord, 1989.
  • [15] John N Constantino and Christian P Gruber, Social Responsiveness Scale (SRS), 2012.
  • [16] Bruce Fischl, “FreeSurfer,” NeuroImage, 2012.
  • [17] Christophe Destrieux et al., “Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature,” NeuroImage, 2010.
  • [18] Debasish Basak et al., “Support vector regression,” Neural Information Processing - Letters and Reviews, 2007.
  • [19] Paul Geladi and Bruce R Kowalski, “Partial least-squares regression: A tutorial,” Analytica Chimica Acta, 1986.
  • [20] Jyri-Johan Paakki et al., “Alterations in regional homogeneity of resting-state brain activity in autism spectrum disorders,” Brain Research, 2010.
  • [21] Joëlle Martineau, Frédéric Andersson, et al., “Atypical activation of the mirror neuron system during perception of hand motion in autism,” Brain Research, 2010.