1 Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication and by repetitive behaviors [1]. Structural brain changes have been identified in patients with ASD: Sparks et al. found that children with ASD have significantly increased cerebral volumes compared with typically developing (TD) and developmentally delayed (DD) children [2], and Pereira et al. found changes in the cingulate gyrus, paracentral lobule, and superior frontal gyrus in subjects with ASD [3].
Behavior-based treatments are widely used for ASD, and Pivotal Response Treatment (PRT) is an empirically supported therapy [4] that addresses core deficits in social communication skills. Although structural changes in the brain have been studied in ASD, there has been limited progress in identifying the effect of PRT on brain structure. Predictive models of treatment outcome based on brain structural changes are therefore essential for understanding the mechanism of ASD. Furthermore, accurate predictive models are more robust than purely analytical models, which tend to overfit small datasets.
The difficulty of building large databases and the high dimensionality of medical images are the main challenges. We introduce sure independence screening (SIS) [5], a feature selection method for ultra-high dimensional generalized linear models. Although screening methods are widely used in genetics research, the neuroscience community tends to use simpler linear models such as elastic-net [6]. However, these simpler models cannot deal with ultra-high dimensional problems, while more exhaustive stepwise feature selection methods are computationally expensive. In this paper, we demonstrate the superior performance of SIS over other models in predicting changes in severity scores from structural features of the brain, and we analyze the selected features as biomarkers for ASD.
2 Methods
2.1 Difficulties for high-dimensional problems
The high-dimensional problem refers to problems where the dimension $p$ is larger than the sample size $n$. High dimensionality causes the following problems: (a) The design matrix $X$ has more columns than rows, which causes the matrix $X^T X$ to be singular. Linear regression with no constraint then admits an infinite number of solutions that fit the training data, and it is hard to determine which model is correct and generalizes to test data. (b) In a high-dimensional setting, an unimportant variable can have a high sample correlation with the response variable or with predictive variables, which adds to the difficulty of variable selection. (c) The high dimension makes stepwise feature selection methods computationally infeasible.

There are mainly three types of feature selection methods: wrapper methods, embedded methods, and filter methods. Wrapper methods such as forward selection, backward elimination, and recursive feature elimination carry a huge computational burden because each candidate subset of features has to be tested. Embedded methods with built-in feature selection, such as LASSO and elastic-net regression [6], usually cannot deal with ultra-high dimensional problems. Filter methods pre-select variables based on some importance measure before learning; however, this preprocessing step usually cannot accurately select predictive features. Zhuang et al. proposed a two-level feature selection approach specifically for brain image analysis [7]; however, the region-level feature selection in their method depends on the assumption of local smoothness of the input image, and is therefore suited for voxel-wise features but not ROI-wise features. Zhuang et al. also proposed a nonlinear feature selection method based on random forests [8]; however, their method is well suited for nonlinear models but not linear models. Therefore, we introduce SIS [5], a fast and accurate feature selection method for ultra-high dimensional generalized linear models.

2.2 Sure independence screening
Sure screening refers to the property that all important variables survive a variable screening procedure with probability tending to 1 as the sample size increases (Theorem 3 in [5]). This asymptotic property gives a theoretical guarantee on the performance of SIS.

Suppose the design matrix $X$ is centered and normalized to unit variance in each column, and the dimension of $X$ is $n \times p$, where $n$ is the number of observations and $p$ is the number of variables. Denote the response variable as $y$, a vector of length $n$. Denote the set of true predictive variables as $\mathcal{M}_* = \{1 \le i \le p : \beta_i \ne 0\}$, where the true model is $y = X\beta + \varepsilon$. Denote the non-sparsity rate as $s = |\mathcal{M}_*|$. A componentwise regression is defined as:

$$\omega = X^T y \qquad (1)$$

Note that each column of $X$ is already normalized, so the $i$th component of $\omega$, denoted $\omega_i$, is proportional to the linear correlation between the $i$th predictor and $y$. For a given parameter $\gamma \in (0, 1)$, the $d = \lfloor \gamma n \rfloor$ largest components of $|\omega|$ are selected. The subset of preserved features is defined as:

$$\mathcal{M}_\gamma = \{1 \le i \le p : |\omega_i| \text{ is among the } d \text{ largest entries of } |\omega|\} \qquad (2)$$

The full algorithm is summarized in Algorithm 1. The iterative SIS algorithm performs feature selection recursively until a sparsity criterion is satisfied.
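The screening step in Eqs. (1) and (2) amounts to ranking features by their marginal correlation with the response. Below is a minimal NumPy sketch on synthetic data, intended only as an illustration of the componentwise-regression idea, not the R "SIS" package used in the experiments:

```python
import numpy as np

def sis_screen(X, y, d):
    """Componentwise regression omega = X^T y on a column-standardized
    design matrix; keep the d features with the largest |omega_i|."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    omega = Xc.T @ (y - y.mean())        # proportional to marginal correlations
    return np.sort(np.argsort(np.abs(omega))[::-1][:d])

# Synthetic p >> n example: only features 0 and 1 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))
y = 5 * X[:, 0] - 4 * X[:, 1] + 0.5 * rng.normal(size=100)
selected = sis_screen(X, y, d=10)        # the true features survive the screen
```

Because the screen only computes one matrix-vector product, its cost is linear in $p$, which is what makes SIS feasible when $p \gg n$.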
2.3 Penalized maximum likelihood for generalized linear models
Within each iteration of Algorithm 1, the algorithm fits a generalized linear model by penalized maximum likelihood. Different sparsity penalties can be applied depending on the available computational capacity, and the resulting problem can be solved efficiently with standard algorithms such as coordinate descent.
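For intuition, a coordinate descent solver for the LASSO ($\ell_1$) case can be written in a few lines. This is a schematic, unaccelerated NumPy sketch of the soft-thresholding update, not the solver used in the R packages in our experiments:

```python
import numpy as np

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()            # residual for b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]           # remove feature j's contribution
            rho = X[:, j] @ r / n         # marginal fit of feature j
            b[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
            r -= X[:, j] * b[j]
    return b
```

Each coordinate update costs $O(n)$, so one full sweep costs $O(np)$; after screening has reduced $p$ to a modest $d$, this is cheap.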
We briefly introduce several types of sparsity penalty. The $\ell_1$ penalty is widely used for feature selection and reduces the problem to LASSO; its variant, adaptive LASSO [9], introduces the sparsity penalty term

$$P(\beta) = \lambda \sum_{j=1}^{p} w_j |\beta_j| \qquad (3)$$

where $w_j$ is the weight for the $j$th component, and $\lambda$ is a predefined penalty hyperparameter.
Other model selection criteria can be applied:
(a) Akaike information criterion (AIC) [10], where the parameters are chosen as

$$\hat{\beta} = \arg\min_{\beta} \left\{ -2 \log L(\beta) + 2\, \mathrm{df} \right\} \qquad (4)$$

where $\mathrm{df}$ is the degree of freedom of the fitted model, $L$ is the likelihood, and $\beta$ is the parameter to be estimated.
(b) Bayesian information criterion (BIC) [11], where the parameters are chosen as

$$\hat{\beta} = \arg\min_{\beta} \left\{ -2 \log L(\beta) + \log(n)\, \mathrm{df} \right\} \qquad (5)$$
(c) Extended Bayesian information criterion (EBIC) [12], where the model is determined as

$$\hat{\beta} = \arg\min_{\beta} \left\{ -2 \log L(\beta) + \log(n)\, \mathrm{df} + 2\gamma \log \binom{p}{\mathrm{df}} \right\} \qquad (6)$$

where $\gamma \in [0, 1]$ is a predefined parameter, and $\binom{p}{\mathrm{df}}$ represents the number of ways to choose $\mathrm{df}$ variables from a total of $p$ variables.
(d) Minimax concave penalty (MCP) [13], where the problem is defined as

$$\hat{\beta} = \arg\min_{\beta} \left\{ \frac{1}{2n} \| y - X\beta \|_2^2 + \sum_{j=1}^{p} \rho(|\beta_j|; \lambda, \gamma) \right\} \qquad (7)$$

where $\rho$ is defined as

$$\rho(t; \lambda, \gamma) = \lambda \int_0^t \left( 1 - \frac{x}{\gamma\lambda} \right)_+ dx \qquad (8)$$
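The criteria (4)-(6) and the MCP penalty (8) all have simple closed forms. The sketch below evaluates them for the Gaussian case on synthetic numbers; the helper names are illustrative and not taken from any package:

```python
import numpy as np
from math import lgamma, log

def criteria(rss, n, df, p, gamma=0.5):
    """AIC / BIC / EBIC for a Gaussian model with residual sum of squares
    rss and df selected variables out of p; -2 log-likelihood equals
    n * log(rss / n) up to an additive constant."""
    nll2 = n * log(rss / n)
    # log of the binomial coefficient C(p, df)
    log_binom = lgamma(p + 1) - lgamma(df + 1) - lgamma(p - df + 1)
    aic = nll2 + 2 * df
    bic = nll2 + log(n) * df
    ebic = bic + 2 * gamma * log_binom   # extra combinatorial term of EBIC
    return aic, bic, ebic

def mcp(t, lam, gam):
    """Closed form of rho(t; lam, gam): quadratic up to t = gam * lam,
    then constant at gam * lam^2 / 2, so large coefficients are not shrunk."""
    t = np.abs(t)
    return np.where(t <= gam * lam, lam * t - t**2 / (2 * gam), gam * lam**2 / 2)
```

Near zero, MCP has slope $\lambda$ like the $\ell_1$ penalty; beyond $\gamma\lambda$ it is flat, which reduces the bias that LASSO introduces on large coefficients. The combinatorial term of EBIC grows with $\log \binom{p}{\mathrm{df}}$ and therefore matters exactly in the $p \gg n$ regime considered here.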
3 Experiments
3.1 Participants and measures
Nineteen children (13 males, 6 females; mean age = 5.87 years, SD = 1.09 years) with ASD participated in 16 weeks of PRT treatment. IQ was measured using the Wechsler Abbreviated Scale of Intelligence (WASI). All participants were high-functioning in terms of full-scale IQ (IQ > 70; mean IQ = 104.5, SD = 16.7). All participants met the diagnostic criteria for ASD as determined by the Autism Diagnostic Observation Schedule (ADOS) [14]. The regression targets ($y$) are the differences between pre- and post-treatment scores, including the ADOS calibrated severity score, ADOS social affect total score, ADOS restricted and repetitive behavior total score, and the Social Responsiveness Scale (SRS) total score [15].
3.2 Imaging acquisition and processing
Each child underwent pre-treatment and post-treatment scans on a Siemens MAGNETOM 3T Tim Trio scanner. A structural MRI series was acquired with a 12-channel head coil and a high-resolution T1-weighted MPRAGE sequence with the following imaging parameters: 176 slices, TR = 2530 ms, TE = 3.31 ms, flip angle = 7°, slice thickness = 1.0 mm, voxel size = , matrix = .
The structural MRI was processed with FreeSurfer [16] using the "recon-all" command with 31 stages of processing; the key steps include (a) motion correction and conforming, (b) non-uniform intensity normalization, (c) skull stripping and neck removal, (d) registration, (e) white matter segmentation, (f) cortical parcellation, and (g) cortical parcellation statistics. We used the freesurfer_stats2table_bash commands from the official FreeSurfer website to extract structural statistics. Four features were extracted for each region of interest (ROI) in the Destrieux atlas [17]: volume, cortical thickness, surface area, and mean curvature, resulting in a total of 592 features (4 features × 148 cortical ROIs).
3.3 Predictive modeling and method comparison
We predicted treatment outcomes (changes in ASD severity scores) from changes in the structural features extracted by FreeSurfer (post-treatment features minus pre-treatment features), and we included phenotype information such as age, gender, and pre-treatment IQ in the model as confounding factors. We performed leave-one-out cross-validation (LOOCV) and measured the correlation and root mean squared error (RMSE) between predicted and ground-truth outcomes. We chose LOOCV instead of correlation analysis to validate the predictability of the selected features and to reduce the false discovery rate.
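One subtlety of this design is that feature screening must be re-run inside every LOOCV fold, so that the held-out subject never influences feature selection. The loop can be sketched schematically in Python on synthetic data (ordinary least squares stands in for the penalized fit here; this is an illustration, not the R pipeline described below):

```python
import numpy as np

def loocv_screen_and_fit(X, y, d):
    """Leave-one-out CV where marginal screening (top-d |X^T y|) is
    repeated inside every fold to avoid selection leakage."""
    n = X.shape[0]
    preds = np.empty(n)
    for i in range(n):
        tr = np.delete(np.arange(n), i)
        Xtr, ytr = X[tr], y[tr]
        # screen on the training fold only
        Xc = (Xtr - Xtr.mean(0)) / Xtr.std(0)
        omega = Xc.T @ (ytr - ytr.mean())
        keep = np.argsort(np.abs(omega))[::-1][:d]
        # ordinary least squares on the d screened features
        A = np.column_stack([np.ones(n - 1), Xtr[:, keep]])
        coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
        preds[i] = np.concatenate([[1.0], X[i, keep]]) @ coef
    rmse = np.sqrt(np.mean((preds - y) ** 2))
    corr = np.corrcoef(preds, y)[0, 1]
    return rmse, corr
```

Screening on the full dataset before cross-validation would leak the held-out response into feature selection and inflate the apparent correlation, which is exactly the spurious-correlation pitfall discussed in Section 4.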
The analysis was conducted in R with default parameters unless stated otherwise. We used the "gaussian" family in the package "SIS", with "penalty" set to "MCP" and "tune" set to "bic". Besides SIS, we applied other regression models for comparison: (a) elastic-net regression with nested cross-validation to select the parameters $\lambda$ (100 default values) and $\alpha$ (ranging from 0 to 1 with a step size of 0.1), using the package "glmnet"; (b) support vector regression (SVR) [18] with the package "e1071"; and (c) partial least squares regression (PLSR) [19] with nested cross-validation to select the number of components (ranging from 1 to the maximum number of components), using the package "pls".
4 Results
Results of the different methods for predicting changes in SRS scores are shown in Fig. 1. Compared with the other methods, predictions from SIS lie nearest to the ideal line $y = \hat{y}$. In the high-dimensional case, it is easy to select features that have a high correlation with the responses in the training data but lack predictive ability on test data. Sometimes a model even produces a negative correlation, as shown in Fig. 1 for PLSR and SVR, because the maximum spurious correlation grows with dimensionality; SIS, in contrast, produces predictive results in cross-validation.
Table 1 (upper part). RMSE and correlation between predicted and ground-truth outcomes for each task and method (RMSE / Correlation):

Task                                      SIS            PLS            Elastic-net    SVR
SRS                                       13.34 / 0.71   18.06 / -0.09  15.91 / 0.39   17.92 / -0.20
ADOS calibrated severity                   2.03 / 0.44    2.10 / -0.15   2.23 / -0.36   2.73 / -0.64
ADOS social affect                         3.18 / 0.67    3.85 /  0.20   3.74 /  0.22   4.48 / -0.78
ADOS restricted and repetitive behavior    2.28 / 0.51    2.12 /  0.20   2.40 /  0.11   2.12 /  0.47
Table 1 (lower part). Features selected by SIS on the SRS task, coded as binary indicators of whether the cortical thickness (Thick), volume (Vol), surface area (Area), or mean curvature (Curv) of each ROI was selected. [ROI column headers unrecoverable from the source except "rh G rectus".]

Thick   0  0  0  0  0  0  1  1  0  0  0  0  0
Vol     1  1  0  0  0  0  0  0  1  1  1  1  1
Area    1  1  0  0  0  0  0  0  0  0  0  0  0
Curv    0  0  1  1  1  1  0  0  0  0  0  0  0
The performance of the different methods on the various tasks is summarized in Fig. 2 and the upper part of Table 1, which report the RMSE and the correlation between predicted and ground-truth outcomes (in Fig. 2, correlation is multiplied by 10 for display purposes). Compared with the other methods, SIS produces the highest correlation on all tasks, and it achieves a correlation above 0.4 on the ADOS calibrated severity task, where all other methods generate negative correlations because the wrong features are selected. SIS also yields the lowest RMSE when predicting changes in the SRS, ADOS calibrated severity, and ADOS social affect scores. On the ADOS restricted and repetitive behavior task, the four methods produce very similar RMSE, while SIS generates the highest correlation.
The features selected by SIS on the SRS task are reported in the lower part of Table 1. Our findings match the previous literature: group differences between ASD and control subjects have been found in the anterior subcallosal gyrus [20], precuneus [21], cingulate gyrus, paracentral lobule, superior frontal gyrus, and paracentral gyrus [3]. The selected features are not only correlated with treatment outcomes but are also predictive of them.
5 Discussion and conclusion
High dimensionality is a major difficulty in many medical image analysis problems. We introduced SIS for generalized linear models in the ultra-high dimensional case and validated its superior performance over traditional methods on different tasks. SIS selects structural changes in the brain that are predictive of treatment outcomes, which is useful for understanding the effect of behavioral treatments on the autistic brain.
References
 [1] Catherine Lord et al., “Autism spectrum disorders,” Neuron, 2000.
 [2] BF Sparks et al., “Brain structural abnormalities in young children with autism spectrum disorder,” Neurology, 2002.
 [3] Alessandra M Pereira et al., “Differences in cortical structure and functional MRI connectivity in high functioning autism,” Frontiers in Neurology, 2018.
 [4] Robert L Koegel and Lynn Kern Koegel, Pivotal response treatments for autism: Communication, social, & academic development., 2006.
 [5] Jianqing Fan and Jinchi Lv, “Sure independence screening for ultrahigh dimensional feature space,” J. Royal Stat. Soc, 2008.
 [6] Hui Zou and Trevor Hastie, “Regularization and variable selection via the elastic net,” J. Royal Stat. Soc, 2005.
 [7] Juntang Zhuang et al., “Prediction of severity and treatment outcome for ASD from fMRI,” in PRIME workshop, MICCAI 2018, 2018.
 [8] Juntang Zhuang et al., “Prediction of pivotal response treatment outcome with task fMRI using random forest and variable selection,” in ISBI 2018, 2018.
 [9] Hui Zou, “The adaptive lasso and its oracle properties,” JASA, 2006.
 [10] Hamparsum Bozdogan, “Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions,” Psychometrika, 1987.
 [11] Scott Chen et al., “Speaker, environment and channel change detection and clustering via the Bayesian information criterion,” 1998.
 [12] Jiahua Chen and Zehua Chen, “Extended Bayesian information criteria for model selection with large model spaces,” Biometrika, 2008.
 [13] CunHui Zhang et al., “Nearly unbiased variable selection under minimax concave penalty,” The Annals of statistics, 2010.
 [14] Catherine Lord et al., “Autism diagnostic observation schedule: a standardized observation of communicative and social behavior.,” J Autism Dev Disord, 1989.
 [15] John N Constantino and Christian P Gruber, Social responsiveness scale (SRS), 2012.
 [16] Bruce Fischl, “Freesurfer,” Neuroimage, 2012.
 [17] Christophe Destrieux et al., “Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature,” Neuroimage, 2010.
 [18] Debasish Basak et al., “Support vector regression,” Neural Information Processing - Letters and Reviews, 2007.
 [19] Paul Geladi and Bruce R Kowalski, “Partial least-squares regression: a tutorial,” Analytica Chimica Acta, 1986.
 [20] Jyri-Johan Paakki et al., “Alterations in regional homogeneity of resting-state brain activity in autism spectrum disorders,” Brain Research, 2010.
 [21] Joëlle Martineau, Frédéric Andersson, et al., “Atypical activation of the mirror neuron system during perception of hand motion in autism,” Brain research, 2010.