Deep Learning for Neuroimaging-based Diagnosis and Rehabilitation of Autism Spectrum Disorder: A Review

07/02/2020 ∙ by Marjane Khodatars, et al. ∙ Synacor, Inc. 9

Accurate diagnosis of Autism Spectrum Disorder (ASD) is essential for management and rehabilitation. Non-invasive neuroimaging techniques provide disease markers that may be leveraged to aid ASD diagnosis. Structural and functional neuroimaging techniques give physicians substantial information about the structure (anatomy and structural communication) and function (activity and functional communication) of the brain. Because of the intricate structure and function of the brain, diagnosing ASD from neuroimaging data without artificial intelligence (AI) techniques is extremely challenging. AI techniques comprise traditional machine learning (ML) approaches and deep learning (DL) techniques. Conventional ML methods employ separate feature extraction and classification techniques, whereas in DL feature extraction and classification are accomplished jointly within the model. In this paper, studies that use DL networks to distinguish ASD are reviewed. DL-based rehabilitation tools that support ASD patients are also assessed. Finally, we present important challenges in the automated detection and rehabilitation of ASD.




I Introduction

ASD is a disorder of the nervous system that affects the brain and results in difficulties in speech, social interaction, and communication, as well as repetitive behaviors and delays in motor abilities [one]. The disorder can generally be identified with existing diagnostic protocols from the age of three onwards. Autism affects many parts of the brain and also involves genetic influences via gene interactions or polymorphisms [two, three]. One in 70 children worldwide is affected by autism. In 2018, the prevalence of ASD in the United States was estimated at 168 per 10,000 children, one of the higher prevalence rates worldwide. Autism is significantly more common in boys than in girls: in the United States, about 3.63 percent of boys aged 3 to 17 have autism spectrum disorder, compared with approximately 1.25 percent of girls.


Diagnosing ASD is difficult because there is no pathophysiological marker; diagnosis relies instead on psychological criteria [five]. Psychological tools can identify individual behaviors and levels of social interaction, facilitating early diagnosis. Behavioral evaluations draw on various instruments and questionnaires that help the physician specify the particular type of delay in a child’s development, including clinical observations, medical history, autism diagnosis instructions, and growth and intelligence tests [six].

Several investigations for the diagnosis of ASD have recently been conducted on neuro-imaging data (structural and functional).

Analyzing anatomical connections of brain areas with structural neuroimaging is an essential tool for studying structural disorders of the brain in ASD. The principal tools for structural brain imaging are magnetic resonance imaging (MRI) techniques [seven, a8, a9]. Cerebral anatomy is captured by structural MRI (sMRI) images, and anatomical connections are assessed by diffusion tensor imaging MRI (DTI-MRI) [a10].

Investigating the activity and functional connections of brain areas with functional neuroimaging can also be used for studying ASD. Functional diagnostic tools are older approaches than the two structural methods above. The most basic modality of functional neuroimaging is electroencephalography (EEG), which records the electrical activity of the brain from the surface of the head with high temporal resolution (on the order of milliseconds) [a11]. Studies employing EEG signals in ASD have been useful [a12, a13, a14]. Functional MRI (fMRI) is one of the more promising imaging modalities for functional brain disorders and is used either as task-based fMRI (T-fMRI) or as resting-state fMRI (rs-fMRI) [a15, a16]. fMRI-based techniques have high spatial resolution (on the order of millimeters) but low temporal resolution, owing to the slow hemodynamic response of the brain and to imaging time constraints, and are therefore not ideal for recording the fast dynamics of brain activity.

In addition, these procedures are highly sensitive to motion artifacts. According to published studies, three less prevalent modalities, electrocorticography (ECoG) [a17], functional near-infrared spectroscopy (fNIRS) [a18], and magnetoencephalography (MEG) [a19], can also attain reasonable performance in ASD. An appropriate approach is to apply machine learning techniques to functional and structural data in order to assist physicians in accurately assessing ASD. In the field of ASD, machine learning methods generally fall into two categories: traditional methods [a20] and DL methods [a21]. Compared with traditional methods, much less work has been done on using DL methods to explore ASD or to design rehabilitation tools.

This study reviews ASD assessment methods and patient rehabilitation with DL networks. The outline of this paper is as follows. Section 2 describes the search strategy. Section 3 concisely presents the DL networks employed in the field of ASD. In section 4, existing computer-aided diagnosis systems (CADS) that use brain functional and structural data are reviewed. In section 5, DL-based rehabilitation tools that support ASD patients are introduced. Section 6 discusses the reviewed papers. Section 7 presents the challenges of ASD diagnosis and rehabilitation with DL. Finally, the paper concludes and suggests future work in section 8.

II Search Strategy

In this review, IEEE Xplore, ScienceDirect, SpringerLink, ACM, and other conferences and journals were searched to acquire papers on ASD diagnosis using DL methods. The keywords "ASD", "Autism Spectrum Disorder", and "Deep Learning" were used to select the papers. Papers published up to June 3, 2020 were analyzed by the authors (AK, SN). Figure 1 depicts the number of papers per year that use DL methods for the automated detection of ASD.

Fig. 1: Number of papers published every year.

III Deep Learning Techniques for ASD Diagnosis and Rehabilitation

Nowadays, DL algorithms are used in many areas of medicine, including structural and functional neuroimaging. Applications of DL in neuroimaging range from brain MR image segmentation [a22] and detection of brain lesions such as tumors [a23] to diagnosis of functional brain disorders such as ASD [a65] and generation of artificial structural or functional brain images [a24]. Machine learning techniques fall into three fundamental categories of learning: supervised learning, unsupervised learning, and reinforcement learning [a27], and a variety of DL networks exist for each type. So far, most studies that identify ASD using DL have been based on supervised or unsupervised approaches. Figure 2 illustrates the families of supervised and unsupervised DL networks commonly employed to study ASD.

Fig. 2: Illustration of various types of deep learning methods.

IV CADS-based deep learning techniques for ASD diagnosis by neuroimaging data

A traditional artificial intelligence (AI)-based CADS encompasses several stages: data acquisition, data pre-processing, feature extraction, and classification [a28, a29, a30, a146]. In the investigations [a31, a32, a33], existing traditional algorithms have been used for diagnosing ASD. In DL-based CADS, however, feature extraction and classification are performed jointly within the model. Also, because of the structure of DL networks, large datasets are needed to train them and to recognize intricate patterns in the data. The components of a DL-based CADS for ASD detection are shown in Figure 3. As the figure shows, large, freely available databases for ASD diagnosis are introduced first. In the second step, various pre-processing techniques are applied to the functional and structural data to be analyzed. Finally, the DL networks are applied to the preprocessed data.

Fig. 3: Block diagram of CAD system using DL architecture for ASD detection.

IV-A Neuroimaging ASD Datasets

Datasets are the input to the development of CADS, and the power of a CADS depends primarily on the richness of the input data. Various functional and structural brain datasets are available for diagnosing ASD. The most complete free dataset is ABIDE [a34], which has two subsets, ABIDE-I and ABIDE-II, and encompasses sMRI, rs-fMRI, and phenotypic data. ABIDE-I involves data from 17 international sites, yielding a total of 1112 datasets, including 539 from individuals with ASD and 573 from healthy individuals (ages 7-64). These data are anonymized in accordance with HIPAA guidelines and 1000 FCP / INDI protocols. ABIDE-II contains data from 19 international sites, with a total of 1114 datasets from 521 individuals with ASD and 593 healthy individuals (ages 5-64). In addition, preprocessed images of the ABIDE-I series, called PCP [a35], can be freely downloaded by researchers. The second, more recently released ASD diagnostic database is NDAR, which comprises various modalities; more information is provided in [a36].

IV-B Preprocessing Techniques

Neuroimaging data (especially functional data) have a relatively complicated structure, and if they are not pre-processed properly, the final diagnosis may be affected. Preprocessing of these data typically entails multiple common steps performed by standard software packages. Occasionally, prepared pipelines are applied to a dataset to yield pre-processed data for future research. In the following section, the preprocessing steps for fMRI data are briefly explained.

IV-B1 Standard (Low-level) fMRI preprocessing steps

Low-level pre-processing of fMRI images normally consists of a fixed number of steps applied to the data, and prepared toolboxes are usually used to reduce execution time and yield better accuracy. Reputable toolboxes include the FMRIB Software Library (FSL) [a37], BET [a38], FreeSurfer [a39], and SPM [a40]. The most important fMRI preprocessing steps are brain extraction, spatial smoothing, temporal filtering, motion correction, slice timing correction, intensity normalization, and registration to a standard atlas, which are summarized below.

Brain extraction: the goal is to remove the skull and cerebellum from the fMRI image and maintain the brain tissue [a41, a42, a43].

Spatial smoothing: involves averaging the signals of adjacent voxels. This step is justified because neighboring brain voxels are usually closely related in function and blood supply [a41, a42, a43].

Temporal filtering: the aim is to eliminate unwanted components from the time series of voxels without impairing the signal of interest [a41, a42, a43].

Realignment (Motion Correction): During the fMRI scan, people often move their heads. The objective of motion correction is to align all images to a reference image so that the coordinates and orientation of the voxels are identical in all fMRI volumetric images [a41, a42, a43].

Slice Timing Correction: The purpose of modifying the slice time is to adjust the time series of the voxels so that all the voxels in each fMRI volume image have a common reference time. Usually, the corresponding time of the first slice recording in each fMRI volume image is selected as the reference time [a41, a42, a43].

Intensity Normalization: at this stage, the average intensity of the fMRI signal is rescaled to compensate for global deviations within and between recording sessions [a41, a42, a43].

Registration to a standard atlas: The human brain comprises hundreds of cortical and subcortical areas with varying structures and functions, and studying each of them individually is time-consuming and complex. To overcome this problem, brain atlases are employed to partition brain images into a limited number of ROIs, after which the mean time series of each ROI can be extracted [a44]. ABIDE datasets use a variety of atlases, including Automated Anatomical Labeling (AAL) [a45], Eickhoff-Zilles (EZ) [a46], Harvard-Oxford (HO) [a47], Talairach and Tournoux (TT) [a48], Dosenbach 160 [a49], Craddock 200 (CC200) [a50], and Craddock 400 (CC400) [a51]; more information is provided in [a52]. Table I provides complete information on preprocessing tools, atlases, and a few other preprocessing details.
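
The atlas step described above, partitioning the brain into ROIs and extracting the mean time series of each ROI, can be sketched in a few lines. This is only an illustrative sketch with NumPy on random toy data; the function name roi_mean_time_series, the array shapes, and the two-region atlas are assumptions made for the example and do not come from ABIDE or any toolbox named in the text.

```python
import numpy as np

def roi_mean_time_series(fmri, atlas):
    """Average the voxel time series inside each atlas region.

    fmri  : array of shape (X, Y, Z, T), a preprocessed 4D fMRI volume
    atlas : integer array of shape (X, Y, Z); 0 = background, 1..R = ROI labels
    Returns an array of shape (R, T), one mean time series per ROI.
    """
    labels = np.unique(atlas)
    labels = labels[labels != 0]                 # drop the background label
    voxels = fmri.reshape(-1, fmri.shape[-1])    # (X*Y*Z, T)
    flat_atlas = atlas.ravel()                   # same voxel order as `voxels`
    return np.stack([voxels[flat_atlas == lab].mean(axis=0) for lab in labels])

# Toy data (hypothetical): a 4x4x4 volume with 10 time points and two ROIs.
rng = np.random.default_rng(0)
fmri = rng.normal(size=(4, 4, 4, 10))
atlas = np.zeros((4, 4, 4), dtype=int)
atlas[:2] = 1      # ROI 1: first half along the x axis
atlas[2:] = 2      # ROI 2: second half along the x axis
roi_ts = roi_mean_time_series(fmri, atlas)       # shape (2, 10)
```

With a real atlas such as AAL, the label image would first have to be resampled to the same grid as the fMRI volume; that registration step is outside the scope of this sketch.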

Fig. 4: Overall block diagram of 2D-CNN used for ASD detection.
Fig. 5: Overall block diagram of a 3D-CNN used for ASD detection.

IV-B2 Pipeline Methods

Pipelines provide preprocessed images of the ABIDE databases. They comprise generic pre-processing procedures, so that different methods can be compared with each other on the same preprocessed data. In the ABIDE datasets, pre-processing is performed by four pipelines: the neuroimaging analysis kit (NIAK) [a53], the data processing assistant for rs-fMRI (DPARSF) [a54], the configurable pipeline for the analysis of connectomes (CPAC) [a55], and the connectome computation system (CCS) [a56]. The preprocessing steps carried out by the various pipelines are broadly similar. The chief differences lie in the particular algorithms used for each step, the software implementations, and the parameters applied. Details of each pipeline are provided in [a52]. Table I lists the pipelines used in DL-based autism detection studies.

IV-B3 High-level preprocessing Steps

High-level techniques for pre-processing brain data are important, and using them alongside the standard pre-processing methods can enhance the accuracy of ASD recognition. These methods are applied after the standard pre-processing of functional and structural brain data. They include the sliding window (SW) [a65], data augmentation (DA) [a68], the functional connectivity matrix (FCM) [a92, a93], and the fast Fourier transform (FFT) [a78]. Furthermore, some studies have used feature extraction [a106] techniques and feature selection methods. Detailed information on the reviewed studies is given in Table I.
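
Two of the high-level steps named above, the sliding window and the functional connectivity matrix, can be illustrated together. The sketch below assumes ROI time series of shape (R, T) as input, uses Pearson correlation for the FCM (as several studies in Table I do), and keeps only the upper triangle of the symmetric matrix as a feature vector of length R(R-1)/2. The window width, stride, and data are hypothetical.

```python
import numpy as np

def sliding_windows(ts, width, stride):
    """Split ROI time series of shape (R, T) into windows of shape (N, R, width)."""
    starts = range(0, ts.shape[1] - width + 1, stride)
    return np.stack([ts[:, s:s + width] for s in starts])

def fcm_vector(ts):
    """Pearson correlation between ROI time series, flattened to the upper triangle."""
    fcm = np.corrcoef(ts)                    # (R, R) symmetric correlation matrix
    iu = np.triu_indices_from(fcm, k=1)      # indices strictly above the diagonal
    return fcm[iu]                           # length R * (R - 1) / 2

rng = np.random.default_rng(0)
ts = rng.normal(size=(5, 40))                # 5 hypothetical ROIs, 40 time points
windows = sliding_windows(ts, width=20, stride=10)          # 3 windows
features = np.stack([fcm_vector(w) for w in windows])       # shape (3, 10)
```

Each window yields one feature vector, so the sliding window also acts as a simple form of data augmentation: one subject produces several training samples.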

IV-C Deep Neural Networks

Deep learning in various medical applications, including the diagnosis of ASD, has become extremely popular in recent years. In this section of the paper, the types of Deep Learning networks used in ASD detection are examined, which include CNN, RNN, AE, DBN, CNN-RNN, and CNN-AE models.

IV-C1 Convolutional Neural Networks (CNNs)

In this discussion, the types of popular convolutional networks used in ASD diagnosis are surveyed. These networks involve 1D-CNN, 2D-CNN, 3D-CNN models, and a variety of pre-trained networks such as VGG.

1D and 2D-CNN

There are many spatial dependencies present in the data, and it is difficult to extract these hidden signatures. A convolutional network uses a structure similar to convolution filters to extract such features properly; it builds in the knowledge that features should be processed with spatial dependencies taken into account, and it significantly reduces the number of network parameters. The principal application of these networks is image processing, and because image inputs are two-dimensional (2D), the convolution layers form 2D structures, which is why these networks are called 2D convolutional neural networks (2D-CNNs). When the data are transformed into one-dimensional signals, the structure of the convolution layers likewise follows the data structure [a57]. Because convolutional networks assume that different sections of the data do not require different filters, the number of parameters is markedly reduced, which makes it feasible to train these networks with smaller databases [a21]. Figure 4 shows the block diagram of a 2D-CNN used for ASD detection.
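
The weight-sharing argument above can be made concrete with a minimal one-dimensional example. The sketch below applies a single hand-picked 3-tap filter to a short signal and compares the number of weights with a fully connected layer of the same input and output size. The signal, the kernel values, and the helper name conv1d_valid are illustrative assumptions; a trained CNN would learn the kernel.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """'Valid' 1D cross-correlation, the operation a 1D convolution layer applies."""
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
kernel = np.array([1.0, 0.0, -1.0])     # the same 3 weights are reused at every position
out = conv1d_valid(signal, kernel)      # 4 outputs: [-2., -2., -2., -2.]

# Weights needed to map 6 inputs to 4 outputs (biases ignored):
conv_params = kernel.size               # 3 shared weights
dense_params = signal.size * out.size   # 24 weights in a fully connected layer
```

The gap widens quickly with input size, which is the reason the text gives for convolutional networks being trainable on smaller databases.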


By transforming the data into three dimensions, the convolution network is likewise altered to a three-dimensional format (Figure 5). It should be noted that the use of three-dimensional CNNs (3D-CNNs) is less beneficial than that of 1D-CNNs and 2D-CNNs for several reasons. First, the data required to train these networks must be much larger, and such datasets are usually not available; moreover, methods such as pre-training, which are extensively exploited in 2D networks, cannot be used here. Second, as the structure of the network becomes more complicated, it becomes much harder to fix the number of layers and the network design. The 3D activation map generated during the convolution of a 3D-CNN is essential for analyzing data where volumetric or temporal context is crucial. This ability to analyze a series of frames or images in context has led to the use of 3D-CNNs as tools for action detection and evaluation of medical imaging [a58].

IV-C2 Deep Belief Networks (DBNs)

Although DBNs are not as popular today as they used to be, and have been replaced by newer models in various applications (autoencoders for unsupervised learning, generative adversarial networks (GANs) for generative models [a59], variational autoencoders (VAEs) [a60]), their influence on the advancement of neural networks cannot be overlooked despite their restricted use today. In the papers reviewed here, these networks are used for unsupervised feature extraction or for pre-training of networks. DBNs are unsupervised networks consisting of several layers after the input layer, as shown in Figure 6. They are trained greedily from the bottom up: each layer is trained separately, and then the next layer is appended. After training, these networks are used either as a feature extractor or as a network with pre-trained weights [a21].
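
The greedy, bottom-up training described above can be sketched as follows. The text does not spell out the building block, so this sketch assumes the standard construction in which a DBN is a stack of restricted Boltzmann machines (RBMs), each trained with one-step contrastive divergence (CD-1). Layer sizes, learning rate, epoch count, and the random binary data are all hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=20, lr=0.1, seed=0):
    """Train one RBM with one-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    n = data.shape[0]
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data
        h_prob = sigmoid(data @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step down to the visible layer and up again
        v_recon = sigmoid(h_sample @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # CD-1 parameter updates
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
        b_v += lr * (data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_h

def train_dbn(data, layer_sizes):
    """Greedy bottom-up training: each RBM is trained on the previous layer's output."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden)
        layers.append((W, b_h))
        x = sigmoid(x @ W + b_h)     # becomes the input of the next RBM
    return layers, x

rng = np.random.default_rng(1)
data = (rng.random((32, 12)) > 0.5).astype(float)    # toy binary inputs
layers, features = train_dbn(data, layer_sizes=[8, 4])
```

The returned features correspond to the "feature extractor" use mentioned in the text, while the list of weights corresponds to the "network with pre-trained weights" use.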

Fig. 6: Overall block diagram of DBN used for ASD detection.

IV-C3 Autoencoders (AEs)

Autoencoders (AEs) are more than 30 years old and have undergone dramatic changes over the years to enhance their performance, but the overall structure of these networks has remained the same [a21]. These networks consist of two parts, an encoder and a decoder: the encoder maps the input to a code in the latent space, and the decoder attempts to convert the code back into the original data (Figure 7). Autoencoders are a special type of feedforward neural network in which the target output is the same as the input. They compress the input into a lower-dimensional code and then reconstruct the output from this representation. The code is a compact “summary” or “compression” of the input, also called the latent-space representation. Various methods have been proposed to prevent the network from simply memorizing the data, including the sparse AE (SpAE) and the denoising AE (DAE) [a21]. If the autoencoder is properly trained, the encoder layers can be used to extract features, which serves as unsupervised pre-training for this type of network.
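
The encoder, code, and decoder described above can be illustrated with a very small autoencoder trained by plain gradient descent. This is a sketch under stated assumptions: a single tanh encoder layer, a linear decoder, mean squared reconstruction error, random toy data, and an arbitrary learning rate. None of these choices come from the reviewed papers.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))        # toy inputs: 64 samples, 8 features
n_code = 3                          # latent (code) dimension, smaller than the input

W_enc = rng.normal(scale=0.1, size=(8, n_code))
W_dec = rng.normal(scale=0.1, size=(n_code, 8))

def forward(X):
    code = np.tanh(X @ W_enc)       # encoder: input -> latent code
    recon = code @ W_dec            # decoder: latent code -> reconstruction
    return code, recon

losses = []
for _ in range(500):
    code, recon = forward(X)
    err = recon - X                 # the target output is the input itself
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate the mean squared error through decoder and encoder
    grad_recon = 2.0 * err / err.size
    grad_W_dec = code.T @ grad_recon
    grad_code = grad_recon @ W_dec.T
    grad_W_enc = X.T @ (grad_code * (1.0 - code ** 2))   # tanh derivative
    W_enc -= 0.2 * grad_W_enc
    W_dec -= 0.2 * grad_W_dec

code, _ = forward(X)                # 3-dimensional features for each sample
```

After training, only the encoder is needed to obtain features; the reconstruction error recorded in losses should fall as the code learns to summarize the input.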

Fig. 7: Overall block diagram of an AE used for ASD detection.

IV-C4 Recurrent Neural Networks (RNNs)

In convolutional networks, one kind of dependency in the data, the spatial kind, is addressed, but interdependencies in data are not confined to this form. For example, in time series, dependent samples may be far apart from each other; in addition, the long and variable length of these sequences means that ordinary networks do not process such data well. To overcome these problems, RNNs can be used. LSTM structures were proposed to extract both long-term and short-term dependencies in the data (Figure 8). Another well-known structure, the GRU, was developed after the LSTM, and since then most efforts have been made to enhance these two structures and make them robust to specific challenges (e.g., GRU-D [a61] is used to handle missing data).
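
How an LSTM carries information along a sequence can be seen from its forward pass. The sketch below implements the standard LSTM gate equations in NumPy for a single layer; the weights are random rather than trained, and the sequence length, input size, and hidden size are hypothetical. The gate ordering inside the stacked weight matrices is a convention chosen for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x_seq, W, U, b):
    """Run one LSTM layer over a sequence.

    x_seq : (T, D) input sequence
    W     : (D, 4H) input weights, U : (H, 4H) recurrent weights, b : (4H,) biases
    Gate order in the stacked matrices: input, forget, candidate, output.
    Returns the hidden states, shape (T, H).
    """
    H = U.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    outputs = []
    for x_t in x_seq:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:H])              # input gate
        f = sigmoid(z[H:2 * H])         # forget gate
        g = np.tanh(z[2 * H:3 * H])     # candidate cell update
        o = sigmoid(z[3 * H:])          # output gate
        c = f * c + i * g               # cell state carries long-term information
        h = o * np.tanh(c)              # hidden state passed to the next time step
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
T, D, H = 15, 4, 3                      # e.g. 15 time points of 4 ROI signals
x_seq = rng.normal(size=(T, D))
W = rng.normal(scale=0.5, size=(D, 4 * H))
U = rng.normal(scale=0.5, size=(H, 4 * H))
b = np.zeros(4 * H)
hidden = lstm_forward(x_seq, W, U, b)   # shape (15, 3)
```

Because the loop runs over however many time steps the input has, the same weights handle sequences of variable length, which is the property the text highlights.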

Fig. 8: Overall block diagram of LSTM used for ASD detection.

IV-C5 CNN-RNN

The initial idea behind these networks was to use convolution layers to improve the performance of RNNs so that the advantages of both networks can be combined: a CNN-RNN captures temporal dependencies with the help of the RNN and spatial dependencies with the help of the convolution layers [a62]. These networks are highly beneficial for analyzing time series with more than one dimension (such as video) [a63]. Beyond that, they also allow the analysis of three-dimensional data, so that a 2D-CNN combined with an RNN is occasionally used instead of the more complex design of a 3D-CNN. The advantage of this model lies in the feasibility of employing pre-trained models. Figure 9 shows the CNN-RNN model.

Fig. 9: Overall block diagram of CNN-RNN for ASD detection.

IV-C6 CNN-AE

In the construction of these networks, the principal aim has been to decrease the number of parameters. Simply replacing network layers with convolutional ones markedly reduces the number of parameters, and combining AEs with convolutional structures also makes a significant contribution. This makes it possible to exploit higher-dimensional data and to extract more information from the data without changing the size of the database. Similar structures, with or without some modification, are widely deployed in image segmentation [a64], and such unsupervised networks can likewise be applied for network pre-training or feature extraction. Figure 10 depicts the CNN-AE network used for ASD detection. Tables I and II summarize the papers published on detection and rehabilitation of ASD patients using DL, respectively.
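
The reduction in parameters mentioned above is easy to quantify. The short calculation below compares one encoder layer implemented as a fully connected layer with the same layer implemented as a 3x3 convolution. The input size (a single-channel 64x64 slice) and the 16 output feature maps are hypothetical numbers chosen only to illustrate the order of magnitude.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution layer with square kernels."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

# Hypothetical encoder layer: 64x64 single-channel input -> 16 maps of size 64x64
dense = dense_params(64 * 64, 16 * 64 * 64)     # 268,500,992 parameters
conv = conv2d_params(1, 16, 3)                  # 160 parameters
```

The convolutional count does not depend on the image size at all, which is why a convolutional autoencoder can be applied to higher-dimensional inputs without a matching growth in the amount of training data.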

Fig. 10: Overall block diagram of a CNN-AE for ASD detection.

V Deep learning techniques for ASD rehabilitation

Rehabilitation tools are employed in multiple fields of medicine, and their main purpose is to help patients recover after treatment. A variety of rehabilitation tools using DL algorithms have been presented. Rehabilitation tools for ASD patients are built on mobile and computer applications, robotic devices, cloud systems, and eye tracking, which are discussed below. A summary of the papers published on rehabilitation of ASD patients using DL algorithms is given in Table II.

V-A Mobile and Software Applications

Facial expressions are a key mode of non-verbal communication in children with ASD and play a pivotal role in social interactions. Use of BCI systems provides insight into the user’s inner emotional state. Valles et al. [a126] conducted research focused on mobile software design to provide assistance to children with ASD. They aimed to design a smart iOS app based on facial images, as shown in Figure 11. In this approach, people’s faces are first photographed at different angles and brightness levels and are then turned into various emoji so that the autistic child can express his or her feelings and emotions. In the group’s major investigation [a126], Kaggle’s FER2013 (Facial Expression Recognition 2013) and KDEF (Karolinska Directed Emotional Faces) databases were used to train a VGG-16 network. In addition, the LEAP system at the University of Texas was adapted to train the model. The study reports a highest accuracy of 86.44%. In another similar study, they achieved an accuracy of 78.32% [a124].

V-B Cloud Systems

Mohammadian et al. [a143] proposed a new application of DL to facilitate automatic identification of stereotypical motor movements (SMMs) using multi-axis inertial measurement units (IMUs). They applied a CNN to transform the multi-sensor time series into a feature space. An LSTM network was then combined with the CNN to capture the temporal patterns for SMM identification. Finally, they employed a classifier selection voting approach to combine an ensemble of the best base learners. In their experiments, the proposed procedure outperformed the other baseline methods. Figure 12 shows the real-time SMM detection system. First, IMUs, which are wearable sensors, are used for data collection; the data can then be analyzed locally or remotely (using Wi-Fi to transfer data to tablets, cell phones, medical center servers, etc.) to identify SMMs. If abnormal movements are detected, an alarm is sent to a therapist or the parents.
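
The final ensemble step can be illustrated with a simplified stand-in. The text does not detail the classifier selection voting approach used in [a143], so the sketch below shows plain majority voting over the predictions of several base learners, which is an assumption and not the authors' exact procedure. The predictions are made-up binary labels (1 = SMM, 0 = no SMM) for five time windows.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote."""
    voted = []
    for labels in zip(*predictions):          # labels of all models for one sample
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Hypothetical outputs of three base learners on five windows
base_predictions = [
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0],
]
ensemble = majority_vote(base_predictions)    # [1, 0, 1, 1, 0]
```

Using an odd number of binary classifiers avoids ties; a selection step, as named in the text, would additionally choose which base learners are allowed to vote.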

V-C Eye Tracking

Wu et al. [a136] proposed a DL model of saliency prediction for autistic children. They used a DCN in their proposed paradigm, with a saliency map (SM) as output. The fixation density map (FDM) is then processed by single-side clipping (SSC) and used as the true label, together with the SM, to optimize the proposed loss function. Finally, they used an autism eye-tracking dataset to test the model. Their proposed model outperformed other baseline methods.

Elbattah et al. aimed to employ unsupervised machine learning to detect clusters in ASD. Their key goal was to learn clusters from visual representations of eye-tracking scan paths. The first step involved visualizing the eye-tracking path, and the images produced in this step were fed to an autoencoder to learn features. Using the autoencoder features, clustering models were developed with the K-Means algorithm. Their method performed better than other state-of-the-art techniques.
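
The clustering stage of such a pipeline can be sketched with a small K-Means implementation. The sketch assumes that feature vectors (for example, autoencoder codes) are already available; here they are replaced by two synthetic, well-separated groups of 2D points, and the function name kmeans and all parameter values are illustrative rather than taken from the cited study.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain K-Means (Lloyd's algorithm) with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned samples
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for learned features: two groups of 20 samples each
rng = np.random.default_rng(1)
features = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])
labels, centers = kmeans(features, k=2)
```

On real scan-path features the number of clusters is not known in advance and would have to be chosen, for example by comparing cluster quality scores across several values of k.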

Fig. 11: Block diagram of iOS application for ASD rehabilitation.
Fig. 12: Cloud system design for ASD rehabilitation.
Work Datasets Neuroimaging Modalities Number of Cases Pipelines Image Atlas Preprocessing Toolbox High level Preprocessing Inputs DNN DNN Toolbox DNNs Number of Layers Classifier K fold Performance Criteria (%)
[a65] Clinical acquisition T-fMRI 82 ASD 48 HC NA MNI152 BET SW single mean channel input NA 2CC3D 17 Majority Voting No F1-Score = 89
residual -fMRI FSL single std channel input
combined 2-channel input
[a66] Clinical acquisition T-fMRI 82 ASD, 48 HC NA AAL NA SVE, C-SVE, H-SVE, Monte Mean channel sequence NA 2CC3D 14 Sigmoid NA Acc = 97.32
Carlo Approximation STD channel
[a67] HCP dataset in the HAFNI project T-fMRI 68 subjects with 7 tasks and 1 rs-fMRI data NA NA FSL dictionary learning and sparse coding functional RNSs maps NA 3D-CNN 8 Softmax NA Acc = 94.61
[a68] Clinical acquisition T-fMRI(T1-weighted MP-RAGE s-MRI, BOLD T2*-weighted fMRI sequence) 21 ASD 19 HC NA AAL FSL DA ROIs time-series Keras LSTM 7 Sigmoid 10 Acc=69.8
[a69] Different datasets T-fMRI 1711 ASD 15903 HC A AAL SPM Wavelet and different Techniques FCMs Keras CNN 14 Softmax NA Ensemble AUROC=0.92 Ensemble Acc=85.19
rs-fMRI SpeedyPP
[a70] Clinical acquisition T-fMRI 82 ASD, 48 HC NA AAL Neurosynth SW Original fMRI sequence NA 2CC3D 16 Sigmoid NA Acc= 87.1
corrupting strategy mean-channel sequence
[a70] ABIDE-I rs-fMRI 41 ASD 54 HC NA AAL FSL prediction distribution analysis std-channel sequence NA 2CC3D 16 Sigmoid NA Acc= 85.3
corrupt a ROI of the original image
[a71] ABIDE-I rs-fMRI 379 ASD, 395 HC CPAC All atlases ABIDE NA Connectivity matrix calculation concatenating voxel-level maps of connectivity fingerprints NA 3D-CNN 7 Sigmoid 10 Acc= 73.3
ABIDE II 163 ASD, 230 HC
[a72] ABIDE-I rs-fMRI 505 ASD 530 HC CPAC CC-200 NA FCM, DA Masking correlations PyTorch AE NA SLP 10 Acc=70.1 Sen=67.8 Spec=72.8
[a73] ABIDE-I rs-fMRI 872 subjects CPAC HO Nilearn NA Raw images NA G-CNNs 5 Softmax 10 Acc=70.86
[a74] ABIDE-I rs-fMRI 474 ASD 539 HC CCS AAL NA FCM Functional connectomes NA BrainNetCNN with proposed layers 15 Softmax 5 Acc= 68.7 Sen= 69.2 Spe= 68.3
[a75] ABIDE-I rs-fMRI 13 ASD 22 HC NA AAL SPM8 Qcut, NMI statistic matrix Pearson Correlation coefficient Matrix NA DAE NA NA NA Acc=54.49
[a76] ABIDE rs-fMRI 11 ASD 16 HC NA NA FSL convert NII files to PNG images preprocessed PNG images Caffe LeNet-5 Standard Softmax NA Acc=100 Sen=99.99 Spec=100
[a77] ABIDE-I rs-fMRI 55 ASD 55 HC NIAK AAL NA FCM, feature selection whole-brain FCPs NA Multiple SAEs 4 Softmax regression 5 Acc=86.36
[a78] ABIDE-I rs-fMRI 54 ASD 62 HC NA NA SPM8 Dimension Reduction images with 95 × 68, 79 × 68, and 79 × 95 dimensions, around the x, y, and z axes Keras with Theano backend MCNNEs 9 Binary SR 10 Acc=72.73 Sen=71.2 Spec=73.48
156 ASD 187 HC FFT
[a79] ABIDE rs-fMRI 542 ASD 625 HC CPAC All atlases NA Creating stochastic parcellations by Poisson Disk Sampling gray matter mask parcellations NA 3D-CNN 6 Various Methods 10 Acc=72
[a80] ABIDE-I rs-fMRI 465 ASD 507 HC DPARSF AAL NA FCM edge weights of subjects’ brain graph Keras VAE 3 NA NA NA
[a81] ABIDE-I rs-fMRI 539 ASD 573 HC CCS Craddock 200 Neurosynth DA mean time courses from ROIs Keras LSTM 5 Sigmoid 10 Acc=68.5
[a82] ABIDE Rs-fMRI, phenotypic info 505 ASD 530 HC NA CC200 DPABI Slicetiming, spatial standardization, smoothing, filtering, removing covariates, FCM, AE-MKFC 4005-dimensional eigenvector NA SAE 3 Clustering NA Acc=61 NMI=3.7 F-measure=60.2
[a83] ABIDE rs-fMRI 42 ASD 42 HC NA NA FSL Independent components (time course, power spectrum and spatial map) time courses of each subject NA SAE 9 Softmax 21 Acc=87.21 Sen=89.49 Spec=83.73
[a84] ABIDE-I rs-fMRI NY site CCS AAL Neurosynth DA fMRI ROI time-series, functional communities Keras LSTM 6 Sigmoid 10 Acc=74.8
UM site
US site
UC site
[a85] ABIDE-I rs-fMRI 408 ASD 401 HC CPAC HO FSL NA 3 different FCM+ Demographic data Keras DANN 25 Sigmoid 10 Acc=73.2 Sen=74.5 Spec=71.7
[a86] ABIDE rs-fMRI at least 60 subjects CCS AAL FMRIB’s linear & nonlinear image registration tools DTL-NN framework: offline learning, Transfer Learning FCM using Pearson’s correlation FC patterns NA SSAE 4 Softmax regression 5 Avg Acc= 67.1 Avg Sen=65.7 Avg spec=68.3 AUC=0.71
[a87] ABIDE I+II rs-fMRI 993 ASD 1092 HC NA AAL FAST NA Mean Time-Series within each ROI NA 1D-CNN 5 Softmax 10 Acc=68
Schaefer-100 BET
Schaefer-400 FAST
[a88] ABIDE-I rs-fMRI 529 ASD 573 HC All pipelines NA NA Single Volume Image Generator Glass Brain and Stat Map Images Keras 4 Deep Ensemble Classifier techniques (CNN) 16 Sigmoid NA Acc=87 F1-score=86 Recall=85.2 Precision=86.8
[a89] ABIDE-II rs-fMRI 303 ASD, 390 HC NA NA FSL NA 1D time series from voxels NA 1D-CAE 14 NA NA Acc= 65.3
[a90] ABIDE rs-fMRI 40 ASD, 40 HC CCS NA NA Threshold segmentation WM, GM, CSF NA AlexNet Standard Softmax NA Acc=82.61
[a91] ABIDE rs-fMRI Whole dataset All pipelines parcellated into 200 regions NA DA using SMOTE and graph network motifs, FCM calculation upper triangle part of the correlation matrix NA ASD-DiagNet Proposed SLP NA Acc=82 Sen=79.1 Spec=83.3
[a92] ABIDE-I rs-fMRI 12 ASD 14 HC C-PAC SCSC NA Time series extraction from different regions, connectivity matrix, SMOTE algorithm FCM PyTorch Auto-ASD-Network Proposed SVM 5 Acc=80 Sen=73 Spec=83
[a93] ABIDE-I rs-fMRI 505 ASD 530 HC CPAC CC400 NA FCM computation FCM NA CNN 20 MLP 10 Acc=70.20 Sen=77.00 Spec=61.00
[a94] ABIDE rs-fMRI 505 ASD 530 HC NA NA NA Connectivity matrix converting connectivity matrix to one dimensional vector NA 1D CNN-AE 7 Softmax NA Acc=70 Sen=74 Spec=63
[a95] ABIDE-I rs-fMRI 539 ASD 573 HC NA NA FreeSurfer NA single 3D image Theano 3D-FCNN 13 Softmax 6 Mean DSC=91.56 Mean MHD=14.05
[a96] ABIDE-I rs-fMRI 501 ASD 553 HC DPARSF AAL NA converting FCM to one dimensional vector 1000 features selected by the SVM-RFE NA SSAE 3 Softmax different folds Acc=93.59 Sen=92.52 Spec=94.56
[a97] ABIDE-I rs-fMRI 100 ASD 100 HC NA NA FSL online dictionary learning and sparse representation techniques, generating spatial overlap patterns 4D matrix with 150 3D network overlap maps Theano 3D-CNN 14 NA 10 Average Acc= 70.5 Average Sen= 74 Average Spec= 67
[a98] ABIDE-I rs-fMRI & phenotypic 529 ASD 571 HC CPAC HO FSL Population graph construction, Feature Selection Strategies (RFE, PCA, MLP, AE) Population graph Scikit-learn GCN 11 Softmax 10 Acc=80.0
[a99] ABIDE-I rs-fMRI & phenotypic 403 ASD 468 HC CCS CC200 NA DA mean time-series from ROIs Keras LSTM 6 Sigmoid 10 Acc=70.1
[a100] ABIDE-I rs-fMRI & S-MRI & phenotypic (T1 weighted) 505 ASD 530 HC CPAC Craddock 200 NA Flattening Functional connectivity matrix one-dimension vector NA two SdAE + MLP NA Softmax 10 Acc=70 Sen=74 Spec=63
[a101] Clinical acquisition rs-fMRI 75 qualified subjects NA NA FSL Extraction of fetal brain fMRI data, SW Mean time series of 3D fMRI volumes PyTorch 3D-CNN 7 Sigmoid NA F1-score=84 AUC=91
Fetal BOLD fMRI Brainsuite
[a102] ABIDE-I rs-fMRI 116 ASD 69 HC NA AAL SPM8 Segmentation, average mean time series of each ROI Rs-fMRI + GM+WM data fusions Theano DBN 6 LR 10 Acc=65.56 Sen=84 Spec=32.96
[a103] IMPAC rs-fMRI, s-MRI 418 ASD 497 HC NA All atlases NA flattening FCM from Rs-fMRI, features extraction from S-MRI FCM vector; anatomical features; Combination of both anatomical and connectivity FCM vector Keras, TensorFlow, Caffe Different Networks 8 Various Methods 3 AUC= 80
[a104] ABIDE-I rs-fMRI, s-MRI 368 ASD 449 HC CPAC AAL, CC200 Freesurfer FCM computation and flattening into a 1D vector, Fisher Score 1D vector NA ensemble of 5 stacked AEs and MLP for classification 31 label fusion using the average of softmax probabilities 10 Acc= 85.06 Sen= 81 Spec= 89
[a105] NDAR rs-fMRI 61ASD 215 HC NA NA NA data-driven landmark discovery algorithm, Patch Extraction 50 patches extracted from 50 landmarks NA Multi- Channel CNN 13 Softmax 10 Acc=76.24
MRI (T1-weighted
MR images)
[a106] NDAR All modalities 78 ASD 124 HC NA Proposed atlas FSL PICA, PSD feature extraction PSDs of 34 components NA 34 SAEs Each SAE has 2 layers PSVM NA Acc=88.5 Sen=85.1 Spec=90.4
[a107] NDAR T1-weighted 60 ASD, 211 HC NA NA in-house 3D Patches extraction Patch size 16×64×16 NA DDUNET 11 NA 5 NA
MR images tools blocks
[a108] ABIDE-I s-MRI 21 ASD, 21 HC NA NA FSL segmentation, Shape feature extraction CDF values of features NA SNCAE NA Softmax NA Acc=96.88
NDAR/Pitt 16 ASD, 16 HC iBEAT
[a109] ABIDE-I s-MRI 78 ASD 104 HC NA Destrieux FreeSurfer construction of individual network, F-score 3000 top features NA SAE 3 Softmax 10 Acc=90.39 Sen=84.37 Spec=95.88
[a110] HCP, ABIDE-I s-MRI 1113 HC; 83 ASD, 105 HC NA Desikan–Killiany FreeSurfer Normalization, apply one-hot coding Preprocessed images TensorFlow DEA 3 NA 10 AUC-ROC=63.9
ABIDE s-MRI 1112 subjects NA NA SPM12 NA 32 slices along each axial, coronal, and sagittal Keras DCNN 17 Sigmoid NA Acc=84 Sen=77 Spec=85
CombiRx 1112 subjects
[a112] ABIDE-II MRI NA NA DKT FreeSurfer segmentation coronal, axial and sagittal 2D slices PyTorch FastSurfer CNN Proposed Softmax NA NA
[a113] ABIDE-I MRI 500 ASD 500 HC NA HO cortical and subcortical structural atlas FSL GABM method, new chromosome encoding scheme Preprocessed MRI scans NA 3D-CNN 11 Softmax 5 Acc=70
[a114] Clinical acquisition MRI 48 HC NA NA FreeSurfer sparse annotations, DA image patch Caffe 3D-CNN 18 Softmax NA Acc=91.6 ROC=94.1
[a115] ABIDE-I rs-fMRI 270 ASD 305 HC C-PAC BrainNetome atlas (BNA) NA Filtering, calculating mean time series for ROIs using BrainNetome atlas (BNA), normalization mean time series data stacked across ROIs NA CNN-GRU 14 Sigmoid 5 Acc=74.54 Sen=63.46 Spec=84.33
[a116] Clinical acquisition fNIRS 25 ASD 22 HC No No No transformation of the time series to three variants PM, GM, SM Keras 1D CNN- LSTM NA Bagging NA Acc=95.7 Sen=97.1 Spec=94.3
[a117] Clinical acquisition fNIRS 25 ASD 22 HC No No No SW 3D tensor NA CGRNN 7 NA NA Acc=92.2 Sen=85.0 Spec=99.4
converted into the 3D tensor
[a118] Different datasets MRI NA NA Various Methods FreeSurfer geometric DA 3D cortical mask Theano ConvNet U-Nets NA 8 NA
[a119] ABIDE I+II rs-fMRI 620 ASD 2085 HC C-PAC HO FSL performed an automatic quality control, visual inspection, 9 temporal summary measures, mean and STD of the summary measures, normalization, Occlusion of Brain Regions each summary measure NA MM-ensemble (3D-CNN) 7 Majority voting 5 Acc=64 F1-score=66
[a120] ABIDE-I rs-fMRI 184 ASD 110 HC C-PAC NA NA Down Sampling raw 4D volume NA 3D-CNN C-LSTM 21 Softmax 5 Acc=77 F1-score=78
[a121] ABIDE-I rs-fMRI and T1-weighted images 403 ASD 468 HC NA 264 ROIs AFNI FSL MATLAB Connectivity matrix, feature extraction (Different Features) Normalized features NA AE 7 DNN 10 Acc= 79.2 AUC= 82.4
[a122] ABIDE-I rs-fMRI 505 ASD 530 HC C-PAC CC200 NA FCM vector of FC measures PyTorch CapsNet Standard k-means clustering 10 Acc=71 Sen=73 Spec=66
TABLE I: Summary of articles published using DL methods for ASD detection.
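Many rows of Table I describe the same preprocessing step: computing a functional connectivity matrix (FCM) from ROI time series and flattening it into a one-dimensional vector, in some cases keeping only the upper triangle of the correlation matrix. The following is a minimal sketch of that idea, assuming Pearson correlation as the connectivity measure; the helper names and the toy time series are hypothetical and are not taken from any of the reviewed studies.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity_matrix(roi_series):
    """Build a symmetric ROI-by-ROI correlation matrix (the FCM)."""
    n = len(roi_series)
    return [[pearson(roi_series[i], roi_series[j]) for j in range(n)]
            for i in range(n)]

def upper_triangle_vector(matrix):
    """Flatten the strictly upper triangle (diagonal excluded), row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

# Toy data (hypothetical): 3 ROIs with 5 time points each.
roi_series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
fcm = connectivity_matrix(roi_series)
features = upper_triangle_vector(fcm)  # n * (n - 1) / 2 values
```

Because the matrix is symmetric with a unit diagonal, the upper triangle carries all of its information. For an atlas with 200 regions this gives 200 × 199 / 2 = 19,900 features per subject, which is why several of the listed studies follow the flattening with a feature-selection step.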
Work Datasets Type of Applications Number of Cases Preprocessing Inputs DNN DNN Toolbox DNNs Number of Layers Classifier K fold Performance Criteria (%)
[a123] OSIE 20 ASD 19 HC HFM Construction, Filtering Normalizing, DA HFMs, Natural Scene Images Caffe VGGNeT 50 Softmax 13 Acc=85 Sen=80 Spec=89
[a124] KDEF Facial Expression Recognition 70 Individuals DA RGB Images (562×762) Keras DCNN 44 Softmax NA Acc=78.32
[a125] Clinical Acquisition Detect Audio Regimes That Directly Estimate ASD Severity Social Affect scores 33 ASD MFCC Spectrograms 32 Spectrograms NA Noisemes Network Standard Network Synthetic RF Acc=84.7
DiarTK Diarization
[a126] Kaggle’s Facial Expression Recognition NA No 48×48-Pixel Images Keras DCNN 44 Softmax NA Acc=86.44
FER2013 (TensorFlow
KDEF Backend)
[a127] SALICON ASD Classification 14 ASD 14 HC SalGAN Model, Feature Extraction Sequence of Image Patches NA SP-ASDNet 11 NA NA Acc=57.90 Rec=59.21 Pre=56.26
[a128] BigFaceX Facial Expression Recognition 196 Subjects SW, Merge in the Channel Dimension, DA 5-channel Sub-Sequence Stacks within a Specific Time Window Keras TimeConvNet PreTrain Nets Softmax NA Acc=97.9
[a129] Different Datasets Suitable Courseware for Children with ASD NA Interactive and Intelligent Chatbot , NLP, Visual Aid Different Inputs NA Different Nets NA NA NA NA
[a130] Camera Images Estimating Visual Attention in Robot-Assisted Therapy 6 ASD and ID Resizing, Frame Extraction, Visual Inspection 5 Facial Landmarks - 36 HOG Descriptors NA R-CNN VGG-16 K-NN 10 Acc= 88.2 Pre=83.3 Sen=83.0 Spec=87.3
Face Detection (Viola–Jones), Feature Extraction (HOG Descriptors) MTCNN Cascaded CNNs Architecture Naïve
[a131] Sensor Data Automatic SMM detection 6 ASD 5 HC Resampling, Filtering, SW Time-Series of Multiple Sensors Keras CNN-LSTM 13 Majority Voting NA NA
[a132] KOMAA Facial Expression Recognition 55 subjects Segmentation, Different Features, Z-scores Greedy Forward Feature Selection NA CNN 9 SVM NA Acc=96
[a133] Story-Telling Narrative Corpora ASD Classification 31 ASD 36 HC DA, ChineseWord2Vec 32-Dimensional Word Vector NA LSTM 1 Coherence Representation of LSTM Forget Gate NA Acc=92
[a134] Ext-Dataset (video dataset) ASD Classification using Eye Tracking 136 ASD 136 HC TLD Method, Accumulative Histogram Computation Angle Histogram, Length Histogram and Fused Histogram, Keras LSTM 4 NA 10 Acc=92.6
[a135] MIT1003 Predicting Visual Attention of Children with ASD. 300 Images NA Raw Images NA DCN 26 NA NA SIM=67.8
[a136] Scan Path Data, Including Location and Duration ASD Classification 14 ASD 14 HC DA Methods Image, Data Points PyTorch ResNet18 Standard Softmax NA Acc=55.13 Sen=63.5 Spec=47.1
[a137] UCI Machine Learning Repository ASD classification Number of Instances= 704 Different Methods Preprocessed Data NA CNN 7 NA NA Acc=99.53 Sens=99.39 Spec=100
[a138] Eye Tracking Scanpath ASD Classification 29 ASD 30 HC Visualization of Eye-Tracking Scanpaths Scaling Down, PCA 100*100 Image Keras, Scikit- Learn AE 8 K-Means Clustering NA Silhouette score=60
[a139] Video Data Engagement Estimation of Children with ASD During a Robot-Assisted Autism Therapy 30 children NA cropped face images (256×256) Keras CultureNet R-CNN + ResNet50 + 5FC layers Softmax NA ICC=43.35 CCC=43.18 PC=45.17
[a140] YouTube ASD Dataset Modeling Typical and Atypical Behaviors in ASD Children 68 video Clips Different Methods Sequences of Individual Frames at a Rate of 30 fps openCV, Caffe DCNN NA DT 5 Avg Pre=73 Avg Recall=75 Avg Acc=71
[a141] Video Dataset Behavioral Data Extracted from Video Analysis of Child-Robot Interactions. 5 ASD 7 HC Segmentation, Upper Body tracking, Laban Movement Analysis to Drive Weight, Different features 3 Movement Features with 68 Facial Key-Points NA CNN 10 Softmax NA Acc=88.46 Pre=89.12 Recall=88.53
[a142] Video Dataset Developing Automatic SMM Detection Systems 6 ASD Resampling, Filtering, SW, Data Balancing, Normalizing Time-Series of Multiple Accelerometer Sensors Deeppy Library CNN 8 SVM NA F1-score=95
[a144] ASD Screening Autism Screening 513 ASD 189 HC Cleaning Missing Values and Outliers, Visualization, Identity Mapping The Embedded Categorical Variables are Concatenated with Numerical Features as New Feature Vectors NA DENN 4 Sigmoid NA Acc=100 Spec=99 Sen=100 F1-score=99
[a145] ASD Screening Datasets Classification of Adults with ASD Handling of Missing Values, Variable Reduction, Normalization, and Label Encoding Normalized Variables Keras DNN 7 Sigmoid NA Acc=99.40 Sen=97.89 Spec=100
TABLE II: Summary of papers published on rehabilitation of ASD patients using DL algorithms.
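The performance criteria reported in Tables I and II (Acc, Sen or Recall, Spec, Pre, F1-score) can all be derived from the four cells of a binary confusion matrix. The sketch below shows the standard definitions, assuming ASD is treated as the positive class and that no denominator is zero; the function name and the counts are hypothetical and do not reproduce any reviewed result.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: ASD subjects classified correctly/incorrectly,
    tn/fp: healthy controls classified correctly/incorrectly.
    """
    acc = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    sen = tp / (tp + fn)                    # sensitivity (recall)
    spec = tn / (tn + fp)                   # specificity
    pre = tp / (tp + fp)                    # precision
    f1 = 2 * pre * sen / (pre + sen)        # harmonic mean of Pre and Sen
    return {"Acc": acc, "Sen": sen, "Spec": spec, "Pre": pre, "F1": f1}

# Hypothetical counts: 50 ASD subjects and 50 healthy controls.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

Reporting sensitivity and specificity next to accuracy matters for the imbalanced cohorts in the tables, where a model can reach a high accuracy while performing poorly on the minority class.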

VI Discussion

In this study, we presented a comprehensive overview of investigations into ASD diagnostic CADS as well as DL-based rehabilitation tools for ASD patients. Numerous papers on ASD diagnosis using functional and structural data, and on rehabilitation tools, have been published, as detailed in Table III in the appendix. A variety of DL toolboxes are available for implementing deep networks. The toolboxes used in each study are listed in Tables I and II, and their counts are depicted in Figure 13. Keras is used in the majority of the studies because of its simplicity and its consistent high-level application programming interfaces (APIs) for building models.

Fig. 13: Number of deep learning tools used for the diagnosis and rehabilitation of ASD patients in reviewed papers.

The number of deep learning networks of each type used for ASD detection in the reviewed works is shown in Figure 14. Among the various DL architectures, CNNs are the most popular, as they have achieved more promising results than other deep methodologies. Autoencoders and RNNs have also yielded favorable results.

Fig. 14: Number of deep learning networks used for ASD detection in the reviewed works.

The number of times each classification algorithm is used in the DL networks is shown in Figure 15. Softmax is among the best performing and most widely used (Tables I and II). Its popularity stems from being differentiable over its entire domain and computationally inexpensive.

Fig. 15: Number of the various classification algorithms used for the detection of ASD in deep learning.
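To make the Softmax classifier discussed above concrete, the following is a minimal sketch of the function in plain Python. Subtracting the maximum logit is a common numerical-stability convention and is an assumption here, not something specified by the reviewed papers; the logit values are hypothetical.

```python
import math

def softmax(logits):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    m = max(logits)                           # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-class output: [score for ASD, score for HC].
probs = softmax([2.0, 0.5])

# With two classes, softmax reduces to a sigmoid of the score difference.
sigmoid_of_diff = 1.0 / (1.0 + math.exp(-(2.0 - 0.5)))
```

For the two-class ASD versus HC problems that dominate Tables I and II, the Softmax and Sigmoid entries in the classifier column therefore describe closely related output layers.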

VII Challenges

This section addresses some of the most substantial challenges in DL-based ASD diagnosis, which comprise database and algorithmic problems. Only two-class (ASD and healthy) brain structural and functional datasets are available in the public domain. Hence, researchers are not able to broaden their investigations to other types of ASD. EEG and fNIRS are among the cheapest and most pragmatic functional neuro-screening modalities for ASD diagnosis, but the lack of freely available datasets has resulted in little research in this area. Another obstacle is that multi-modality databases such as EEG-fMRI are not available to researchers for evaluating the effectiveness of combining information from different imaging modalities to detect ASD. Although both fMRI and sMRI data are available in the ABIDE dataset, the results of merging these structural and functional data for ASD diagnosis with DL have not yet been investigated. Another problem researchers grapple with is designing DL-based rehabilitation systems within the available hardware resources. Although researchers now have access to tools such as Google Colab that increase processing power, problems still arise when these systems are implemented in real-world scenarios.

VIII Conclusion and Future Works

ASD is typically characterized by social disorders, communication deficits, and stereotypical behaviors. Numerous computer-aided diagnosis systems and rehabilitation tools have been developed to assist patients with autism disorders. In this survey, research on ASD diagnosis applying DL to functional and structural data was assessed first. Researchers have taken advantage of deep CNNs, RNNs, AEs, and CNN-RNN networks to improve the performance of their systems. The principal challenges of these systems are boosting accuracy, generalizing and adapting to differing data and real-world conditions, and reducing hardware power requirements to the extent that the final system can be used by all. To enhance the accuracy and performance of CADS for ASD detection in the future, deep reinforcement learning (RL) or GANs can be exploited. Scarcity of data is always a paramount problem in the medical field, and deep GANs can help to alleviate it.

Many researchers have proposed DL-based rehabilitation tools to aid ASD patients. A reliable, accurate, wearable, and low-power device based on DL algorithms is the tool of the future for ASD patients. One achievable rehabilitation tool is a pair of smart glasses to help children with ASD. With built-in cameras, these glasses would acquire images from different directions of the environment. A DL algorithm would then process these images and present meaningful images to the child to help them communicate better with their surroundings.

Appendix A

Table III shows details about all the works reviewed in this study.


MB is supported by a NHMRC Senior Principal Research Fellowship (1059660 and 1156072). MB has received Grant/Research Support from the NIH, Cooperative Research Centre, Simons Autism Foundation, Cancer Council of Victoria, Stanley Medical Research Foundation, Medical Benefits Fund, National Health and Medical Research Council, Medical Research Futures Fund, Beyond Blue, Rotary Health, A2 milk company, Meat and Livestock Board, Woolworths, Avant and the Harry Windsor Foundation, has been a speaker for Astra Zeneca, Lundbeck, Merck, Pfizer, and served as a consultant to Allergan, Astra Zeneca, Bioadvantex, Bionomics, Collaborative Medicinal Development, Lundbeck Merck, Pfizer and Servier - all unrelated to this work.

Author Network Details for Deep Networks Dropout Classifier Optimizer Loss function
[a65] 2CC3D CNN Layers (6) + Pooling Layers (4) + FC Layers (2) 2 (rate=0.5) Sigmoid NA BCE
2 (rate=0.65)
[a66] 2CC3D CNN Layers (6) + Pooling Layers (4) + FC Layers (3) NA Sigmoid NA NA
[a67] 3D-CNN CNN Layers (2) + LReLU Activation + Pooling Layers (1) + FC Layers (1) 3 (rate=NA) Softmax SGD MNLL
[a68] LSTM LSTM Layers (1) + Pooling Layers (1) + FC Layers (3) 1 (rate=0.5) Sigmoid Adadelta MSE
[a69] CNN CNN Layers (2) + ReLU Activation + BN Layers (4) + FC Layers (3) 2 (rate=0.3), 2 (rate=0.7) Softmax Adam NA
[a70] 2CC3D CNN Layers (6) + Pooling Layers (4) + FC Layers (2) 2 (rate=NA) Sigmoid NA NA
[a71] CNN CNN Layers (2) + ELU Activation + Pooling Layers (2) + FC Layers (2) NA Sigmoid SGD NA
[a72] AE Standard AE with Tanh Activation NA SLP NA MSE
[a73] G-CNN Proposed G-CNN with 3 Layer CNN (rate=0.3) Softmax Adam NA
[a74] BrainNetCNN with Element-wise layer (1) + E2E layers (2) + E2N layer (1) + N2G layer (1) 5 (rate=0.5) Softmax Adam Proposed Loss Function
proposed layers + FC layers (3)+ Leaky ReLU activation+ htan activation 1 (rate= 0.6)
[a75] DAE Standard DAE NA NA NA Proposed Loss function
[a76] LeNet-5 Standard LeNet-5 Architecture NA Softmax NA NA
[a77] SAE SAE with LSF Activation NA Softmax LBFGS NA
[a78] MCNNEs CNN Layers (3) + ReLU Activation + Pooling Layers (3) + FC Layers (1) 1 (rate=0.5) Binary SR Adam BCE
[a79] 3D-CNN CNN Layers (2) + ELU Activation + Pooling Layers (2) + FC Layers (3) NA Sigmoid SGD BCE
Adam MSD
[a80] VAE VAE with 3 Layers NA NA Adadelta Proposed loss function
[a81] LSTM LSTM Layers (1) + Pooling Layers (1) + FC Layers (1) 1(rate=0.5) Sigmoid Adadelta BCE
[a82] SAE SAE Layers (3) + Sigmoid Activation NA Clustering Proposed Opt. NA
[a83] SAE SAE Layers (8) + Sigmoid Activation NA SR L-BFGS MSE
[a84] LSTM LSTM Layers (2) + Pooling Layers (1) + FC Layers (2) NA Sigmoid Adam BCE
[a85] Multichannel DANN 3 MLP (1 dropout layer and 4 dense layers) + Self-attention (3) + Fusion (3) 1 (rate=NA) Sigmoid NA CE
+ Aggregation layer + dense layer (1) + relu, elu, tanh activations
[a86] SSAE 3 SSAE Layers NA Softmax scaled conjugate Proposed loss function
gradient descent
[a87] 1D-CNN CNN Layers (1) + Pooling Layers (1) + FC Layers (1) (rate=0.2) Softmax Adam NA
[a88] CNN CNN layers (6) + pooling layers (4) + BN layers (2) + FC layers (2) 1 (rate=0.25) Sigmoid Adam Proposed loss function
[a89] 1D CAE-CNN Encoder (4 layers) + Decoder (4 layers) + CNN layers (2) + pooling layers (2) + FC layers (2) NA NA NA NA
[a90] AlexNet Standard AlexNet Architecture NA Softmax NA CE
[a91] ASD-DiagNet Proposed DiagNet NA SLP NA NA
[a92] Auto-ASD-Network Proposed Auto-ASD-Network NA SVM NA NLLF
[a93] CNN CNN layers (7) + Pooling layers (7) + FC layers (3) 1 (rate=0.25) MLP NA NA
[a94] 2 SdAE-CNN Proposed SDAE-CNN with 7 Layers CNN NA Softmax NA NA
[a95] 3D-FCNN CNN Layers (9) + PReLU Activation + FC Layers (3) NA Softmax SGD CE
[a96] SSAE 2 Layers SSAE NA Softmax NA NA
[a97] 3D-CNN CNN layers (7) + Pooling Layers (3) + FC Layers (2) + log-likelihood activation 2 (rate=0.2) NA SGD MNLL
[a98] GCN GCN with ReLU and Sigmoid Activation (rate=0.3) Softmax NA CE
AE SAE with Tanh Activation MSE
[a99] LSTM Proposed Deep Network (rate=0.5) Sigmoid Adadelta BCE
[a100] 2 SdAE-MLP Proposed 2-SDAE-MLP Network NA Softmax NA MSE
[a101] 3D-CNN CNN Layers (2) + ReLU Activation + Pooling Layers (2) + FC Layers (2) NA Sigmoid SGD BCE
[a102] DBN DBN with 5 Hidden Layers NA LR NA NA
[a103] FeedFWD Dense layers (5) + LReLU activation 3 (rate= NA) NA Adam BCE
[a104] ensemble of 5 SAE and MLP 5 [ AE (3) + MLP (2)] + softmax activation 5 (rate=NA) averaging the NA NA
[a105] Multi-Channel CNN CNN Layers (5) + ReLU Activation + Pooling Layers (2) + FC Layers (5) NA Softmax NA CE
[a106] 34 SAEs 34 [ SAE network (2)] NA PSVM L-BFGS NA
[a107] DDUNET Proposed DDUNET with 11 blocks and ReLU activation (rate=0.1) NA SGD CE
[a108] SNCAE Proposed SNCAE Network NA Softmax NA NA
[a109] SpAE SpAE with 2 Networks NA Softmax NA MSE
[a110] DAE AE (3) + SELU Activation NA NA Adam Sum of MSE + 2 CE + CC
[a111] DCNN CNN Layers (6) + ReLU Activation + Pooling Layers (6) + FC Layers (4) NA Sigmoid Adam BCE
[a112] FastSurfer CNN Proposed FastSurfer CNN Network NA Softmax Adam Logistic & Dice Losses
[a113] 3D-CNN CNN Layers (3) + ReLU Activation + Pooling Layers (3) + FC Layers (2) 2 (rate=0.5) Softmax Adadelta CE
[a114] 3D-UNET DCNN Layers (7) + ReLU Activation + Pooling Layers (2) + BN Layers (6) 2 (rate=0.5) Softmax SGD weighted CE
[a115] CNN-GRU CNN Layers (4) + GRU Layers (2) + ReLU Activation + Pooling Layers (2) + FC Layers (5) NA Sigmoid Adam BCE
[a116] 1D CNN - LSTM Proposed 1D-CNN LSTM with ReLU Activation (rate=0.2) Softmax Adam CCE
[a117] CGRNN CNN layers (3) + ReLU activation + Pooling layers (1) 1 (rate=0.5) NA Adam BCE
+ GRU layers (1) + sigmoid activation + FC layer (1)
[a118] ConvNet variation of the U-net convolutional architecture NA NA ADAM Proposed Loss function
[a119] 3D-CNN CNN Layers (2) + ELU Activations + Pooling Layers (2) + FC Layers (2) NA Sigmoid SGD BCE
[a120] 3DCNN C-LSTM CNN Layers (8) + Conv-Bi LSTM Layers (2) + Sigmoid Activation (for LSTM) 8 (rate=0.2) Softmax Adam CE
+ Pooling Layers (1) + FC Layers (1)
[a121] AE Proposed AE with 7 Layers NA DNN NA NA
[a122] CapsNets Standard Architecture NA K-Means Clustering Adam Proposed loss function
[a123] VGGNets + ASDNet CNN Layers (27) + ReLU Activation + Pooling Layers (10) + FC Layers (6) 6 (rate=0.5) Softmax SGD CE
[a124] DCNN CNN Layers (7) + activation+ Pooling Layers (13) + FC Layers (3) + BN Layers (10) 7 (rate=0.25) Softmax SGD NA
3 (rate=0.5)
[a125] Noisemes net Standard networks NA RF NA NA
DiarTK Diarization net
[a126] DCNN CNN Layers (7) + ELU Activation + Pooling Layers (13) + FC Layers (3) + BN Layers (10) 7 (rate=0.25) Softmax SGD NA
3 (rate=0.5)
[a127] SP-ASDNet CNN Layers (2) + LSTM Layers (2) + Pooling Layers (3) + FC Layers (2) 2 (rate=NA) NA Adam BCE
[a128] TimeConvNet convolutional spatiotemporal encoding layer+ backbone convolutional NA Softmax Adam CCE
neural network architecture (mini-Xception, ResNet20, MobileNetV2)
[a129] Different Networks Proposed structure NA NA NA NA
[a130] RCNN VGG-16 NA K-NN NA NA
MTCNN cascaded CNNs architecture Naïve
[a131] CNN-LSTM CNN Layers (3) + LSTM Layers (1) + ReLU Activation 1 (rate=0.5) Softmax SGD NA
+ Pooling Layers (3) + FC Layers (3) 1 (rate=0.2)
[a132] CNN CNN Layers (4) + Pooling Layers (2) + FC Layers (2) NA Softmax NA NA
[a133] LSTM LSTM layer (1) NA coherence representation NA NA
[a134] LSTM LSTM Layers (3) + Sigmoid Activation + FC Layers (1) NA NA NA CE
[a135] DCN CNN Layers (17) + Pooling Layers (3) + deconvolution layers (3) + learned priors (3) NA NA NA Proposed loss Function
[a136] Pretrained resnet18 Standard ResNet-18 Architecture Standard Standard Adam BCE
[a137] CNN CNN Layers (2) + ReLU Activation + Pooling Layers (2) + FC Layers (2) 1 (rate=0.5) NA Adam BCE
[a138] AE AE with 8 layers NA K-Means Clustering NA NA
[a139] CultureNet Faster R-CNN + modified ResNet50 + 5FC layers NA Softmax Adelta Proposed loss function
[a140] DCNN Proposed DCNN Architecture with Different Layers NA Decision Tree (DT) Manual Optimization NA
[a141] CNN CNN Layers (2) + ReLU Activation + FC Layers (3) 4 (rate=0.2) Softmax NA NA
[a142] SA-B3D with CNN Layers (5) + LSTM Layers (1) + Pooling Layers (4) + FC Layers (1) NA Sigmoid Adam CE
LSTM model Proposed loss function
[a143] CNN CNN Layers (3) + ReLU Activation + Pooling Layers (3) + FC Layers (1) NA SVM SGD NA
[a144] DENN Proposed DENN Architecture with ReLU Activation + FC Layers (2) NA Sigmoid mini-batch SGD CCE
[a145] DNN Proposed DNN with ReLU Activation + FC Layers (2) (rate=0.2), (rate=0.4) Sigmoid Adam BCE
TABLE III: Details of deep networks for ASD diagnosis and rehabilitation.
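Binary cross-entropy (BCE) is one of the most frequent entries in the loss-function column of Table III, typically paired with a Sigmoid output. The following is a minimal sketch of the standard definition, assuming labels of 1 for ASD and 0 for HC; the clipping constant `eps` and the example values are illustrative assumptions and do not come from any reviewed study.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical labels and sigmoid outputs for four subjects.
labels = [1, 0, 1, 0]
confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
uninformative = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])
```

An uninformative model that always predicts 0.5 scores ln 2 (about 0.693), so values well below that indicate that the network has learned something about the labels.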