ASD is a disorder of the nervous system that affects the brain, resulting in difficulties with speech, deficits in social interaction and communication, repetitive behaviors, and delays in motor abilities [1]. The disorder can generally be identified with existing diagnostic protocols from the age of three onwards. Autism affects many parts of the brain, and genetic influences are also involved via gene interactions or polymorphisms [2, 3]. One in 70 children worldwide is affected by autism. In 2018, the prevalence of ASD in the United States was estimated at 168 out of 10,000 children, one of the highest rates worldwide. Autism is significantly more common in boys than in girls: in the United States, about 3.63 percent of boys aged 3 to 17 have autism spectrum disorder, compared with approximately 1.25 percent of girls [4].
Diagnosing ASD is difficult because there is no pathophysiological marker; diagnosis relies instead on psychological criteria [5]. Psychological tools can identify individual behaviors and levels of social interaction, facilitating early diagnosis. Behavioral evaluations embrace various instruments and questionnaires that assist the physician in specifying the particular type of delay in a child’s development, including clinical observations, medical history, autism diagnostic guidelines, and growth and intelligence tests [6].
Several investigations for the diagnosis of ASD have recently been conducted on neuroimaging data (structural and functional).
Analyzing the anatomical connections of brain areas with structural neuroimaging is an essential tool for studying structural disorders of the brain in ASD. The principal tools for structural brain imaging are magnetic resonance imaging (MRI) techniques [7, a8, a9]. Cerebral anatomy is characterized by structural MRI (sMRI) images, and anatomical connections are assessed by diffusion tensor imaging MRI (DTI-MR) [a10]. Investigating the activity and functional connections of brain areas using functional neuroimaging can also be used to study ASD. Functional diagnostic tools are older than the two structural methods above for studying ASD. The most basic functional neuroimaging modality is electroencephalography (EEG), which records the electrical activity of the brain from the surface of the head with high temporal resolution (on the order of milliseconds) [a11]. Studies employing EEG signals in ASD have been useful [a12, a13, a14]. Functional MRI (fMRI) is one of the more promising imaging modalities for functional brain disorders, used either as task-based fMRI (T-fMRI) or resting-state functional MRI (rs-fMRI) [a15, a16]. fMRI-based techniques have a high spatial resolution (on the order of millimeters) but a low temporal resolution, owing to the hemodynamic response of the brain as well as fMRI acquisition time constraints, and are therefore not ideal for recording the fast dynamics of brain activity.
In addition, these procedures are highly sensitive to motion artifacts. It should be stressed that, according to studies, three less prevalent modalities, electrocorticography (ECoG) [a17], functional near-infrared spectroscopy (fNIRS) [a18], and magnetoencephalography (MEG) [a19], can also attain reasonable performance in ASD. An appropriate approach is to utilize machine-learning techniques alongside functional and structural data to assist physicians in accurately assessing ASD. In the field of ASD, applied machine learning methods generally fall into two categories: traditional methods [a20] and DL methods [a21]. Compared with traditional methods, much less work has been done on DL methods to explore ASD or design rehabilitation tools.
This study reviews ASD assessment methods and patients’ rehabilitation with DL networks. The outline of this paper is as follows. Section 2 describes the search strategy. Section 3 concisely presents the DL networks employed in the field of ASD. In Section 4, existing computer-aided diagnosis systems (CADS) using brain functional and structural data are reviewed. In Section 5, DL-based rehabilitation tools are introduced to support ASD patients. Section 6 discusses the reviewed papers. Section 7 presents the challenges of ASD diagnosis and rehabilitation with DL. Finally, the paper concludes and suggests future work in Section 8.
II Search Strategy
In this review, IEEE Xplore, ScienceDirect, SpringerLink, ACM, as well as other conference and journal sources were used to acquire papers on ASD diagnosis using DL methods. Further, the keywords “ASD”, “Autism Spectrum Disorder”, and “Deep Learning” were used to select the papers. Papers published up to June 3, 2020 were analyzed by the authors (AK, SN). Figure 1 depicts the number of considered papers using DL methods for the automated detection of ASD each year.
III Deep Learning Techniques for ASD Diagnosis and Rehabilitation
Nowadays, DL algorithms are used in many areas of medicine, including structural and functional neuroimaging. Applications of DL in neuroimaging range from brain MR image segmentation [a22] and detection of brain lesions such as tumors [a23] to diagnosis of brain functional disorders such as ASD [a65] and generation of artificial structural or functional brain images [a24]. Machine learning techniques fall into three fundamental categories of learning: supervised learning [a25], unsupervised learning [a26], and reinforcement learning [a27], and a variety of DL networks are provided for each type. So far, most studies applying DL to identify ASD have been based on supervised or unsupervised approaches. Figure 2 illustrates the families of supervised and unsupervised DL networks generally employed to study ASD.
IV CADS-Based Deep Learning Techniques for ASD Diagnosis by Neuroimaging Data
A traditional artificial intelligence (AI)-based CADS encompasses several stages: data acquisition, data pre-processing, feature extraction, and classification [a28, a29, a30, a146]. In the investigations [a31, a32, a33], existing traditional algorithms have been used for diagnosing ASD. In DL-based CADS, however, feature extraction and classification are performed intelligently within the model. Also, due to the structure of DL networks, large datasets are required to train them and recognize intricate patterns. The components of DL-based CADS for ASD detection are shown in Figure 3. As the figure shows, large, freely available databases are first introduced for diagnosing ASD. In the second step, various types of pre-processing techniques are applied to the functional and structural data under scrutiny. Finally, the DL networks are applied to the preprocessed data.
IV-A Neuroimaging ASD Datasets
Datasets are fed as input to the development of CADS, and the power of a CADS depends primarily on the richness of the input data. To diagnose ASD, various brain functional and structural datasets are available. The most complete free dataset available is the ABIDE [a34] dataset, with two subsets, ABIDE-I and ABIDE-II, which encompass sMRI, rs-fMRI, and phenotypic data. ABIDE-I involves data from 17 international sites, yielding a total of 1112 datasets, including 539 from individuals with ASD and 573 from healthy individuals (ages 7-64). In accordance with HIPAA guidelines and 1000 FCP / INDI protocols, these data are anonymized. In contrast, ABIDE-II contains data from 19 international sites, with a total of 1114 datasets from 521 individuals with ASD and 593 healthy individuals (ages 5-64). Also, preprocessed images of the ABIDE-I series, called PCP [a35], can be freely downloaded by researchers. The second recently released ASD diagnostic database is called NDAR, which comprises several modalities; more information is provided in [a36].
IV-B Preprocessing Techniques
Neuroimaging data (especially functional data) have a relatively complicated structure, and improper pre-processing may affect the final diagnosis. Preprocessing of these data typically entails multiple common steps performed by different software packages as standard. Indeed, ready-made pipelines are occasionally applied to the dataset to yield pre-processed data for future research. In the following section, the preprocessing steps for fMRI data are briefly explained.
IV-B1 Standard (Low-Level) fMRI Preprocessing Steps
Low-level pre-processing of fMRI images normally involves a fixed set of steps applied to the data, and ready-made toolboxes are usually used to reduce execution time and yield better accuracy. Reputable toolboxes include the FMRIB software library (FSL) [a37], BET [a38], FreeSurfer [a39], and SPM [a40]. The important and vital fMRI preprocessing steps comprise brain extraction, spatial smoothing, temporal filtering, motion correction, slice timing correction, intensity normalization, and registration to a standard atlas, which are summarized below.
Brain extraction: the goal is to remove the skull and other non-brain tissue from the fMRI image and retain the brain tissue [a41, a42, a43].
Spatial smoothing: involves averaging the signals of adjacent voxels. This step is justified because neighboring brain voxels are usually closely related in function and blood supply [a41, a42, a43].
Temporal filtering: the aim is to eliminate unwanted components from the time series of voxels without impairing the signal of interest [a41, a42, a43].
Realignment (motion correction): during the fMRI test, people often move their heads. The objective of motion correction is to align all images to a reference image so that the coordinates and orientation of the voxels are identical in all fMRI volumetric images [a41, a42, a43].
Slice timing correction: the purpose is to adjust the time series of the voxels so that all the voxels in each fMRI volume have a common reference time. Usually, the recording time of the first slice in each fMRI volume is selected as the reference [a41, a42, a43].
Intensity normalization: at this stage, the average intensity of the fMRI signals is rescaled to compensate for global deviations within and between recording sessions [a41, a42, a43].
Registration to a standard atlas: the human brain comprises hundreds of cortical and subcortical areas with varying structures and functions, each of which is very time-consuming and complex to study. To overcome this problem, brain atlases are employed to partition brain images into a limited number of ROIs, after which the mean time series of each ROI can be extracted [a44]. ABIDE datasets employ a variety of atlases, including Automated Anatomical Labeling (AAL) [a45], Eickhoff-Zilles (EZ) [a46], Harvard-Oxford (HO) [a47], Talairach and Tournoux (TT) [a48], Dosenbach 160 [a49], Craddock 200 (CC200) [a50], and Craddock 400 (CC400) [a51]; more information is provided in [a52]. Table I provides complete information on preprocessing tools, atlases, and other preprocessing details.
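As a simple illustration of atlas-based ROI extraction, the sketch below averages the voxel time series inside each labeled region of a toy 4D volume. This is a minimal NumPy sketch under assumed array shapes, not the implementation used in any reviewed study; in practice, toolboxes such as FSL or nilearn perform this step on real NIfTI images.

```python
import numpy as np

def roi_mean_time_series(fmri_4d, atlas_3d):
    """Average the voxel time series within each atlas ROI.

    fmri_4d  : (x, y, z, t) array of BOLD values
    atlas_3d : (x, y, z) integer label volume; 0 = background
    returns  : (n_rois, t) array of ROI-mean time series
    """
    labels = np.unique(atlas_3d)
    labels = labels[labels != 0]  # drop the background label
    # boolean-mask the 4D volume with the 3D label mask, then average voxels
    return np.stack([fmri_4d[atlas_3d == lab].mean(axis=0) for lab in labels])

# toy example: a 4x4x4 volume with 10 time points and 2 hypothetical ROIs
rng = np.random.default_rng(0)
fmri = rng.standard_normal((4, 4, 4, 10))
atlas = np.zeros((4, 4, 4), dtype=int)
atlas[:2] = 1
atlas[2:] = 2
ts = roi_mean_time_series(fmri, atlas)
print(ts.shape)  # (2, 10)
```

The resulting (n_rois, t) matrix is the typical starting point for the functional connectivity computations described below.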
IV-B2 Pipeline Methods
Pipelines provide preprocessed images of the ABIDE databases. They embrace generic pre-processing procedures, and employing them allows distinct methods to be compared with each other. In the ABIDE datasets, pre-processing is performed by four pipeline techniques: the neuroimaging analysis kit (NIAK) [a53], the data processing assistant for rs-fMRI (DPARSF) [a54], the configurable pipeline for the analysis of connectomes (CPAC) [a55], and the connectome computation system (CCS) [a56]. The preprocessing steps carried out by the various pipelines are comparatively analogous; the chief differences lie in the particular algorithms for each step, the software implementations, and the parameters applied. Details of each pipeline technique are provided in [a52]. Table I lists the pipeline techniques used in autism detection investigations exploiting DL.
IV-B3 High-Level Preprocessing Steps
High-level techniques for pre-processing brain data are important, and using them alongside the standard pre-processing methods can enhance the accuracy of ASD recognition. These methods are applied after the standard pre-processing of functional and structural brain data. They include the sliding window (SW) [a65], data augmentation (DA) [a68], the functional connectivity matrix (FCM) [a92, a93], and the fast Fourier transform (FFT) [a78]. Furthermore, some research utilized feature extraction [a106] techniques and other feature selection methods. Precise information on the surveyed studies is given in Table I.
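The FCM step mentioned above can be sketched in a few lines: the Pearson correlation between every pair of ROI time series forms the matrix, and many of the surveyed studies then flatten its upper triangle into a feature vector. This is a generic NumPy sketch, not the exact computation of any particular paper; the ROI count (200) is only an example.

```python
import numpy as np

def functional_connectivity(ts):
    """Pearson-correlation FCM from ROI time series.

    ts : (n_rois, t) array of ROI-mean time series
    Returns the (n_rois, n_rois) correlation matrix and the flattened
    upper triangle (diagonal excluded), a common DL input vector.
    """
    fcm = np.corrcoef(ts)                     # pairwise Pearson correlations
    iu = np.triu_indices_from(fcm, k=1)       # indices above the diagonal
    return fcm, fcm[iu]

rng = np.random.default_rng(1)
fcm, vec = functional_connectivity(rng.standard_normal((200, 120)))
print(fcm.shape, vec.shape)  # (200, 200) (19900,)
```

For 200 ROIs the upper triangle yields 200 × 199 / 2 = 19900 features, which is why dimensionality reduction or feature selection often follows this step.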
IV-C Deep Neural Networks
Deep learning has become extremely popular in various medical applications in recent years, including the diagnosis of ASD. In this section, the types of DL networks used in ASD detection are examined, including CNN, RNN, AE, DBN, CNN-RNN, and CNN-AE models.
IV-C1 Convolutional Neural Networks (CNNs)
In this discussion, the types of popular convolutional networks used in ASD diagnosis are surveyed. These include 1D-CNN, 2D-CNN, and 3D-CNN models, and a variety of pre-trained networks such as VGG.
1D and 2D-CNN
Many spatial dependencies are present in the data, and it is difficult to extract these hidden signatures directly. A convolutional network uses convolution filters to extract such features properly; it builds in the knowledge that features should be processed with their spatial dependencies taken into account, and the number of network parameters is significantly reduced. The principal application of these networks is image processing; because image inputs are two-dimensional (2D), the convolution layers take a 2D structure, which is why these networks are called 2D convolutional neural networks (2D-CNNs). For one-dimensional signals, the convolution layers likewise take the structure of the data (1D-CNN) [a57]. In convolutional networks, assuming that different data sections do not require learning different filters, the number of parameters is markedly lessened, making it feasible to train these networks with more limited databases [a21]. Figure 4 shows the block diagram of a 2D-CNN used for ASD detection.
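The weight sharing described above can be made concrete with a bare-bones 2D convolution. The sketch below (plain NumPy, "valid" padding, a single hypothetical 3 × 3 mean filter) shows that one small kernel is slid over the whole image, so only 9 weights are learned regardless of image size.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (cross-correlation) with one shared kernel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # the SAME 9 weights are applied at every spatial position
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.ones((5, 5))
kernel = np.full((3, 3), 1.0 / 9.0)   # 3x3 mean filter: 9 shared weights
fmap = conv2d_valid(image, kernel)
print(fmap.shape)  # (3, 3)
```

By contrast, a fully connected layer mapping the 25 input pixels to the 9 output values would need 225 independent weights; the shared-kernel structure is what makes CNNs trainable on more limited databases.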
3D-CNN
By transforming the data into three dimensions, the convolutional network is likewise altered to a three-dimensional format (Figure 5). It should be noted that three-dimensional CNN (3D-CNN) networks are used less than 1D-CNN and 2D-CNN networks for several reasons. First, the data required to train these networks must be much larger, and such datasets are conventionally unavailable; methods such as pre-training, which are extensively exploited in 2D networks, cannot be used here. Another reason is that with the more complicated structure of these networks, it becomes much harder to determine the number of layers and the network hyperparameters. The 3D activation map generated during the convolution of a 3D-CNN is essential for analyzing data where volumetric or temporal context is crucial. This ability to analyze a series of frames or images in context has led to the use of 3D-CNNs as tools for action detection and evaluation of medical imaging [a58].
IV-C2 Deep Belief Networks (DBNs)
Although DBNs are not as popular today as they used to be, having been substituted by newer models for various applications (autoencoders for unsupervised learning, generative adversarial networks (GANs) for generative models [a59], variational autoencoders (VAEs) [a60]), their influence on the advancement of neural networks cannot be overlooked. In the papers reviewed here, these networks are used for unsupervised feature extraction or for pre-training networks. They operate unsupervised and consist of several layers after the input layer, as shown in Figure 6. Training is done greedily from bottom to top; in other words, each layer is trained separately and then the next layer is appended. After training, these networks are used as a feature extraction method or as a network with trained weights [a21].
IV-C3 Autoencoders (AEs)
Autoencoders (AEs) are more than 30 years old and have undergone dramatic changes over the years to enhance their performance, but their overall structure has remained the same [a21]. These networks consist of two parts, an encoder and a decoder: the encoder maps the input to a code in the latent space, and the decoder endeavors to reconstruct the original data from that code (Figure 7). Autoencoders are a special type of feedforward neural network where the input is the same as the output. They compress the input into a lower-dimensional code and then reconstruct the output from this representation. The code is a compact “summary” or “compression” of the input, also called the latent-space representation. Various methods have been proposed to prevent the network from simply memorizing the data, including the sparse AE (SpAE) and the denoising AE (DAE) [a21]. If the autoencoder is properly trained, the encoder layer can extract the features in the unsupervised pre-training of this type of network.
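The encode-compress-reconstruct cycle can be demonstrated with a deliberately tiny linear autoencoder trained by plain gradient descent. All sizes here (50 samples, 8 features, a 3-dimensional latent code) are toy assumptions; real AEs for ASD data use nonlinear layers and far larger inputs.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((50, 8))          # toy data: 50 samples, 8 features
W1 = 0.1 * rng.standard_normal((8, 3))    # encoder weights: 8 -> 3 (latent)
W2 = 0.1 * rng.standard_normal((3, 8))    # decoder weights: 3 -> 8

def forward(X, W1, W2):
    code = X @ W1            # latent-space representation ("summary")
    recon = code @ W2        # reconstruction of the input
    return code, recon

_, recon = forward(X, W1, W2)
loss0 = np.mean((recon - X) ** 2)          # initial reconstruction error

lr = 0.05
for _ in range(300):
    code, recon = forward(X, W1, W2)
    grad_recon = 2.0 * (recon - X) / X.size   # d(MSE)/d(recon)
    W2 -= lr * code.T @ grad_recon            # decoder update
    W1 -= lr * X.T @ (grad_recon @ W2.T)      # encoder update

_, recon = forward(X, W1, W2)
loss1 = np.mean((recon - X) ** 2)
print(loss1 < loss0)  # reconstruction error shrinks with training
```

After training, the encoder half (here just `X @ W1`) can be kept as an unsupervised feature extractor, which is exactly how several of the reviewed CADS use AEs.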
IV-C4 Recurrent Neural Networks (RNNs)
Convolutional networks address a kind of spatial dependency in the data, but interdependencies in data are not confined to this model. In time series, for example, dependencies may be highly distant from each other; moreover, the long-term and variable length of these sequences means that ordinary networks do not perform well enough on such data. To overcome these problems, RNN networks can be used. LSTM structures are proposed to extract long-term and short-term dependencies in the data (Figure 8). Another well-known structure, the GRU, was developed after the LSTM, and since then most efforts have been made to enhance these two structures and make them resistant to challenges (e.g., GRU-D [a61] is used to handle missing data).
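The long- and short-term mechanism of the LSTM comes from its gated cell update. The sketch below implements a single standard LSTM step in NumPy and runs it over a toy 10-step sequence; the input size, hidden size, and random weights are illustrative assumptions, not taken from any reviewed model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4h, d), U: (4h, h), b: (4h,), stacked [i, f, o, g]."""
    z = W @ x + U @ h + b
    hs = h.shape[0]
    i = sigmoid(z[0:hs])           # input gate: what to write to memory
    f = sigmoid(z[hs:2 * hs])      # forget gate: what long-term memory to keep
    o = sigmoid(z[2 * hs:3 * hs])  # output gate: what to expose as output
    g = np.tanh(z[3 * hs:4 * hs])  # candidate cell content
    c_new = f * c + i * g          # long-term (cell) state update
    h_new = o * np.tanh(c_new)     # short-term (hidden) state
    return h_new, c_new

rng = np.random.default_rng(3)
d, hs = 5, 4                       # toy input and hidden sizes
W = rng.standard_normal((4 * hs, d))
U = rng.standard_normal((4 * hs, hs))
b = np.zeros(4 * hs)
h, c = np.zeros(hs), np.zeros(hs)
for t in range(10):                # unroll over a 10-step toy sequence
    h, c = lstm_step(rng.standard_normal(d), h, c, W, U, b)
print(h.shape)  # (4,)
```

The additive `f * c + i * g` update is what lets gradients flow across distant time steps, which ordinary RNN cells struggle with.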
IV-C5 CNN-RNNs
The initial idea in these networks was to utilize convolution layers to improve the performance of RNNs so that the advantages of both networks can be combined: CNN-RNN, on the one hand, makes it possible to capture temporal dependencies with the help of the RNN and, on the other hand, captures spatial dependencies in the data with the help of the convolution layers [a62]. These networks are highly beneficial for analyzing time series with more than one dimension (such as video) [a63]. Beyond that simpler case, these networks also allow the analysis of three-dimensional data, so that instead of a more complex 3D-CNN design, a 2D-CNN combined with an RNN is occasionally used. The superiority of this model is due to the feasibility of employing pre-trained models. Figure 9 demonstrates the CNN-RNN model.
IV-C6 CNN-AEs
In the construction of these networks, the principal aim and prerequisite has been to decrease the number of parameters. Using convolution layers alone markedly lessens the number of parameters, and combining an AE with convolutional structures makes a further significant contribution. This helps to exploit higher-dimensional data and extract more information from the data without changing the size of the database. Similar structures, with or without some modification, are widely deployed in image segmentation [a64], and such unsupervised networks can likewise be applied for network pre-training or feature extraction. Figure 10 depicts the CNN-AE network used for ASD detection. Tables I and II summarize the papers published on the detection and rehabilitation of ASD patients using DL, respectively.
V Deep Learning Techniques for ASD Rehabilitation
Rehabilitation tools are employed in multiple fields of medicine, the main purpose being to help patients recover after treatment. Various rehabilitation tools using DL algorithms have been presented. Tools used to help ASD patients include mobile and computer applications, robotic devices, cloud systems, and eye tracking, which are discussed below. A summary of papers published on the rehabilitation of ASD patients using DL algorithms is also shown in Table II.
V-A Mobile and Software Applications
Facial expressions are a key mode of non-verbal communication in children with ASD and play a pivotal role in social interactions. The use of BCI systems provides insight into the user’s inner emotional state. Valles et al. [a126] conducted research focused on mobile software design to provide assistance to children with ASD. They aimed to design a smart iOS app based on facial images, as shown in Figure 11. People’s faces at different angles and brightness levels are first photographed and then turned into various emoji so that the autistic child can express his or her feelings and emotions. In the group’s major investigation [a126], Kaggle’s FER2013 (Facial Expression Recognition 2013) and KDEF (Karolinska Directed Emotional Faces) databases were used to train a VGG-16 network. In addition, the LEAP system at the University of Texas was adapted to train the model. The research reported a highest accuracy of 86.44%. In another similar study, they achieved an accuracy of 78.32% [a124].
V-B Cloud Systems
Mohammadian et al. [a143] proposed a new application of DL to facilitate automatic stereotypical motor movement (SMM) identification by applying multi-axis inertial measurement units (IMUs). They applied a CNN to transform the multi-sensor time series into a feature space. An LSTM network is then combined with the CNN to capture the temporal patterns in SMM identification. Finally, they employed a classifier-selection voting approach to combine an ensemble of the best base learners. After various experiments, the superiority of their proposed procedure over other base methods was demonstrated. Figure 12 shows the real-time SMM detection system. First, IMUs, which are wearable sensors, are used for data collection; the data can then be analyzed locally or remotely (using Wi-Fi to transfer data to tablets, cell phones, medical center servers, etc.) to identify SMMs. If abnormal movements are detected, an alarm is sent to a therapist or the parents.
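The final combination step in such ensembles can be as simple as a majority vote over the base learners' binary predictions. The sketch below is a plain majority vote in NumPy, a simplification of the classifier-selection voting used in [a143]; the prediction matrix is a made-up example.

```python
import numpy as np

def majority_vote(predictions):
    """Combine binary predictions from several base learners by majority vote.

    predictions : (n_models, n_samples) array of 0/1 labels
    returns     : (n_samples,) array of fused 0/1 labels
    """
    votes = predictions.sum(axis=0)                       # yes-votes per sample
    return (votes > predictions.shape[0] / 2).astype(int)  # strict majority

# three hypothetical base learners, four samples
preds = np.array([[1, 0, 1, 1],
                  [1, 0, 0, 1],
                  [0, 1, 1, 1]])
print(majority_vote(preds))  # [1 0 1 1]
```

Classifier-selection voting additionally restricts the vote to the best-performing base learners per region of the input space; the fusion rule itself remains a vote like the one above.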
V-C Eye Tracking
Wu et al. [a136] proposed a DL saliency-prediction model for autistic children. They used a DCN in their proposed paradigm, with a saliency map (SM) as output. The fixation density map (FDM) is then processed by single-side clipping (SSC), serving as the true label alongside the SM output to optimize the proposed loss function. Finally, they exploited an autism eye-tracking dataset to test the model. Their proposed model outperformed other base methods. Elbattah et al. [a138] aimed to employ unsupervised machine learning to detect clusters in ASD. Their key goal was to learn clusters of visual representations of eye-tracking scan paths. The first step involved the visualization of the eye-tracking path, and the images captured from this step were fed to an autoencoder to learn features. Using the autoencoder features, clustering models were developed with the K-Means algorithm. Their method performed better than other state-of-the-art techniques.
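The clustering step in [a138] can be sketched with a plain K-Means loop on feature vectors standing in for autoencoder latent codes. This is a generic NumPy sketch with made-up, well-separated toy data, not the authors' pipeline (which clusters real autoencoder features of scan-path images).

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain K-Means on feature vectors (e.g., autoencoder latent codes)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# two well-separated toy "latent" clusters of 30 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
labels, _ = kmeans(X, 2)
print(labels.shape)
```

On separated data like this, each toy cluster ends up with a single label, mirroring how scan-path clusters are recovered from the learned features.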
|Work||Datasets||Neuroimaging Modalities||Number of Cases||Pipelines||Image Atlas||Preprocessing Toolbox||High-Level Preprocessing||DNN Inputs||DNN Toolbox||DNNs||Number of Layers||Classifier||K-fold||Performance Criteria (%)|
|[a65]||Clinical acquisition||T-fMRI, residual fMRI||82 ASD 48 HC||NA||MNI152||BET, FSL||SW||single mean channel input, single std channel input, combined 2-channel input||NA||2CC3D||17||Majority Voting||No||F1-Score = 89|
|[a66]||Clinical acquisition||T-fMRI||82 ASD, 48 HC||NA||AAL||NA||SVE, C-SVE, H-SVE, Monte Carlo Approximation||Mean channel sequence, STD channel||NA||2CC3D||14||Sigmoid||NA||Acc = 97.32|
|[a67]||HCP dataset in the HAFNI project||T-fMRI||68 subjects with 7 tasks and 1 rs-fMRI data||NA||NA||FSL||dictionary learning and sparse coding||functional RNSs maps||NA||3D-CNN||8||Softmax||NA||Acc = 94.61|
|[a68]||Clinical acquisition||T-fMRI(T1-weighted MP-RAGE s-MRI, BOLD T2*-weighted fMRI sequence)||21 ASD 19 HC||NA||AAL||FSL||DA||ROIs time-series||Keras||LSTM||7||Sigmoid||10||Acc=69.8|
|[a69]||Different datasets||T-fMRI||1711 ASD 15903 HC||A||AAL||SPM||Wavelet and different Techniques||FCMs||Keras||CNN||14||Softmax||NA||Ensemble AUROC=0.92 Ensemble Acc=85.19|
|[a70]||Clinical acquisition||T-fMRI||82 ASD, 48 HC||NA||AAL||Neurosynth||SW, corrupting strategy||Original fMRI sequence, mean-channel sequence||NA||2CC3D||16||Sigmoid||NA||Acc= 87.1|
|[a70]||ABIDE-I||rs-fMRI||41 ASD 54 HC||NA||AAL||FSL||prediction distribution analysis, corrupting a ROI of the original image||std-channel sequence||NA||2CC3D||16||Sigmoid||NA||Acc= 85.3|
|[a71]||ABIDE-I, ABIDE-II||rs-fMRI||379 ASD, 395 HC; 163 ASD, 230 HC||CPAC||All atlases ABIDE||NA||Connectivity matrix calculation||concatenating voxel-level maps of connectivity fingerprints||NA||3D-CNN||7||Sigmoid||10||Acc= 73.3|
|[a72]||ABIDE-I||rs-fMRI||505 ASD 530 HC||CPAC||CC-200||NA||FCM, DA||Masking correlations||PyTorch||AE||NA||SLP||10||Acc=70.1 Sen=67.8 Spec=72.8|
|[a73]||ABIDE-I||rs-fMRI||872 subjects||CPAC||HO||Nilearn||NA||Raw images||NA||G-CNNs||5||Softmax||10||Acc=70.86|
|[a74]||ABIDE-I||rs-fMRI||474 ASD 539 HC||CCS||AAL||NA||FCM||Functional connectomes||NA||BrainNetCNN with proposed layers||15||Softmax||5||Acc= 68.7 Sen= 69.2 Spe= 68.3|
|[a75]||ABIDE-I||rs-fMRI||13 ASD 22 HC||NA||AAL||SPM8||Qcut, NMI statistic matrix||Pearson Correlation coefficient Matrix||NA||DAE||NA||NA||NA||Acc=54.49|
|[a76]||ABIDE||rs-fMRI||11 ASD 16 HC||NA||NA||FSL||convert NII files to PNG images||preprocessed PNG images||Caffe||LeNet-5||Standard||Softmax||NA||Acc=100 Sen=99.99 Spec=100|
|[a77]||ABIDE-I||rs-fMRI||55 ASD 55 HC||NIAK||AAL||NA||FCM, feature selection||whole-brain FCPs||NA||Multiple SAEs||4||Softmax regression||5||Acc=86.36|
|[a78]||ABIDE-I, ABIDE-I + II||rs-fMRI||54 ASD 62 HC; 156 ASD 187 HC||NA||NA||SPM8||Dimension Reduction, FFT||images with 95 × 68, 79 × 68, and 79 × 95 dimensions, around the x, y, and z axes||Keras with Theano backend||MCNNEs||9||Binary SR||10||Acc=72.73 Sen=71.2 Spec=73.48|
|[a79]||ABIDE||rs-fMRI||542 ASD 625 HC||CPAC||All atlases||NA||Creating stochastic parcellations by Poisson Disk Sampling||gray matter mask parcellations||NA||3D-CNN||6||Various Methods||10||Acc=72|
|[a80]||ABIDE-I||rs-fMRI||465 ASD 507 HC||DPARSF||AAL||NA||FCM||edge weights of subjects’ brain graph||Keras||VAE||3||NA||NA||NA|
|[a81]||ABIDE-I||rs-fMRI||539 ASD 573 HC||CCS||Craddock 200||Neurosynth||DA||mean time courses from ROIs||Keras||LSTM||5||Sigmoid||10||Acc=68.5|
|[a82]||ABIDE||Rs-fMRI, phenotypic info||505 ASD 530 HC||NA||CC200||DPABI||Slicetiming, spatial standardization, smoothing, filtering, removing covariates, FCM, AE-MKFC||4005-dimensional eigenvector||NA||SAE||3||Clustering||NA||Acc=61 NMI=3.7 F-measure=60.2|
|[a83]||ABIDE||rs-fMRI||42 ASD 42 HC||NA||NA||FSL||Independent components (time course, power spectrum and spatial map)||time courses of each subject||NA||SAE||9||Softmax||21||Acc=87.21 Sen=89.49 Spec=83.73|
|[a84]||ABIDE-I||rs-fMRI||NY site||CCS||AAL||Neurosynth||DA||fMRI ROI time-series, functional communities||Keras||LSTM||6||Sigmoid||10||Acc=74.8|
|[a85]||ABIDE-I||rs-fMRI||408 ASD 401 HC||CPAC||HO||FSL||NA||3 different FCM+ Demographic data||Keras||DANN||25||Sigmoid||10||Acc=73.2 Sen=74.5 Spec=71.7|
|[a86]||ABIDE||rs-fMRI||at least 60 subjects||CCS||AAL||FMRIB’s linear & nonlinear image registration tools||DTL-NN framework: offline learning, Transfer Learning FCM using Pearson’s correlation||FC patterns||NA||SSAE||4||Softmax regression||5||Avg Acc= 67.1 Avg Sen=65.7 Avg spec=68.3 AUC=0.71|
|[a87]||ABIDE I+II||rs-fMRI||993 ASD 1092 HC||NA||AAL||FAST||NA||Mean Time-Series within each ROI||NA||1D-CNN||5||Softmax||10||Acc=68|
|[a88]||ABIDE-I||rs-fMRI||529 ASD 573 HC||All pipelines||NA||NA||Single Volume Image Generator||Glass Brain and Stat Map Images||Keras||4 Deep Ensemble Classifier techniques (CNN)||16||Sigmoid||NA||Acc=87 F1-score=86 Recall=85.2 Precision=86.8|
|[a89]||ABIDE-II||rs-fMRI||303 ASD, 390 HC||NA||NA||FSL||NA||1D time series from voxels||NA||1D-CAE||14||NA||NA||Acc= 65.3|
|[a90]||ABIDE||rs-fMRI||40 ASD, 40 HC||CCS||NA||NA||Threshold segmentation||WM, GM, CSF||NA||AlexNet||Standard||Softmax||NA||Acc=82.61|
|[a91]||ABIDE||rs-fMRI||Whole dataset||All pipelines||parcellated into 200 regions||NA||DA using SMOTE and graph network motifs, FCM calculation||upper triangle part of the correlation matrix||NA||ASD-DiagNet||Proposed||SLP||NA||Acc=82 Sen=79.1 Spec=83.3|
|[a92]||ABIDE-I||rs-fMRI||12 ASD 14 HC||C-PAC||SCSC||NA||Time series extraction from different regions, connectivity matrix, SMOTE algorithm||FCM||PyTorch||Auto-ASD-Network||Proposed||SVM||5||Acc=80 Sen=73 Spec=83|
|[a93]||ABIDE-I||rs-fMRI||505 ASD 530 HC||CPAC||CC400||NA||FCM computation||FCM||NA||CNN||20||MLP||10||Acc=70.20 Sen=77.00 Spec=61.00|
|[a94]||ABIDE||rs-fMRI||505 ASD 530 HC||NA||NA||NA||Connectivity matrix||matrix flattened to one dimension||NA||1D CNN-AE||7||Softmax||NA||Acc=70 Sen=74 Spec=63|
|[a95]||ABIDE-I||rs-fMRI||539 ASD 573 HC||NA||NA||FreeSurfer||NA||single 3D image||Theano||3D-FCNN||13||Softmax||6||Mean DSC=91.56 Mean MHD=14.05|
|[a96]||ABIDE-I||rs-fMRI||501 ASD 553 HC||DPARSF||AAL||NA||converting FCM to one dimensional vector||1000 features selected by the SVM-RFE||NA||SSAE||3||Softmax||different folds||Acc=93.59 Sen=92.52 Spec=94.56|
|[a97]||ABIDE-I||rs-fMRI||100 ASD 100 HC||NA||NA||FSL||online dictionary learning and sparse representation techniques, generating spatial overlap patterns||4D matrix with 150 3D network overlap maps||Theano||3D-CNN||14||NA||10||Average Acc= 70.5 Average Sen= 74 Average Spec= 67|
|[a98]||ABIDE-I||rs-fMRI & phenotypic||529 ASD 571 HC||CPAC||HO||FSL||Population graph construction, Feature Selection Strategies (RFE, PCA, MLP, AE)||Population graph||Scikit -learn||GCN||11||Softmax||10||Acc=80.0|
|[a99]||ABIDE-I||rs-fMRI & phenotypic||403 ASD 468 HC||CCS||CC200||NA||DA||mean time-series from ROIs||Keras||LSTM||6||Sigmoid||10||Acc=70.1|
|[a100]||ABIDE-I||rs-fMRI & S-MRI & phenotypic (T1 weighted)||505 ASD 530 HC||CPAC||Craddock 200||NA||Flattening Functional connectivity matrix||one-dimension vector||NA||two SdAE + MLP||NA||Softmax||10||Acc=70 Sen=74 Spec=63|
|[a101]||Clinical acquisition (Fetal BOLD fMRI)||rs-fMRI||75 qualified subjects||NA||NA||FSL, Brainsuite||Extraction of fetal brain fMRI data, SW||Mean time series of 3D fMRI volumes||PyTorch||3D-CNN||7||Sigmoid||NA||F1-score=84 AUC=91|
|[a102]||ABIDE-I||rs-fMRI||116 ASD 69 HC||NA||AAL||SPM8||Segmentation, average mean time series of each ROI||Rs-fMRI + GM+WM data fusions||Theano||DBN||6||LR||10||Acc=65.56 Sen=84 Spec=32.96|
|[a103]||IMPAC||rs-fMRI, s-MRI||418 ASD 497 HC||NA||All atlases||NA||flattening FCM from rs-fMRI, feature extraction from s-MRI||FCM vector, combination of both anatomical and connectivity||Keras, TensorFlow, Caffe||Different Networks||8||Various Methods||3||AUC= 80|
|[a104]||ABIDE-I||rs-fMRI||368 ASD 449 HC||CPAC||AAL||Freesurfer||FCM computation and flattening into a 1D vector, Fisher Score||1D vector||NA||ensemble of 5 stacked AEs and MLP for classification||31||label fusion using the average of softmax probabilities||10||Acc= 85.06 Sen= 81 Spec= 89|
|[a105]||NDAR||rs-fMRI||61 ASD 215 HC||NA||NA||NA||data-driven landmark discovery algorithm, Patch Extraction||50 patches extracted from 50 landmarks||NA||Multi-Channel CNN||13||Softmax||10||Acc=76.24|
|[a106]||NDAR||All modalities||78 ASD 124 HC||NA||Proposed atlas||FSL||PICA, PSD feature extraction||PSDs of 34 components||NA||34 SAEs||Each SAE has 2 layers||PSVM||NA||Acc=88.5 Sen=85.1 Spec=90.4|
|[a107]||NDAR||T1-weighted||60 ASD, 211 HC||NA||NA||in-house||3D Patches extraction||Patch size 16×64×16||NA||DDUNET||11||NA||5||NA|
|[a108]||ABIDE-I, NDAR/Pitt, NDAR/IBIS||s-MRI||21 ASD, 21 HC; 16 ASD, 16 HC; 10 ASD, 10 HC||NA||NA||FSL, iBEAT||segmentation, Shape feature extraction||CDF values of features||NA||SNCAE||NA||Softmax||NA||Acc=96.88|
|[a109]||ABIDE-I||s-MRI||78 ASD 104 HC||NA||Destrieux||FreeSurfer||construction of individual …||3000 top features||NA||SAE||3||Softmax||10||Acc=90.39 Sen=84.37 Spec=95.88|
|[a110]||HCP; ABIDE-I||s-MRI||1113 HC; 83 ASD, 105 HC||NA||Desikan–Killiany||FreeSurfer||Normalization, apply one-hot coding||Preprocessed images||TensorFlow||DEA||3||NA||10||AUC-ROC=63.9|
|[a111]||ABIDE||s-MRI||1112 subjects||NA||NA||SPM12||NA||32 slices along each axial, coronal, and sagittal||Keras||DCNN||17||Sigmoid||NA||Acc=84 Sen=77 Spec=85|
|[a112]||ABIDE-II||MRI||NA||NA||DKT||FreeSurfer||segmentation||coronal, axial and sagittal 2D slices||PyTorch||FastSurfer CNN||Proposed||Softmax||NA||NA|
|[a113]||ABIDE-I||MRI||500 ASD 500 HC||NA||HO cortical and subcortical structural atlas||FSL||GABM method, new chromosome encoding scheme||Preprocessed MRI scans||NA||3D-CNN||11||Softmax||5||Acc=70|
|[a114]||Clinical acquisition||MRI||48 HC||NA||NA||FreeSurfer||sparse annotations, DA||image patch||Caffe||3D-CNN||18||Softmax||NA||Acc=91.6 ROC=94.1|
|[a115]||ABIDE-I||rs-fMRI||270 ASD 305 HC||C-PAC||BrainNetome atlas (BNA)||NA||Filtering, calculating mean time series for ROIs using BrainNetome atlas (BNA), normalization||mean time series data stacked across ROIs||NA||CNN-GRU||14||Sigmoid||5||Acc=74.54 Sen=63.46 Spec=84.33|
|[a116]||Clinical acquisition||fNIRS||25 ASD 22 HC||No||No||No||transformation of the time series to three variants||PM, GM, SM||Keras||1D CNN- LSTM||NA||Bagging||NA||Acc=95.7 Sen=97.1 Spec=94.3|
|[a117]||Clinical acquisition||fNIRS||25 ASD 22 HC||No||No||No||SW, conversion into a 3D tensor||3D tensor||NA||CGRNN||7||NA||NA||Acc=92.2 Sen=85.0 Spec=99.4|
|[a118]||Different datasets||MRI||NA||NA||Various Methods||FreeSurfer||geometric DA||3D cortical mask||Theano||ConvNet||U-Nets||NA||8||NA|
|[a119]||ABIDE I+II||rs-fMRI||620 ASD 2085 HC||C-PAC||HO||FSL||performed an automatic quality control, visual inspection, 9 temporal summary measures, mean and STD of the summary measures, normalization, occlusion of brain regions||each summary measure||NA||MM-ensemble (3D-CNN)||7||Majority voting||5||Acc=64 F1-score=66|
|[a120]||ABIDE-I||rs-fMRI||184 ASD 110 HC||C-PAC||NA||NA||Down Sampling||raw 4D volume||NA||3D-CNN C-LSTM||21||Softmax||5||Acc=77 F1-score=78|
|[a121]||ABIDE-I||rs-fMRI and T1-weighted images||403 ASD 468 HC||NA||264 ROIs||AFNI FSL MATLAB||Connectivity matrix, feature extraction (different features)||Normalized features||NA||AE||7||DNN||10||Acc=79.2 AUC=82.4|
|[a122]||ABIDE-I||rs-fMRI||505 ASD 530 HC||C-PAC||CC200||NA||FCM||vector of FC measures||PyTorch||CapsNet||Standard||k-means clustering||10||Acc=71 Sen=73 Spec=66|
|Work||Datasets||Type of Application||Number of Cases||Preprocessing||DNN Inputs||DL Toolbox||DNN||Number of Layers||Classifier||K-fold||Performance Criteria (%)|
|[a123]||OSIE||—||20 ASD 19 HC||HFM Construction, Filtering, Normalizing, DA||HFMs, Natural Scene Images||Caffe||VGGNet||50||Softmax||13||Acc=85 Sen=80 Spec=89|
|[a124]||KDEF||Facial Expression Recognition||70 Individuals||DA||RGB Images (562×762)||Keras||DCNN||44||Softmax||NA||Acc=78.32|
|[a125]||Clinical Acquisition||Detect Audio Regimes That Directly Estimate ASD Severity Social Affect scores||33 ASD||MFCC Spectrograms||32 Spectrograms||NA||Noisemes Network||Standard Network||Synthetic RF||—||Acc=84.7|
|[a126]||Kaggle’s||Facial Expression Recognition||NA||No||48×48-Pixel Images||Keras||DCNN||44||Softmax||NA||Acc=86.44|
|[a127]||SALICON||ASD Classification||14 ASD 14 HC||SalGAN Model, Feature Extraction||Sequence of Image Patches||NA||SP-ASDNet||11||NA||NA||Acc=57.90 Rec=59.21 Pre=56.26|
|[a128]||BigFaceX||Facial Expression Recognition||196 Subjects||SW, Merge in the Channel Dimension, DA||5-channel Sub-Sequence Stacks within a Specific Time Window||Keras||TimeConvNet||PreTrain Nets||Softmax||NA||Acc=97.9|
|[a129]||Different Datasets||Suitable Courseware for Children with ASD||NA||Interactive and Intelligent Chatbot , NLP, Visual Aid||Different Inputs||NA||Different Nets||NA||NA||NA||NA|
|[a130]||Camera Images||Estimating Visual Attention in Robot-Assisted Therapy||6 ASD and ID||Resizing, Frame Extraction, Visual Inspection, Face Detection (Viola–Jones), Feature Extraction (HOG Descriptors)||5 Facial Landmarks - 36 HOG Descriptors||NA||R-CNN, MTCNN||VGG-16, Cascaded CNNs Architecture||K-NN, Naïve Bayes||10||Acc=88.2 Pre=83.3 Sen=83.0 Spec=87.3|
|[a131]||Sensor||Automatic SMM detection||6 ASD||Resampling, Filtering, SW||Time-Series of Multiple Sensors||Keras||CNN-LSTM||13||Majority||NA||NA|
|[a132]||KOMAA||Facial Expression Recognition||55 subjects||Greedy Forward Feature Selection||NA||CNN||9||SVM||NA||Acc=96|
|[a133]||Story-Telling Narrative Corpora||ASD Classification||31 ASD 36 HC||DA, ChineseWord2Vec||32-Dimensional Word Vector||NA||LSTM||1||Coherence Representation of LSTM Forget Gate||NA||Acc=92|
|[a134]||Ext-Dataset (video dataset)||ASD Classification using Eye Tracking||136 ASD 136 HC||TLD Method, Accumulative Histogram Computation||Angle Histogram, Length Histogram and Fused Histogram,||Keras||LSTM||4||NA||10||Acc=92.6|
|[a135]||MIT1003||Predicting Visual Attention of Children with ASD.||300 Images||NA||Raw Images||NA||DCN||26||NA||NA||SIM=67.8|
|[a136]||Scan Path Data||ASD Classification||14 ASD 14 HC||DA Methods||Image, Data Points||PyTorch||ResNet18||Standard||Softmax||NA||Acc=55.13|
|[a137]||UCI Machine Learning Repository||ASD classification||Number of Instances= 704||Different Methods||Preprocessed Data||NA||CNN||7||NA||NA||Acc=99.53 Sens=99.39 Spec=100|
|[a138]||Eye Tracking Scanpath||ASD Classification||29 ASD 30 HC||Visualization of Eye-Tracking Scanpaths, Scaling Down, PCA||100×100 Image||Keras, Scikit-Learn||AE||8||K-Means Clustering||NA||Silhouette score=60|
|[a139]||Video Data||Engagement Estimation of Children with ASD During Robot-Assisted Autism Therapy||30 children||NA||Cropped face images (256×256)||Keras||CultureNet||R-CNN + ResNet50 + 5 FC layers||Softmax||NA||ICC=43.35 CCC=43.18 PC=45.17|
|[a140]||YouTube ASD Dataset||Modeling Typical and Atypical Behaviors in ASD Children||68 video Clips||Different Methods||Sequences of Individual Frames at a Rate of 30 fps||openCV, Caffe||DCNN||NA||DT||5||Avg Pre=73 Avg Recall=75 Avg Acc=71|
|[a141]||Video Dataset||Behavioral Data Extracted from Video Analysis of Child-Robot Interactions.||5 ASD 7 HC||Segmentation, Upper Body tracking, Laban Movement Analysis to Drive Weight, Different features||3 Movement Features with 68 Facial Key-Points||NA||CNN||10||Softmax||NA||Acc=88.46 Pre=89.12 Recall=88.53|
|[a142]||Video Dataset||Developing Automatic SMM Detection Systems||6 ASD||Resampling, Filtering, SW, Data Balancing, Normalizing||Time-Series of Multiple Accelerometer Sensors||Deeppy Library||CNN||8||SVM||NA||F1-score=95|
|[a144]||ASD Screening||Autism Screening||513 ASD 189 HC||Cleaning Missing Values and Outliers, Visualization, Identity Mapping||Embedded Categorical Variables Concatenated with Numerical Features as New Feature Vectors||NA||DENN||4||Sigmoid||NA||Acc=100 Spec=99 Sen=100 F1-score=99|
|[a145]||ASD Screening Datasets||Classification of Adults with ASD||—||Handling of Missing Values, Variable Reduction, Normalization||Normalized Variables||Keras||DNN||7||Sigmoid||NA||Acc=99.40 Sen=97.89|
In this study, we performed a comprehensive overview of the investigations conducted in the scope of ASD diagnostic CADS as well as DL-based rehabilitation tools for ASD patients. In the field of ASD diagnosis, numerous papers have been published using functional and structural data as well as rehabilitation tools, as illustrated in Table III in the appendix. A variety of DL toolboxes have been proposed for implementing deep networks. The types of DL toolboxes utilized in the studies of Tables I and II are depicted in Figure 13. Keras is used in the majority of the studies owing to its simplicity and its consistent, high-level application programming interface (API) for building models.
The number of deep learning networks used for ASD detection in the reviewed works is shown in Figure 14. Among the various DL architectures, the CNN is the most popular, as it has achieved more promising results than other deep methodologies. Autoencoders and RNNs have also yielded favorable results.
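As a toy illustration of the pipeline most of the tabulated studies follow (compute a functional-connectivity matrix from ROI mean time series, flatten it into a 1D feature vector, and feed it to a classifier head), here is a minimal, framework-free sketch. The ROI count, time-series length, and the random, untrained sigmoid head are hypothetical stand-ins for the trained DNN/AE/MLP classifiers in the tables:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for preprocessed rs-fMRI: mean time series for
# 200 ROIs (e.g. a CC200-like parcellation), 150 time points per ROI.
n_rois, n_timepoints = 200, 150
roi_ts = rng.standard_normal((n_rois, n_timepoints))

# Functional-connectivity matrix (FCM): ROI-by-ROI Pearson correlation.
fcm = np.corrcoef(roi_ts)                 # shape (200, 200), symmetric

# Flatten the upper triangle (off-diagonal) into a 1D feature vector,
# as most of the reviewed pipelines do before the classifier.
iu = np.triu_indices(n_rois, k=1)
features = fcm[iu]                        # 200 * 199 / 2 = 19900 features

# Toy classifier head: one sigmoid unit with small random weights.
w = rng.standard_normal(features.shape[0]) * 0.01
p_asd = 1.0 / (1.0 + np.exp(-(features @ w)))  # pseudo-probability of ASD
print(features.shape, float(p_asd))
```

In a real system, `w` would of course be learned (e.g. as the final layer of an AE+MLP or CNN model), but the feature-construction step above is the common denominator of the FCM-based rows in the tables.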
This section addresses some of the most substantial challenges in the scope of ASD diagnosis with DL-based techniques, which comprise database and algorithmic problems. Only two-class brain structural and functional datasets (ASD and healthy) are available in the public domain; hence, researchers are not able to broaden their investigations to other types of ASD disorders. EEG and fNIRS are among the cheapest and most pragmatic functional neuro-screening modalities for ASD diagnosis, but the deficiency of freely available datasets has resulted in little research in this area. Another obstacle is that multi-modality databases such as EEG-fMRI are not available to researchers to evaluate the effectiveness of incorporating information from different imaging modalities to detect ASD. Moreover, although fMRI and sMRI data are ubiquitous in the ABIDE dataset, the results of merging these structural and functional data for ASD diagnosis with DL have not yet been investigated. Another problem confronting researchers is designing DL-based rehabilitation systems with limited hardware resources. Although researchers now have access to assistive tools such as Google Colab to improve processing power, problems still prevail when implementing these systems in real-world scenarios.
VIII. Conclusion and Future Works
ASD is typically characterized by social disorders, communication deficits, and stereotypical behaviors. Numerous computer-aided diagnosis systems and rehabilitation tools have been developed to assist patients with autism disorders. In this survey, research on ASD diagnosis applying DL to functional and structural data was first assessed. Researchers have taken advantage of deep CNNs, RNNs, AEs, and CNN-RNN networks to improve the performance of their systems. The principal challenges of these systems are boosting accuracy, generalizing and adapting to differing data and real-world conditions, and reducing hardware power requirements to the extent that the final system can be utilized by all. To enhance the accuracy and performance of CADS for ASD detection in the future, deep reinforcement learning (RL) or GANs can be exploited. Scarcity of data is always a paramount problem in the medical field, one that can be partially resolved with the help of deep GANs.
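To make the GAN suggestion concrete, the sketch below trains a deliberately tiny adversarial pair on a scarce one-dimensional "clinical" distribution: the generator is an affine map of noise and the discriminator a logistic unit, with hand-derived gradient updates. All numbers (target mean/std, learning rate, step count) are illustrative assumptions, not values from any reviewed work; real ASD data synthesis would use deep generator/discriminator networks over images or connectivity matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Scarce" real data: scalar samples from a hypothetical clinical measure.
def real_batch(n):
    return rng.normal(loc=4.0, scale=1.25, size=(n, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

a, b = 1.0, 0.0   # generator: fake = a * z + b
w, c = 0.1, 0.0   # discriminator: D(x) = sigmoid(w * x + c)
lr = 0.01

for step in range(2000):
    z = rng.normal(size=(64, 1))
    fake = a * z + b
    real = real_batch(64)

    # Discriminator ascent on log D(real) + log(1 - D(fake)).
    pr, pf = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * (np.mean((1 - pr) * real) + np.mean(-pf * fake))
    c += lr * (np.mean(1 - pr) + np.mean(-pf))

    # Generator ascent on log D(fake) (non-saturating loss),
    # chain rule through fake = a * z + b.
    pf = sigmoid(w * fake + c)
    a += lr * np.mean((1 - pf) * w * z)
    b += lr * np.mean((1 - pf) * w)

# Synthetic samples that could augment the scarce training set.
synthetic = a * rng.normal(size=(1000, 1)) + b
print(synthetic.shape, float(synthetic.mean()), float(synthetic.std()))
```

The design point is the alternating updates: the discriminator learns to separate real from generated samples while the generator learns to fool it, which is what lets GAN-based augmentation enlarge small medical datasets.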
Many researchers have proposed various DL-based rehabilitation tools to aid ASD patients. Designing a reliable, accurate, wearable, low-power device based on a DL algorithm is the future tool for ASD patients. One achievable rehabilitation tool is smart glasses to help children with ASD. These glasses, with built-in cameras, would acquire images from different directions of the environment. A DL algorithm would then process these images and produce meaningful output that helps the ASD child communicate better with their surroundings.
Table III shows details about all the works reviewed in this study.
MB is supported by a NHMRC Senior Principal Research Fellowship (1059660 and 1156072). MB has received Grant/Research Support from the NIH, Cooperative Research Centre, Simons Autism Foundation, Cancer Council of Victoria, Stanley Medical Research Foundation, Medical Benefits Fund, National Health and Medical Research Council, Medical Research Futures Fund, Beyond Blue, Rotary Health, A2 milk company, Meat and Livestock Board, Woolworths, Avant and the Harry Windsor Foundation, has been a speaker for Astra Zeneca, Lundbeck, Merck, Pfizer, and served as a consultant to Allergan, Astra Zeneca, Bioadvantex, Bionomics, Collaborative Medicinal Development, Lundbeck Merck, Pfizer and Servier - all unrelated to this work.
|Author||Network||Details for Deep Networks||Dropout||Classifier||Optimizer||Loss function|
|[a65]||2CC3D||CNN Layers (6) + Pooling Layers (4) + FC Layers (2)||2 (rate=0.5)||Sigmoid||NA||BCE|
|[a66]||2CC3D||CNN Layers (6) + Pooling Layers (4) + FC Layers (3)||NA||Sigmoid||NA||NA|
|[a67]||3D-CNN||CNN Layers (2) + LReLU Activation + Pooling Layers (1) + FC Layers (1)||3 (rate=NA)||Softmax||SGD||MNLL|
|[a68]||LSTM||LSTM Layers (1) + Pooling Layers (1) + FC Layers (3)||1 (rate=0.5)||Sigmoid||Adadelta||MSE|
|[a69]||CNN||CNN Layers (2) + ReLU Activation + BN Layers (4) + FC Layers (3)||NA||NA||NA||NA|
|[a70]||2CC3D||CNN Layers (6) + Pooling Layers (4) + FC Layers (2)||2 (rate=NA)||Sigmoid||NA||NA|
|[a71]||CNN||CNN Layers (2) + ELU Activation + Pooling Layers (2) + FC Layers (2)||NA||Sigmoid||SGD||NA|
|[a72]||AE||Standard AE with Tanh Activation||NA||SLP||NA||MSE|
|[a73]||G-CNN||Proposed G-CNN with 3 Layer CNN||(rate=0.3)||Softmax||Adam||NA|
|[a74]||BrainNetCNN with proposed layers||Element-wise layer (1) + E2E layers (2) + E2N layer (1) + N2G layer (1) + FC layers (3) + Leaky ReLU activation + htan activation||5 (rate=0.5), 1 (rate=0.6)||Softmax||Adam||Proposed Loss Function|
|[a75]||DAE||Standard DAE||NA||NA||NA||Proposed Loss function|
|[a76]||LeNet-5||Standard LeNet-5 Architecture||NA||Softmax||NA||NA|
|[a77]||SAE||SAE with LSF Activation||NA||Softmax||LBFGS||NA|
|[a78]||MCNNEs||CNN Layers (3) + ReLU Activation + Pooling Layers (3) + FC Layers (1)||1 (rate=0.5)||Binary SR||Adam||BCE|
|[a79]||3D-CNN||CNN Layers (2) + ELU Activation + Pooling Layers (2) + FC Layers (3)||NA||Sigmoid||SGD||BCE|
|[a80]||VAE||VAE with 3 Layers||NA||NA||Adadelta||Proposed loss function|
|[a81]||LSTM||LSTM Layers (1) + Pooling Layers (1) + FC Layers (1)||1(rate=0.5)||Sigmoid||Adadelta||BCE|
|[a82]||SAE||SAE Layers (3) + Sigmoid Activation||NA||Clustering||Proposed Opt.||NA|
|[a83]||SAE||SAE Layers (8) + Sigmoid Activation||NA||SR||L-BFGS||MSE|
|[a84]||LSTM||LSTM Layers (2) + Pooling Layers (1) + FC Layers (2)||NA||Sigmoid||Adam||BCE|
|[a85]||Multichannel DANN||3 MLP (1 dropout layer and 4 dense layers) + Self-attention (3) + Fusion (3) + Aggregation layer + Dense layer (1) + ReLU, ELU, tanh activations||1 (rate=NA)||Sigmoid||NA||CE|
|[a86]||SSAE||3 SSAE Layers||NA||Softmax||Scaled conjugate gradient||Proposed loss function|
|[a87]||1D-CNN||CNN Layers (1) + Pooling Layers (1) + FC Layers (1)||(rate=0.2)||Softmax||Adam||NA|
|[a88]||CNN||CNN layers (6) + pooling layers (4) + BN layers (2) + FC layers (2)||1 (rate=0.25)||Sigmoid||Adam||Proposed loss function|
|[a89]||1D CAE-CNN||Encoder (4 layers) + Decoder (4 layers) + CNN layers (2) + pooling layers (2) + FC layers (2)||NA||NA||NA||NA|
|[a90]||AlexNet||Standard AlexNet Architecture||NA||Softmax||NA||CE|
|[a93]||CNN||CNN layers (7) + Pooling layers (7) + FC layers (3)||1 (rate=0.25)||MLP||NA||NA|
|[a94]||2 SdAE-CNN||Proposed SDAE-CNN with 7-Layer CNN||NA||Softmax||NA||NA|
|[a95]||3D-FCNN||CNN Layers (9) + PReLU Activation + FC Layers (3)||NA||Softmax||SGD||CE|
|[a96]||SSAE||2 Layers SSAE||NA||Softmax||NA||NA|
|[a97]||3D-CNN||CNN layers (7) + Pooling Layers (3) + FC Layers (2) + log-likelihood activation||2 (rate=0.2)||NA||SGD||MNLL|
|[a98]||GCN + AE||GCN with ReLU and Sigmoid Activation; SAE with Tanh Activation||(rate=0.3)||Softmax||NA||CE, MSE|
|[a99]||LSTM||Proposed Deep Network||(rate=0.5)||Sigmoid||Adadelta||BCE|
|[a100]||2 SdAE-MLP||Proposed 2-SDAE-MLP Network||NA||Softmax||NA||MSE|
|[a101]||3D-CNN||CNN Layers (2) + ReLU Activation + Pooling Layers (2) + FC Layers (2)||NA||Sigmoid||SGD||BCE|
|[a102]||DBN||DBN with 5 Hidden Layers||NA||LR||NA||NA|
|[a103]||FeedFWD||Dense layers (5) + LReLU activation||3 (rate= NA)||NA||Adam||BCE|
|[a104]||Ensemble of 5 SAE and MLP||5 × [AE (3) + MLP (2)] + softmax activation||5 (rate=NA)||Averaging the softmax probabilities||NA||NA|
|[a105]||Multi-Channel CNN||CNN Layers (5) + ReLU Activation + Pooling Layers (2) + FC Layers (5)||NA||Softmax||NA||CE|
|[a106]||34 SAEs||34 [ SAE network (2)]||NA||PSVM||L-BFGS||NA|
|[a107]||DDUNET||Proposed DDUNET with 11 blocks and ReLU activation||(rate=0.1)||NA||SGD||CE|
|[a108]||SNCAE||Proposed SNCAE Network||NA||Softmax||NA||NA|
|[a109]||SpAE||SpAE with 2 Networks||NA||Softmax||NA||MSE|
|[a110]||DAE||AE (3) + SELU Activation||NA||NA||Adam||Sum of MSE + 2 CE + CC|
|[a111]||DCNN||CNN Layers (6) + ReLU Activation + Pooling Layers (6) + FC Layers (4)||NA||Sigmoid||Adam||BCE|
|[a112]||FastSurfer CNN||Proposed FastSurfer CNN Network||NA||Softmax||Adam||Logistic & Dice Losses|
|[a113]||3D-CNN||CNN Layers (3) + ReLU Activation + Pooling Layers (3) + FC Layers (2)||2 (rate=0.5)||Softmax||Adadelta||CE|
|[a114]||3D-UNET||DCNN Layers (7) + ReLU Activation + Pooling Layers (2) + BN Layers (6)||2 (rate=0.5)||Softmax||SGD||weighted CE|
|[a115]||CNN-GRU||CNN Layers (4) + GRU Layers (2) + ReLU Activation + Pooling Layers (2) + FC Layers (5)||NA||Sigmoid||Adam||BCE|
|[a116]||1D CNN - LSTM||Proposed 1D-CNN LSTM with ReLU Activation||(rate=0.2)||Softmax||Adam||CCE|
|[a117]||CGRNN||CNN layers (3) + ReLU activation + Pooling layers (1) + GRU layers (1) + Sigmoid activation + FC layer (1)||1 (rate=0.5)||NA||Adam||BCE|
|[a118]||ConvNet||variation of the U-net convolutional architecture||NA||NA||ADAM||Proposed Loss function|
|[a119]||3D-CNN||CNN Layers (2) + ELU Activations + Pooling Layers (2) + FC Layers (2)||NA||Sigmoid||SGD||BCE|
|[a120]||3D-CNN C-LSTM||CNN Layers (8) + Conv-Bi LSTM Layers (2) + Sigmoid Activation (for LSTM) + Pooling Layers (1) + FC Layers (1)||8 (rate=0.2)||Softmax||Adam||CE|
|[a121]||AE||Proposed AE with 7 Layers||NA||DNN||NA||NA|
|[a122]||CapsNets||Standard Architecture||NA||K-Means Clustering||Adam||Proposed loss function|
|[a123]||VGGNets + ASDNet||CNN Layers (27) + ReLU Activation + Pooling Layers (10) + FC Layers (6)||6 (rate=0.5)||Softmax||SGD||CE|
|[a124]||DCNN||CNN Layers (7) + activation+ Pooling Layers (13) + FC Layers (3) + BN Layers (10)||7 (rate=0.25)||Softmax||SGD||NA|
|[a125]||Noisemes net, DiarTK diarization net||Standard networks||NA||RF||NA||NA|
|[a126]||DCNN||CNN Layers (7) + ELU Activation + Pooling Layers (13) + FC Layers (3) + BN Layers (10)||7 (rate=0.25)||Softmax||SGD||NA|
|[a127]||SP-ASDNet||CNN Layers (2) + LSTM Layers (2) + Pooling Layers (3) + FC Layers (2)||2 (rate=NA)||NA||Adam||BCE|
|[a128]||TimeConvNet||Convolutional spatiotemporal encoding layer + backbone convolutional neural network architecture (mini-Xception, ResNet20, MobileNetV2)||NA||Softmax||Adam||CCE|
|[a129]||Different Networks||Proposed structure||NA||NA||NA||NA|
|[a130]||MTCNN||Cascaded CNNs architecture||NA||Naïve Bayes||NA||NA|
|[a131]||CNN-LSTM||CNN Layers (3) + LSTM Layers (1) + ReLU Activation + Pooling Layers (3) + FC Layers (3)||1 (rate=0.5), 1 (rate=0.2)||Softmax||SGD||NA|
|[a132]||CNN||CNN Layers (4) + Pooling Layers (2) + FC Layers (2)||NA||Softmax||NA||NA|
|[a133]||LSTM||LSTM layer (1)||NA||coherence representation||NA||NA|
|[a134]||LSTM||LSTM Layers (3) + Sigmoid Activation + FC Layers (1)||NA||NA||NA||CE|
|[a135]||DCN||CNN Layers (17) + Pooling Layers (3) + deconvolution layers (3) + learned priors (3)||NA||NA||NA||Proposed loss Function|
|[a136]||Pretrained resnet18||Standard ResNet-18 Architecture||Standard||Standard||Adam||BCE|
|[a137]||CNN||CNN Layers (2) + ReLU Activation + Pooling Layers (2) + FC Layers (2)||1 (rate=0.5)||NA||Adam||BCE|
|[a138]||AE||AE with 8 layers||NA||K-Means Clustering||NA||NA|
|[a139]||CultureNet||Faster R-CNN + modified ResNet50 + 5 FC layers||NA||Softmax||Adadelta||Proposed loss function|
|[a140]||DCNN||Proposed DCNN Architecture with Different Layers||NA||Decision Tree (DT)||Manual Optimization||NA|
|[a141]||CNN||CNN Layers (2) + ReLU Activation + FC Layers (3)||4 (rate=0.2)||Softmax||NA||NA|
|[a142]||SA-B3D with LSTM model||CNN Layers (5) + LSTM Layers (1) + Pooling Layers (4) + FC Layers (1)||NA||Sigmoid||Adam||CE, proposed loss function|
|[a143]||CNN||CNN Layers (3) + ReLU Activation + Pooling Layers (3) + FC Layers (1)||NA||SVM||SGD||NA|
|[a144]||DENN||Proposed DENN Architecture with ReLU Activation + FC Layers (2)||NA||Sigmoid||mini-batch SGD||CCE|
|[a145]||DNN||Proposed DNN with ReLU Activation + FC Layers (2)||(rate=0.2)||Sigmoid||Adam||BCE|