Over the past years, functional magnetic resonance imaging (fMRI) has been widely used for the identification of brain regions that are related to both functional segregation and integration. Regarding functional segregation, the conventional analysis relies on the identification of the activated voxels based on functional response models and multivariate statistics between experimental conditions (e.g. resting state vs. task-stimulated activity). A representative example is the General Linear Model (GLM) that is implemented in well established software packages such as SPM  and FSL . On the other hand, for the assessment of functional integration, there is a distinction between functional and effective connectivity . Functional connectivity (FC) analysis seeks for statistical dependencies (e.g. correlations, coherence) between brain regions. Effective connectivity (EC) analysis  tries to reveal the influence that one neural system exerts on another . A detailed review on the differences between FC and EC approaches can be found in .
Here, we focus on the construction of FCN based on resting-state fMRI (rsfMRI) recordings. In rsfMRI, there is no stimuli and thus the assessment of functional integration is more complex and not so straightforward compared to task-related experiments . Furthermore, spontaneous/ resting state brain activity as measured with fMRI has been also considered as a potential biomarker in psychiatric disorders (see e.g. the review of Zhou et al. ). In general, for the construction of FCN, two basic general frameworks are explored: (a) Seed-based Analysis (SBA) and (b) Independent Component-based Analysis (ICA). In the SBA , the (averaged) fMRI signals of the regions of interest (ROI) are correlated with each other; correlations above a threshold are considered functional connections between seeds/ROIs. Even though the SBA has been proved extremely useful in identifying functional networks of specific brain regions [7, 8, 9], its major disadvantage is the requirement of the a-priori knowledge of the functional organization of the brain, while possible correlations between seeds can be due to structured spatial confounds (e.g. scanner artifacts) . Furthermore, seed selection is based on standard coordinates, while at the subject level, anatomical differences may lead to the selection of functionally irrelevant voxels at the group level. Thus, despite the use of normalization techniques, the accuracy of this approach is shown to be limited . On the other hand, ICA  has arisen as an alternative/complementary approach since the early 2000s [13, 14, 15]. ICA decomposes the 4D fMRI data to a set of spatial components with maximum statistical independence and their associated time series. Smith et al. in a meta-analytic study of 30,000 rsfmRI scans with the aid of ICA revealed a functional “partition" of the brain into resting-state Networks (RSNs), such as the Sensorimotor, Default mode and Auditory networks. Applications of ICA include also data pre-processing, where noise-related components are regressed out from the original fMRI signals . However, while ICA produces spatial components that are statistically independent to each other, there is no clear link between the spatial components and specific brain functions and furthermore the spatial components cannot in general be ordered by relative importance . Another issue is that most of the standard algorithms that compute ICs utilize gradient based optimization algorithms that use an iterative scheme; the initial guesses in these algorithms are generated randomly making the whole process stochastic: for the same dataset, the obtained spatial components may differ significantly over repeated runs . Thus, the robustness/reproducibility of the ICA results over repeated runs may be questioned.
In order to tackle the above issues, several techniques have been proposed for the classification of ICs and the construction of subject specific ROIs [19, 20]
. Advances have been also made regarding the selection of the model order of the ICA decomposition, such as the Bayesian dimensionality estimation technique and the use of theoretical information criteria for model order selection . Finally, the so-called ranking and averaging ICA by reproducibility (RAICAR) [20, 18] (see also  for a critical discussion) aims at resolving issues regarding stochasticity and robustness of the ICA decomposition. RAICAR utilizes a sufficient number of ICA realizations and based on the reproducibility of the ICs aims to rank them in terms of the most “reliable" components. Reliable ICs among realizations are assessed via correlations and the final estimate of each component is averaged.
Alternatively and/or complementary to the above analysis, linear manifold learning algorithms such as Principal Component Analysis (PCA)[22, 23, 24] and Multidimensional Scaling (MDS) [25, 26] have been exploited. PCA has been succesfully applied in the pre-processing routine for dimensionality reduction (often prior to ICA) . Applications of PCA, include also the recovery of signals of interest  and the construction of FCN from fMRI scans in task-related experiments [23, 24]. In these studies, the performance of PCA with respect to the detection of regions of correlated voxels has been shown to be satisfactory but not without problems. For example, a study by Baumgartner et al.  highlighted the limits of PCA to correctly identify activation of brain regions in cases of low contrast-to-noise ratios (CNR) appearing when signal sources of e.g. physiological noise are present .
MDS has been also widely used in fMRI (mostly for task-based studies) mainly for the identification of similarities between brain regions in terms of voxel-wise connectivity [30, 31, 32, 33, 34, 35]. The implementation of MDS in neuroimaging dates back to the work of Friston et al. . There, it was investigated the embedded (voxel-wise) connectivity of fmRI data acquired during tasks of word generation between healthy and schizophrenia subjects. Salvador et al.  used MDS to investigate the embedded connectivity of anatomical regions of the brain from rsfMRI data. Benjaminsson et al.  used MDS to embed high-dimensional rsfMRI data from the mutual information space to a low dimensional euclidean space for the identification of RSNs. Herve et al.  used MDS to acquire a low dimensional approximation of interregional correlations for the investigation of the affective speech comprehension. Finally, in a meta-analytic study by Etkin et al. , MDS was exploited to provide a low-dimensional visualization of co-activation interrelations of ROIs. MDS has been also used in works investigating the functional (dys)connectivity associated with schizophrenia  and Asperger’s Syndrome .
However, thus far, only a few studies have exploited non-linear manifold learning algorithms such as Local Linear Embedding (LLE) , Isometric Feature Mapping (ISOMAP)  and Diffusion Maps  for the analysis of fMRI data and particularly for the construction of FCN. LLE has been applied in rsfMRI studies for the improvement of predictions in ageing studies  for low-dimensional clustering towards the classification of healthy subjects and patients with schizophrenia  and as an alternative method for dimensionality reduction before the application of ICA in task-related fMRI where non-linear relationships in the BOLD signal are introduced .
In Anderson et al. 
, ISOMAP was employed to a benchmark rsfMRI dataset of 146 subjects for the construction of embedded low-dimensional FCN for the classification between controls and schizophrenia patients. ROIs were selected using single-subject ICA and the similarities between the ICA components were assessed using a pseudo-distance measure based on lagged cross-correlation. Graph-theoretical measures were then used for classification between patients and healthy controls. Another study based on single-subject ICA, exploited ISOMAP to classify spatially unaligned fMRI scans. The study considered patients with schizophrenia versus healthy controls and different age groups (young/old) of healthy controls versus patients with alzheimer’s disease. Despite the relatively low sample sizes, results were promising with good classification rates. Recently, Haak et al.  utilized ISOMAP in an effort to create individualised connectopies from rsfMRI recordings taken from the WU-Minn Human Connectome Project in a fully data-driven manner.
Thus, only a handful of studies have used Diffusion Maps for the analysis of fMRI data, focused on the clustering of spatial maps of task-related experiments [51, 52]. Shen et al.  employed Diffusion Maps to separate activated from non-activated voxels. Sipola et al. used Diffusion Maps with a Gaussian kernel to cluster selected fMRI spatial maps that are derived by ICA. They demonstrated their approach using fMRI recordings acquired from healthy participants listening to a stimulus with a rich musical structure. Other applications of Diffusion Maps in neuroimaging concern the epileptic-seizure prediction and the identification pre-seizure state in EEG timeseries [53, 54]. A review on the intersection between manifold learning methods and the construction of FCN can be found in  and .
Here, we used MDS, ISOMAP and Diffusion Maps to construct embedded FCN from single-subject ICA analysis of rsfMRI data taken from healthy controls and schizophrenia patients. For our demonstrations, we used the COBRE rsfMRI data that are publicly available and have been used recently in many studies [57, 58, 48, 59]
. Based on key global graph-theoretical measures of the embedded graphs, we assessed their classification efficiency using several machine learning algorithms, namely Linear standard Support Vector Machines (LSVM), Radial (Radial basis function kernel) Support Vector Machines (RSVM), k-Nearest Neighbour (k-NN), and Artificial Neural Networks (ANN). We also investigated the performance of two most-commonly used metrics, namely the cross-correlation and the euclidean distance. Our analysis showed that Diffusion Maps outperformed all other methods and lagged cross-correlation proved to be a better choice than the euclidean metric.
At this point, we should note, that our study does not aim at the best classification performance by trying to find the best possible pre-processing pipe-line of the raw fMRI data and/or the selection of subjects and/or the selection of the best set of graph-theoretical measures that provide the maximum classification. Yet, we aim at using state-of-the-art manifold learning methods for the construction of the FCN and compare their classification efficiency using only a few key global theoretical graph-measures and compare the obtained results with those derived by similar studies (see e.g. ) using the same pipe-line for data pre-processing and single-subject ICA. To the best of our knowledge, this paper is the first to perform such a thorough comparative analysis of both linear and nonlinear manifold learning on rsfMRI data. It is also the first study to show how Diffusion Maps can be used for the construction of FCN from rsfMRI, assessing also the efficiency of two basic distance metrics, the cross-correlation-based and the Euclidean distance.
2 Materials and Methods
2.1 Pre-processing and signal extraction
For our demonstrations, we performed a basic pre-processing of the raw fMRI data using SPM as also implemented in other studies (see e.g. ). In particular, for the fMRI data processing, we used FEAT (FMRI Expert Analysis Tool) Version 6.00, part of FSL (FMRIB’s Software Library, www.fmrib.ox.ac.uk/fsl). In particular, the following pre-statistics processing was applied: motion correction using MCFLIRT , slice-timing correction using Fourier-space time-series phase-shifting; non-brain removal using BET , spatial smoothing using a Gaussian kernel of FWHM 5mm, grand-mean intensity normalization of the entire 4D dataset by a single multiplicative factor. ICA AROMA  was also implemented to detect and factor out noise-related independent components (ICs) along with a high-pass temporal filtering at 0.01 Hz (100 seconds) that was applied after ICA AROMA procedure as it is highly recommended .
We then proceeded with the decomposition of the pre-processed fMRI data to spatial ICs (for each suject) using the RAICAR methodology . In this way, we found the most reproducible spatial ICs over repeated runs as a solution to the well known problem of the variability of the ICA decomposition . This choice was related to the benchmark fMRIdata per se as there was only a single session per subject with relatively small duration (6 minutes); therefore we wouldn’t expect a robust ICA decomposition for all subjects (see also the discussion in ). Another choice would be to perform group-ICA analysis, but we decided to use single-subject ICA in order to have a common ground with the methodologically similar work presented in . Group-ICA analysis will be performed in a future study.
2.2 Ranking and Averaging ICA by Reproducibility (RAICAR)
2.2.1 Independent Component Analysis (ICA)
ICA is a linear data-driven technique that reduces the high-dimensional fMRI space in a set of statistically independent spatial components (ICs). This reduction can be represented as:
where is the measured BOLD signal, is the temporal amplitude (the matrix containing all temporal amplitudes is known as mixing matrix) and is the spatial magnitude of the i-th ICA component. While PCA requires that the principal components are uncorrelated and orthogonal, ICA asks for statistical independence between the ICs. Generally, ICA algorithms are based either on the minimization of mutual information or the maximization of non-gaussianity among components. As discussed in the introduction, most of the standard implementations of ICA, such as the one in MELODIC (Multivariate Exploratory Linear Optimized Decomposition into Independent Components) , which is part of FSL fMRI analysis software package, share similar gradient based optimization algorithms using an iterative scheme whose initial values are generated randomly, thus making the whole process stochastic. As a consequence, results over repeated runs may differ significantly . A solution to this problem is provided by the so-called ranking and averaging ICA by reproducibility (RAICAR)  that we briefly describe in the following section.
2.2.2 Ranking and averaging ICA by reproducibility (RAICAR)
The RAICAR methodology developed by Yang et al.  was addressed to tackle the problem of the ICs variability by performing ICA realizations. Thus, RAICAR leads to “slightly" different mixing matrices and different sets of ICs . Each realization finds a fixed number of spatial ICs. Then, a cross realization correlation matrix () of size is constructed and the alignment (ICA produces unaligned components) of ICs of each realization takes place on the basis of the absolute maximum cross correlation among components. Thus, the cross realization correlation matrix reads:
with are submatrices of size and their elements represent the absolute spatial cross correlation coefficients among components and across realizations. CRCM is a symmetric matrix and its diagonal consists of identity matrices which are ignored for the next steps of the algorithm.
The procedure starts with the identification of the global maximum of the CRCM, thus finding the matched component based on two realizations. At the next step, the methodology seeks for the highest absolute spatial cross correlation coefficients of the identified component in the remaining realizations factoring out all others. The procedure is repeated times until aligned components are found.
The next step involves the computation of the reproducibility index for each of the aligned components. This is done by constructing the histogram of the absolute spatial cross correlation coefficients of the upper triangle matrix of the CRCM. This histogram tends to be bimodal, as in general, we expect low spatial cross correlation among most of the ICs and high spatial cross correlation only for a few of them. A spatial cross correlation threshold is applied with the desired value lying in the valley of the histogram between the two modes . Finally, the reproducibility index is computed for each one of the aligned components. This is done by aggregating the supra-threshold absolute cross correlation coefficients of the CRCM for each of the aligned components.
The last step of the algorithm is the ranking and averaging of the aligned components in descending order based on the reproducibility index. The selective averaging is applied so that the components are averaged if and only if, the given aligned component has at least one absolute spatial cross correlation coefficient above the threshold across realizations.
After applying RAICAR, the ICs are chosen via a cut-off threshold based on the reproducibility index (of each component) that indicates how consistent is the appearance of an IC across realizations.
Here, we have set realizations (same as also in ); taking more realizations did not change the outcomes of the analysis. The spatial cross correlation threshold was chosen by localizing the minimum of the histogram of absolute cross correlation coefficients of the CRCM. This threshold was specified separately for each subject. The reproducible ICs were determined by the histogram of reproducibility index over the number of components in descending order. The cut-off threshold was set as the half of the maximum reproducibility index value possible (this choice is the same with the one used in ). This cut-off threshold was set equal for all subjects.
Subjects with less than 20 reproducible ICs were excluded from further analysis as this number of components resulted in disconnected graphs. Thus, we ended up with 104 subjects out of which 57 were healthy controls and 47 schizophrenia patients.
2.3 Construction of Functional Connectivity Networks
For the construction of FCN, we used all combinations between three manifold learning algorithms, namely MDS, ISOMAP and Diffusion Maps and two widely used metrics, namely the lagged cross-correlation [48, 63, 64] and the euclidian distance [52, 65, 66].
2.3.1 Construction of FCN based on the lagged cross-correlation
For every pair of the associated time courses of the ICs, say and , the cross-correlation function (CCF) over a maximum of three time lags (as in ) reads:
where is the time lag, and is the mean value of the whole time series.
For the construction of the FCN connectivity/ correlation matrices, we used a pseudo-distance measure defined as (see also ):
The resulting dis(similarity) matrices are fully connected and therefore are hardly comparable between subjects (see the discussion in ). Thus, here as a standard practice, (and in all other algorithms described below), we applied thresholding to the (dis)similarity matrices in order to keep the strongest connections of the derived functional connectivity matrix. In order to factor out the influence of the variable network density on the computation and comparison of graph-theoretical measures across groups , we have implemented the approach of proportional thresholding (PT) . In particular, we considered a range of levels of PT from 20% to 70% with a step of 2%. Below the threshold of 20%, graphs became too fragmented, while thresholds above the 70% resulted in dense graphs, that mostly included noisy and less significant edges (see also the discussion in  about this issue). If a graph was still fragmented after thresholding, the giant component was used for further analysis.
2.3.2 Construction of FCN based on the euclidean distance
The euclidean distance is used in many studies to assess (dis)similarities between fMRI data points [52, 65, 66]. For time series associated with the independent spatial maps, and , the euclidean distance reads:
For the construction of FCN, PT was applied to the euclidean similarity matrices for each individual over the range of %- %.
2.4 Construction of FCN with manifold learning algorithms
Below we present how MDS, ISOMAP and Diffusion Maps can be exploited to construct (embedded) FCN.
2.4.1 Construction of FCN with MDS
Multidimensional Scaling  is a linear method of dimensionality reduction that can be used to find similarities between pairs of objects in a low-dimensional (embedded) space. Given a set of objects/observables , MDS produces a low-dimensional data representation minimizing the objective function:
is the (dis)similarity obtained (by any (dis)similarity measure of choice) between all pairs of points . In our case, the observables are the amplitudes of the spatial ICs . Here, (number of time points).
The coordinates of the embedded manifold are given by:
contains the square roots of the
largest eigenvalues, and
are the corresponding eigenvectors of the matrix:
is the double centering matrix defined as:
Using MDS, the dimensionality reduction of the original data yields the embedding of , .
Here, for the construction of the embedded FCN, we produced distance matrices (using either lagged cross-correlation or the euclidean distance) of size .
For the implementation of the MDS algorithm, we used the “cmdscale" function contained in the software package“Stats" in the R free Software Environment .
2.4.2 Construction of FCN using ISOMAP
ISOMAP is a non-linear manifold learning algorithm that given a set of objects/observables produces a low-dimensional data representation , minimizing the objective function:
where is the shortest path (geodesic distance) and is the (dis)similarity obtained (by any (dis)similarity measure of choice) between all pairs of points .
In our case, the observables are the amplitudes of the spatial ICs .
The above minimization problem is solved as follows: :
Construct a graph , where the vertices are the ICs ; its links are created by using either the -nearest neighbors algorithm or a fixed distance between nodes, known as the distance. For example, a link between two ICs is created if . Here, we used the nearest neighbours algorithm with ). Set the weight of the link (if any) between as . If there is not a link set: .
Approximate the embedded manifold by estimating the shortest path (geodesic distance) for each pair of nodes based on the distances ; this step can be implemented for example using the Dijkstra algorithm . This procedure results to a matrix, with elements the shortest paths:
Estimate the coordinates of the low-dimensional (embedded) manifold exploiting the MDS algorithm  on the geodesic distance matrix .
2.4.3 Diffusion maps
Diffusion Maps  is a non-linear manifold learning algorithm that given a set of objects/observables produces a low-dimensional representation , , addressing the diffusion distance among data points as the preserved metric. The embedding of the data in the low-dimensional space is obtained by the projections on the eigenvectors of a normalized graph Laplacian . The Diffusion Maps algorithm can be described in a nutshell in the following steps:
Construction of the affinity matrix, here is the number of ICs for each subject. The elements represent the weighted edges connecting nodes and using the so-called heat kernel:
where is a -dimensional point (here, N=150), are the (dis)similarities obtained (by any dissimilarity measure of choice) between all pairs of points and is an appropriately chosen parameter which can be physically described as a scale parameter of the heat kernel . The heat kernel
satisfies two important properties, the one of symmetry and the other of positive semi-definite matrix. The latter property is crucial and allows the interpretation of weights as scaled probabilities of “jumping" from one node to another.
The parameter of the neighborhood size is data-dependent and here, it was determined by finding the linear region in the sum of all weights in , say , using different values of [74, 52]. is calculated through the formula:
In order to use a single value of for all participants, we computed a super-distribution of the sum of weights across subjects (taking the median value of the distributions) using different values of . Thus, we considered values of lying in the linear region of the super-distribution. After inspection, every value of was lying in the linear region of every single subject’s logarithmic plot.
Formulation of the diagonal normalization matrix along with the diffusion matrix :
Each element of the symmetric and normalized diffusion matrix reflects the connectivity between two data points and . As an analogy, this connectivity can be seen as the probability of “jumping" from one point to another in a random walk process. Consequently, raising to a power of can be thought as a diffusion process. As the number of increases, paths with low probability tend to zero, while the connectivity between paths with high probability remains high enough, thus governing the diffusion process . Thus, the algorithm of Diffusion Maps preserves the diffusion distance among points in a low-dimensional euclidean space. The diffusion distance is closely related to the diffusion matrix and for two distinct points , and for specific time instance is defined as :
Unlike geodesic distance, the diffusion distance is robust to noise perturbations, as it sums over all possible paths (of steps) between points .
Singular Value Decomposition (SVD) of yields
where is a diagonal matrix containing the eigenvalues of and the eigenvectors of . The eigenvectors of can be bow found by :
By taking out the trivial eigenvalue of the matrix and the corresponding eigenvector contained in , the coordinates of the low dimensional embedded manifold are given by:
where contains the largest eigenvalues, and are the corresponding eigenvectors of the diffusion matrix .
2.4.4 Choice of the embedded dimension
Here, the embedded dimension was determined via the eigenspectrum of the final decomposition for every dimensionality reduction/Manifold learning algorithm. If there is a gap between the first few larger eigenvalues of the final decomposition and the rest of the eigenspectrum, these few eigenmodes capture most of the distance differences between data points and are able to represent and uncover intrinsic properties of the data structure [76, 78, 79]. In order to determine the embedded dimension for the methods described above, we considered the following steps: We sorted the eigenvalues in decreasing order eg. (for diffusion maps is discarded). Then, for each subject, we calculated the consequent pairwise differences , , … , . A gap between the mean (over all subjects) differences determines from which point and further, the relative contribution of another eigendimension is redundant (this having a small contribution to the reconstruction of the embedded FCN). Thus, here for our computations, we considered low-dimensional embeddings of dimensions for each one of the manifold learning methods described above.
2.5 Graph-theoretical measures
Here, we analyzed the topological properties of the binary FCN graphs on the basis of three major graph measures, namely, the average path length, the global clustering coefficient, and the median degree, which are the most frequently used in neuroscience .
In particular, given a graph with representing the link (0: unconnected or 1: connected) from node to node and the degree of node i, the graph measures are computed as follows:
a) The average path length is defined by: , i.e. is the average number of steps along the shortest paths for all possible pairs of the network nodes. This is a measure of the efficiency of information or mass transport on a network between all possible pairs of nodes.
b) The global clustering coefficient is defined by: , where is a triplet and is a closed triplet. A triplet of a graph consists of three nodes that are connected by either to open (i.e open triplet) or closed (i.e closed triplet) ties. In general, this measure indicates how random or structured a graph is (in our case, in terms of functional segregation).
c) The median degree is the median value of the degree distribution of . This measure reflects how well connected is the “median" network node in terms of the number of links that coincide with it.
An extensive review of definitions and meaning of the above and other graph-theoretical measures with respect to brain functional networks can be found in [81, 80].
The computations for the graph analysis were performed with the “igraph"  package contained in the free software environment R .
2.6 Classification Algorithms
Based on the three key graph measures, the classification was assessed using machine learning algorithms, namely Linear Support Vector Machines(LSVM), Radial Support Vector Machines (RSVM), Artificial Neural Networks (ANN) and k-NN classification (for a brief description of the above algorithms and their parameters see in the Appendix). The classification algorithms were trained, validated and tested using a 10-fold cross-validation scheme which was repeated 100 times. Thus, we separated the data in ten separate sub-samples; nine of them were used as training sets and one of them was used for validation. This process was repeated 10 times leaving out each time a different sub-sample which served as a validation set. The whole procedure was repeated 100 times. The overall classification rate was determined via the computation of the average classification rate over all the repetitions of the 10-fold cross validation for each model.
The average confusion matrix (over all repetitions of the 10-fold cross validation) was also computed for each classification model. The confusion matrix is a(in the case of binary classification) square matrix containing all true positives , false positives , true negatives and false negatives . Here, we considered as positives the schizophrenia cases and as negatives the healthy control cases. Sensitivity (also called the True Positive Rate) and Specificity (also called the True Negative Rate) are basic statistical measures for the assessment of binary classifications. The sensitivity is given by , while the specificity is given by . Here, sensitivity characterizes the ability of the classifier to correctly identify a schizophrenic subject, while specificity is the ability of the classifier to correctly identify a healthy subject.
Here, we used the algorithms contained in the package “caret"  contained in the R free software environment .
3.1 Classification performance using the lagged cross-correlation metric
In table 1, we present the best classification accuracy, along with the corresponding sensitivity and specificity rate for each manifold learning algorithm, threshold and classifier when using the lagged cross-correlation metric. At the end of the table 1, we provide also the results obtained by the “conventional" (thresholded) cross-correlation matrix.
Fig. 1 shows the super-distribution of the sum of weights of all subjects against different values of used for construction of the FCN using the Diffusion Maps algorithm; the red dotted vertical line shows the optimal (here, = 0.325) while the black vertical lines bound the linear region (). The results were robust to different choices of the time step of the Diffusion Maps algorithm, namely . The investigation of the optimal was performed with respect to the best classification accuracy over all classifiers.
Fig.2 depicts the best classification rates for MDS, ISOMAP, Diffusion Maps and lagged-cross correlation matrix for different classifiers are reported. Diffusion Maps provided the best classification accuracy (79.3% for SVM with an RBF kernel and 52% PT), thus appearing more robust over a wide range of PT. With respect to the maximum classification accuracy obtained by Diffusion Maps, results were robust over a wide range of values of .
Fig. 3 depicts the classification performance obtained with Diffusion Maps for different values of .
Isomap performed better for low thresholds. Overall, the performance of Isomap was more or less the same with the cross-correlation matrix with the best classification rate being slightly higher for Isomap (74.4% for SVM with an RBF kernel and 24% PT). Its performance was however sensitive to the choice of the number of nearest neighbors; with we got a 74.4% classification accuracy for SVM with an RBF kernel and 24% PT) while for we got a 71% classification accuracy for ANN and 20% PT), thus providing the best performance for ISOMAP. MDS was outperformed by both ISOMAP and Diffusion Maps as well as the cross correlation matrix; the classification performance of MDS was overall relatively poor.
Fig. 4 shows characteristic eigenspectrums of the final decomposition for MDS, ISOMAL and Diffusion Maps. As it is shown, there are three gaps: the first gap appears between the first eigenvalue and the rest of the spectrum, the second gap between the first two eigenvalues and the rest of the spectrum, and a third gap appears between the first four-five eigenvalues and the rest of the spectrum.
3.2 Classification performance using the euclidean distance
The same analysis was performed for the euclidean distance.
The best classification rates using the euclidean distance for all manifold learning methods and classifiers are presented in table 2.
Overall, the classification rates with the euclidean distance were more noisy compared to the ones computed with the lagged cross-correlation distance.
For a wide range of PT, the classification rates obtained were under 60% and sometimes at the level of a random or a completely biased classifier (towards the larger class, here, the healthy controls). In our case, a random classifier resulted to a 50.5% classification accuracy, while for a strict majority rule for imbalanced datasets, a completely biased (and thus useless) classification model results to a 54.8 % classification rate.
Fig. 5 depicts the accuracy of all methods across all thresholds using all different classifiers.
In this case, the best classification rate was obtained with Isomap (72.9% for RSVM and 68% PT). Despite the fact that for certain thresholds, the maximum classification rates obtained with the euclidean distance were enough high, the performance of the methods was generally poorer compared to the ones computed with the lagged cross-correlation distance for all the range of PT.
Characteristic eigenspectrums for all manifold learning algorithms utilizing the euclidean distance are shown in Fig. 6.
Finally, in Fig. 7 are given the boxplots of the classification rates among metrics used. Overall, the lagged cross-correlation distance resulted in higher classification rates compared to the euclidean distance with the exception of MDS. Except from some specific thresholds, MDS in both cases performed poorly.
In this study, we constructed embedded FCN from rsfMRI data using linear and non-linear manifold learning techniques. Based on the graph theoretical measures of the constructed FCN we then used machine learning algorithms for classification purposes. We also compared the performance of two widely used metrics, namely the lagged cross-correlation and the euclidian metric. For our illustrations, we used a publicly available dataset of resting fMRI recordings taken from healthy and patients with schizophrenia. This is the first study that performs such a systematic comparative analysis between various manifold learning algorithms, machine learning algorithms and metrics. To the best of our knowledge, it is also the first study that shows how Diffusion Maps can be exploited to construct FCN from fMRI data.
At this point we should note that our intention was not to try to obtain the best possible classification performance by “optimising" the pre-processing of the raw fMRI data and/or by trying to find the best set of graph-theoretical measures. Instead we used a very basic pre-processing of the raw data (as the one performed in ), while classification was based only on three key graph-theoretical measures in neuroscience , namely, the average path length, the global clustering coefficient and the median degree of the embedded binary networks. Based on the above, our best reported accuracy was 79.3 % obtained with Diffusion Maps and lagged cross-correlation (using 10 fold cross validation repeated 100 times).
For the same benchmark fMRI dataset Anderson and Cohen  used ISOMAP for the construction of embedded low-dimensional FCN for the classification between controls and schizophrenia patients. ROIs were acquired using single subject ICA and functional connectivity was accessed using the lagged cross-correlation distance. The analysis revealed differences in small world properties among groups and 13 graph theoretic features led to a reported 65% accuracy rate. Algunaid et al. 
reported a 95% accuracy (with FSV feature selection method and 82.5% with a Welch’s t-test) testing more than 360 graph-based features. AAL was used for signal extraction and ROI selection, while an SVM-based classifier was used along with 10 fold cross validation scheme to evaluate its performance.
An important outcome of our analysis is that the Diffusion Maps result in more robust results and higher classification accuracy and that the lagged cross-correlation distance outperforms the euclidean distance which is used traditionally to construct connectivity matrices from fMRI data.
In a future work, we aim at analysing the local-graph theoretical properties, thus matching the derived RAICAR ICs with known resting state networks and also perform group ICA analysis.
5 Data availability
The COBRE dataset is publicly available at http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html. For our analysis, we used the “R" software subroutines as described in 2 and 7.
6 Author Contributions
Conceptualization:Constantinos Siettos; Methodology: Constantinos Siettos; Formal analysis and investigation: Ioannis Gallos; Data acquisition and processing: Ioannis Gallos and Evangelos Galaris; Writing- original draft preparation: Ioannis Gallos; Writing - review and editing: Constantinos Siettos and Ioannis Gallos; Supervision:Constantinos Siettos
For classification we used Linear Support Vector Machines (LSVM), Radial (Radial basis function kernel) Support Vector Machines (RSVM), one hidden layer Artificial Neural Networks (ANN) and k-NN classifier(k-NN). All classifiers were trained and evaluated via repeated 10-fold cross-validation scheme repeated 100 times. For the classification we used the three key graph theoretical measures as described in subsection 2.5 in Materials and Methods. Training and classification were implemented using algorithms contained in package “caret"  publicly available in R free software environment .
7.1 Support Vector Machines (SVM)
Support vectors machines (SVM) aim at finding the optimal separating plane or hyperplane in the feature space among groups. In particular, for a set of points, where is the number of subjects, contains attributes/features selected for subject and the subject’s class (here, either healthy or patient), SVM attempts to find the optimal plane or hyperplane that divides the two classes by maximizing the margin of separation. Any hyperplane with a given set of points can be modelled as where represent the weights of features . Parallel hyperplanes can be described as if and if . The optimization problem then aims at maximizing the margin between hyperplanes such that for every . One can take advantage of a regularization parameter indicating the penalty of error that gives a trade-off between misclassifications and the width of the separating margin. This leads to the final optimization problem, which minimizes subject to , .
Based on the idea that the data maybe better separable in a higher dimensional space, SVM may utilize a kernel function to map to , . In our study, besides standard Linear SVM (LSVM), we also used Radial SVM (RSVM) making use of the Radial basis functions kernel given by , where is the kernel’s scale parameter.
7.2 k-Nearest Neighbours classifier(k-NN)
k-nearest neighbours algorithm is one of the simplest classification/machine learning algorithms. Given , where is the number of subjects, contains attributes/features selected for subject and the subject’s class (here, either healthy or patient), k-NN utilizes euclidean distance in the feature space to perform a voting system among closest neighbours. In this manner, each point is classified as “control", if the number of“control" neighbours is greater than the number of “patient" neighbours and inversely. The number
of closest neighbours is a parameter of choice that plays a crucial role in method’s performance. In this study, it is important to note that we chose odd values of(i.e how many neighbours we take into consideration) in order not to have to break possible ties in the voting system among neighbours
7.3 Artificial Neural Networks (ANN)
In this study, we also used feed-forward Artificial Neural Networks (ANN) consisting of one hidden layer. The input units were three as the number of the features considered for classification. We have chosen one hidden layer consisting from 1 to 5 neurons along with a bias term. The activation function used for all neurons was the logistic transfer function. The output was one node (reflecting simple binary classification control/patient). The training procedure of the model was done via back-propagation  using a 10-fold cross validation scheme repeated 100 times. Finally a weight decay parameter (regularization parameter) was used to prevent over-fitting and improve generalization  of the final model. For the implementation of the ANN we used the “nnet" software package  publicly available in R free software environment .
7.4 Parameters tested for each classifier
We tuned the parameters of the algorithms via grid search.
For the SVM:
For the k-NN classifier: .
For the ANN: number of neurons in the hidden layer ,
decay level .
-  Karl J Friston, Andrew P Holmes, Keith J Worsley, J-P Poline, Chris D Frith, and Richard SJ Frackowiak. Statistical parametric maps in functional imaging: a general linear approach. Human brain mapping, 2(4):189–210, 1994.
-  Stephen M Smith, Mark Jenkinson, Mark W Woolrich, Christian F Beckmann, Timothy EJ Behrens, Heidi Johansen-Berg, Peter R Bannister, Marilena De Luca, Ivana Drobnjak, David E Flitney, et al. Advances in functional and structural mr image analysis and implementation as fsl. Neuroimage, 23:S208–S219, 2004.
-  Karl J Friston. Functional and effective connectivity: a review. Brain connectivity, 1(1):13–36, 2011.
-  Meenakshi Khosla, Keith Jamison, Gia H Ngo, Amy Kuceyeski, and Mert R Sabuncu. Machine learning in resting-state fmri analysis. Magnetic resonance imaging, 2019.
-  Yuan Zhou, Kun Wang, Yong Liu, Ming Song, Sonya W Song, and Tianzi Jiang. Spontaneous brain activity observed with functional magnetic resonance imaging as a potential biomarker in neuropsychiatric disorders. Cognitive neurodynamics, 4(4):275–294, 2010.
-  Bharat Biswal, F Zerrin Yetkin, Victor M Haughton, and James S Hyde. Functional connectivity in the motor cortex of resting human brain using echo-planar mri. Magnetic resonance in medicine, 34, 1995.
-  Michael D Greicius, Ben Krasnow, Allan L Reiss, and Vinod Menon. Functional connectivity in the resting brain: a network analysis of the default mode hypothesis. Proceedings of the National Academy of Sciences, 100(1):253–258, 2003.
-  Michael D Fox, Abraham Z Snyder, Justin L Vincent, Maurizio Corbetta, David C Van Essen, and Marcus E Raichle. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proceedings of the National Academy of Sciences, 102(27):9673–9678, 2005.
-  Daniel S Margulies, AM Clare Kelly, Lucina Q Uddin, Bharat B Biswal, F Xavier Castellanos, and Michael P Milham. Mapping the functional connectivity of anterior cingulate cortex. Neuroimage, 37(2):579–588, 2007.
-  David M Cole, Stephen M Smith, and Christian F Beckmann. Advances and pitfalls in the analysis and interpretation of resting-state fmri data. Frontiers in systems neuroscience, 4:8, 2010.
-  Michael D Saxe, Fortunato Battaglia, Jing-Wen Wang, Gael Malleret, Denis J David, James E Monckton, A Denise R Garcia, Michael V Sofroniew, Eric R Kandel, Luca Santarelli, et al. Ablation of hippocampal neurogenesis impairs contextual fear conditioning and synaptic plasticity in the dentate gyrus. Proceedings of the National Academy of Sciences, 103(46):17501–17506, 2006.
-  Aapo Hyvärinen and Erkki Oja. Independent component analysis: algorithms and applications. Neural networks, 13(4-5):411–430, 2000.
-  Christian F Beckmann, Marilena DeLuca, Joseph T Devlin, and Stephen M Smith. Investigations into resting-state connectivity using independent component analysis. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1457):1001–1013, 2005.
-  Christian F Beckmann and Stephen M Smith. Tensorial extensions of independent component analysis for multisubject fmri analysis. Neuroimage, 25(1):294–311, 2005.
-  Dae Il Kim, Jing Sui, Srinivas Rachakonda, Tonya White, Dara S. Manoach, V. P. Clark, Beng-Choon Ho, S. Charles Schulz, and Vince D. Calhoun. Identification of imaging biomarkers in schizophrenia: A coefficient-constrained independent component analysis of the mind multi-site schizophrenia study. Neuroinformatics, 8(4):213–229, jul 2010.
-  Stephen M Smith, Peter T Fox, Karla L Miller, David C Glahn, P Mickle Fox, Clare E Mackay, Nicola Filippini, Kate E Watkins, Roberto Toro, Angela R Laird, et al. Correspondence of the brain’s functional architecture during activation and rest. Proceedings of the National Academy of Sciences, 106(31):13040–13045, 2009.
-  Raimon HR Pruim, Maarten Mennes, Daan van Rooij, Alberto Llera, Jan K Buitelaar, and Christian F Beckmann. Ica-aroma: A robust ica-based strategy for removing motion artifacts from fmri data. Neuroimage, 112:267–277, 2015.
-  Johan Himberg, Aapo Hyvärinen, and Fabrizio Esposito. Validating the independent components of neuroimaging time series via clustering and visualization. Neuroimage, 22(3):1214–1222, 2004.
-  Gustavo SP Pamplona, Bruno H Vieira, Frank Scharnowski, and Carlos EG Salmon. Personode: A toolbox for ica map classification and individualized roi definition. Neuroinformatics, pages 1–11, 2020.
-  Zhi Yang, Stephen LaConte, Xuchu Weng, and Xiaoping Hu. Ranking and averaging independent component analysis by reproducibility (raicar). Human brain mapping, 29(6):711–725, 2008.
-  Yi-Ou Li, Tülay Adalı, and Vince D Calhoun. Estimating the number of independent components for functional magnetic resonance imaging data. Human brain mapping, 28(11):1251–1266, 2007.
-  I.T Jollife. Principal Component Analysis, Second Edition. Springer-Verlag, 2002.
-  Keith J Worsley, Jen-I Chen, Jason Lerch, and Alan C Evans. Comparing functional connectivity via thresholding correlations and singular value decomposition. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1457):913–920, 2005.
-  R Baumgartner, L Ryner, W Richter, R Summers, M Jarmasz, and R Somorjai. Comparison of two exploratory data analysis methods for fmri: fuzzy clustering vs. principal component analysis. Magnetic Resonance Imaging, 18(1):89–94, 2000.
-  Joseph B Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964.
-  Karl J Friston, Chris D Frith, Paul Fletcher, PF Liddle, and Richard SJ Frackowiak. Functional topography: multidimensional scaling and functional connectivity in the brain. Cerebral cortex, 6(2):156–164, 1996.
-  Armin Iraji, Vince D Calhoun, Natalie M Wiseman, Esmaeil Davoodi-Bojd, Mohammad RN Avanaki, E Mark Haacke, and Zhifeng Kou. The connectivity domain: Analyzing resting state fmri data using feature-based data-driven and model-based methods. Neuroimage, 134:494–507, 2016.
-  Roberto Viviani, Georg Grön, and Manfred Spitzer. Functional principal component analysis of fmri data. Human brain mapping, 24(2):109–129, 2005.
-  Kaiming Li, Lei Guo, Jingxin Nie, Gang Li, and Tianming Liu. Review of methods for functional brain connectivity detection using fmri. Computerized Medical Imaging and Graphics, 33(2):131–139, 2009.
-  Svetlana V Shinkareva, Jing Wang, and Douglas H Wedell. Examining similarity structure: multidimensional scaling and related approaches in neuroimaging. Computational and mathematical methods in medicine, 2013, 2013.
-  Charidimos Tzagarakis, Trenton A Jerde, Scott M Lewis, Kâmil Uğurbil, and Apostolos P Georgopoulos. Cerebral cortical mechanisms of copying geometrical shapes: a multidimensional scaling analysis of fmri patterns of activation. Experimental brain research, 194(3):369–380, 2009.
-  Alice J O’Toole, Fang Jiang, Hervé Abdi, Nils Pénard, Joseph P Dunlop, and Marc A Parent. Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. Journal of cognitive neuroscience, 19(11):1735–1752, 2007.
-  James V Haxby, M Ida Gobbini, Maura L Furey, Alumit Ishai, Jennifer L Schouten, and Pietro Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539):2425–2430, 2001.
-  Svetlana V Shinkareva, Vicente L Malave, Marcel Adam Just, and Tom M Mitchell. Exploring commonalities across participants in the neural representation of objects. Human brain mapping, 33(6):1375–1383, 2012.
-  Hans P Op de Beeck, Marijke Brants, Annelies Baeck, and Johan Wagemans. Distributed subordinate specificity for bodies, faces, and buildings in human ventral visual cortex. Neuroimage, 49(4):3414–3425, 2010.
-  Raymond Salvador, John Suckling, Martin R Coleman, John D Pickard, David Menon, and ED Bullmore. Neurophysiological architecture of functional magnetic resonance images of human brain. Cerebral cortex, 15(9):1332–1342, 2005.
-  Simon Benjaminsson, Peter Fransson, and Anders Lansner. A novel model-free data analysis technique based on clustering in a mutual information space: application to resting-state fmri. Frontiers in systems neuroscience, 4:34, 2010.
-  Pierre-Yves Hervé, Annick Razafimandimby, Mathieu Vigneau, Bernard Mazoyer, and Nathalie Tzourio-Mazoyer. Disentangling the brain networks supporting affective speech comprehension. NeuroImage, 61(4):1255–1267, 2012.
-  Amit Etkin and Tor D Wager. Functional neuroimaging of anxiety: a meta-analysis of emotional processing in ptsd, social anxiety disorder, and specific phobia. American Journal of Psychiatry, 164(10):1476–1488, 2007.
-  DE Welchew, GD Honey, Tonmoy Sharma, TW Robbins, and ET Bullmore. Multidimensional scaling of integrated neurocognitive function and schizophrenia as a disconnexion disorder. NeuroImage, 17(3):1227–1239, 2002.
-  David E Welchew, Chris Ashwin, Karim Berkouk, Raymond Salvador, John Suckling, Simon Baron-Cohen, and Ed Bullmore. Functional disconnectivity of the medial temporal lobe in asperger’s syndrome. Biological psychiatry, 57(9):991–998, 2005.
-  Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. science, 290(5500):2323–2326, 2000.
-  Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. science, 290(5500):2319–2323, 2000.
-  Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.
-  Anqi Qiu, Annie Lee, Mingzhen Tan, and Moo K Chung. Manifold learning on brain functional networks in aging. Medical image analysis, 20(1):52–60, 2015.
-  Hui Shen, Lubin Wang, Yadong Liu, and Dewen Hu. Discriminative analysis of resting-state functional connectivity patterns of schizophrenia using low dimensional embedding of fmri. Neuroimage, 49(4):3110–3121, 2010.
-  Peter Mannfolk, Ronnie Wirestam, Markus Nilsson, Freddy Ståhlberg, and Johan Olsrud. Dimensionality reduction of fmri time series data using locally linear embedding. Magnetic Resonance Materials in Physics, Biology and Medicine, 23(5-6):327–338, 2010.
-  Ariana Anderson and Mark Steven Cohen. Decreased small-world functional network connectivity and clustering across resting state networks in schizophrenia: an fmri classification tutorial. Frontiers in Human Neuroscience, 7:520, 2013.
-  Ariana Anderson, Ivo D Dinov, Jonathan E Sherin, Javier Quintana, Alan L Yuille, and Mark S Cohen. Classification of spatially unaligned fmri scans. neuroimage, 49(3):2509–2519, 2010.
-  Koen V Haak, Andre F Marquand, and Christian F Beckmann. Connectopic mapping with resting-state fmri. Neuroimage, 170:83–94, 2018.
-  Xilin Shen and François G Meyer. Analysis of event-related fmri data using diffusion maps. In Biennial International Conference on Information Processing in Medical Imaging, pages 652–663. Springer, 2005.
-  Tuomo Sipola, Fengyu Cong, Tapani Ristaniemi, Vinoo Alluri, Petri Toiviainen, Elvira Brattico, and Asoke K Nandi. Diffusion map for clustering fmri spatial maps extracted by independent component analysis. In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE, 2013.
-  Wenzhao Lian, Ronen Talmon, Hitten Zaveri, Lawrence Carin, and Ronald Coifman. Multivariate time-series analysis and diffusion maps. Signal Processing, 116:13–28, 2015.
-  Dominique Duncan, Ronen Talmon, Hitten P Zaveri, and Ronald R Coifman. Identifying preseizure state in intracranial eeg data using diffusion kernels. Math. Biosci. Eng, 10(3):579–590, 2013.
-  Constantinos Siettos and Jens Starke. Multiscale modeling of brain dynamics: from single neurons and networks to mathematical tools. Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 8(5):438–458, 2016.
-  Jonas Richiardi, Sophie Achard, Horst Bunke, and Dimitri Van De Ville. Machine learning with brain graphs: predictive modeling approaches for functional imaging in systems neuroscience. IEEE Signal Processing Magazine, 30(3):58–70, 2013.
-  Vince D Calhoun, Jing Sui, Kent Kiehl, Jessica A Turner, Elena A Allen, and Godfrey Pearlson. Exploring the psychosis functional connectome: aberrant intrinsic networks in schizophrenia and bipolar disorder. Frontiers in psychiatry, 2:75, 2012.
-  Andrew R Mayer, David Ruhl, Flannery Merideth, Josef Ling, Faith M Hanlon, Juan Bustillo, and Jose Cañive. Functional imaging of the hemodynamic sensory gating response in schizophrenia. Human brain mapping, 34(9):2302–2312, 2013.
-  Muhammad Naveed Iqbal Qureshi, Jooyoung Oh, Dongrae Cho, Hang Joon Jo, and Boreom Lee. Multimodal discrimination of schizophrenia using hybrid weighted feature concatenation of brain functional connectivity and anatomical features with an extreme learning machine. Frontiers in neuroinformatics, 11:59, 2017.
-  Mark Jenkinson, Peter Bannister, Michael Brady, and Stephen Smith. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage, 17(2):825–841, 2002.
-  Stephen M Smith. Fast robust automated brain extraction. Human brain mapping, 17(3):143–155, 2002.
-  Christian F Beckmann and Stephen M Smith. Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE transactions on medical imaging, 23(2):137–152, 2004.
-  Regina J Meszlényi, Petra Hermann, Krisztian Buza, Viktor Gál, and Zoltán Vidnyánszky. Resting state fmri functional connectivity analysis using dynamic time warping. Frontiers in neuroscience, 11:75, 2017.
-  James S Hyde and Andrzej Jesmanowicz. Cross-correlation: an fmri signal-processing strategy. NeuroImage, 62(2):848–851, 2012.
-  Archana Venkataraman, Koene RA Van Dijk, Randy L Buckner, and Polina Golland. Exploring functional connectivity in fmri via clustering. In Proceedings of the… IEEE International Conference on Acoustics, Speech, and Signal Processing/sponsored by the Institute of Electrical and Electronics Engineers Signal Processing Society. ICASSP (Conference), volume 2009, page 441. NIH Public Access, 2009.
-  Cyril Goutte, Peter Toft, Egill Rostrup, Finn Å Nielsen, and Lars Kai Hansen. On clustering fmri time series. NeuroImage, 9(3):298–310, 1999.
-  Martijn P. van den Heuvel, Siemon C. de Lange, Andrew Zalesky, Caio Seguin, B.T. Thomas Yeo, and Ruben Schmidt. Proportional thresholding in resting-state fMRI functional connectivity networks and consequences for patient-control connectome studies: Issues and recommendations. NeuroImage, 152:437–449, may 2017.
-  Rami F Algunaid, Ali H Algumaei, Muhammad A Rushdi, and Inas A Yassine. Schizophrenic patient identification using graph-theoretic features of resting-state fmri data. Biomedical Signal Processing and Control, 43:289–299, 2018.
-  R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2014, 2014.
-  Edsger W Dijkstra. A note on two problems in connexion with graphs. Numerische mathematik, 1(1):269–271, 1959.
-  Jari Oksanen, Roeland Kindt, Pierre Legendre, Bob O’Hara, M Henry H Stevens, Maintainer Jari Oksanen, and MASS Suggests. The vegan package. Community ecology package, 10:631–637, 2007.
-  Boaz Nadler, Stephane Lafon, Ioannis Kevrekidis, and Ronald R Coifman. In Advances in neural information processing systems, pages 955–962, 2006.
-  Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003.
-  Amit Singer, Radek Erban, Ioannis G Kevrekidis, and Ronald R Coifman. Detecting intrinsic slow variables in stochastic dynamical systems by anisotropic diffusion maps. Proceedings of the National Academy of Sciences, 106(38):16090–16095, 2009.
J De la Porte, BM Herbst, W Hereman, and SJ Van Der Walt.
An introduction to diffusion maps.
Proceedings of the 19th Symposium of the Pattern Recognition Association of South Africa (PRASA 2008), Cape Town, South Africa, pages 15–25, 2008.
Boaz Nadler, Stephane Lafon, Ronald Coifman, and Ioannis G Kevrekidis.
Diffusion maps-a probabilistic interpretation for spectral embedding
and clustering algorithms.
Principal manifolds for data visualization and dimension reduction, pages 238–260. Springer, 2008.
-  Joseph Richards. Diffusion map. R package version, pages 1–1, 2014.
-  Harry Strange and Reyer Zwiggelaar. Open Problems in Spectral Dimensionality Reduction. Springer, 2014.
-  Lawrence K Saul, Kilian Q Weinberger, Jihun H Ham, Fei Sha, and Daniel D Lee. Spectral methods for dimensionality reduction. Semisupervised learning, pages 293–308, 2006.
-  Cornelis J Stam and Jaap C Reijneveld. Graph theoretical analysis of complex networks in the brain. Nonlinear biomedical physics, 1(1):3, 2007.
-  Mikail Rubinov and Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations. Neuroimage, 52(3):1059–1069, 2010.
-  Gabor Csardi and Tamas Nepusz. The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5):1–9, 2006.
-  Max Kuhn et al. Building predictive models in r using the caret package. Journal of statistical software, 28(5):1–26, 2008.
-  Brian D Ripley. Pattern recognition and neural networks. Cambridge university press, 2007.
Theory of the backpropagation neural network.In Neural networks for perception, pages 65–93. Elsevier, 1992.
-  Anders Krogh and John A Hertz. A simple weight decay can improve generalization. In Advances in neural information processing systems, pages 950–957, 1992.
Brian Ripley and William Venables.
nnet: Feed-forward neural networks and multinomial log-linear models.R package version, 7(5), 2011.