Manifold learning for brain connectivity

05/01/2020
by   Félix Renard, et al.
0

Human brain connectome studies aim at extracting and analyzing relevant features associated to pathologies of interest. Usually this consists in modeling the brain connectome as a graph and in using graph metrics as features. A fine brain description requires graph metrics computation at the node level. Given the relatively reduced number of patients in standard cohorts, such data analysis problems fall in the high-dimension low sample size framework. In this context, our goal is to provide a machine learning technique that exhibits flexibility, gives the investigator grip on the features and covariates, allows visualization and exploration, and yields insight into the data and the biological phenomena at stake. The retained approach is dimension reduction in a manifold learning methodology, the originality lying in that one (or several) reduced variables be chosen by the investigator. The proposed method is illustrated on two studies, the first one addressing comatose patients, the second one addressing young versus elderly population comparison. The method sheds light on the graph metrics and underlying neurobiological phenomena.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 13

page 22

page 23

page 24

12/21/2017

Autism Classification Using Brain Functional Connectivity Dynamics and Machine Learning

The goal of the present study is to identify autism using machine learni...
02/09/2018

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

UMAP (Uniform Manifold Approximation and Projection) is a novel manifold...
04/15/2019

GraphTSNE: A Visualization Technique for Graph-Structured Data

We present GraphTSNE, a novel visualization technique for graph-structur...
04/30/2021

Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder

In this paper, we propose a new method to perform data augmentation in a...
11/04/2020

Node-Centric Graph Learning from Data for Brain State Identification

Data-driven graph learning models a network by determining the strength ...
02/17/2018

Connectivity-Driven Parcellation Methods for the Human Cerebral Cortex

In this thesis, we present robust and fully-automated methods for the su...
02/28/2018

GraphVar 2.0: A user-friendly toolbox for machine learning on functional connectivity measures

Background: We previously presented GraphVar as a user-friendly MATLAB t...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Brain modeling and understanding is a very active field of research involving different disciplines, such as neuroscience, image and signal processing, statistics, physics, and biology. These last years, neuroimaging modalities have been developed to explore the brain for both structural and functional features. It is now recognized that these images are providing very promising noninvasive observations of the brain [22, 6, 28]. One consequence of the availability of such massive datasets is the need to develop more and more sophisticated models to unravel the possible alteration of brains due to the impact of different pathologies. In this context, representing the brain as a global system is capital. This may be achieved using a network [7]. A brain network is a graph where nodes correspond to specific regions and edges describe interactions and links between those regions. Different kinds of links and interactions may be of interest. Anatomical tracts are identified using diffusion imaging [32] and used in anatomical connectivity studies, where the whole set of links is called an anatomical connectome. Functional interactions are identified in functional imaging studies, whether in resting-state or in task-performing [30, 9], and used in functional connectivity studies. The whole set of functional links is called a functional connectome. In the functional case, brain networks are unique in encapsulating both spatial and temporal information in a single model. This model has attracted lots of attention these last twenty years by providing both very intuitive and spatial maps of brain networks.

Brain networks can be quantified using graph metrics such as minimum path length, clustering [35], global and local efficiency [16], modularity [24], and assortativity [23], among others. As these metrics are associated to specific network features, it is often possible to find the appropriate metrics to use given specific neuroscience hypotheses of the study. For the study of brain disorders, these metrics have been used in order to extract biomarkers for pathologies such as for example Alzheimer’s disease [33], schizophrenia [18], and multiple sclerosis [10]. Extracting quantitative parameters of brain networks is compulsory to conduct any statistical analysis. In this framework, statistical and machine learning approaches on graph metrics on all nodes allow the quantification of differences between groups [28].

For any dataset, any graph metric can be computed either at the global level with one value for an entire network or at the nodal level with one value for each node and a vector of values for the entire network. It has already been shown that global values may not discriminate two groups of subjects

[2], which shows their limits as biomarkers. Few attempts have been made to use directly distances between networks such as the edit distance [21], or network similarities [20]

. However, nodal level approaches are challenging since hundreds of brain areas can be extracted whereas the number of subjects is generally small. This corresponds to the High Dimension Low Sample Size (HDLSS) configuration and falls under the curse of dimensionality

[4]. In particular, standard classification and regression algorithms are not robust anymore in such a context (chapter 2 section 5 and chapter 18 of [14]).

Dimension reduction techniques tackle curse of dimensionality issues [14]

. In this framework, feature selection, where a subset of the original variables is considered, and feature extraction, where the original variables are transformed to a smaller set, may be envisaged

[36]

. We resort here to the ISOMAP methodology, which is a well-known nonlinear feature extraction algorithm generalizing Principal Component Analysis dimension reduction

[34, 15]

. ISOMAP may be seen as a manifold learning approach, where the degrees of freedom of the data are captured by the latent variables, and where the structure of points in the latent space (the reduced space) mimics the structure of data in the original space. Nevertheless, ISOMAP raises two issues: interpreting the latent variables and determining the effect a change in the latent variables incurs in the data space, that is the corresponding changes in brain networks and the underlying neuroscience hypotheses at stake in the case of the present study.

Dimension reduction is not new in the field of brain connectivity studies. Several methods have been proposed to extract nodal features at the level of brain regions. Using the Hub Disruption Index (the index) to analyze a set of brain networks may be considered as a feature extraction approach: this is a user-defined transformation of the original space to a 1D latent space [2]. Principal Component Analysis (PCA) is applied in [29] on graph metrics vectors representing brains at nodal level. We proposed in [26] to use kernel PCA, a nonlinear version of PCA. Besides, interpreting latent variables may be addressed by correlating the reduced space with clinical data [13]. Covariates may also be mapped or regressed on the reduced space as proposed in [3], thus shedding light on latent variables.

The objective of this article is to integrate all features cited above in one method: working at the nodal level, applying dimension reduction techniques, and mapping covariates to ease interpretation. In addition, a new methodology is proposed to incorporate interesting networks features already identified in specific datasets directly in the manifold learning approach.

This paper is focusing on two already published datasets. The first one consists in fMRI datasets on 20 healthy controls and 17 coma patients from Achard et al. [2]. The second one is based on [1]

where 15 young healthy subjects and 11 elderly healthy subjects were scanned using resting state fMRI. Our first experiment compares data driven approaches such as Linear Discriminant Analysis (LDA) and Random Forests (RF) to an ad hoc description such as the hub disruption index

. This allows to compare classical machine learning approaches where the interpretability of the results is often difficult with approaches resorting to descriptors constructed using neuroscientific hypotheses. The second experiment consists in constructing a data-driven manifold, ISOMAP, using the graph metrics as features. ISOMAP is providing a compact representation of brain connectomes in a reduced space where it is straightforward to map the available covariates. In addition, we may interpret changes in connectomes by regressing covariables like on the reduced space using latent variables.

This representation allows a visualization of each subject relatively to the whole population, which is crucial in clinical studies for example in order to better understand brain changes for each specific subject. Besides,

has been shown to be both a meaningful descriptor and a good classifying feature for brain connectomes of coma patients. Therefore, we propose a new method based on a covariate constrained manifold learning (CCML) using

as an input of ISOMAP. This allows us to propose a new generative model based on our new data representation, to better predict the evolution of each patient given the evolution of covariables.

2 Materials and methods

2.1 Resting state fMRI data

2.1.1 Comatose study

The data were acquired in a previous study aimed at characterizing resting state connectivity brain networks for patients with consciousness disorders. The description of the data and results is reported in [2]. The patients were scanned a few days after major acute brain injury, when sedative drug withdrawal allowed for spontaneous ventilation. Therefore, all patients were spontaneously ventilating and could be safely scanned at the time of fMRI. The causes of coma are patient-dependent: 12 had cardiac and respiratory arrest due to various causes; 2 had a gaseous cerebrovascular embolism; 2 had hypoglycemia; and 1 had extracranial artery dissection. A total of twenty-five patients were scanned (age range, 21-82 y; 9 men). Data on eight patients were subsequently excluded because of unacceptable degrees of head movement. The coma severity for each patient was clinically assessed using the 62 items of the WHIM scale: scores range from 0, meaning deep coma, to 62, meaning full recovery. Six months after the onset of coma, 3 patients had totally recovered, 9 patients had died, and 5 patients remained in a persistent vegetative state. The normal control group is composed of 20 healthy volunteers matched for sex (11 men) and approximately for age (range, 25-51 y) to the group of patients. This study was approved by the local Research Ethics Committee of the Faculty of Health Sciences of Strasbourg on October 24, 2008 (CPP 08/53) and by the relevant healthcare authorities. Written informed consent was obtained directly from the healthy volunteers and from the next of kin for each of the patients. Resting-state data were acquired for each subject using gradient echo planar imaging technique with a 1.5-T MR scanner (Avanto; Siemens, Erlangen, Germany) with the following parameters: relaxation time = 3 s, echo time = 50 ms, isotropic voxel size = 4 x 4 x 4 , 405 images, and 32 axial slices covering the entire cortex. The preprocessing of the data is detailed in our previous study [2].

2.1.2 Young and elderly study

The data used in this study have already been analyzed in two papers [1] and [19]. The goal of these papers was to identify the changes in brain connectomes for elderly subjects in terms of topological organization of brain graphs. The data consist of 15 young subjects aged 18-33 years, mean age=24 and 11 elderly subjects aged 62-76 years. Each subject was scanned using resting-state fMRI as described in [1] (Wolfson Brain Imaging Centre, Cambridge, UK). For each dataset, a total of 512 volumes was avalaible with number of slices, 21 (interleaved); slice thickness, 4 mm; interslice gap, 1 mm; matrix size, 64 3 64; flip angle, 908; repetition time (TR), 1100 ms; echo time, 27.5 ms; in-plane resolution, 3.125 mm.

2.2 Wavelet graph estimation

Brain network graphs were determined following [2] for comatose study and [1]

for young and elderly study. For each subject, data were corrected for head motion and then coregistered with each subject’s T1-weighted structural MRI. Each subject’s structural MRI was non linearly registered with the Colin27 template image. The obtained deformation field image was used to map the fMRI datasets to the automated anatomical labeling (AAL) or to a customized parcellation image with 417 anatomically homogeneous size regions based on the AAL template image. Regional mean time series were estimated by averaging the fMRI time series over all voxels in each parcel, weighted by the proportion of gray matter in each voxel of the segmented structural MRIs. We estimated the correlations between wavelet coefficients of all possible pairs of the N = 90 or 417 cortical and subcortical fMRI time series extracted from each individual dataset. For the coma, only scale 3 wavelet correlation matrices were considered. The corresponding frequency interval of the functional connectivity was 0.02-0.04 Hz. For the young and elderly, the wavelet scale considered corresponds to 0.06-0.11 Hz. To generate binary undirected graphs, a minimum spanning tree algorithm was applied to connect all parcels. The absolute wavelet correlation matrices were thresholded to retain 2.5 % of all possible connections. Each subject was then represented by a graph with nodes corresponding to the same brain regions, and with the same number of edges.

2.3 Graph metrics

The objective is to extract differences between the two groups with respect to the topological organization of the graphs. Each graph is summarized by graph metrics computed at the nodal level. Three metrics are considered here: degree, global efficiency, and clustering [6].

The degree is quantifying the number of edges belonging to one node. Let denote a graph with when there is no edge between nodes and , and when there is an edge between nodes and . The degree of node is computed as

(1)

The global efficiency measures how the information is propagating in the whole network. A random graph will have a global efficiency close to 1 for each node, and a regular graph will have a global efficiency close to 0 for each node. The global efficiency

is defined as the inverse of the harmonic mean of the set of the minimum path lengths

between node and all other nodes in the graph:

(2)

Clustering is a local efficiency measure corresponding to information transfer in the immediate neighborhood of each node, defined as:

(3)

where is the subgraph of defined by the set of nodes that are the nearest neighbors of node . A high value of clustering corresponds to highly connected neighbors of each node, whereas a low value means that the neighbors of each node are rather disconnected.

Each graph metric emphasizes a specific property at the nodal level. With a view to statistical comparison, several methods have already been developed, representing data in specific spaces. Each method aims at separating classes. Usually these methods are very general and can be applied without careful inspection of the data. We used here four different methods [27]: the

index resulting from a careful inspection of the data, mean over graph metrics (denoted here MEAN), LDA and Feature Selection (FS) by selecting the best feature based on a univariate statistical Student t-test. Like the

index, each of these methods provides, for each patient, a scalar feature corresponding to particular property of the data. Figure 1 gives an illustration of the different methods.

Figure 1: General framework from graphs of cerebral connectomes to the different scalar features.

2.4 index definition

In our previous study [2], was devised to compare graph metrics obtained on each node of a subject or of a group with reference group (see figure 2). In classical comparisons between a group of patients and a group of healthy volunteers, the reference is the group of healthy volunteers. In the present study, for a given graph metric and two groups, we first compute the average of this metric for each node over the group of healthy volunteers, denoted as the reference. Each subject is then summarized as a vector of values of dimension the number of nodes. Then, for each patient, corresponds to the slope of the regression of a nodal graph metric between the given patient minus the reference and the reference. In order to give a simple interpretation of , we assume that the global graph metric computed as an average over the nodes is the same in both groups. A value of zero for is showing that the graph metric obtained at the node level is the same for the patient and the reference. A positive value of is indicating that the hubs and non-hubs of the patient in comparison to the reference are located on the same nodes. However, the values of the graph metrics are increased for the hubs and decreased for the non-hubs. Finally, when the value of is negative, the hubs of the reference are no longer hubs of the patient, and the non-hubs of the reference are hubs for the patient. In [2], we showed that the index is able to discriminate both groups (coma patients and healthy volunteers) while the global metric is unable to identify any significant difference. Instead of averaging the graph metrics, the

index is capturing a joint variation of the metrics computed for each node.

Figure 2: Extraction of hub disruption index : A. brain connectomes inferred for each subjects; B. for each brain connectome, extraction of graph metrics for each region of the brain; C. matrix representation of the graph metrics where a row corresponds to a subject and a column corresponds to a brain region; D. computation of the hub disruption index by regressing the average of brain metrics of the difference of patients and average of healthy volunteers against the average of healthy volunteers. The hub disruption index corresponds to the slope coefficient. We give several illustrations following the sign of this coefficient.

2.5 Mean over the nodes (MEAN)

For each graph metric, the mean over the nodes of the graph captures a global property of the network. These global metrics have been previously used to discriminate two populations of networks, for example for Alzheimer’s disease [33] and for schizophrenia [18]. Such a coefficient can discriminate well two networks when their topologies are really different. However, such metrics do not take into account the specificity of the nodes. Indeed, when permuting the nodes of the graph, the global metric is not changed, but the hubs of the graph are not associated to the same nodes anymore. Therefore, a graph reorganization cannot be detected using such global metrics.

2.6 Linear discriminant analysis (LDA)

LDA [11] is a classification method, aiming at identifying the linear projection optimally separating two groups. It can be considered as a gold standard for linear group discrimination. It is not specific to the analysis of networks.

LDA has been previously used for network discrimination in [29]. This algorithm amounts to computing a scalar for each graph. However, there is no simple clinical interpretation of the discriminating parameter.

2.7 Feature Selection (FS)

As for LDA, FS determines the features yielding the best separation of the two groups. Several features may be used simultaneously. In order to establish a fair comparison with the other methods, we choose to extract the single feature yielding the best separation. Several methods exist for FS. We choose univariate FS implemented in [25]. An advantage of FS is that it is capturing discriminative features at the node level. As the selected features are extracted directly from the data, it is usally possible to derive a clinical interpretation. However, joint variations are not modeled and on the comatose study, FS is not able to yield results of the same quality as those obtained using .

2.8 Modeling populations of networks with manifold learning

ISOMAP [34] is used as a manifold learning approach to describe population networks. We propose here an original approach based on ISOMAP, where we constrain one variable of the reduced space (the latent space) to correspond to a covariate.

2.8.1 Manifold learning using ISOMAP

ISOMAP devises a reduced dimension version of the original set of points. Interpoint distances in the reduced space reproduce as much as possible interpoint distances in the original space. Euclidean and geodesic distances are respectively used. Principal component analysis may be seen as a particular case of ISOMAP, where Euclidean distances are used in the original space, instead of geodesic distances. The reader is referred to [34] for details about the algorithm.

In our case, the original data correspond to a vector of graph metrics for each subject, the dimension of the vector being the number of nodes times the number of metrics. For each analysis, only one metric is considered here. However, this method could be applied using jointly several metrics. Covariates may be regressed on the reduced space. In the present work, this was achieved using a classic radial basis function interpolation.

2.8.2 Covariate constrained manifold learning

One drawback of manifold learning algorithms is the difficulty to interpret the reduced coordinates because they are usually meaningless. The original method proposed in this work consists in constraining one coordinate of the reduced space to correspond to a specific covariate. The other coordinates are left unconstrained, as in classical ISOMAP. Such a procedure requires special care regarding the optimization aspect. We apply a strategy proposed in [5], where points are introduced one by one.

Moreover, a scale factor is considered for the axis corresponding to the covariate. This parameter, obtained by optimization, balances the scales of the different axes.

The reduced point is defined by , where is the chosen covariate and are the other coordinates. The cost function is defined as:

(4)

For an incoming data point , the cost function is optimized three times with regard to 1) as , 2) as and 3) for each point that has already been included as .

To facilitate optimization and to avoid possible local minima, instead of inserting the samples at random, we choose the sample to be incorporated next as the one with the largest geodesic distance to the samples already incorporated. Indeed, interpolation problems are always easier than extrapolation problems where greater uncertainty may occur. We initialize the procedure by taking the two samples with the largest geodesic distance. The first two samples are used as landmarks of the border of the reduced space, and the insertion of new samples will generate only small displacements of the already inserted samples.

The algorithm is described in Algorithm 1.

Input – dataset: vectors (samples) of a graph metric over graph nodes
Result: reduced space representation of the dataset, where the first coordinate of each corresponds to the covariate.
Initialization: select the two most distant samples
Determine their reduced coordinates by minimizing with
Update the scale by minimizing wrt , fixed, as .
while All points are not included do
     1) Select the most distant sample to the already selected samples
     2) Compute by minimizing wrt ( and other ’s fixed) as .
     3) Update the scale by minimizing wrt (’s fixed) as .
     4) Update ’s of samples already included by minimizing as .
end while
Algorithm 1 – covariate constrained manifold learning (CCML)

2.9 Application: a generative model for the prediction of the evolution of a subject with regard to the evolution of a covariate

From the obtained embedding, a generative model

(5)

can be devised, where is a vector in the original space (the connectome space), is a vector from the manifold embedding, and is a regression function. MARS [12] is chosen for the regression function for its nice properties (one regression for each coordinate of , i.e. regressions): locally linear and globally nonlinear. The parameters of can be determined using the dataset and the corresponding reduced vectors using equation:

(6)

where is the residual between a sample and its prediction . The residuals allow to evaluate the accuracy of the regression function.

This kind of model is not original, PCA being the most well known case where the model is defined as , see e.g. [17, 31] for references. Such a generative model used in the CCML framework allows to determine changes in the original space (the connectome space) generated by a displacement in the reduced space, for example along the covariate axis.

3 Results

The different algorithms have been implemented in the Python language using the scikit learn toolbox [25]. When left unspecified, coma data are used. The use of the young and elderly data is explicitely stated.

3.1 Local analysis using dimension reduction

Permutation tests are performed on the index and on the three other measures (LDA, FS, MEAN) to assess the ability of those four metrics to discriminate two populations. More precisely, for each coefficient separately, the difference of the means of the two populations is determined for the observed populations. The labels of the samples are then shuffled and the difference of the means of the shuffled two populations is determined. This latter step is performed times. It avoids to make any assumption on the distribution of the statistic. Simultaneously the correlations between the observed index and the other coefficients are estimated.

Coefficients corr. corr. corr.
(p-value) (p-value) (p-value)
Table 1: P-value of permutation tests comparing the mean of the two groups ( permutations, which bounds the p-values). The correlation scores are estimated between the index and the three other measures (, , and ).

The results corresponding to the different methods aiming at discriminating the two groups (control and coma) are given in Table 1. As expected, the machine learning algorithms (ie, , , and ) show good performances in separating the two groups for different graph metrics. This is consistent with the fact that these methods have been tailored to classify the two groups. The results with the index show similar performances in separating the two groups. The large correlations between machine learning algorithms on the one hand and the graph metric on the other hand show retrospectively that similar performances were to be expected.

Besides, a strong relationship can be observed between and LDA (correlation scores greater than for each metric). The FS correlation scores are lower than the LDA correlation ones. The difference between the two methods is that LDA considers a linear combination of features, with a global perspective, whereas FS selects one feature and acts locally. Since reflects a global reorganization of the brain, it is expected that the correlation score of LDA be greater than the FS one. Finally, MEAN scores reveal that this measure is not appropriate in this study.

3.2 Standard ISOMAP manifold learning and the index

Figure 3: Standard ISOMAP reduced space representation of the original dataset. : (comatose) patient ; : control . Covariates are mapped onto the reduced space (covariate value is color-coded). Top left: index mapping; top right: MEAN mapping; bottom left: LDA mapping; bottom right: FS mapping.

In this section, the goal is to link the reduced space obtained by manifold learning to different covariates such as the index. We want to assess whether a given covariate varies smoothly across the reduced space, and is therefore predictible using this space.

Figure 3 represents the reduced space obtained using standard ISOMAP, as opposed to using CCML. The values of the different covariates are color-coded. The reduced space representation allows to separate both populations. Besides, by visual inspection of the color-coded maps, it appears that those regression maps are capturing features corresponding to and MEAN. In order to quantify these visual observations, covariates are regressed on the reduced space. The root mean square error (RMSE) and the maximum error are determined in a leave-one-out procedure. The results are given in Table 2. It can be noted that the MEAN strategy gives the same values for the graph metric degree D for all graphs since the number of edges is set to be the same for all graphs.

Covariate
RMSE M RMSE M RMSE M
0.51 2.44 0.68 2.59 0.26 2.14
0.97 7.97 0.9 5.44 0.66 4.13
0.64 2.2 1.1 9.12 0.83 7.49
0.15 0.73 1.02 5.44
Table 2: Assessment of the regression of covariates on the reduced space. Three different reduced spaces are at stake, one for each graph metric. Root mean square error (RMSE) and maximal error are displayed. The MEAN strategy is not relevant for the degree D since the degrees D for all graphs are equal (the number of edges is set to be the same for all graphs).

In this table, the lower the values, the better the adequacy with the reduced space. It appears that is the best choice across all metrics, except for where it is outperformed by MEAN. In the case of , this suggests that both and MEAN scores correspond to degrees of freedom of the intrinsic manifold of the functional connectivity graphs.

Figure 4

displays the probability of belonging to a specific class computed using logistic regression on the reduced space stemming from standard ISOMAP. The probability of belonging to the comatose class is color-coded in Figure

4.

Figure 4: Logistic regression using reduced space stemming from standard ISOMAP. The color codes the probability of belonging to the comatose class.

This probability estimation using logistic regression is compared with covariates such as or MEAN, in Table 3. A high correlation score is observed between and logistic regression probablity. The correlation score between the MEAN coefficient and the probabilistic mapping is lower than the one with as expected.

Coefficients
0.87 0.87 0.81
0.55 0.42
Table 3: Correlation scores between the probabilistic mapping and the different coefficients ( index and MEAN measure). A high correlation score of the index indicates a good fitting between the reduced space representation and the classification of the two groups.

Taken together, these observations demonstrate the importance of in the classification of these populations. Obviously, for the special case of global efficiency metric, the MEAN score describes correctly the reduced space, but does not correspond to the classification pattern.

3.3 Covariate constrained manifold learning

3.3.1 Comatose population

First we evaluate the convergence of the optimization problem (Algorithm 1). To assess the difficulty of the optimization problem, we ran it with random initializations. Only of the runs converged to the same solution, whereas of the runs converged to a local (worse) optimum. It thus appears that our initialization procedure addresses the local optimum issue. Nevertheless, this optimization problem would probably deserve further investigations which are out of the scope of this paper.

In Figure 5, we display the reduced space corresponding to our new manifold learning algorithm. We can observe that the two populations are well discriminated in the case of , but not for the MEAN coefficient.

We observe the strong interaction between and the MEAN in the top right of Figure 5, where the reduced space on one axis is and the mapping corresponds to the MEAN. To quantify this, in the case of , we estimate the correlation between the second coordinate and the MEAN score. The obtained score equals to , which confirms the intrinsic relationship between and the MEAN coefficient.

Figure 5: Covariate mapping onto the reduced space given by our method CCML. The reduced space is computed using as a graph metric (the ’s). Covariate value is color-coded. For each subfigure, the coordinates correspond to , where is the constrained variable and the free parameter. Top left: mapping with a -constrained reduced space, Top right: MEAN mapping with a -constrained reduced space; Bottom : MEAN mapping with a MEAN-constrained reduced space. As expected, we can observe that the mappings correlate well with the first coordinate by construction (top left and bottom). It is also clear that using a -constrained reduced space is facilitating the discrimination between the two populations. Indeed, the controls and patients are not covering the same part of the reduced space. On the contrary, as expected using the MEAN-constrained reduced space, the method is not providing a very clear discrimination between patients and controls. Especially, patients 9 and 18 are very close to controls. Finally, the top right figure is showing a correlation between the second coordinate of CCML and the MEAN.

3.3.2 Elderly and young population

The elderly and young groups are investigated in this section.

Figure 6: Left: mapping with the standard ISOMAP reduced space, Right: MEAN mapping with the same reduced space. The old controls (resp. young controls) are labeled O (resp. Y). For these groups, the index cannot discriminate the two groups. However the MEAN index behaves better for the discrimination between the two groups. In each case, the interpretation is complex since the mapping of the covariate is not linear.
Figure 7: CCML approach. Left: mapping with the -constrained reduced space, Right: MEAN mapping with the MEAN-constrained reduced space. The old controls (resp. young controls) are labeled O (resp. Y). Two manifolds (one for each constraint) have been determined. However only the MEAN-constrained manifold discriminates both groups.

In Figure 6, the manifold obtained by standard ISOMAP is displayed. It is interesting to highlight that the index is not a pertinent feature to discriminate the old from the young, whereas the MEAN is a better discriminating feature. In both cases, the interpretation of the mapping is complex since it is not smooth.

In Figure 7, results from CCML are displayed for the MEAN coefficient. We can observe that the MEAN mapping discriminates the two groups.

3.4 Application: a generative model for the prediction of the evolution of a subject with regard to the evolution of a covariate

Using the algorithm detailed in the last section, a map of the population is determined, with one of the reduced coordinates corresponding to a chosen covariate. To highlight the potential of the proposed method, we compute the evolution of a patient with regard to the evolution of a covariate. This allows gives insight into the effect of the covariate on the patient. To perform this analysis, a regression is used to map the reduced space to the initial space (we used MARS regression [12], coordinate-wise).

This application is illustrated in Figure 8.

Figure 8: Evolution of one patient along the covariate axis. We consider the patient P18 in state B in the reduced space (left part of the figure). We predict its evolution when the index is decreased (point A) or increased (point C). Intuitively, point A and point C correspond respectively to a degradation and to an improvement of the health of the patient. Interestingly, these trends can be directly observed in the graph space for clinical insights (right part of the figure). It can be noticed that the variations are not linear.

4 Discussion

4.1 Assessment of graph metric descriptors

The handcrafted design of graph metric descriptors is interesting since such descriptors carry straightforward physical meaning, like the index for hub reorganization. However, in a classification framework, such new scalar descriptors may not be optimal. To assess the pertinence of a new ad hoc graph metric descriptor for a classification task, it will be interesting to confront it to specific scalar coefficients used in standard classification algorithms (like LDA or FS), and examine if there is some correlation between the scalars at stake.

4.2 No free lunch for graph metric descriptors

We hypothesize that there is no best descriptor adapted to all datasets. The optimality depends on 1) the kind of the data and 2) the kind of question/task addressed. This idea is known as the "no free lunch theorem" [37]: if an algorithm performs well on a certain class of problems then it has necessarily poorer performances on the set of all remaining problems.

In the present study, we showed that the index yields good classification performances in separating a comatose population from a healthy population. However the MEAN index better describes the groups of elderly and young people (see Fig. 3): for this dataset, the index cannot separate the two groups, but the MEAN score can. It is interesting to notice that several descriptors can map correctly a population, while providing different information.

The "no free lunch theorem" also applies to manifold learning algorithms. The underlying question is the one of choosing an interpoint distance in the data space. A given interpoint distance will yield a specific structure of samples in the reduced space. Therefore, the retained interpoint distance chosen will depend on the final goal: mimicking the structure of the initial data points, enhancing class separation with a view to achieving better classification performances, focusing on a specific property of the data, etc…The proposed algorithm CCML aims at mimicking the structure of the initial data points, and this to be done using explicitly a particular characteristic, chosen and imposed by the investigator.

4.3 Manifold learning for brain graph analysis

Manifold learning is well suited for brain graph analysis for several reasons. Firstly, global descriptors of graph metrics represent an entire graph by a scalar value, which is generally ultimately insufficient to model correctly the complexity of a graph population. Manifold learning is better suited to capturing the complexity and variability of a given graph population, since more degrees of freedom are structuring the reduced space.

Secondly, connectomes have been studied for their capacity to represent the brain as a system and not merely as a juxtaposition of independent anatomical/functional regions. Classical statistical tests are not adapted to analyze joint variations between local descriptors of graph metrics since those tests assume independence between features. Brain graph manifold learning for comparing groups of graphs is promising because joint variations are accounted for.

Thirdly, manifold learning may be turned to a generative model, when resorting to a mapping from the reduced space to the data space.

Brain graph manifold learning can be seen as a trade-off between global and local brain graph metrics analysis. In other words, manifold learning is considered as a model at the level of the group while preserving the information of the individuals. However this technique is hard to interpret by its own. The addition of explicative covariables as proposed with the CCML method can provide an understandable and generative model of population with the possibility of focusing at the individual level [8].

More generally, manifold learning can be an interesting solution for personalized and predictive medicine purposes.

5 Conclusion

The originality and contribution of this paper is the devising of a nonlinear manifold model of brain graph metrics. The essence of the approach is the capture of a metric through all nodes of a graph, across all graphs of an entire population: a population of graphs is represented by a population of vectors, each vector holding the variation of a metric through the nodes at stake. The population is then represented in a reduced space. This is to be opposed to the standard representation of a given brain graph by a mere scalar.

The proposed approach has several advantages. First and foremost, the data are represented with several degrees of freedom, corresponding to the dimensions of the reduced space. The structure of the original data set is captured by a compact representation. This allows to account for the complex variability of populations of brain graphs. Secondly, such an approach naturally offers analysis of joint variations of those brain graph metrics. Besides, the investigator has the possibility to analyze the data at the population scale and simultaneously at the individual scale.

The investigation tool corresponding to the proposed approach allowed us to retrospectively assess the hub disruption index (HDI), denoted , and proposed in one of our former works. Earlier work showed that the HDI is a very good candidate for discriminating patients and controls in the case of coma. Retrospectively, its performances are here assessed in comparison with machine learning methods dedicated to linear group classification such as LDA. Besides yielding nice classification performances, the present study showed that an advantage of HDI, put in the perspective of a manifold model, is to give clinical clues related to the pathology mechanism.

We observed strong relationships between scalar coefficients such as HDI and MEAN, and the coordinates of the manifold. It is important to notice that MEAN, which can separate groups in several pathologies [33, 18], is not able to discriminate the comatose patients from the normal population. However it brings additional information in terms of description of the population. The manifold at stake shows that a scalar coefficient cannot capture the whole information encapsulated in the graphs. One interest of manifold learning, and more specifically our new proposed method, is its ability to reach a new level of interpretation of the brain graph metrics and the interaction between them.

Author contributions
This project was formulated by FR, CH and SA based on substantial data, analyses and experiments of FR, CH, MB, MS, FS, SK and SA. FR and SA formalised the model, implemented and ran the model; FR, CH and SA wrote the manuscript.

References

  • [1] S. Achard and E. Bullmore. Efficiency and cost of economical human brain functional networks. PLoS Computational Biology, 3:e17, 2007.
  • [2] Sophie Achard, Chantal Delon-Martin, Petra E. Vértes, Félix Renard, Maleka Schenck, Francis Schneider, Christian Heinrich, Stéphane Kremer, and Edward T. Bullmore. Hubs of brain functional networks are radically reorganized in comatose patients. Proceedings of the National Academy of Sciences, 109(50):20608–20613, Nov. 2012.
  • [3] P. Aljabar, R. Wolz, and D. Rueckert. Manifold learning for medical image registration, segmentation and classification. In K. Suzuki, editor, Machine learning in computer-aided diagnosis: medical imaging intelligence and analysis. IGI Global, 2012.
  • [4] Richard E. Bellman. Adaptive control processes - A guided tour. Princeton University Press, 1961.
  • [5] Matthieu Brucher, Christian Heinrich, Fabrice Heitz, and Jean-Paul Armspach. A metric multidimensional scaling-based nonlinear manifold learning approach for unsupervised data reduction. EURASIP J. Adv. Sig. Proc., 2008, 2008.
  • [6] E. Bullmore and O. Sporns. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Neurosci, 10(3):186–198, Mar 2009.
  • [7] Ed Bullmore and Olaf Sporns. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3):186–198, feb 2009.
  • [8] Lilia Costa, Jim Smith, Thomas Nichols, James Cussens, Eugene P Duff, Tamar R Makin, et al. Searching multiregression dynamic models of resting-state fmri networks using integer programming. Bayesian Analysis, 10(2):441–478, 2015.
  • [9] Fabrizio De Vico Fallani, Jonas Richiardi, Mario Chavez, and Sophie Achard. Graph analysis of functional brain networks: practical issues in translational neuroscience. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1653):20130521, 2014.
  • [10] Massimo Filippi, Paola Valsasina, Sara Sala, Vittorio Martinelli, Angelo Ghezzi, Pierangelo Veggiotti, Andrea Falini, Giancarlo Comi, and Maria Rocca. Abnormalities of the brain functional connectome in pediatric patients with multiple sclerosis. Neurology, 82(10 Supplement), 2014.
  • [11] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
  • [12] J. Friedman. Multivariate adaptive regression splines. Annals of Statistics, 19(1):1–67, March 1991.
  • [13] Samuel Gerber, Tolga Tasdizen, P. Thomas Fletcher, Sarang Joshi, and Ross Whitaker. Manifold modeling for brain population analysis. Medical Image Analysis, 14(5):643–653, Oct. 2010.
  • [14] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer, 2001.
  • [15] Xiaoming Huo, Xuelei Sherry Ni, and Andrew K Smith. A survey of manifold-based learning methods. Recent advances in data mining of enterprise data, pages 691–745, 2007.
  • [16] V. Latora and M. Marchiori. Efficient behavior of small-world networks. Physical Review Letters, 87:198701, Oct 2001.
  • [17] Neil D Lawrence.

    Gaussian process latent variable models for visualisation of high dimensional data.

    In Advances in neural information processing systems, volume 16, pages 329–336, 2004.
  • [18] Mary-Ellen Lynall, Danielle S. Bassett, Robert Kerwin, Peter J. McKenna, Manfred Kitzbichler, Ulrich Muller, and Ed Bullmore. Functional connectivity and brain networks in schizophrenia. The Journal of Neuroscience, 30(28):9477–9487, 2010.
  • [19] David Meunier, Sophie Achard, Alexa Morcom, and Ed Bullmore. Age-related changes in modular organization of human brain functional networks. Neuroimage, 44(3):715–723, 2009.
  • [20] Ahmad Mheich, Mahmoud Hassan, Mohamad Khalil, Vincent Gripon, Olivier Dufor, and Fabrice Wendling. Siminet: a novel method for quantifying brain network similarity. IEEE transactions on pattern analysis and machine intelligence, 40(9):2238–2249, 2017.
  • [21] Fatemeh Mokhtari and Gholam-Ali Hossein-Zadeh. Decoding brain states using backward edge elimination and graph kernels in fmri connectivity networks. Journal of neuroscience methods, 212(2):259–268, 1 2013.
  • [22] Benson Mwangi, Tian Siva Tian, and Jair C Soares.

    A review of feature reduction techniques in neuroimaging.

    Neuroinformatics, 12(2):229–244, 2014.
  • [23] M. E. J. Newman. Assortative mixing in networks. Physical Review Letters, 89:208701, Oct 2002.
  • [24] M. E. J. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006.
  • [25] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • [26] F. Renard, C. Heinrich, S. Achard, E. Hirsch, and S. Kremer. Statistical kernel-based modeling of connectomes. In

    Proc. Int Workshop on Pattern Recognition in NeuroImaging

    , pages 69–72, 2012.
  • [27] Jonas Richiardi, Sophie Achard, Edward Bullmore, and Dimitri Van De Ville. Classifying connectivity graphs using graph and vertex attributes. In Proc. Int Workshop on Pattern Recognition in NeuroImaging, Seoul, Korea, 2011.
  • [28] Jonas Richiardi, Sophie Achard, Horst Bunke, and Dimitri Van De Ville. Machine learning with brain graphs: predictive modeling approaches for functional imaging in systems neuroscience. IEEE Signal Processing Magazine, 30(3):58–70, 2013.
  • [29] Emma C Robinson, Alexander Hammers, Anders Ericsson, A David Edwards, and Daniel Rueckert. Identifying population differences in whole-brain structural networks: a machine learning approach. NeuroImage, 50(3):910–919, April 2010.
  • [30] Cristina Rosazza and Ludovico Minati. Resting-state brain networks: literature review and clinical applications. Neurological sciences, 32(5):773–785, 2011.
  • [31] Giorgos Sfikas and Christophoros Nikou. Bayesian multiview manifold learning applied to hippocampus shape and clinical score data. In

    Medical Computer Vision and Bayesian and Graphical Models for Biomedical Imaging –

    miccai 2016
    , pages 160–171. Springer, 2016.
  • [32] Olaf Sporns, Giulio Tononi, and Rolf Kötter. The human connectome: a structural description of the human brain. PLoS Computational Biology, 1(4):e42, 2005.
  • [33] Kaustubh Supekar, Vinod Menon, Daniel Rubin, Mark Musen, and Michael D. Greicius. Network analysis of intrinsic functional brain connectivity in alzheimer’s disease. PLoS Computational Biology, pages 1–11, June 2008.
  • [34] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319 –2323, December 2000.
  • [35] Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’networks. nature, 393(6684):440, 1998.
  • [36] Andrew Webb. Statistical pattern recognition. Wiley, second edition, 2002.
  • [37] David H Wolpert. The lack of a priori distinctions between learning algorithms. Neural computation, 8(7):1341–1390, 1996.