I Introduction
Diffusion Magnetic Resonance Imaging (MRI) technology noninvasively reveals white matter fiber structures and provide a model of the brain fiber tracts at a relatively high resolution. This opens up new research opportunities to generate, explore and analyze complex brain networks derived from Diffusion Tensor Imaging (DTI) based structural connectivity information [1, 2, 3]. Researchers have successfully applied graph theoretical analysis on specialized structural networks to shed light on differences between different population groups and on brain disorders such as dementia [4] and schizophrenia [5]. Brain network analysis requires a reasonably accurate anatomical segmentation of the cerebral cortex, called parcellation, in which structurally homogeneous regions constitute the nodes of the network. Traditional anatomical brain regions may not incorporate connectivity information, and are typically identified by the distribution of cell types [6], myelinated fibers [7], or neurotransmitter receptors [8]. Common widelyused anatomical brain parcellations include Brodman’s areas [9], Automated Anatomical Labeling (AAL) [10], and Jülich histological parcellations [11]. However, there are no generally accepted anatomical parcellations and atlases of the whole brain that are based purely on the anatomic brain connectivity information revealed by diffusion MRI data. Most of existing DTIbased parcellation studies focus on particular parts of the cerebral cortex, such as the human inferior parietal cortex complex (IPCC) [12], the lateral parietal cortex [13], the temporoparietal junction area (TPJ) [14], the dorsal frontal cortex [15], the ventral frontal cortex [16], cingulate and orbitofrontal cortex [17], and Broca’s areas [18]. The parcellations generated specifically for these regions have a small number of subregions but achieve high consistency among subjects of a population sample. Connectivitybased parcellation of the whole brain is challenging due to a number of factors that include: (i) the very large size of the connectivity matrix produced by tractography of each subject’s DTI data; (ii) spatial constraints among the voxels of each region that must be respected in addition to the connectivity information; (iii) enforcing consistency for any structurally homogeneous population sample; and (iv) the lack of effective techniques to evaluate, and validate good parcellations.
In this paper, we propose a novel iterative method based on spectral clustering applied to a sparse representation of the connectivity information which also incorporates the necessary spatial constraints. Our goal is to generate reproducible wholebrain parcellations based purely on DTI data, which are stable and subjectreproducible, achieve highly structurally homogeneous regions, and are consistent among structurally similar population samples. Such parcellations can be used as the basis for conducting graphtheoretic analysis on the resulting anatomic connectivity networks. Our method uses probabilistic tractography to generate the connectivity matrix that represents connectivity strength between any two gray voxels. A sparse representation of the connectivity matrix is defined by a graph whose edges capture spatial connectivity within a small spatial neighborhood and whose edge weights provide a similarity measure of the connectivity profiles of the endpoints. We show that our method is effective in generating parcellations that are highly consistent among subjects in the same population sample and that capture anatomic connectivity patterns that can be used to distinguish between population samples with known structural differences. Moreover, the methods are computationally efficient and robust to various random factors.
We note two particular works that are directly related to this paper. The first, reported by Craddock et al. [19], focuses on a datadriven approach for generating atlases based on restingstate functional MRI. The main goal there is to parcellate the whole brain into coherent regions of interests that are homogeneous in their restingstate functional connectivity (FC). They develop independently a graph formulation that is similar to ours, and apply spectral clustering in a straightforward way. The resulting atlas, while better than several of the standard anatomical atlases in term of FC homogeneity, has similar characteristics to a random atlas. Moreover, the input size is significantly smaller than the size of the problem we are dealing with here. The second work reported in [20]
addresses the same problem tackled in this paper and uses hierarchical clustering to generate a hierarchy of whole brain parcellations. Hierarchical clustering techniques have serious limitations since they use a local greedy strategy, and each successive refinement cannot modify the clustering determined in previous steps. In addition, the evaluation methodology carried out there is limited to either known results for small regions such as the inferior parietal cortex convexity or to other wellknown cytoarchitectonic parcellations that do not incorporate the connectivity information provided by DTI.
We summarize our main contributions in this paper as follows:

We develop efficient, scalable algorithms based on a sparse representation of the whole brain connectivity matrix, which reduces the number of edges from around a half billion to a few million while incorporating the necessary spatial constraints.

For an arbitrary subject from a population sample and for any value of the number of regions, we show that our algorithm converges to a stable parcellation after a few iterations, defined by structurally homogeneous regions.

Our parcellations of subjects within a population sample are consistent using any of a number of similarity metrics between parcellations of different subjects.

Our method captures structural patterns to allow us to distinguish effectively between structurally different population groups such as Males vs Females, Normal Controls vs Schizophrenia, and different age groups in Normal Controls.
The rest of the paper is organized as follows. We start in the next section by describing the data and tools used to generate the connectivity matrix of each subject. Our iterative method is described in Section III, while the stability and reproducibility results at the individual subject level or group level are covered in Section IV. Section V covers the discriminative power of the resulting parcellations. We end with a brief discussion in Section VI.
Subject Group  Male  Female  Age 1830  Age 3150  Age 5160  Total 

Normal Controls  41  35  23  28  25  76 
Schizophrenia  31  17  16  17  15  48 
Ii Data and Preprocessing Steps
Iia Data Acquisition
Imaging was performed at the University of Maryland Center for Brain Imaging Research using a Siemens 3T TRIO MRI (Erlangen, Germany) system and 32 channel phase array head coil. The highangular resolution diffusion imaging protocol was used to assess white matter integrity as measured by fractional anisotropy. Diffusion tensor data were collected using a single shot, echoplanar, single refocusing spinecho, T2weighted sequence with a spatial resolution of 1.71.73.0mm. The sequence parameters were: TE/TR=87/8000ms, FOV=200mm, axial slice orientation with 50 slices and no gaps, 64 isotropically distributed diffusion weighted directions, two diffusion weighting values (b=0 and 700s/mm) and six b=0 images. These parameters were calculated using an optimization technique that maximizes the contrast to noise ratio for FA measurements. The total scan time was approximately 9 minutes per subject. For each subject, the image data consists of 70 volumes of 3D images of dimensions 12812853, each voxel representing 1.718mm1.718mm3mm brain volume. We collected data from 76 normal (NC) subjects and 48 schizophrenia (SZ) subjects. The subject demographics are shown in Table I.
IiB Nonlinear Registration
The diffusion images of all subjects are registered to Montreal Neurological Institute (MNI) standard space using nonlinear registration package FNIRT in FSL [21]. The nonlinear registration process generates the warping coefficients that balance the similarity between the diffusion image and the standard MNI152 image, and the smoothness of the warping coefficients. The registration process facilitates group atlas generation and comparison with other standard atlases.
IiC Probabilistic tractography
The preprocessing step of probabilistic tractography is used to model cross fiber distributions for each voxel through the BEDPOSTX package in FSL [22]. Probabilistic tractography is processed through the diffusion toolbox in FSL [23]. The standard white matter atlas is specified as a seed region. The AAL mask is specified as the target region, which is the whole brain cortex region. We generate 50 streamlines from every voxel in a seed region. These streamlines are propagated following the cross fiber distribution computed from the preprocessing step. Curvature threshold is enforced to eliminate unqualified streamlines. The distance correction option is set to correct for the fact that the distribution drops as travel distance increases. The tractography output is a structural connectivity network modeled as a weighted graph where each node is a voxel in the target region space and each edge weight corresponds to relative connectivity strength in terms of the number of streamlines connecting the corresponding pair of voxels.
Iii Our Approach
Our main method takes as input a subject’s connectivity matrix. The number of voxels in the AAL mask is and the connectivity matrix is a sparse matrix of size . Given a positive integer value , our problem is to parcellate the cerebral cortex into spatially contiguous regions, such that each region possesses a high degree of structural homogeneity. Moreover, these parcellations must be stable and reproducible, as well as, consistent among members of a population sample with similar connectivity patterns. We first introduce our notion of a connectivity profile followed by a description of our method.
Iiia Connectivity Profile
For each voxel, the connectivity profile is the signature that discriminates a voxel from the rest of voxels based on connectivity. Parcellations are built by clustering voxels with similar connectivity profiles together. In principle, we can take the row of the connectivity matrix corresponding to a voxel as its connectivity profile, but that would be computationally expensive to process even if we compress each row into a list that contains only connectivity values above a certain threshold. In our approach, the connectivity profile of a voxel is computed as an array of weights, where each element represents the cumulative connectivity strengths of the voxel to a set of the regions determined initially by a predefined brain segmentation. Not only does the use of a coarser version of the connectivity profile leads to much more efficient computations, but it also helps to smooth out errors introduced by the tractography process through aggregation. More importantly, we will show later that our method converges to the same parcellation regardless of the initial segmentation used.
We explore several possibilities to initialize brain segmentations. An obvious choice is to use the regions of interests (ROIs) defined by any of the wellknown anatomical atlases such as the 90 regions of the AAL90 atlas. Note that the initial number of spatial regions is completely unrelated to the number
of parcellated regions and is used merely to initialize the connectivity profile of each voxel. Another possibility is to use a brain segmentation generated using a spatially constrained version of the kmeans++ algorithm with randomized centers
[24]. The third possibility that we consider is to spatially segment the volume into almost equalsize subcubes. The last two segmentation methods result in any specified number of contiguous regions. We will show that our method results in consistent and similar parcellations regardless of which connectivity profile we use.IiiB Spatially Constrained Similarity Graph
A spatialconstraint similarity graph, considerably sparser than the weighted graph defined by the connectivity matrix, is formed using spatial adjacency and the connectivity profiles as follows. The voxels define the nodes of our graph. Two nodes are connected by an edge if and only if the corresponding voxels lie within a sphere of radius . In our implementation, we have used such that the number of neighbors of any node is at most as shown in Fig. 1. Each edge is weighted by the similarity between the connectivity profiles of its end points. We can use any of several similarity metrics, including the correlation coefficient or the cosine function; our tests show that the results are very similar regardless of the similarity measure used. We assume from now on that we are using the correlation coefficient as our similarity measure between the connectivity profiles of two voxels.
Algorithm1 Iterative Parcellation Method 

1. Generate the connectivity matrix of a subject using probabilistic tractography. 
2. Construct a spatial graph as a sparse representation of the 3D brain. 
3. Initialize a random spatiallycoherent brain parcellation, to be used to define the connectivity profile of each voxel. 
Repeat 
4. Use the current brain parcellation to define the connectivity profiles of all the voxels based on the connectivity matrix. 
5. Apply spectral clustering algorithm to generate the brain parcellation of a predefined level of granularity. 
6. Measure the similarity between the new parcellation and the previous parcellation used to define connectivity profiles. 
Until the similarity measurement exceeds some threshold. 
7. Return the parcellation result. 
IiiC Minimum GraphCut Problem and Iterative Refinement
Our parcellation algorithm starts by partitioning our spatial similarity graph into several subgraphs with the objective of minimizing the total weight of the edges connecting the subgraphs subject to a constraint on the relative sizes of the subgraphs. More specifically, our objective function is to minimize the normalized cut rather than just the cut, which is standard in the literature (see for example [25, 26, 27]). This will more or less ensure that we won’t have subgraphs with very few vertices. The subgraphs induce a spatial segmentation of the 3D image data, which is then used to redefine the connectivity profile of each voxel, after which we iterate until the generated parcellations are almost unchanged. Our algorithm results in a solution where the voxels within the same region have similar connectivity profiles and voxels across different regions are relatively dissimilar. The most efficient method to solve the graph cut problem during each iteration is spectral clustering [25, 26, 27]. In particular, we use the normalized spectral clustering method, which can be summarized as follows, where is the weight matrix associated with the spatial similarity graph and is the number of desired regions.

Compute the normalized Laplacian matrix . is the diagonal matrix with each element .

Compute the eigenvectors of corresponding to the smallest eigenvalues.

Apply the kmeans clustering algorithm on the rows of the eigenvectors to obtain the final clusters.
To make the clustering result consistent against the random initializations in the kmeans step, we run the kmeans++ algorithm [24] several times and choose the result with the minimum withincluster sum of pointtocentroid distances [28]. Note that each run of the kmeans++ involves points (voxels) each of dimension . Algorithm1 provides a highlevel description of our method.
By applying the spectral clustering algorithm, we expect voxels within the same region to possess successively higher degrees of similarity in terms of structural connectivity during successive iterations. This iterative refinement approach converges to a stable parcellation as we will later show. At that point, we will also introduce a quantitative stopping criterion to be used to terminate the algorithm.
Iv Reproducibility and Stability Analysis
This section presents the methodology used and the results achieved to illustrate the reproducibility of our results both at the individual subject level and at the group level. We start by introducing two wellknown methods to quantitatively measure the similarity between two arbitrary clustering solutions of a dataset.
Iva Parcellation Similarity Metrics
We use the following metrics to measure the similarity between any two parcellations with the same level of granularity (that is, the same value of ). The cluster labels generated by our method are essentially arbitrary in the sense that regions with the same labels in two different parcellations are not necessarily spatially related. Moreover, as the level of granularity increases, we may not be able to determine a reasonable onetoone mapping between the regions. In this paper, we will use the following two metrics.
IvA1 Normalized Mutual Information (NMI)
Mutual information has been used in information theory to measure the relationship between any two probability distributions
[29]. Essentially, it provides a measure of how similar the joint distribution of two random variables is to the product of their marginal distributions. The normalized mutual information (NMI) is an approximate discrete version commonly used to measure the similarity between pairs of clusters of a dataset. It has a value between
and , with the value indicating the two clusterings are completely independent of each other, whereas the value indicates that they are identical. The NMI between two parcellations and is defined as(1) 
The entropy for individual parcellations and the mutual information are approximated from the marginal and joint distributions as follows. is the set of voxels that are labeled as in parcellation . Similarly, is the set of voxels that are labeled as in parcellation .
(2) 
(3) 
(4) 
The marginal probability for any label is approximated as the fraction of the number of voxels with that label over the total number of voxels. Similarly, the joint distribution is computed as the fraction of the number of voxels with label in parcellation and with label in parcellation over the total number of voxels. Here the total number of voxels is the number of voxels in the AAL mask, which is the same for all parcellations.
(5) 
(6) 
As stated previously, if the parcellations are identical, except for label reordering, then the mutual information and the entropy for each parcellation are equal, and hence the resulting NMI is equal to . The higher the value of the NMI, the more similar the two parcellations are.
IvA2 Dice’s Coefficient
Dice’s coefficient measures the similarity directly from the clustering matrix defined by
(7) 
where
is the vector that contains the label of every voxel. That is, the
entry of the clustering matrix is equal to if, and only if, voxels and belong to the same region. Given the matrices corresponding to two parcellations, the Dice’s coefficient is computed as twice the number of common nonzero entries normalized by the total number of nonzero entries in both clustering matrices [30]. Dice’s coefficient is always between and , and the larger it is, the more similar the two parcellations are.Both NMI and Dice’s coefficient capture the similarity between two parcellations of any level of granularity. But for NMI, the joint distribution is in general greater than the product of the marginal distributions , which may cause NMI to overestimate the similarity between the parcellations. We note that in general NMI is larger than the Dice’s coefficient.
IvB Stability and Reproducibility of Subject Parecellations
The main factors that affect the parcellations generated by our algorithm are the choice of the brain segmentation that is used to define connectivity profiles and the random initialization of the kmeans++ algorithm used in the last step of spectral clustering. The effect of random initialization could be mitigated by running the kmeans++ initialization [24] several times, as stated before. Here we consider only the effect of the initial brain segmentation used to define the connectivity profiles. Note that the connectivity profiles encapsulate the only information we have from the DTI data for each subject since the rest of the information captured by the spatial similarity graph does not involve anything related to the connectivity data.
The initial brain segmentation can be defined as any arbitrary spatial segmentation of the brain mask. However it would be more intuitive to use initial segmentations with comparable region sizes. Note that the number of regions in the initial segmentation is completely independent of the desired number of parcellated regions. The main result of this section is that, regardless of the initial segmentation and for any value of , our algorithm will converge to a stable parcellation for each subject that captures the critical connectivity information embodied in the DTI data.
The following brain segmentations, shown in Fig. 2, are used to define initial connectivity profiles that are used to generate 40region parcellations.
IvB1 Automated Anatomical Labeling (AAL)
The AAL atlas defines 90 anatomical regions with 45 volumes of interest in each hemisphere, which were delineated following the courses of the main sulci of the brain. In fact, we have used the AAL mask to define cerebral cortex to be parcellated. Here we use it as the initial segmentation that defines the connectivity profiles of all voxels.
IvB2 Random Spatial Segmentation
We generate a random spatial segmentation with any level of granularity using the kmeans++ algorithm, based only on spatial coordinate. The purpose is to generate segmentations that have regions that are spatially contiguous and compact. Random initialization of the kmeans++ using 90, 1000, and 2000 regions were generated.
IvB3 Regular Grid Segmentation
A regular grid segmentation consists of a set of almost equalsized cubes that cover the whole brain. The cube size determines the granularity of the segmentation. We set the cube size to 5 and therefore, this segmentation consists of 1,987 cubes that cover all brain voxels.
IvB4 Synthetic Parcellations
The synthetic parcellations are generated from the similarity graph in which the weights of all the edges are set to 1. A similar approach was reported in [19], which concludes that the synthetic parcellations are almost as good as real parcellations in terms of FC cluster homogeneity. We use the synthetic parcellation with the same number of regions to define the connectivity profile and show that starting from the same synthetic parcellation, our iterative method will incorporate the underlying connectivity information and converge to the subject’s characteristic parcellations.
For each subject, we show that our algorithm will yield essentially the same parcellation for all these initial segementations. The NMI and Dice’s coefficient are computed between all pairs of parcellations generated from different brain segmentations after each iteration. Tables II through IX show the corresponding results.
Abbreviations 

AAL: Automatic Anatomical Labeling. 
R#: Random brain segmentation with # number of regions. 
Grid: Regular grid segmentaton with grid size . 
S#: Synthetic parcellation generated from spatialconstrained similarity graph with all edges’ weights as . 
Segmentation  AAL  R90  R1000  R2000  Grid  S40 

AAL  1.0000  0.8673  0.8791  0.8497  0.8734  0.8568 
R90  0.8673  1.0000  0.9009  0.8622  0.8849  0.8804 
R1000  0.8791  0.9009  1.0000  0.8774  0.9039  0.8657 
R2000  0.8497  0.8622  0.8774  1.0000  0.8855  0.8433 
Grid  0.8734  0.8849  0.9039  0.8855  1.0000  0.8666 
S40  0.8568  0.8804  0.8657  0.8433  0.8666  1.0000 
Segmentation  AAL  R90  R1000  R2000  Grid  S40 

AAL  1.0000  0.9173  0.9085  0.9117  0.8999  0.8990 
R90  0.9173  1.0000  0.9042  0.9031  0.8880  0.8954 
R1000  0.9085  0.9042  1.0000  0.9018  0.8971  0.8908 
R2000  0.9117  0.9031  0.9018  1.0000  0.8840  0.8780 
Grid  0.8999  0.8880  0.8971  0.8840  1.0000  0.8980 
S40  0.8990  0.8954  0.8908  0.8780  0.8980  1.0000 
Segmentation  AAL  R90  R1000  R2000  Grid  S40 

AAL  1.0000  0.9194  0.8940  0.9412  0.8985  0.9035 
R90  0.9194  1.0000  0.9050  0.9266  0.9242  0.9266 
R1000  0.8940  0.9050  1.0000  0.9103  0.8899  0.8992 
R2000  0.9412  0.9266  0.9103  1.0000  0.9087  0.9064 
Grid  0.8985  0.9242  0.8899  0.9087  1.0000  0.9101 
S40  0.9035  0.9266  0.8992  0.9064  0.9101  1.0000 
Segmentation  AAL  R90  R1000  R2000  Grid  S40 

AAL  1.0000  0.9405  0.9052  0.9382  0.9004  0.9258 
R90  0.9405  1.0000  0.9181  0.9211  0.9141  0.9494 
R1000  0.9052  0.9181  1.0000  0.8949  0.8828  0.9148 
R2000  0.9382  0.9211  0.8949  1.0000  0.9225  0.9075 
Grid  0.9004  0.9141  0.8828  0.9225  1.0000  0.9020 
S40  0.9258  0.9494  0.9148  0.9075  0.9020  1.0000 
The above tables show that similarity, in terms of NMI or Dice’s coefficients, between all pairs of parcellations from different brain segmentations increase with the number of iteration. After the iteration, most of NMI values are
Segmentation  AAL  R90  R1000  R2000  Grid  S40 

AAL  1.0000  0.7448  0.7709  0.7083  0.7666  0.7230 
R90  0.7448  1.0000  0.8374  0.7456  0.8046  0.7810 
R1000  0.7709  0.8374  1.0000  0.7778  0.8396  0.7553 
R2000  0.7083  0.7456  0.7778  1.0000  0.7874  0.7010 
Grid  0.7666  0.8046  0.8396  0.7874  1.0000  0.7585 
S40  0.7230  0.7810  0.7553  0.7010  0.7585  1.0000 
Segmentation  AAL  R90  R1000  R2000  Grid  S40 

AAL  1.0000  0.8557  0.8378  0.8483  0.8142  0.8236 
R90  0.8557  1.0000  0.8203  0.8199  0.7852  0.8068 
R1000  0.8378  0.8203  1.0000  0.8205  0.8097  0.7936 
R2000  0.8483  0.8199  0.8205  1.0000  0.7737  0.7717 
Grid  0.8142  0.7852  0.8097  0.7737  1.0000  0.8099 
S40  0.8236  0.8068  0.7936  0.7717  0.8099  1.0000 
Segmentation  AAL  R90  R1000  R2000  Grid  S40 

AAL  1.0000  0.8543  0.8007  0.9021  0.8047  0.8197 
R90  0.8543  1.0000  0.8316  0.8759  0.8700  0.8787 
R1000  0.8007  0.8316  1.0000  0.8352  0.7932  0.8103 
R2000  0.9021  0.8759  0.8352  1.0000  0.8317  0.8341 
Grid  0.8047  0.8700  0.7932  0.8317  1.0000  0.8378 
S40  0.8197  0.8787  0.8103  0.8341  0.8378  1.0000 
Segmentation  AAL  R90  R1000  R2000  Grid  S40 

AAL  1.0000  0.9070  0.8279  0.8936  0.8125  0.8786 
R90  0.9070  1.0000  0.8526  0.8547  0.8387  0.9247 
R1000  0.8279  0.8526  1.0000  0.7960  0.7653  0.8472 
R2000  0.8936  0.8547  0.7960  1.0000  0.8699  0.8290 
Grid  0.8125  0.8387  0.7653  0.8699  1.0000  0.8171 
S40  0.8786  0.9247  0.8472  0.8290  0.8171  1.0000 
above and most of Dice’s coefficients are above , which indicates very consistent parcellations. The iterative method mitigates the random effect caused by the initial arbitrary segmentations and leads to stable parcellations regardless of the initial definition of connectivity profiles. Note that the kmeans++ step of our algorithm introduces a small uncertainty, which explains the few deviations in the tables above. However, it is clear that the parcellations generated at the end of third and fourth iterations are very close to each other. Fig. 3 illustrates the increase of the average, over all the different intial segmentations, of NMI and Dice’s coefficient after each iteration.
Table X and XI show the similarity between parcellations in consecutive iteration stages for a given initial segmentation. Taking into consideration the uncertainty introduced by the kmeans++ step of our algorithm, it is clear that successive iterations of the algorithm generate more similar parcellations, for any of the initialization methods of the connectivity profiles.
Segmentation  /  /  / 

AAL  0.9131  0.9353  0.9539 
R90  0.9125  0.9325  0.9631 
R1000  0.9199  0.9292  0.9198 
R2000  0.8873  0.9224  0.9314 
Grid  0.9185  0.9380  0.9267 
S40  0.9151  0.9341  0.9486 
Segmentation  /  /  / 

AAL  0.8448  0.8886  0.9179 
R90  0.8402  0.8838  0.9475 
R1000  0.8714  0.8787  0.8495 
R2000  0.7997  0.8556  0.8697 
Grid  0.8700  0.8960  0.8709 
S40  0.8520  0.8813  0.9124 
IvC Group Consistency and Atlas Generation
Table XII shows the average similarity between every pair of parcellations from subjects in the NC group. As can be seen from entries in this table, the parcellations are reasonably consistent within the NC group; similar results hold for the SZ group.
Table XIII shows the average similarity between a random parcellation and the parcellations generated for the subjects in the NC group. As can be seen from the column of the Dice coefficients, our generated parcellations are significantly different from random parcellations. As mentioned before, the NMI coefficients tend to overestimate the similarity between the parcellations, and hence the slightly higher numbers in the second column of Table XIII, but still significantly lower than the similarity of the generated parcellations between the subjects of the NC group (Table XII).
Number of regions  NMI  Dice’s Coefficient 

40  0.7734  0.5503 
50  0.7786  0.5323 
60  0.7939  0.5507 
70  0.7988  0.5415 
90  0.8040  0.5326 
120  0.8151  0.5287 
Number of regions  NMI  Dice’s Coefficient 

40  0.6923  0.3994 
50  0.6857  0.3679 
60  0.7140  0.3871 
70  0.7164  0.3771 
90  0.7393  0.3995 
120  0.7452  0.3720 
Atlas generation: We employ the following atlas generation procedure to further validate withingroup consistency. In generating our parcellations, regions are labeled randomly; therefore, regions with the same index are not necessarily spatially matched. The first step of atlas generation is to align all parcellations to a reference parcellation that is randomly chosen from the group. We relabel each of the regions using the region index of the reference parcellation that shares the largest overlapped area. For a group of relabeled subjects, we generate an atlas as follows. For each voxel, we associate a vector of length consisting of the label index from each subject. We set the voxel’s label to be the most frequent index in its vector, thereby generating an atlas as shown in Fig. 4(a).
The confidence map is a grayscale image, where the gray level of each voxel represents the uncertainty of the labeling across all subjects, in terms of the proportion of the frequent index in the length vector. The confidence map in Fig. 4(b) shows that for almost all voxels, except possibly along the region boundaries, most subjects are consistently labeled as indicated by the atlas.
V Discriminative Analysis
In this section, we show how our parcellations can be used to shed light on structural differences between different experimental groups. We have selected cases that were known to have significant differences in white matter integrity and structural networks. We include a discussion of three significant different groups: Male vs Female, Age groups, and SZ vs NC. The subject demographics in our data are shown in Table I.
Similarity comparison  pvalue  tstatistic 

Male vs Malefemale  9.0286e5  3.9205 
Female vs Malefemale  1.4025e8  5.6911 
Male vs Female  3.3752e20  9.2766 
Similarity comparison  pvalue  tstatistic 

Group I vs Group III  0.5133  0.6539 
Group II vs Group III  0.5566  0.5880 
Group I vs Group II  0.2009  1.2798 
Group II vs Group IIIII  2.4175e4  3.6800 
Group III vs Group IIIII  0.0028  2.9921 
Group II vs Group III  6.6455e11  6.5814 
Similarity comparison  pvalue  tstatistic 

NC vs NCSZ  2.9995e89  20.2514 
SZ vs NCSZ  1.4025e8  10.4867 
NC vs SZ  1.1636e198  30.9687 
We adopt two strategies to discriminate among experimental groups. The first strategy focuses on the heterogeneity of the parcellations within a group sample and is based on the pairwise similarities between all pairs of parcellations in a group. As shown in the previous section, our parcellations are consistently labeled across subjects of a population sample except for some boundary voxels. The boundary differences reflected by the pairwise similarity may be used to determine some features that are specific to particular subgroups. In particular, we will show that the parcellations of the subjects in the SZ group have substantially more variability that those of the NC group and that healthy males seem to exhibit more heterogeneity within their group than healthy females do.
The other strategy is to analyze the structural connectivity network built from the parcellations and tractography results, where the nodes correspond to the parcellation regions and the edge weights correspond to the cumulative connectivity strength between voxels in the two regions; this strategy is commonly used in the literature [31, 32]. Our iterative method generates parcellations where voxels within the same region share similar connectivity profiles that are defined as the accumulated connectivity strength to every other region. Hence the parcellations obtained are consistent with the structural connectivity network where the connectivity pattern of each node summarizes the connectivity profile of the voxels in that region. The “connectome” analysis shows more powerful discriminative ability of our parcellations than using existing anatomical atlases.
Va Similaritybased Analysis
The analysis is based solely on pairwise similarity between pairs of parcellations. We start by analyzing the similarities relative to female and male subgroups of the NC sample using parcellations with 90 regions and NMI as the similarity measure. Results corresponding to other values of or to the use of Dice’s coefficient as a similarity measure exhibit the same patterns.
A twosample ttest was performed on the pairwise similarity between parcellations within the female subgroup, the male subgroup, and pairwise similarity between parcellations from different groups. The pvalue and tstatistics are shown in Table
XIV.Our results indicate that the similarity of parcellations of either healthy females or healthy males is significantly different that the similarity between a female parcellation and a male parcellation. More importantly, the last row of Table XIV indicates that the female parcellations are much closer to each other that the male parcellations, which may indicate more structural brain heterogeneity among the male subjects than among the female subjects.
In the age study, we divide the NC sample into three age groups, which are: Group I: Age 1829, 23 subjects; Group II: Age 3049, 28 subjects; Group III: Age 5062, 25 subjects. The pvalue and tstatistics are shown in table XV. For parcellations in Group I and II, their similarities did not show significant differences. But parcellation similarities within Group II have significant differences than the similarity between parcellations in Group II and Group III. And parcellations in Group III are more heterogeneous than those in Group I and Group II.
Perhaps more interesting is the similarity comparison of the parcellations of NC vs SZ groups, illustrated in XVI. These results clearly show that the parcellations within the SZ group show much more heterogeneity than those for the NC group.
VB Connectome Analysis
There is much evidence supporting that schizophrenia is a disorder related to brain connectivity. Our previous work analyzed the structural connectivity network based on individual parcellations refined from the AAL atlas to discriminate schizophrenic and normal control groups with high accuracy [33]. We apply the same strategy to discriminate among the two groups using the 5region parcellations generated from our iterative approach as shown in Fig. 5. The reason we choose a small number of regions is the high consistency across subjects, and because regions can be trivially mapped spatially, onetoone, between any pair of parcellations.
We first relabel all parcellations based on a randomly selected subject. The connectomes are built by defining nodes as regions in the parcellation and edge weights represent cumulative connectivity strength between regions. Table XVII shows the pvalue and tstatistcs of pairwise connectivity between the two groups.
A large portion of pairwise connectivity shows significant differences between the two groups. Moreover, most pairwise connectivity strengths of NC subjects are greater than those of SZ subjects, a fact that is consistent with the previous findings that SZ subjects have decreased interhemispheric and intrahemispheric connectivity [34]
. We select the three pairs with the most significant pvalues and use their connectivity values as features to train a support vector machine classifier. We test our classifier using a 10fold crossvalidation and are able to achieve up to
accuracy, which is significantly better than our earlier result in [33].Label  1  2  3  4  5 

1  0.0137  0.0395  0.0038  0.0989  0.4954 
2  0.0300  0.0279  0.0029  0.0514  0.0058 
3  0.0029  0.0042  4.11e4  3.04e5  0.0019 
4  0.0798  0.0579  3.14e5  0.0775  0.1127 
5  0.4928  0.0083  0.0028  0.1216  0.2377 
Label  1  2  3  4  5 

1  2.5019  2.0809  2.9479  1.6630  0.6839 
2  2.1963  2.2250  3.0391  1.9671  2.8055 
3  3.0437  2.9197  3.6325  4.3330  3.1707 
4  1.7666  1.9149  4.3244  1.7806  1.5976 
5  0.6879  2.6846  3.0493  1.5588  1.1865 
We also carried out an additional test to confirm the discriminative capabilities of our parcellations. Consider 40region parcellations for the two population samples and the corresponding structural connectivity networks. For each edge, we perform a twosample ttest between the sequence of connectivity strengths of the NC group and that of the SZ group. We find that many of the edges result in pvalues less than 0.00005 as shown in Fig. 6. Fig. 7 shows the corresponding results when we use the AAL atlas and determine connectivity strengths on the corresponding edges (pairs of regions). As shown by the binary maps, the proportion of entries in the AALbased network which have significant connectivity strength difference between healthy controls and schizophrenic subjects is much smaller than that those obtained through the network built from our 40region parcellations.
It seems clear that our pacellations seem to effectively capture the inherent connectivity information present in the DTI data and hence are more suitable for studying structural connectivity than anatomical atlases.
Vi Conclusion
We herein propose a sparse representation of the connectivity information derived from DTI data and a novel method that generates wholebrain parcellations for any number of regions. Our method is computationally efficient and is able to consistently generate stable and reproducible parcellations that seem to capture inherent structural patterns present in the data. The results are validated through the use of a number of methods, including subject reproducibility, group consistency, and discriminative characteristics between different population groups.
Acknowledgment
We gratefully acknowledge funding provided by The University of Maryland/Mpowering the State through the Center for Healthrelated Informatics and Bioimaging (CHIB).
References
 [1] E. Bullmore and O. Sporns, “Complex brain networks: graph theoretical analysis of structural and functional systems,” Nature Reviews Neuroscience, vol. 10, no. 3, pp. 186–198, 2009.
 [2] O. Sporns, “Structure and function of complex brain networks,” Dialogues in clinical neuroscience, vol. 15, no. 3, p. 247, 2013.
 [3] O. Sporns, D. R. Chialvo, M. Kaiser, and C. C. Hilgetag, “Organization, development and function of complex brain networks,” Trends in cognitive sciences, vol. 8, no. 9, pp. 418–425, 2004.
 [4] E. G. Dopper, S. A. Rombouts, L. C. Jiskoot, T. den Heijer, J. R. A. de Graaf, I. de Koning, A. R. Hammerschlag, H. Seelaar, W. W. Seeley, I. M. Veer et al., “Structural and functional brain connectivity in presymptomatic familial frontotemporal dementia,” Neurology, vol. 83, no. 2, pp. e19–e26, 2014.
 [5] M. P. van den Heuvel, R. C. Mandl, C. J. Stam, R. S. Kahn, and H. E. H. Pol, “Aberrant frontal and temporal complex network structure in schizophrenia: a graph theoretical analysis,” The Journal of Neuroscience, vol. 30, no. 47, pp. 15 915–15 926, 2010.
 [6] K. Amunts, A. Schleicher, and K. Zilles, “Cytoarchitecture of the cerebral cortex—more than localization,” Neuroimage, vol. 37, no. 4, pp. 1061–1065, 2007.
 [7] O. Vogt, “Die myeloarchitektonik des isocortex parietalis,” J Psychol Neurol, vol. 18, pp. 379–390, 1911.
 [8] K. Zilles and K. Amunts, “Receptor mapping: architecture of the human cerebral cortex,” Current opinion in neurology, vol. 22, no. 4, pp. 331–339, 2009.
 [9] K. Brodmann, Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Prinzipien dargestellt auf Grund des Zellenbaues. Barth, 1909.
 [10] N. TzourioMazoyer, B. Landeau, D. Papathanassiou, F. Crivello, O. Etard, N. Delcroix, B. Mazoyer, and M. Joliot, “Automated anatomical labeling of activations in spm using a macroscopic anatomical parcellation of the mni mri singlesubject brain,” Neuroimage, vol. 15, no. 1, pp. 273–289, 2002.
 [11] S. B. Eickhoff, K. E. Stephan, H. Mohlberg, C. Grefkes, G. R. Fink, K. Amunts, and K. Zilles, “A new spm toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data,” Neuroimage, vol. 25, no. 4, pp. 1325–1335, 2005.
 [12] M. Ruschel, T. R. Knösche, A. D. Friederici, R. Turner, S. Geyer, and A. Anwander, “Connectivity architecture and subdivision of the human inferior parietal cortex revealed by diffusion mri,” Cerebral Cortex, vol. 24, no. 9, pp. 2436–2448, 2014.
 [13] R. B. Mars, S. Jbabdi, J. Sallet, J. X. O’Reilly, P. L. Croxson, E. Olivier, M. P. Noonan, C. Bergmann, A. S. Mitchell, M. G. Baxter et al., “Diffusionweighted imaging tractographybased parcellation of the human parietal cortex and comparison with human and macaque restingstate functional connectivity,” The Journal of Neuroscience, vol. 31, no. 11, pp. 4087–4100, 2011.
 [14] R. B. Mars, J. Sallet, U. Schüffelgen, S. Jbabdi, I. Toni, and M. F. Rushworth, “Connectivitybased subdivisions of the human right “temporoparietal junction area”: evidence for different areas participating in different cortical networks,” Cerebral cortex, vol. 22, no. 8, pp. 1894–1903, 2012.
 [15] J. Sallet, R. B. Mars, M. P. Noonan, F.X. Neubert, S. Jbabdi, J. X. O’Reilly, N. Filippini, A. G. Thomas, and M. F. Rushworth, “The organization of dorsal frontal cortex in humans and macaques,” The Journal of Neuroscience, vol. 33, no. 30, pp. 12 255–12 274, 2013.
 [16] F.X. Neubert, R. B. Mars, A. G. Thomas, J. Sallet, and M. F. Rushworth, “Comparison of human ventral frontal cortex areas for cognitive control and language with areas in monkey frontal cortex,” Neuron, vol. 81, no. 3, pp. 700–713, 2014.
 [17] F.X. Neubert, R. B. Mars, J. Sallet, and M. F. Rushworth, “Connectivity reveals relationship of brain areas for rewardguided learning and decision making in human and monkey frontal cortex,” Proceedings of the National Academy of Sciences, vol. 112, no. 20, pp. E2695–E2704, 2015.
 [18] A. Anwander, M. Tittgemeyer, D. Y. von Cramon, A. D. Friederici, and T. R. Knösche, “Connectivitybased parcellation of broca’s area,” Cerebral Cortex, vol. 17, no. 4, pp. 816–825, 2007.
 [19] R. C. Craddock, G. A. James, P. E. Holtzheimer, X. P. Hu, and H. S. Mayberg, “A whole brain fmri atlas generated via spatially constrained spectral clustering,” Human brain mapping, vol. 33, no. 8, pp. 1914–1928, 2012.
 [20] D. MorenoDominguez, A. Anwander, and T. R. Knösche, “A hierarchical method for wholebrain connectivitybased parcellation,” Human brain mapping, vol. 35, no. 10, pp. 5000–5025, 2014.
 [21] J. L. Andersson, M. Jenkinson, S. Smith et al., “Nonlinear registration, aka spatial normalisation fmrib technical report tr07ja2,” FMRIB Analysis Group of the University of Oxford, 2007.
 [22] T. Behrens, H. J. Berg, S. Jbabdi, M. Rushworth, and M. Woolrich, “Probabilistic diffusion tractography with multiple fibre orientations: What can we gain?” Neuroimage, vol. 34, no. 1, pp. 144–155, 2007.
 [23] T. Behrens, M. Woolrich, M. Jenkinson, H. JohansenBerg, R. Nunes, S. Clare, P. Matthews, J. Brady, and S. Smith, “Characterization and propagation of uncertainty in diffusionweighted mr imaging,” Magnetic resonance in medicine, vol. 50, no. 5, pp. 1077–1088, 2003.
 [24] D. Arthur and S. Vassilvitskii, “kmeans++: The advantages of careful seeding,” in Proceedings of the eighteenth annual ACMSIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007, pp. 1027–1035.
 [25] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and computing, vol. 17, no. 4, pp. 395–416, 2007.

[26]
A. Y. Ng, M. I. Jordan, Y. Weiss et al.
, “On spectral clustering: Analysis and an algorithm,”
Advances in neural information processing systems, vol. 2, pp. 849–856, 2002.  [27] S. X. Yu and J. Shi, “Multiclass spectral clustering,” in Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on. IEEE, 2003, pp. 313–319.
 [28] http://www.mathworks.com/help/stats/kmeans.html.

[29]
N. X. Vinh, J. Epps, and J. Bailey, “Information theoretic measures for
clusterings comparison: Variants, properties, normalization and correction
for chance,”
The Journal of Machine Learning Research
, vol. 11, pp. 2837–2854, 2010.  [30] L. R. Dice, “Measures of the amount of ecologic association between species,” Ecology, vol. 26, no. 3, pp. 297–302, 1945.
 [31] M. Ingalhalikar, A. Smith, D. Parker, T. D. Satterthwaite, M. A. Elliott, K. Ruparel, H. Hakonarson, R. E. Gur, R. C. Gur, and R. Verma, “Sex differences in the structural connectome of the human brain,” Proceedings of the National Academy of Sciences, vol. 111, no. 2, pp. 823–828, 2014.
 [32] G. Gong, P. RosaNeto, F. Carbonell, Z. J. Chen, Y. He, and A. C. Evans, “Ageand genderrelated differences in the cortical anatomical network,” The Journal of neuroscience, vol. 29, no. 50, pp. 15 684–15 693, 2009.
 [33] Q. Wang, R. Chen, J. JaJa, Y Jin, L. Hong, and E. Herzkovits, “ConnectivityBased Brain Parcellation: A ConnectivityBased Atlas for Schizophrenia Research”, submitted to Neuroinformatics.
 [34] E. H. Herskovits, L. E. Hong, P. Kochunov, H. Sampath, and R. Chen, “Edgecentered dti connectivity analysis: Application to schizophrenia,” Neuroinformatics, pp. 1–9, 2015.
Comments
There are no comments yet.