1 Introduction
The function of macroscopic neural networks is constrained by the integrity of structural connections between disparate regions. This form of long-distance (i.e., centimeters) communication relies on dense bundles of axons that are known as white matter [3]. To prevent degradation of action potentials across long distances, these fiber bundles are supported by the myelin sheath, formed by non-neuronal glial cells that insulate axons and facilitate communication along the fascicle. As a result, the integrity of the myelin sheath is critical for synchronizing information transmission between distal brain areas [2], fostering the ability of these networks to adapt over time [4]. Thus, variability in the myelin sheath, as well as other cellular support mechanisms, would contribute to variability in functional coherence across the circuit.
To study the integrity of structural connectivity, we recently introduced the concept of the local connectome. This is defined as the pattern of fiber systems (i.e., number of fibers, orientation, and size) within a voxel, as well as immediate connectivity between adjacent voxels, that can be quantified using diffusion MRI (dMRI) by measuring the fiber-wise density of microscopic water diffusion within a voxel [1]. The collection of these multi-fiber diffusion density measurements within all white matter voxels is termed the local connectome fingerprint (LCF). The LCF is a high-dimensional feature vector that describes the unique configuration of the structural connectome along the segments of white matter pathways [5]. Thus, the LCF provides a diffusion-informed measure along the fascicles that support inter-regional communication, rather than determining the start and end positions of a particular fiber bundle.
Since the LCF measures the local integrity along white matter bundles that connect regions across the entire brain, it reflects the overall communication capacity of the brain [2]. Hence, we expect that variations in the LCF should also correlate with variations in the dynamics of brain networks, measured by connectivity patterns in resting-state functional MRI (fMRI).
To formally validate this intuition, we employ statistical approaches to examine the following hypotheses:
Hypothesis 1 Similarity in the LCF, between individuals, is associated with similarity in their functional connectivity patterns measured with resting-state fMRI.
Hypothesis 2 Variability in specific segments of the LCF is associated with patterns of functional connectivity in specific circuits.
2 Materials and Methods
We summarize our abbreviations and notation in Table 1.
Notation  Definition

LCF  Local connectome fingerprint
FCG  Functional correlation graph
HCP  The Human Connectome Project (dataset)
$n$  Number of subjects (793)
$p$  Dimension of LCF vectors (433,386)
$q$  Dimension of FCG vectors (195,625)
$x_i$  $p$-dimensional LCF vector of subject $i$
$y_i$  $q$-dimensional FCG vector of subject $i$
$d_X$  The scaled Euclidean distance between LCFs (1)
$d_Y$  The Pearson correlation distance between FCGs (2)
$D_X$  Scaled Euclidean distance matrix between LCFs
$D_Y$  Pearson correlation distance matrix between FCGs
$X$  $n \times p$ matrix containing the LCFs as rows
$Y$  $n \times q$ matrix containing the FCGs as rows
$\lVert \cdot \rVert_1$, $\lVert \cdot \rVert_2$  The $\ell_1$ and $\ell_2$ norm of a real-valued vector
$S_n$  The permutation group on $\{1, \dots, n\}$
2.1 Data Acquisition
2.1.1 Participants
We used publicly available dMRI and fMRI data from the S900 (2015) release of the Human Connectome Project (HCP) [6], acquired by Washington University in St. Louis and the University of Minnesota. Out of the 900 participants released, 841 genetically unrelated participants (370 male, ages 22–37, mean age 28.76) had viable dMRI datasets. Among them, 793 participants had at least one viable resting-state fMRI measurement. Our analysis was restricted to this subsample. All data collection procedures were approved by the institutional review boards at Washington University in St. Louis and the University of Minnesota. The post hoc data analysis was approved as exempt by the institutional review board at Carnegie Mellon University, in accordance with 45 CFR 46.101(b)(4) (IRB Protocol Number: HS14139).
2.1.2 Diffusion MRI Acquisition
The dMRI data were acquired on a Siemens 3T Skyra scanner using a 2D spin-echo single-shot multiband EPI sequence with a multiband factor of 3 and monopolar gradient pulse. The spatial resolution was 1.25 mm isotropic (TR = 5500 ms, TE = 89.50 ms). The b-values were 1000, 2000, and 3000 s/mm². The total number of diffusion sampling directions was 90 for each of the three shells, in addition to 6 b0 images. The total scanning time was approximately 55 minutes.
2.1.3 LCF Reconstruction
An outline of the pipeline for generating LCFs is shown in Fig. 1. The dMRI data for each subject were reconstructed in a common stereotaxic space using q-space diffeomorphic reconstruction (QSDR) [7], a nonlinear registration approach that directly reconstructs water diffusion density patterns into a common stereotaxic space at 1 mm resolution. The LCF reconstruction was conducted using DSI Studio (http://dsistudio.labsolver.org), an open-source diffusion MRI analysis tool for connectome analysis. To compute the LCF, the axonal direction in each voxel was derived from the HCP dataset, and all of the data and source code for this analysis are publicly available on the same website.
A spin distribution function (SDF) sampling framework was used to provide a consistent set of directions to sample the magnitude of SDFs along axonal directions in the cerebral white matter. Since each voxel may have more than one axonal direction, multiple measurements were extracted from the SDF for voxels that contained crossing fibers, while a single measurement was extracted for voxels with fibers in a single direction. The appropriate number of density measurements from each voxel was sampled by the left-posterior-superior voxel order and compiled into a sequence of scalar values. Gray matter was excluded using the ICBM152 white matter mask (McConnell Brain Imaging Centre, McGill University, Canada). The cerebellum was also excluded because its slice coverage differed across participants. Since the density measurement has arbitrary units, the LCF was scaled to make the variance equal to 1 [5]. For each subject $i = 1, \dots, n$, we denote the high-dimensional LCF of the $i$th subject, across $p$ sampled directions, as $x_i \in \mathbb{R}^p$. The collection of all LCFs is compactly represented as a data matrix $X \in \mathbb{R}^{n \times p}$ with each LCF as a row vector.
2.1.4 Functional MRI Acquisition & Processing
We analyzed the minimally processed resting-state fMRI data acquired as part of the Human Connectome Project (HCP) [8, 6], which used a multiband gradient echo-planar imaging protocol (see [9] for details on acquisition parameters). The dataset contains volumetric NIFTI data for resting-state fMRI scans (14 minutes each), motion parameters, and physiological data. Only data for the first resting-state scan collected at the AP phase encoding direction were used for analyses. Using these measurements, we computed the average BOLD (blood-oxygen-level-dependent) signals at each of the 626 regions of interest (ROIs) [10] and regressed out the linear effects of the noise terms via ordinary least-squares (OLS). The 16 noise terms include the global signal, 12 motion parameters (6 estimates from a rigid-body transformation to the SBRef image acquired at the start of each scan; 6 temporal derivatives of these estimates), and the top 3 principal component projections of the voxel-level white matter signals (measured at each of 2,258 voxels and 840 seconds). The resulting residual terms were then filtered by a first-order Butterworth bandpass filter [11] between frequencies 0.08 Hz and 0.15 Hz.
2.1.5 Functional Correlation Graph Construction
For each subject, given a preprocessed time series (840 s; 1 Hz) at each ROI, we computed the functional correlation graph (FCG), alternatively called the functional connectome fingerprint in [12], by computing the Pearson correlation between the time series at every pair of ROIs. For each subject $i = 1, \dots, n$, we use $y_i \in \mathbb{R}^q$ to denote the vector of these Pearson correlations, which we collectively refer to as the $i$th FCG. The collection of all FCGs is compactly represented as a data matrix $Y \in \mathbb{R}^{n \times q}$ with each FCG as a row vector.
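As an illustration of the last two steps, the following is a minimal NumPy sketch, not the pipeline used in this study. It runs on synthetic data with the dimensions quoted above (840 time points, 626 ROIs, 16 noise terms), omits the Butterworth bandpass filter, and uses hypothetical function names (`regress_out`, `fcg_vector`).

```python
import numpy as np

def regress_out(signals, nuisance):
    """Return OLS residuals of signals (T x R) after removing nuisance terms (T x K)."""
    # Assumption: an intercept column is included alongside the noise terms.
    design = np.column_stack([np.ones(nuisance.shape[0]), nuisance])
    beta, *_ = np.linalg.lstsq(design, signals, rcond=None)
    return signals - design @ beta

def fcg_vector(roi_ts):
    """Pearson correlations between every pair of ROI time series, as one vector."""
    corr = np.corrcoef(roi_ts, rowvar=False)           # (R x R) correlation matrix
    rows, cols = np.triu_indices(corr.shape[0], k=1)   # each ROI pair exactly once
    return corr[rows, cols]

# Synthetic stand-in for one subject (random numbers, not HCP data).
rng = np.random.default_rng(0)
roi_ts = rng.standard_normal((840, 626))
noise_terms = rng.standard_normal((840, 16))

residuals = regress_out(roi_ts, noise_terms)
y_i = fcg_vector(residuals)
print(y_i.shape)  # (195625,)
```

With 626 ROIs there are 626 × 625 / 2 = 195,625 distinct pairs, which matches the FCG dimension listed in Table 1.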
2.2 Statistical Inference of Distance-based Correlations
Our first goal is to test whether there is a statistically significant relationship between LCFs and FCGs. However, because both the structural and functional feature vectors are high-dimensional, fully multivariate statistical tests of dependence are intractable and uninterpretable. This means that we need to find a way to effectively reduce the dimensionality of each feature vector.
For each pair of subjects, we first compute the pairwise distance between their feature vectors. This gives us one distance matrix between their structural features (LCFs) and another between their functional features (FCGs). Then, we measure the correlation between the resulting pair of structural and functional distance matrices.
Our hypothesis states that if two subjects have similar LCFs, then they are more likely to also have similar FCGs. This hypothesis derives from previous research that found (a) similar LCFs imply genetic similarity [5] and (b) identical FCGs imply that the two graphs most likely come from the same individual [12]. By formally defining a notion of similarity, it is possible to derive distribution-free statistical inference methods that can test whether the two high-dimensional feature vectors are correlated or not. This approach overcomes the high dimensionality while being statistically and theoretically rigorous.
2.2.1 Choice of Distance Metrics
In [5], Yeh et al. establish that LCFs are highly specific to each individual. More precisely, they show that the Euclidean distance between any pair of LCFs effectively captures the genetic (and temporal) difference between the two measurements, achieving 100% accuracy across 17,398 leave-one-out identification tasks. Therefore, to quantify individual variability in structural features, we use the Euclidean distance, scaled by the number of features as in [5]:
$$d_X(x_i, x_j) = \frac{1}{p} \lVert x_i - x_j \rVert_2. \tag{1}$$
To estimate distance between functional features, we follow the approach that Finn et al. [12] used on FCGs of the Q2-released version of the HCP dataset. They successfully predicted identity with 92.9–94.4% test set accuracy using the Pearson correlation, and the accuracy increased to 98–99% when comparing specific subnetworks (the medial frontal network and the frontoparietal network). Since our goal is to capture individual variability, not maximize prediction accuracy, we use the Pearson correlation distance on the entire FCG:
$$d_Y(y_i, y_j) = 1 - \mathrm{corr}(y_i, y_j) = 1 - \frac{(y_i - \bar{y}_i \mathbf{1})^\top (y_j - \bar{y}_j \mathbf{1})}{\lVert y_i - \bar{y}_i \mathbf{1} \rVert_2 \, \lVert y_j - \bar{y}_j \mathbf{1} \rVert_2}, \tag{2}$$
where $\mathrm{corr}(\cdot, \cdot)$ denotes the Pearson correlation, $\bar{y}$ denotes the mean of all entries in the vector $y$, and $\mathbf{1}$ is the all-ones vector. We note that $d_Y$ is not a proper distance metric in the mathematical sense, because it does not satisfy positive definiteness or the triangle inequality. It is nevertheless non-negative, symmetric, and exactly zero when the two inputs are identical (it is also zero when the two inputs are positive scalar multiples of each other).
Given these choices of metrics, we can represent all such distances on our data compactly in two $n \times n$ distance matrices, $D_X$ and $D_Y$, such that $(D_X)_{ij} = d_X(x_i, x_j)$ and $(D_Y)_{ij} = d_Y(y_i, y_j)$.
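The two distance matrices can be computed along the following lines. This is a minimal NumPy sketch on toy data, not the code used in the study. It assumes that "scaled by the number of features" means division by the feature dimension; the function names are hypothetical.

```python
import numpy as np

def scaled_euclidean_matrix(X):
    """D[i, j] = ||x_i - x_j||_2 / p for the rows x_i of X (n x p)."""
    p = X.shape[1]
    sq_norms = np.sum(X ** 2, axis=1)
    # Gram-matrix identity avoids building an (n x n x p) array.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    D = np.sqrt(np.maximum(sq_dists, 0.0)) / p   # clip tiny negatives from round-off
    np.fill_diagonal(D, 0.0)
    return D

def correlation_distance_matrix(Y):
    """D[i, j] = 1 - Pearson correlation between rows y_i and y_j of Y (n x q)."""
    D = np.clip(1.0 - np.corrcoef(Y), 0.0, 2.0)
    np.fill_diagonal(D, 0.0)
    return D

# Toy stand-ins for the LCF and FCG data matrices (random numbers).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 50))
Y = rng.standard_normal((6, 40))
D_X = scaled_euclidean_matrix(X)
D_Y = correlation_distance_matrix(Y)
```

Because the Pearson correlation between the two distance matrices is invariant to rescaling either one, the exact choice of scaling constant does not affect the tests that follow.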
2.2.2 Setting Up a Valid Hypothesis Test
In general, it is highly nontrivial to set up a proper statistical test comparing distance matrices, because the entries of each distance matrix are not independent of each other. Intuitively, for any pair of subjects $i$ and $j$, the distance between the $i$th and $j$th feature vectors is correlated with the distance between the $i$th feature vector and any other feature vector. Thus, standard statistical approaches that rely on the i.i.d. assumption will not give valid results if they are naïvely applied to distance matrices.
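We instead use the distance matrices to construct null and alternative hypotheses and derive proper statistical inference strategies. Given independent copies of random vectors $(x, y), (x', y') \sim P_{xy}$, where $P_{xy}$ is the joint distribution of $x$ and $y$, we test
$$H_0 : \rho = 0 \quad \text{versus} \quad H_1 : \rho > 0, \tag{3}$$
where
$$\rho = \mathrm{Corr}\big(d_X(x, x'),\, d_Y(y, y')\big). \tag{4}$$
In short, $\rho$ is the Pearson correlation between the two random distances, each of which is a function of two independent and identically distributed random variables. In our approach, the null hypothesis states that the Euclidean distance between the LCFs of two subjects is uncorrelated with the correlation distance between their FCGs. The alternative hypothesis states that the two distances are in fact positively correlated. Note that it is natural to consider a one-sided hypothesis here because we know that both distances are likely to increase as two subjects become more genetically distant [5, 12].
While there are no known parametric statistical tests corresponding to (3), we can extend the permutation test of Pearson correlation in standard linear regression to our case. Given the structural and functional distance matrices $D_X$ and $D_Y$, let $\bar{D}_X = \frac{1}{n(n-1)} \sum_{i \neq j} (D_X)_{ij}$ and $\bar{D}_Y = \frac{1}{n(n-1)} \sum_{i \neq j} (D_Y)_{ij}$, where $\sum_{i \neq j}$ denotes the double sum over all pairs of distinct subjects. Then, the sample test statistic for (3) is given by
$$\hat{\rho} = \frac{\sum_{i \neq j} \big((D_X)_{ij} - \bar{D}_X\big)\big((D_Y)_{ij} - \bar{D}_Y\big)}{\sqrt{\sum_{i \neq j} \big((D_X)_{ij} - \bar{D}_X\big)^2}\,\sqrt{\sum_{i \neq j} \big((D_Y)_{ij} - \bar{D}_Y\big)^2}}. \tag{5}$$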
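Given (5), a permutation test can be devised by randomly shuffling one of the feature vectors (say $y$, without loss of generality) among the subjects. This corresponds to permuting the rows of the data matrix $Y$. Mathematically, for a random permutation $\pi \in S_n$ of $n$ elements, the empirical distribution of permuted correlations
$$\hat{\rho}_\pi = \frac{\sum_{i \neq j} \big((D_X)_{ij} - \bar{D}_X\big)\big((D_Y)_{\pi(i)\pi(j)} - \bar{D}_Y\big)}{\sqrt{\sum_{i \neq j} \big((D_X)_{ij} - \bar{D}_X\big)^2}\,\sqrt{\sum_{i \neq j} \big((D_Y)_{ij} - \bar{D}_Y\big)^2}} \tag{6}$$
estimates the null distribution of $\hat{\rho}$. If the sample correlation (5) deviates from this null distribution significantly, then we can reject the null hypothesis of the test in (3).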
This test can be viewed as a variant of the Mantel test [13], which jointly permutes both features among the subjects to test the same statistic. Yet, because our version does not permute the feature dimension, it does not introduce unintended bias coming from spatial correlations [14].
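The permutation procedure can be sketched as follows. This is an illustrative NumPy implementation on toy data, under the assumption that only off-diagonal entries enter the correlation. The function names, the number of permutations (199), and the toy generative model are not from the study.

```python
import numpy as np

def offdiag_corr(D_X, D_Y):
    """Pearson correlation between the off-diagonal entries of two distance matrices."""
    off = ~np.eye(D_X.shape[0], dtype=bool)
    return np.corrcoef(D_X[off], D_Y[off])[0, 1]

def permutation_test(D_X, D_Y, n_perm=199, seed=0):
    """One-sided permutation test: relabel subjects in D_Y, keep D_X fixed."""
    rng = np.random.default_rng(seed)
    n = D_X.shape[0]
    observed = offdiag_corr(D_X, D_Y)
    null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(n)
        # Permuting subjects permutes rows and columns of D_Y together.
        null[b] = offdiag_corr(D_X, D_Y[np.ix_(perm, perm)])
    # The +1 terms count the observed statistic itself, so the p-value is never 0.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

def pairwise_euclidean(A):
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

# Toy data: both feature sets are noisy copies of a shared latent variable z.
rng = np.random.default_rng(1)
z = rng.standard_normal((40, 2))
X = z + 0.1 * rng.standard_normal((40, 2))
Y = 2.0 * z + 0.1 * rng.standard_normal((40, 2))
observed, null, p_value = permutation_test(pairwise_euclidean(X), pairwise_euclidean(Y))
```

Permuting the rows and columns of an already computed distance matrix is equivalent to permuting the rows of the data matrix and recomputing distances, but it avoids recomputing them for every permutation.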
Note that a nonzero correlation will imply statistical dependence, but not the other way around. When we take $d_Y$ to be the Euclidean distance instead of the correlation distance, however, we obtain distance correlation (dCor) [15], where a zero value implies statistical independence. We will consider the statistical test (3) both when $d_Y$ is the correlation distance and when $d_Y$ is the Euclidean distance. In the latter case, we use the unbiased version of the statistic that leads to a $t$-test [16].^{1}
^{1} We implement the unbiased dCor test [16] by (substantially) modifying the MATLAB implementation found in http://mathworks.com/matlabcentral/fileexchange/39905distancecorrelation.
2.2.3 Constructing a Valid Confidence Interval with Subsampling
The permutation test is nonparametric, but it does not readily yield confidence intervals unless a stronger assumption (and tedious computation) is made [17, 18]. Subsampling [19] is an alternative approach to statistical inference that makes fewer assumptions and gives a confidence interval as its outcome. It estimates the true distribution of $\hat{\rho}$ by computing the empirical version of the statistic many times on different random subsamples of the full data.
Subsampling notably differs from the more standard bootstrapping because it samples without replacement and only samples a fraction of the data points. The first difference is crucial in our scenario, because any duplicate sample from bootstrapping will zero out entries of $D_X$ and $D_Y$ and thus lead to a biased (higher) estimate of $\rho$.
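A subsampling confidence interval for the distance-based correlation might be computed as in the sketch below. This is illustrative only: it assumes a square-root convergence rate for the statistic (an assumption of this sketch, not a claim from the text), uses 500 subsamples, and runs on toy data with hypothetical function names.

```python
import numpy as np

def offdiag_corr(D_X, D_Y):
    """Pearson correlation between the off-diagonal entries of two distance matrices."""
    off = ~np.eye(D_X.shape[0], dtype=bool)
    return np.corrcoef(D_X[off], D_Y[off])[0, 1]

def subsampling_ci(D_X, D_Y, ratio=0.135, n_sub=500, alpha=0.05, seed=0):
    """Subsampling interval, assuming sqrt(n) * (estimate - truth) has a limit law."""
    rng = np.random.default_rng(seed)
    n = D_X.shape[0]
    b = int(round(ratio * n))                        # subsample size
    full = offdiag_corr(D_X, D_Y)
    roots = np.empty(n_sub)
    for s in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)   # without replacement: no duplicates
        sub = offdiag_corr(D_X[np.ix_(idx, idx)], D_Y[np.ix_(idx, idx)])
        roots[s] = np.sqrt(b) * (sub - full)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return full, (full - q_hi / np.sqrt(n), full - q_lo / np.sqrt(n))

def pairwise_euclidean(A):
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

# Toy data with a moderate shared signal.
rng = np.random.default_rng(2)
z = rng.standard_normal((150, 2))
X = z + rng.standard_normal((150, 2))
Y = z + rng.standard_normal((150, 2))
full, (lo, hi) = subsampling_ci(pairwise_euclidean(X), pairwise_euclidean(Y))
```

The interval is centered on the full-sample estimate and rescaled from the subsample size to the full sample size, which is why it can be much narrower than the spread of the raw subsample correlations.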
2.3 High-Dimensional Canonical Correlation Analysis with Cross-Validation
While statistical inference of the distance-based correlation provides some insight into the structure-function relationship, a measure of correlation aggregated over so many features may not be as intuitive or informative. In search of more detailed and interpretable relationships between the two sets of features, we attempt to find small subsets of the LCF that are predictive of small subsets of the FCG on a held-out set.
2.3.1 Canonical Correlation Analysis
For a pair of random vectors, canonical correlation analysis (CCA) [20] finds a pair of linear transformations (“alignments”) onto the same Euclidean space such that the projections are the most correlated. Assuming centered data $X \in \mathbb{R}^{n \times p}$ and $Y \in \mathbb{R}^{n \times q}$, CCA solves the following biconvex constrained optimization problem:
$$\underset{u \in \mathbb{R}^p,\; v \in \mathbb{R}^q}{\text{minimize}} \;\; -u^\top X^\top Y v \quad \text{subject to} \quad \lVert Xu \rVert_2^2 \le 1, \;\; \lVert Yv \rVert_2^2 \le 1. \tag{7}$$
The objective is often written alternatively as $-u^\top \hat{\Sigma}_{XY} v$, up to a constant factor, where $\hat{\Sigma}_{XY} = \frac{1}{n} X^\top Y$ is the empirical cross-covariance matrix. When the columns of $X$ and $Y$ are further standardized, the solution to this biconvex problem is given by the left and right singular vectors of the empirical cross-covariance matrix $\hat{\Sigma}_{XY}$ that correspond to its largest singular value.
Intuitively, CCA captures the directions in $X$ and $Y$ that explain the largest cross-correlation. If we assume that $X$ and $Y$ indeed have some correlation structures, then CCA will find the linear transformations that recover such structures.
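The singular-vector characterization described above can be illustrated with a short NumPy sketch on toy data. It follows the standardized-columns simplification stated in the text and is only practical when the feature dimensions are small. The function names and the toy generative model are assumptions of this example.

```python
import numpy as np

def standardize(A):
    """Center each column and scale it to unit variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0)

def first_canonical_pair(X, Y):
    """Leading singular vectors of the empirical cross-covariance of standardized data."""
    Xs, Ys = standardize(X), standardize(Y)
    S_xy = Xs.T @ Ys / X.shape[0]                  # (p x q) empirical cross-covariance
    U, s, Vt = np.linalg.svd(S_xy, full_matrices=False)
    u, v = U[:, 0], Vt[0]                          # pair for the largest singular value
    corr = np.corrcoef(Xs @ u, Ys @ v)[0, 1]       # correlation of the two projections
    return u, v, corr

# Toy data: every feature in both sets is a noisy copy of one latent variable z.
rng = np.random.default_rng(0)
z = rng.standard_normal(200)
X = z[:, None] + 0.5 * rng.standard_normal((200, 5))
Y = z[:, None] + 0.5 * rng.standard_normal((200, 4))
u, v, corr = first_canonical_pair(X, Y)
```

Both alignment vectors come out with unit Euclidean norm, and their signs are paired so that the projected correlation is positive.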
2.3.2 Sparse CCA
In high dimensions, i.e., when the data dimensions $p$ and $q$ are large compared to the sample size $n$, the estimate of the true cross-covariance is no longer consistent unless more structural assumptions are made [21, 22]. It is also considered a more difficult problem than sparse PCA [23, 24], which itself is considered challenging due to the poor behavior of the sample covariance matrix as an estimate [25]. To obtain a reliable estimate of the high-dimensional cross-correlation structure, we assume that there are interesting low-dimensional correlation structures between subsets of the structural and functional features. This allows us to focus on a sparse subset of each set of features that are the most correlated to one another.
A popular approach to finding sparse subsets of features is to use regularization. In our case, we add an $\ell_1$ penalty to the alignment vectors in (7):
$$\lVert u \rVert_1 \le c_u, \qquad \lVert v \rVert_1 \le c_v, \tag{8}$$
where $c_u, c_v > 0$ are sparsity parameters. The $\ell_1$ penalty, most commonly used in the Lasso [26], performs variable selection by forcing some of the entries to be precisely zero when the sparsity parameters are sufficiently small. A penalized version of CCA that combines (7) and (8) has been called sparse CCA in the literature, and an alternating convex optimization algorithm can be used to find a sparse solution [27, 28].
Yet, the $\ell_1$ penalty alone is not sufficient for effective variable selection in our setting. One reason is that both the LCF and the FCG naturally contain interesting correlation structures within their entries, while $\ell_1$ regularization tends to select only one entry from a correlated group [29]. Another reason is that penalized CCA from (7) and (8) is not strictly biconvex in high dimensions, so that the optimization problem can be unstable. Both of these issues can be alleviated by further including an $\ell_2$ penalty:
$$\lVert u \rVert_2^2 \le c'_u, \qquad \lVert v \rVert_2^2 \le c'_v, \tag{9}$$
with constants $c'_u, c'_v > 0$.^{2} The resulting optimization problem can be viewed as the elastic net [29] applied to CCA. It is now a strictly biconvex problem, and we can find a feasible solution efficiently by alternately applying existing convex optimization solvers. We note that, in general, there is no known algorithm for this biconvex problem that guarantees a globally optimal solution [27]. For our analysis, we use the MATLAB implementation from [30] that is freely available online.^{3}
^{2} For simplicity, we fix these constants in our analysis.
^{3} http://people.stern.nyu.edu/xchen3/Code/groupCCA.zip
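To make the alternating scheme concrete, the sketch below uses a simplified stand-in: alternating soft-thresholded updates on the empirical cross-covariance with unit-norm alignment vectors. It is not the MATLAB solver from [30]. It treats the within-set covariances as identity and uses penalty weights `lam_u` and `lam_v` (hypothetical names) rather than constraint bounds, so here larger values give sparser solutions.

```python
import numpy as np

def soft_threshold(a, lam):
    """Shrink every entry toward zero by lam; entries within lam of zero become 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def unit(a):
    """Rescale to unit l2 norm (an all-zero vector is returned unchanged)."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a

def sparse_cca(X, Y, lam_u, lam_v, n_iter=100, tol=1e-10):
    """Alternate between the best sparse u for fixed v and the best sparse v for fixed u."""
    S = X.T @ Y / X.shape[0]                         # p x q; feasible only for small p, q
    v = np.linalg.svd(S, full_matrices=False)[2][0]  # start from the unpenalized solution
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u_new = unit(soft_threshold(S @ v, lam_u))
        v_new = unit(soft_threshold(S.T @ u_new, lam_v))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v

# Toy data: only the first 3 X-features and first 2 Y-features share a latent signal z.
rng = np.random.default_rng(0)
n, p, q = 200, 30, 20
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :3] = z[:, None] + 0.5 * rng.standard_normal((n, 3))
Y[:, :2] = z[:, None] + 0.5 * rng.standard_normal((n, 2))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

u, v = sparse_cca(X, Y, lam_u=0.7, lam_v=0.7)
selected_x, selected_y = np.flatnonzero(u), np.flatnonzero(v)
```

On this toy problem the nonzero entries of the two alignment vectors coincide with the features that carry the shared signal, which is the variable-selection behavior that motivates the sparse formulation.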
2.3.3 $K$-Fold Cross-Validation
For sparse CCA, we use $K$-fold cross-validation to find the set of sparsity parameters that gives the highest canonical correlation between subsets of the LCFs and the FCGs.
Specifically, using $K = 5$, we first split the subjects into training and test sets with a ratio of 5 to 1. Then, we randomly partition the training set into 5 equally sized subsamples, fit sparse CCA with each candidate set of sparsity parameters to 4 of the subsamples, and use the fitted alignment vectors $u$ and $v$ to align the feature vectors from the unused subsample (i.e., the validation set). The resulting canonical correlation on the validation set can be viewed as an estimate of the canonical correlation on unseen data. By leaving out each of the 5 subsamples in turn, we obtain 5 such estimates of the canonical correlation, and the average of these 5 estimates can be used to validate the performance of the candidate set of sparsity parameters. After these steps, we choose the set of sparsity parameters that gives the largest average canonical correlation on the validation set.
The resulting alignment vectors can transform unseen feature vectors coming from the same distribution as our dataset, so that the LCFs are the most correlated to the FCGs in the transformed space. The final performance of these alignment vectors is measured by the correlation between the alignments of the test set, which was unused throughout the cross-validation steps.
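The selection procedure can be sketched end to end as follows. This toy example is not the study's code: `fit_sparse_cca` is a simplified soft-thresholding stand-in for the actual solver, and the parameter grid, toy data, and all function names are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def unit(a):
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a

def fit_sparse_cca(X, Y, lam_u, lam_v, n_iter=50):
    """Toy sparse CCA fit (alternating soft-thresholding); a stand-in for the real solver."""
    S = X.T @ Y / X.shape[0]
    v = np.linalg.svd(S, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = unit(soft_threshold(S @ v, lam_u))
        v = unit(soft_threshold(S.T @ u, lam_v))
    return u, v

def projected_corr(X, Y, u, v):
    """Correlation between the aligned features; 0 if an alignment vector is all zeros."""
    a, b = X @ u, Y @ v
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def cv_score(X, Y, lam_u, lam_v, k=5, seed=0):
    """Average validation correlation over k folds for one candidate parameter pair."""
    rng = np.random.default_rng(seed)   # same seed: every candidate sees the same folds
    folds = np.array_split(rng.permutation(X.shape[0]), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        mu_x, mu_y = X[train].mean(axis=0), Y[train].mean(axis=0)  # center with training means
        u, v = fit_sparse_cca(X[train] - mu_x, Y[train] - mu_y, lam_u, lam_v)
        scores.append(projected_corr(X[val] - mu_x, Y[val] - mu_y, u, v))
    return float(np.mean(scores))

# Toy data: 3 X-features and 2 Y-features share a latent signal z.
rng = np.random.default_rng(0)
n, p, q = 240, 30, 20
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :3] = z[:, None] + 0.5 * rng.standard_normal((n, 3))
Y[:, :2] = z[:, None] + 0.5 * rng.standard_normal((n, 2))

# Hold out one sixth of the subjects as a test set (training : test = 5 : 1).
order = rng.permutation(n)
test_idx, train_idx = order[: n // 6], order[n // 6:]
X_tr, Y_tr, X_te, Y_te = X[train_idx], Y[train_idx], X[test_idx], Y[test_idx]

grid = [0.1, 0.4, 0.7]
best = max(((a, b) for a in grid for b in grid),
           key=lambda ab: cv_score(X_tr, Y_tr, *ab))
best_cv = cv_score(X_tr, Y_tr, *best)

# Refit on the full training set and evaluate once on the untouched test set.
mu_x, mu_y = X_tr.mean(axis=0), Y_tr.mean(axis=0)
u, v = fit_sparse_cca(X_tr - mu_x, Y_tr - mu_y, *best)
test_corr = projected_corr(X_te - mu_x, Y_te - mu_y, u, v)
```

Centering the validation and test features with the training means keeps information from held-out subjects out of the fitting step.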
3 Results
3.1 Exploratory Analysis
We first present exploratory analysis results for the inter-subject distances in LCFs and FCGs. Fig. 2 shows that the feature distances between different subjects are substantially greater than zero. This in part reproduces the results from [5] and [12], in which it is shown that the distances between different individuals are significantly greater than those between measurements from the same subject. This justifies our choice of distances (1) and (2) for the permutation test as well as the subsampling-based confidence interval.
3.2 Statistical Inference
In Table 2, we summarize our results from the permutation test, the dCor test, and the subsampling-based confidence interval. Significance levels are marked with ^{*} (), ^{**} (), and ^{***} (). Significant confidence intervals are marked with ^{+}. We used random permutations for the hypothesis tests as well as subsamples for the confidence interval construction. The subsampling ratio was chosen as 0.135, following the procedure in [31].
Method  Correlation  Result Type  Result

Permutation (6)  0.120  $p$-value  0.001^{***}
dCor test [16]  0.252  $p$-value  0.001^{***}
Subsampling  0.120  95% conf. int.  (0.098, 0.141)^{+}
Using a significance level of , we find from the permutation test that there is indeed a statistically significant correlation between the Euclidean distances in LCFs and the correlation distances in FCGs. The dCor test of independence confirms that the two sets of features are statistically dependent, despite the fact that the test makes strong assumptions. Further, because the 95% confidence interval does not include zero, we conclude that the correlation between LCF distances and FCG distances is statistically significant.
Each of these results indicates that the similarity in the local connectome between individuals is significantly correlated with the similarity in their functional connectivity patterns. Specifically, our results show that if two individuals have similar local white matter architectures, they are also more likely to have similar functional brain dynamics.
Note that the correlation values for the permutation test and subsampling are identical, because they both compute exactly (5). The value in the dCor test [16] differs, however, not only because the distance metric is changed to the Euclidean distance but also because the test uses an unbiased estimate of the (Euclidean distance-based) statistic. While conceptually similar, the two computed values are estimates of different statistics and thus cannot be compared directly.
Fig. 3 visualizes the result from our permutation tests. On the left, we plot the structural and functional pairwise distances in a scatterplot to explore the overall trend. The scatterplot suggests that there is a positive trend between the pairwise distances in the structural and functional features. On the right, the permutation test shows that the correlation on real data is on the far-right tail of the correlation on simulated null data, suggesting that there is a statistically significant positive correlation between the structural and functional pairwise distances.
Nonparametric estimates of the correlation give analogous results (Spearman’s $\rho$: 0.112, Kendall’s $\tau$: 0.075). This is not surprising, given that the 2D scatterplot in Fig. 3 does not display an obvious nonlinear trend.
Fig. 4 justifies our use of subsampling instead of bootstrapping for our confidence intervals. As we described earlier, because each bootstrap sample contains multiple copies of the same subject, the resulting structural and functional distance matrices always contain many zeros, leading to a spuriously high correlation compared to the truth. The plots show that the bootstrap distribution fails to capture the actual correlation and is significantly biased upwards, while subsampling does not have this issue because it samples from the data without replacement.
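The source of the bootstrap bias can be checked directly by counting how many off-diagonal subject pairs in a resample refer to the same subject twice, since each such pair has distance exactly zero in both matrices. The sketch below is illustrative only: it uses the sample size (793) and subsampling ratio (0.135) reported in this paper, with a hypothetical helper name.

```python
import numpy as np

def duplicate_pair_fraction(idx):
    """Fraction of off-diagonal (i, j) pairs that refer to the same subject twice."""
    same = idx[:, None] == idx[None, :]
    off = ~np.eye(len(idx), dtype=bool)
    return same[off].mean()

rng = np.random.default_rng(0)
n = 793
bootstrap_idx = rng.choice(n, size=n, replace=True)                 # with replacement
subsample_idx = rng.choice(n, size=int(0.135 * n), replace=False)   # without replacement

boot_frac = duplicate_pair_fraction(bootstrap_idx)   # positive: duplicated subjects
sub_frac = duplicate_pair_fraction(subsample_idx)    # exactly zero: no duplicates
```

Every duplicated pair contributes a point at the origin of the structural-versus-functional distance scatterplot, which pulls the estimated correlation upward.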
3.3 Sparse CCA
For sparse CCA, we select a pair of sparsity ($\ell_1$) parameters from a 2D grid, one for LCFs and another for FCGs, that yields the maximum canonical correlation on the validation set. Our cross-validation plot in the left panel of Fig. 5 shows that there is a contiguous region of sparsity levels in both structural and functional features where the canonical correlation on the validation set is maximized. Using the optimal regularization parameters, we find that sparse CCA selects 50,607 (11.7%) LCF features and 2,890 (1.48%) FCG features to give a canonical correlation of 0.689 (train) and 0.515 (test).^{4}
^{4} As described in Section 2.3.3, the final training canonical correlation is computed on all 5 folds of the training set. The test canonical correlation is computed on the held-out test set, which is unused during cross-validation.
The cross-validated sparse CCA projections of the training data as well as the test data are plotted in Fig. 5. Since the objective of CCA is to maximize the correlation between these projected points, we expect to see a linearly increasing pattern in the projected space. The right panel of Fig. 5 demonstrates this expectation: the projections of the training data and the test data exhibit similar linearly increasing patterns with a similar degree of variation. This implies that the alignment vectors we found can generalize well to unseen data in terms of correlation in the linearly projected space.
Note that the projections still have relatively high variance across the linear trend. This variability is likely due to both the variance coming from the optimization problem, which is ill-conditioned and thus contains many local optima, and the variance coming from the lack of statistical consistency in the high-dimensional setting. Indeed, the optimal number of variables chosen by cross-validation (50,607 and 2,890) is still greater than the number of subjects (793).
In Fig. 6, we visualize the LCF and FCG features selected by sparse CCA using the optimal sparsity parameters. In both modalities, sparse CCA focuses on connectivity patterns in specific regions of the brain. In particular, within the high-dimensional LCF space, the algorithm points to contiguous local pathways of the white matter structure. Our results show that this specific set of local white matter pathways is highly predictive of the lateral dynamics of functional connectivity between the left and right hemispheres. The structure-function association is observed between the core white matter pathways that regulate both intracortical and cortical-subcortical communication, including the corpus callosum, thalamic radiations, corticospinal, and corona radiata pathways, and the resting-state functional activity in a diversity of cortical and subcortical nodes. This suggests that the structure-function relationship is strongest in the large major communication fascicles that are critical for global brain network communication.
3.4 Canonically Correlated Subcluster Pairs
To see whether there is substructure in the structure-function relationships identified by sparse CCA, we decompose the canonical correlations into smaller subclusters of both the LCF and FCG entries. In Fig. 7, we show the three most canonically correlated pairs of LCF and FCG subclusters, which are computed by a simple agglomerative clustering (complete-linkage, with the same distances $d_X$ and $d_Y$, respectively) of the selected LCF and FCG features into 5 subclusters each. We compute the canonical correlation between each pair of subclusters without additional regularization terms, as in (7).
Here we see even further specificity in the structure-function relationship. For example, variability in the centrum semiovale (Fig. 7, left) should predict functional dynamics of intrahemispheric and interhemispheric cortical networks. This pattern largely holds in the corresponding functional networks. In contrast, a small cluster along the inferior longitudinal fasciculus (Fig. 7, middle), a major means of communication along the ventral visual stream, correlates primarily with ventral visual pathway functional dynamics, as well as communication between dorsal and ventral visual streams. Finally, variability in the internal capsule (Fig. 7, right), a major means of communication between cortex and subcortical areas, correlates primarily with functional dynamics between cortical and subcortical nodes. Thus, the specificity of the structure-function relationships identified in this subclustering analysis is consistent with a priori predictions derived from the neuroanatomical literature.
4 Discussion
In this paper, we show how variability in local white matter architecture is associated with global patterns of functional brain dynamics. Using distance-based correlations, we found a small but significant effect whereby individuals with more similar local white matter architecture tended to also be more similar in their functional connectome. Using sparse CCA approaches, we were able to show that individual variability in white matter architecture along major brain fascicles correlated with individual differences in functional dynamics within the specific class of brain networks that would be predicted by existing neuroanatomical knowledge. Thus, in conjunction with the constraints of global end-to-end structural connectivity [10], our results highlight how variability in the local white matter systems also impacts global brain communication.
Acknowledgments
The research was sponsored by the U.S. Army Research Laboratory, including work under Cooperative Agreement Number W911NF1020022, and the views espoused are not official policies of the U.S. Government. S.B. was partially supported by the NSF grant DMS1713003.
References
 [1] F.C. Yeh, D. Badre, and T. Verstynen, “Connectometry: A statistical approach harnessing the analytical potential of the local connectome,” Neuroimage, vol. 125, pp. 162–171, 2016.
 [2] S. Pajevic, P. J. Basser, and R. D. Fields, “Role of myelin plasticity in oscillations and synchrony of neuronal activity,” Neuroscience, vol. 276, pp. 135–147, 2014.
 [3] R. E. Passingham, K. E. Stephan, and R. Kötter, “The anatomical basis of functional localization in the cortex,” Nature Reviews Neuroscience, vol. 3, no. 8, p. 606, 2002.
 [4] R. D. Fields, “A new mechanism of nervous system plasticity: activitydependent myelination,” Nature Reviews Neuroscience, vol. 16, no. 12, p. 756, 2015.
 [5] F.C. Yeh, J. Vettel, A. Singh, B. Poczos, S. Grafton, K. Erickson, W.Y. I. Tseng, and T. Verstynen, “Quantifying differences and similarities in wholebrain white matter architecture using local connectome fingerprints,” PLoS Computational Biology, vol. 12, no. 11, p. e1005203, 2016.
 [6] D. C. Van Essen, S. M. Smith, D. M. Barch, T. E. Behrens, E. Yacoub, K. Ugurbil, W.M. H. Consortium et al., “The wuminn human connectome project: an overview,” Neuroimage, vol. 80, pp. 62–79, 2013.
 [7] F.C. Yeh and W.Y. I. Tseng, “Ntu90: a high angular resolution brain atlas constructed by qspace diffeomorphic reconstruction,” Neuroimage, vol. 58, no. 1, pp. 91–99, 2011.
 [8] M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni et al., “The minimal preprocessing pipelines for the human connectome project,” Neuroimage, vol. 80, pp. 105–124, 2013.
 [9] S. Moeller, E. Yacoub, C. A. Olman, E. Auerbach, J. Strupp, N. Harel, and K. Uğurbil, “Multiband multislice GE-EPI at 7 tesla, with 16-fold acceleration using partial parallel imaging with application to high spatial and temporal whole-brain fMRI,” Magnetic Resonance in Medicine, vol. 63, no. 5, pp. 1144–1153, 2010.
 [10] A. M. Hermundstad, D. S. Bassett, K. S. Brown, E. M. Aminoff, D. Clewett, S. Freeman, A. Frithsen, A. Johnson, C. M. Tipper, M. B. Miller et al., “Structural foundations of resting-state and task-based functional connectivity in the human brain,” Proceedings of the National Academy of Sciences, vol. 110, no. 15, pp. 6169–6174, 2013.
 [11] S. Butterworth, “On the theory of filter amplifiers,” Wireless Engineer, vol. 7, no. 6, pp. 536–541, 1930.
 [12] E. S. Finn, X. Shen, D. Scheinost, M. D. Rosenberg, J. Huang, M. M. Chun, X. Papademetris, and R. T. Constable, “Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity,” Nature Neuroscience, 2015.
 [13] N. Mantel, “The detection of disease clustering and a generalized regression approach,” Cancer Research, vol. 27, no. 2 Part 1, pp. 209–220, 1967.
 [14] G. Guillot and F. Rousset, “Dismantling the Mantel tests,” Methods in Ecology and Evolution, vol. 4, no. 4, pp. 336–344, 2013.
 [15] G. J. Székely, M. L. Rizzo, and N. K. Bakirov, “Measuring and testing dependence by correlation of distances,” The Annals of Statistics, vol. 35, no. 6, pp. 2769–2794, 2007.
 [16] G. J. Székely and M. L. Rizzo, “The distance correlation t-test of independence in high dimension,” Journal of Multivariate Analysis, vol. 117, pp. 193–213, 2013.
 [17] R. John and J. Robinson, “Significance levels and confidence intervals for permutation tests,” Journal of Statistical Computation and Simulation, vol. 16, no. 3-4, pp. 161–173, 1983.
 [18] P. H. Garthwaite, “Confidence intervals from randomization tests,” Biometrics, pp. 1387–1393, 1996.
 [19] D. N. Politis, J. P. Romano, and M. Wolf, “Subsampling,” Springer Series in Statistics, 1999.
 [20] H. Hotelling, “Relations between two sets of variates,” Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936.
 [21] Z. Bao, J. Hu, G. Pan, and W. Zhou, “Canonical correlation coefficients of high-dimensional normal vectors: finite rank case,” arXiv preprint arXiv:1407.7194, 2014.
 [22] I. M. Johnstone, “Multivariate analysis and Jacobi ensembles: Largest eigenvalue, Tracy–Widom limits and rates of convergence,” The Annals of Statistics, vol. 36, no. 6, p. 2638, 2008.
 [23] M. Chen, C. Gao, Z. Ren, and H. H. Zhou, “Sparse CCA via precision adjusted iterative thresholding,” arXiv preprint arXiv:1311.6186, 2013.
 [24] C. Gao, Z. Ma, Z. Ren, and H. H. Zhou, “Minimax estimation in sparse canonical correlation analysis,” The Annals of Statistics, vol. 43, no. 5, pp. 2168–2197, 2015.
 [25] I. M. Johnstone, “On the distribution of the largest eigenvalue in principal components analysis,” The Annals of Statistics, pp. 295–327, 2001.
 [26] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288, 1996.
 [27] D. M. Witten, R. Tibshirani, and T. Hastie, “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis,” Biostatistics, p. kxp008, 2009.
 [28] E. Parkhomenko, D. Tritchler, and J. Beyene, “Genome-wide sparse canonical correlation of gene expression with genotypes,” in BMC Proceedings, vol. 1, no. 1. BioMed Central, 2007, p. S119.
 [29] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301–320, 2005.
 [30] X. Chen, H. Liu, and J. G. Carbonell, “Structured sparse canonical correlation analysis,” in Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
 [31] P. J. Bickel and A. Sakov, “On the choice of m in the m out of n bootstrap and confidence bounds for extrema,” Statistica Sinica, pp. 967–985, 2008.