Highlights

A massunivariate GLM analysis on connectomes was estimated multiple times using functional brain parcellations at different scales.

A new omnibus test pooling GLM discoveries across all scales.

On simulations, the FDR was controlled within scale and, in the presence of a significant omnibus test, no marked inflation of the FDR was observed across scales.

On three real datasets, the statistical association maps generated at different scales were mostly consistent.

In all experiments, the highest discovery rates were found below scale 25, yet some effects were better seen around scale 50.
1 Introduction
Context
Brain connectivity in restingstate fMRI has been found to be associated with a wide variety of clinical disorders (Fox and Greicius, 2010; Castellanos et al., 2013; Barkhof et al., 2014). Rather than focusing on a limited set of a priori regions of interest, a recent trend is to perform statistical tests of association across the whole connectome, i.e. at every possible brain connection (Shehzad et al., 2014). Such connectomewide association studies (CWAS) critically depend on the choice of the brain parcels that are used to estimate the connections. Analysis have been performed at different scales in the literature (Meskaldji et al., 2013), e.g. voxels (Shehzad et al., 2014), regions (Wang et al., 2007), or distributed networks (Jafri et al., 2008; Marrelec et al., 2008). The main objective of this work was to develop a statistical framework to study the impact of the spatial scale on the results of a CWAS.
Massunivariate connectomewide association studies
The massunivariate approach to CWAS (Worsley et al., 1998) consists of independently estimating a GLM at every connection. In the GLM, a series of equations are solved to find a linear mixture of explanatory variables (called covariates) that best fit the connectivity values observed across the many subjects. A
value is generated for each connection to quantify the probability that the estimated strength of association between this connection and a covariate of interest could have arisen randomly in the absence of a true association
(Worsley and Friston, 1995). The significance level of each test needs to be corrected for the total number of tests, i.e. the number of brain connections, e.g. using random field theory (Worsley et al., 1998) or FDR (Benjamini and Hochberg, 1995). Correction for multiple comparisons however generally comes at the cost of a sharp decrease in sensitivity.Multiscale parcellations and testing
A straightforward way to mitigate the impact of multiple comparisons on statistical power is to reduce the number of brain parcels. For example, the AAL template (TzourioMazoyer et al., 2002) includes 116 brain parcels based on anatomical landmarks. Datadriven algortihms can also generate functional brain parcels (Bellec et al., 2006; Thirion et al., 2006; Craddock et al., 2012; Blumensath et al., 2013; Thirion et al., 2014; Gordon et al., 2014). Few investigators have examined how scale impacts the results of a GLMconnectome analysis. Abou Elseoud et al. (2011)
explored the impact of the number of components in a dualregression independent component analysis on the difference between patients suffering from nonmedicated seasonal affective disorder and normal healthy controls. The authors concluded that the number of significant findings was maximized at scale 45 (in this case, 45 independent components). The impact of the number of brain parcels was also investigated using spatiallyconstrained spectral clustering
(Craddock et al., 2012) at much higher scales (from 50 to 3000+) by Shehzad et al. (2014). The authors concluded that the association between restingstate connectivity and intelligence quotient was consistent across scales. It should be noted that, in the abovementioned studies (Abou Elseoud et al., 2011; Shehzad et al., 2014), the authors did not investigate the implications that the replication of statistical tests at multiple scales may have in terms of the control of false positives. We are thus not currently aware of a valid statistical framework to examine the results of a CWAS with datadriven brain parcellations at multiple scales.Main objectives
In this paper, we developped a new method to explore multiscale statistical parametric connectome (MSPC). Multiscale functional brain parcellations and associated connectomes were first generated using a group cluster analysis (Bellec et al., 2010; Bellec, 2013). A GLM was then applied on the connectomes and the FDR was controlled at each scale independently (Benjamini and Hochberg, 1995). We developed an omnibus test to assess the significance of the number discoveries, pooled across all scales. Our first objective was to empirically assess if the FDR was well controlled within scale, and, when the omnibus test was rejected, across scales as well. Our second objective was to evaluate how the scale impacted the results of a MSPC in terms of the number and nature of the discoveries. In particular, we wanted to check if different scales would bring complementary and biologically plausible insights in the results of the GLM. We conducted a series of experiments involving both simulated and real datasets to address these two objectives, which have been summarized, along with the main findings, in Table 1.
Simulations
We generated several simulation datasets. A first set of experiments involved independent tests on purely synthetic data. A second set of experiments were based on a mixture of real fMRI data with synthetic signal, in order to introduce realistic dependencies between tests, both within and across scales. We simulated group differences using a variety of scenarios covering different combinations of effect and sample sizes as well as different proportions of true nonnull hypothesis. As part of the simulation study, we also applied MSPC on real fMRI datasets (the Cambridge sample) using random group differences, thus providing insights in the behaviour of the method under the global null hypothesis, where there is no true association to find.
Real datasets
We evaluated the MSPC on three real datasets: (1) a study comparing patients suffering from schizophrenia with healthy control subjects (referred to here as the SCHIZO study, with a large sample size, ); (2) a study on patients suffering of congenital blindness, compared to sighted controls (referred to here as the BLIND study, with a small sample size, ), and; (3) a study of motor learning, where restingstate data connectivity was compared before and after learning of a motor task (referred to here as the MOTOR study, with a moderate sample size, ). These three datasets were chosen to assess how the scale impacted the results of the MSPC with a variety of effect and sample sizes. We had strong a priori hypotheses for those three analyses: changes in the visual network for the BLIND dataset, in the motor network for the MOTOR dataset and a more general dysconnectivity in the SCHIZO dataset. These a priori were useful to assess the biological plausibility of the findings. We checked the assumptions of the parametric GLM estimation, to further validate the specificity of the MSPC approach. We finally quantified similarities and differences of the MSPC findings across scales.
Specific objectives  Experiment(s)  Finding(s) 

Check the assumptions of the parametric GLM tests and the FDRBH algorithm, within scales.  Assumptions of the GLM were tested on four real data samples at high scales (300+). Simulations of group differences at multiple scales with tests featuring realistic dependencies.  Although trends of departure from normality and homoscedasticity were observed, no tests passed significance (Section 5.2). In the simulations with dependent tests, potential departures from the assumptions of the GLM of FDRBH did not compromise the specificity of the FDRBH procedure (Figure 5). 
Assess the specificity of MSPC in the absence of signal (“global null”).  Test for differences in average connectivity between random subgroups of the Cambridge sample.  The FWE under the global null was controlled at nominal level by the multiscale omnibus test (Figure 5). 
Assess the specificity of MSPC within and across scales.  Multiscale simulation of group differences with independent or dependent tests.  The FDR was controlled at nominal level or below within scale (Figures 3, 5). The FDR across scales was controlled in most realistic simulations, and slightly liberal (effective FDR of 0.09 for a nominal level of 0.05) in the worstcase scenario, otherwise featuring sensitivity close to zero at high scales (Figures 5 and 6). 
Assess the sensitivity of MSPC across scales.  Same as above.  The sensitivity varied substantially across scales. This seemed to reflect at least two phenomena: (1) an intrinsic property of the FDRBH procedure, which looses sensitivity when the number of multiple comparison increases, before reaching a plateau; (2) effect sizes may change as a function of scale (Figures 4, 6, 7). 
Assess the plausibility of the results identified with MSPC on real data.  GLM connectome analyses in three real datasets at multiple scales.  The MSPC identified biologically plausible changes in connectivity in all three analyses (Figures 10, 11). 
Assess if different scales can identify complementary effects.  Same as above.  The discovery rate was markedly higher at low scales, below 50 (Figures 8, 10). Statistical parametric maps were highly consistent across scales (Figure 12), although some effects associated with specific structures were better seen at high scales (Figures 10, 11). 
2 Statistical testing procedures
2.1 Functional parcellations
The first step to build a connectome is to select a parcellation of the brain, with parcels. In this work, we used functional brain parcellations, aimed at defining groups of brain regions with homogeneous time series. A number of algorithms have been proposed with additional spatial constraints, to ensure that the resulting parcels are spatially connected (Lu et al., 2003; Thirion et al., 2006; Craddock et al., 2012). However, from a pure functional viewpoint, the spatial constraint seems somewhat arbitrary, as functional units in the brain at low resolution encompass distributed networks of brain regions with homotopic regions often being part of a single parcel (De Luca et al., 2006; Damoiseaux et al., 2006). Some works have thus used distributed parcels as the spatial units to measure functional brain connectivity, e.g. (Jafri et al., 2008; Marrelec et al., 2008). To generate the functional parcelations, we relied on a recent method called “Bootstrap Analysis of Stable Clusters” (BASC), which can identify consistent functional parcels for a group of subjects (Bellec et al., 2010)
, using a hierarchical cluster with Ward’s criterion both at the individual and the group levels. The functional parcels can be generated at any arbitrary scale (within the range of the fMRI resolution), and we considered only parcels generated at the group level, which were nonoverlapping and not necessarily spatially contiguous.
2.2 Functional connectome
For each scale, and each pair of distinct parcels and at this scale, the betweenparcel connectivity is measured by the Fisher transform of the Pearson’s correlation between the average time series of the parcels. Note that other measures can be used to quantify interactions between parcels, such as partial correlations (Marrelec et al., 2006). We used correlation as it is the simplest, most popular and still fairly accurate (Smith et al., 2011) measure of interaction in fMRI. The statistical framework presented here could still be applied to many other measures (see 6). The withinparcel connectivity is the Fisher transform of the average correlation between time series of every pair of distinct voxels inside parcel . The connectome is thus a matrix. Each column (or row, as the matrix is symmetric) codes for the connectivity between parcel and all other brain parcels, or in other word is a full brain functional connectivity map. See Figure 1ab for a representation of a parcellation and associated connectome. Connectomes are generated independently at each scale. See A for a more formal description of the connectome generation.
2.3 Statistical parametric connectome
For a scale with parcels, there are exactly distinct elements in an individual connectome . This connectome can be stored as a vector, where the brain connections have been ordered arbitrarily along one dimension. When functional data is available on subjects, the group of connectomes is then assembled into a array , where each code for one subject and each code for one connection. A general linear model (GLM) can then be used to test the association between brain connectivity and a trait of interest, such as the age or sex of participants. All of these explanatory variables are entered in a matrix . The variables are typically corrected to have a zero mean across subjects, and an intercept (i.e. a column filled with 1) is added to . The GLM relies on the following generative model:
(1) 

is a matrix where each row codes for a subject, and each column codes for a connection,

is a matrix of explanatory variables (or covariates) where each row codes for a subject and each column codes for a covariate,

is an unknown
matrix of linear regression coefficients where each row codes for a covariate and each column codes for a connection,

is a random (noise) variable, with similar coding as .
We relied on the following parametric assumptions on the noise
are (1) that its rows are independent; (2) that each element follows a normal distribution with zero mean, and (3) that the variance of all elements are constant within a column, also called the homoscedasticity assumptions. As the data generated from different subjects are statistically independent the first assumption is reasonable. We tested the normality and homoscedasticity assumptions on real datasets. Under these parametric assumptions, the regression coefficients
can be estimated with ordinary least squares and, for a given “contrast” vector
of size , the significance of can be tested with a connectome of test , with associated values . The quantity controls for the risk of false positive findings at each connection . The GLM applied on connectomes is illustrated in Figure 1c. See B for the equations related to the estimation and testing of regression coefficients in the GLM.2.4 The BenjaminiHochberg FDR procedure
The number of tests grows quadratically with the scale . The significance value applied on within a scale thus needs to be adjusted for this multiple comparison problem. We implemented the benjaminiHochberg (BH) procedure (Benjamini and Hochberg, 1995) to control the FDR at a specified level within scale. The idea of the FDR is not to strictly control the probability to observe at least one false positive (a quantity know as familywise error, FWE), but rather to control, on average, the proportion of false positive amongst the findings. Note that controlling for the FDR is not necessarily a more liberal attitude than controlling for the FWE: if the global null hypothesis is verified, i.e. all discoveries are false positive, then the FDR is exactly the FWE. In the presence of true discoveries, however, the FDR procedure tolerates in general more noise than a FWE approach. The actual number of false discoveries will depend on the amount of signal (true positive) present in the data, and is therefore a contextdependent question. The BH procedure is a popular technique to control the FDR, which was designed for independent tests. The BH still has been shown to have a satisfactory behaviour even in the presence of positive correlation between the tests (Benjamini and Yekutieli, 2001). On simulations, the specificity of the FDRBH algorithm was assessed in the presence of a realistic amount of correlation between tests, as would be found in a MSPC analysis of fMRI data.
2.5 Multiscale parcellations
To explore the association of connectomes and a trait of interest at multiple scales, some multiscale brain parcellations first need to be generated, for which we employed the BASC method (Bellec et al., 2010). A GLM was then estimated with FDR control at each scale independently (Figure 2). To choose the scales of analysis, a comprehensive strategy would consist of a regular grid, e.g. from to brain parcels, with a step of . The GLM results at such scales may however be highly redundant, as some parcels may be found identically at different scales, if those are close. To systematically examine the results of the GLM analysis at all selected scales, a better strategy would be to select a limited number of nonredundant scales that span a given range (e.g. to ). For this purpose, we used the multiscale stepwise selection (MSTEPS) algorithm recently proposed by Bellec (2013) to select scales that provide an accurate summary of the stable features of brain clusters observed at all possible scales. Both strategies (comprehensive grid of scales and sparse subset) were examined in this work, both on simulations and real data.
2.6 Multiscale testing
Testing GLM on connectomes at multiple scales introduces a new level of multiple comparisons, this time across scales rather than across connections. In addition to the FDR within scale, one now needs to consider the FDR across scales, i.e. for all tests combined across scales. For example, if two scales were used, and , there would be tests at each scale (i.e. and ). If there were discoveries at scale , of which was a false positive, and discoveries at scale , of which were false positive, the ratio of false discoveries for this simulation would be within scale 10, and within scale . The ratio of false discoveries across scales would be . The FDR withinscale and betweenscale would be the average of these false discovery proportions over many replications of such experiments. A straightforward way to control the FDR across scales would be to apply the FDRBH procedure to the whole set of tests, pooled across scales. For a regular grid of scales ranging from 10 to 300, with a step of 10, the largest numbers of tests within scale is . The total number of tests is in that case , an order of magnitude larger. Because the FDRBH relies on a Bonferroni estimation of the number of false positives, an FDR procedure in such a high dimension is bound to have low sensitivity, and would be a high price to pay to explore multiple scales of a GLM analysis. However, we can note that in a multiscale analysis, a detection at any scale makes the presence of signal at all the other scales very likely. In the presence of multiple families of tests and a substantial amount of true discoveries, (Efron, 2008) hypothesized based on simulations that there would be no need to adjust the FDR across families, when the FDR is controlled within family. The rationale for this hypothesis is that, in the presence of signal, the FDR controls for a proportion, which behaves well when multiple families are combined. By contrast, the FWE looks at a maximum statistics and gets mechanically inflated when mutliple families are combined. In the context of a MSPC analysis, Efron’s hypothesis would mean that controlling for the FDR at each scale independently would also guarantee that the FDR would be controlled across scales. If supported empirically, this hypothesis would justify to examine the results of a GLM on connectomes at many scales, without introducing the need for any additional correction of multiple comparisons beyond what is required by a traditional, singlescale analysis. The multiscale exploration would basically come for free, provided that enough true discoveries were made overall.
2.7 Omnibus test
Our hypothesis is that the control of the overall FDR (across connections and scales) can be attained through the sole control of FDR within scale, in the presence of a substantial volume of true discoveries. At a given scale , is the percentage of discoveries, i.e. the number of significant tests as identified by FDRBH at a given level , divided by the total number of tests. For a grid of scales, e.g. those selected by MSTEPS, the overal volume of discoveries was defined as the average of across all scales . As a minimal requirement for the volume of true discoveries to be “substantial”, we developped an omnibus test to reject the hypothesis that could be observed under the global null hypothesis , i.e. no nonnull effect at any connection and any scale. This test proceeded by comparing the volume of discoveries observed empirically in the group sample against the volume of discoveries that could be observed under . To generate replications of under , the GLM was first applied with a reduced model where the explanatory variable of interest (as selected through the contrast) had been removed. Then, a permutation of the residuals was generated as described in (Anderson, 2002), see D. A replication of connectomes was generated under by adding the permuted residuals to the estimated mixture of reduced explanatory variables. In order to respect the dependencies between connectivity estimates within and across scales, the same permutation of the subjects was applied to all of the connections and scales. The detection procedure was applied on each replication of connectomes, under , and the total volume of discoveries was derived. A MonteCarlo approximation, with typically permutation samples on real data, was used to estimate a falsepositive rate when testing against the global null hypothesis. Note that a single omnibus test was derived, controlling for the FWE of the experiment as a whole. If this test passed significance, each scale was examined with a control of the FDR at , uncorrected for multiple comparisons across scales. If the omnibus test did not reach significance, then no connection at any scale was deemed significant.
3 Evaluation on simulated datasets with independent tests
3.1 Methods
Datagenerating procedure
We started by simple simulations of independent tests, to assess to which extent the hypothesis of Efron (2008) was robust to different scenarios, and if the omnibus test would systematically ensure that the FDR across scales would be well controlled. A number of test families were generated independently, each one composed of tests, . Each family included a set proportion of true nonnull hypotheses , identical for all families. If was not an integer, the number of true positives was set to either or , with probabilities such that on average over many simulations . For a nonnull test , the associated value was simulated as:
(2) 
(3) 
where
was a Gaussian distribution with zero mean and unit variance, and
was a simulation parameter (further called effect size). The null tests were generated the same way, but with an effect size .Simulations scenarios
For each experiment, all combinations of effect size in the grid and in the grid were considered. We implemented a series of experiments:

We first checked how the FDR across scales behaved as a function of the number of families , with in and equals to (corresponding approximately to the number of tests at scale 45).

We then checked how the FDR across scales behaved as a function of the number of tests per family with identical for all , and in the grid (corresponding roughly to scales , and ), and families.

We checked how the FDR across families behaved for a number of families and a number of tests per family that would be comparable to situations encountered on multiscale GLMconnectome analysis.

We first tested the scales selected by MSTEPS on the SCHIZO dataset (see Section 5), i.e. and the equal to (, , , , , , ), corresponding to the number of tests at scales , , , , , , .

We then tested the procedure on and ranging from to , which would be equal to the number of tests associated with a regular grid covering scales to with a step of .

We finally tested the behaviour of smaller grids, with a number of tests equivalent to GLM tests over scales ranging from to either , or (with a step of 10).

Computational environment
All the experiments reported in the paper were performed using the NeuroImaging Analysis Kit (NIAK^{1}^{1}1http://www.nitrc.org/projects/niak/) version 0.12.18, under CentOS version 6.3 with Octave^{2}^{2}2http://gnu.octave.org version 3.8.1 and the Minc toolkit^{3}^{3}3http://www.bic.mni.mcgill.ca/ServicesSoftware/ServicesSoftwareMincToolKit version 0.3.18. Analyses were executed in parallel on the ”Guillimin” supercomputer^{4}^{4}4http://www.calculquebec.ca/en/resources/computeservers/guillimin, using the pipeline system for Octave and Matlab (Bellec et al., 2012), version 1.0.2. The scripts used for processing can be found on Github^{5}^{5}5https://github.com/SIMEXP/glm_connectome.
Statistical testing procedure
For each simulation scenario, the FDRBH procedure was applied to each family independently, with a significance level in the grid . To estimate the distribution of the volume of discovery under the global null, 1000 samples were generated with the parameters and set to zero, for each choice of and . These replications under the globall null were used to generate the values of the omnibus test for all simulations with identical and .
Effective FDR, sensitivity and omnibus test
The effective FDR at each scale was computed as the number of false discoveries divided by the total number of discoveries, averaged across 1000 simulation replications for each simulation scenario. The effective sensitivity was computed at each scale as the number of true discoveries, divided by the number of true nonnull hypotheses present at this scale, and averaged across the 1000 replications. To compute the FDR across scales, the same procedure to estimate the effective FDR was applied to the combination of tests pooled across all families. Finally, we also derived a modified FDR and sensitivity, where the BHFDR procedure was combined with the omnibus permutation test, at a significance level of .
3.2 Results
The intrafamily FDR
Some of the conclusions in that example were observed consistently across all our experiments, see for instance Figure 3. In particular, the effective FDR within each family is following closely . This is a replication of a wellestablished result: the BHFDR procedure is conservative by a factor for independent tests. For example, for and a of , the effective FDR is of approximately 0.18 for all effect sizes , see the rightmost values in the last column in Figure 3.
The FDR across families
The FDR across families followed a smooth transition between two regimes. The first regime, called “liberal” was clearly seen under the global null hypothesis (first column in Figure 3). In this regime, the FDR matched with the FWE, i.e. the probability to have one or more false positive. As expected, controling the FWE within family did not control for the FWE across families, i.e. the effective FDR across families was largely superior to the prescribed level . The second regime, called “exact” was seen best in the last column of Figure 3. In the presence of a large (in this example ), the FDR across scales precisely followed the FDR within scales. The transition between these two regimes (liberal and exact) was smooth, and in situations that ressembled the global null hypothesis (i.e. at low or effect size), the FDR across scales was more liberal than the nominal , sometimes by a wide margin, e.g. , and in Figure S13. Note that both increasing the effect size, or increasing both pushed the FDR across families towards the “exact” regime.
The FDR across families, with omnibus test
The effect of the omnibus test on the FDR across families was particularly apparent under the global null hypothesis in all simulations: the FDR across scales matched the FWE, which was less than, or equal to the value of the omnibus test, as expected, see the first column in Figure 3. More generally, for simulation scenarios that represented a transition between the liberal and exact regimes of the FDR across families, the application of the omnibus test tended to make the FDR across families more conservative. There was still no guarantee that the FDR across families conformed to the specified level, as can be seen for , and in Supplementary Figure S13, where the effective FDR was larger than for a nominal FDR of . This behaviour was however found to be very much dependent on the number of tests per family and the number of families, which was further studied below.
Influence of the number of families
By varying in for a fixed number of tests per family ( for all ), we found that the transition between the liberal and exact regime of the FDR across scales took longer when the number of families increased. For and , and a nominal FDR , the effective FDR was about with , while it increased to almost for (Supplementary Figure S13).
Influence of the number of tests per family
By varying in (with identical for all ) for a fixed number of families , we found that the transition between the liberal and exact regime of the FDR was quicker when the number of tests per family increased. In other words, the exact regime appears as an asymptotic behaviour of the FDR across scales, when the number of tests per family becomes large. For example, for and , and a nominal FDR , the effective FDR across scales went from above with tests per family to below with tests per family (Supplementary Figure S14).
Scenarios based on multiscale connectome testing
Figure 3 shows the effective FDR for families including a number of tests similar to the ones associated with the scales selected by MSTEPS on the SCHIZO dataset. In this setting, and despite the high number of families, the transition from the liberal to the exact regime was very quick, which likely reflected the high number of tests found in most families. In particular, the FDR was found to be appropriately controlled across scales as soon as . In situations were the FDR across scales was liberal, the correction by the omnibus tests made the test very close to nominal levels, or conservative. Regarding the sensitivity of the tests, as expected, increasing either or increased the overall sensitivity of the tests. We observed a general trend in all scenarios: the sensitivity peaked at very low scales, and decreased exponentially to reach a plateau around scale 10 to 50. After this initial loss in sensitivity, the sensitivity was uniform across scales (Figure 4). Identical conclusions were reached with families akin to connectome testing on a regular grid of scales ranging from 10 to either 50, 100, or 300 parcels (with a step of 10). See Supplementary Figure S15 for the effective FDR, and Supplementary Figure S16 for sensitivity results.
4 Evaluation on simulated datasets with dependent tests
4.1 Methods
Datagenerating procedure
We designed a simulation framework for multiscale GLMconnectome analysis in the presence of depencies between tests, both within scale and across scales. To ensure that these depencies would be as realistic as possible, semisynthetic datasets were generated starting from a large real sample (Cambridge) released as part of the 1000 functional connectome project^{6}^{6}6http://fcon_1000.projects.nitrc.org/fcpClassic/FcpTable.html (Biswal et al., 2010). This sample (Liu et al., 2009) included restingstate fMRI time series (eyes opened, TR of 3 seconds, 119 volumes per subject) collected with a 3T scanner on 198 healthy subjects (75 males), with an age ranging from 18 to 30 years. All the datasets were preprocessed and resampled in stereotaxic space, as described in Section 5.1. A region growing algorithm was used to extract 483 regions, common to all subjects, as described in Bellec et al. (2010). For each subject, the average functional time series and associated connectomes were generated using these regions^{7}^{7}7The average time series have been publicly released at http://figshare.com/articles/Cambridge_resting_state_fMRI_time_series_preprocessed_with_NIAK_0_12_4/1159331. (see Section 2.2). The average connectome across all subjects was derived, and a hierarchical clustering procedure (with Ward’s criterion) was applied to derive a hierarchy of brain parcels at all possible scales, ranging from 1 to 483. The simulation procedure relied on the manual selection of a critical scale and a particular cluster at this scale. For each simulation, two nonoverlapping subgroups of subject ( subjects per group) were randomly selected. A circular block bootstrap (CBB) procedure was applied to resample the individual time series, using identical time blocks within each cluster, and independent time blocks in different clusters. This resampling scheme ensured that withincluster correlations were preserved, while betweencluster correlations had a value of zero on average. Finally, for the subjects selected to be in the first group, a single realization of a independent and identically distributed Gaussian variable, where each time point had a zero mean and a variance of , was added to the time series of the regions inside cluster , after the time series were corrected to a zero temporal mean and a variance of . The addition of this signal increased the intraparcel connectivity of the cluster including cluster for all scales smaller or equal to , and increased the within as well as betweenparcel connectivity for all clusters included in cluster for scales strictly larger than . Because of the absence of correlations between parcels at scale (due to the CBB resampling), all other connections within or between clusters at every scale were left unchanged by this procedure. It was thus possible to know exactly which connections were true or false null hypothesis in the group difference at every scale. Supplementary Figures S17 and S18 outline the procedure of multiscale connectome simulation.
Effect size and proportion of nonnull hypothesis
A number of clusters of reference were handpicked such that the proportion of nonnull hypothesis would be about , , and at all scales . Note that these reference clusters were used to set true nonnull hypotheses at all the scales of analysis, yet the subdivisions (or merging) associated with these clusters represented a varying proportion of the number of clusters at any given scale. As a consequence, and unlike simulations of independent tests, was dependent on the scale . Two values for were selected: and . The effect size associated with a given actually depended on the withincluster correlations, betweensubject variance in connectivity as well as the scale of analysis. Two sample sizes were investigated: ( subject per group), and ( subjects per group). See Supplementary Material Figure S19 for plots of the effect size and percentage of true nonnull hypotheses as a function of scale. For each simulation scenarios, MonteCarlo samples were generated and subjected to the MSPC analysis, with FDR levels of , , and , as well as an omnibus test at . The same scales were tested here as in the simulations for independent tests: the scales selected by MSTEPS on the SCHIZO experiment, and a regular grid from to clusters (with a step of 10).
Simulations under the global null
To assess the behaviour of the testing procedures in the absence of any signal, we also ran experiments under the global null. In that case real connectomes were generated for randomly selected and nonoverlapping groups of subjects, and then a testing procedure was implemented to assess the significance of group differences. In these experiments, no bootstrap was performed on individual time series nor any signal was added. The experiments simply consisted in comparing real connectomes between random groups of subjects sampled from identical populations, using real dependencies between tests.
Robustness to the choice of clusters
Finally, we also investigated how the procedure behaved when the clusters used in the testing procedures did not match exactly with the clusters that were used to generate the simulations. For this purpose, for each simulation, no signal was actually generated in 30% of the regions in the cluster of reference, but rather in an equivalent number of arbitrarily selected regions from other clusters. The same regions were selected across all simulations to simulate a systematic departure of the test clusters from the ground truth clusters. The multiscale clusters without perturbations were used in the statistical testing procedures. In this setting, many connections outside of the cluster of reference ended up with very small effects, and we did not investigate the specificity given the very large number of true nonnull hypotheses and large variations in effect size. However, we did investigate the sensitivity of the FDRBH procedure, using the same definition of true nonnull hypothesis as with the simulations without perturbation.
4.2 Results
Effective FDR within scale
Figure 5 represents the empirical FDR as a function of scale for the MSPC procedure, in the case of a regular grid of scales covering to brain parcels and a perfect match between the true and test clusters. The effective FDR was conservative within scale on the simulations with dependent tests, e.g. the effective FDR was about for a nominal FDR of . This is in contrast with the independent tests, where the control of the FDR within scale was exact under the global null hypothesis. Our interpretation was that the large positive correlations present in fMRI time series caused the FDRBH procedure to become conservative. In the presence of signal, the FDR within scale was still well controlled, with the same factor on the effective FDR as was observed with independent tests.
Effective FDR across scales
As was observed on independent tests, the FDR across scales transitioned between a “liberal” regime, in simulation scenarios close to the global null hypothesis, to an exact regime, where the FDR across scales matched the FDR within scales. The transition between regimes happened quite fast, with either or , as soon as was larger than . When combined with the omnibus test at , the FWE under the global null hypothesis became exact or conservative for a FDR level above . Importantly, the omnibus test also made the procedure either conservative (for ) or only slightly liberal in the scenarios were the FDR across scales transitioned between the “liberal” and “exact” regimes, with the effective FDR in the range to for a nominal level of in the worst cases (i.e. , and ). The conclusions were identical when using a regular grid of scales ranging from 10 to 300 parcels (with a step of 10), or scales identical to those selected by MSTEPS on the SCHIZO dataset, see Supplementary Figure S20.
Sensitivity
When the true and test clusters perfectly matched, the sensitivity across scales followed a similar patterns in all scenarios: a decrease in sensitivity with increasing scales, although not as sharp as what was observed on simulations with independent tests, see Figure 6. This closely mirrored the large increase in effect size at low scales, due to averaging on clusters that perfectly matched the simulated signal, see Supplementary Figure S19. We noted that the simulation settings where departure from nominal levels were observed were also characterized by very low rate of discoveries, notably at high scale, with sensitivity below 0.1 for scales higher than 50 and falling to zero for scales higher than 150 (Figure 6, first row). By contrast, when introducing a mismatch between the true and test clusters, increases in sensitivity were observed across a wider range of low scales, e.g. , and , or even at high scales, e.g. , and (Figure 7). This again reflected the more variable profiles of effect size as a function of scales across scenarios after the introduction of a mismatch between the true and test clusters. These simulations demonstrated the possibility to have increase in sensitivity as a function of scale, and that these gains would potentially be dependent on the effect size, the mismatch between the true/test clusters, as well as the sample size. These observations were made for a regular grid of scales, but were identical using the MSTEPS scales from the SCHIZO dataset (not shown).
5 Application to real datasets
5.1 Methods
Participants
The SCHIZO dataset was contributed by the Center for Biomedical Research Excellence (COBRE) to the 1000 functional connectome project^{8}^{8}8http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html (Biswal et al., 2010). The sample comprised 72 patients diagnosed with schizophrenia (58 males, age range = 1865 yrs) and 74 healthy controls (51 males, age range = 1865 yrs). The BLIND (Collignon et al., 2011) and MOTOR (Albouy et al., 2014) datasets were acquired at the Functional NeuroImaging Unit, at the Institut Universitaire de Gériatrie de Montréal, Canada. Participants gave their written informed consent to take part in the studies, which were approved by the research ethics board of the Quebec BioImaging Network (BLIND, MOTOR), as well as the ethics board of the Centre for Interdisciplinary Research in Rehabilitation of Greater Montreal (BLIND). The BLIND dataset was composed of 14 congenitally blind volunteers recruited through the Nazareth and Louis Braille Institute (10 males, age range = 2661 yrs) and 17 sighted controls (8 males, age range = 2360 yrs). The MOTOR sample included 54 healthy young participants (33 males, age range = 1933 yrs).
Acquisition
Restingstate fMRI scans were acquired on a 3T Siemens TrioTim for all datasets. One single run was obtained per subject for either the SCHIZO or BLIND dataset while two runs were acquired in each subject for the MOTOR dataset, one immediately preceding and one following the practice on a motor task. For the SCHIZO dataset, 150 EPI bloodoxygenation level dependent (BOLD) volumes were obtained in 5 mns (TR = 2 s, TE = 29 ms, FA = 75°, 32 slices, voxel size = 3x3x4 mm, matrix size = 64x64), and a structural image was acquired using a multiecho MPRAGE sequence (TR = 2.53 s, TE = 1.64/3.5/5.36/7.22/9.08 ms, FA = 7°, 176 slices, voxel size = 1x1x1 mm, matrix size = 256x256). For the BLIND dataset, 136 EPI BOLD volumes were acquired in 5 mins (TR = 2.2s, TE = 30 ms, FA = 90°, 35 slices, voxel size = 3x3x3.2 mm, gap = 25%, matrix size = 64x64), and a structural image was acquired using a MPRAGE sequence (TR = 2.3 s, TE = 2.91 ms, FA = 9°, 160 slices, voxel size = 1x1x1.2 mm, matrix size = 240x256). For the MOTOR dataset, 150 EPI volumes were recorded in 6 mins 40 s (TR = 2.65s, TE = 30ms, FA = 90°, 43 slices, voxel size = 3.4x3.4x3 mm, gap = 10%, matrix size = 64x64), and a structural image was acquired using a MPRAGE sequence (TR = 2.3 s, TE = 2.98 ms, FA = 9°, 176 slices, voxel size = 1x1x1 mm, matrix size = 256x256).
Motor task
Between the two rest runs of the MOTOR experiment, subjects were scanned while performing a motor sequence learning task with their left nondominant hand. 14 blocks of motor practice were interspersed with 15s rest epochs. Motor blocks required subjects to perform 60 finger movements, ideally corresponding to 12 correct fiveelement finger sequence. The duration of the practice blocks decreased as learning progressed. It should be noted that the effect of motor learning per se on the subsequent rest run could not be distinguished from that of a mere motor practice/fatigue effect in the present experimental setting.
Preprocessing
Each fMRI dataset was corrected for interslice difference in acquisition time and the parameters of a rigidbody motion were estimated for each time frame. Rigidbody motion was estimated within as well as between runs, using the median volume of the first run as a target. The median volume of one selected fMRI run for each subject was coregistered with a T1 individual scan using Minctracc (Collins and Evans, 1997)
, which was itself nonlinearly transformed to the Montreal Neurological Institute (MNI) template
(Fonov et al., 2011) using the CIVET pipeline (AdDab’bagh et al., 2006). The MNI symmetric template was generated from the ICBM152 sample of 152 young adults, after 40 iterations of nonlinear coregistration. The rigidbody transform, fMRItoT1 transform and T1tostereotaxic transform were all combined, and the functional volumes were resampled in the MNI space at a 3 mm isotropic resolution. The “scrubbing” method of Power et al. (2012), was used to remove the volumes with excessive motion (frame displacement greater than 0.5 mm). A minimum number of 60 unscrubbed volumes per run, corresponding to s of acquisition, was then required for further analysis. For this reason, some subjects were rejected from the subsequent analyses: 16 controls and 29 schizophrenia patients in the SCHIZO dataset (none in either the BLIND or MOTOR datasets). The following nuisance parameters were regressed out from the time series at each voxel: slow time drifts (basis of discrete cosines with a 0.01 Hz highpass cutoff), average signals in conservative masks of the white matter and the lateral ventricles as well as the first principal components (95% energy) of the six rigidbody motion parameters and their squares (Giove et al., 2009). The number of confounds regressed from the individual time series ranged from 12 to 18 for the MOTOR sample, from 11 to 15 for the BLIND sample, and from 10 to 17 for the SCHIZO sample. The fMRI volumes were finally spatially smoothed with a 6 mm isotropic Gaussian blurring kernel. Note that the preprocessed fMRI time series for the COBRE experiment have been made publicly available^{9}^{9}9http://figshare.com/articles/COBRE_preprocessed_with_NIAK_0_12_4/1160600.Multiscale parcellation
Brain parcellations were derived using BASC separately for each dataset, while pooling the patient and control groups in the SCHIZO and BLIND datasets, and runs in the MOTOR dataset. The BASC used 100 replications of the clustering of each individual time series, using circular block bootstrap, and 500 replications of the group clustering, using regular bootstrap. The functional group clusters were first generated on a fixed regular grid, from to clusters with a step of
, identical for all three real datasets, and with identical numbers of individual, group, and final (consensus) clusters. The MSTEPS procedure was then implemented to select a datadriven subset of scales approximating the group stability matrices up to 5% residual energy, through linear interpolation over selected scales. The number of individual clusters were selected in the fixed grid above, yet the number of group and final clusters were searched in an interval of
of the individual scales, such that the final scales were not constrained in the range to . Note that the number of scales itself was selected by the MSTEPS procedure in a datadriven fashion, and that the number of individual, group and final (consensus) number of clusters were not necessarily identical.General linear model
For all GLM analyses, the covariates included an intercept, the age and sex of participants as well the average frame displacement of the runs involved in the analysis (two covariates of frame displacement were used in the MOTOR dataset, one per run). The contrast of interest was on a dummy covariate coding for the difference in average connectivity between the two groups for SCHIZO and BLIND, and on the intercept (average of the difference in connectivity pre and posttraining) for the MOTOR dataset. Note that for the motor dataset the difference in connectivity between the second run and the first run was entered in the grouplevel GLM, in place of the individual connectome. All covariates except the intercept were corrected to a zero mean.
Modeling assumptions
The parametric GLM relies on a series of assumptions, most critically the normality of distribution of the residuals of the tests, and the homoscedasticity of residuals, i.e. equal variance across subjects. For each connection and each contrast, the normality of distribution for the residuals of the regression was tested with a composite test^{10}^{10}10as implemented in the swtest.m procedure http://www.mathworks.com/matlabcentral/fileexchange/13964shapirowilkandshapirofrancianormalitytests/content/swtest.m, retrieved on 12/2014.: ShapiroFrancia for platykurtic distributions and ShapiroWilk for leptokurtic distributions (Royston, 1993). A test for homoscedastic residuals was also implemented using the procedure of White (1980), where all variables as well as their twoway interactions (including squared variables) were regressed against the square of the residuals, and a test was performed on the combination of all exploratory variables. A value was generated at each connection, both for the normality and the homoscedasticity tests, for the highest resolution selected by MSTEPS, and multiple comparisons across all connections were corrected with the FDRBH procedure (). In addition to the MOTOR, BLIND and SCHIZO datasets, the Cambridge dataset previously used in the simulations was also employed here. The GLM only included an intercept and an arbitrary group difference, for different sample sizes (), in order to investigate how the testing of assumptions behaved for different sample sizes.
5.2 Results
Modeling assumptions
No tests of departure from normality passed correction for multiple comparison using FDRBH at . However, some trends towards significance were observed in all datasets, in particular with a large sample size. For a threshold of , uncorrected for multiple comparisons, the normality hypothesis was rejected for , and of connexions, for the MOTOR, BLIND and SCHIZO experiments, respectively (Supplementary Figure S21).
No test for heteroscedasticity survived a correction for multiple comparisons with FDRBH at
, and there was no apparent trend. At , the homoscedasticity hypothesis was rejected for , and in the MOTOR, BLIND and SCHIZO experiments, respectively. The trends observed for heteroscedasticity testing were similar to those observed in the Cambridge dataset, using random subgroups that are thus in fact homoscedastic (Supplementary Figure S22).Multiscale discoveries
The MSTEPS procedure selected 6 scales for the MOTOR and BLIND samples, and 7 on the SCHIZO sample, ranging from 7 to 300+, see Table S2 for multilevel scale parameters. The MSPC detection generated maximal percentages of discoveries at low scales for the three datasets (Figure 8). Using a grid from 10 to 300 scales with a step of 10, peak discoveries were detected at scale 10 for the SCHIZO and MOTOR contrasts, and scale 20 for the BLIND contrast. Peak percentages of discoveries were 30%, 2.3% and 15%, for the SCHIZO, BLIND and MOTOR contrasts, respectively. The omnibus test was significant () for all three contrasts, whether using a large grid of 30 scales or the 67 scales identified with MSTEPS. The overall trend was that the rate of discoveries decreased as the number of parcels increased, with the largest discovery rate found below scale 50, followed by a notable plateau from 50 to 100 clusters. These relationships between discovery rate and scales shared similarities with the sensitivity plots observed on simulations (Figures 3, 7). While the absolute percentages of discoveries were quite different for the three datasets, the relative changes in discovery rate as a function of scale were thus rather similar.
Spatial distribution of significant discoveries
Discovery percentage maps revealed which parcels were associated with the largest proportion of significant connections for any given parcel, see Figure 9 for a representation of the BASC multiscale parcels and associated discovery percentage maps for the SCHIZO analysis. For each contrast, results were shown for all 67 scales extracted with the MSTEPS procedure. The areas showing maximal percentage of discoveries were quite different for the three datasets (Figure 10). Widespread effects were observed for the SCHIZO dataset at both cortical and subcortical levels (see also Figure 9, for a volumetric representation). Parcels with the highest discovery rate were found in the temporal cortex, the medial temporal lobe, the anterior cingulate cortex and the basal ganglia. The BLIND contrast revealed more localized effects, in the occipital cortex and to a lesser extent in the temporal and frontal cortices. Finally, the MOTOR contrast identified significant effects within an extended visuomotor corticosubcortical parcel.
Despite the overall higher rate of discoveries for low scales, below 50, the spatial distributions of discoveries were fairly consistent across scales. It was also interesting to note that low scales did not always provide the highest discovery rate for a given brain parcel. For instance, the proportion of connections showing a significant effect in the basal ganglia for the SCHIZO contrast became maximal at scale 55, once the thalami were isolated as a single parcel (Figure 9). As another example, the dorsolateral prefrontal cortex only showed a significant effect in the BLIND contrast for functional brain parcellations above scale 40 (Figure 10).
Seedbased maps of statistics
The maps of discovery rate did not characterize which specific connections were identified as significant for each parcel, nor the direction of the effect (i.e. an increase vs a decrease in connectivity). We illustrated how these questions can be explored using the SCHIZO dataset, as it showed widespread changes in functional connectivity. The percentage of discovery maps were used to select a number of seed parcels of interest, i.e. showing maximal effects (Figure 11). Parcels selected at the highest scales corresponded to the hippocampus, anterior cingulate cortex and thalamus. Corresponding parcels for lower scales were selected based on their maximal overlap with the parcels chosen at the highest scales. For instance, the most distributed parcel encompassing the hippocampus at scale 7 covered the whole medial temporal lobe, the temporal pole and ventral prefrontal cortex. For each brain parcel, a FDRcorrected test map associated with the contrast of interest was generated. These test maps revealed that the alterations in functional coupling in schizophrenia essentially took the form of a decrease in connectivity for the hippocampus and associated regions as well as for the anterior cingulate cortex and its associated parcel. By contrast, the thalamus and basal ganglia showed an increase in functional connectivity with the occipital cortex, beyond decreased connectivity within the basal ganglia.
Impact of scale on statistical maps
While visual exploration of the test maps in the SCHIZO dataset revealed similarities of the effects across scales, it also highlighted some specificities. High scales indeed proved in some cases to be additionnally informative compared to low scales, despite decreased overall detection rate. For instance, the parcel centered on the hippocampus was seen to be more positively connected with the thalamus and caudate nucleus in schizophrenia only when the ventral prefrontal cortex was not part of the parcel (Figure 11). As another exemple, the thalamus showed increased connectivity with a large sensorimotor cortical parcel at scale 25 and above only, when it was not part of the same parcel as the putamen. Furthemore, the thalamus only showed a significant decrease in connectivity with the dorsolateral prefrontal cortex at scale 55 and above, when isolated as a single parcel rather than smoothed out inside the basal ganglia.
We more formally tested the level of correspondence of the effects across scales for the three seeds listed above in the SCHIZO dataset, as well as for seeds matching our a priori in the BLIND and MOTOR datasets, respectively located in the right primary visual cortex and the left primary motor cortex. Pairwise comparisons between spatial effect maps across scales mostly revealed positive correlation values in all three datasets and for all seeds (Figure 12
). Correlations for the three seeds investigated in the SCHIZO contrast were as follows: hippocampus (mean, standard deviation, minimum, maximum = 0.86, 0.06, 0.76, 0.97), anterior cingulate (0.41, 0.39, 0.20, 0.93), and thalamus (0.78, 0.09, 0.64, 0.94). High correlations were always observed when comparing high scales (above scale 55) between them. Comparisons between low and high scales remained associated with high correlations values for two out of the three seeds, namely the hippocampus and thalamus. However, results for the anterior cingulate demonstrated that a low correspondence between low and high scales was possible. Results for the seeds in the BLIND (0.71, 0.15, 0.46, 0.93) and MOTOR (0.80, 0.08, 0.70, 0.95) datasets further supported the general conclusions drawn from the SCHIZO dataset.
6 Discussion
6.1 Falsediscovery rate within and across scales
This work investigated empirically how the scale impacted the results of a GLM analysis on connectomes. We first assessed the validity of a GLM analysis at a single scale. On the three real datasets, there was no sign of substantial departure from the assumptions of a basic parametric GLM. Importantly, the control of the effective FDR within scale on simulations was exact or conservative, including the Cambridge “global null” experiment where real random subgroups were compared. Our empirical evaluation thus supports the validity of parametric GLM analysis of connectomes at a single scale.
We also investigated the specificity of MSPC analysis across scales. In most simulation experiments, the control of the FDR within each scale also implied a tight control of the FDR across scales. Under the global null hypothesis, the omnibus test precisely controlled the FDR (or, equivalently, the FWE). The omnibus test had little to no impact in scenarios with many true nonnull hypothesis, where the FDR across scales was conservatively controlled. The only simulations where a deviation of the FDR accross scales from nominal levels was observed (up to 0.09 for a nominal level at 0.05) typically resulted in very few findings, especially at high scales (above 50). Considering that the FDR was always appropriately controlled within scale, our overall conclusion was that, for scales matching those used in our simulation experiments, the inflation of FDR across scales was not a concern as long as the omnibus test was rejected at an appopriate FWE level.
6.2 Sensitivity across scales
We found some convergent evidence regarding the sensitivity of MSPC in our simulation and real data experiments. First, on simulations with independent tests, the sensitivity decreased sharply with scale, to reach a plateau around scale 50. This behaviour appeared to be a consequence of multiple testing in the FDRBH procedure, as the proportion of true nonnull hypothesis and the effect size were strictly maintained at a constant level across scales. In our simulations of dependent tests, the scale directly impacted the effect size, as some test clusters matched better the underlying simulated signals than others. In this setting, marked increase (up to 0.3) in sensitivity were observed at certain scales, sometimes quite high (200+). In summary, scaledependent gains in sensitivity were observed and appeared to be simultaneously driven by the match between the test clusters and the underlying signal as well as a mechanic effect of multiple comparisons in the FDRBH procedure.
On real data, we investigated the discovery rate, i.e. the proportion of significant connections in a connectome. The same behaviour was observed on the three datasets: highest discovery rate at low scale (below 50 parcels), a plateau from 50 to 100 and a low asymptote above 100200 parcels. This profile resembled most closely the sensitivity results from simulations with independent tests, and may reflect some intrinsic property of the FDRBH procedure. In particular, scales larger than 100, routinely used with the AAL template (TzourioMazoyer et al., 2002), was systematically associated with a much smaller discovery rate than lower scales (below 50).
In our three experiments, we did not identify strong discrepancies between statistical maps generated at different scales, consistent with the observations of Shehzad et al. (2014). This supports the idea that, in many cases a GLM analysis at a single scale may be enough to summarize the results of a GLM on connectomes. We observed that effects in some brain parcels were better captured at particular scales. For example, the difference in thalamic connectivity in the SCHIZO analysis was better seen at scale 55 and above, where the thalami were clustered in one parcel rather than agregated with the putamen and caudate nuclei.
6.3 Selection of an “optimal” scale
The control of the FDR across scales, which is central to the proposed MSPC approach, is only relevant if the results of a GLMconnectome analysis are investigated and reported at mutiple scales. It would not be advisable to run a MSPC and then simply report results at the scale with the highest discovery rate. There would be no guarantee that the FDR would be controlled for a scale that was selected precisely because of a high associated discovery rate, a classic case of circular analysis. If a single scale were considered in an analysis, the selection of this scale should be either set a priori or selected through MSPC on an independent dataset. However, as we noted previously, our findings on three experiments follow an overall trend that suggests that a single scale below 50 would likely be a better default choice than a template at scale 100+, such as AAL (TzourioMazoyer et al., 2002), in terms of discovery rate.
6.4 Findings on real datasets
The effects found on the real datasets were consistent with the existing literature. First, schizophrenia has been defined as a dysconnectivity syndrome, with aberrant functional interactions between brain regions being a core feature of this mental illness (for reviews, see Calhoun et al., 2009; PetterssonYeo et al., 2011; Fornito et al., 2012). As shown here for two out of three parcels, and observed for other unreported brain parcels, widespread decrease in connectivity was observed in patients, with the addition of more localized increases in connectivity. The prominence of decreases in connectivity in the temporal lobe, hippocampus and anterior cingulate cortex, amongst other regions, is well supported by previous studies (Williamson and Allman, 2012). Similarly, increased connectivity between the thalamus and sensorimotor cortex but decreased connectivity with striatal and prefrontal regions has been reported before (Anticevic et al., 2014). Second, restingstate fMRI studies have previously shown that congenital blindness is associated with a reorganization of the interactions between the occiptial cortex and other parts of the brain, in particular the auditory and premotor cortices (Liu et al., 2007; Qin et al., 2013, 2014), consistent with our findings. Finally, our results are in agreement with the observation that brain activity at rest is modulated by previous intensive motor practice (Albert et al., 2009; Vahdat et al., 2011; Sami et al., 2014). Even in the absence of a definite ground truth on these real life applications, our findings thus had good face validity, and suggested that MSPC could be successfully applied to a variety of clinical or experimental conditions.
6.5 Comparison with other methods
In this work, we only controlled for multiple comparisons within each scale using the FDRBH method, yet several alternative methods have recently been proposed. Shehzad et al. (2014) developped a multivariate test that applies on a regiontobrain connectivity map, called multivariate distance matrix regression (MDMR). This procedure would be used to screen for promising seedbased connectivity maps worthy to explore in a subsequent, independent analysis. The MDMR approach effectively performs one test per parcel, instead of one test per connection, and thus greatly alleviates the multiple comparison problem. It does not however provide a control of statistical risk at the level of single connections. Zalesky et al. (2010a) proposed to use uncorrected threshold on the individual values, but then to identify to which extent the connections that survive the test are interconnected. This extent measure is compared against what could be observed under a null hypothesis of no association, implemented through permutation testing. This approch, called NetworkBased Statistics (NBS), is the connectome equivalent to the “clusterlevel statistics” used in SPMs. The NBS only offers a loose control of falsepositive rate at the level of a single connection, but can be used to reject the possibility that a group of significant findings could be observed by chance in the FWE sense. Both the MDMR and NBS do not offer control of the FDR within scale, and thus cannot be used with the MSPC approach based on the argument that the FDR control extends across scales. Some of these methods have been compared in a variety of scenarios, e.g. FDRBH and NBS (Zalesky et al., 2012), but at a fixed scale. Based on the observation of important variations in sensitivity across scales for the FDRBH, an important avenue of future work would be to investigate how MDMR, NBS and FDRBH compare at different scales. Of note is the recent work of Meskaldji et al. (2014), which combines two scales to peform connectomewide testing: the low scale is used to screen for promising groups of intra or interparcel connections, and the tests at high scale are reweighted based on that screening. The weights can be adjusted to ensure control of the FDR across the connectome. This alternative approach to multiscale testing is limited to two scales, but may provide additional statistical power compared to MSPC as the two scales are analyzed in coombination rather than separately.
6.6 Beyond scale selection: choice of the brain parcellation
Although the impact of the number of parcels on CWAS sensitivity has been extensively investigated in this work, we only briefly examined how the choice of parcels, and not just their number, could impact sensitivity. We could, for example, have used random parcellations, like (Zalesky et al., 2010b), a parcellation based on anatomical landmarks such as the AAL atlas (TzourioMazoyer et al., 2002), or a functional parcellation with spatial connexity constraints (Craddock et al., 2012). From our results on simulations, it seems clear that dramatic differences in statistical power can be achieved at a given spatial scale, if a set of parcels is best adapted to the spatial distribution of an effect. The work of Craddock et al. (2012) suggested that functional brain parcels are more homogeneous than anatomical parcels. We believe that important improvement in sensitivity could be gained from the optimization of the parcellation scheme, rather than scale, and this represents an important avenue for future research. Following an idea initially explored in (Thirion et al., 2006), it may even be possible to relax the constraint of identical parcels across subjects, by matching different individualspecific parcels and use this correspondence to run grouplevel CWAS analysis.
7 Conclusion
Our overall conclusion is that the MSPC method is statistically valid (specific) and has the potential to identify biologically plausible associations in a variety of experimental conditions. An analysis at a single scale with less than 50 parcels appears as a reasonable default option, likely to have a sensitivity superior to the common approach using 100+ brain parcels in many settings. Multiscale analysis still have the potential to identify specific effects in small parcels. The MSPC method is available in the NIAK package^{11}^{11}11nitrc.org/projects/niak (Bellec et al., 2011), a free and opensource software that runs in matlab and GNU octave, and we also released a set of multiscale functional brain parcellations ^{12}^{12}12http://figshare.com/articles/Group_multiscale_functional_template_generated_with_BASC_on_the_Cambridge_sample/1285615.
8 Acknowledgments
Parts of this work were presented at the 2012 and 2013 annual meetings of the organization for human brain mapping, as well as the International Conference on RestingState Connectivity 2012 (Magdeburg). The authors are grateful to the members of the 1000 functional connectome consortium for publicly releasing the “Cambridge” and “COBRE” data samples, as well as Dr Shehzad for valuable feedback. The computational resources used to perform the data analysis were provided by Compute Canada^{13}^{13}13https://computecanada.org/ and CLUMEQ^{14}^{14}14http://www.clumeq.mcgill.ca/, which is funded in part by NSERC (MRS), FQRNT, and McGill University. This project was funded by NSERC grant number RN000028, a salary award from “Fonds de recherche du Québec – Santé” to PB as well as a salary award by the Canadian Institute of Health Research to PO.
References
References

Abou Elseoud et al. (2011)
Abou Elseoud, A., Littow, H., Remes, J., Starck, T., Nikkinen, J., Nissilä,
J., Timonen, M., Tervonen, O., Kiviniemi, V., 2011. GroupICA Model Order
Highlights Patterns of Functional Brain Connectivity. Frontiers in systems
neuroscience 5.
URL http://dx.doi.org/10.3389/fnsys.2011.00037  AdDab’bagh et al. (2006) AdDab’bagh, Y., Einarson, D., Lyttelton, O., Muehlboeck, J. S., Mok, K., Ivanov, O., Vincent, R. D., Lepage, C., Lerch, J., Fombonne, E., Evans, A. C., 2006. The CIVET ImageProcessing Environment: A Fully Automated Comprehensive Pipeline for Anatomical Neuroimaging Research. In: Corbetta, M. (Ed.), Proceedings of the 12th Annual Meeting of the Human Brain Mapping Organization. Neuroimage, Florence, Italy.

Albert et al. (2009)
Albert, N. B., Robertson, E. M., Miall, R. C., Jun. 2009. The resting human
brain and motor learning. Current biology : CB 19 (12), 1023–1027.
URL http://dx.doi.org/10.1016/j.cub.2009.04.028 
Albouy et al. (2014)
Albouy, G., Fogel, S., King, B. R., Laventure, S., Benali, H., Karni, A.,
Carrier, J., Robertson, E. M., Doyon, J., Dec. 2014. Maintaining vs.
enhancing motor sequence memories: Respective roles of striatal and
hippocampal systems. NeuroImage.
URL http://view.ncbi.nlm.nih.gov/pubmed/25542533 
Anderson (2002)
Anderson, C. W., Mar. 2002. Quantitative Methods for Current Environmental
Issues. Springer.
URL http://www.worldcat.org/isbn/1852332948  Anderson (1958) Anderson, T. W., 1958. An introduction to multivariate statistical analysis. Wiley Publications in Statistics. John Wiley and Sons, New York.

Anticevic et al. (2014)
Anticevic, A., Cole, M. W., Repovs, G., Murray, J. D., Brumbaugh, M. S.,
Winkler, A. M., Savic, A., Krystal, J. H., Pearlson, G. D., Glahn, D. C.,
Dec. 2014. Characterizing thalamocortical disturbances in schizophrenia and
bipolar illness. Cerebral cortex (New York, N.Y. : 1991) 24 (12),
3116–3130.
URL http://dx.doi.org/10.1093/cercor/bht165 
Barkhof et al. (2014)
Barkhof, F., Haller, S., Rombouts, S. A., Jul. 2014. Restingstate functional
MR imaging: a new window to the brain. Radiology 272 (1), 29–49.
URL http://view.ncbi.nlm.nih.gov/pubmed/24956047 
Bellec (2013)
Bellec, P., Jun. 2013. Mining the Hierarchy of RestingState Brain Networks: Selection of Representative Clusters in a Multiscale Structure. In: Pattern Recognition in Neuroimaging (PRNI), 2013 International Workshop on. pp. 54–57.
 Bellec et al. (2011) Bellec, P., Carbonell, F. M., Perlbarg, V., Lepage, C., Lyttelton, O., Fonov, V., Janke, A., Tohka, J., Evans, A. C., 2011. A neuroimaging analysis kit for Matlab and Octave. In: Proceedings of the 17th International Conference on Functional Mapping of the Human Brain. pp. In Press+.

Bellec et al. (2012)
Bellec, P., LavoieCourchesne, S., Dickinson, P., Lerch, J. P., Zijdenbos,
A. P., Evans, A. C., 2012. The pipeline system for Octave and Matlab (PSOM):
a lightweight scripting framework and execution engine for scientific
workflows. Frontiers in neuroinformatics 6.
URL http://dx.doi.org/10.3389/fninf.2012.00007 
Bellec et al. (2006)
Bellec, P., Perlbarg, V., Jbabdi, S., PélégriniIssac, M., Anton, J.,
Doyon, J., Benali, H., Feb. 2006. Identification of largescale networks in
the brain using fMRI. NeuroImage 29 (4), 1231–1243.
URL http://dx.doi.org/10.1016/j.neuroimage.2005.08.044 
Bellec et al. (2010)
Bellec, P., RosaNeto, P., Lyttelton, O. C., Benali, H., Evans, A. C., Jul.
2010. Multilevel bootstrap analysis of stable clusters in restingstate
fMRI. NeuroImage 51 (3), 1126–1139.
URL http://dx.doi.org/10.1016/j.neuroimage.2010.02.082  Benjamini and Hochberg (1995) Benjamini, Y., Hochberg, Y., 1995. Controlling the falsediscovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57, 289–300.

Benjamini and Yekutieli (2001)
Benjamini, Y., Yekutieli, D., 2001. The Control of the False Discovery Rate in
Multiple Testing under Dependency. The Annals of Statistics 29 (4),
1165–1188.
URL http://dx.doi.org/10.2307/2674075 
Biswal et al. (2010)
Biswal, B. B., Mennes, M., Zuo, X.N. N., Gohel, S., Kelly, C., Smith, S. M.,
Beckmann, C. F., Adelstein, J. S., Buckner, R. L., Colcombe, S., Dogonowski,
A.M. M., Ernst, M., Fair, D., Hampson, M., Hoptman, M. J., Hyde, J. S.,
Kiviniemi, V. J., Kötter, R., Li, S.J. J., Lin, C.P. P., Lowe, M. J.,
Mackay, C., Madden, D. J., Madsen, K. H., Margulies, D. S., Mayberg, H. S.,
McMahon, K., Monk, C. S., Mostofsky, S. H., Nagel, B. J., Pekar, J. J.,
Peltier, S. J., Petersen, S. E., Riedl, V., Rombouts, S. A., Rypma, B.,
Schlaggar, B. L., Schmidt, S., Seidler, R. D., Siegle, G. J., Sorg, C., Teng,
G.J. J., Veijola, J., Villringer, A., Walter, M., Wang, L., Weng, X.C. C.,
WhitfieldGabrieli, S., Williamson, P., Windischberger, C., Zang, Y.F. F.,
Zhang, H.Y. Y., Castellanos, F. X., Milham, M. P., Mar. 2010. Toward
discovery science of human brain function. Proceedings of the National
Academy of Sciences of the United States of America 107 (10), 4734–4739.
URL http://dx.doi.org/10.1073/pnas.0911855107 
Blumensath et al. (2013)
Blumensath, T., Jbabdi, S., Glasser, M. F., Van Essen, D. C., Ugurbil, K.,
Behrens, T. E. J., Smith, S. M., Aug. 2013. Spatially constrained
hierarchical parcellation of the brain with restingstate fMRI. NeuroImage
76, 313–324.
URL http://dx.doi.org/10.1016/j.neuroimage.2013.03.024 
Calhoun et al. (2009)
Calhoun, V. D., Eichele, T., Pearlson, G., 2009. Functional brain networks in
schizophrenia: a review. Frontiers in human neuroscience 3.
URL http://dx.doi.org/10.3389/neuro.09.017.2009 
Castellanos et al. (2013)
Castellanos, F. X., Di Martino, A., Craddock, R. C., Mehta, A. D., Milham,
M. P., Oct. 2013. Clinical applications of the functional connectome.
NeuroImage 80, 527–540.
URL http://dx.doi.org/10.1016/j.neuroimage.2013.04.083 
Collignon et al. (2011)
Collignon, O., Vandewalle, G., Voss, P., Albouy, G., Charbonneau, G., Lassonde,
M., Lepore, F., Mar. 2011. Functional specialization for auditoryspatial
processing in the occipital cortex of congenitally blind humans. Proceedings
of the National Academy of Sciences of the United States of America 108 (11),
4435–4440.
URL http://dx.doi.org/10.1073/pnas.1013928108 
Collins and Evans (1997)
Collins, D. L., Evans, A. C., 1997. Animal: validation and applications of nonlinear registrationbased segmentation. International Journal of Pattern Recognition and Artificial Intelligence 11, 1271–1294.

Craddock et al. (2012)
Craddock, R. C., James, G. A., Holtzheimer, P. E., Hu, X. P., Mayberg, H. S.,
Aug. 2012. A whole brain fMRI atlas generated via spatially constrained
spectral clustering. Human brain mapping 33 (8), 1914–1928.
URL http://dx.doi.org/10.1002/hbm.21333 
Damoiseaux et al. (2006)
Damoiseaux, J. S., Rombouts, S. A. R. B., Barkhof, F., Scheltens, P., Stam,
C. J., Smith, S. M., Beckmann, C. F., Sep. 2006. Consistent restingstate
networks across healthy subjects. Proceedings of the National Academy of
Sciences 103 (37), 13848–13853.
URL http://dx.doi.org/10.1073/pnas.0601417103 
De Luca et al. (2006)
De Luca, M., Beckmann, C. F., De Stefano, N., Matthews, P. M., Smith, S. M.,
Feb. 2006. fMRI resting state networks define distinct modes of
longdistance interactions in the human brain. NeuroImage 29 (4),
1359–1367.
URL http://dx.doi.org/10.1016/j.neuroimage.2005.08.035 
Efron (2008)
Efron, B., Mar. 2008. Simultaneous inference: When should hypothesis testing
problems be combined? Annals of Applied Statistics 2 (1), 197–223.
URL http://dx.doi.org/10.1214/07aoas141 
Fonov et al. (2011)
Fonov, V., Evans, A. C., Botteron, K., Almli, C. R., McKinstry, R. C., Collins,
D. L., Jan. 2011. Unbiased average ageappropriate atlases for pediatric
studies. NeuroImage 54 (1), 313–327.
URL http://dx.doi.org/10.1016/j.neuroimage.2010.07.033 
Fornito et al. (2012)
Fornito, A., Zalesky, A., Pantelis, C., Bullmore, E. T., Oct. 2012.
Schizophrenia, neuroimaging and connectomics. NeuroImage 62 (4),
2296–2314.
URL http://view.ncbi.nlm.nih.gov/pubmed/22387165 
Fox and Greicius (2010)
Fox, M. D., Greicius, M., 2010. Clinical applications of resting state
functional connectivity. Frontiers in systems neuroscience 4.
URL http://dx.doi.org/10.3389/fnsys.2010.00019 
Giove et al. (2009)
Giove, F., Gili, T., Iacovella, V., Macaluso, E., Maraviglia, B., Oct. 2009.
Imagesbased suppression of unwanted global signals in restingstate
functional connectivity studies. Magnetic resonance imaging 27 (8),
1058–1064.
URL http://dx.doi.org/10.1016/j.mri.2009.06.004 
Gordon et al. (2014)
Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M.,
Petersen, S. E., Oct. 2014. Generation and Evaluation of a Cortical Area
Parcellation from RestingState Correlations. Cerebral cortex (New York,
N.Y. : 1991), bhu239+.
URL http://dx.doi.org/10.1093/cercor/bhu239 
Jafri et al. (2008)
Jafri, M. J., Pearlson, G. D., Stevens, M., Calhoun, V. D., Feb. 2008. A
method for functional network connectivity among spatially independent
restingstate components in schizophrenia. NeuroImage 39 (4), 1666–1681.
URL http://dx.doi.org/10.1016/j.neuroimage.2007.11.001 
Liu et al. (2009)
Liu, H., Stufflebeam, S. M., Sepulcre, J., Hedden, T., Buckner, R. L., Dec.
2009. Evidence from intrinsic activity that asymmetry of the human brain is
controlled by multiple factors. Proceedings of the National Academy of
Sciences 106 (48), 20499–20503.
URL http://dx.doi.org/10.1073/pnas.0908073106 
Liu et al. (2007)
Liu, Y., Yu, C., Liang, M., Li, J., Tian, L., Zhou, Y., Qin, W., Li, K., Jiang,
T., Aug. 2007. Whole brain functional connectivity in the early blind.
Brain 130 (8), 2085–2096.
URL http://dx.doi.org/10.1093/brain/awm121 
Lu et al. (2003)
Lu, Y., Jiang, T., Zang, Y., Sep. 2003. Region growing method for the analysis
of functional MRI data. NeuroImage 20 (1), 455–465.
URL http://dx.doi.org/10.1016/s10538119(03)003525 
Marrelec et al. (2008)
Marrelec, G., Bellec, P., Krainik, A., Duffau, H., PélégriniIssac, M.,
Lehéricy, S., Benali, H., Doyon, J., Feb. 2008. Regions, systems, and
the brain: Hierarchical measures of functional integration in fMRI. Medical
image analysis.
URL http://dx.doi.org/10.1016/j.media.2008.02.002 
Marrelec et al. (2006)
Marrelec, G., Krainik, A., Duffau, H., PélégriniIssac, M.,
Lehéricy, S., Doyon, J., Benali, H., 2006. Partial correlation for
functional brain interactivity investigation in functional MRI. NeuroImage
32 (1), 228–237.
URL http://dx.doi.org/10.1016/j.neuroimage.2005.12.057 
Meskaldji et al. (2013)
Meskaldji, D. E., FischiGomez, E., Griffa, A., Hagmann, P., Morgenthaler, S.,
Thiran, J.P., Oct. 2013. Comparing connectomes across subjects and
populations at different scales. NeuroImage 80, 416–425.
URL http://dx.doi.org/10.1016/j.neuroimage.2013.04.084 
Meskaldji et al. (2014)
Meskaldji, D.E., Vasung, L., Romascano, D., Thiran, J.P., Hagmann, P.,
Morgenthaler, S., Van De Ville, D., Dec. 2014. Improved statistical
evaluation of group differences in connectomes by screening–filtering
strategy with application to study maturation of brain connections between
childhood and adolescence. NeuroImage.
URL http://dx.doi.org/10.1016/j.neuroimage.2014.11.059 
PetterssonYeo et al. (2011)
PetterssonYeo, W., Allen, P., Benetti, S., McGuire, P., Mechelli, A., Apr.
2011. Dysconnectivity in schizophrenia: where are we now? Neuroscience and
biobehavioral reviews 35 (5), 1110–1124.
URL http://view.ncbi.nlm.nih.gov/pubmed/21115039 
Power et al. (2012)
Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L., Petersen, S. E.,
Feb. 2012. Spurious but systematic correlations in functional connectivity
MRI networks arise from subject motion. NeuroImage 59 (3), 2142–2154.
URL http://dx.doi.org/10.1016/j.neuroimage.2011.10.018 
Qin et al. (2013)
Qin, W., Liu, Y., Jiang, T., Yu, C., 2013. The development of visual areas
depends differently on visual experience. PloS one 8 (1).
URL http://view.ncbi.nlm.nih.gov/pubmed/23308283 
Qin et al. (2014)
Qin, W., Xuan, Y., Liu, Y., Jiang, T., Yu, C., Mar. 2014. Functional
Connectivity Density in Congenitally and Late Blind Subjects. Cerebral
cortex (New York, N.Y. : 1991).
URL http://view.ncbi.nlm.nih.gov/pubmed/24642421  Royston (1993) Royston, P., 1993. A Toolkit for Testing for NonNormality in Complete and Censored Samples. Journal of the Royal Statistical Society. Series D (The Statistician) 42 (1), 37–43.

Sami et al. (2014)
Sami, S., Robertson, E. M., Miall, R. C., Mar. 2014. The time course of
taskspecific memory consolidation effects in resting state networks. The
Journal of neuroscience : the official journal of the Society for
Neuroscience 34 (11), 3982–3992.
URL http://dx.doi.org/10.1523/jneurosci.434113.2014 
Shehzad et al. (2014)
Shehzad, Z., Kelly, C., Reiss, P. T., Cameron Craddock, R., Emerson, J. W.,
McMahon, K., Copland, D. A., Xavier Castellanos, F., Milham, M. P., Jun.
2014. A multivariate distancebased analytic framework for connectomewide
association studies. NeuroImage 93, 74–94.
URL http://dx.doi.org/10.1016/j.neuroimage.2014.02.024 
Smith et al. (2011)
Smith, S. M., Miller, K. L., SalimiKhorshidi, G., Webster, M., Beckmann,
C. F., Nichols, T. E., Ramsey, J. D., Woolrich, M. W., Jan. 2011. Network
modelling methods for FMRI. NeuroImage 54 (2), 875–891.
URL http://dx.doi.org/10.1016/j.neuroimage.2010.08.063 
Thirion et al. (2006)
Thirion, B., Flandin, G., Pinel, P., Roche, A., Ciuciu, P., Poline, J.B. B.,
Aug. 2006. Dealing with the shortcomings of spatial normalization:
multisubject parcellation of fMRI datasets. Human brain mapping 27 (8),
678–693.
URL http://dx.doi.org/10.1002/hbm.20210 
Thirion et al. (2014)
Thirion, B., Varoquaux, G., Dohmatob, E., Poline, J.B. B., 2014. Which fMRI
clustering gives good brain parcellations? Frontiers in neuroscience 8.
URL http://view.ncbi.nlm.nih.gov/pubmed/25071425 
TzourioMazoyer et al. (2002)
TzourioMazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O.,
Delcroix, N., Mazoyer, B., Joliot, M., Jan. 2002. Automated anatomical
labeling of activations in SPM using a macroscopic anatomical parcellation of
the MNI MRI singlesubject brain. NeuroImage 15 (1), 273–289.
URL http://dx.doi.org/10.1006/nimg.2001.0978 
Vahdat et al. (2011)
Vahdat, S., Darainy, M., Milner, T. E., Ostry, D. J., Nov. 2011. Functionally
specific changes in restingstate sensorimotor networks after motor
learning. The Journal of neuroscience : the official journal of the Society
for Neuroscience 31 (47), 16907–16915.
URL http://view.ncbi.nlm.nih.gov/pubmed/22114261 
Wang et al. (2007)
Wang, K., Liang, M., Wang, L., Tian, L., Zhang, X., Li, K., Jiang, T., Oct.
2007. Altered functional connectivity in early Alzheimer’s disease: A
restingstate fMRI study. Hum. Brain Mapp. 28 (10), 967–978.
URL http://dx.doi.org/10.1002/hbm.20324  White (1980) White, H., 1980. A heteroskedasticityconsistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica: Journal of the Econometric Society, 817–838.

Williamson and Allman (2012)
Williamson, P. C., Allman, J. M., 2012. A framework for interpreting
functional networks in schizophrenia. Frontiers in human neuroscience 6.
URL http://dx.doi.org/10.3389/fnhum.2012.00184 
Worsley et al. (1998)
Worsley, K. J., Cao, J., Paus, T., Petrides, M., Evans, A. C., 1998.
Applications of random field theory to functional connectivity. Hum Brain
Mapp 6 (56), 364–367.
URL http://view.ncbi.nlm.nih.gov/pubmed/9788073 
Worsley and Friston (1995)
Worsley, K. J., Friston, K. J., Sep. 1995. Analysis of fMRI TimeSeries
Revisited—Again. NeuroImage 2 (3), 173–181.
URL http://dx.doi.org/10.1006/nimg.1995.1023 
Zalesky et al. (2012)
Zalesky, A., Cocchi, L., Fornito, A., Murray, M. M., Bullmore, E., Apr. 2012.
Connectivity differences in brain networks. NeuroImage 60 (2), 1055–1062.
URL http://dx.doi.org/10.1016/j.neuroimage.2012.01.068 
Zalesky et al. (2010a)
Zalesky, A., Fornito, A., Bullmore, E. T., Dec. 2010a.
Networkbased statistic: Identifying differences in brain networks.
NeuroImage 53 (4), 1197–1207.
URL http://dx.doi.org/10.1016/j.neuroimage.2010.06.041 
Zalesky et al. (2010b)
Zalesky, A., Fornito, A., Harding, I. H., Cocchi, L., Yücel, M., Pantelis,
C., Bullmore, E. T., Apr. 2010b. Wholebrain anatomical
networks: Does the choice of nodes matter? NeuroImage 50 (3), 970–983.
URL http://dx.doi.org/10.1016/j.neuroimage.2009.12.027
Appendix A Functional connectomes
Let be a partition of the brain, i.e. parcels such that any voxel in the grey matter belongs to one and only one a parcel. The number of parcels is the (spatial) scale of the partition. For an fMRI dataset with time samples, the average time series (vector of length ) is generated for each parcel . These average time series are then used to generate a matrix of functional connectivity :
(4) 
where corr is Pearson’s linear correlation coefficient and is the Fisher’s transform. The Fisher’s transform is used to stabilize the variance of the estimated correlation coefficient (Anderson, 1958). This measure of betweenparcel connectivity was used for , but we also included a measure of withincluster average functional connectivity, which uses the voxellevel time series :
(5) 
Appendix B Ordinary least square GLM estimation
The independent and homoscedastic assumption means that the coefficients of are independent from each other and that for each connection , the coefficients are identically distributed with a zero mean and variance . In this context, the maximum likelihood (ordinary leastsquares) estimator of is:
(6) 
and the estimation of the variance of the noise is:
(7) 
For each covariate , the vector is a vectorized connectome of statistical parameters, quantifying the modulation of each connection by the covariate . This is a direct generalization of the concept of a SPM that has been widely used in taskbased fMRI analysis. Each column of the statistical parametric connectome is a actually a SPM at the parcel level (instead of the more standard voxel level), testing the modulation of the functional connectivity of a given seed region with the rest of the brain by the covariate of interest. It is possible to test the signficance of each element of the statistical parametric connectome against the null hypothesis of no association (i.e. ), using a test:
(8) 
where is a contrast (column) vector, with equals 1 for and 0 otherwise. Under , the quantity follows a Student’s distribution with degrees of freedom. By comparing
with the cumulative distribution function
of the Student’s distribution, it is possible to derive the bilateral probability of observing under :(9) 
Appendix C BenjaminiHochberg and group falsediscovery rate
For a given method of selection of significant discoveries, let be the number of false positive and the number of true positive. The FDR is the mathematical expectation of the ratio between the number of false discoveries and the total number of discoveries (with the usual convention that ). The classic BH procedure was used to control the FDR. Let’s first assume that the values have been sorted in ascending order, such that . The BH procedure is built on an estimate of the falsepositive rate, equal to . The values are screened to find the largest such that . If such an integer does not exist, there are no discoveries. Otherwise, all connections are considered as significant.
Appendix D Generation of statistical parametric connectomes under the global null hypothesis
Let be the (subjects x connections) matrix of individual connectomes at scale . A replication of the connectome matrix under the global null hypothesis is generated by recomposing the linear mixture while excluding the th covariate of interest, tested by the model. Formally, let be the reduced model where the covariate has been removed from the (subjects x covariates) matrix . Let be the ordinary least square estimate of the regression coefficients using the reduced model. Each permutation sample of the dataset is generated as described in (Anderson, 2002):
(10) 
where is a replication of the residuals of the regression of the reduced model, with permuted rows (subjects). The GLM procedure is then implemented with the and the full model to generate a replication of the volume of discoveries at scale under .
Because the same dataset at voxel resolution is used to generate all the connectome datasets , the samples are not independent. In order to respect these dependencies, for any given replication, the same permutation of the subjects is used to generate to all of the . The replication of the total volume of discoveries is then simply the sum of for all . This procedure is repeated times in order to generate replications of the total volume of discoveries under . The MonteCarlo estimation of the probability to observe a greater total volume of discoveries under than the actual total volume of discoveries generated on the original (nonpermuted) dataset is then:
(11) 
where means that the two terms are asymptotically equal as tends toward infinity.
Supplementary Material – Multiscale statistical testing for connectomewide association studies in fMRI
Submitted to Neuroimage.
P. Bellec, Y. Benhajali, F. Carbonell, C. Dansereau, G. Albouy, M. Pelland, C. Craddock, O. Collignon, J. Doyon, E. Stip, P. Orban
Functional Neuroimaging Unit, Centre de Recherche de l’Institut Universitaire de Gériatrie de Montréal
Department of Computer Science and Operations Research, University of Montreal, Montreal, Quebec, Canada
Department of Anthropology, University of Montreal, Montreal, Quebec, Canada
Biospective Incorporated, Montreal, Quebec, Canada
Department of Psychology, University of Montreal, Montreal, Quebec, Canada
Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, United States of America
Center for the Developing Brain, Child Mind Institute, New York, NY, United States of America
Department of Psychiatry, University of Montreal, Montreal, Quebec, Canada
Centre Hospitalier de l’Université de Montréal, Montreal, Quebec, Canada
For all questions regarding the paper, please address correspondence to Pierre Bellec, CRIUGM, 4545 Queen Mary, Montreal, QC, H3W 1W5, Canada. Email: pierre.bellec (at) criugm.qc.ca.
MOTOR  10  20  40  90  170  270  

7  16  36  72  153  297  
7  14  35  72  154  308  
BLIND  10  20  50  90  180  280  
7  18  40  81  162  308  
7  16  40  77  165  313  
SCHIZO  10  20  30  60  120  210  270  
7  14  27  54  108  189  324  
7  16  25  55  114  199  328 
Comments
There are no comments yet.