Log In Sign Up

Robust Spatial Extent Inference with a Semiparametric Bootstrap Joint Testing Procedure

Spatial extent inference (SEI) is widely used across neuroimaging modalities to study brain-phenotype associations that inform our understanding of disease. Recent studies have shown that Gaussian random field (GRF) based tools can have inflated family-wise error rates (FWERs). This has led to fervent discussion as to which preprocessing steps are necessary to control the FWER using GRF-based SEI. The failure of GRF-based methods is due to unrealistic assumptions about the covariance function of the imaging data. The permutation procedure is the most robust SEI tool because it estimates the covariance function from the imaging data. However, the permutation procedure can fail because its assumption of exchangeability is violated in many imaging modalities. Here, we propose the (semi-) parametric bootstrap joint (PBJ; sPBJ) testing procedures that are designed for SEI of multilevel imaging data. The sPBJ procedure uses a robust estimate of the covariance function, which yields consistent estimates of standard errors, even if the covariance model is misspecified. We use our methods to study the association between performance and executive functioning in a working fMRI study. The sPBJ procedure is robust to variance misspecification and maintains nominal FWER in small samples, in contrast to the GRF methods. The sPBJ also has equal or superior power to the PBJ and permutation procedures. We provide an R package to perform inference using the PBJ and sPBJ procedures


page 9

page 14


Multivariate analysis of covariance when standard assumptions are violated

In applied research, it is often sensible to account for one or several ...

A Kernel Independence Test for Random Processes

A new non parametric approach to the problem of testing the independence...

Predictive Inference Is Free with the Jackknife+-after-Bootstrap

Ensemble learning is widely used in applications to make predictions in ...

A Bayesian 'sandwich' for variance estimation and hypothesis testing

Many frequentist methods have large-sample Bayesian analogs, but widely-...

Statistical Inference for Diagnostic Test Accuracy Studies with Multiple Comparisons

Diagnostic accuracy studies assess sensitivity and specificity of a new ...

An algorithm-based multiple detection influence measure for high dimensional regression using expectile

The identification of influential observations is an important part of d...

Robust Mediation Analysis: The R Package robmed

Mediation analysis is one of the most widely used statistical techniques...

1 Introduction

Investigating neuroimage-phenotype associations is important for understanding how stimuli, environment, and diseases affect the brain. Various neuroimaging modalities are used to study brain-behavior associations. For example, imaging scientists can study how brain activation measured with functional magnetic resonance imaging (fMRI) is related to performance on an executive functioning task (Satterthwaite et al., 2013)

. Spatial extent inference (SEI) is a widely used tool across neuroimaging modalities to study image-phenotype associations. SEI views the statistical image as a realization of a stochastic process and thresholds the image at a given value in order to compute the distribution of the spatial extent of contiguous clusters of activation. A p-value is determined for each cluster based on its spatial extent by comparing the cluster extent to the distribution of the maximum cluster size under a predetermined null hypothesis

(Friston et al., 1994; Worsley et al., 1999).

A number of recent studies have shown that the standard tools used for SEI that rely on gaussian random field (GRF) approximations can have highly inflated false positive rates (Eklund et al., 2016; Silver et al., 2011; Greve and Fischl, 2018). These studies have provoked significant discussion about the conditions necessary for the GRF-based tools to produce nominal family-wise error rates (FWERs; Kessler et al., 2017; Mueller et al., 2017; Eklund et al., 2018; Slotnick, 2017). A good deal of this discussion has led to ad hoc preprocessing and thresholding suggestions for particular data types that are based on simulation results (Eklund et al., 2016; Greve and Fischl, 2018; Flandin and Friston, 2016; Slotnick, 2017; Mueller et al., 2017). For example, simulation results suggest using particular values preprocessing parameters such that the GRF based methods yield nominal FWERs (Flandin and Friston, 2016). However, GRF-based methods cannot control the FWER generally due to unrealistic assumptions about the covariance structure of the neuroimaging data. As a result ad hoc preprocessing decisions are unlikely to yield nominal FWERs across a wide range of data sets.

To date, only permutation testing procedures have been shown to robustly control the spatial extent false positive rate in most neuroimaging datasets (Eklund et al., 2016; Winkler et al., 2014). However, the permutation procedure may not control the false positive rate at the nominal level in some cases because it relies on the assumption of exchangeability. Exchangeability includes the assumption of homoskedasticity; namely, that the covariance structures of all the subjects are unaffected by the covariates. This assumption is often violated in neuroimaging datasets because many types of imaging datasets require multilevel models, in which parameters are estimated separately for each subject and subsequently analyzed at the group level. Exchangeability may also be violated because the covariance structure of imaging data is known to be affected by in-scanner motion, age, and diagnostic status (see e.g. Satterthwaite et al., 2012; Power et al., 2010; Woodward and Heckers, 2016).

In the statistics literature several procedures have been developed for hypothesis testing in spatial datasets. Sun et al. (2015) developed false discovery rate (FDR) controlling procedures in a Bayesian context in order to perform inference pointwise or on predefined clusters. Procedures to address large-scale hypothesis testing for correlated tests could also be utilized to perform inference on images (e.g. Yekutieli and Benjamini, 1999; Benjamini and Yekutieli, 2001; Romano et al., 2008). These procedures, which perform inference pointwise have the desirable property that they identify locations of the image or map where the mean is likely to be non-zero. However, due to the high-dimensionality, small sample sizes, and complex covariance functions of neuroimaging data, many studies lack power to perform inference directly on the mean function. As a solution, other proposed methods for controlling the FDR in spatial data improve power by using predefined clusters and performing inference on the clusters (Benjamini and Heller, 2007; Pacifico et al., 2004)

. SEI is similar to these cluster based approaches because a significant cluster only guarantees (with a prespecified probability) that an arbitrary nonzero portion of the cluster has nonzero mean. However, unlike the cluster based FDR methods SEI does not require that the user defines the clusters

a priori.

Here we develop two methods to perform SEI that use a (semi-) parametric bootstrap joint (PBJ; sPBJ) testing procedure to compute test statistics and estimate the joint distribution of test statistics (Section

3). The PBJ procedure accomodates violations of exchangeability by using subject weights to deweight noisy observations at each voxel. If the selected weights are inversely proportional to the subject-level variances then the covariance function is correctly specified and the parameter estimators have minimum variance. In order to address the case that subject weights are not correctly specified we introduce the sPBJ procedure that uses a robust “sandwich” estimate of the covariance function, which is an asymptotically consistent estimator of the covariance function of the stochastic process (estimator in Long and Ervin, 2000; MacKinnon and White, 1985). Because the covariance function of the stochastic process is complex in imaging data, we leverage a bootstrap procedure to estimate the distribution of the maximum cluster size (Vandekar et al., 2017). These methods are asymptotically robust across any range of preprocessing parameters and cluster forming thresholds (CFTs) because they estimate the covariance structure from the observed data.

We utilize a subset of 1,000 subjects from the Philadelphia Neurodevelopmental Cohort (PNC; Satterthwaite et al., 2014, 2016) and a novel approach to generate realistic simulated data in which the assumption of exchangeability is violated in order to show that the PBJ and sPBJ procedures are robust to violations of this assumption (Section 5). We demonstrate the procedures by studying the association between task performance and executive functioning measured with fMRI using the -back task (Sections 4 and 6; Ragland et al., 2002; Satterthwaite et al., 2013). While the procedure is generally applicable for inference in spatial data sets, we developed an R package to perform the PBJ and sPBJ SEI procedures on NIfTI images that is available for download at and from Neuroconductor (; Muschelli et al., 2018).

2 Spatial extent inference

Let the statistical map under the null, , be a stochastic process on , where denotes the normed space of bounded functions from a bounded subset to equipped with the uniform norm. Let be the observed statistical map that is sampled from an arbitrary stochastic process on . We assume that the sample paths of and are continuous almost everywhere. Let be a function that thresholds the map with a given cluster forming threshold (CFT), , and computes the volume of all suprathreshold contiguous clusters in the image. is in because there is an arbitrary number of contiguous clusters possible. To give the clusters a natural ordering, we assume that, for all sample paths , for and that there exists finite such that for all . That is, the clusters are indexed in decreasing order and there are a finite number of nonzero clusters.

Procedure 2.1 (Spatial extent inference).

For a fixed we set


and call cluster “significant” if for some predetermined rejection threshold .

The p-value (1) computed by Procedure 2.1 represents the probability of observing a cluster as or more extreme than , under the global null that is a sample from (Friston et al., 1994). Typically, the threshold is chosen such that suprathreshold regions have uncorrected p-values less than some common CFT threshold, e.g. , which corresponds to a voxel-wise under Gaussianity assumptions. The bootstrap samples generated by Procedures 3.1 and 3.2, below, can be used to perform the SEI described in Procedure 2.1 by approximating the distribution of across bootstrap samples.

3 Statistical theory for the bootstrap procedures

3.1 Overview

We propose two approaches to model the covariance function of the statistical map. The PBJ procedure assumes the within subject variance estimates are correct. The sPBJ procedure utilizes a working covariance function. An advantage of the sPBJ is that even if the working covariance function is incorrect the procedure still yields consistent standard errors.

Let , for , be stochastic processes on . represent the imaging data and is the bounded space of interest (in neuroimaging, the brain). We assume that



is a row vector of nuisance covariates including the intercept,

is a row vector of variables of interest, , , the parameter image column vectors , , , error with and covariance function for all

. Here, all capital letters denote random variables or stochastic processes. Throughout, we assume that the conditional mean

of model (2) is correctly specified. This model is a form of function-on-scalar regression (Ramsay and Silverman, 2005; Reiss et al., 2010; Morris, 2015) and encompasses many models of interest in image analysis. Let , , , . Lastly, to describe sampling procedures we define to be the number of locations in the image.

3.2 PBJ: parametric covariance functions

Without weights, the basic PBJ model assumes


for all

; i.e. every subject has the same image covariance function. This can be generalized to heteroskedastic variances by using weighted regression (with subject specific weights) where the weights are, ideally, proportional to the inverse of the standard deviation,

. Inference using the weighted procedure is conditional on the weights, which are usually estimates (or approximations) of the standard deviation of . This approach is akin to similar multilevel approaches used in neuroimaging software (Woolrich et al., 2009; Penny and Holmes, 2007). If the weights are incorrectly specified, then the standard error estimates are asymptotically biased and the FWER may not be controlled at the nominal level. For simplicity, we describe the unweighted version of the PBJ procedure here. In Section 3.3 we describe the robust covariance function estimator used by sPBJ that allows subject specific weights.

Under assumption (3) we can compute the -statistic image


where is the F-statistic image for the test of the hypothesis


denotes the inverse cumulative distribution function (CDF) of an F-distribution on

and degrees of freedom, and is the CDF of a statistic (Vandekar et al., 2017). The transformation in (4) makes asymptotically a diagonal Wishart process (Vandekar et al., 2017),


under the null (5), where . The transformation on the right hand side of (4) improves the finite sample performance of the following procedure, which samples observations from a diagonal Wishart process, conditional on an estimate of .

Procedure 3.1.
  1. Regress the observed image vector onto as in model (2) and obtain the residual image

  2. Let be after the rows are standardized to be norm 1.

  3. For sample standard normal random variables, and compute the -statistic image

The maximum cluster size of the bootstrap images are then used to compute the cluster -values (1).

3.3 sPBJ: sandwich covariance functions

Here, we utilize an estimating equations approach to obtain parameter estimates and robust standard errors. Standard error estimates using estimating equations are robust to misspecification of the covariance function which can occur when the subject variance function differs across . We use the following estimating equation


where is a weight for subject that is an estimator for for . does not need to be an estimator of for the sPBJ procedure to be asymptotically valid, however the best choices are estimators that are nearly unbiased for and have small variance. Because our outcome data are stochastic processes indexed by , (7) is also a stochastic process.

We define as the solution to (7), which is equivalent to the least squares estimate for at each location in the image using the weight function . is unbiased in finite samples, if the mean function is correctly specified, and has covariance function equal to




and we assume both limits (9) exist. We define (9) as limits because the underlying observations are not identically distributed due to having different subject-level variances. The covariance (8) is derived by performing a Taylor expansion of the estimating equation (7) centered at the true value of the parameter and then computing their asymptotic covariances (see e.g. Boos and Stefanski, 2013, p. 300). The following theorem gives a useful way to compute an estimate of the sandwich covariance function for the parameter of interest. We use a modification (by Long and Ervin (2000)) of the covariance estimate of MacKinnon and White (1985), which has shown to be best at achieving nominal type 1 error rates in small samples over several heteroskedasticity-consistent estimators (Long and Ervin, 2000).

Theorem 3.1.

The covariance function for the parameter of interest is

where and are as defined in Supplementary Section S1.1.

The estimator


is consistent for , where all objects are as defined in Supplementary Section S1.1.

The proof of Theorem 3.1 is given in Supplementary Section S1.3.

The Wald statistic variable, , for the test of the hypothesis image


where is fixed, is asymptotically chi-squared (under the null),


by Theorem 8.3 of Boos and Stefanski (2013, p. 345).

Theorem 3.2.

When the statistical image from (12) for the test of (11) is asymptotically a diagonal Wishart process,



The proof of Theorem 3.2 is also given in the Appendix. When the joint distribution of is challenging to sample from, so that case is beyond the scope of this work.

The following procedure describes how to sample from (13), conditional on the estimate (10).

Procedure 3.2.

Let all objects be as defined in Section S1.1. In order to generate samples , for from a :

  1. Perform the weighted regression of onto using weights to obtain the weighted residuals .

  2. Divide the residual vector elementwise by the diagonal of to obtain .

  3. Construct the intermediate matrix

    and let , be after the rows are standardized to be norm 1.

  4. For sample independent standard normal random variables, , and compute the asymptotically normal test statistic vector

Recall that Procedures 3.1 and 3.2

are used to estimate the probability distribution necessary to use Procedure


4 Fractal -back fMRI data description

We use the -back task fMRI data from the PNC to generate realistic data for the simulation analyses. The fractal -back task is a working memory task where complex geometric fractals were presented to the subject for 500 ms, followed by a fixed interstimulus interval of 2500 ms (Ragland et al., 2002). For each subject, approximately 11.5 minutes of -back task fMRI were acquired as BOLD-weighted timeseries (231 volumes; TR = 3000ms; TE = 32ms; FoV = 192x192mm; resolution 3mm isotropic); detailed scanning parameters are discussed in Satterthwaite et al. (2014, 2013). Images were presented under three conditions: In the 0-back condition, participants responded with a button press to a specified target. For the 1-back condition, participants responded if the current fractal was identical to the previous one; in the 2-back condition, participants responded if the current fractal was identical to the item presented two trials previously (Satterthwaite et al., 2014). We include in our analyses a subset of 1,100 subjects who satisfied image quality and clinical exclusionary criteria from the total of 1,601 scanned as part of the PNC.

We apply a series of standard image preprocessing steps: distortion-correction using FSL’s FUGUE, time-series preprocessing, rigid registration, brain extraction, temporal filtering, and spatial smoothing (Smith and Brady, 1997; Jenkinson et al., 2012). subject-level models are fit using a linear model in FSL’s FILM software including convolved task boxcar functions for each level of the task and 24 motion covariates. The 2-0 back contrast of these estimates is used as the subject-level outcome for all analyses presented in this paper. The subject-level 2-0 back contrast’s variance is also utilized as a voxel level weight in Section 6.

The result of the subject-level preprocessing is a parameter estimate image for each subject, , and a variance estimate for each subject’s parameter estimate image. The parameter estimate images are then submitted to a second level analysis where the parameter estimates are regressed onto covariates (e.g. performance, age, and motion) in order to understand the association between task activation and performance at each voxel across subjects.

5 Simulation Analysis

5.1 Methods

In order to generate nonexchangeable data with heteroskedastic covariance functions we first residualize the -back neuroimaging data set to a set of independent variables that have known associations with the imaging variable, including age, sex, in-scanner-motion (mot), and task performance (; Macmillan, 2002), using the unpenalized spline model


where is an indicator for males, and , , and are fit using thin plate splines with 10 knots (Wood and Augustin, 2002; Wood, 2015). Fixed degree spline bases were used to model continuous variables to ensure that the mean of the residuals was completely unassociated with the covariates.

As a result of fitting this model, the residualized images have little to no mean association with the independent variables. However, the residualized images do not remove the effect the independent variables can have on the covariance structure of the imaging data. By drawing subsamples of subjects from the residualized images together with the independent variables we generate data that constitutes the global null hypothesis because the covariates are unassociated with the mean, however these covariates can still have an effect on the covariance function of the data. To demonstrate how the covariates affect the covariance function of the residualized data we create covariance matrices using nodes defined in the Power atlas for the 100 lowest and 100 highest valued subjects for motion and (Figure 1; Power et al., 2011). Fox example, high motion is associated with stronger interhemispheric and anterior-posterior correlations than low motion (Figure 1).

Figure 1: Correlation weights for the 100 most extreme observations of motion and . High motion and low are each associated with stronger inter-hemisphere and anterior-posterior correlations. Sphere color indicates network membership. The networks are thresholded for visualization.

In each of 1,000 simulations we draw a bootstrap sample of size and fit the model


where are the residuals from model (14). We then perform the test of for using SEI of a -statistical map with two CFTs (; ).

We compare the PBJ and sPBJ (using 500 bootstraps) procedures to a standard GRF-based method (easythresh; Jenkinson et al., 2012) and to the permutation procedure (using 500 permutations; Winkler et al., 2014). For the simulation study we use the PBJ and sPBJ with uniform weights (PBJ(1); sPBJ(1)) and with the inverse of motion as the weight (PBJ(mot); sPBJ(mot)), which down-weights subjects with high motion. The permutation procedure is fit using a single group variance estimate (Perm) as well as a two group variance (Perm Grp), which attempts to weaken the assumption of exchangeability by allowing for group differences in variance at each location (Winkler et al., 2014)

. The groups were determined by the upper and lower quantiles of motion. The SEI FWER is estimated by computing the proportion of simulations where a cluster is determined significant at two rejection thresholds

and .

We use a novel procedure to assess power: for each of 1,000 simulations we randomly select 4 voxels within the brain and create an image of spheres, , with radii of 8, 10, 11, and 12 voxels centered at each of the 4 voxels. We set when is inside a sphere (and inside the brain) and 0 otherwise. represents the locations in the image where there is a true effect. Because the voxels are randomly selected the spheres could overlap or be cropped depending on the proximity to the edge of the brain, thus the number and size of clusters is random across simulations. For locations where , the variable is first orthogonalized to the other covariates and then we set

where is as defined in (15) and is the variance of the orthogonalized variable. This approach ensures that the voxel level effect size of is approximately equal to 0.4. Power curves as a function of cluster size are estimated for each sample size using shape constrained additive models (SCAMs) using the scam package in R (Pya and Wood, 2015). SCAMs were used to ensure the estimated power curves were monotonically increasing. Power was only assessed for procedures that had nearly nominal type 1 error rate.

5.2 Results

Simulations under the global null (5) were used to assess FWER for each of the SEI procedures. For the test of the motion variable all methods have inflated FWER for smaller sample sizes (), except the PBJ and sPBJ procedures with motion dewighting (Figure 2). Because the sPBJ procedure without weights is still asymptotically valid the FWER falls to the nominal level with increasing sample size. The permutation procedures and the PBJ procedure without uniform weights actually increase FWER with increasing sample size. The GRF-based method always has a FWER above 0.9. This pattern holds across both CFTs (0.01, 0.005) and both nominal FWER (0.01, 0.05).

Our novel null simulation approach allows us to assess how the heteroskedasticity of real variables of interest affect the FWER for each procedure. For the test of most procedures controlled the FWER at the nominal level (Figure 2). The FWER for the test of using the PBJ procedure with motion deweighting increases above 0.1 with both CFTs when the desired level is 0.05. The GRF-based procedure maintains a FWER above 0.4.

Figure 2: FWER results for each of the SEI procedures. The FWER is inflated for the motion covariate except for the PBJ and sPBJ procedure with motion deweighting. sPBJ methods approach the nominal FWER as the sample size increases. FWERs for are near the nominal level for all methods except the GRF procedure. Dashed lines are 0.05 for Exp FWER0.01 and 0.1 for Exp FWER0.05. CFTcluster forming threshold; Nom FWERnominal FWER; motmotion.

Simulations where known signal is added to the image are used to assess the power of each method. The PBJ and sPBJ with motion deweighting have comparable power, whereas all methods without motion deweighting have similar, and lower, power (Figure 3

). Oddly, the permutation procedure with group variances has a detrimental effect on the power of the procedure. The results demonstrate that the capability of the PBJ and sPBJ procedures to down-weight observations with high motion improves the power to detect smaller clusters of activation in small sample sizes (Figure


Figure 3: Power results for the SEI procedures that had nearly FWER control in the null simulations. Each plot shows the power versus cluster size for cluster forming thresholds (CFTs) of 0.01 and 0.005. The PBJ and sPBJ with motion deweighting have the largest power across sample sizes. All methods without motion deweighting have lower power and the permutation method with group variance estimators has lowest power.

6 N-back data analysis

We next investigate the effect of task performance on 2-0 back activation. We model the 2-0 back subject-level parameter estimates using model (15). We use the inverse of the subjects’ estimated variance images as weights in the estimating equation (7) for the PBJ and sPBJ procedures (using 5,000 bootstraps). We compare the results using the (s)PBJ procedures to the permutation procedure without weights (using 5,000 permutations) with a CFT of 0.005.

The PBJ and sPBJ procedures have a larger rejection volume and a more extreme average statistical value in the rejected regions (Table 1). The permutation and sPBJ procedures identify one large contiguous cluster and one small cluster of deactivation. The PBJ fails to detect the deactivated cluster, which is marginally significant. Statistical maps for the PBJ and sPBJ procedure are similar as they both deweight high variance subjects.

Procedure Rej Vol (voxels) Mean
PBJ 31,419 3.80
sPBJ 32,720 3.79
Perm 24,893 3.61
Table 1: Rejection volume and mean -statistic in rejected voxels in the -back data for each SEI procedure.

Task performance is positively associated with activation in the thalamus, dorsolateral prefrontal cortex (DLPFC), cingulate, and precuneus (Figure 4). Performance was also associated with diffuse white matter activation that may connect functional gray matter regions or be related to reaction time differences correlated with task performance (Yarkoni et al., 2009; Mazerolle et al., 2010; Ding et al., 2018). Task performance is negatively associated with activation in the ventromedial prefrontal cortex.

Figure 4: Significant results using the three SEI procedures that controlled the FWER for the effect of in Section 5 for a CFT of 0.005 and a spatial extent -value less than 0.05. The color for each map indicates clusters that are positively activated with increasing task performance and the indicates clusters that are deactivated with increasing task performance.

7 Discussion

The sPBJ procedure is a robust multileveling approach because it produces consistent covariance function estimators even when the covariance model is misspecified. When using the sPBJ, investigators can use arbitrary estimates of noise as weights to improve signal and still be confident that the FWER will be near nominal level in large samples. The permutation, GRF, and PBJ procedures that do not have consistent standard error estimates can have FWERs that grow as the sample size increases. Given that the sPBJ has closer to nominal FWER than the PBJ and permutation procedures, while maintaining equal or superior power, it is an appealing option for SEI in small and large samples.

Our findings support those presented in other papers (Eklund et al., 2016; Silver et al., 2011; Greve and Fischl, 2018): GRF-based methods do not have robust FWER control across a range of parameter settings, regardless of whether heteroskedasticity is present. The only variable that produces enough heteroskedasticity to affect the permutation procedure is the motion covariate; the permutation procedure performs well with , which is a typical covariate of interest. Thus, a model that assumes subject exchangeability is appropriate for the covariate we consider here, albeit with a slight loss in power. It is possible that covariates that are highly correlated with motion (e.g. ADHD diagnosis) may have FWERs that are far from the nominal level using the permutation procedure.

The bootstrap approach to approximating the probability (1

) is quite general and need not be used with the two models presented here. So long as the joint distribution of the test statistics is asymptotically normal or chi-squared and a square root of the covariance matrix can be computed, then the PBJ procedure can be used to sample from the joint distribution of the test statistics. This suggests the method can be used to study patterns generated by machine learning methods

(see e.g. Gaonkar and Davatzikos, 2013).

Here, we perform analyses and simulations of task fMRI data, which is one of the most common multilevel neuroimaging models. However, the PBJ procedure is appropriate for other types of imaging modalities as subject-level weighting can be used for models that are not typically considered multilevel models. For example, estimates of voxel level variance can be used to deweight noisy cerebral blood flow images computed from arterial spin labeled MRI or measurements of T1 image quality can be used to deweight images affected with motion artifact (Rosen et al., 2018). The sPBJ procedure can also be used for meta-analyses of publicly available statistical maps (e.g. Gorgolewski et al., 2015; Maumet et al., 2016). Typically, variance maps are needed to accurately fit such multilevel models, but variance maps are seldom made publicly available. An advantage of the sPBJ procedure for meta-analysis is that it does not require variance maps for valid statistical inference, so weights could be determined from other reported variables, such as the sample size.

There are two apparent limitations of the approach: First, the sPBJ procedure relies on consistency, which only provides guarantees for FWER control as the sample size gets large. Our simulations demonstrate that the procedure has nominal FWER control in relatively small samples (e.g.

), however this convergence is likely slower in very heavily skewed datasets. Second, the distribution of the statistical map (

13) is unknown when , so the procedure is not able to perform tests of multiple parameters at each location. Future work is necessary to derive the distribution of the statistical map in the case when and develop efficient methods to sample from the joint distribution of the statistical image. Despite these limitations, the sPBJ procedure is a robust multilevel method for SEI that can be used to further our understanding of brain-behavior associations.


We used BrainNet Viewer, ITK-SNAP, and Paraview to produce the figures displayed in this paper (Xia et al., 2013; Yushkevich et al., 2016; Ayachit, 2015; Madan, 2015).


RC2 grants from the National Institute of Mental Health MH089983 and MH089924. National Institutes of Health grants: (5P30CA068485-22 to S.N.V.); (R01MH107235 to R.C.G., R.E.G., R.T.S.), (R01MH107703 to T.D.S., R.T.S.), (R01MH112847 to TDS, KR, and RTS) and (R01NS085211 to R.T.S.).


  • Ayachit (2015) Ayachit, U. (2015). The ParaView Guide: A Parallel Visualization Application. Kitware, Inc., USA.
  • Benjamini and Heller (2007) Benjamini, Y. and Heller, R. (2007). False Discovery Rates for Spatial Signals. Journal of the American Statistical Association 102, 1272–1281.
  • Benjamini and Yekutieli (2001) Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of statistics pages 1165–1188.
  • Boos and Stefanski (2013) Boos, D. D. and Stefanski, L. A. (2013). Essential Statistical Inference: Theory and Methods. Springer Texts in Statistics. Springer-Verlag, New York.
  • Ding et al. (2018) Ding, Z., Huang, Y., Bailey, S. K., Gao, Y., Cutting, L. E., Rogers, B. P., Newton, A. T., and Gore, J. C. (2018). Detection of synchronous brain activity in white matter tracts at rest and under functional loading. Proceedings of the National Academy of Sciences 115, 595–600.
  • Eklund et al. (2018) Eklund, A., Knutsson, H., and Nichols, T. E. (2018). Cluster Failure Revisited: Impact of First Level Design and Data Quality on Cluster False Positive Rates. arXiv:1804.03185 [stat] arXiv: 1804.03185.
  • Eklund et al. (2016) Eklund, A., Nichols, T. E., and Knutsson, H. (2016). Cluster failure: why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences page 201602413.
  • Flandin and Friston (2016) Flandin, G. and Friston, K. J. (2016). Analysis of family-wise error rates in statistical parametric mapping using random field theory. Human brain mapping .
  • Friston et al. (1994) Friston, K. J., Worsley, K. J., Frackowiak, R. S., Mazziotta, J. C., and Evans, A. C. (1994). Assessing the significance of focal activations using their spatial extent. Human brain mapping 1, 210–220.
  • Gaonkar and Davatzikos (2013) Gaonkar, B. and Davatzikos, C. (2013).

    Analytic estimation of statistical significance maps for support vector machine based multi-variate image analysis and classification.

    NeuroImage 78, 270–283.
  • Gorgolewski et al. (2015) Gorgolewski, K. J., Varoquaux, G., Rivera, G., Schwarz, Y., Ghosh, S. S., Maumet, C., Sochat, V. V., Nichols, T. E., Poldrack, R. A., Poline, J.-B., Yarkoni, T., and Margulies, D. S. (2015). a web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Frontiers in Neuroinformatics 9,.
  • Greve and Fischl (2018) Greve, D. N. and Fischl, B. (2018). False positive rates in surface-based anatomical analysis. NeuroImage 171, 6–14.
  • Jenkinson et al. (2012) Jenkinson, M., Beckmann, C. F., Behrens, T. E., Woolrich, M. W., and Smith, S. M. (2012). Fsl. Neuroimage 62, 782–790.
  • Kessler et al. (2017) Kessler, D., Angstadt, M., and Sripada, C. S. (2017). Reevaluating “cluster failure” in fMRI using nonparametric control of the false discovery rate. Proceedings of the National Academy of Sciences 114, E3372–E3373.
  • Long and Ervin (2000) Long, J. S. and Ervin, L. H. (2000).

    Using heteroscedasticity consistent standard errors in the linear regression model.

    The American Statistician 54, 217–224.
  • MacKinnon and White (1985) MacKinnon, J. G. and White, H. (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of econometrics 29, 305–325.
  • Macmillan (2002) Macmillan, N. A. (2002). Signal detection theory. Stevens’ handbook of experimental psychology 4, 43–90.
  • Madan (2015) Madan, C. R. (2015). Creating 3d visualizations of MRI data: A brief guide. F1000Research 4,.
  • Maumet et al. (2016) Maumet, C., Auer, T., Bowring, A., Chen, G., Das, S., Flandin, G., Ghosh, S., Glatard, T., Gorgolewski, K. J., Helmer, K. G., Jenkinson, M., Keator, D. B., Nichols, B. N., Poline, J.-B., Reynolds, R., Sochat, V., Turner, J., and Nichols, T. E. (2016). Sharing brain mapping statistical results with the neuroimaging data model. Scientific Data 3,.
  • Mazerolle et al. (2010) Mazerolle, E. L., Beyea, S. D., Gawryluk, J. R., Brewer, K. D., Bowen, C. V., and D’arcy, R. C. (2010). Confirming white matter fMRI activation in the corpus callosum: co-localization with DTI tractography. Neuroimage 50, 616–621.
  • Morris (2015) Morris, J. S. (2015). Functional regression. Annual Review of Statistics and Its Application 2, 321–359.
  • Mueller et al. (2017) Mueller, K., Lepsien, J., Möller, H. E., and Lohmann, G. (2017). Commentary: Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Frontiers in Human Neuroscience 11,.
  • Muschelli et al. (2018) Muschelli, J., Gherman, A., Fortin, J.-P., Avants, B., Whitcher, B., Clayden, J. D., Caffo, B. S., and Crainiceanu, C. M. (2018). Neuroconductor: an R platform for medical imaging analysis. Biostatistics .
  • Pacifico et al. (2004) Pacifico, M. P., Genovese, C., Verdinelli, I., and Wasserman, L. (2004). False Discovery Control for Random Fields. Journal of the American Statistical Association 99, 1002–1014.
  • Penny and Holmes (2007) Penny, W. and Holmes, A. P. (2007). Random effects analysis. Statistical parametric mapping: The analysis of functional brain images pages 156–165.
  • Power et al. (2011) Power, J. D., Cohen, A. L., Nelson, S. M., Wig, G. S., Barnes, K. A., Church, J. A., Vogel, A. C., Laumann, T. O., Miezin, F. M., Schlaggar, B. L., and Petersen, S. E. (2011). Functional network organization of the human brain. Neuron 72, 665–678.
  • Power et al. (2010) Power, J. D., Fair, D. A., Schlaggar, B. L., and Petersen, S. E. (2010). The Development of Human Functional Brain Networks. Neuron 67, 735–748.
  • Pya and Wood (2015) Pya, N. and Wood, S. N. (2015). Shape constrained additive models. Statistics and Computing 25, 543–559.
  • Ragland et al. (2002) Ragland, J. D., Turetsky, B. I., Gur, R. C., Gunning-Dixon, F., Turner, T., Schroeder, L., Chan, R., and Gur, R. E. (2002). Working Memory for Complex Figures: An fMRI Comparison of Letter and Fractal n-Back Tasks. Neuropsychology 16, 370–379.
  • Ramsay and Silverman (2005) Ramsay, J. and Silverman, B. W. (2005). Functional Data Analysis. Springer Series in Statistics. Springer-Verlag, New York, 2 edition.
  • Reiss et al. (2010) Reiss, P. T., Huang, L., and Mennes, M. (2010). Fast function-on-scalar regression with penalized basis expansions. The international journal of biostatistics 6,.
  • Romano et al. (2008) Romano, J. P., Shaikh, A. M., and Wolf, M. (2008). Control of the false discovery rate under dependence using the bootstrap and subsampling. Test 17, 417.
  • Rosen et al. (2018) Rosen, A. F. G., Roalf, D. R., Ruparel, K., Blake, J., Seelaus, K., Villa, L. P., Ciric, R., Cook, P. A., Davatzikos, C., Elliott, M. A., Garcia de La Garza, A., Gennatas, E. D., Quarmley, M., Schmitt, J. E., Shinohara, R. T., Tisdall, M. D., Craddock, R. C., Gur, R. E., Gur, R. C., and Satterthwaite, T. D. (2018). Quantitative assessment of structural image quality. NeuroImage 169, 407–418.
  • Satterthwaite et al. (2016) Satterthwaite, T. D., Connolly, J. J., Ruparel, K., Calkins, M. E., Jackson, C., Elliott, M. A., Roalf, D. R., Hopson, R., Prabhakaran, K., Behr, M., Qiu, H., Mentch, F. D., Chiavacci, R., Sleiman, P. M. A., Gur, R. C., Hakonarson, H., and Gur, R. E. (2016). The Philadelphia Neurodevelopmental Cohort: A publicly available resource for the study of normal and abnormal brain development in youth. NeuroImage 124, 1115–1119.
  • Satterthwaite et al. (2014) Satterthwaite, T. D., Elliott, M. A., Ruparel, K., Loughead, J., Prabhakaran, K., Calkins, M. E., Hopson, R., Jackson, C., Keefe, J., Riley, M., Mentch, F. D., Sleiman, P., Verma, R., Davatzikos, C., Hakonarson, H., Gur, R. C., and Gur, R. E. (2014). Neuroimaging of the Philadelphia neurodevelopmental cohort. NeuroImage 86, 544–553.
  • Satterthwaite et al. (2013) Satterthwaite, T. D., Wolf, D. H., Erus, G., Ruparel, K., Elliott, M. A., Gennatas, E. D., Hopson, R., Jackson, C., Prabhakaran, K., Bilker, W. B., Calkins, M. E., Loughead, J., Smith, A., Roalf, D. R., Hakonarson, H., Verma, R., Davatzikos, C., Gur, R. C., and Gur, R. E. (2013). Functional Maturation of the Executive System during Adolescence. The Journal of Neuroscience 33, 16249–16261.
  • Satterthwaite et al. (2012) Satterthwaite, T. D., Wolf, D. H., Loughead, J., Ruparel, K., Elliott, M. A., Hakonarson, H., Gur, R. C., and Gur, R. E. (2012). Impact of in-scanner head motion on multiple measures of functional connectivity: relevance for studies of neurodevelopment in youth. Neuroimage 60, 623–632.
  • Silver et al. (2011) Silver, M., Montana, G., Nichols, T. E., and Initiative, A. D. N. (2011). False positives in neuroimaging genetics using voxel-based morphometry data. Neuroimage 54, 992–1000.
  • Slotnick (2017) Slotnick, S. D. (2017). Cluster success: fMRI inferences for spatial extent have acceptable false-positive rates. Cognitive Neuroscience 8, 150–155.
  • Smith and Brady (1997) Smith, S. M. and Brady, J. M. (1997). SUSAN—a new approach to low level image processing.

    International journal of computer vision

    23, 45–78.
  • Sun et al. (2015) Sun, W., Reich, B. J., Tony Cai, T., Guindani, M., and Schwartzman, A. (2015). False discovery control in large-scale spatial multiple testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77, 59–83.
  • Van der Vaart (2000) Van der Vaart, A. W. (2000). Asymptotic statistics, volume 3. Cambridge university press.
  • Vandekar et al. (2017) Vandekar, S. N., Satterthwaite, T. D., Rosen, A., Ciric, R., Roalf, D. R., Ruparel, K., Gur, R. C., Gur, R. E., and Shinohara, R. T. (2017). Faster family-wise error control for neuroimaging with a parametric bootstrap. Biostatistics .
  • Winkler et al. (2014) Winkler, A. M., Ridgway, G. R., Webster, M. A., Smith, S. M., and Nichols, T. E. (2014). Permutation inference for the general linear model. Neuroimage 92, 381–397.
  • Wood (2015) Wood, S. (2015). Package ‘mgcv’. R package version 1, 29.
  • Wood and Augustin (2002) Wood, S. N. and Augustin, N. H. (2002). GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. Ecological modelling 157, 157–177.
  • Woodward and Heckers (2016) Woodward, N. D. and Heckers, S. (2016). Mapping Thalamocortical Functional Connectivity in Chronic and Early Stages of Psychotic Disorders. Biological Psychiatry 79, 1016–1025.
  • Woolrich et al. (2009) Woolrich, M. W., Jbabdi, S., Patenaude, B., Chappell, M., Makni, S., Behrens, T., Beckmann, C., Jenkinson, M., and Smith, S. M. (2009). Bayesian analysis of neuroimaging data in FSL. Neuroimage 45, S173–S186.
  • Worsley et al. (1999) Worsley, K. J., Andermann, M., Koulis, T., MacDonald, D., and Evans, A. C. (1999). Detecting changes in nonisotropic images. Human brain mapping 8, 98–101.
  • Xia et al. (2013) Xia, M., Wang, J., and He, Y. (2013). BrainNet Viewer: A Network Visualization Tool for Human Brain Connectomics. PLOS One 8,.
  • Yarkoni et al. (2009) Yarkoni, T., Barch, D. M., Gray, J. R., Conturo, T. E., and Braver, T. S. (2009). BOLD Correlates of Trial-by-Trial Reaction Time Variability in Gray and White Matter: A Multi-Study fMRI Analysis. PLOS ONE 4, e4257.
  • Yekutieli and Benjamini (1999) Yekutieli, D. and Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference 82, 171–196.
  • Yushkevich et al. (2016) Yushkevich, P. A., Gao, Y., and Gerig, G. (2016). ITK-SNAP: An interactive tool for semi-automatic segmentation of multi-modality biomedical images. In 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 3342–3345.

S1 Supplementary Materials

s1.1 Notation

The following objects are defined for Theorem 3.1.

The following values are defined for the estimator (10) of Theorem 3.1.


and is similarly defined to . The use of in (S2) corresponds to the heteroskedasticity-consistent estimator suggested by Long and Ervin (2000).

s1.2 Assumptions for Theorems 3.1 and 3.2

The following are assumptions to ensure the covergence of the estimator (10) in Theorem 3.1.


We use Theorem 18.14 of Van der Vaart (2000) to prove weak convergence of which requires an additional assumption: for all there exists a partition of into finitely many sets , such that


s1.3 Proofs of Theorems 3.1 and 3.2


Proof of Theorem 3.1

Finding the covariance of and for involves extracting the bottom right block matrix of


Because is unwieldy the derivation is messy, but not intellectually stimulating and so is not shown here.

The estimator (10) can be rewritten


and is given in (S1).

It follows that if




then is a consistent estimator of . Equations (S6

) hold by the weak law of large numbers given assumptions (

S3). Note, because is a function of we must use a weak law that allows the averages in (S7) to depend on parameter estimates such as Theorem 7.3 of (Boos and Stefanski, 2013, p. 329). The assumptions (S3) are satisfactory to use this theorem as the function

in a neighborhood of the true parameter value and . ∎

Proof of Theorem 3.2.

Let be fixed for . Note that for . The convergence of follows because our estimating equation (7) satisfies a set of regularity conditions (see e.g. Boos and Stefanski, 2013). The additional assumptions (9) and (S3) are required to so that and do not need to be identically distributed. These assumptions are satisfactory for the convergence of the covariance

Thus, the continuous mapping theorem implies

where . follows a diagonal Wishart distribution by definition. The weak convergence of the stochastic process follows by (S4) and Theorem 18.14 of (Van der Vaart, 2000).