1 Introduction
Chen et al. (2018) discuss an important topic that is often neglected in the neuroimaging field: the choice between one-sided and two-sided tests, and the lack of multiple comparison correction when two one-sided tests are performed. As mentioned in their paper, in our work on massive empirical evaluation of task fMRI inference methods with resting state fMRI (Eklund et al., 2016) we used one-sided tests (familywise error rate α = 0.05). We made this choice for two reasons. The first reason was simply that for analyses of randomly created groups of healthy controls, it should make no difference whether one uses a one-sided or a two-sided test. The second reason was more practical: FSL and SPM both run one-sided tests by default, and we wished to reflect the typical (if ill-advised) practices of the community. Furthermore, to perform a two-sided permutation test (Winkler et al., 2014), it would be necessary to run two permutation tests per group analysis (which would double the processing time), since normally only the maximum test value over the brain (or the largest cluster) is saved for every permutation (to form the maximum null distribution).
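The maximum null distribution mentioned above can be illustrated with a minimal sketch. This is not the implementation used in our analyses: it is a simplified voxelwise (not clusterwise) example in NumPy, and the function names `two_sample_t` and `max_null_thresholds`, the array layout (subjects by voxels) and the default number of permutations are all assumptions made for illustration. It records the maximum statistic in each direction for every permutation, which is one way the two directions could be handled in a single pass.

```python
import numpy as np

def two_sample_t(data, labels):
    """Voxelwise pooled-variance two-sample t statistic.

    data:   array of shape (subjects, voxels)
    labels: array of 0/1 group labels, one per subject
    """
    a, b = data[labels == 1], data[labels == 0]
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(axis=0, ddof=1)
              + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1 / na + 1 / nb))

def max_null_thresholds(data, labels, n_perm=1000, alpha=0.025, seed=0):
    """Return FWE thresholds for the positive and negative direction.

    For every permutation of the group labels, only the maximum of the
    statistic map (positive direction) and of the negated map (negative
    direction) is stored, forming two maximum null distributions.
    """
    rng = np.random.default_rng(seed)
    max_pos = np.empty(n_perm)
    max_neg = np.empty(n_perm)
    for i in range(n_perm):
        t = two_sample_t(data, rng.permutation(labels))
        max_pos[i] = t.max()
        max_neg[i] = (-t).max()
    # Each direction is thresholded at alpha (0.025 for a two-sided test).
    return np.quantile(max_pos, 1 - alpha), np.quantile(max_neg, 1 - alpha)

# Hypothetical usage: 40 subjects (20 per group), 500 voxels of pure noise.
rng = np.random.default_rng(1)
data = rng.standard_normal((40, 500))
labels = np.repeat([1, 0], 20)
thr_pos, thr_neg = max_null_thresholds(data, labels, n_perm=200)
```

A voxel would then be declared significant in the positive direction if its observed statistic exceeds `thr_pos`, and in the negative direction if the negated statistic exceeds `thr_neg`.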
2 Methods
To investigate whether performing a two-sided test (implemented as two one-sided tests at α = 0.025) leads to different false positive rates compared to a single one-sided test (at α = 0.05), we performed new group analyses for a subset of all the parameter settings used in our previous work (Eklund et al., 2016, 2018). Specifically, we only performed two-sample t-tests for the Beijing data (Biswal et al., 2010), using 40 subjects (i.e. 20 subjects per group) and a cluster defining threshold of p = 0.001. All group analyses were performed for 4 mm, 6 mm, 8 mm and 10 mm FWHM of smoothing. See our recent work (Eklund et al., 2018) for a description of the six designs (B1, B2, E1, E2, E3, E4) applied to every subject in the first level analysis.
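As an illustration of how the two inference configurations differ when counting false positives, the following sketch computes an empirical familywise error rate from per-analysis results. It is a hedged example rather than our analysis code: the function name `empirical_fwe` and its inputs (the smallest FWE-corrected p-value found in each direction for every random group analysis) are hypothetical.

```python
import numpy as np

def empirical_fwe(p_pos, p_neg, two_sided, alpha=0.05):
    """Fraction of null group analyses with at least one false positive.

    p_pos, p_neg: smallest FWE-corrected p-value per analysis for the
                  positive and negative contrast (hypothetical inputs).
    two_sided:    if True, test both directions at alpha / 2;
                  if False, test only the positive direction at alpha.
    """
    p_pos = np.asarray(p_pos)
    p_neg = np.asarray(p_neg)
    if two_sided:
        false_positive = (p_pos < alpha / 2) | (p_neg < alpha / 2)
    else:
        false_positive = p_pos < alpha
    return false_positive.mean()
```

For a well-calibrated method, both calls should return a value close to the nominal 5% when applied to null data.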
For FSL, group analyses were only performed using FSL OLS, and not using FLAME1 (which is the default option); FLAME1 leads to conservative results if resting state fMRI data is used, while null task fMRI analyses (control-control) with FLAME1 give FWE rates comparable to FSL OLS (Eklund et al., 2016). For AFNI, we used the new ACF (autocorrelation function) option in 3dClustSim (Cox et al., 2017), which uses a long-tail spatial ACF instead of a Gaussian one. It should be noted that AFNI provides another function for cluster thresholding, ETAC (equitable thresholding and clustering) (Cox, 2018), which may perform better than the long-tail ACF function used here, but we used the ACF approach to be able to compare the two-sided results to our recent work (Eklund et al., 2018). Contrary to Chen et al. (2018), we did not change the cluster defining threshold to p = 0.0005 when performing two one-sided tests (for SPM, FSL or AFNI), as this represents yet another change in the inference configuration that we would rather leave fixed to facilitate the comparison of these results to previous one-sided findings.
3 Results
Figure 1 shows estimated familywise error rates for one-sided and two-sided tests, where both should exhibit a nominal 5% familywise false positive rate. The non-parametric permutation test produces similar results in both cases, while the parametric methods perform worse for two-sided tests.
4 Discussion
We have extended our original work on cluster false positive rates (Eklund et al., 2016, 2018) to two-sided tests, showing that parametric methods perform worse for two-sided tests. Random field theory (RFT) p-values depend on a number of approximations:

Joint normality over the image,

Sufficient smoothness for lattice images to behave like continuous processes,

Homogeneous smoothness (stationarity), so that the null distribution of cluster size does not vary over space,

Spatial dependence mostly local, i.e. the spatial autocorrelation function is proportional to a Gaussian density, and

Sufficiently high cluster-forming threshold so that the approximate distribution for cluster size is accurate.
On this last assumption, the control of FWE depends on the accuracy of the cluster size distribution in its tail. For example, it is of little consequence if the true cluster size FWE p-value is 0.6 and RFT estimates it as 0.5; in contrast, two-sided inference demands accuracy in the RFT approximation down to FWE 0.025, and then any inaccuracies are doubled as both positive and negative excursions are considered. In our findings, it appears that modest inaccuracies in the null cluster size distribution corresponding to FWE 0.05 (see Figure 1 (a), and the general tendency to overestimate FWE) grow into larger inaccuracies when the more stringent FWE level 0.025 is used (the inference used twice for each result contributing to Figure 1 (b)).
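The doubling of inaccuracies described above can be made explicit with a short bound. The notation is introduced here for illustration only: $E_+$ and $E_-$ denote the events of at least one false positive cluster in the positive and negative direction, each tested at nominal level $\alpha/2 = 0.025$, and $\varepsilon$ denotes the (assumed equal) excess of the actual one-sided error rate over its nominal level.

```latex
\[
\mathrm{FWE}_{\text{two-sided}}
  = P(E_+ \cup E_-)
  \le P(E_+) + P(E_-)
  = 2\left(\frac{\alpha}{2} + \varepsilon\right)
  = \alpha + 2\varepsilon .
\]
```

The bound is close to an equality when the two events rarely occur together, so an excess of $\varepsilon$ at the 0.025 level can appear as an excess of roughly $2\varepsilon$ in the two-sided familywise error rate.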
In contrast, the non-parametric permutation test for a two-sample t-test is only based on the assumption of exchangeability between subjects, and therefore performs equally well for two one-sided tests at α = 0.025.
Acknowledgements
The authors have no conflict of interest to declare. This study was supported by Swedish Research Council grants 2013-5229 and 2017-04889. Funding was also provided by the Center for Industrial Information Technology (CENIIT) at Linköping University, and the Knut and Alice Wallenberg Foundation project "Seeing organ function". Thomas E. Nichols was supported by the Wellcome Trust (100309/Z/12/Z) and the NIH (R01 EB015611). The Nvidia Corporation, who donated the Nvidia Quadro P6000 graphics card used to run all permutation tests, is also acknowledged. This study would not have been possible without the recent data-sharing initiatives in the neuroimaging field. We therefore thank the Neuroimaging Informatics Tools and Resources Clearinghouse and all of the researchers who have contributed resting-state data to the 1,000 Functional Connectomes Project.
References
 Biswal et al. (2010) Biswal, B., Mennes, M., …, X. Z., & Milham, M. (2010). Toward discovery science of human brain function. PNAS, 107, 4734–4739.
 Chen et al. (2018) Chen, G., Cox, R. W., Glen, D. R., Rajendra, J. K., Reynolds, R. C., & Taylor, P. A. (2018). A tail of two sides: Artificially doubled false positive rates in neuroimaging due to the sidedness choice with t-tests. Human Brain Mapping.
 Cox et al. (2017) Cox, R., Chen, G., Glen, D., Reynolds, R., & Taylor, P. (2017). FMRI Clustering in AFNI: False-Positive Rates Redux. Brain Connectivity, 7, 152–171.
 Cox (2018) Cox, R. W. (2018). Equitable thresholding and clustering. bioRxiv, 10.1101/295931.
 Eklund et al. (2018) Eklund, A., Knutsson, H., & Nichols, T. (2018). Cluster failure revisited: impact of first level design and physiological noise on cluster false positive rates. Human Brain Mapping.
 Eklund et al. (2016) Eklund, A., Nichols, T., & Knutsson, H. (2016). Cluster failure: why fMRI inferences for spatial extent have inflated false positive rates. PNAS, 113, 7900–7905.
 Winkler et al. (2014) Winkler, A., Ridgway, G., Webster, M., Smith, S., & Nichols, T. (2014). Permutation inference for the general linear model. NeuroImage, 92, 381–397.