1 Introduction
With the increasing availability of new and complex forms of data, there is a corresponding need for new ways to assess measurement reliability. This article aims to help meet this need by reformulating the intraclass correlation coefficient (ICC), a standard index of reliability, in terms of distances between observations.
We begin by defining the ICC as developed in classical test theory (Lord and Novick, 1968; Fleiss, 1986; Mair, 2018), which views a measured scalar quantity $X$ as the sum of an underlying true score $T$ and an error term $\epsilon$. Suppose we have a sample of $I$ individuals with true real-valued scores $T_1,\ldots,T_I$ drawn from a population with variance $\sigma_T^2$; and that for each $i$ in $\{1,\ldots,I\}$, the $i$th individual is measured $J_i$ times, yielding observations
(1)  $X_{ij} = T_i + \epsilon_{ij}, \quad j = 1,\ldots,J_i,$
where the $\epsilon_{ij}$'s are drawn from a distribution with mean 0 and variance $\sigma_\epsilon^2$, independently of each other and of the $T_i$'s. Then for distinct $j_1, j_2$, the correlation between the $j_1$th and $j_2$th observations for individual $i$ is easily shown to be
(2)  $\rho = \sigma_T^2/(\sigma_T^2+\sigma_\epsilon^2).$
This quantity is the classical ICC.
Extensions of reliability measurement to more complex settings include replacing model (1) with the generalizability theory model of Cranford et al. (2006), as well as generalizations of (2) to multivariate data (Alonso et al., 2010), including high-dimensional data (Shou et al., 2013). All of these extensions assume a model that is more complex than (1), but still of an additive (signal plus noise) form. However, for complex objects that are measured or estimated in modern biomedical research, such as motion patterns or brain networks, such an additive representation is typically inapplicable. There is thus a need for a new reliability index appropriate for general data objects.
Our work was motivated by the study of functional connectivity in the human brain by means of resting-state functional magnetic resonance imaging (fMRI). Briefly, fMRI produces a time series of brain activity, known as the blood oxygen level dependent (BOLD) signal, at each of a set of regions of interest (ROIs). Resting-state fMRI means that the participants in the study were not performing any particular task or viewing a stimulus during the brain scan. Functional connectivity refers to association among activity levels in different parts of the brain, and can be measured in many ways (Yan et al., 2013). One of the most common functional connectivity measures is a simple Pearson correlation matrix of regional BOLD signals. Figure 1 displays two such correlation matrices, along with associated brain graphs, for a set of 80 ROIs to be discussed in Section 4. These particular examples were chosen to illustrate high and low connectivity, according to a metric described in the Web Appendix.
For such correlation matrices, and the scientific conclusions derived from them, to be trustworthy and reproducible, it is necessary first to be able to assess their reliability. Our proposed methodology offers a means to that end.
Our basic proposal, a reformulation of the ICC based on distances between observations, is outlined in Section 2, and estimation of the resulting reliability index is discussed in Section 3. An application to an fMRI data set is presented in Section 4. In Sections 5–7 we extend the Spearman-Brown formula, a fundamental result in reliability theory, to our distance-based ICC, and revisit our fMRI data set in light of this extension. A concluding discussion appears in Section 8.
2 Distance-based reliability measurement
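A novel reliability index applicable to general data objects can be defined by rederiving the ICC (2) in terms of squared distances among observations. Let $\mathrm{MSD}_b$ and $\mathrm{MSD}_w$ be the mean squared differences for measurements between and within individuals, respectively. Then, writing $\sigma_T^2$ and $\sigma_\epsilon^2$ for the true-score and error variances, $\mathrm{MSD}_b = 2(\sigma_T^2+\sigma_\epsilon^2)$ and $\mathrm{MSD}_w = 2\sigma_\epsilon^2$, and thus the ICC (2) can be reexpressed as
(3)  $\rho = 1 - \mathrm{MSD}_w/\mathrm{MSD}_b.$
The advantage of expression (3) is that, unlike (2), it extends straightforwardly to general data objects (curves, networks, etc.), as long as a distance or dissimilarity $d$ between such objects is defined. One simply redefines $\mathrm{MSD}_b$, $\mathrm{MSD}_w$ in (3) in a more general sense, as the between- and within-individual mean squared distances
(4)  $\mathrm{MSD}_b = E[d(X_{i_1j_1}, X_{i_2j_2})^2]$ for $i_1 \neq i_2$, $\quad \mathrm{MSD}_w = E[d(X_{ij_1}, X_{ij_2})^2]$ for $j_1 \neq j_2$.
Henceforth we shall refer to (3), with $\mathrm{MSD}_b$, $\mathrm{MSD}_w$ given by (4), as the distance-based intraclass correlation coefficient, or dbICC.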
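To see why the distance-based expression (3) agrees with the classical ICC (2) in the scalar case, the two mean squared differences can be worked out directly from model (1). The following short derivation is a sketch in which $\sigma_T^2$ and $\sigma_\epsilon^2$ denote the true-score and error variances (our notation here), and the true scores and errors are taken to be mutually independent, as in (1), so that all differences have mean zero:

```latex
\begin{align*}
\mathrm{MSD}_b &= E\big[(X_{i_1 j_1} - X_{i_2 j_2})^2\big]
  = \mathrm{Var}(T_{i_1} - T_{i_2}) + \mathrm{Var}(\epsilon_{i_1 j_1} - \epsilon_{i_2 j_2})
  = 2\sigma_T^2 + 2\sigma_\epsilon^2 \qquad (i_1 \neq i_2), \\
\mathrm{MSD}_w &= E\big[(X_{i j_1} - X_{i j_2})^2\big]
  = \mathrm{Var}(\epsilon_{i j_1} - \epsilon_{i j_2})
  = 2\sigma_\epsilon^2 \qquad (j_1 \neq j_2), \\
1 - \frac{\mathrm{MSD}_w}{\mathrm{MSD}_b}
  &= 1 - \frac{\sigma_\epsilon^2}{\sigma_T^2 + \sigma_\epsilon^2}
  = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_\epsilon^2}.
\end{align*}
```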
We note that the same general strategy, of rederiving variance-based formulas in terms of sums of squared distances, has been used previously to formulate distance-based hypothesis tests (McArdle and Anderson, 2001; Mielke and Berry, 2007; Reiss et al., 2010).
A simple example of extending (1) beyond the scalar real-valued case is to let $T_i$, $\epsilon_{ij}$ be mutually independent random vectors, with covariance matrices $\Sigma_T$, $\Sigma_\epsilon$ respectively, and let $d$ be the Euclidean distance. Then (3) reduces straightforwardly to
(5)  $\mathrm{tr}(\Sigma_T)/[\mathrm{tr}(\Sigma_T)+\mathrm{tr}(\Sigma_\epsilon)],$
the multivariate reliability measure referred to as $R_T$ (Alonso et al., 2010), and as I2C2 (Shou et al., 2013) for images viewed as vectors. Thus the dbICC is an extension of these measures to more general distances and data types.
3 Estimating the dbICC
3.1 Point estimation
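Like the classical ICC (2), the proposed dbICC (3) can be estimated in practice by plugging in consistent estimates of the population quantities (4), as follows:
(6)  $\widehat{\mathrm{dbICC}} = 1 - \widehat{\mathrm{MSD}}_w/\widehat{\mathrm{MSD}}_b,$
where
(7)  $\widehat{\mathrm{MSD}}_b = \dfrac{\sum_{i_1<i_2}\sum_{j_1=1}^{J_{i_1}}\sum_{j_2=1}^{J_{i_2}} d(X_{i_1j_1}, X_{i_2j_2})^2}{\sum_{i_1<i_2} J_{i_1}J_{i_2}},$
(8)  $\widehat{\mathrm{MSD}}_w = \dfrac{\sum_{i=1}^{I}\sum_{j_1<j_2} d(X_{ij_1}, X_{ij_2})^2}{\sum_{i=1}^{I} J_i(J_i-1)/2}.$
Figure 2 illustrates this schematically for a distance matrix with rows and columns grouped by individuals: one estimates $\mathrm{MSD}_b$, $\mathrm{MSD}_w$ by averaging the squared between- and within-individual distances (B and W), respectively.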
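The plug-in estimate can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation in the authors' R package: the function name `dbicc` and its arguments are hypothetical, and it assumes that every unordered pair of observations is weighted equally within the between- and within-individual averages of squared distances.

```python
import numpy as np

def dbicc(D, ids):
    """Plug-in dbICC estimate, 1 - MSD_w / MSD_b, from a distance matrix.

    D   : (n, n) symmetric matrix of pairwise distances between observations
    ids : length-n array giving the individual each observation belongs to
    """
    D = np.asarray(D, dtype=float)
    ids = np.asarray(ids)
    same = ids[:, None] == ids[None, :]                  # pairs from the same individual
    upper = np.triu(np.ones(D.shape, dtype=bool), k=1)   # each unordered pair once, no diagonal
    msd_w = np.mean(D[same & upper] ** 2)                # within-individual mean squared distance
    msd_b = np.mean(D[~same & upper] ** 2)               # between-individual mean squared distance
    return 1.0 - msd_w / msd_b

# Toy example: two individuals, two scalar measurements each, absolute difference as distance
x = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(x[:, None] - x[None, :])
print(dbicc(D, ids=[0, 0, 1, 1]))
```

Because the function only needs a distance matrix and a grouping vector, the same code applies unchanged to curves, networks or matrices once a distance between them has been computed.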
3.2 Bootstrap confidence intervals
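The dbICC is intended for distance functions whose distribution may not be known. It is thus natural to turn to nonparametric bootstrapping as a distribution-free approach to interval estimation for the dbICC. For $b = 1,\ldots,B$ with suitably large $B$, let $i^*_{b1},\ldots,i^*_{bI}$ be a sample with replacement from $\{1,\ldots,I\}$; then the $b$th bootstrap sample consists of $X_{i^*_{bi}j}$ for $i = 1,\ldots,I$ and $j = 1,\ldots,J_{i^*_{bi}}$. The resulting ICC estimate is
(9)  $\widehat{\mathrm{dbICC}}{}^{*}_{b} = 1 - \widehat{\mathrm{MSD}}{}^{*b}_{w}/\widehat{\mathrm{MSD}}{}^{*b}_{b},$
where $\widehat{\mathrm{MSD}}{}^{*b}_{b}$, $\widehat{\mathrm{MSD}}{}^{*b}_{w}$ are bootstrap analogues of (7), (8):
(10)  $\widehat{\mathrm{MSD}}{}^{*b}_{b} = \dfrac{\sum_{i_1<i_2}\sum_{j_1}\sum_{j_2} d(X_{i^*_{bi_1}j_1}, X_{i^*_{bi_2}j_2})^2}{\sum_{i_1<i_2} J_{i^*_{bi_1}}J_{i^*_{bi_2}}}, \quad \widehat{\mathrm{MSD}}{}^{*b}_{w} = \dfrac{\sum_{i}\sum_{j_1<j_2} d(X_{i^*_{bi}j_1}, X_{i^*_{bi}j_2})^2}{\sum_{i} J_{i^*_{bi}}(J_{i^*_{bi}}-1)/2}.$
The interval from the $\alpha/2$ to the $1-\alpha/2$ quantile of the $\widehat{\mathrm{dbICC}}{}^{*}_{b}$'s can then be used as a $100(1-\alpha)$% confidence interval.
These bootstrap estimates $\widehat{\mathrm{dbICC}}{}^{*}_{b}$, however, suffer from negative bias (over and above the well-known negative bias of the classical ICC; Atenafu et al., 2012). Returning to the example in Figure 2, consider a bootstrap sample in which individuals 1 and 2 are duplicates, as are individuals 5 and 6 and individuals 7, 8 and 9. Then the blocks shown in the right subfigure in green nominally refer to between-individual differences, but in fact represent within-individual differences. Assuming $\mathrm{MSD}_w < \mathrm{MSD}_b$, counting these entries as between-individual will tend to result in underestimation of $\mathrm{MSD}_b$ and hence in negative bias in (9). The diagonal entries of these blocks are zero, thereby compounding the bias. To remove this bias, we can simply exclude such blocks from the summations in (10); formally, we replace each occurrence of $\sum_{i_1<i_2}$ with $\sum_{i_1<i_2:\, i^*_{bi_1} \neq i^*_{bi_2}}$.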
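The resampling scheme and the bias correction can be sketched as follows. This is an illustrative sketch under our own assumptions, not the authors' code: the names `bootstrap_dbicc` and `percentile_ci` are hypothetical, individuals are resampled with replacement, and with `correct=True` any nominally between-individual block formed by two copies of the same individual is skipped.

```python
import numpy as np

def bootstrap_dbicc(D, ids, n_boot=1000, correct=True, seed=0):
    """Bootstrap dbICC estimates obtained by resampling individuals with replacement."""
    D2 = np.asarray(D, dtype=float) ** 2
    ids = np.asarray(ids)
    subjects = np.unique(ids)
    rows = {s: np.flatnonzero(ids == s) for s in subjects}  # observation indices per individual
    rng = np.random.default_rng(seed)
    est = np.full(n_boot, np.nan)
    for b in range(n_boot):
        draw = rng.choice(subjects, size=len(subjects), replace=True)
        w_sum = w_n = b_sum = b_n = 0.0
        for a, s in enumerate(draw):
            r = rows[s]
            iu = np.triu_indices(len(r), k=1)
            w_sum += D2[np.ix_(r, r)][iu].sum()      # within-individual pairs
            w_n += len(iu[0])
            for t in draw[a + 1:]:
                if correct and t == s:
                    continue                          # duplicate individual: not a true between block
                cross = D2[np.ix_(r, rows[t])]
                b_sum += cross.sum()
                b_n += cross.size
        if w_n > 0 and b_n > 0:                       # otherwise leave NaN for this replicate
            est[b] = 1.0 - (w_sum / w_n) / (b_sum / b_n)
    return est

def percentile_ci(est, level=0.95):
    """Percentile interval from the bootstrap estimates, ignoring NaN replicates."""
    alpha = 1.0 - level
    return np.nanquantile(est, [alpha / 2, 1 - alpha / 2])
```

Running the function twice with the same seed, once with `correct=False` and once with `correct=True`, gives paired estimates from identical resamples, which makes the effect of excluding the duplicate blocks easy to inspect.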
3.3 A simulation study
Using multivariate data with Euclidean distance (the example from the end of Section 2), we conducted a simulation study to assess the accuracy of our point and interval estimates of the dbICC. Values were drawn from (1) where and with . By (5), the (population) dbICC is then , which equals 0.2, 0.5 and 0.8 for the above three values of . The number of subjects was set to 10, 40 and 70, and the number of measurements per subject fixed at 4. We took 500 replicates with each combination of the above values of and . Boxplots of the dbICC estimates are displayed in Figure 3. The classical negative bias of ICC estimates (Atenafu et al., 2012) is noticeable for when , but not for the other settings.
Next we considered bootstrap confidence intervals, with $B = 1200$ bootstrap samples, without and with the bias correction of the previous subsection. We performed 500 replicates for each combination of the same dbICC and $I$ values as above, again with $J$ fixed at 4. Boxplots of the median of the 1200 bootstrap estimates within each replicate are presented in Figure 4. For $I = 10$ and to some extent for $I = 40$, the correction yields a marked reduction in the observed negative bias. Accordingly, the coverage of 95% confidence intervals is improved by the correction, as can be seen in Table 1. As noted above, however, a small-sample negative bias (unrelated to bootstrapping) occurs for point estimates of dbICC as for the classical ICC, and hence the coverage remains quite poor for $I = 10$.
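Table 1: Coverage (%) of nominal 95% bootstrap confidence intervals, without (N) and with (C) the bias correction of Section 3.2. Each N/C pair of columns corresponds to one value of the number of subjects, and each row to one population dbICC value.

   N      C      N      C      N      C
  86.0   90.8   91.6   93.2   92.2   92.6
  84.8   90.6   91.4   92.0   94.0   94.6
  85.2   89.6   90.6   92.6   92.8   94.2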
4 Functional connectivity in the human brain
As noted in the introduction, the dbICC was originally conceived as a way to evaluate the reliability of functional connectivity measures. To demonstrate how dbICC can be so applied, here we reexamine part of a data set presented by Shehzad et al. (2009) in an early study of the test-retest reliability of resting-state functional connectivity. These authors, followed by others (e.g., Somandepalli et al., 2015; Choe et al., 2017), focused on ordinary ICC at each of a set of brain locations or connections. The dbICC, by contrast, offers an overall index of reliability for fMRI-based correlation matrices, viewed as gestalt measures of functional connectivity.
The data include BOLD time series of length 197, within each of 333 ROIs, for $I = 25$ individuals, with $J = 2$ such fMRI scans per individual; further details are provided in the Appendix. We then computed the distance between each pair of matrices among the 50 correlation matrices thus derived, using each of three distance measures:
(i) The $\ell_2$ distance (square root of sum of squared differences) between the two matrices.
(ii) The $\ell_1$ distance (sum of absolute differences) between the two matrices.
(iii) $\sqrt{1-r}$, where $r$ is the correlation between the lower triangular elements of one matrix and those of the other (correlation of correlations); the rationale for this distance is explained in the Web Appendix.
We stress that (i) and (ii) are not the distances induced by the matrix 2- and 1-norms, since here we are interested in entrywise differences as opposed to treating the matrices as operators. Distance (i) is, rather, the distance induced by the Frobenius norm, which in turn is induced by an inner product; consequently this distance fits with the generalized true score model presented below in Section 5.2. Since the matrices are treated here as vectors, dbICC based on distance (i) is equivalent to the I2C2 estimator of Shou et al. (2013) cited at the end of Section 2, although these authors focused on MRI-based images as opposed to regional connectivity matrices.
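The three entrywise distances between two correlation matrices can be sketched as below. The function names are hypothetical, and the form used for the third distance, $\sqrt{1-r}$, is our assumption about the correlation-of-correlations distance rather than a formula confirmed by the text above.

```python
import numpy as np

def lower_tri(R):
    """Vector of strictly lower triangular entries of a square matrix."""
    R = np.asarray(R, dtype=float)
    return R[np.tril_indices_from(R, k=-1)]

def dist_l2(R1, R2):
    """Entrywise l2 (Frobenius) distance: square root of the sum of squared differences."""
    diff = np.asarray(R1, dtype=float) - np.asarray(R2, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def dist_l1(R1, R2):
    """Entrywise l1 distance: sum of absolute differences."""
    diff = np.asarray(R1, dtype=float) - np.asarray(R2, dtype=float)
    return float(np.sum(np.abs(diff)))

def dist_corcor(R1, R2):
    """Assumed form sqrt(1 - r), with r the correlation of the lower triangular entries."""
    r = np.corrcoef(lower_tri(R1), lower_tri(R2))[0, 1]
    return float(np.sqrt(max(0.0, 1.0 - r)))  # guard against tiny negative values from rounding
```

Any of these functions can be applied to every pair of matrices to fill a distance matrix, from which the dbICC is then estimated.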
The dbICC estimates (6) based on distances (i)–(iii), along with 95% bootstrap CIs, are given in the first row of Table 2. While fairly consistent with the results of Shou et al. (2013), these reliabilities are very low by classical standards.
We also examined two subsets of the 333 ROIs: 41 ROIs constituting the default mode network of the brain (DMN; Raichle et al., 2001), and 39 ROIs making up the brain’s visual network. Correlations among the ROIs within each of these networks tend to be high, as illustrated in Figure 1. Hence it comes as no surprise that dbICC values within each of these two networks, presented in the second and third rows of Table 2, are markedly higher than for the complete set of ROIs. For each set of ROIs, the dbICC values are quite consistent across the three distances.
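Table 2: dbICC estimates, with 95% bootstrap confidence intervals, for three sets of ROIs (rows) and the three distances (columns).

                       Distance (i)            Distance (ii)           Distance (iii)
All 333 ROIs           0.378 (0.329, 0.424)    0.382 (0.335, 0.426)    0.382 (0.338, 0.426)
Default mode network   0.488 (0.403, 0.562)    0.493 (0.404, 0.570)    0.487 (0.414, 0.555)
Visual network         0.434 (0.362, 0.508)    0.435 (0.354, 0.515)    0.451 (0.401, 0.500)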
A likely explanation for the relatively low dbICCs for the complete set of 333 ROIs is that many pairs of regions are essentially uncorrelated and thus their correlation estimates largely reflect noise. This suggests that it might be possible to boost dbICC by thresholding small correlations. Figure 5 shows the effect on dbICC of soft-thresholding. Somewhat contrary to our expectation, soft-thresholding generally increased dbICC only slightly at best, and often decreased it.
5 Generalizing the Spearman-Brown formula
Is there a way to improve upon the low reliabilities found for the functional connectivity data? A general approach to boosting reliability, suggested by classical psychometrics, is to take more measurements: for example, to average over replicates of a measure, or to increase the number of questions on a test. A well-known relation between the number of measurements and the reliability appeared in Spearman (1910) and, in a more familiar form, in Brown (1910). In this section we extend this relation to the distance-based ICC, and in Section 6 we reexamine the fMRI data results in light of our generalization of the Spearman-Brown (SB) formula.
5.1 Measurement intensity and its effect on reliability
The SB formula states that averaging each score over $m$ replicates transforms the classical ICC from $\rho$ to $m\rho/[1+(m-1)\rho]$. If we let $\rho_1$, $\rho_m$ respectively denote the raw ICC and the ICC based on $m$ replicates, the formula can be written as $\rho_m = m\rho_1/[1+(m-1)\rho_1]$, which with some rearrangement becomes
$\rho_m/(1-\rho_m) = m\,\rho_1/(1-\rho_1),$
or alternatively
(11)  $\rho_m/(1-\rho_m) \propto m.$
Lord and Novick (1968) refer to $\rho/(1-\rho)$ as the signal-to-noise ratio (SNR), and accordingly, (11) may be paraphrased as: the signal-to-noise ratio is proportional to the number of measurements being averaged.
Averaging over real-valued measurements can be viewed as just one example of a broader notion of increasing measurement intensity and thereby boosting reliability. Other instances of measurement intensity include:
(E1) An estimated covariance or correlation matrix based on a sample of $m$ multivariate observations. For functional connectivity matrices as considered above in Section 4, $m$ would be the number of time points recorded by fMRI.
(E2) A curve estimate obtained by penalized spline smoothing with $m$ observations.
Our goal in the next subsection is to derive a distance-based SB relation, i.e., an analogue of (11) in which $m$ denotes measurement intensity and $\rho_m$ is the resulting dbICC. To do this, we need a more general formulation of the true score model (1).
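The classical SB relation at the start of this subsection is easy to check numerically. The sketch below uses hypothetical helper names (`spearman_brown`, `snr`) and confirms that, in the classical setting, averaging over $m$ replicates multiplies the signal-to-noise ratio $\rho/(1-\rho)$ by $m$.

```python
def spearman_brown(rho, m):
    """Classical ICC of the average of m replicates, given the single-measurement ICC rho."""
    return m * rho / (1.0 + (m - 1.0) * rho)

def snr(rho):
    """Signal-to-noise ratio rho / (1 - rho), defined for 0 <= rho < 1."""
    return rho / (1.0 - rho)

rho1 = 0.4
for m in (1, 2, 3, 10):
    rho_m = spearman_brown(rho1, m)
    # The last two printed columns agree: SNR of the averaged score equals m times the raw SNR
    print(m, round(rho_m, 4), round(snr(rho_m), 4), round(m * snr(rho1), 4))
```

As a purely arithmetic illustration of the classical formula, raising a reliability of 0.378 to 0.70 would require the SNR to grow by a factor of `snr(0.70) / snr(0.378)`, roughly 3.8; this presumes that the classical proportionality to $m$ holds.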
5.2 A true score model for general Hilbert spaces
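The classical setting of real-valued measures, as well as examples (E1) and (E2), can all be viewed as instances of a general setup in which the observations are of the form (1), but the $T_i$'s are a random sample of true scores in a Hilbert space $\mathcal{H}$, while the $\epsilon_{ij}$'s are random measurement errors in $\mathcal{H}$. We define distance in $\mathcal{H}$ by $d(x,y) = \|x-y\|$, where $\|\cdot\|$ is the norm induced by the inner product $\langle\cdot,\cdot\rangle$ on $\mathcal{H}$. Define
(12)  $\mathrm{MSD}_T = E\|T_{i_1} - T_{i_2}\|^2$
and
(13)  $\mathrm{MSD}_\epsilon(m) = E_m\|\epsilon_{i_1j_1} - \epsilon_{i_2j_2}\|^2$
for $i_1 \neq i_2$ and for $(i_1,j_1) \neq (i_2,j_2)$, where $E_m$ denotes expectation for measurement intensity equal to $m$. Note that the measurement intensity affects only the expected distance between errors $\epsilon_{ij}$, but not that between scores $T_i$. We make two assumptions, of which the first is implicit in (13):
(a1) The expectation in (13) is the same for $i_1 = i_2$ versus for $i_1 \neq i_2$.
(a2) For all $(i_1,j_1) \neq (i_2,j_2)$,
(14)  $E\langle T_{i_1} - T_{i_2},\ \epsilon_{i_1j_1} - \epsilon_{i_2j_2}\rangle = 0.$
Then
$\mathrm{MSD}_b = \mathrm{MSD}_T + \mathrm{MSD}_\epsilon(m), \quad \mathrm{MSD}_w = \mathrm{MSD}_\epsilon(m),$
and therefore
(15)  $\rho_m/(1-\rho_m) = \mathrm{MSD}_T/\mathrm{MSD}_\epsilon(m).$
In the classical case where $X_{ij}$ is the mean of $m$ measurements, $\epsilon_{ij}$ is the mean of $m$ independent errors with mean 0 and common variance, so that
$\mathrm{MSD}_\epsilon(m) \propto 1/m;$
plugging this into (15) leads directly to the rearranged SB formula (11). In other cases, such as (E2), $\mathrm{MSD}_\epsilon(m)$ is not proportional to $1/m$, and hence the generalized SB formula (15) does not reduce to (11).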
6 Applying the generalized SB formula to the fMRI data
Our goal in this section is to study the implications of the generalized SB formula (15) for correlation matrices such as those used in Section 4 as measures of functional connectivity. In Section 6.1 we show that, in the simpler setting of covariance matrix estimation, the relationship between measurement intensity and reliability is essentially the same as in the classical case of scalar measures. In Sections 6.2 and 6.3, we investigate the extent of agreement between what is expected theoretically and what is observed with simulated and real data.
6.1 An SB formula for covariance matrix estimation
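Let $\Sigma_1,\ldots,\Sigma_I$ be a random sample of $p \times p$ covariance matrices, and for $i = 1,\ldots,I$ and $j = 1,\ldots,J_i$, let $S_{ij}$ be sample covariance matrices, each based on $m$ independent and identically distributed (IID) observations from a $p$-variate normal distribution with covariance matrix $\Sigma_i$. These belong to the Hilbert space of real symmetric $p \times p$ matrices, equipped with inner product $\langle A, B\rangle = \mathrm{tr}(AB)$; the norm induced by this inner product is the Frobenius (entrywise $\ell_2$) norm used in the fMRI example of Section 4. Note that here, unlike in the classical true score model, $T_i = \Sigma_i$ and $\epsilon_{ij} = S_{ij} - \Sigma_i$ are not independent since $\epsilon_{ij}$ must be such that $T_i + \epsilon_{ij}$ is nonnegative definite. But as shown in the Appendix, assumptions (a1) and (a2) of Section 5.2 hold, and consequently
(16)  $\mathrm{MSD}_\epsilon(m) = \dfrac{2}{m-1}\,E\left[\mathrm{tr}(\Sigma^2) + (\mathrm{tr}\,\Sigma)^2\right].$
Thus by (15),
(17)  $\rho_m/(1-\rho_m) \propto m-1;$
this is almost exactly the classical SB relation (11), but with $m-1$ in place of $m$.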
6.2 Log-log plots with simulated data
Suppose that, for a given collection of covariance matrices $\Sigma_1,\ldots,\Sigma_I$, we repeatedly generate sets of sample covariances as in Section 6.1, but with varying values of $m$, and obtain a dbICC estimate $\hat\rho_m$, based on the $\ell_2$ distance, for each $m$. Then the relation (17) suggests that the points
(18)  $\left(\log(m-1),\ \log[\hat\rho_m/(1-\hat\rho_m)]\right)$
should lie approximately along a line with slope 1. To test this suggestion with simulated data resembling the fMRI data analyzed in Sections 4 and 6.3, we followed the above recipe with
• , and ;
• $\Sigma_i$ ($i = 1,\ldots,I$) taken to be the mean of the two sample covariance matrices from the $i$th participant's two fMRI scans; and
• a range of $m$ values from 25 to 197, approximately equally spaced on the log scale.
A plot of the resulting points (18) appears in the left panel of Figure 6 (black dots), and the best-fit line through these points has slope 0.997 with standard error 0.010, in agreement with the theoretical slope 1.
Many aspects of the fMRI data reliability analysis in Section 4 are not captured by the above simulation setup. Two of the most prominent disparities are that for the real data, (i) we computed dbICC for correlation, rather than covariance, matrices, and (ii) the multivariate observations are autocorrelated rather than independent (see Arbabshirani et al. (2014) and Zhu and Cribben (2018) regarding the impact of such autocorrelation).
The simulation study was expanded to partially address these discrepancies. Using a standard implementation (Barbosa, 2012) for vector autoregressive models of order 1 (VAR(1); Lütkepohl, 2005), we conducted further simulations in which the $j$th multivariate time series for the $i$th individual was generated from a VAR(1) model with independent zero-mean innovations. The lag-1 autocorrelation was set to 0.6 and 0.9; these values are consistent with AR(1) models fitted to individual ROIs in our fMRI data. The resulting points (18), with $\hat\rho_m$ derived from sample covariance matrices, are displayed in the left panel of Figure 6. The right panel is analogous, but here $\hat\rho_m$ is derived from sample correlation matrices. A comparison of the two panels indicates that, for given autocorrelation settings, both the estimated SNR and its dependence on $m$ are very similar for covariance versus correlation matrix estimation. Autocorrelation is seen to reduce reliability and thus to shift the SNR markedly downward. Moreover, autocorrelation seems to attenuate the linear relationship between $m$ and SNR: whereas in the IID setting the slope is 1.018 for the sample correlation matrix, again very close to the theoretical value 1, the slopes are smaller with autocorrelation 0.6 (0.986 for covariance, 0.960 for correlation) and even smaller for autocorrelation 0.9 (0.736 for covariance, 0.687 for correlation). In the Web Appendix we present plots that are analogous to Figure 6, but based on distances (ii) and (iii) of Section 4, and we report the intercepts and slopes of the best-fit lines for all cases.
6.3 Reliability based on subsets of the fMRI time series
Next we constructed log-log plots as above but based on subsets of the real fMRI time series of Section 4 rather than on simulated data. For values of $m$ ranging from 25 to the full time series length 197, we took the middle $m$ observations from each of the fMRI time series, and thus computed correlation matrices using the same three sets of ROIs as in Section 4: all 333 ROIs proposed by Gordon et al. (2016), the default mode network, and the visual network. Log-log plots for the resulting dbICC values appear in the right panel of Figure 6. For smaller $m$ these plots are quite nonlinear and distinct from each other, but for larger $m$ they each appear to stabilize with a linear pattern that is roughly parallel to the best-fit line for the simulations with lag-1 autocorrelation 0.9.
This degree of agreement with the simulation results of Section 6.2 is probably as much as can be expected, given the significant discrepancies between the settings of the simulated and real-data analyses, which include the following. (i) The simulations for different $m$ are independent, whereas with the real data, for increasing $m$ we consider a nested sequence of increasingly large subsets of the same time series. (ii) The real time series may not be multivariate normal and presumably has a more complex pattern of autocorrelations and cross-correlations than the simulated data.
At any rate it seems clear that the theoretical log-log plot slope of 1 cannot be expected to characterize the reliability improvement attainable via longer fMRI time series. Our results offer hope that a slope around 0.7 might be attained, but at least two further caveats are in order. One is that we cannot extrapolate beyond $m = 197$, the full time series length for our data. A second, subtler caveat concerns the true score model (1), in the specific form outlined in Section 6.1. That model assumes that for each $i$, the two sample covariance matrices are estimates of a common true covariance $\Sigma_i$. But if in fact the underlying covariance matrix differs between the two fMRI scans for at least some of the participants, this is an additional source of within-subject distance that is not removed by increasing the time series length $m$, and thus the SNR will tend to level off rather than increasing linearly with $m$. In summary, while longer fMRI scans might make correlation matrices more reliable as measures of functional connectivity, the improvement would likely be less dramatic than the results reported here might lead us to expect.
7 Further application and extension of the SB formula
Log-log plots like those in Figure 6 are a broadly applicable tool for examining the relationship between measurement intensity and reliability. As discussed in the Web Appendix, penalized spline smoothing (example (E2) of Section 5.1) yields a different rate of decay of the error term with $m$. Thus, arguing as in Section 6.2, a linear model fit to the corresponding log-log points should have a slope determined by that rate, a prediction that is borne out with simulated data.
Some distances, such as the dynamic time warping distance between signatures considered in the Web Appendix, do not arise from the true score model (1), even in the generalized (Hilbert space-valued) form of Section 5.2. Whether or not the true score model applies, the dbICC (3) satisfies
(19)  $\dfrac{\rho}{1-\rho} = \dfrac{\mathrm{MSD}_b - \mathrm{MSD}_w}{\mathrm{MSD}_w}.$
The key to the derivation of (15) is simply that, by (12)–(14),
• $\mathrm{MSD}_w = \mathrm{MSD}_\epsilon(m)$,
• $\mathrm{MSD}_b - \mathrm{MSD}_w = \mathrm{MSD}_T$, which does not depend on $m$.
The same argument works more generally (i.e., not only in Hilbert spaces): as long as $\mathrm{MSD}_w$ can be written as a function $f(m)$ of $m$ whereas $\mathrm{MSD}_b - \mathrm{MSD}_w$ does not change with $m$, it follows from (19) that
(20)  $\rho_m/(1-\rho_m) \propto 1/f(m),$
generalizing (15), which is itself a generalization of (11).
Log-log plots might be used in this more general setting to estimate the effect of measurement intensity on reliability, as opposed to confirming a theoretical relationship. By (20), if it is expected that $f(m) \propto m^{-a}$ for some unknown $a$, then we can regress values of $\log[\hat\rho_m/(1-\hat\rho_m)]$ on the corresponding values of $\log m$, and the resulting slope serves as an estimate of $a$. A similar approach is used to estimate the Hurst exponent of a long memory process (Beran, 1994).
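The regression just described can be sketched as follows. The helper `loglog_slope` is a hypothetical name, and the sketch assumes that dbICC estimates strictly between 0 and 1 are available at several measurement intensities; it simply fits a least-squares line to the log signal-to-noise ratio against $\log m$.

```python
import numpy as np

def loglog_slope(m, rho_hat):
    """Least-squares slope and intercept of log[rho/(1 - rho)] regressed on log m.

    m       : measurement intensities (positive numbers)
    rho_hat : dbICC estimates at those intensities, each strictly between 0 and 1
    """
    m = np.asarray(m, dtype=float)
    rho_hat = np.asarray(rho_hat, dtype=float)
    log_snr = np.log(rho_hat / (1.0 - rho_hat))
    slope, intercept = np.polyfit(np.log(m), log_snr, deg=1)
    return slope, intercept

# Synthetic check: if the SNR grows exactly like m**0.7, the fitted slope recovers 0.7
m = np.array([25.0, 50.0, 100.0, 197.0])
snr = 0.05 * m ** 0.7
slope, intercept = loglog_slope(m, snr / (1.0 + snr))
print(slope, intercept)
```

With real estimates the points will scatter around the line, so a standard error for the slope (from any regression routine) is a useful complement to the point estimate.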
8 Discussion
In this paper we have redefined the intraclass correlation coefficient in terms of distances, and thereby extended this reliability index to arbitrary data objects for which a distance is defined. The proposed distance-based ICC leads to two extensions of the SB formula, namely (15) for Hilbert space-valued data including covariance matrices, and (20) for more general data objects.
In an early paper on extending the ICC to multivariate data, Fleiss (1966) wrote that a classical (univariate) ICC value less than about 0.70 “is, for most purposes, taken to indicate insufficient reliability.” The much lower dbICC values that we report for functional connectivity data, along with similar results reported by others (e.g., Shou et al., 2013), are a sobering indication that in some cases, as technology has advanced, the reliability of complex new measures has retreated. This might help to explain the recently much-discussed difficulties surrounding scientific reproducibility, a desideratum that is closely related to reliability (Yu, 2013).
While our presentation has focused on test-retest data, the dbICC might also be applied to assess the reliability of results obtained by algorithms, such as bootstrapping, that have a stochastic component (cf. Philipp et al., 2018).
Whereas we have developed a distance-based analogue of the intraclass correlation coefficient, the distance correlation of Székely et al. (2007) is comparable to interclass correlation coefficients. Extending ideas from distance correlation research to the intraclass setting may be an interesting avenue for future work.
A package for R (R Core Team, 2019) implementing the methods of this paper is available at https://github.com/wtagr/dbicc.
Acknowledgements
The authors thank the Co-Editor, Mark Brewer, the Associate Editor and the reviewers for very helpful and thoughtful feedback, and thank Eva Petkova and Don Klein for calling attention to the need for reliable measurement in the early days of resting-state fMRI connectivity research. The work of M. Xu and P. T. Reiss was supported by Israel Science Foundation grants 1777/16 and 1076/19. The work of I. Cribben was supported by Natural Sciences and Engineering Research Council (Canada) grant RGPIN-2018-06638 and the Xerox Faculty Fellowship, Alberta School of Business.
Supplementary Materials
The Web Appendices referenced in Sections 1, 4, 6.2 and 7 are available with this paper at the Biometrics website on Wiley Online Library.
References
Alonso, A., Laenen, A., Molenberghs, G., Geys, H., and Vangeneugden, T. (2010). A unified approach to multi-item reliability. Biometrics 66, 1061–1068.
Arbabshirani, M. R., Damaraju, E., Phlypo, R., Plis, S., Allen, E., Ma, S., Mathalon, D., Preda, A., Vaidya, J. G., Adali, T., and Calhoun, V. D. (2014). Impact of autocorrelation on functional connectivity. NeuroImage 102, 294–308.
Atenafu, E. G., Hamid, J. S., To, T., Willan, A. R., Feldman, B. M., and Beyene, J. (2012). Bias-corrected estimator for intraclass correlation coefficient in the balanced one-way random effects model. BMC Medical Research Methodology 12, 126.
Barbosa, S. M. (2012). mAr: Multivariate AutoRegressive analysis. R package version 1.1-2.
Beran, J. (1994). Statistics for Long-Memory Processes. CRC Press, Boca Raton, Florida.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology 3, 296–322.
Choe, A. S., Nebel, M. B., Barber, A. D., Cohen, J. R., Xu, Y., Pekar, J. J., Caffo, B., and Lindquist, M. A. (2017). Comparing test-retest reliability of dynamic functional connectivity methods. NeuroImage 158, 155–175.
Cranford, J. A., Shrout, P. E., Iida, M., Rafaeli, E., Yip, T., and Bolger, N. (2006). A procedure for evaluating sensitivity to within-person change: Can mood measures in diary studies detect change reliably? Personality and Social Psychology Bulletin 32, 917–929.
Fleiss, J. L. (1966). Assessing the accuracy of multivariate observations. Journal of the American Statistical Association 61, 403–412.
Fleiss, J. L. (1986). Design and Analysis of Clinical Experiments. John Wiley & Sons, New York.
Fujikoshi, Y., Ulyanov, V. V., and Shimizu, R. (2010). Multivariate Statistics: High-Dimensional and Large-Sample Approximations. John Wiley & Sons, Hoboken, New Jersey.
Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., and Petersen, S. E. (2016). Generation and evaluation of a cortical area parcellation from resting-state correlations. Cerebral Cortex 26, 288–303.
Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison-Wesley, Reading, Massachusetts.
Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer Science & Business Media.
Mair, P. (2018). Modern Psychometrics with R. Springer, Cham, Switzerland.
McArdle, B. H. and Anderson, M. J. (2001). Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82, 290–297.
Mielke, P. W. and Berry, K. J. (2007). Permutation Methods: A Distance Function Approach. Springer, New York.
Philipp, M., Rusch, T., Hornik, K., and Strobl, C. (2018). Measuring the stability of results from supervised statistical learning. Journal of Computational and Graphical Statistics 27, 685–700.
R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., and Shulman, G. L. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences 98, 676–682.
Reiss, P. T., Stevens, M. H. H., Shehzad, Z., Petkova, E., and Milham, M. P. (2010). On distance-based permutation tests for between-group comparisons. Biometrics 66, 636–643.
Shehzad, Z., Kelly, A. C., Reiss, P. T., Gee, D. G., Gotimer, K., Uddin, L. Q., Lee, S. H., Margulies, D. S., Roy, A. K., Biswal, B. B., Petkova, E., Castellanos, F. X., and Milham, M. P. (2009). The resting brain: unconstrained yet reliable. Cerebral Cortex 19, 2209–2229.
Shou, H., Eloyan, A., Lee, S., Zipunnikov, V., Crainiceanu, A., Nebel, M., Caffo, B., Lindquist, M., and Crainiceanu, C. (2013). Quantifying the reliability of image replication studies: the image intraclass correlation coefficient (I2C2). Cognitive, Affective, & Behavioral Neuroscience 13, 714–724.
Somandepalli, K., Kelly, C., Reiss, P. T., Zuo, X.-N., Craddock, R. C., Yan, C.-G., Petkova, E., Castellanos, F. X., Milham, M. P., and Di Martino, A. (2015). Short-term test–retest reliability of resting state fMRI metrics in children with and without attention-deficit/hyperactivity disorder. Developmental Cognitive Neuroscience 15, 83–93.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology 3, 271–295.
Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics 35, 2769–2794.
Yan, C.-G., Craddock, R. C., Zuo, X.-N., Zang, Y.-F., and Milham, M. P. (2013). Standardizing the intrinsic brain: towards robust measurement of inter-individual variation in 1000 functional connectomes. NeuroImage 80, 246–262.
Yu, B. (2013). Stability. Bernoulli 19, 1484–1500.
Zhu, Y. and Cribben, I. (2018). Sparse graphical models for functional connectivity networks: best methods and the autocorrelation issue. Brain Connectivity 8, 139–165.
Appendix A
A.1 fMRI data description and preprocessing
The resting-state fMRI data set, downloaded from http://www.nitrc.org/projects/nyu_trt, includes 25 participants (mean age 29.44 ± 8.64 years, 10 males) scanned at New York University. A Siemens Allegra 3.0-Tesla scanner was used to obtain three resting-state scans for each participant, though for this analysis, we considered only the second and third scans, which were less than one hour apart. Each scan consisted of 197 contiguous EPI functional volumes with time repetition (TR) = 2000 ms; time echo (TE) = 25 ms; flip angle (FA) = ; number of slices = 39; matrix = ; field of view (FOV) = 192 mm; voxel size mm. During each scan, the participants were asked to relax and remain still with eyes open. For spatial normalization and localization, a high-resolution T1-weighted magnetization prepared gradient echo sequence was obtained (MPRAGE, TR = 2500 ms; TE = 4.35 ms; inversion time = 900 ms; FA = ; number of slices = 176; FOV = 256 mm).
The data were preprocessed using the FSL (http://www.fmrib.ox.ac.uk) and AFNI (http://afni.nimh.nih.gov/afni) software packages. The images were (i) motion corrected using FSL’s mcflirt (rigid body transform; cost function normalized correlation; reference volume the middle volume) and then (ii) normalized into the Montreal Neurological Institute space using FSL’s flirt (affine transform; cost function mutual information). (iii) FSL’s fast was then used to obtain a probabilistic segmentation of the brain to acquire white matter and cerebrospinal fluid (CSF) probabilistic maps, thresholded at 0.99. (iv) AFNI’s 3dDetrend was then used to remove the nuisance signals, namely the six motion parameters, white matter and CSF signals, and the global signal. (v) Finally, using FSL’s fslmaths, the volumes were spatially smoothed using a Gaussian kernel with FWHM = 6 mm.
The ROIs for our connectivity analysis are derived from the work of Gordon et al. (2016), who parcellated the cortical surface into 333 areas within which homogeneous connectivity patterns are observed. Time courses for these 333 ROIs were obtained for each subject by averaging over all of the voxels within each region. Each regional time course was then detrended and standardized to unit variance, and then we applied a 4th-order Butterworth filter with passband 0.01–0.10 Hertz.
A.2 (a1), (a2) and for sample covariance matrices
Sample covariance matrices of multivariate normal samples are a special case of the true score model of Section 5.2 in which, for each , , a covariance matrix, and for each ,
(21) 
where is the sample covariance matrix of an IID random sample . Here we verify assumptions (a1) and (a2) of Section 5.2 for this case, and derive expression (16) for .
By (21), in (13) are independent mean-zero matrices, implying that
For , since are independent mean-zero matrices. On the other hand, if then are independent and of mean zero, conditionally on , and thus again
Hence the expectation defining does not depend on whether or not , i.e., (a1) holds; and
(22) 
for as in (21).
For (a2), it suffices to show that . This follows since
while since is independent of and of mean zero.
By a standard result in multivariate analysis, conditionally on , has a Wishart distribution with degrees of freedom; thus by Theorem 2.2.6 of Fujikoshi et al. (2010),
These results lead to
Combining this with (22) gives
where the expectation is with respect to the distribution of the true covariance matrices . This confirms (16).