I Background
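Let us begin with a simple case of a one-factor independent groups design. Consider a set of data $y_{ij}$, on which we impose a linear model as follows:
$$y_{ij} = \mu + \alpha_j + \varepsilon_{ij}; \quad i = 1, \dots, n; \quad j = 1, \dots, k,$$
where $\mu$ represents the grand mean, $\alpha_j$ represents the treatment effect associated with group $j$, and $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)$. In all, we have $N = nk$ independent observations. We define two hypotheses:
$$\mathcal{H}_0: \alpha_j = 0 \text{ for } j = 1, \dots, k$$
$$\mathcal{H}_1: \alpha_j \neq 0 \text{ for some } j$$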
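Recall that for $\mathcal{H}_0$ and $\mathcal{H}_1$, the Bayes factor (Kass and Raftery, 1995), denoted $\text{BF}_{01}$, is defined as the ratio of marginal likelihoods for $\mathcal{H}_0$ and $\mathcal{H}_1$, respectively. That is,
$$\text{BF}_{01} = \frac{p(\mathbf{y} \mid \mathcal{H}_0)}{p(\mathbf{y} \mid \mathcal{H}_1)}.$$
This ratio indicates the extent to which the prior odds for $\mathcal{H}_0$ over $\mathcal{H}_1$ are updated after observing data. By Bayes' theorem, the posterior odds equal the Bayes factor multiplied by the prior odds:
$$\frac{p(\mathcal{H}_0 \mid \mathbf{y})}{p(\mathcal{H}_1 \mid \mathbf{y})} = \text{BF}_{01} \cdot \frac{p(\mathcal{H}_0)}{p(\mathcal{H}_1)}.$$
If both hypotheses are assumed equally likely a priori, the prior odds equal 1. Since $p(\mathcal{H}_0 \mid \mathbf{y}) + p(\mathcal{H}_1 \mid \mathbf{y}) = 1$, the posterior model probabilities are then
$$p(\mathcal{H}_0 \mid \mathbf{y}) = \frac{\text{BF}_{01}}{1 + \text{BF}_{01}}, \qquad p(\mathcal{H}_1 \mid \mathbf{y}) = \frac{\text{BF}_{10}}{1 + \text{BF}_{10}},$$
where $\text{BF}_{10} = 1/\text{BF}_{01}$.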
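In Faulkenberry (2018), it was shown that for any independent-groups design, one can use the results of an analysis of variance to compute an approximation of $\text{BF}_{01}$ that is based on a unit information prior (Wagenmakers, 2007; Masson, 2011). Specifically,
$$\text{BF}_{01} \approx \sqrt{N^{k-1}\left(1 + \frac{F(k-1)}{N-k}\right)^{-N}}, \tag{1}$$
where $F$ is the $F$ ratio from a standard analysis of variance applied to these data, $N$ is the total number of independent observations, and $k$ is the number of groups.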
As an example, consider a hypothetical dataset containing $k = 4$ groups of $n = 25$ observations each (for a total of $N = 100$ independent observations). Suppose that an ANOVA produces $F(3, 96) = 2.76$, $p = 0.046$. This result would be considered “significant” by conventional standards, and traditional practice would dictate that we reject $\mathcal{H}_0$ in favor of $\mathcal{H}_1$. But is this result really evidential for $\mathcal{H}_1$? We can apply Equation 1 as follows:
$$\text{BF}_{01} \approx \sqrt{N^{k-1}\left(1 + \frac{F(k-1)}{N-k}\right)^{-N}} = \sqrt{100^{3}\left(1 + \frac{2.76 \cdot 3}{96}\right)^{-100}} \approx 15.98.$$
This result indicates quite the opposite: by definition of the Bayes factor, the observed data are almost 16 times more likely under $\mathcal{H}_0$ than under $\mathcal{H}_1$. The appearance of such contradictory conclusions from two different testing frameworks is a classic result known as Lindley’s paradox (Lindley, 1957).
II The BIC approximation for repeated measures
Our goal now is to extend Equation 1 to the case where we have an experimental design with repeated measurements. For context, consider an experiment where $k$ measurements are taken from each of $n$ subjects. We then have a total of $nk$ observations, but they are no longer independent measurements. Assume a linear mixed model structure on the observations:
$$y_{ij} = \mu + \alpha_j + \pi_i + \varepsilon_{ij}; \quad i = 1, \dots, n; \quad j = 1, \dots, k,$$
where $\mu$ represents the grand mean, $\alpha_j$ represents the treatment effect associated with group $j$, $\pi_i$ represents the effect of subject $i$, and $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)$. Due to the correlated structure of these data, we have $n(k-1)$ independent observations. We will define $\mathcal{H}_0$ and $\mathcal{H}_1$ as above.
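Prior work of Wagenmakers (2007) has demonstrated that $\text{BF}_{01}$ can be approximated as $\exp(\Delta\text{BIC}_{10}/2)$, where
$$\Delta\text{BIC}_{10} = N \ln\left(\frac{SSE_1}{SSE_0}\right) + (k_1 - k_0)\ln(N).$$
Here, $N$ is equal to the number of independent observations; as noted above, this is equal to $n(k-1)$. $SSE_1$ represents the variability left unexplained by $\mathcal{H}_1$; for an ANOVA, this is equal to $SS_{\text{residual}}$. $SSE_0$ represents the variability left unexplained by $\mathcal{H}_0$; for an ANOVA, this is equal to the sum of $SS_{\text{treatment}}$ and $SS_{\text{residual}}$. Finally, $k_1 - k_0$ is equal to the difference in the number of parameters between $\mathcal{H}_1$ and $\mathcal{H}_0$; this is equal to $k - 1$.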
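We are now ready to derive a formula for $\text{BF}_{01}$. First, we will re-express $SSE_1/SSE_0$ in terms of $F$. Because $F = \dfrac{SS_{\text{treatment}}/(k-1)}{SS_{\text{residual}}/[(n-1)(k-1)]}$, we have $SS_{\text{treatment}}/SS_{\text{residual}} = F/(n-1)$, and so
$$\frac{SSE_1}{SSE_0} = \frac{SS_{\text{residual}}}{SS_{\text{treatment}} + SS_{\text{residual}}} = \frac{1}{1 + \dfrac{SS_{\text{treatment}}}{SS_{\text{residual}}}} = \frac{1}{1 + \dfrac{F}{n-1}} = \frac{n-1}{n-1+F}.$$
Thus, we can write
$$\text{BF}_{01} \approx \exp\left(\frac{\Delta\text{BIC}_{10}}{2}\right) = \exp\left[\frac{n(k-1)}{2}\ln\left(\frac{n-1}{n-1+F}\right) + \frac{k-1}{2}\ln\big(n(k-1)\big)\right] = \sqrt{(nk-n)^{k-1}\left(\frac{n-1}{n-1+F}\right)^{nk-n}}.$$
If we invert the term containing $F$ and divide $n-1$ into the resulting numerator, we get the following formula:
$$\text{BF}_{01} \approx \sqrt{(nk-n)^{k-1}\left(1 + \frac{F}{n-1}\right)^{n-nk}}, \tag{2}$$
where $n$ equals the number of subjects and $k$ equals the number of repeated measurements per subject.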
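As a sanity check on the algebra, one can confirm numerically that the $\Delta\text{BIC}_{10}$ route and the closed form of Equation 2 agree. The sketch below is an illustration only: the function names are hypothetical, and the inputs ($F = 4.2$, $n = 30$, $k = 3$) are arbitrary assumed values, not data from any study.

```python
import math


def delta_bic10(F, n, k):
    """Delta BIC_10 for a one-factor repeated-measures ANOVA, written via F."""
    N = n * (k - 1)                    # number of independent observations
    sse_ratio = (n - 1) / (n - 1 + F)  # SSE1 / SSE0 re-expressed through F
    return N * math.log(sse_ratio) + (k - 1) * math.log(N)


def bf01_repeated(F, n, k):
    """Closed form of Equation 2."""
    return math.sqrt((n * k - n) ** (k - 1)
                     * (1 + F / (n - 1)) ** (n - n * k))


# Arbitrary assumed inputs, used only to exercise both routes.
F, n, k = 4.2, 30, 3
via_bic = math.exp(delta_bic10(F, n, k) / 2)
closed_form = bf01_repeated(F, n, k)
print(via_bic, closed_form)
```

Both printed values should coincide up to floating-point error, since the closed form is just the exponential of half of $\Delta\text{BIC}_{10}$.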
II.1 Some examples
We can now apply Equation 2 to compute Bayes factors for a couple of examples. The examples below are based on data from Faulkenberry et al. (2018). In this experiment, subjects were presented with pairs of single-digit numerals and asked to choose the numeral that was presented in the larger font size. For each of $n = 23$ subjects, median response times were calculated for each of $k = 2$ conditions: congruent trials and incongruent trials. Congruent trials were defined as those in which the physically larger digit was also the numerically larger digit (e.g., a small 2 paired with a large 8). Incongruent trials were defined such that the physically larger digit was numerically smaller (e.g., a large 2 paired with a small 8). The resulting ANOVA summary table is depicted in Table 1.
Table 1: ANOVA summary table for the median response time data.
Source  SS  df  MS  F
Subjects  285639  22  12984  
Treatment  45360  1  45360  39.63
Residual  25182  22  1145  
Total  356181  45  
Applying Equation 2 with $n = 23$, $k = 2$, and $F = 39.63$ gives us the following:
$$\text{BF}_{01} \approx \sqrt{(nk-n)^{k-1}\left(1 + \frac{F}{n-1}\right)^{n-nk}} = \sqrt{23\left(1 + \frac{39.63}{22}\right)^{-23}} \approx 3.44 \times 10^{-5}.$$
The resulting Bayes factor displays quite powerful evidence against $\mathcal{H}_0$; if we cast the Bayes factor in favor of $\mathcal{H}_1$, we get $\text{BF}_{10} = 1/\text{BF}_{01} \approx 29{,}100$, indicating that the observed data are approximately 30,000 times more likely under $\mathcal{H}_1$ than under $\mathcal{H}_0$. This provides overwhelming support for the presence of an effect of physical/numerical congruity on median response times. Converting the Bayes factor to a posterior model probability (assuming equal prior odds), we also see incredible evidence for $\mathcal{H}_1$:
$$p(\mathcal{H}_1 \mid \mathbf{y}) = \frac{\text{BF}_{10}}{1 + \text{BF}_{10}} \approx 0.99997.$$

Now let us consider our second example. In addition to analyzing median response times, Faulkenberry et al. (2018) also fit each subject's distribution of response times to a parametric model (i.e., the shifted Wald distribution; see Anders et al., 2016; Faulkenberry, 2017, for details), allowing them to investigate the effects of congruity on shape, scale, and location of the response time distributions. Specifically, they predicted that the leading edge, or shift, of the distributions would not differ between congruent and incongruent trials, thus providing support against an early encoding-based explanation of the observed size-congruity effect (Santens and Verguts, 2011; Faulkenberry et al., 2016; Sobel et al., 2016, 2017). The shift parameter was calculated for both of the congruity conditions for each of the 23 subjects. The resulting ANOVA summary table is presented in Table 2.

Table 2: ANOVA summary table for the shift parameter data.
Source  SS  df  MS  F
Subjects  103984  22  4727  
Treatment  739  1  739  1.336
Residual  12176  22  553  
Total  116899  45  
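Applying Equation 2 with $n = 23$, $k = 2$, and $F = 1.336$ gives us the following:
$$\text{BF}_{01} \approx \sqrt{(nk-n)^{k-1}\left(1 + \frac{F}{n-1}\right)^{n-nk}} = \sqrt{23\left(1 + \frac{1.336}{22}\right)^{-23}} \approx 2.43.$$
This Bayes factor tells us that the observed data are approximately 2.4 times more likely under $\mathcal{H}_0$ than under $\mathcal{H}_1$. Converting the Bayes factor to a posterior model probability (assuming equal prior odds), we also see positive evidence for $\mathcal{H}_0$:
$$p(\mathcal{H}_0 \mid \mathbf{y}) = \frac{\text{BF}_{01}}{1 + \text{BF}_{01}} \approx 0.709.$$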
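Both worked examples can be reproduced from the $F$ ratios and degrees of freedom in Tables 1 and 2. The sketch below is a minimal illustration, not the authors' code; the helper names are hypothetical, $n = 23$ is inferred from the 22 subject degrees of freedom, and equal prior probabilities for the two hypotheses are assumed when converting to posterior probabilities.

```python
import math


def bf01_repeated(F, n, k):
    """Equation 2, evaluated on the log scale for numerical stability."""
    N = n * k - n
    log_bf01 = 0.5 * ((k - 1) * math.log(N) - N * math.log1p(F / (n - 1)))
    return math.exp(log_bf01)


def posterior_h0(bf01):
    """p(H0 | y), assuming equal prior probabilities for H0 and H1."""
    return bf01 / (1 + bf01)


n, k = 23, 2  # 23 subjects, 2 congruity conditions
f_ratios = {"median RT": 39.63, "shift": 1.336}  # F from Tables 1 and 2

results = {}
for label, F in f_ratios.items():
    bf01 = bf01_repeated(F, n, k)
    results[label] = (bf01, posterior_h0(bf01))
    print(f"{label}: BF01 = {bf01:.3g}, BF10 = {1 / bf01:.3g}, "
          f"p(H0|y) = {results[label][1]:.5f}")
```

The same $n$ and $k$ produce evidence in opposite directions for the two dependent measures, which is driven entirely by the size of the $F$ ratio.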
III Accounting for correlation between repeated measurements
In a recent paper, Nathoo and Masson (2016) took a slightly different approach to the problem considered here, investigating the role of effective sample size in repeated measures designs (Jones, 2011). For single-factor repeated measures designs, effective sample size can be computed as $n_{\text{eff}} = \dfrac{nk}{1 + \rho(k-1)}$, where $\rho$ is the intraclass correlation. When $\rho = 0$, $n_{\text{eff}} = nk$, and when $\rho = 1$, $n_{\text{eff}} = n$. Though $\rho$ is unknown, Nathoo and Masson (2016) developed a method to estimate it from values in the ANOVA, leading to a refined estimate of $\Delta\text{BIC}_{10}$ (see Nathoo and Masson, 2016, for the explicit expression).
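To make the role of the intraclass correlation concrete, the following short Python sketch evaluates the effective sample size at its two limiting cases and one intermediate value. It assumes the form $n_{\text{eff}} = nk / (1 + \rho(k-1))$ for a single-factor repeated measures design; the function name is hypothetical, and the values $n = 23$, $k = 2$ are used only for illustration.

```python
def effective_sample_size(n, k, rho):
    """Effective sample size n*k / (1 + rho*(k - 1)) for n subjects,
    k repeated measurements, and intraclass correlation rho in [0, 1]."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch assumes 0 <= rho <= 1")
    return n * k / (1 + rho * (k - 1))


# Illustrative values: 23 subjects measured under 2 conditions.
for rho in (0.0, 0.5, 1.0):
    print(rho, effective_sample_size(23, 2, rho))
```

With uncorrelated measurements every observation counts ($n_{\text{eff}} = nk$), while perfectly correlated measurements carry no more information than one observation per subject ($n_{\text{eff}} = n$).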
Though this estimate certainly provides a better account of the correlation between repeated measurements, the benefit comes at the price of added complexity, and one cannot easily reduce this formula to a simple expression involving only $n$, $k$, and $F$, as we do with Equation 2. This leads to the natural question: how well does our Equation 2 match up with the more complex approach of Nathoo and Masson (2016)?
As a first step toward answering this question, let us revisit the examples presented above. We apply the Nathoo and Masson formula to the ANOVA summary in Table 1 to obtain $\Delta\text{BIC}_{10}$, which we can convert to a Bayes factor via $\text{BF}_{01} \approx \exp(\Delta\text{BIC}_{10}/2)$. As above, we cast this Bayes factor in favor of $\mathcal{H}_1$ by inverting, so $\text{BF}_{10} = 1/\text{BF}_{01}$, and the corresponding posterior probability is $p(\mathcal{H}_1 \mid \mathbf{y}) = \text{BF}_{10}/(1 + \text{BF}_{10})$. Note that the general interpretation of these results is on par with our earlier method; both indicate overwhelming support for $\mathcal{H}_1$. If anything, the approximation we obtained with Equation 2 is slightly conservative regarding support for $\mathcal{H}_1$. This is because the method of Nathoo and Masson was designed to reduce the BIC penalty for $\mathcal{H}_1$ when repeated measures conditions are highly correlated; compared to the formulation upon which Equation 2 is based, this will tend to increase the support for $\mathcal{H}_1$ (Nathoo and Masson, 2016).
IV Simulation study
The computations in the previous section reflect two preliminary findings. First, the revised BIC formula of Nathoo and Masson (2016) yields Bayes factors and posterior model probabilities that take into account an estimate of the correlation between repeated measurements. This is a highly principled approach that our Equation 2 does not take. Second, as we can see with both computations, the general conclusion remains the same regardless of whether we use Equation 2 or the Nathoo and Masson method. Given that our Equation 2 is (1) easy to use, and (2) requires only three inputs (the number of subjects $n$, the number of repeated measurement conditions $k$, and the $F$ statistic), could it be that Equation 2 produces results that are sufficient for day-to-day work, with the risk of being conservative outweighed by the simplicity of our formula? To answer this question, we conducted a Monte Carlo simulation to systematically investigate the relationship between Equation 2 and the Nathoo and Masson method across a wide variety of randomly generated datasets.
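In this simulation, we randomly generated datasets that reflected the repeated-measures designs that we have discussed throughout this paper. Specifically, data were generated from the linear mixed model
$$y_{ij} = \mu + \alpha_j + \pi_i + \varepsilon_{ij}; \quad i = 1, \dots, n; \quad j = 1, \dots, k,$$
where $\mu$ represents a grand mean, $\alpha_j$ represents a treatment effect, and $\pi_i$ represents a subject effect. For convenience, we fixed the number of conditions $k$ at a single value, though similar results were obtained with other values of $k$ (not reported here). Also, we assume $\pi_i \sim \mathcal{N}(0, \sigma_\pi^2)$ and $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)$. We systematically varied three components of the model: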

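The number of observations for each subject was set to one of three values;

The intraclass correlation $\rho$ between treatment conditions was set to either a small or a large value;

The size of the treatment effect was manipulated to be either null, small, or medium. Specifically, these effects were defined as follows. Let $\mu_j = \mu + \alpha_j$ (i.e., the condition mean for treatment $j$). Then we define effect size as
$$\delta = \frac{\sqrt{\sum_{j=1}^{k} (\mu_j - \mu)^2 / k}}{\sqrt{\sigma_\pi^2 + \sigma_\varepsilon^2}},$$
and correspondingly, we set $\delta$ to one of three values: $\delta = 0$ (null effect), a small effect, and a medium effect. Also note that since we can write the intraclass correlation as
$$\rho = \frac{\sigma_\pi^2}{\sigma_\pi^2 + \sigma_\varepsilon^2},$$
it follows directly that we can alternatively parameterize effect size as
$$\delta = \sqrt{\frac{\rho \sum_{j=1}^{k} (\mu_j - \mu)^2}{k\,\sigma_\pi^2}}.$$
Using this expression, we were able to set our marginal variance $\sigma_\pi^2 + \sigma_\varepsilon^2$ to be constant across the varying values of our simulation parameters.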
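The data-generating step and the Equation 2 computation can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the simulation code used for the paper: all function names are hypothetical, the settings ($n = 30$, $k = 3$, $\rho = 0.5$, treatment effects of $-0.5$, $0$, $0.5$, 200 replicates, marginal variance fixed at 1) are arbitrary choices made for the example, and only the Equation 2 posterior probability is computed because the Nathoo and Masson expression is not reproduced here.

```python
import math
import random


def simulate_dataset(n, k, alphas, sigma_pi, sigma_eps, rng, mu=0.0):
    """Draw y[i][j] = mu + alpha_j + pi_i + eps_ij for n subjects, k conditions."""
    data = []
    for _ in range(n):
        pi_i = rng.gauss(0.0, sigma_pi)  # subject effect shared across conditions
        data.append([mu + alphas[j] + pi_i + rng.gauss(0.0, sigma_eps)
                     for j in range(k)])
    return data


def rm_anova_f(data):
    """F ratio for the treatment effect in a one-factor repeated-measures ANOVA."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_treat = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_resid = ss_total - ss_treat - ss_subj
    return (ss_treat / (k - 1)) / (ss_resid / ((n - 1) * (k - 1)))


def posterior_h1(F, n, k):
    """p(H1 | y) from Equation 2, assuming equal prior probabilities."""
    N = n * k - n
    log_bf01 = 0.5 * ((k - 1) * math.log(N) - N * math.log1p(F / (n - 1)))
    return 1.0 / (1.0 + math.exp(log_bf01))


# Assumed illustrative settings (not the values used in the paper).
rng = random.Random(1)
n, k, rho = 30, 3, 0.5
sigma_pi, sigma_eps = math.sqrt(rho), math.sqrt(1.0 - rho)  # marginal variance 1
alphas = [-0.5, 0.0, 0.5]

probs = [posterior_h1(rm_anova_f(simulate_dataset(n, k, alphas, sigma_pi,
                                                  sigma_eps, rng)), n, k)
         for _ in range(200)]
print(f"mean p(H1|y) over {len(probs)} datasets: {sum(probs) / len(probs):.3f}")
```

Fixing $\sigma_\pi^2 = \rho$ and $\sigma_\varepsilon^2 = 1 - \rho$ is one simple way to hold the marginal variance constant while the intraclass correlation changes.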
For each combination of number of observations ($n$), effect size ($\delta$), and intraclass correlation ($\rho$), we generated 1000 simulated datasets. For each of the datasets, we applied a repeated-measures ANOVA model and extracted two posterior model probabilities: one based on Equation 2 and one based on the refined estimate of Nathoo and Masson (2016). The results are depicted in Figure 1.

The primary message of Figure 1 is clear: our Equation 2, which was derived from the original BIC method (Wagenmakers, 2007; Masson, 2011; Faulkenberry, 2018), performs comparably to the refined BIC method of Nathoo and Masson (2016) across a variety of empirical situations. In the cases where $\mathcal{H}_0$ was true (the first row of Figure 1), both Equation 2 and the Nathoo and Masson (2016) method produce posterior probabilities for $\mathcal{H}_0$ that are reasonably large. For both methods, the variation of these estimates decreases as the number of observations increases. When the intraclass correlation is small, the estimates from Equation 2 and the Nathoo and Masson (2016) method are virtually identical. When the intraclass correlation is large, the Nathoo and Masson (2016) method introduces slightly more variability in the posterior probability estimates. In all, these results indicate that Equation 2 is slightly more favorable when $\mathcal{H}_0$ is true.
For small effects (row 2 of Figure 1), the performance of both methods depended heavily on the correlation between repeated measurements. For small intraclass correlation, both methods were quite supportive of $\mathcal{H}_0$, even though $\mathcal{H}_1$ was the true model. This reflects the conservative nature of the BIC approximation (Wagenmakers, 2007): since the unit information prior is uninformative and puts reasonable mass on a large range of possible effect sizes, the predictive updating value for any positive effect (i.e., $\text{BF}_{10}$) will be smaller than would be the case if the prior were more concentrated on smaller effects. As a result, the posterior probability for $\mathcal{H}_1$ is smaller as well. Regardless, the original BIC method (Equation 2) and the Nathoo and Masson (2016) method produce similar results. The picture is different when the intraclass correlation is large; both methods produce a wide range of posterior probabilities, though they are again highly comparable. It is worth pointing out that the posterior probability estimates all improve with increasing numbers of observations. This should not be surprising, given that the BIC approximation underlying both Equation 2 and the Nathoo and Masson (2016) method is a large-sample approximation technique.
For medium effects (row 3 of Figure 1), we see much the same message as above. Both Equation 2 and the Nathoo and Masson (2016) method produce similar posterior probability values for $\mathcal{H}_1$. Both methods improve with increasing sample size, and at least for medium-size effects, the computations are quite reliable for high values of correlation between repeated measurements.
V Conclusion
In this paper, we have proposed a formula for estimating Bayes factors from repeated measures ANOVA designs. These ideas extend previous work of Faulkenberry (2018), who presented such formulas for between-subject designs. Such formulas are advantageous for researchers in a wide variety of empirical disciplines, as they provide an easy-to-use method for estimating Bayes factors from a minimal set of summary statistics. This gives the user a powerful index for estimating evidential value from a set of experiments, even in cases where the only data available are the summary statistics published in a paper. We think this provides a welcome addition to the collection of tools for doing Bayesian computation with summary statistics (e.g., Ly et al., 2018).
Further, we demonstrated that our formula performs similarly to a more refined, yet more complex formula of Nathoo and Masson (2016), who were able to explicitly estimate and account for the correlation between repeated measurements. Though the Nathoo and Masson (2016) approach is certainly more principled than a “one-size-fits-all” approach, it does require knowledge of the various sums-of-squares components from the repeated-measures ANOVA, and to our knowledge, there is not yet any obvious way to recover the Nathoo and Masson (2016) estimates from the $F$ statistic alone. Thus, given the similar performance of our method and the Nathoo and Masson (2016) method, we think our method stands at a slight advantage, not only for its simplicity, but also for its power in light of minimal available information.
References
 Anders et al. (2016) Anders, R., Alario, F.-X., and Van Maanen, L. (2016). The shifted Wald distribution for response time data analysis. Psychological Methods, 21(3):309–327.
 Faulkenberry (2017) Faulkenberry, T. J. (2017). A single-boundary accumulator model of response times in an addition verification task. Frontiers in Psychology, 8.
 Faulkenberry (2018) Faulkenberry, T. J. (2018). Computing Bayes factors to measure evidence from experiments: An extension of the BIC approximation. Biometrical Letters, 55(1):31–43.
 Faulkenberry et al. (2016) Faulkenberry, T. J., Cruise, A., Lavro, D., and Shaki, S. (2016). Response trajectories capture the continuous dynamics of the size congruity effect. Acta Psychologica, 163:114–123.
 Faulkenberry et al. (2018) Faulkenberry, T. J., Vick, A. D., and Bowman, K. A. (2018). A shifted Wald decomposition of the numerical size-congruity effect: Support for a late interaction account. Polish Psychological Bulletin, 49(4):391–397.
 Jones (2011) Jones, R. H. (2011). Bayesian information criterion for longitudinal and clustered data. Statistics in Medicine, 30(25):3050–3056.
 Kass and Raftery (1995) Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430):773–795.
 Lindley (1957) Lindley, D. V. (1957). A statistical paradox. Biometrika, 44(1-2):187–192.
 Ly et al. (2018) Ly, A., Raj, A., Etz, A., Marsman, M., Gronau, Q. F., and Wagenmakers, E.-J. (2018). Bayesian reanalyses from summary statistics: A guide for academic consumers. Advances in Methods and Practices in Psychological Science, 1(3):367–374.
 Masson (2011) Masson, M. E. J. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behavior Research Methods, 43(3):679–690.
 Nathoo and Masson (2016) Nathoo, F. S. and Masson, M. E. (2016). Bayesian alternatives to null-hypothesis significance testing for repeated-measures designs. Journal of Mathematical Psychology, 72:144–157.
 Rouder et al. (2012) Rouder, J. N., Morey, R. D., Speckman, P. L., and Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56(5):356–374.
 Rouder et al. (2009) Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., and Iverson, G. (2009). Bayesian tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2):225–237.
 Santens and Verguts (2011) Santens, S. and Verguts, T. (2011). The size congruity effect: Is bigger always more? Cognition, 118(1):94–110.
 Sobel et al. (2016) Sobel, K. V., Puri, A. M., and Faulkenberry, T. J. (2016). Bottom-up and top-down attentional contributions to the size congruity effect. Attention, Perception, & Psychophysics, 78(5):1324–1336.
 Sobel et al. (2017) Sobel, K. V., Puri, A. M., Faulkenberry, T. J., and Dague, T. D. (2017). Visual search for conjunctions of physical and numerical size shows that they are processed independently. Journal of Experimental Psychology: Human Perception and Performance, 43(3):444–453.
 Wagenmakers (2007) Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5):779–804.