A non-inferiority test for R-squared with random regressors

02/19/2020 ∙ by Harlan Campbell, et al. ∙ The University of British Columbia

Determining the lack of association between an outcome variable and a number of different explanatory variables is frequently necessary in order to disregard a proposed model. This paper proposes a non-inferiority test for the coefficient of determination (or squared multiple correlation coefficient), R-squared, in a linear regression analysis with random predictors. The test is derived from inverting a one-sided confidence interval based on a scaled central F distribution.




1 Introduction

The coefficient of determination (or squared multiple correlation coefficient), $R^2$, is a well-known and widely used statistic for linear regression analysis. $R^2$ summarizes the “proportion of variance explained” by the predictors in the linear model and is equal to the square of the Pearson correlation coefficient between the observed and predicted outcomes (Nagelkerke and others, 1991; Zou et al., 2003). Despite the statistic’s ubiquitous use, its corresponding population parameter, which we will denote as $P^2$, as in Cramer (1987), is rarely discussed. $P^2$ is sometimes known as the “parent multiple correlation coefficient” (Barten, 1962) or the “population proportion of variance accounted for” (Kelley and others, 2007); see Cramer (1987) for details.
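As a quick numerical illustration of the identity between $R^2$ and the squared observed-versus-predicted correlation, the following R sketch fits a model with `lm` on simulated data and compares the two quantities. The sample size, coefficients and seed are arbitrary assumptions chosen only for illustration.

```r
# Sketch: for OLS with an intercept, R^2 equals cor(observed, fitted)^2.
# Simulated data; n, the coefficients and the seed are arbitrary illustrative choices.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), nrow = n, ncol = 3)        # three random covariates
y <- 0.5 + drop(X %*% c(0.3, -0.2, 0.1)) + rnorm(n)  # outcome, unit residual variance
fit <- lm(y ~ X)

r2_lm  <- summary(fit)$r.squared                     # R^2 as reported by lm()
r2_cor <- as.numeric(cor(y, fitted(fit))^2)          # squared observed-vs-fitted correlation

print(c(r2_lm = r2_lm, r2_cor = r2_cor))             # the two values agree
```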

Campbell and Lakens (2020) introduced a non-inferiority test (a one-sided equivalence test) for $P^2$ in order to test the hypotheses:

$$H_0: \Delta \le P^2 \le 1 \qquad \text{vs.} \qquad H_1: 0 \le P^2 < \Delta,$$

where $\Delta$ is the non-inferiority margin representing a range of effect sizes of negligible magnitude. The test is useful for determining whether one can reject the hypothesis that the total proportion of variance in the outcome, $Y$, attributable to the set of covariates, $X$, is greater than or equal to $\Delta$. Phrased somewhat differently, the test asks: “Can we disregard the whole model?” (Campbell and Lakens, 2020).

Campbell and Lakens (2020) compared their frequentist non-inferiority test with a Bayesian approach based on Bayes factors and also provided a version of the test for the $\eta^2$ parameter in a fixed effects (or “between subjects”) analysis of variance (ANOVA). However, their non-inferiority test applies only to cases with fixed regressors. The sampling distribution of $R^2$ can be quite different when regressor variables are random; see Gatsonis and Sampson (1989).

Indeed, depending on whether regressors are fixed or random, certain inference procedures for $P^2$ will differ. Random regressors are more common in observational studies, whereas fixed regressors are more common in experimental studies where the regressors are randomized by experimenters or otherwise fixed by some study intervention. For a standard null hypothesis significance test (i.e., $H_0: P^2 = 0$), the same central $F$-distributed statistic can be used for random and fixed regressors, because when the null hypothesis is true, the sampling distribution of $R^2$ is identical in both cases. However, when $P^2 > 0$, the sampling distribution of $R^2$ does depend on whether the regressors are fixed or random.
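The claim that the usual overall test remains valid under $H_0: P^2 = 0$ with random regressors can be checked by simulation. The R sketch below assumes standard normal covariates that are redrawn for every replicate (random regressors), computes the overall statistic $F = \frac{R^2/K}{(1-R^2)/(N-K-1)}$, and records how often it rejects at the 5% level. $N$, $K$, the number of replicates and the seed are arbitrary illustrative choices.

```r
# Monte Carlo sketch: null rejection rate of the overall F test with random regressors.
set.seed(2020)
N <- 40; K <- 3; reps <- 4000; alpha <- 0.05

reject <- logical(reps)
for (r in seq_len(reps)) {
  X <- matrix(rnorm(N * K), nrow = N, ncol = K)   # covariates redrawn each replicate
  y <- rnorm(N)                                   # no association, so P^2 = 0
  R2 <- summary(lm(y ~ X))$r.squared
  Fstat <- (R2 / K) / ((1 - R2) / (N - K - 1))    # overall F statistic
  reject[r] <- pf(Fstat, K, N - K - 1, lower.tail = FALSE) < alpha
}
rejection_rate <- mean(reject)
print(rejection_rate)  # should be close to 0.05
```

With 4,000 replicates the Monte Carlo standard error of the rejection rate is roughly 0.0035, so values within about 0.01 of 0.05 are expected.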

In this short article, we propose a non-inferiority test for situations with random regressors. In the social sciences and many other fields of study, the assumption of fixed regressors is often violated, and it is therefore important to account for this possibility (Bentler and Lee, 1983). In Section 2, we describe the proposed test, and in Section 3 we conduct a small simulation study to examine the test’s operating characteristics.

2 A non-inferiority test for random regressors

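Let $N$ be the number of observations and $K$ be the number of covariates in a standard multivariable linear regression analysis. Let $Y_i$ be the outcome variable for the $i$-th subject and let $X_i = (X_{i1}, \ldots, X_{iK})$ be the vector of covariates for the $i$-th subject. Then the matrix $X$ is an $N$ by $(K+1)$ design matrix (including a column of ones for the intercept), and the linear regression model can be summarized by:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_K X_{iK} + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2), \qquad i = 1, \ldots, N, \tag{1}$$

where $\beta = (\beta_0, \beta_1, \ldots, \beta_K)^\top$ is the column-vector of regression coefficients and $\sigma^2$ is the residual variance.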

As mentioned in the Introduction, we are specifically interested in the scenario of “random regressors,” in which the covariates, $X$, are assumed to be stochastic rather than fixed. In practice, the assumption of “fixed regressors” would be more appropriate for a randomized trial, whereas the assumption of “random regressors” would be more appropriate for an observational study. We require that the rows of $X$ be independent of each other and independent of $\epsilon$.

A non-inferiority test $p$-value can be obtained by inverting a one-sided confidence interval. However, constructing a confidence interval for $P^2$ with random regressors is not at all obvious. Several procedures have been proposed in the literature, including Wald-type and bootstrap-based confidence intervals (Tan Jr, 2012). However, neither of these approaches has particularly good finite-sample properties; see Algina (1999).

Helland (1987) proposes obtaining a confidence interval for $P^2$ by relying on a scaled central $F$ approximation of $R^2$, and gives a simple iterative procedure with “surprisingly good” (Helland, 1987) accuracy. Tan Jr (2012) agrees: after reviewing a number of alternative methods, Tan Jr (2012) concludes that “the scaled central $F$ approximation [method] seems to be a simple and good procedure to construct an asymptotic confidence interval.” We will therefore use this confidence interval, inverted, for our non-inferiority test. Note that the scaled central $F$ approximation method is based on the assumption that the covariates in $X$ have a multivariate normal distribution.

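For a given value of $\alpha$ (e.g., $\alpha = 0.10$), and taking $\tilde{P}^2 = R^2$ as an initial value, we can obtain a one-sided upper $100(1-\alpha)\%$ confidence interval for $P^2$ (e.g., a one-sided upper 90% CI) by iterating between calculating $\nu$ and $\tilde{P}^2$ until convergence, where:

$$\nu = \frac{\left((N-K-1)\tilde{P}^2 + K\right)^2}{N - 1 - (N-K-1)\left(1 - \tilde{P}^2\right)^2}, \tag{2}$$

$$\tilde{P}^2 = \frac{(N-K-1)R^2 - (1-R^2)\,K\,F_{\alpha,\nu,N-K-1}}{(N-K-1)\left(R^2 + (1-R^2)\,F_{\alpha,\nu,N-K-1}\right)}, \tag{3}$$

and where $F_{\alpha,a,b}$ is the $100\alpha$-th percentile of the central $F$ distribution with $a$ and $b$ degrees of freedom.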
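The degrees-of-freedom expression for $\nu$ can also be written in terms of $\phi = \tilde{P}^2/(1-\tilde{P}^2)$: dividing the numerator and denominator of $\nu = ((N-K-1)\tilde{P}^2 + K)^2 / (N-1-(N-K-1)(1-\tilde{P}^2)^2)$ by $(1-\tilde{P}^2)^2$ gives $\nu = ((N-1)\phi + K)^2 / ((N-1)\phi(\phi+2) + K)$. The R sketch below checks the two forms numerically; `nu_eq2` and `nu_phi` are hypothetical helper names, and $N = 100$, $K = 4$ are arbitrary.

```r
# Sketch: two algebraically equivalent forms of the degrees of freedom nu.
# nu_eq2() and nu_phi() are illustrative names, not functions from any package.
nu_eq2 <- function(P2, N, K) {
  # form used in the text, written in terms of P-tilde-squared
  ((N - K - 1) * P2 + K)^2 / (N - 1 - (N - K - 1) * (1 - P2)^2)
}
nu_phi <- function(P2, N, K) {
  # same quantity written in terms of phi = P2 / (1 - P2)
  phi <- P2 / (1 - P2)
  ((N - 1) * phi + K)^2 / ((N - 1) * phi * (phi + 2) + K)
}

P2_grid <- c(0, 0.05, 0.10, 0.30, 0.60)
print(cbind(P2 = P2_grid,
            eq_form  = nu_eq2(P2_grid, N = 100, K = 4),
            phi_form = nu_phi(P2_grid, N = 100, K = 4)))  # the two columns agree
```

At $\tilde{P}^2 = 0$ both forms reduce to $\nu = K$, so at the null value the approximation coincides with the usual central $F_{K,\,N-K-1}$ distribution of the overall test statistic.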

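We then calculate the upper $100(1-\alpha)\%$ confidence interval, $[0, P^2_U]$, as follows:

$$P^2_U = \frac{(N-K-1)R^2 - (1-R^2)\,K\,F_{\alpha,\nu,N-K-1}}{(N-K-1)\left(R^2 + (1-R^2)\,F_{\alpha,\nu,N-K-1}\right)}, \tag{4}$$

where $\nu$ is the converged value from the iteration above.
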
Note that in the R package “MBESS” (Kelley and others, 2007), the function ci.R2 can be used to calculate a one-sided confidence interval for $P^2$ with random regressors. This calculation is based on the scaled non-central $F$ approximation of Lee (1971) and, in our experience, provides a very similar result. SAS and SPSS code for calculating confidence intervals based on the scaled non-central $F$ approximation is also available from Zou (2007).

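In order to obtain a $p$-value for the non-inferiority test ($H_0: P^2 \ge \Delta$), we must invert the upper one-sided confidence interval. We proceed as follows. First, we calculate the following $F$-statistic:

$$F = \frac{(N-K-1)\,R^2\,(1-\Delta)}{(1-R^2)\left(\Delta(N-K-1) + K\right)}. \tag{5}$$

We then iterate between calculating $\nu$ and $\tilde{P}^2$ until convergence, starting from $\tilde{P}^2 = R^2$:

$$\nu = \frac{\left((N-K-1)\tilde{P}^2 + K\right)^2}{N - 1 - (N-K-1)\left(1 - \tilde{P}^2\right)^2}, \tag{6}$$

$$\tilde{P}^2 = \frac{(N-K-1)R^2 - (1-R^2)\,K\,F}{(N-K-1)\left(R^2 + (1-R^2)\,F\right)}. \tag{7}$$

The $p$-value for the non-inferiority test can then be calculated as:

$$p = \mathcal{F}\left(F;\ \nu,\ N-K-1\right), \tag{8}$$

where $\mathcal{F}(\,\cdot\,; a, b)$ is the cdf of the central $F$-distribution with $a$ and $b$ degrees of freedom. It is important to remember that the above test assumes that the residuals and the regressors are independent of one another and that both are normally distributed.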
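Solving the $F$-statistic $F = (N-K-1)R^2(1-\Delta) / \left((1-R^2)(\Delta(N-K-1)+K)\right)$ for $\Delta$ shows why the iteration settles almost immediately. The algebra uses only that definition:

$$
\begin{aligned}
F\,(1-R^2)\bigl(\Delta(N-K-1) + K\bigr) &= (N-K-1)\,R^2\,(1-\Delta), \\
\Delta\,(N-K-1)\bigl(R^2 + (1-R^2)F\bigr) &= (N-K-1)\,R^2 - (1-R^2)\,K\,F, \\
\Delta &= \frac{(N-K-1)R^2 - (1-R^2)\,K\,F}{(N-K-1)\bigl(R^2 + (1-R^2)F\bigr)}.
\end{aligned}
$$

The right-hand side is exactly the update for $\tilde{P}^2$ in (7). With $F$ held at its value from (5), the first update therefore returns $\tilde{P}^2 = \Delta$, $\nu$ settles at (6) evaluated at $\tilde{P}^2 = \Delta$, and the $p$-value is the central $F$ cdf $p = \mathcal{F}(F;\ \nu(\Delta),\ N-K-1)$. Likewise, the upper limit in (4) is the same expression with $F$ replaced by the quantile $F_{\alpha,\nu,N-K-1}$. If the interval is computed with that lower $\alpha$ quantile, then $p = \alpha$ exactly when $\Delta = P^2_U$, which is the sense in which the test inverts the one-sided interval.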

3 Simulation Study

We conducted a simple simulation study in order to better understand the operating characteristics of the non-inferiority test and to confirm that the test has correct type 1 error rates. Our design closely follows that of Campbell and Lakens (2020). We simulated data for each of thirty scenarios, one for each combination of the following parameters:

  • one of three residual variances, $\sigma^2$;

  • one of five sample sizes, $N$;

  • one of two numbers of covariates, $K$, each with its own vector of regression coefficients, $\beta$. The covariate values are sampled from a multivariate normal distribution whose covariance matrix depends on $K$.

For each simulated dataset, we sampled a new covariate matrix $X$ from the chosen multivariate normal distribution. Depending on the particular parameter values, the true coefficient of determination for these data is either $P^2 = 0.034$, $0.065$, or $0.080$. Parameters for the simulation study were chosen so as to obtain three unique values for $P^2$ approximately evenly spaced between 0 and 0.10.
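For random covariates with covariance matrix $\Sigma$ and slope coefficients $\beta_{1:K} = (\beta_1, \ldots, \beta_K)^\top$, the population coefficient of determination of model (1) is $P^2 = \beta_{1:K}^\top \Sigma \beta_{1:K} / (\beta_{1:K}^\top \Sigma \beta_{1:K} + \sigma^2)$. The R sketch below computes this quantity and compares it with $R^2$ from one very large simulated sample. The values of `beta`, `Sigma` and `sigma2` are illustrative assumptions, not the settings used in this study, and `pop_P2` is a hypothetical helper name.

```r
# Sketch: population P^2 for random, multivariate normal covariates, plus a large-sample check.
# beta, Sigma and sigma2 are illustrative values, not those of the simulation study.
pop_P2 <- function(beta, Sigma, sigma2) {
  signal <- drop(t(beta) %*% Sigma %*% beta)   # variance explained by the covariates
  signal / (signal + sigma2)
}

beta   <- c(0.2, 0.1)
Sigma  <- matrix(c(1, 0.3, 0.3, 1), nrow = 2)
sigma2 <- 1
P2 <- pop_P2(beta, Sigma, sigma2)              # 0.062 / 1.062, about 0.0584

set.seed(7)
N <- 1e5
Z <- matrix(rnorm(N * 2), nrow = N)
X <- Z %*% chol(Sigma)                         # rows ~ MVN(0, Sigma)
y <- drop(X %*% beta) + rnorm(N, sd = sqrt(sigma2))
R2 <- summary(lm(y ~ X))$r.squared

print(c(P2 = P2, R2 = R2))                     # R2 should be close to P2
```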

For each of the thirty configurations, we simulated 50,000 unique datasets and calculated a non-inferiority $p$-value with each of 19 different values of $\Delta$ (ranging from 0.01 to 0.10). We then calculated the proportion of these $p$-values less than $\alpha = 0.05$.
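A scaled-down version of this procedure can be sketched in R for a single configuration at the null boundary ($P^2 = \Delta$). Everything here is an assumption made for illustration: independent standard normal covariates, $N = 500$, $K = 2$, $\sigma^2 = 1$, $\Delta = 0.065$, and 2,000 replicates instead of 50,000. The hypothetical helper `noninf_p` evaluates (5), then (6) at $\tilde{P}^2 = \Delta$ (the value the iteration converges to), then (8).

```r
# Scaled-down sketch of the type 1 error check at the null boundary (P^2 = Delta).
# noninf_p() is a hypothetical helper restating equations (5), (6) and (8).
noninf_p <- function(Rsq, N, K, Delta) {
  m <- N - K - 1
  Fstat <- m * Rsq * (1 - Delta) / ((1 - Rsq) * (m * Delta + K))  # equation (5)
  nu <- (m * Delta + K)^2 / (N - 1 - m * (1 - Delta)^2)           # equation (6) at Delta
  pf(Fstat, nu, m)                                                # equation (8)
}

set.seed(123)
N <- 500; K <- 2; Delta <- 0.065; reps <- 2000; sigma2 <- 1
# independent standard normal covariates; beta chosen so that the true P^2 equals Delta
beta <- rep(sqrt(sigma2 * Delta / (1 - Delta) / K), K)

pvals <- replicate(reps, {
  X <- matrix(rnorm(N * K), nrow = N)                 # new covariates for every dataset
  y <- drop(X %*% beta) + rnorm(N, sd = sqrt(sigma2))
  noninf_p(summary(lm(y ~ X))$r.squared, N, K, Delta)
})
type1 <- mean(pvals < 0.05)
print(type1)  # should be near 0.05 if the approximation is accurate
```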

Figure 1: Simulation study results. The upper and lower panels show results for the two covariate settings. Both plots are presented with a restricted vertical axis to better show the type 1 error rates. The solid horizontal black line indicates the desired type 1 error of $\alpha = 0.05$. For each of thirty configurations, we simulated 50,000 unique datasets and calculated a non-inferiority $p$-value with each of 19 different values of $\Delta$ (ranging from 0.01 to 0.10).
Figure 2: Simulation study, complete results. The upper and lower panels show results for the two covariate settings. The solid horizontal black line indicates the desired type 1 error of $\alpha = 0.05$. For each of thirty configurations, we simulated 50,000 unique datasets and calculated a non-inferiority $p$-value with each of 19 different values of $\Delta$ (ranging from 0.01 to 0.10).

Figures 1 and 2 plot the results. Note that Figure 1 uses a restricted vertical axis to better show the type 1 error rates. We see that when the non-inferiority bound equals the true effect size (i.e., 0.034, 0.065, or 0.080), the type 1 error rate is at the nominal 0.05 level, as it should be, for all moderately large values of $N$. This situation represents the boundary of the null hypothesis, i.e., $P^2 = \Delta$. When $N$ is smaller, the type 1 error at $\Delta = P^2$ is slightly larger than the desired rate of $\alpha = 0.05$.

As the equivalence bound increases beyond the true effect size (i.e., $\Delta > P^2$), the alternative hypothesis is true and it becomes possible to correctly reject the null. As expected, the power of the test increases with larger values of $\Delta$ and larger values of $N$. Also, for the test to have substantial power, the true $P^2$ must be substantially smaller than $\Delta$.

4 Conclusion

If none of the explanatory variables in a linear regression analysis are statistically significant, can we simply disregard the full model? How can we formally test whether the proportion of variance attributable to the full set of explanatory variables is too small to be considered meaningful? In this short article, we introduced a non-inferiority test to help address these questions. The test can be used to reject values of $P^2$, as estimated by $R^2$, that are as large as or larger than a pre-determined margin $\Delta$. Note that researchers must decide which effect size is considered meaningful or relevant (Lakens et al., 2018), and define $\Delta$ accordingly, prior to observing any data; see Campbell and Gustafson (2018) for details.

The non-inferiority test put forward is specifically intended for the case of random regressors, which is common in the social sciences and in observational research more broadly. As such, this paper supplements the work of Campbell and Lakens (2020), who put forward a non-inferiority test for the coefficient of determination in a linear regression with fixed regressors. It would be worthwhile to investigate the extent to which the two tests differ, and to expand upon the very limited simulation study in Section 3: a larger simulation study would further our understanding of how the non-inferiority test operates in a variety of scenarios.

5 Appendix: R-code

Note that one can calculate the confidence interval from equation (4) and the $p$-value from equation (8) in R with the following code.

An R function for calculating the confidence interval from equation (4):

UpperCI_random <- function(Rsq, n, k, alpha, tol = 1.0e-12){
    Psq <- Rsq; Psq_last <- 1  # initial values
    while(abs(Psq_last - Psq) > tol){
        Psq_last <- Psq
        # degrees of freedom, equation (2)
        v       <- (((n-k-1)*Psq + k)^2)/(n-1-(n-k-1)*(1-Psq)^2)
        # lower alpha quantile of the central F distribution
        Fstat   <- qf(alpha, v, n-k-1)
        # updated value of P-tilde-squared, equation (3)
        Psq_num <- (n-k-1)*Rsq - (1-Rsq)*k*Fstat
        Psq_den <- (n-k-1)*(Rsq + (1-Rsq)*Fstat)
        Psq     <- Psq_num/Psq_den
    }
    # upper limit of the one-sided (1 - alpha) CI, equation (4)
    UpperCI <- ((n-k-1)*Rsq - (1-Rsq)*k*Fstat) / ((n-k-1)*(Rsq + (1-Rsq)*Fstat))
    return(UpperCI)
}

## Example: a one-sided upper 90% CI for P2 with N=1250, K=6, R2=0.085:
N <- 1250; K <- 6; Rsquared <- 0.085; Alpha <- 0.10
UpperCI_random(Rsq = Rsquared, n = N, k = K, alpha = Alpha)
# approximately 0.101
## For comparison, the CI based on the scaled non-central F approximation (MBESS);
## conf.level = 1 - 2*Alpha makes the upper limit a one-sided (1 - Alpha) bound:
library(MBESS)
CI_compare <- ci.R2(R2 = Rsquared, df.1 = K, df.2 = N-K-1, conf.level = 1-2*Alpha, Random.Predictors = TRUE)
CI_compare
# upper limit: 0.1013726

An R function for calculating the $p$-value from equation (8):

noninvR2_random <- function(Rsq, n, k, delta, tol = 1.0e-12){
    Psq <- Rsq; Psq_last <- 1  # initial values
    # F-statistic, equation (5)
    F_num <- (n-k-1)*Rsq*(delta-1)
    F_den <- (Rsq-1)*(delta*(n-k-1) + k)
    Fstat <- F_num/F_den
    while(abs(Psq_last - Psq) > tol){
        Psq_last <- Psq
        # degrees of freedom, equation (6)
        v       <- (((n-k-1)*Psq + k)^2)/(n-1-(n-k-1)*(1-Psq)^2)
        # updated value of P-tilde-squared, equation (7)
        Psq_num <- (n-k-1)*Rsq - (1-Rsq)*k*Fstat
        Psq_den <- (n-k-1)*(Rsq + (1-Rsq)*Fstat)
        Psq     <- Psq_num/Psq_den
    }
    # p-value: lower-tail cdf of the central F distribution, equation (8)
    pval <- pf(Fstat, v, n-k-1, lower.tail = TRUE)
    return(pval)
}

## Example: a non-inferiority test for P2 with N=1250, K=6, R2=0.075 and Delta=0.10:
N <- 1250; K <- 6; Rsquared <- 0.075; Delta <- 0.10
noninvR2_random(Rsq = Rsquared, n = N, k = K, delta = Delta)
# 0.02710537
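The duality between the one-sided upper bound in equation (4) and the $p$-value in equation (8) can be checked numerically: evaluating the $p$-value at $\Delta = P^2_U$ should return $\alpha$. The R sketch below restates the logic of the two appendix functions as compact hypothetical helpers, `upper_bound` and `noninf_p`, and assumes the bound is computed with the lower $\alpha$ quantile of the central $F$ distribution. The inputs reuse the example values above.

```r
# Sketch: the p-value (equation 8) and the one-sided upper bound (equation 4) are duals.
# upper_bound() and noninf_p() are hypothetical compact helpers restating the appendix logic.
upper_bound <- function(Rsq, N, K, alpha, tol = 1e-12) {
  m <- N - K - 1
  P2 <- Rsq; P2_last <- 1
  while (abs(P2_last - P2) > tol) {
    P2_last <- P2
    nu <- (m * P2 + K)^2 / (N - 1 - m * (1 - P2)^2)                     # equation (2)
    Fq <- qf(alpha, nu, m)                                              # lower alpha quantile
    P2 <- (m * Rsq - (1 - Rsq) * K * Fq) / (m * (Rsq + (1 - Rsq) * Fq))  # equations (3)-(4)
  }
  P2
}
noninf_p <- function(Rsq, N, K, Delta) {
  m <- N - K - 1
  Fstat <- m * Rsq * (1 - Delta) / ((1 - Rsq) * (m * Delta + K))  # equation (5)
  nu <- (m * Delta + K)^2 / (N - 1 - m * (1 - Delta)^2)           # equation (6) at Delta
  pf(Fstat, nu, m)                                                # equation (8)
}

PU <- upper_bound(Rsq = 0.085, N = 1250, K = 6, alpha = 0.10)
p_at_PU <- noninf_p(Rsq = 0.085, N = 1250, K = 6, Delta = PU)
print(c(PU = PU, p_at_PU = p_at_PU))  # the p-value at the bound equals alpha = 0.10
```

Margins above $P^2_U$ give $p$-values below $\alpha$, and margins below it give $p$-values above $\alpha$, so rejecting $H_0$ at level $\alpha$ is equivalent to $\Delta$ lying outside the one-sided interval.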


References

  • J. Algina (1999) A comparison of methods for constructing confidence intervals for the squared multiple correlation coefficient. Multivariate Behavioral Research 34 (4), pp. 493–504.
  • A. Barten (1962) Note on unbiased estimation of the squared multiple correlation coefficient. Statistica Neerlandica 16 (2), pp. 151–164.
  • P. M. Bentler and S. Lee (1983) Covariance structures under polynomial constraints: applications to correlation and alpha-type structural models. Journal of Educational Statistics 8 (3), pp. 207–222.
  • H. Campbell and P. Gustafson (2018) What to make of non-inferiority and equivalence testing with a post-specified margin? arXiv preprint arXiv:1807.03413.
  • H. Campbell and D. Lakens (2020) Can we disregard the whole model? British Journal of Mathematical and Statistical Psychology (in press).
  • J. S. Cramer (1987) Mean and variance of R2 in small and moderate samples. Journal of Econometrics 35 (2-3), pp. 253–266.
  • C. Gatsonis and A. R. Sampson (1989) Multiple correlation: exact power and sample size calculations. Psychological Bulletin 106 (3), pp. 516.
  • I. S. Helland (1987) On the interpretation and use of R2 in regression analysis. Biometrics, pp. 61–69.
  • K. Kelley et al. (2007) Confidence intervals for standardized effect sizes: theory, application, and implementation. Journal of Statistical Software 20 (8), pp. 1–24.
  • D. Lakens, A. M. Scheel, and P. M. Isager (2018) Equivalence testing for psychological research: a tutorial. Advances in Methods and Practices in Psychological Science 1 (2), pp. 259–269. https://doi.org/10.1177/2515245918770963
  • Y. Lee (1971) Some results on the sampling distribution of the multiple correlation coefficient. Journal of the Royal Statistical Society: Series B (Methodological) 33 (1), pp. 117–130.
  • N. J. Nagelkerke et al. (1991) A note on a general definition of the coefficient of determination. Biometrika 78 (3), pp. 691–692.
  • L. Tan Jr (2012) Confidence intervals for comparison of the squared multiple correlation coefficients of non-nested models.
  • G. Y. Zou (2007) Toward using confidence intervals to compare correlations. Psychological Methods 12 (4), pp. 399.
  • K. H. Zou, K. Tuncali, and S. G. Silverman (2003) Correlation and simple linear regression. Radiology 227 (3), pp. 617–628.