Factorial experiments, initially proposed by Fisher (1935) and Yates (1937), have been widely used in agricultural sciences (see textbooks by Cochran and Cox 1950; Kempthorne 1952; Hinkelmann and Kempthorne 2007; Cox and Reid 2000) and engineering (see textbooks by Box et al. 2005; Wu and Hamada 2011). Recently, factorial experiments have also become popular in the social sciences (e.g., Angrist et al., 2009; Dasgupta et al., 2015; Branson et al., 2016). The completely randomized factorial experiment (CRFE) balances covariates across treatment combinations on average. However, as the numbers of pretreatment covariates and treatment factors grow, so does the chance that some covariates are imbalanced with respect to at least one of the many factorial effects. Many researchers have recognized this issue in different experimental designs (e.g., Fisher, 1926; Student, 1938; Hansen and Bowers, 2008; Bruhn and McKenzie, 2009). To avoid it, we can restrict the treatment allocation to those with covariate balance, a strategy called rerandomization (e.g., Cox, 1982, 2009; Morgan and Rubin, 2012) or restricted or constrained randomization (e.g., Yates, 1948; Grundy and Healy, 1950; Youden, 1972; Bailey, 1983).
Extending Morgan and Rubin (2012)’s proposal for treatment-control experiments, Branson et al. (2016) proposed using rerandomization in factorial experiments to improve covariate balance, and studied finite sample properties of this design under the assumptions of equal sample sizes for all treatment combinations, Gaussianity of covariate and outcome means, and additive factorial effects. Without requiring any of these assumptions, we propose more general covariate balance criteria for rerandomization in factorial experiments, extend their theory with an asymptotic analysis of the sampling distributions of the usual factorial effect estimators, and provide large-sample confidence sets for the average factorial effects.
Rerandomization in factorial experiments has two salient features that differ from rerandomization in treatment-control experiments. First, the factorial effects can have different levels of importance a priori. Many factorial experimental design principles hinge on the belief that main effects are often more important than two-way interactions, and two-way interactions are often more important than higher-order interactions (e.g., Finney, 1943; Bose, 1947; Cochran and Cox, 1950; Cox and Reid, 2000; Wu, 2015). Consequently, we need to impose different levels of stringency for balancing covariates with respect to factorial effects of different importance. Moreover, covariates may also vary in importance based on prior knowledge about their associations with the outcome. We establish a general theory that accommodates rerandomization with tiers of both factorial effects and covariates.
Second, in treatment-control experiments, we are often interested in a single treatment effect. In factorial experiments, however, multiple factorial effects are simultaneously of interest, which motivates our asymptotic theory for the joint sampling distribution of the usual factorial effect estimators. In particular, we use “central convex unimodality” (Dharmadhikari and Jogdeo, 1976; Kanter, 1977) to describe the unimodality of this joint distribution, and “peakedness” (Sherman, 1955) to quantify the intuition that it is more “concentrated” around the true factorial effects under rerandomization than under the CRFE. These two mathematical notions for multivariate distributions extend unimodality and narrower quantile ranges for univariate distributions (Li et al., 2018a), and they are also crucial for constructing large-sample confidence sets for factorial effects.
In sum, our asymptotic analysis further demonstrates the benefits of rerandomization in factorial experiments compared to the classical CRFE (Branson et al., 2016). The proposed large-sample confidence sets for factorial effects will facilitate the practical use of rerandomization in factorial experiments and the associated repeated sampling inference.
The paper proceeds as follows. Section 2 introduces the notation. Section 3 discusses sampling properties and linear projections under the CRFE. Section 4 studies rerandomization using the Mahalanobis distance criterion. Section 5 studies rerandomization with tiers of factorial effects. Section 6 contains an application to an education dataset. Section 7 concludes with possible extensions. The online Supplementary Material (Li et al., 2018b) contains all technical details. We do not attempt to review the extensive literature on confounding and fractional replication in the context of factorial experiments. Instead, we focus on the repeated sampling properties of estimators under rerandomization in factorial experiments.
2 Notation for a factorial experiment
2.1 Potential outcomes and causal estimands
Consider a $2^K$ factorial experiment with $N$ units and $K$ treatment factors, where each factor has two levels, $-1$ and $+1$. In total there are $Q = 2^K$ treatment combinations, and for each treatment combination $q \in \{1, \ldots, Q\}$, let $\iota_q = (\iota_{q1}, \ldots, \iota_{qK})$ be the levels of the $K$ factors. We use potential outcomes to define causal effects in factorial experiments (Neyman, 1923; Dasgupta et al., 2015; Branson et al., 2016). For unit $i$, let $Y_i(q)$ be the potential outcome under treatment combination $q$, and $Y_i = (Y_i(1), \ldots, Y_i(Q))$ be the $Q$ dimensional row vector of all potential outcomes. Let $\bar{Y}(q) = N^{-1}\sum_{i=1}^{N} Y_i(q)$ be the average potential outcome under treatment combination $q$, and $\bar{Y} = (\bar{Y}(1), \ldots, \bar{Y}(Q))$ be the $Q$ dimensional row vector of all average potential outcomes. Dasgupta et al. (2015) characterized each factorial effect by a $Q$ dimensional column vector with half of its elements being $-1$ and the other half being $+1$. For example, the average main effect of factor $k$ is
$$\tau_k = 2^{-(K-1)}\,\bar{Y} g_k = 2^{-(K-1)} \sum_{q=1}^{Q} \iota_{qk}\, \bar{Y}(q),$$
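where $g_k = (\iota_{1k}, \ldots, \iota_{Qk})^\top$ is called the generating vector for the main effect of factor $k$. For an interaction effect among several factors, the $g$-vector is the element-wise product of the $g$-vectors for the main effects of the corresponding factors. There are in total $F = 2^K - 1$ factorial effects. Let $g_f = (g_{f1}, \ldots, g_{fQ})^\top$ be the generating vector for the $f$th factorial effect ($1 \le f \le F$). For unit $i$, $\tau_{if} = 2^{-(K-1)} Y_i g_f$ is the $f$th individual factorial effect, and $\tau_i = (\tau_{i1}, \ldots, \tau_{iF})^\top$ is the $F$ dimensional column vector of all individual factorial effects. Let $\tau_f = N^{-1}\sum_{i=1}^{N} \tau_{if} = 2^{-(K-1)} \bar{Y} g_f$ be the $f$th average factorial effect, and $\tau = (\tau_1, \ldots, \tau_F)^\top$ be the $F$ dimensional column vector of all average factorial effects. The definitions of the factorial effects imply $\tau_i = \sum_{q=1}^{Q} b_q Y_i(q)$ and $\tau = \sum_{q=1}^{Q} b_q \bar{Y}(q)$ with coefficient vectors
$$b_q = 2^{-(K-1)}\,(g_{1q}, \ldots, g_{Fq})^\top, \qquad q = 1, \ldots, Q.$$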
Intuitively, the $k$th main effect compares the average potential outcomes when factor $k$ is at the $+1$ and $-1$ levels, and the interaction effect between two factors compares the average potential outcomes when both factors are at the same level and at different levels. We can view a higher order interaction as the difference between two conditional lower order interactions. For example, the interaction among factors 1–3 equals the difference between the interactions of factors 1 and 2 given factor 3 at the $+1$ and $-1$ levels. See Dasgupta et al. (2015) for more details. Below we use an example to illustrate the definitions.
We consider $2^3$ factorial experiments with $K = 3$ factors, and use $A$, $B$, $C$ to denote these three factors. Table 1 shows the definitions of the $\iota_q$'s and the $g_f$'s. Specifically, the first three columns represent the levels of the three factors in all treatment combinations, and they generate the main effects of factors $A$, $B$ and $C$. The remaining columns are the element-wise products of subsets of the first three columns, and they generate the interaction effects. The coefficient vector $b_q$ consists of all the elements in the $q$th row of Table 1, multiplied by $2^{-(K-1)} = 1/4$.
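The construction of Table 1 can be sketched in Python for any $K$. This is a minimal illustration rather than the authors' code: it assumes the treatment combinations are listed in lexicographic order with $-1$ before $+1$ (the rows of Table 1 may be ordered differently), names the factors A, B, C, ..., and scales the coefficient vectors $b_q$ by $2^{-(K-1)}$.

```python
import itertools

import numpy as np


def factorial_design(K):
    """Sign table of a 2^K design (an illustrative analogue of Table 1).

    Returns
      levels: Q x K array; row q holds the factor levels iota_q (entries -1/+1).
      G:      Q x F array; column f is the generating vector g_f.
      labels: effect names, main effects first, e.g. 'A', 'AB', 'ABC'.
    """
    names = [chr(ord("A") + k) for k in range(K)]
    # Assumed row order: lexicographic, with -1 before +1.
    levels = np.array(list(itertools.product([-1, 1], repeat=K)))
    columns, labels = [], []
    for order in range(1, K + 1):
        for subset in itertools.combinations(range(K), order):
            # Interaction column = element-wise product of the main-effect columns.
            columns.append(np.prod(levels[:, list(subset)], axis=1))
            labels.append("".join(names[k] for k in subset))
    return levels, np.column_stack(columns), labels


K = 3
levels, G, labels = factorial_design(K)
B = G / 2 ** (K - 1)  # row q of B is the coefficient vector b_q (transposed)

# tau = sum_q b_q * Ybar(q) for a made-up vector of average potential outcomes.
Ybar = np.arange(1.0, 9.0)
tau = B.T @ Ybar  # [4, 2, 1, 0, 0, 0, 0]: this Ybar is additive in the factor levels
```

Because this hypothetical $\bar Y$ is a linear function of the factor levels, all interaction effects vanish, which is a quick sanity check on the sign table.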
2.2 Treatment assignment, covariate imbalance and rerandomization
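For each unit $i$, $x_i$ represents the $L$ dimensional column vector of pretreatment covariates. For instance, in the education example in Section 6, college freshmen receive different academic services and incentives after entering the university, and their pretreatment covariates include high school GPA, gender, and age, among others. Let $Z_i$ be the treatment assignment of unit $i$, where $Z_i = q$ if unit $i$ receives treatment combination $q$. Let $n_q$ be the number of units under treatment combination $q$, and $\boldsymbol{Z} = (Z_1, \ldots, Z_N)$ be the treatment assignment vector for all units. In the CRFE, the probability that $\boldsymbol{Z}$ takes a particular value $\boldsymbol{z} = (z_1, \ldots, z_N)$ is $\prod_{q=1}^{Q} n_q! / N!$, where $\sum_{i=1}^{N} 1(z_i = q) = n_q$ for all $q$. Let $\bar{x} = N^{-1}\sum_{i=1}^{N} x_i$ be the finite population covariate mean vector; for $1 \le q \le Q$, let $\hat{\bar{x}}(q) = n_q^{-1}\sum_{i: Z_i = q} x_i$ be the covariate mean vector for units that receive treatment combination $q$. For $1 \le f \le F$, the $L$ dimensional difference-in-means vector of covariates with respect to the $f$th factorial effect is
$$\hat{\tau}_{x,f} = 2^{-(K-1)} \sum_{q=1}^{Q} g_{fq}\, \hat{\bar{x}}(q). \tag{2.2}$$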
Let $\hat{\tau}_x = (\hat{\tau}_{x,1}^\top, \ldots, \hat{\tau}_{x,F}^\top)^\top$ be the $LF$ dimensional column vector of the difference-in-means of covariates with respect to all factorial effects. Although $\hat{\tau}_x$ has mean zero under the CRFE, for a realized value of $\boldsymbol{Z}$, covariate distributions are often imbalanced among different treatment combinations. For example, consider a CRFE with $K$ factors, $L$ uncorrelated covariates, and equal treatment group sizes $n_q = N/Q$. In this case, with asymptotic probability $1 - 0.95^{LF}$, at least one of the $LF$ difference-in-means in (2.2), for a covariate and a factorial effect and standardized by its standard deviation, is larger than 1.96 in absolute value, where 1.96 is the 0.975-quantile of $\mathcal{N}(0,1)$. This holds because $\hat{\tau}_x$ is asymptotically Gaussian with zero mean and a diagonal covariance matrix, as implied by Proposition 1 discussed shortly.
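The probability in this example is simple arithmetic. A small sketch, assuming (as in the example) that the $LF$ standardized difference-in-means are asymptotically independent $\mathcal N(0,1)$; the choices $K = 3$ and $L = 5$ in the usage line are hypothetical.

```python
from scipy.stats import norm


def prob_some_large_imbalance(K, L, z=1.96):
    """Asymptotic P(at least one |standardized difference-in-means| > z).

    Assumes the L * (2^K - 1) standardized covariate difference-in-means are
    independent N(0, 1), as in the uncorrelated-covariate, equal-size example.
    """
    F = 2 ** K - 1
    p_one = 2 * norm.sf(z)  # P(|N(0, 1)| > z), about 0.05 when z = 1.96
    return 1 - (1 - p_one) ** (L * F)


# Hypothetical 2^3 design with 5 covariates: LF = 35 standardized comparisons.
print(round(prob_some_large_imbalance(K=3, L=5), 3))  # about 0.834
```

Even a moderate number of covariates and factors makes at least one nominally "significant" imbalance very likely under complete randomization.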
Rerandomization is a design to prevent undesirable treatment allocations. When covariate imbalance occurs for a realized randomization under a certain criterion, we discard this unlucky realization and rerandomize the treatment assignment until this criterion is satisfied. Generally, rerandomization proceeds as follows (Morgan and Rubin, 2012): first, we collect covariate data and specify a covariate balance criterion; second, we continue randomizing the units into different treatment groups until the balance criterion is satisfied; third, we conduct the physical experiment using the accepted randomization. A major goal of this paper is to discuss the statistical analysis of the data from a rerandomized factorial experiment.
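The three-step procedure above is an accept-reject loop over complete randomizations. The sketch below assumes treatment combinations are coded $0, \ldots, Q-1$ and takes the balance criterion as a user-supplied function; the toy criterion in the usage example (every group mean of one covariate within 0.3 of the overall mean) is hypothetical and is not a criterion proposed in this paper.

```python
import numpy as np


def complete_randomization(n_q, rng):
    """One CRFE draw: Z[i] = q means unit i receives treatment combination q."""
    Z = np.repeat(np.arange(len(n_q)), n_q)  # n_q[q] copies of each label q
    rng.shuffle(Z)                            # uniform over all such assignments
    return Z


def rerandomize(n_q, is_balanced, rng, max_draws=100_000):
    """Redraw complete randomizations until the balance criterion holds.

    is_balanced(Z) -> bool must be fixed in the design stage, before any
    outcomes are observed. Returns the accepted assignment and the draw count.
    """
    for draw in range(1, max_draws + 1):
        Z = complete_randomization(n_q, rng)
        if is_balanced(Z):
            return Z, draw
    raise RuntimeError("no acceptable assignment; the criterion may be too strict")


# Usage: a 2^3 design with 10 units per combination and one covariate.
rng = np.random.default_rng(2018)
x = rng.normal(size=80)
n_q = [10] * 8


def toy_criterion(Z):
    # Hypothetical criterion: all group means of x within 0.3 of the overall mean.
    return all(abs(x[Z == q].mean() - x.mean()) < 0.3 for q in range(8))


Z, draws = rerandomize(n_q, toy_criterion, rng)
```

Only the accepted assignment is used to run the physical experiment; the number of draws is reported merely to show how restrictive the criterion is.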
There are three additional points about covariates. First, covariates are attributes of the units that are fixed before the experiment. The experimenter may observe some covariates but cannot change their values, and the treatments do not affect the covariates. Second, the covariates can be general (discrete or continuous); we can use binary indicators to represent discrete covariates. Third, the covariates can include transformations of the basic covariates and their interactions, which enables us to balance the marginal and joint distributions of the basic covariates. See Baldi Antognini and Zagoraiou (2011) for a related discussion in treatment-control experiments.
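To facilitate the discussion, for a $p \times p$ positive semi-definite matrix $V$ with rank $m$, and a positive integer $n \ge m$, we use $V^{1/2}$ to denote a $p \times n$ matrix such that $V^{1/2}(V^{1/2})^\top = V$. Specifically, if $V = \Gamma\, \mathrm{diag}(\lambda_1, \ldots, \lambda_m, 0, \ldots, 0)\, \Gamma^\top$ is the eigen-decomposition of $V$, where $\Gamma$ is a $p \times p$ orthogonal matrix and $\lambda_1 \ge \cdots \ge \lambda_m > 0$, then we can choose $V^{1/2} = \Gamma\, \mathrm{diag}(\lambda_1^{1/2}, \ldots, \lambda_m^{1/2}, 0, \ldots, 0)_{p \times n}$, where $\mathrm{diag}(\cdot)_{p \times n}$ denotes the $p \times n$ matrix with the given diagonal entries and zeros elsewhere. The choice of $V^{1/2}$ is generally not unique. In the special case with $n = p$, we use $V^{1/2}$ to denote the unique positive semi-definite matrix satisfying the definition of $V^{1/2}$. We use $\otimes$ for the Kronecker product of two matrices, and $\circ$ for element-wise multiplication of vectors. We say a matrix $A$ is smaller than or equal to $B$, and write $A \preceq B$, if $B - A$ is positive semi-definite. We say a random vector $\psi$ (or its distribution) is symmetric if $\psi$ and $-\psi$ have the same distribution. We say a random vector is spherically symmetric if its distribution is invariant under orthogonal transformations. In the asymptotic analysis, we use $A_N \,\dot\sim\, B_N$ for two sequences of random vectors converging weakly to the same distribution after scaling by $N^{1/2}$; we state results directly for the scaled quantities.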
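The eigen-decomposition construction of $V^{1/2}$ can be sketched with NumPy. The function name `psd_sqrt` is hypothetical; it returns the $p \times n$ choice described above when `n` is given, and the unique symmetric square root (the case $n = p$) otherwise.

```python
import numpy as np


def psd_sqrt(V, n=None, tol=1e-10):
    """Square root of a p x p positive semi-definite matrix V.

    n=None: the unique symmetric PSD root (the case n = p).
    n given: the p x n matrix Gamma diag(sqrt(lam_1), ..., sqrt(lam_m), 0, ...)
    from the eigen-decomposition, which requires n >= m = rank(V).
    """
    V = np.asarray(V, dtype=float)
    p = V.shape[0]
    lam, Gam = np.linalg.eigh((V + V.T) / 2)  # eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)             # drop tiny negative round-off
    order = np.argsort(lam)[::-1]             # reorder so lam_1 >= lam_2 >= ...
    lam, Gam = lam[order], Gam[:, order]
    if n is None:
        return (Gam * np.sqrt(lam)) @ Gam.T
    m = int(np.sum(lam > tol * max(lam.max(), 1.0)))  # numerical rank
    if n < m:
        raise ValueError("n must be at least rank(V)")
    R = np.zeros((p, n))
    R[:, :m] = Gam[:, :m] * np.sqrt(lam[:m])  # scale the leading eigenvectors
    return R


# Usage: a rank-2 covariance matrix in dimension 4.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))
V = A @ A.T
R = psd_sqrt(V, n=3)  # one 4 x 3 square root
S = psd_sqrt(V)       # the symmetric 4 x 4 square root
```

Any other $p \times n$ root differs from `R` by right multiplication with an orthogonal matrix, which is why the non-uniqueness is harmless when $V^{1/2}$ multiplies a spherically symmetric vector.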
3 Completely randomized factorial experiments
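The sampling distributions of factorial effect estimators under rerandomization are the same as their conditional distributions given that the treatment assignment vector satisfies the balance criterion. Therefore, we first study the joint sampling distribution of the difference-in-means of the outcomes and covariates under the CRFE. It depends on the following finite population variances and covariances: $S_q^2 = (N-1)^{-1}\sum_{i=1}^{N} \{Y_i(q) - \bar{Y}(q)\}^2$ and $S_{qq'} = (N-1)^{-1}\sum_{i=1}^{N} \{Y_i(q) - \bar{Y}(q)\}\{Y_i(q') - \bar{Y}(q')\}$ for potential outcomes, $S_\tau^2 = (N-1)^{-1}\sum_{i=1}^{N} (\tau_i - \tau)(\tau_i - \tau)^\top$ for factorial effects, $S_x^2 = (N-1)^{-1}\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^\top$ for covariates, and $S_{q,x} = S_{x,q}^\top = (N-1)^{-1}\sum_{i=1}^{N} \{Y_i(q) - \bar{Y}(q)\}(x_i - \bar{x})^\top$ for potential outcomes and covariates. The covariance $S_x^2$ is known without any uncertainty. However, the other variances and covariances (e.g., $S_q^2$ and $S_\tau^2$) involve potential outcomes or individual factorial effects and are thus generally unknown.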
3.1 Asymptotic sampling distribution under the CRFE
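Let $Y_i^{\mathrm{obs}} = \sum_{q=1}^{Q} 1(Z_i = q)\, Y_i(q)$ be the observed outcome of unit $i$, and $\hat{\bar{Y}}(q) = n_q^{-1}\sum_{i: Z_i = q} Y_i^{\mathrm{obs}}$ be the average observed outcome under treatment combination $q$. For $1 \le f \le F$, the difference-in-means estimator for the $f$th average factorial effect is
$$\hat{\tau}_f = 2^{-(K-1)} \sum_{q=1}^{Q} g_{fq}\, \hat{\bar{Y}}(q).$$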
Let $\hat{\tau} = (\hat{\tau}_1, \ldots, \hat{\tau}_F)^\top = \sum_{q=1}^{Q} b_q \hat{\bar{Y}}(q)$ be the $F$ dimensional column vector consisting of all factorial effect estimators.
In finite population inference, the covariates and potential outcomes are all fixed, and the only random component is the treatment assignment vector $\boldsymbol{Z}$. In the asymptotic analysis, we further embed the finite population into a sequence of finite populations with increasing sizes, and introduce the following regularity conditions.
Condition 1. As $N \to \infty$, the sequence of finite populations satisfies that, for each $1 \le q \le Q$,
(i) the proportion of units under treatment combination $q$, $p_q = n_q/N$, has a positive limit,
(ii) the finite population variances and covariances $S_q^2$, $S_x^2$ and $S_{q,x}$ have limiting values, and $S_x^2$ and its limit are non-degenerate,
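Proposition 1. Under the CRFE, $N^{1/2}\{(\hat\tau - \tau)^\top, \hat\tau_x^\top\}^\top$ has mean zero and sampling covariance matrix
$$V = \begin{pmatrix} V_{\tau\tau} & V_{\tau x} \\ V_{x\tau} & V_{xx} \end{pmatrix} = \begin{pmatrix} \sum_{q=1}^{Q} p_q^{-1} b_q b_q^\top S_q^2 - S_\tau^2 & \sum_{q=1}^{Q} p_q^{-1} b_q b_q^\top \otimes S_{q,x} \\ \sum_{q=1}^{Q} p_q^{-1} b_q b_q^\top \otimes S_{x,q} & \sum_{q=1}^{Q} p_q^{-1} b_q b_q^\top \otimes S_x^2 \end{pmatrix},$$
where $p_q = n_q/N$. Under the CRFE and Condition 1, $N^{1/2}\{(\hat\tau - \tau)^\top, \hat\tau_x^\top\}^\top \,\dot\sim\, \mathcal{N}(0, V)$.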
Proposition 1 follows from a finite population central limit theorem (Li and Ding, 2017, Theorems 3 and 5), with the proof in Appendix A2 of the Supplementary Material (Li et al., 2018b). Proposition 1 immediately gives the sampling properties of any single factorial effect estimator. Let $V_{\tau\tau,ff}$ be the $f$th diagonal element of $V_{\tau\tau}$ and $S^2_{\tau,ff}$ be the $f$th diagonal element of $S^2_\tau$. Then $\hat\tau_f$ is unbiased for $\tau_f$ with sampling variance $N^{-1} V_{\tau\tau,ff}$, and $N^{1/2}(\hat\tau_f - \tau_f) \,\dot\sim\, \mathcal{N}(0, V_{\tau\tau,ff})$. Moreover, $S^2_{\tau,ff}$ cannot be unbiasedly estimated from the observed data, and it equals $0$ under the additivity defined below. Under additivity, the individual factorial effects do not depend on the covariates, i.e., there is no treatment-covariate interaction.
Definition 1. The factorial effects are additive if and only if the individual factorial effect $\tau_i$ is the same constant vector for all units, or, equivalently, $S_\tau^2 = 0$.
Under the CRFE, the observed sample variance $s_q^2 = (n_q - 1)^{-1}\sum_{i: Z_i = q}\{Y_i^{\mathrm{obs}} - \hat{\bar{Y}}(q)\}^2$ is unbiased for $S_q^2$, because the units receiving treatment combination $q$ form a simple random sample of size $n_q$. Similar to Neyman (1923), we can conservatively estimate $V_{\tau\tau}$ by $\hat{V}_{\tau\tau} = \sum_{q=1}^{Q} p_q^{-1} b_q b_q^\top s_q^2$, and then construct Wald-type confidence sets for $\tau$. Both the sampling covariance estimator and the confidence sets are asymptotically conservative unless additivity holds. It is then straightforward to construct confidence sets for any linear transformation of $\tau$.
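The estimator $\hat\tau$, the conservative covariance estimator $\hat V_{\tau\tau}$, and a Wald-type confidence set can be sketched as follows. Assumptions: treatment combinations are coded $0, \ldots, Q-1$ in the row order of a $Q \times F$ sign matrix `G` whose columns are the generating vectors, and the Wald set is taken to be the ellipsoid $\{\mu: N(\hat\tau - \mu)^\top \hat V_{\tau\tau}^{-1}(\hat\tau - \mu) \le \chi^2_{F,1-\alpha}\}$, one standard choice. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def factorial_estimates(Y, Z, G, K):
    """Difference-in-means tau_hat and the conservative estimate of V_tautau."""
    Q = G.shape[0]
    N = len(Y)
    B = G / 2 ** (K - 1)                                      # row q = b_q
    n_q = np.array([np.sum(Z == q) for q in range(Q)])
    ybar = np.array([Y[Z == q].mean() for q in range(Q)])     # observed group means
    s2 = np.array([Y[Z == q].var(ddof=1) for q in range(Q)])  # s_q^2
    tau_hat = B.T @ ybar
    p_q = n_q / N
    V_hat = (B.T * (s2 / p_q)) @ B      # sum_q p_q^{-1} b_q b_q^T s_q^2
    return tau_hat, V_hat


def in_wald_set(mu, tau_hat, V_hat, N, alpha=0.05):
    """Is mu inside the Wald-type 1 - alpha confidence ellipsoid for tau?"""
    d = tau_hat - np.asarray(mu, dtype=float)
    stat = N * d @ np.linalg.solve(V_hat, d)
    return bool(stat <= chi2.ppf(1 - alpha, df=len(d)))


# Usage: a 2^2 design (effects A, B, AB) with 3 units per treatment combination.
G = np.array([[-1, -1, 1], [-1, 1, -1], [1, -1, -1], [1, 1, 1]])
Z = np.repeat(np.arange(4), 3)
Y = np.arange(1.0, 13.0)
tau_hat, V_hat = factorial_estimates(Y, Z, G, K=2)  # tau_hat = [6, 3, 0], V_hat = 4 I
```

The tiny dataset is for checking the arithmetic only; the Wald set relies on the large-sample Gaussian approximation in Proposition 1.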
3.2 Linear projections
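First, we decompose the potential outcomes. Let $\tilde{Y}_i(q) = \bar{Y}(q) + S_{q,x}(S_x^2)^{-1}(x_i - \bar{x})$ be the finite population linear projection of the $Y_i(q)$'s on the $x_i$'s, and $\check{Y}_i(q) = Y_i(q) - \tilde{Y}_i(q)$ be the corresponding residual. The finite population linear projection of $\tau_i$ on $x_i$ is then $\tilde{\tau}_i = \sum_{q=1}^{Q} b_q \tilde{Y}_i(q)$, and the corresponding residual is $\check{\tau}_i = \tau_i - \tilde{\tau}_i = \sum_{q=1}^{Q} b_q \check{Y}_i(q)$. Let $S^2_{q\mid x}$ and $S^2_{\tau\mid x}$ be the finite population variances and covariances of $\tilde{Y}_i(q)$ and $\tilde{\tau}_i$, and $S^2_{q\setminus x}$ and $S^2_{\tau\setminus x}$ be those of $\check{Y}_i(q)$ and $\check{\tau}_i$, respectively. Define
$$V_{\tau\tau\mid x} = \sum_{q=1}^{Q} p_q^{-1} b_q b_q^\top S^2_{q\mid x} - S^2_{\tau\mid x}, \qquad V_{\tau\tau\setminus x} = \sum_{q=1}^{Q} p_q^{-1} b_q b_q^\top S^2_{q\setminus x} - S^2_{\tau\setminus x}$$
as analogues of the sampling covariance $V_{\tau\tau}$ in Proposition 1, with the potential outcomes $Y_i(q)$'s replaced by the linear projections $\tilde{Y}_i(q)$'s and the residuals $\check{Y}_i(q)$'s, respectively. We have $V_{\tau\tau} = V_{\tau\tau\mid x} + V_{\tau\tau\setminus x}$, because the projections and residuals are uncorrelated in the finite population.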
Second, we decompose the factorial effect estimator $\hat\tau$.
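Theorem 1. Under the CRFE, the linear projection of $N^{1/2}(\hat\tau - \tau)$ on $N^{1/2}\hat\tau_x$ is $V_{\tau x}V_{xx}^{-1} N^{1/2}\hat\tau_x$, the corresponding residual is $N^{1/2}(\hat\tau - \tau) - V_{\tau x}V_{xx}^{-1} N^{1/2}\hat\tau_x$, and they have sampling covariances
$$\mathrm{Cov}\big(V_{\tau x}V_{xx}^{-1} N^{1/2}\hat\tau_x\big) = V_{\tau x}V_{xx}^{-1}V_{x\tau} = V_{\tau\tau\mid x}, \qquad \mathrm{Cov}\big\{N^{1/2}(\hat\tau - \tau) - V_{\tau x}V_{xx}^{-1} N^{1/2}\hat\tau_x\big\} = V_{\tau\tau} - V_{\tau x}V_{xx}^{-1}V_{x\tau} = V_{\tau\tau\setminus x}.$$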
Theorem 1 follows from Proposition 1 and some matrix calculations, with the proof in Appendix A2 of the Supplementary Material (Li et al., 2018b). Let $V_{\tau\tau\mid x,ff}$ and $V_{\tau\tau\setminus x,ff}$ be the $f$th diagonal elements of $V_{\tau\tau\mid x}$ and $V_{\tau\tau\setminus x}$, respectively. The squared multiple correlation in the following corollary will play an important role in the asymptotic sampling distribution of $\hat\tau_f$ under rerandomization. We summarize its equivalent forms below.
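Corollary 1. Under the CRFE, the sampling squared multiple correlation between $\hat\tau_f$ and $\hat\tau_x$ has the following equivalent forms:
$$R_f^2 = \frac{V_{\tau\tau\mid x,ff}}{V_{\tau\tau,ff}} = 1 - \frac{V_{\tau\tau\setminus x,ff}}{V_{\tau\tau,ff}} = \frac{\sum_{q=1}^{Q} p_q^{-1} b_{qf}^2 S^2_{q\mid x} - S^2_{\tau\mid x,ff}}{\sum_{q=1}^{Q} p_q^{-1} b_{qf}^2 S^2_{q} - S^2_{\tau,ff}},$$
where $b_{qf}$ is the $f$th element of $b_q$, and $S^2_{\tau\mid x,ff}$ and $S^2_{\tau,ff}$ are the $f$th diagonal elements of $S^2_{\tau\mid x}$ and $S^2_\tau$. Under the additivity in Definition 1, $R_f^2$ reduces to $S^2_{q\mid x}/S^2_q$, the finite population squared multiple correlation between $Y_i(q)$ and $x_i$, which then does not depend on $q$.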
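The blocks of $V$ in Proposition 1, the projection identity in Theorem 1, and $R_f^2$ in Corollary 1 can be checked numerically on a simulated finite population. This is a sketch with hypothetical helper names; it assumes lexicographically ordered treatment combinations and the $2^{-(K-1)}$ scaling of $b_q$, and it computes $V_{\tau\tau\mid x}$ both as $V_{\tau x}V_{xx}^{-1}V_{x\tau}$ and by plugging the projected potential outcomes into the formula for $V_{\tau\tau}$.

```python
import itertools

import numpy as np


def generating_vectors(K):
    """Q x F matrix whose columns are the generating vectors g_f (lexicographic rows)."""
    levels = np.array(list(itertools.product([-1, 1], repeat=K)))
    cols = [np.prod(levels[:, list(s)], axis=1)
            for r in range(1, K + 1) for s in itertools.combinations(range(K), r)]
    return np.column_stack(cols)


def crfe_covariances(Ypo, X, G, K, n_q):
    """Blocks V_tautau, V_taux, V_xx of Proposition 1 from an N x Q potential-outcome table."""
    N, Q = Ypo.shape
    B = G / 2 ** (K - 1)                        # row q = b_q
    p = np.asarray(n_q) / N
    Yc, Xc = Ypo - Ypo.mean(axis=0), X - X.mean(axis=0)
    S2_q = (Yc ** 2).sum(axis=0) / (N - 1)      # S_q^2
    S_qx = Yc.T @ Xc / (N - 1)                  # row q = S_{q,x}
    S2_x = Xc.T @ Xc / (N - 1)
    tc = Yc @ B                                 # centered individual effects tau_i - tau
    S2_tau = tc.T @ tc / (N - 1)
    W = [np.outer(B[q], B[q]) / p[q] for q in range(Q)]  # p_q^{-1} b_q b_q^T
    V_tt = sum(W[q] * S2_q[q] for q in range(Q)) - S2_tau
    V_tx = sum(np.kron(W[q], S_qx[q][None, :]) for q in range(Q))
    V_xx = np.kron(sum(W), S2_x)
    return V_tt, V_tx, V_xx


def project_on_covariates(Ypo, X):
    """Finite population projections of each Y(q) on x, and the residuals."""
    Xc = X - X.mean(axis=0)
    coef = np.linalg.lstsq(Xc, Ypo - Ypo.mean(axis=0), rcond=None)[0]
    Ytil = Ypo.mean(axis=0) + Xc @ coef
    return Ytil, Ypo - Ytil


# Usage on a simulated 2^2 population with L = 3 covariates and 50 units per combination.
rng = np.random.default_rng(1)
N, K, L = 200, 2, 3
G = generating_vectors(K)
X = rng.normal(size=(N, L))
Ypo = X @ rng.normal(size=(L, 4)) + rng.normal(size=(N, 4))
n_q = [50] * 4
V_tt, V_tx, V_xx = crfe_covariances(Ypo, X, G, K, n_q)
V_par = V_tx @ np.linalg.solve(V_xx, V_tx.T)   # V_{tautau|x} as in Theorem 1
R2 = np.diag(V_par) / np.diag(V_tt)            # R_f^2 as in Corollary 1
Ytil, Yres = project_on_covariates(Ypo, X)
```

Feeding `Ytil` and `Yres` back into `crfe_covariances` should reproduce $V_{\tau\tau\mid x}$ and $V_{\tau\tau\setminus x}$ up to rounding error, which is the content of Theorem 1.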
4 Rerandomization using the Mahalanobis distance
As shown in Section 3.1, although $\hat\tau_x$ has mean $0$, its realized value can be far from $0$ for a particular treatment allocation. Rerandomization can avoid this drawback. In the design stage, we can enforce balance of the covariate means by requiring $\hat\tau_x$ to be “small.”
4.1 Mahalanobis distance criterion
A measure of the magnitude of $\hat\tau_x$ is the Mahalanobis distance $M = N\hat\tau_x^\top V_{xx}^{-1}\hat\tau_x$, which can be computed in the design stage because $V_{xx}$ depends only on the covariates and the $n_q$'s. We further let $a$ be a positive constant predetermined in the design stage. Using $M \le a$ as the balance criterion, we accept a treatment assignment vector $\boldsymbol{Z}$ from the CRFE if and only if $M \le a$. Below we use ReFM to denote rerandomized factorial experiments using $M \le a$ as the criterion, and $\mathcal{M}$ to denote the event that the treatment vector satisfies this criterion. From Proposition 1, $M$ is asymptotically $\chi^2_{LF}$, and therefore the asymptotic acceptance probability under ReFM is $p_a = P(\chi^2_{LF} \le a)$. In practice, we usually choose a small threshold $a$, or equivalently a small $p_a$. However, we do not advocate choosing $p_a$ too small, because an extremely small $p_a$ may leave too few acceptable treatment allocations in ReFM.
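A minimal sketch of the ReFM acceptance step, assuming lexicographically ordered treatment combinations coded $0, \ldots, Q-1$; the helper names and the design sizes in the usage example are hypothetical. It computes $\hat\tau_x$ (stacked effect by effect), $V_{xx}$, and $M$, and checks by simulation that, with $a$ chosen as the $p_a$ quantile of $\chi^2_{LF}$, the acceptance rate under complete randomization is close to $p_a$.

```python
import itertools

import numpy as np
from scipy.stats import chi2


def generating_vectors(K):
    levels = np.array(list(itertools.product([-1, 1], repeat=K)))
    cols = [np.prod(levels[:, list(s)], axis=1)
            for r in range(1, K + 1) for s in itertools.combinations(range(K), r)]
    return np.column_stack(cols)


def mahalanobis_M(X, Z, G, K, n_q):
    """M = N tau_x^T V_xx^{-1} tau_x for one assignment Z (codes 0..Q-1)."""
    N, Q = X.shape[0], G.shape[0]
    B = G / 2 ** (K - 1)
    xbar_q = np.vstack([X[Z == q].mean(axis=0) for q in range(Q)])  # Q x L
    tau_x = (B.T @ xbar_q).ravel()      # (tau_{x,1}, ..., tau_{x,F}) stacked
    p = np.asarray(n_q) / N
    S2_x = np.atleast_2d(np.cov(X, rowvar=False))                   # divisor N - 1
    V_xx = np.kron((B.T / p) @ B, S2_x)  # sum_q p_q^{-1} b_q b_q^T (x) S_x^2
    return N * tau_x @ np.linalg.solve(V_xx, tau_x)


def draw_crfe(n_q, rng):
    Z = np.repeat(np.arange(len(n_q)), n_q)
    rng.shuffle(Z)
    return Z


# Usage: a 2^2 design, L = 2 covariates, 40 units per combination, p_a = 0.2.
rng = np.random.default_rng(7)
K, L, n_q = 2, 2, [40] * 4
G = generating_vectors(K)
F = G.shape[1]
X = rng.normal(size=(sum(n_q), L))
p_a = 0.2
a = chi2.ppf(p_a, df=L * F)             # threshold with P(chi2_{LF} <= a) = p_a
Ms = np.array([mahalanobis_M(X, draw_crfe(n_q, rng), G, K, n_q) for _ in range(2000)])
accept_rate = np.mean(Ms <= a)          # close to p_a
```

In an actual ReFM design one would stop at the first draw with $M \le a$; the repeated draws here only illustrate the acceptance probability.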
4.2 Asymptotic sampling distribution of $\hat\tau$ under ReFM
Rerandomization in the design stage accepts only the treatment assignments resulting in covariate balance, which consequently changes the sampling distribution of $\hat\tau$. Understanding the asymptotic sampling distribution of $\hat\tau$ is crucial for conducting classical repeated sampling inference on $\tau$. Intuitively, $\hat\tau$ has two parts: one part is orthogonal to $\hat\tau_x$ and thus unaffected by ReFM, and the other part is the linear projection onto $\hat\tau_x$ and thus affected by ReFM. Let $\varepsilon \sim \mathcal{N}(0, I_F)$ be an $F$ dimensional standard Gaussian random vector, and $\mathcal{L}_{LF,a} \sim D \mid D^\top D \le a$ be an $LF$ dimensional truncated Gaussian random vector, where $D \sim \mathcal{N}(0, I_{LF})$. The following theorem shows the asymptotic sampling distribution of $\hat\tau$.
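Theorem 2. Under ReFM and Condition 1,
$$N^{1/2}(\hat\tau - \tau) \mid \mathcal{M} \;\dot\sim\; V_{\tau\tau\setminus x}^{1/2}\, \varepsilon + V_{\tau\tau\mid x}^{1/2}\, \mathcal{L}_{LF,a}, \tag{4.1}$$
where $\varepsilon$ and $\mathcal{L}_{LF,a}$ are independent, $V_{\tau\tau\setminus x}^{1/2}$ is $F \times F$, and $V_{\tau\tau\mid x}^{1/2}$ is $F \times LF$.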
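Theorem 2 holds because the sampling distribution of $N^{1/2}(\hat\tau - \tau)$ under rerandomization is the same as the conditional distribution of $N^{1/2}(\hat\tau - \tau)$ under the CRFE given $M \le a$. Its proof is in Appendix A3 of the Supplementary Material (Li et al., 2018b). We emphasize that, although the $F \times LF$ matrix $V_{\tau\tau\mid x}^{1/2}$ may not be unique, the asymptotic sampling distribution (4.1) is unique, because $\mathcal{L}_{LF,a}$ is spherically symmetric. Therefore, the asymptotic sampling distribution of $\hat\tau$ under ReFM depends only on $V_{\tau\tau\setminus x}$, $V_{\tau\tau\mid x}$, and $a$. Theorem 2 immediately implies the asymptotic sampling distribution of a single factorial effect estimator. Let $\varepsilon_0 \sim \mathcal{N}(0, 1)$, and $L_{LF,a,1}$ be the first coordinate of $\mathcal{L}_{LF,a}$.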
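Corollary 2. Under ReFM and Condition 1, for $1 \le f \le F$,
$$N^{1/2}(\hat\tau_f - \tau_f) \mid \mathcal{M} \;\dot\sim\; V_{\tau\tau,ff}^{1/2}\Big(\sqrt{1 - R_f^2}\; \varepsilon_0 + \sqrt{R_f^2}\; L_{LF,a,1}\Big),$$
where $\varepsilon_0$ and $L_{LF,a,1}$ are independent.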
4.3 Review of the central convex unimodality
In this subsection, we review a generalization of unimodality to multivariate distributions and apply it to study the asymptotic sampling distribution (4.1). This property will be important for constructing conservative large-sample confidence sets later.
Although the definition of symmetric unimodality for univariate distributions is simple and intuitive, it is nontrivial to generalize it to multivariate distributions. Here we adopt the central convex unimodality proposed by Dharmadhikari and Jogdeo (1976) based on the results of Sherman (1955), which is also equivalent to the symmetric unimodality in Kanter (1977). For a set $\mathcal{P}$ of distributions on $\mathbb{R}^m$, we say that $\mathcal{P}$ is closed convex if it satisfies two conditions: (i) for any distributions $P_1, P_2 \in \mathcal{P}$ and any $\lambda \in (0, 1)$, the distribution $\lambda P_1 + (1 - \lambda) P_2$ is in $\mathcal{P}$, and (ii) a distribution $P$ is in $\mathcal{P}$ if there exists a sequence of distributions in $\mathcal{P}$ converging weakly to $P$. For any set $\mathcal{P}$ of distributions, let the closed convex hull of $\mathcal{P}$ be the smallest closed convex set containing $\mathcal{P}$. A compact convex set in Euclidean space is called a convex body if it has a nonempty interior. A set $\mathcal{K}$ is symmetric if $\mathcal{K} = -\mathcal{K}$. Below we introduce the definition.
Definition 2. A distribution on $\mathbb{R}^m$ is central convex unimodal if it is in the closed convex hull of $\mathcal{U}$, where $\mathcal{U}$ is the set of all uniform distributions on symmetric convex bodies in $\mathbb{R}^m$.
The class of central convex unimodal distributions is closed under convolution, marginality, product measure, and weak convergence (Kanter, 1977). A sufficient condition for central convex unimodality is having a symmetric log-concave probability density function (Kanter, 1977; Dharmadhikari and Joag-Dev, 1988). The following proposition states the central convex unimodality of the asymptotic sampling distribution of $\hat\tau$ under ReFM.
Proposition 2. The standard Gaussian random vector $\varepsilon$, the truncated Gaussian random vector $\mathcal{L}_{LF,a}$, and the asymptotic sampling distribution (4.1) are all central convex unimodal.
4.4 Representation for the asymptotic sampling distribution of $\hat\tau$
In this subsection, we further represent (4.1) using well-known distributions to gain more insight. Let $\chi^2_{LF,a} \sim \chi^2_{LF} \mid \chi^2_{LF} \le a$ be a truncated $\chi^2$ random variable, $S$ be an $LF$ dimensional random vector whose coordinates are independent random signs with probability $1/2$ of being $\pm 1$, and $\beta$ be an $LF$ dimensional Dirichlet random vector with parameters $(1/2, \ldots, 1/2)$. Let $\sqrt{\beta}$ be the element-wise square root of the vector $\beta$, and $\chi_{LF,a} = (\chi^2_{LF,a})^{1/2}$.
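Proposition 3. $\mathcal{L}_{LF,a}$ is spherically symmetric with covariance $v_{LF,a} I_{LF}$, where $v_{LF,a} = P(\chi^2_{LF+2} \le a)/P(\chi^2_{LF} \le a)$. It follows that $\mathcal{L}_{LF,a} \sim \chi_{LF,a}\, S \circ \sqrt{\beta}$, where $(\chi_{LF,a}, S, \beta)$ are jointly independent.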
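Proposition 3 gives a direct way to simulate $\mathcal L_{LF,a}$. The sketch below (function names hypothetical) draws it both by naive accept-reject and by the representation $\chi_{LF,a}\, S \circ \sqrt\beta$, sampling the truncated $\chi^2$ radius by the inverse-CDF method; both sample covariance matrices should be close to $v_{LF,a} I_{LF}$. The values $LF = 6$ and $p_a = 0.1$ are illustrative.

```python
import numpy as np
from scipy.stats import chi2


def truncated_gaussian_reject(dim, a, size, rng):
    """D ~ N(0, I_dim) conditioned on D^T D <= a, by accept-reject."""
    kept, total = [], 0
    while total < size:
        D = rng.standard_normal((size, dim))
        D = D[(D ** 2).sum(axis=1) <= a]
        kept.append(D)
        total += len(D)
    return np.vstack(kept)[:size]


def truncated_gaussian_representation(dim, a, size, rng):
    """chi_{dim,a} * S o sqrt(beta), with (chi, S, beta) independent."""
    u = rng.uniform(0.0, chi2.cdf(a, dim), size=size)   # inverse CDF of truncated chi2
    radius = np.sqrt(chi2.ppf(u, dim))                  # chi_{dim,a}
    S = rng.choice([-1.0, 1.0], size=(size, dim))       # independent random signs
    beta = rng.dirichlet(np.full(dim, 0.5), size=size)  # Dirichlet(1/2, ..., 1/2)
    return radius[:, None] * S * np.sqrt(beta)


def v_factor(dim, a):
    """Variance of each coordinate of the truncated Gaussian."""
    return chi2.cdf(a, dim + 2) / chi2.cdf(a, dim)


rng = np.random.default_rng(3)
dim, p_a = 6, 0.1
a = chi2.ppf(p_a, dim)
R1 = truncated_gaussian_reject(dim, a, 20000, rng)
R2 = truncated_gaussian_representation(dim, a, 20000, rng)
v = v_factor(dim, a)  # about 0.26 for these values
```

The representation avoids wasting draws when $p_a$ is small, since accept-reject keeps only a fraction $p_a$ of its proposals.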
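Proposition 3 follows from the spherical symmetry of the standard multivariate Gaussian random vector, with the proof in Appendix A3 of the Supplementary Material (Li et al., 2018b). Proposition 3 allows for easy simulation of the asymptotic sampling distribution (4.1), which is useful for the repeated sampling inference discussed shortly. For simplicity, in the remaining paper, we assume that $V_{\tau\tau}$ is invertible whenever we mention its inverse; otherwise we can focus on a lower dimensional linear transformation of $\hat\tau$ (Li et al., 2018b). Let $V_{\tau\tau}^{-1/2} V_{\tau\tau\mid x} V_{\tau\tau}^{-1/2}$ be the $F \times F$ matrix measuring the relative sampling covariance of $\hat\tau$ explained by $\hat\tau_x$, and $\Gamma \Lambda^2 \Gamma^\top$ be its eigen-decomposition, where $\Gamma$ is an $F \times F$ orthogonal matrix and $\Lambda^2 = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_F^2)$ is a diagonal matrix with nonnegative elements. The square roots $\lambda_1, \ldots, \lambda_F$ of these eigenvalues are the canonical correlations between $\hat\tau$ and $\hat\tau_x$ under the CRFE, which measure the association between the potential outcomes and covariates. Under additivity, $\Lambda^2 = R^2 I_F$, where $R^2$ is the finite population squared multiple correlation between $Y_i(q)$ and $x_i$. The following corollary gives an equivalent form of (4.1) highlighting the dependence on the canonical correlations.

Corollary 3. Under ReFM and Condition 1,
$$\Gamma^\top V_{\tau\tau}^{-1/2} N^{1/2}(\hat\tau - \tau) \mid \mathcal{M} \;\dot\sim\; (I_F - \Lambda^2)^{1/2}\, \varepsilon + (\Lambda, 0_{F \times (LF - F)})\, \mathcal{L}_{LF,a}, \tag{4.3}$$
where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_F)$, and $\varepsilon$ and $\mathcal{L}_{LF,a}$ are independent.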
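The matrices $\Gamma$ and $\Lambda^2$ can be computed from $V_{\tau\tau}$ and $V_{\tau\tau\mid x}$ via a symmetric inverse square root. The sketch below uses hypothetical names and a randomly generated positive definite $V$ (with illustrative $F = 3$, $LF = 6$); it checks that the $\lambda_f^2$ coincide with the generalized eigenvalues of the pair $(V_{\tau\tau\mid x}, V_{\tau\tau})$ returned by `scipy.linalg.eigh`.

```python
import numpy as np
from scipy.linalg import eigh


def canonical_structure(V_tt, V_par):
    """Gamma and lam2 with V_tt^{-1/2} V_par V_tt^{-1/2} = Gamma diag(lam2) Gamma^T."""
    w, U = np.linalg.eigh(V_tt)
    V_inv_half = (U / np.sqrt(w)) @ U.T          # symmetric V_tt^{-1/2}; needs V_tt > 0
    A = V_inv_half @ V_par @ V_inv_half
    lam2, Gamma = np.linalg.eigh((A + A.T) / 2)  # symmetrize against round-off
    order = np.argsort(lam2)[::-1]               # lam_1^2 >= ... >= lam_F^2
    return Gamma[:, order], lam2[order]


# Usage: carve V_tautau, V_taux, V_xx out of a random positive definite V.
rng = np.random.default_rng(5)
F, LF = 3, 6
C = rng.normal(size=(F + LF, F + LF))
V = C @ C.T + 0.1 * np.eye(F + LF)
V_tt, V_tx, V_xx = V[:F, :F], V[:F, F:], V[F:, F:]
V_par = V_tx @ np.linalg.solve(V_xx, V_tx.T)    # V_{tautau|x}
Gamma, lam2 = canonical_structure(V_tt, V_par)
lam = np.sqrt(np.clip(lam2, 0.0, 1.0))          # canonical correlations
```

Under additivity $V_{\tau\tau\mid x}$ is proportional to $V_{\tau\tau}$, so all $\lambda_f^2$ coincide; passing `0.3 * V_tt` as `V_par` is a quick way to see this.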
The proof of Corollary 3 is in Appendix A3 of the Supplementary Material (Li et al., 2018b). The second term in (4.3), affected by rerandomization, depends on the canonical correlations and the asymptotic acceptance probability of ReFM. Below we use a numerical example to illustrate such dependence.
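We consider a case with fixed $F$ and $L$ and equal canonical correlations, $\Lambda^2 = \rho^2 I_F$, and focus on the standardized distribution $(I_F - \Lambda^2)^{1/2}\varepsilon + (\Lambda, 0_{F \times (LF - F)})\mathcal{L}_{LF,a}$ in (4.3), which depends on $\rho^2$ and $p_a$. First, we fix $p_a$. Figure 0(a) shows the density of the first two coordinates of this distribution for different values of $\rho^2$. As $\rho^2$ increases, the density becomes more concentrated around zero, showing that the stronger the association between the potential outcomes and covariates, the more precise the factorial effect estimators.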
Second, we fix the canonical correlations. Figure 0(b) shows the density of the first two coordinates of the standardized distribution for different values of $p_a$. As the asymptotic acceptance probability $p_a$ decreases, the density becomes more concentrated around zero, confirming the intuition that a smaller asymptotic acceptance probability gives more precise factorial effect estimators. Note that the first component in the asymptotic sampling distribution (4.3) does not depend on $p_a$ and is usually nonzero. For example, when $V_{\tau\tau\setminus x}$ is positive definite, $I_F - \Lambda^2$ is positive definite, and so is the coefficient $(I_F - \Lambda^2)^{1/2}$ of $\varepsilon$ in (4.3). Therefore, the additional gain of ReFM from decreasing $p_a$ usually becomes smaller as $p_a$ decreases.
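The diminishing gain can be seen by simulating the standardized distribution in (4.3). The sketch assumes equal canonical correlations, $\Lambda^2 = \rho^2 I_F$, with hypothetical values $F = 3$, $L = 2$, $\rho^2 = 0.5$, and draws $\mathcal L_{LF,a}$ as a truncated-$\chi^2$ radius times a uniform direction on the sphere. Each coordinate then has variance $1 - \rho^2 + \rho^2 v_{LF,a}$, where $v_{LF,a} = P(\chi^2_{LF+2} \le a)/P(\chi^2_{LF} \le a)$.

```python
import numpy as np
from scipy.stats import chi2


def sample_standardized(F, L, rho2, p_a, size, rng):
    """Draws of (I - Lambda^2)^{1/2} eps + (Lambda, 0) L_{LF,a} with Lambda^2 = rho2 * I."""
    dim = L * F
    eps = rng.standard_normal((size, F))
    u = rng.uniform(0.0, p_a, size=size)        # P(chi2_dim <= a) = p_a by choice of a
    radius = np.sqrt(chi2.ppf(u, dim))
    D = rng.standard_normal((size, dim))
    direction = D / np.linalg.norm(D, axis=1, keepdims=True)  # uniform on the sphere
    trunc = radius[:, None] * direction                       # L_{LF,a}
    return np.sqrt(1 - rho2) * eps + np.sqrt(rho2) * trunc[:, :F]


def theoretical_variance(F, L, rho2, p_a):
    """Per-coordinate variance 1 - rho2 + rho2 * v_{LF,a}; p_a = 1 means the CRFE."""
    dim = L * F
    if p_a >= 1:
        return 1.0
    a = chi2.ppf(p_a, dim)
    v = chi2.cdf(a, dim + 2) / p_a
    return 1 - rho2 + rho2 * v


rng = np.random.default_rng(11)
F, L, rho2 = 3, 2, 0.5
p_grid = [1.0, 0.1, 0.01, 0.001]
sim_var = [sample_standardized(F, L, rho2, p, 50000, rng)[:, 0].var() for p in p_grid]
theory = [theoretical_variance(F, L, rho2, p) for p in p_grid]
```

The variance can never fall below $1 - \rho^2$, the contribution of the first component, so successive tenfold reductions of $p_a$ buy less and less.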
4.5 Asymptotic unbiasedness, sampling covariance and peakedness
In this subsection, we further study the asymptotic properties of $\hat\tau$ under ReFM. First, the factorial effect estimator $\hat\tau$ is consistent for $\tau$, and it is asymptotically unbiased because the asymptotic sampling distribution (4.1) is symmetric around zero. Because covariates are potential outcomes unaffected by the treatment, the difference-in-means of any observed or unobserved covariate with respect to any factorial effect also has asymptotic mean zero.
Second, we compare the asymptotic sampling covariance matrices of under ReFM and the CRFE, which also gives the reduction in asymptotic sampling covariances of difference-in-means of covariates as a special case.
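Theorem 3. Under Condition 1, the asymptotic sampling covariance matrix of $N^{1/2}(\hat\tau - \tau)$ under ReFM is smaller than or equal to that under the CRFE, and the reduction in asymptotic sampling covariance is $(1 - v_{LF,a}) V_{\tau\tau\mid x}$, where $v_{LF,a} = P(\chi^2_{LF+2} \le a)/P(\chi^2_{LF} \le a)$. Specifically, the percentage reduction in asymptotic sampling variance (PRIASV) of $\hat\tau_f$ is $(1 - v_{LF,a}) R_f^2$.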
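The PRIASV formula is straightforward to evaluate. The sketch below (hypothetical function name) computes $v_{LF,a}$ from $\chi^2$ distribution functions and returns $(1 - v_{LF,a}) R_f^2$; the inputs $R_f^2 = 0.5$, $L = 2$, $F = 3$ are illustrative only, and for $p_a = 0.1$ the PRIASV is about 0.37.

```python
from scipy.stats import chi2


def priasv(R2_f, L, F, p_a):
    """Percentage reduction in asymptotic sampling variance of tau_hat_f under ReFM."""
    dim = L * F
    if p_a >= 1:                    # no rerandomization: v = 1, so no reduction
        return 0.0
    a = chi2.ppf(p_a, dim)          # threshold with asymptotic acceptance probability p_a
    v = chi2.cdf(a, dim + 2) / p_a  # v_{LF,a}
    return (1 - v) * R2_f


for p_a in [0.1, 0.01, 0.001]:
    print(p_a, round(priasv(0.5, L=2, F=3, p_a=p_a), 3))
```

The PRIASV is bounded above by $R_f^2$ and approaches it only as $p_a \to 0$, consistent with the diminishing returns noted in Section 4.4.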