1 Introduction
Randomized experiments are often considered the “gold standard” of scientific investigations because, on average, randomization balances all potential confounders, both observed and unobserved (Krause & Howard, 2003). However, many have noted that randomized experiments can yield “bad allocations,” where some covariates are not well-balanced across treatment groups (Seidenfeld, 1981; Lindley, 1982; Papineau, 1994; Rosenberger & Sverdlov, 2008). Covariate imbalance among different treatment groups complicates the interpretation of estimated causal effects, and thus covariate adjustments are often employed, typically through regression or other comparable methods.
However, it would be better to prevent such covariate imbalances from occurring before treatment is administered, rather than depend on assumptions for post-treatment covariate adjustment that may not hold (Freedman, 2008). One common experimental design tool is blocking, where units are first grouped together based on categorical covariates, and then treatment is randomized within these groups. However, blocking is less intuitive when there are noncategorical covariates. A more recent experimental design tool that prevents covariate imbalance and allows for noncategorical covariates is the rerandomization scheme of Morgan & Rubin (2012), where units are randomized until a prespecified level of covariate balance is achieved. Rerandomization has been discussed as early as R.A. Fisher (e.g., see Fisher (1992)), and more recent works (e.g., Cox (2009), Bruhn & McKenzie (2009), and Worrall (2010)) recommend rerandomization. Morgan & Rubin (2012) formalized these recommendations in treatment-versus-control settings and was one of the first works to establish a theoretical framework for rerandomization schemes. Since Morgan & Rubin (2012), several extensions have been made. Morgan & Rubin (2015) developed rerandomization for treatment-versus-control experiments where there are tiers of covariates that vary in importance; Branson et al. (2016) extended rerandomization to factorial designs; and Zhou et al. (2017) developed a rerandomization scheme for sequential designs. Finally, Li et al. (2016) established asymptotic results for the rerandomization schemes considered in Morgan & Rubin (2012) and Morgan & Rubin (2015).
All of these works focus on using an omnibus measure of covariate balance, the Mahalanobis distance (Mahalanobis, 1936), during the rerandomization scheme. The Mahalanobis distance is well-known within the matching and observational study literature, where it is used to find subsets of the treatment and control that are similar (Rubin, 1974; Rosenbaum & Rubin, 1985; Gu & Rosenbaum, 1993; Rubin & Thomas, 2000). The Mahalanobis distance is particularly useful in rerandomization schemes because (1) it is symmetric in the treatment assignment, which leads to unbiased estimators of the average treatment effect under rerandomization; and (2) it is equal percent variance reducing if the covariates are ellipsoidally symmetric, meaning that rerandomization using the Mahalanobis distance reduces the variance of all covariate mean differences by the same percentage (Morgan & Rubin, 2012).
However, the Mahalanobis distance is known to perform poorly in matching for observational studies when covariates are not ellipsoidally symmetric, there are strong collinearities among the covariates, or there are many covariates (Gu & Rosenbaum, 1993; Olsen, 1997; Stuart, 2010). One reason for this is that matching using the Mahalanobis distance places equal importance on balancing all covariates as well as their interactions (Stuart, 2010), and this issue also occurs in rerandomization schemes that use the Mahalanobis distance. This issue was partially addressed by Morgan & Rubin (2015), who proposed an extension of Morgan & Rubin (2012) that incorporates tiers of covariates that vary in importance, such that the most important covariates receive the most variance reduction. However, this requires researchers to specify an explicit hierarchy of importance for the covariates, which might be difficult, especially when the number of covariates is large.
As an alternative, we consider a rerandomization scheme using a modified Mahalanobis distance that inflates the eigenvalues of the covariates’ covariance matrix to alleviate collinearities among the covariates, which has connections to ridge regression (Hoerl & Kennard, 1970). Such a quantity has remained largely unexplored in the literature. First we establish several theoretical properties about this quantity, as well as several properties about a rerandomization scheme that uses this quantity. We show through simulation that a rerandomization scheme that incorporates this modified criterion can be beneficial in terms of variance reduction when there are strong collinearities among the covariates. In particular, this rerandomization scheme automatically specifies a hierarchy of importance based on the eigenstructure of the covariates, which can be useful when researchers are unsure about how much importance they should place on each covariate when designing a randomized experiment. We also discuss how this modified Mahalanobis distance connects to other criteria, such as principal components and the Euclidean distance. Because the rerandomization literature has focused almost exclusively on the Mahalanobis distance, this work also contributes to the literature by exploring the use of other criteria besides the Mahalanobis distance for rerandomization schemes.
The remainder of this paper is organized as follows. In Section 2, we introduce the notation that will be used throughout the paper. In Section 3, we review the rerandomization scheme of Morgan & Rubin (2012). In Section 4, we outline our proposed rerandomization approach and establish several theoretical properties of this approach, as well as several theoretical properties about the modified Mahalanobis distance. In Section 5, we provide simulation evidence that suggests that our rerandomization approach is often preferable over other rerandomization approaches, particularly in high-dimensional or high-collinearity settings. In Section 6, we conclude with a discussion of future work.
2 Notation
We use the colon notation for tuples of objects, and we let for any univariate function . We respectively denote by and the identity matrix and the dimensional column vector whose coefficients are all equal to . Given a matrix , we denote by its coefficient, its th row, its th column, its transpose, and its trace when is square. Given two symmetric matrices and of the same size, we write (resp. ) if the matrix is positive definite (resp. semidefinite).
Let be the matrix representing covariates measured on experimental units. Let if unit is assigned to treatment and 0 otherwise, and let . Unless stated otherwise, we will focus on completely randomized experiments (Imbens & Rubin, 2015, see Definition 4.2) with a fixed number of treated units and control units. For a given assignment vector , we define and as the respective covariate mean vectors within treatment and control. Finally, we define the covariance matrix of the covariate mean differences with respect to the distribution of given , and we assume . The spectral decomposition ensures that is diagonalizable with eigenvalues . Let be the orthogonal matrix of corresponding eigenvectors, so that we may write , where denotes the diagonal matrix whose coefficient is .
For completely randomized experiments, we have , where is the sample covariance matrix of with (Morgan & Rubin, 2012). Thus, and its eigenstructure are available in closed form, and the latter coincides with the eigenstructure of up to a scaling factor. We let denote a chi-squared distribution with degrees of freedom, its cumulative distribution function (CDF) evaluated at , and its quantile for .
3 Review of Rerandomization
We follow the potential outcomes framework (Rubin, 1990, 2005), where each unit has fixed potential outcomes and , which denote the outcome for unit under treatment and control, respectively. Thus, the observed outcome for unit is . Define as the vector of observed outcomes. We focus on the average treatment effect as the causal estimand, defined as
(1) 
Furthermore, we focus on the mean-difference estimator
(2) 
where and are the average treatment and control outcomes, respectively. When conducting a randomized experiment, ideally we would like and to be close; otherwise, the estimator could be confounded by imbalances in the covariate means.
Morgan & Rubin (2012) focused on a rerandomization scheme using the Mahalanobis distance to ensure that the covariate means are reasonably balanced for a particular treatment assignment. The Mahalanobis distance between the treatment and control covariate means is defined as
(3) 
where the dependence of on the assignment vector is implicit through . Morgan & Rubin (2012) suggest randomizing units to treatment and control by performing independent draws from the distribution of until for some threshold . Hereafter, we refer to this procedure of randomizing units until as rerandomization. The expected number of draws until the first acceptable randomization is equal to , where is the probability that a particular realization of yields a Mahalanobis distance less than or equal to . Thus, fixing effectively allocates an expected computational budget and induces a corresponding threshold : the smaller the acceptance probability , the smaller the threshold and thus the more balanced the two groups, but the larger the expected computational cost of drawing an acceptable . For example, to restrict rerandomization to the “best” 1% of randomizations, one would set , which implicitly sets equal to the quantile of the distribution of given . If one assumes , then , so that can be chosen equal to the quantile of a chi-squared distribution with degrees of freedom. The assumption can be justified by invoking the finite population Central Limit Theorem (Erdös & Rényi, 1959; Li & Ding, 2017). When the distribution of is unknown, one can approximate it via Monte Carlo by simulating independent draws of and setting to the quantile of ’s empirical distribution.
Morgan & Rubin (2012) established that the mean-difference estimator under this rerandomization scheme is unbiased in estimating the average treatment effect , i.e., that . Furthermore, they also established that under rerandomization, if and , then not only are the covariate mean differences centered at , i.e., , but also they are more closely concentrated around than they would be under randomization. More precisely, Morgan & Rubin (2012) proved that
(4)  
(5) 
Therefore, under their assumptions, rerandomization using the Mahalanobis distance reduces the variance of each covariate mean difference by compared to randomization. Morgan & Rubin (2012) call this last property equal percent variance reducing (EPVR). Thus, using the Mahalanobis distance for rerandomization can be quite appealing, but Morgan & Rubin (2012) rightly point out that non-EPVR rerandomization schemes may be preferable in settings with covariates of unequal importance. This is in part addressed by Morgan & Rubin (2015), who developed a rerandomization scheme that incorporates tiers of covariates that vary in importance. However, this requires researchers to specify an explicit hierarchy of covariate importance, which may not be immediately clear, especially when the number of covariates is large.
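The rerandomization procedure reviewed in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`mean_diff_cov`, `rerandomize`, `variance_ratio`) and the `max_draws` safeguard are hypothetical, the threshold relies on the chi-squared approximation discussed above, at least two covariates are assumed, and the variance ratio uses the ratio of chi-squared CDFs from Theorem 3.1 of Morgan & Rubin (2012).

```python
import numpy as np
from scipy.stats import chi2


def mean_diff_cov(x, n_t):
    """Covariance of the covariate mean differences in a completely
    randomized experiment: N / (N_T * N_C) times the sample covariance."""
    n = x.shape[0]
    n_c = n - n_t
    return n / (n_t * n_c) * np.cov(x, rowvar=False)


def mahalanobis(x, w, sigma_inv):
    """Mahalanobis distance between treated and control covariate means."""
    d = x[w == 1].mean(axis=0) - x[w == 0].mean(axis=0)
    return float(d @ sigma_inv @ d)


def rerandomize(x, n_t, p_a, rng, max_draws=100_000):
    """Draw assignments until the Mahalanobis distance is below the
    p_a-quantile of a chi-squared with K degrees of freedom
    (Normal approximation; max_draws is a hypothetical safeguard)."""
    n, k = x.shape
    sigma_inv = np.linalg.inv(mean_diff_cov(x, n_t))
    a = chi2.ppf(p_a, df=k)
    for _ in range(max_draws):
        w = np.zeros(n, dtype=int)
        w[rng.choice(n, size=n_t, replace=False)] = 1
        m = mahalanobis(x, w, sigma_inv)
        if m <= a:
            return w, m, a
    raise RuntimeError("no acceptable assignment found")


def variance_ratio(p_a, k):
    """Remaining fraction of variance for each covariate mean difference,
    P(chi2_{K+2} <= a) / P(chi2_K <= a), under the Normal approximation."""
    a = chi2.ppf(p_a, df=k)
    return chi2.cdf(a, df=k + 2) / p_a
```

For instance, `rerandomize(x, n_t=25, p_a=0.1, rng=np.random.default_rng(0))` needs about ten draws on average, in line with the expected computational cost described above.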
4 Ridge Rerandomization
As an alternative, we define a modified Mahalanobis distance as
(6) 
for some prespecified . Guidelines for choosing will be provided in Section 4.2. The eigenvalues of in (6) are inflated in a way that is reminiscent of ridge regression (Hoerl & Kennard, 1970). For this reason, we will refer to the quantity as the ridge Mahalanobis distance. To our knowledge, the ridge Mahalanobis distance has remained largely unexplored, except for Kato et al. (1999), who used it in an application for a Chinese and Japanese character recognition system. Our proposed rerandomization scheme, referred to as ridge rerandomization, involves using the ridge Mahalanobis distance in place of the standard Mahalanobis distance within the rerandomization framework of Morgan & Rubin (2012). In other words, one randomizes the assignment vector until for some threshold .
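As a rough illustration of the ridge Mahalanobis distance, the sketch below assumes the ridge-type form in which a multiple `lam` of the identity is added to the covariance matrix of the mean differences before inversion, which inflates every eigenvalue by `lam`. The names `ridge_mahalanobis`, `d`, `sigma`, and `lam` are hypothetical and chosen only for this example.

```python
import numpy as np


def ridge_mahalanobis(d, sigma, lam):
    """Quadratic form d' (sigma + lam * I)^{-1} d.

    d     : vector of covariate mean differences (treated minus control)
    sigma : covariance matrix of d (assumed positive definite)
    lam   : nonnegative regularization parameter (lam = 0 gives the
            standard Mahalanobis distance)
    """
    k = sigma.shape[0]
    return float(d @ np.linalg.solve(sigma + lam * np.eye(k), d))
```

Because the quadratic form is unchanged when `d` is replaced by `-d`, swapping the treatment and control labels yields the same distance, which is the symmetry property that the unbiasedness result in the next section relies on.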
In order to make a fair comparison between rerandomization and ridge rerandomization, we will fix the expected computational cost of ridge rerandomization by calibrating the respective thresholds so that
(7) 
Thus, fixing implicitly determines the pair , so that for every fixed and there is a unique that satisfies (7).
As we will discuss in Section 4.3, the ridge Mahalanobis distance alleviates collinearity among the covariate mean differences by placing higher importance on the directions that account for the most variation. In that section we also discuss how ridge rerandomization encapsulates a spectrum of other standard rerandomization schemes. But first, in Section 4.1 we establish several theoretical properties of ridge rerandomization for some prespecified , and in Section 4.2 we provide guidelines for specifying .
4.1 Properties of Ridge Rerandomization
The following theorem establishes that, on average, the covariate means in the treatment and control groups are balanced under ridge rerandomization, and that is an unbiased estimator of under ridge rerandomization.
Theorem 4.1 (Unbiasedness under ridge rerandomization).
Let and be some prespecified constants. If , then
and
Theorem 4.1 is a particular case of Theorem 2.1 and Corollary 2.2 from Morgan & Rubin (2012). Theorem 4.1 follows from the symmetry of in treatment and control, in the sense that both assignments and yield the same value of . From Morgan & Rubin (2012), we even have the stronger result that for any covariate , regardless of whether is observed or not.
Now we establish the covariance structure of under ridge rerandomization. To do this, we first derive the exact distribution of . The following lemma establishes that if we assume , then is distributed as a weighted sum of independent random variables, where the sizes of the weights are ordered in the same fashion as the sizes of the eigenvalues of .
Lemma 4.1 (Distribution of ).
Let be some prespecified constant. If , then
(8) 
where and are the eigenvalues of .
The proof of Lemma 4.1 is provided in the Appendix; see Section 7.1. Under the Normality assumption, the representation in (8) provides a straightforward way to simulate independent draws of , despite its CDF being typically intractable and requiring numerical approximations (e.g., see Bodenham & Adams, 2016, and references therein).
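The weighted-sum representation makes simulation straightforward. The sketch below assumes that, under Normality, the k-th weight equals the k-th eigenvalue divided by that eigenvalue plus the regularization parameter (the form implied by inflating each eigenvalue by `lam`), and that the summands are independent chi-squared variables with one degree of freedom. Function and argument names are hypothetical.

```python
import numpy as np


def ridge_weights(eigvals, lam):
    """Weights eig_k / (eig_k + lam); all equal to 1 when lam = 0."""
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals / (eigvals + lam)


def simulate_ridge_distance(eigvals, lam, n_draws, rng):
    """Independent draws of the weighted sum of chi-squared(1) variables."""
    w = ridge_weights(eigvals, lam)
    z2 = rng.standard_normal((n_draws, w.size)) ** 2
    return z2 @ w


def mc_threshold(eigvals, lam, p_a, n_draws, rng):
    """Monte Carlo estimate of the p_a-quantile of the simulated distance."""
    draws = simulate_ridge_distance(eigvals, lam, n_draws, rng)
    return float(np.quantile(draws, p_a))
```

The empirical quantile returned by `mc_threshold` is a simple stand-in for a threshold when the CDF is not evaluated numerically.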
Using Lemma 4.1, we can derive the covariance structure of under ridge rerandomization, as stated by the following theorem.
Theorem 4.2 (Covariance structure under ridge rerandomization).
Let and be some prespecified constants. If and , then
(9) 
where is the orthogonal matrix of eigenvectors of corresponding to the ordered eigenvalues , and for all ,
(10) 
with .
The proof of Theorem 4.2 is in the Appendix in Section 7.2. The quantities are intractable functions of and and thus need to be approximated numerically, as explained in Section 4.2.1. Conditioning on in (10) effectively constrains the magnitude of the positive random variables . Since the weights of their respective contributions to are positive and nonincreasing with , we may conjecture that . Possible directions for a proof may make use of Proposition 2.1 from Palombi & Toti (2013) and Equation (A.1) from Palombi et al. (2017).
Using the above results, we can now compare randomization, rerandomization, and ridge rerandomization. Under the assumptions stated in Theorem 4.2, the covariance matrices of under randomization, rerandomization, and ridge rerandomization can be respectively written as
(11)  
(12)  
(13) 
where (12) follows from Theorem 3.1 in Morgan & Rubin (2012) with , and (13) follows from Theorem 4.2 with defined in (10). If we define new covariates as the principal components of the original ones, i.e., , then (12) and (13) respectively yield
(14) 
and
(15) 
for all , where is the th principal component mean difference between the treatment and control groups, i.e., the th coefficient of . From (14) we see that rerandomization reduces the variances of the principal component mean differences equally by and is thus EPVR for the principal components, as well as for the original covariates, as discussed in Section 3. On the other hand, ridge rerandomization reduces these variances by unequal amounts: the variance of the th principal component mean difference is reduced by , and because typically , ridge rerandomization places more importance on the first principal components.
Translating (15) back to the original covariates yields the following corollary, which establishes that ridge rerandomization is always preferable over randomization in terms of reducing the variance of each covariate mean difference.
Corollary 4.1 (Variance reduction for ridge rerandomization).
Under the assumptions of Theorem 4.2, ridge rerandomization reduces the variance of the th covariate mean difference by , where
(16) 
satisfies , so that
(17) 
The proof of Corollary 4.1 is provided in the Appendix; see Section 7.3. Reducing the variance of the covariate mean differences is beneficial for precisely estimating the average treatment effect if the outcomes are correlated with the covariates. For example, Theorem 3.2 of Morgan & Rubin (2012) establishes that—under several assumptions, including additivity of the treatment effect—rerandomization reduces the variance of defined in (2) by percent, where denotes the squared multiple correlation between the outcomes and the covariates. Now we establish how the variance of behaves under ridge rerandomization.
In the rest of this section, we assume—as in Morgan & Rubin (2012)—that the treatment effect is additive. Without loss of generality, for all , we can write the outcome of unit as
(18) 
where is the projection of the potential outcomes onto the linear space spanned by , and captures any misspecification of the linear relationship between the outcomes and . Let and , where .
Theorem 4.3 below establishes that the variance of under ridge rerandomization is always less than or equal to the variance of under randomization. Thus, ridge rerandomization always leads to a more precise treatment effect estimator than randomization.
Theorem 4.3.
The proof of Theorem 4.3 is in the Appendix; see Section 7.4. The conditional independence assumption was also leveraged in the proof of Theorem 3.2 in Morgan & Rubin (2012).
The fact that ridge rerandomization performs better than randomization is arguably a low bar, because this is the purpose of any rerandomization scheme. The following corollary quantifies how ridge rerandomization performs compared to the rerandomization scheme of Morgan & Rubin (2012).
Corollary 4.2.
Under the assumptions of Theorem 4.3, the difference in variances of between rerandomization and ridge rerandomization is
It is not necessarily the case that for all , and so it is not guaranteed that ridge rerandomization will perform better or worse than rerandomization in terms of treatment effect estimation. Ultimately, the comparison of rerandomization and ridge rerandomization depends on , which is typically not known until after the experiment has been conducted.
However, in Section 5.3, we provide some heuristic arguments for when ridge rerandomization would be preferable over rerandomization, along with simulation evidence that confirms these heuristic arguments. In particular, we demonstrate that ridge rerandomization is preferable over rerandomization when there are strong collinearities among the covariates. We also discuss a “worst-case scenario” for ridge rerandomization, where is specified such that ridge rerandomization should perform worse than rerandomization in terms of treatment effect estimation accuracy.
In order to implement ridge rerandomization, researchers must specify the threshold and the regularization parameter . The next section provides guidelines for choosing these parameters.
4.2 Guidelines for choosing and
For ridge rerandomization, we recommend starting by specifying an acceptance probability , which then binds and together via the identity (7). Once is fixed, there exists a uniquely determined threshold for each such that . As in Morgan & Rubin (2012), acceptable treatment allocations under ridge rerandomization are generated by randomizing units to treatment and control until . Thus, a smaller leads to stronger covariate balance according to at the expense of computation time.
The only choice that remains after fixing is the regularization parameter . Section 4.2.1 details how is automatically calibrated once we fix and . The choice of is investigated in Section 4.2.3, after discussing how to assess the performance of ridge rerandomization in Section 4.2.2.
4.2.1 Calibration of
Given and , we can choose to set equal to the quantile of the quadratic form defined by
(19) 
where . Such a choice of is a good approximation of the quantile of , especially when is large enough for to be approximately Normal, as motivated by Lemma 4.1. Let denote the CDF of . Since is a weighted sum of independent variables, its characteristic function is given by , which can then be inverted to yield
where
(20) 
as detailed in Equation (3.2) of Imhof (1961). In practice, for any fixed , can be computed with arbitrary precision and at a negligible cost by using any (deterministic) univariate numerical integration scheme. We can then approximate with by choosing large enough. As explained in Imhof (1961), the approximation tends to improve as the number of covariates increases, and one can guarantee a truncation error of at most in absolute value by choosing . Computationally cheaper but less accurate alternatives to approximate are discussed in Bodenham & Adams (2016).
Finally, we approximate the quantile of by
(21) 
i.e., the quantile of . The hat on only reflects the distributional approximation of by , whereas the errors due to numerical integration and truncation can be regarded as virtually nonexistent compared to the Monte Carlo errors involved in the later approximations of . In the simulations of Section 5, we will use by default.
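A minimal sketch of this calibration step is given below. It assumes the central case of Imhof's (1961) inversion formula with one degree of freedom per term, truncates the integral at a finite upper limit, and obtains the quantile by root finding. The names `imhof_cdf` and `imhof_quantile`, the default `upper=200.0`, and the root-finding bracket are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq


def imhof_cdf(q, weights, upper=200.0):
    """CDF at q of sum_k weights[k] * chi2_1 via Imhof's formula,
    with the integral truncated at `upper` (an assumed default)."""
    w = np.asarray(weights, dtype=float)

    def integrand(u):
        if u == 0.0:
            # Limit of sin(theta(u)) / (u * rho(u)) as u -> 0.
            return 0.5 * (w.sum() - q)
        theta = 0.5 * np.sum(np.arctan(w * u)) - 0.5 * q * u
        rho = np.prod((1.0 + (w * u) ** 2) ** 0.25)
        return np.sin(theta) / (u * rho)

    integral, _ = quad(integrand, 0.0, upper, limit=500)
    return 0.5 - integral / np.pi


def imhof_quantile(p, weights, upper=200.0):
    """p-quantile obtained by solving imhof_cdf(q) = p with Brent's method."""
    w = np.asarray(weights, dtype=float)
    hi = 10.0 * w.sum() + 10.0  # assumed bracket; widen if p is very close to 1
    return brentq(lambda q: imhof_cdf(q, w, upper) - p, 0.0, hi)
```

With all weights equal to one, the weighted sum is an ordinary chi-squared variable, which gives a convenient sanity check against `scipy.stats.chi2`.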
4.2.2 Approximation of and
We will use Corollary 4.1 and Theorem 4.2 as a proxy for how ridge rerandomization improves the variance of each covariate mean difference as compared to rerandomization. We would like to set so that the ’s defined in (10) are small, in a sense to be made precise in the next section. To achieve this, we would need to compute for all , which involves intractable conditional expectations. By considering simulated sets of independent variables for and , the expectations appearing in (10) can be consistently estimated via Monte Carlo, for all , by
(22) 
with and defined in (21), where denotes the indicator function of a set . Using (22), we can then estimate from Corollary 4.1 consistently as , for all , by
(23) 
For simplicity, we will regard the computational cost of generating independent Normal variables as negligible compared to the expected cost of generating successive random assignment vectors and testing the acceptability of each assignment, since the former can be done in parallel at virtually the same cost as generating one single Normal random variable.
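The Monte Carlo approximation described in this section can be sketched as follows. The sketch assumes that the conditional expectations are those of squared standard Normal variables given that their weighted sum falls below the threshold, and that the variance ratio of each original covariate is obtained by rotating the principal-component variances back with the eigenvector matrix. All names (`estimate_d`, `variance_ratios`, `gamma`, `a_lam`) are hypothetical.

```python
import numpy as np


def estimate_d(eigvals, lam, a_lam, n_draws, rng):
    """Estimate E[Z_k^2 | sum_j w_j Z_j^2 <= a_lam] for each k by averaging
    the squared Normal draws over the accepted simulations."""
    eigvals = np.asarray(eigvals, dtype=float)
    w = eigvals / (eigvals + lam)
    z2 = rng.standard_normal((n_draws, eigvals.size)) ** 2
    accepted = z2 @ w <= a_lam
    if not accepted.any():
        raise ValueError("no accepted draws; increase n_draws or a_lam")
    return z2[accepted].mean(axis=0)


def variance_ratios(gamma, eigvals, d_hat):
    """Ratio of each covariate mean difference's variance under ridge
    rerandomization to its variance under randomization.

    gamma   : orthogonal matrix whose columns are eigenvectors
    eigvals : eigenvalues in the same order as the columns of gamma
    d_hat   : estimates returned by estimate_d
    """
    eigvals = np.asarray(eigvals, dtype=float)
    g2 = gamma ** 2
    return (g2 @ (eigvals * d_hat)) / (g2 @ eigvals)
```

The same simulated Normal variables can be reused across candidate values of the regularization parameter, so only the acceptance indicator needs to be recomputed.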
4.2.3 Choosing
In this section, assume that has been fixed. Note that choosing corresponds to rerandomization using the Mahalanobis distance. Thus, we would only choose some if it is preferable over rerandomization, in the following sense. There are many metrics that could be used for comparing rerandomization and ridge rerandomization; for simplicity, we focus on the average percent reduction in variance across covariate mean differences. Arguably, one rerandomization scheme is preferable over another if it is able to achieve a higher average reduction in variance across covariates. Thus, ideally, we would only choose a particular if . In practice, we will use the criterion
(24) 
where and are respectively defined in (5) and (23), with being set to , i.e., the choice of as recommended by Morgan & Rubin (2012). Proving the existence of some such that (24) holds is challenging, so we propose the following iterative procedure for choosing such a if it exists. The procedure relies on (5), (21), and (23), where the auxiliary Normal variables only need to be simulated once and can then be reused when testing different values of .
Procedure for finding a desirable
1. Specify , , , and .
2. Initialize and .
3. While :
(a) Set .
(b) If , then set .
4. If , then return . Else, define for all , and return:
The justification of our proposed procedure stems from the following facts. By definition, we have for all . By taking the limit as under the assumptions of Lemma 4.1, we get
so that
(26) 
where is the quantile of the distribution of . This in turn implies that, for all , we have
(27) 
where for all . Since the limits in (27) are strictly positive, this shows that increasing beyond a certain value will no longer yield any practical gain. This is in line with the intuition that the ridge Mahalanobis distance degenerates to the Euclidean distance when , as discussed further in Section 4.3. Thus, in practice, it is sufficient to search for only over a bounded range of values. The lower bound corresponds to rerandomization with the standard Mahalanobis distance; the upper bound is determined dynamically via Step 3, which is guaranteed to stop in finite time by using an argument similar to (26). The step size can be chosen as a fraction of the smallest strictly positive gap between consecutive eigenvalues, i.e., with the convention . Finally, among all the acceptable ’s satisfying (24), Step 4 returns the that aims at altering the covariance structure of the least, in the sense of minimizing the distance between and the linear span of , i.e.,
where stands for the Frobenius norm. The inner minimization can be written as
which is attained at with for all , thus yielding (25). The outer minimization is then straightforward since the set of candidates is finite by construction.
When the set is empty, we simply return , although the following heuristic argument illustrates why we would expect the existence of at least one such that (24) holds. The rerandomization scheme of Morgan & Rubin (2012) spreads the benefits of variance reduction across all covariates equally; however, note that the term is monotonically increasing in the number of covariates for a fixed acceptance probability . A consequence of this is that if one can instead determine a smaller set of covariates that is most relevant, then that smaller set of covariates can benefit from a greater variance reduction than what would be achieved by considering all covariates. As we mentioned at the end of Section 3, this idea was partially addressed in Morgan & Rubin (2015), which extended the rerandomization scheme of Morgan & Rubin (2012) to allow for tiers of covariate importance specified by the researcher, such that the most important covariates receive the most variance reduction. Ridge rerandomization, on the other hand, automatically specifies a hierarchy of importance based on the eigenstructure of the covariate mean differences. To provide intuition for this idea, consider a simple case where the smallest eigenvalues are all arbitrarily close to . In this case, we can find such that for the largest eigenvalues and for the remaining eigenvalues, so that would be approximately distributed as with an effective number of degrees of freedom strictly less than . For some fixed acceptance probability and corresponding thresholds and , we would then have
(28) 
since is fixed and . The relative variance reduction for ridge rerandomization would then be for the first principal components (which in this simple example make up the total variation in the covariate mean differences), while the relative variance reduction for rerandomization would be for the covariates. Thus, in this case, ridge rerandomization would achieve a greater variance reduction on a lower-dimensional representation of the covariates than typical rerandomization.
This heuristic argument also hints that our method has connections to a principal-components rerandomization scheme, where one instead balances on some lower dimension of principal components rather than on the covariates themselves. We discuss this point further in Section 4.3.
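The selection procedure of this section can be approximated by a simplified grid search, sketched below. This is not the authors' exact procedure: it scans a user-supplied grid instead of determining the upper bound dynamically, uses an empirical quantile rather than numerical inversion for the threshold, compares the average estimated variance ratio directly with the rerandomization ratio without any tolerance, and scores each acceptable candidate by the squared Frobenius distance between the ridge covariance and its best scalar multiple of the original covariance. The name `choose_lambda` and its arguments are hypothetical, and the covariance matrix is assumed positive definite.

```python
import numpy as np
from scipy.stats import chi2


def choose_lambda(sigma, p_a, lam_grid, n_draws, rng):
    """Return the candidate in lam_grid that beats rerandomization on
    average variance reduction while perturbing the covariance structure
    the least; return 0.0 if no candidate qualifies."""
    eigvals, gamma = np.linalg.eigh(sigma)
    k = eigvals.size
    # Variance ratio of standard rerandomization (Normal approximation).
    v_a = chi2.cdf(chi2.ppf(p_a, df=k), df=k + 2) / p_a
    # Auxiliary squared Normal draws, simulated once and reused.
    z2 = rng.standard_normal((n_draws, k)) ** 2
    g2 = gamma ** 2
    best_lam, best_dist = 0.0, np.inf
    for lam in lam_grid:
        w = eigvals / (eigvals + lam)
        m = z2 @ w
        a_lam = np.quantile(m, p_a)            # empirical threshold
        d_hat = z2[m <= a_lam].mean(axis=0)    # conditional expectations
        v_hat = (g2 @ (eigvals * d_hat)) / (g2 @ eigvals)
        if v_hat.mean() <= v_a:
            # Best scalar c such that c * sigma approximates the ridge covariance.
            c = np.sum(eigvals ** 2 * d_hat) / np.sum(eigvals ** 2)
            dist = np.sum(eigvals ** 2 * (d_hat - c) ** 2)
            if dist < best_dist:
                best_lam, best_dist = float(lam), float(dist)
    return best_lam
```

Returning `0.0` when no candidate qualifies mirrors the fallback to standard rerandomization described above.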
4.3 Connections to Other Rerandomization Schemes
Ridge rerandomization has connections to other rerandomization schemes. Ridge rerandomization requires specifying the parameter ; thus, consider two extreme choices of :

: , i.e., corresponds to the Mahalanobis distance.

: , i.e., tends to a scaled Euclidean distance.
Thus, for any finite , the distance defined by can be regarded as a compromise between the Mahalanobis and Euclidean distances. Rerandomization using the Euclidean distance is similar to a rerandomization scheme that places a separate caliper on each covariate, which was proposed by Moulton (2004), Maclure et al. (2006), Bruhn & McKenzie (2009), and Cox (2009). However, Morgan & Rubin (2012) note that such a rerandomization scheme is not affinely invariant and does not preserve the correlation structure of across randomizations. See Morgan & Rubin (2012) for a full discussion of the benefits of using affinely invariant rerandomization criteria. As discussed in Section 4.2.3, our proposed procedure aims for larger variance reductions of important covariate mean differences while mitigating the perturbation of the correlation structure of .
As an illustration, consider a randomized experiment where units are assigned to treatment and control; and furthermore, where there are two correlated covariates, generated as and for . Figure 1 shows the distribution of across 1000 randomizations, rerandomizations (with ), ridge rerandomizations (with and ), and rerandomizations using the Euclidean distance instead of the Mahalanobis distance.
All three rerandomization schemes reduce the variance of for , compared to randomization; however, rerandomization using the Euclidean distance destroys the correlation structure of , while rerandomization and ridge rerandomization largely maintain it. This provides further motivation for Step 4 of the procedure presented in Section 4.2.3.
Furthermore, as discussed in Sections 4.1 and 4.2.3, ridge rerandomization can be regarded as a “soft-thresholding” version of a rerandomization scheme that would focus solely on the first principal components of . A “hard-thresholding” rerandomization scheme would use a truncated version of the Mahalanobis distance, defined as
with