I Introduction
The standard method to measure the causal relationship between two variables is the Average Treatment Effect (ATE) [Rubin1974]. The term ATE refers to the average outcome change that a certain intervention (which is called treatment) can make in a population in contrast to not making the intervention.
The “fundamental problem of causal inference” [Holland] is that each individual person in the population can only be assigned to either the treatment group (the group who receives the treatment) or the control group (the group who doesn’t receive the treatment). Therefore, the outcome of an individual person given the treatment and that of the same person not given the treatment cannot be observed in the same trial. As a result, half of the required data for estimating the ATE is unobservable.
In [dawid], the author argued that estimating the unobserved potential outcomes can result in erroneous or metaphysical conclusions that are not substantiated by the data. Thus solutions for the “fundamental problem of causal inference” are dubious and cannot be supported by evidence in the experiment. [Pearempicism] and [shpitser] argued against this paradigm by providing a framework that, given some structural information about the causal relationships in the system, identifies cases where the unobserved potential outcomes can be discerned from observations. Their arguments support the claim that the estimation of unobserved potential outcomes is a mathematical, not metaphysical, question.
Randomized controlled trials (RCTs) are the gold standard for conducting quantitative experimental science. The RCT experimental design consists of recruiting a study population and splitting the participants into two groups: treatment and control. The average outcomes of the two groups are then compared to estimate the average treatment effect (ATE).
A reliable RCT should have external validity [shadish], meaning that the trial should be generalizable to different settings and populations. External validity requires a diverse population of participants in the trial, which in turn increases the variance of the measured ATE for different treatment assignments.
Using randomized experiments, [atheyreview] proved that the difference between the average outcomes of the treatment and control groups is an unbiased estimator of the ATE over the population. However, although it is an unbiased estimator, it is a single-observation estimate (since the trial is typically conducted only once), which makes it prone to selection bias.
Covariate balance measures (CBMs) are commonly used to evaluate the validity and reliability of the results in an RCT. CBMs assess the similarity between the covariates’ distributions in the treatment and control groups. The difference of the covariate means between the treatment and control groups is a CBM that is commonly used to evaluate the reliability and validity of the selected treatment assignment in RCTs [atheyreview, Thereview].
In [atheyreview], the authors provided a confidence interval for the ATE estimation. However, first, this confidence interval does not provide information about the worst-case estimation error. Second, the provided confidence interval is valid only for trials where the treatment assignment is selected uniformly at random. If covariate balancing methods were deployed in the RCT, then the probabilities of the different treatment assignments would not be uniform and the confidence interval would be different. It is not clear how covariate balancing methods decrease the worst-case error in ATE estimation.
For a given population and a given CBM, the set of admissible treatment assignments contains all treatment assignments that are evaluated as sufficiently balanced by the CBM. The adversarial treatment assignments are admissible treatment assignments that result in noticeably large errors in the estimated ATE (see Fig 1). A given RCT is said to be adversarially vulnerable for a given CBM if there exist adversarial treatment assignments for it.
Adversarial vulnerability creates room for untraceable deviations in the RCT, whether intentional or unintentional, that can degrade the estimated ATE. If any of the adversarial treatment assignments is selected and used for the RCT, the resulting ATE estimate will not only be erroneous, but also hard to detect as unreliable. In some sensitive applications like clinical trials, adversarial vulnerability is intolerable because it can lead to false and unreliable decisions based on untraceably falsified clinical results. One might argue that the adversarial treatment assignments have a negligible chance of getting selected. We argue against that by pointing out that for trials where covariate balancing methods are used, the set of admissible treatment assignments is relatively small and the posterior probability of an adversarial treatment assignment getting selected is no longer negligible.
In this work, we empirically demonstrate for the first time that the means difference CBM is not a reliable measure to evaluate the selected treatment assignments in an RCT. To do this, we propose a method, which we dub the Adversarial Treatment ASsignment in TREatment Effect Trials (ATASTREET), that finds the adversarial treatment assignments in some frequently used covariate balancing methods. To illustrate our general arguments about the CBM, we use the semi-synthetic IHDP1000 [Gr, Hill2011, Sh] dataset, which provides both potential outcomes for each participant. Our proposed approach to find the adversarial treatment assignments is not directly applicable to all RCTs, but our main arguments may generalize to all RCTs with similar CBMs.
Using ATASTREET, we found adversarial treatment assignments of the means difference CBM on the IHDP1000 dataset (Fig 1). These adversarial treatment assignments could have been selected in the IHDP trial, and they estimate the ATE with an unacceptable error of 1.1 (more than ). Note that the means difference CBM evaluates these treatment assignments as admissible. This supports our claim that the means difference CBM is not a reliable measure for the validity of trials.
We summarize our contributions as follows. First, we formally define the adversarial treatment assignments of RCTs and argue why it is important to use adversarially robust covariate balance measures. Second, we propose an optimization based algorithm to demonstrate the adversarial vulnerability of the means difference CBM.
II Background
The ATE is defined using the potential outcome framework [Rubin1974]. For each person $i$ in the population, we call $Y_i(1)$ and $Y_i(0)$ the potential outcomes of that individual being assigned to the treatment or the control group, respectively. The ATE is defined as the average of the differences of the potential outcomes for all the individuals over the population
(1) $\mathrm{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left( Y_i(1) - Y_i(0) \right)$
where $N$ is the population size.
In a trial to measure the ATE of a certain treatment (intervention), a treatment assignment $T = (T_1, \dots, T_N)$, with $T_i \in \{0, 1\}$, divides the population into the treatment group ($T_i = 1$) and the control group ($T_i = 0$). For each individual, $Y_i^{\mathrm{obs}}$ is the observed outcome based on the selected treatment assignment.
(2) $Y_i^{\mathrm{obs}} = T_i \, Y_i(1) + (1 - T_i) \, Y_i(0)$
In the random treatment assignment method (also known as the randomized controlled experiment) [atheyreview], the trial is conducted using a randomly selected treatment assignment $T$. The Measured Average Treatment Effect (MATE) is then defined as:
(3) $\mathrm{MATE}(T) = \frac{1}{N_t} \sum_{i : T_i = 1} Y_i^{\mathrm{obs}} - \frac{1}{N_c} \sum_{i : T_i = 0} Y_i^{\mathrm{obs}}$
where $N_c$ and $N_t$ are the number of individuals assigned to the control and treatment groups, respectively.
Given the population, [atheyreview] proved that the introduced MATE is an unbiased estimator of the ATE. It means that the expected value of MATE over the random treatment assignment is equal to the true value of ATE.
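The unbiasedness claim can be checked numerically on a toy population. The following is a minimal sketch, assuming hypothetical potential outcomes for six people and equal-sized groups; the variable names and the numbers are illustrative and are not taken from the IHDP data. It enumerates every equal-split treatment assignment and compares the average MATE with the true ATE.

```python
from itertools import combinations
from statistics import mean

# Hypothetical potential outcomes for a toy population of 6 people.
y_control = [1.0, 2.0, 0.5, 3.0, 2.5, 1.5]  # outcome if assigned to control
y_treated = [2.0, 2.5, 2.5, 3.5, 4.5, 2.0]  # outcome if assigned to treatment

n = len(y_control)
true_ate = mean(t - c for t, c in zip(y_treated, y_control))


def mate(treated_idx):
    """Difference in mean observed outcomes for one treatment assignment."""
    treated = set(treated_idx)
    treated_mean = mean(y_treated[i] for i in treated)
    control_mean = mean(y_control[i] for i in range(n) if i not in treated)
    return treated_mean - control_mean


# Enumerate every assignment that treats exactly half of the population.
all_mates = [mate(idx) for idx in combinations(range(n), n // 2)]

print(f"true ATE          = {true_ate:.4f}")
print(f"mean of all MATEs = {mean(all_mates):.4f}")
print(f"MATE range        = [{min(all_mates):.4f}, {max(all_mates):.4f}]")
```

The mean over all assignments matches the true ATE, while individual assignments can land well above or below it, which is the single-observation issue discussed next.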
Since the goal of an RCT is to generalize its findings to a broader population, it is important to recruit a population that represents the diversity of the larger population to which the trial aims to generalize. [shadish] argues that this external validity is of high importance in experimental designs. The need to use a diverse population increases the variance of the MATE across different random treatment assignments. Unfortunately, it is not possible to control this variance by sacrificing diversity, as doing so would call the generalizability of the trial into question.
Although the introduced estimator is unbiased, it is a single-observation estimate since the trial is typically conducted once. As a result, the variance induces uncertainty in the estimated ATE. Another possible way to control the variance in MATE is to increase the population size used in the RCT. However, the variance can still be undesirably large for the affordable population size. In order to empirically show this issue, we measured the MATE for 10000 different random treatment assignments in the IHDP dataset [Gr, Hill2011, Sh] for different subpopulation sizes. Figure 2 shows the empirical probability density distribution of the MATE. Clearly, the variance shrinks as the population size grows; however, the variance is still undesirably large for the affordable population sizes (in this case ).
In [atheyreview], several reasons are provided to justify the need to validate the selected treatment assignment before or after the RCT. Since the trial is often conducted once, only one treatment assignment can be used for the trial. Thus it is of high importance to ensure that this one treatment assignment is selected properly. Even in the case of proper randomization, it may be informative to check whether the selected treatment assignment has imbalanced covariates by chance. Furthermore, it is common in practice that some participants drop out before the trial is finished. These dropouts make the trial population different from the original population that was used in the randomization, which in turn might induce a selection bias. For all of these reasons, it is important to use CBMs in RCTs. Also, the results of conducted trials should be validated by evaluating the CBM of the selected treatment assignment.
There have been numerous efforts to reduce the estimation variance. One family of such efforts is the covariate balancing methods, in which the more balanced treatment assignments have a higher chance of being selected for the trial. In covariate balancing methods, all of the variables that are expected to be related to the outcome are recorded for the population as the covariates. Covariate balancing methods favor treatment assignments that have more similarity between the covariates’ distributions in the treatment group and the control group. Since the treatment and the control group are “similar” in such balanced treatment assignments, selection bias can thereby be reduced.
The first step of any covariate balancing method in RCTs is to record the covariates for the whole population. Then, balanced treatment assignments are found by minimizing the covariate imbalance between the two groups. In the next stage, the trial is conducted according to the obtained balanced treatment assignment. The MATE is calculated afterwards.
Covariate balancing methods require a covariate balance measure (also referred to as the balancing score) that evaluates the similarity of the covariate distributions of the control and treatment groups. The maximal balance is defined to be the treatment assignment that has the optimum covariate balance measurement.
Numerous covariate balancing methods have been proposed for a given CBM. A treatment assignment can be selected randomly and then modified by a greedy minimization until it reaches a desirable covariate balance measurement [Thereview]. Alternatively, a treatment assignment can be selected randomly and, if it does not reach a desirable covariate balance measurement, the whole randomization process is repeated until a treatment assignment with a desired covariate balance measurement is reached [Thereview]. Another option is to exhaustively check all the possible treatment assignments in order to find the treatment assignment that is maximally balanced. Alternatively, one can find a set of desirable treatment assignments, and then select one of them randomly.
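The repeated-randomization strategy described above can be sketched in a few lines. This is a minimal illustration, assuming a single hypothetical covariate, equal-sized groups, and the absolute difference of group means as the balance measure; the function names, the threshold, and the data are assumptions made for the example and do not come from [Thereview].

```python
import random
from statistics import mean


def mean_difference(covariate, treated):
    """Absolute difference of covariate means between the two groups."""
    treated_vals = [x for x, t in zip(covariate, treated) if t]
    control_vals = [x for x, t in zip(covariate, treated) if not t]
    return abs(mean(treated_vals) - mean(control_vals))


def rerandomize(covariate, threshold, rng, max_draws=10_000):
    """Redraw equal-sized random assignments until the imbalance is below threshold."""
    n = len(covariate)
    base = [True] * (n // 2) + [False] * (n - n // 2)
    for draw in range(1, max_draws + 1):
        assignment = base[:]
        rng.shuffle(assignment)  # uniform draw among equal-split assignments
        if mean_difference(covariate, assignment) < threshold:
            return assignment, draw
    raise RuntimeError("no admissible assignment found within max_draws")


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ages = [23, 35, 41, 29, 52, 47, 31, 38, 44, 26]  # hypothetical covariate
assignment, draws = rerandomize(ages, threshold=1.0, rng=rng)
print(f"accepted after {draws} draw(s); "
      f"imbalance = {mean_difference(ages, assignment):.2f}")
```

Note that the accepted assignment is no longer uniformly distributed over all possible assignments: only those below the threshold can ever be returned.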
One of the commonly used CBMs is the difference of the means of each covariate between the treatment and the control group. In order to avoid scaling issues, this CBM standardizes the difference of the means of each covariate by the variance of that covariate [Thereview, rosenbaum1985b]. In [rubin2001], three different CBMs are proposed based on the propensity score as a scalar representation of the covariates of each individual. Using the propensity score concept, the three proposed CBMs are 1) the difference of means of the propensity scores normalized to the variances, 2) the ratio of the variance of the propensity scores in the control and the treatment group, and finally, 3) the ratio of the variance of each covariate orthogonal to the propensity score in the treatment and the control group.
The motivation behind the means difference CBM as well as the other three proposed CBMs is that a good CBM should evaluate the similarity of the distributions of the covariates between the treatment and control groups. In cases with large numbers of covariates, especially when the population size is limited, estimating the higher order moments of the distributions gets harder and such higher order moments’ comparisons become less informative. Therefore, comparing the means and the variances of covariates in the treatment and the control group is favored over other statistics. Several authors also suggested that the means difference CBM is a common way to evaluate the balance in the covariates [atheyreview, Thereview].
III What is adversarial treatment assignment?
To show that covariate balancing methods and covariate balance measures are vulnerable, we first provide a formal definition of the admissible treatment assignment. Then, the adversarial treatment assignment is defined as an admissible treatment assignment that leads to noticeable errors in the MATE.
The covariate balance measure (CBM) (also referred to as the balancing score) is a scalar function that returns the amount of covariate imbalance of a given treatment assignment. Note that a higher covariate balance measurement means that the treatment assignment is more imbalanced.
The expected imbalance is defined as the expected value of the covariate balance measurement over all the possible treatment assignments in the trial.
The minimum imbalance is defined as the minimum value of the covariate balance measurement over all the possible treatment assignments in the trial.
The set of admissible treatment assignments is defined as the set of all the treatment assignments that have a covariate balance measurement less than , where is a parameter that controls the amount of balance induced by the CBM. A larger relaxes the covariate balancing and allows for more covariate imbalance in the selected treatment assignment.
(4) 
Adversarial treatment assignments are the admissible treatment assignments that lead to a noticeable error in ATE estimation. Note that a randomly selected treatment assignment without covariate balancing is highly likely to have less estimation error.
(5) 
where is the standard deviation of the ATE estimate over all possible treatment assignments
(6) 
IV How to find adversarial treatment assignments?
Following the discussion in the previous section, the means difference is one of the most widely used CBMs. We demonstrate its vulnerability against adversarial treatment assignments.
In this work, we use the and the vector norms in order to compare the covariate balance measurements of different treatment assignments in cases with more than one covariate. We define the CBMs and as
(7) 
We assume, without loss of generality, that all of the covariates have the same variance. If that is not the case, one can simply divide each covariate by its standard deviation. It is also possible to give arbitrary weights to the covariates in order to make the balancing more sensitive to some covariates than to others.
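As an illustration of a norm-based means difference CBM, the sketch below rescales each covariate to unit variance and then aggregates the per-covariate differences of group means with a vector norm. It is a sketch under stated assumptions: the norm order `p` is a free parameter of this example, and the function names and data are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(columns):
    """Rescale each covariate (one list per covariate) to unit variance.

    Assumes no covariate is constant, otherwise pstdev is zero.
    """
    return [[x / pstdev(col) for x in col] for col in columns]


def means_difference_cbm(columns, treated, p=1):
    """p-norm of the per-covariate differences in group means.

    A lower value means a more balanced treatment assignment.
    """
    diffs = []
    for col in columns:
        t_mean = mean(x for x, t in zip(col, treated) if t)
        c_mean = mean(x for x, t in zip(col, treated) if not t)
        diffs.append(abs(t_mean - c_mean))
    if math.isinf(p):
        return max(diffs)  # maximum norm
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Hypothetical covariates for four people.
age = [20.0, 30.0, 40.0, 50.0]
weight = [60.0, 80.0, 70.0, 90.0]
covariates = standardize([age, weight])

imbalanced = [True, True, False, False]
balanced = [True, False, False, True]
print(means_difference_cbm(covariates, imbalanced, p=1))
print(means_difference_cbm(covariates, imbalanced, p=math.inf))
print(means_difference_cbm(covariates, balanced, p=1))
```

Per-covariate weights, as mentioned above, could be added by multiplying each entry of `diffs` by a weight before the norm is taken.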
We are interested in finding the set of adversarial treatment assignments of the used CBM in the trial.
Problem: Assume that the potential outcomes of assigning each person to the treatment or the control group are provided for a population size of . The potential outcome for the person being assigned to the treatment group or the control group is and , respectively. For each person in the population, covariates are provided as an dimensional vector . The objective is to find the treatment assignment dividing the population into two groups with equal sizes such that it maximizes the MATE and minimizes the covariate balancing measurement .
(8) 
We obtain the ATASTREET solution to this problem using mixed integer linear programming [milp].
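For intuition only, the search can be mimicked on a toy population by exhaustive enumeration. The sketch below is not the MILP formulation used by ATASTREET: it assumes a single hypothetical covariate, equal-sized groups, and a single-objective variant of the problem (maximize the estimation error subject to an imbalance threshold). All names, the threshold, and the data are assumptions made for the example, and brute force is feasible only for very small populations.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy population: one covariate and both potential outcomes.
covariate = [0.1, 0.4, 0.5, 0.9, 1.0, 1.3, 1.6, 1.8]
y_control = [1.0, 1.2, 2.0, 1.1, 2.4, 1.5, 2.2, 3.0]
y_treated = [2.0, 3.5, 2.5, 3.0, 2.6, 4.0, 3.1, 3.2]
n = len(covariate)
true_ate = mean(t - c for t, c in zip(y_treated, y_control))


def evaluate(treated_idx):
    """Return (covariate imbalance, MATE) for one equal-split assignment."""
    treated = set(treated_idx)
    control = [i for i in range(n) if i not in treated]
    imbalance = abs(mean(covariate[i] for i in treated)
                    - mean(covariate[i] for i in control))
    mate = (mean(y_treated[i] for i in treated)
            - mean(y_control[i] for i in control))
    return imbalance, mate


def most_adversarial(threshold):
    """Among assignments with imbalance <= threshold, maximize |MATE - ATE|."""
    best = None
    for idx in combinations(range(n), n // 2):
        imbalance, mate = evaluate(idx)
        if imbalance <= threshold:
            error = abs(mate - true_ate)
            if best is None or error > best[0]:
                best = (error, idx, imbalance, mate)
    return best


error, idx, imbalance, mate = most_adversarial(threshold=0.1)
print(f"ATE={true_ate:.3f}  worst admissible MATE={mate:.3f}  "
      f"error={error:.3f}  imbalance={imbalance:.3f}")
```

Sweeping `threshold` and plotting the returned (imbalance, MATE) pairs gives a toy analogue of the boundary points discussed in the following paragraphs.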
Each treatment assignment corresponds to a MATE and a . We visualize different treatment assignments as points in a 2D space having the MATE and the as its axes (this space is visualized in Figures 3 to 6).
We tested our solution for the proposed problem on the IHDP dataset for different values of for both and . The resulting treatment assignments obtained from ATASTREET are visualized in Figure 3 as black dots. In order to compare the results of ATASTREET with other treatment assignments, a set of treatment assignments is selected randomly and visualized as blue points in Figures 3 to 6.
We discuss implications of our results in 6 different arguments.

The given CBM is vulnerable against adversarial treatment assignments. Analysing the treatment assignments returned by ATASTREET for different values of reveals some of the adversarial treatment assignments in (see Figure 5). Therefore, it is possible to find admissible treatment assignments where the groups are well-balanced, but the MATE has an error higher than . This shows that covariate balancing methods with the standardized means difference CBM fail to exclude treatment assignments with large ATE estimation error.

Following from the previous argument, the given CBM is vulnerable against deceitful activities or unintended deviations. This adversarial vulnerability creates room for untraceable deviations (intended or unintended) with considerable effects on the MATE. Restricting such deviations is very important in some applications like medical trials. In Figure 5, the treatment assignment corresponding to the marked point is admissible with regard to having balanced covariates, yet yields a highly deviated MATE. CBMs admit a small subset of all the possible treatment assignments. The probability of each admissible treatment assignment getting selected in covariate balancing methods is considerably higher than the probability of that treatment assignment getting selected randomly in the absence of covariate balancing. Thus, it is wrong to reason that the probability of adversarial treatment assignments getting selected in covariate balancing methods is negligible because there aren’t many of them. Also, different methods of minimizing a given CBM in RCTs induce different posterior probabilities for each admissible treatment assignment getting selected. For this reason, it is important to bound the worst-case estimation error using adversarially robust CBMs.

The ATE estimation based on the balanced assignments is no longer unbiased. The MATE is unbiased when the treatment assignment is selected uniformly at random from all the possible treatment assignments. However, CBMs only allow admissible treatment assignments, and the ATE is estimated using a treatment assignment selected non-uniformly at random. Thus, there is no guarantee that the estimator remains unbiased. In this work, we empirically observed the estimation bias. The true value of the ATE equals the expected value of the MATE over all possible treatment assignments, which in our experiment is 4.16, whereas the MATE of the most balanced assignment using the and the is 3.99 and 4.10, respectively. These are the maximally balanced treatment assignments for the respective CBMs. The gap between the resulting MATEs and the true ATE suggests that the estimation based on the CBMs is not unbiased.

In a very similar approach, one can modify ATASTREET to acquire the set of adversarial treatment assignments for different CBMs, and compare their adversarial robustness. Doing this for different RCTs allows researchers to empirically find CBMs that are adversarially robust, and use them as reliable CBMs in RCTs.

Our empirical results for the different choices of and as the CBM suggest that our arguments do not depend on the vector norm used in the CBM. We infer that the observed vulnerability is inherent in the means difference CBM, and not in the vector norm that we used.

For , ATASTREET finds the treatment assignment that has the minimum covariate balance measurement. It is usable in cases where one wants to find admissible treatment assignments given the parameter . In other words, it provides the stopping criterion for covariate balancing methods.
ATASTREET solves the constrained combinatorial optimization using Mixed Integer Linear Programming (MILP) [milp], which is a well-studied problem in combinatorial optimization and can be solved in an acceptable time [milp, KLOTZ, Narendra, NAU, Clausen, Land, Bader2005].
V Geometrical Interpretation
Looking at Figure 6, the joint probability density function of ( , MATE) is ellipsoidal. The contour line of all the points with is almost an ellipsoid. The smaller the value of is, the larger the radius of the ellipsoid becomes, until it hits the margin. The solid black dots are all located on the margin. The geometrical interpretation of our findings is that the ellipsoid shape of the contour flattens at one end as we move towards the origin. This flat region at the end is the vulnerable region where there are points having the same but highly deviated MATE values.
VI Conclusions
In this work, we have provided arguments to demonstrate that covariate balancing with the standardized means difference, one of the most widely used approaches to reduce selection bias in RCTs, is vulnerable against adversarial treatment assignments (Figure 4). In order to demonstrate this vulnerability, we proposed ATASTREET to find well-balanced treatment assignments where the covariate balancing fails to prevent large errors in the MATE. This is a significant drawback: the means difference CBM cannot be used to evaluate the reliability of results in RCTs. It opens up opportunities for deceitful activities that exploit adversarial treatment assignments in order to shift the measured average treatment effect in a desired direction.
Our work suggests interesting future research directions. One direction is to evaluate the adversarial robustness of other existing covariate balancing methods. Another direction is to find a CBM with the best adversarial robustness. Such a measure is desirable in cases where the stakes of the trial are high enough to require a method that is robust against any deceitful action (e.g. clinical trials in deadly pandemics).
VII Acknowledgements
This work was supported by NSF grants 1842378, 1937134, CCF1911094, IIS1838177, and IIS1730574; ONR grants N000141812571, N000142012534, and MURI N000142012787; AFOSR grant FA95501810478, and a Vannevar Bush Faculty Fellowship, ONR grant N000141812047.