1 Introduction
The evaluation of the possible effects of a treatment on an outcome plays a central role in the theoretical as well as applied statistical and econometric literature; cfr. the excellent review papers by [3] and [12]. The main quantity of interest, traditionally, is the average effect of the treatment on the outcome, or better the difference between the expected values of outcomes for treated and control (untreated) subjects, i.e. the ATE (Average Treatment Effect). Another quantity of interest is the effect of the treatment on outcome quantiles, which is summarized by the QTE (Quantile Treatment Effect). The main source of difficulty is that data are usually observational, so that estimating the treatment effect by simply comparing outcomes for treated vs. control subjects is prone to a relevant source of bias: receiving a treatment is not a “purely random” event, and there could be relevant differences between treated and control subjects. This motivates the need to account for confounding covariates.
In the literature, several different techniques have been proposed to estimate ATE, under various assumptions (see [3], [12] and references therein). As far as QTE is concerned, cfr. the paper by [9]. The problem of evaluating possible differences in the distribution function of potential outcomes with binary instrumental variables is studied in [1] via a Kolmogorov-Smirnov type test.
In the present paper we essentially focus on evaluating the possible effects of the treatment on the whole outcome probability distribution. The starting point is to use outcome weighting schemes similar to those introduced in [11] and [9]. Using this approach, estimates of the distribution function (d.f.) for treated and control subjects will be obtained. Such estimators essentially play a role similar to that of the empirical d.f. in nonparametric statistics. It will be shown that the resulting “empirical processes” converge weakly to an appropriate Gaussian process. Although it is not a Brownian bridge, it possesses several properties similar to those of the Brownian bridge (continuity of trajectories, etc.). These theoretical results are applied to the construction of confidence bands for the outcome distribution under treatment and under control, as well as to the construction of a new statistical test to compare treated and untreated subjects. In a sense, such a test is a version of the classical Wilcoxon-Mann-Whitney test for two-group comparison. Its main merit is to capture the possible difference between treated and untreated subjects even when ATE is equal to zero. Another application of interest will be the construction of a test for stochastic dominance of treatment w.r.t. control, which is of interest, for instance, in programme evaluation exercises ([15]), welfare outcomes, etc.
The paper is organized as follows. In Section 2 the problem is described. In Section 3.2 the main asymptotic large sample results are provided, and in Section 4 approximations based on subsampling are considered. Particularizations to ATE and QTE are given in Section 5. Section 6 is devoted to the construction of confidence bands for the d.f. of outcomes, for both treated and untreated subjects. In Section 7 a Wilcoxon-type statistic to test for a treatment effect on the d.f. of outcomes is introduced, and in Section 8 an elementary test for first-order stochastic dominance of treated vs. untreated is studied. The finite sample performance of the proposed methodologies is studied via Monte Carlo simulation in Section 9.
2 The problem
Let $Y$ be an outcome of interest, observed on a sample of $n$ subjects. Some of the sample units are treated with an appropriate treatment (treated group); the other sample units are untreated (control group). If $T$ denotes the treatment indicator variable, then whenever $T = 1$, $Y_{(1)}$ is observed; otherwise, if $T = 0$, $Y_{(0)}$ is observed. Here $Y_{(1)}$ and $Y_{(0)}$ are the potential outcomes due to receiving and not receiving the treatment, respectively. The observed outcome is then equal to $Y = T Y_{(1)} + (1 - T) Y_{(0)}$. In the sequel, $F_1$ will denote the distribution function (d.f.) of $Y_{(1)}$, and $F_0$ the d.f. of $Y_{(0)}$.
As already said in the introduction, receiving a treatment is not a “purely random” event, as it would be in an experimental framework. On the contrary, there could be relevant differences between treated and untreated subjects, due to the presence of confounding covariates. In the sequel, we will denote by $X$ the (random) vector of relevant covariates, which is assumed to be observed.
In order to get consistent estimates, identification restrictions are necessary. The relevant restriction assumed in the sequel is that selection for treatment is based on observable variables: given a set of observed covariates, assignment either to the treatment group or to the control group is random. Formally speaking, let $p(x) = P(T = 1 \mid X = x)$ be the conditional probability of receiving the treatment given covariates $X = x$; it is termed the propensity score. The marginal probability of being treated, $P(T = 1)$, is equal to $E[p(X)]$.
In the sequel, our main assumption is that the strong ignorability conditions (cfr. [18]) are fulfilled. In more detail, consider next the joint distribution of $(Y_{(1)}, Y_{(0)}, T, X)$, and denote by $\mathcal{X}$ the support of $X$. The following conditions are assumed to hold.
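S1. Unconfoundedness (cfr. [19]): given $X$, $(Y_{(1)}, Y_{(0)})$ are jointly independent of $T$: $(Y_{(1)}, Y_{(0)}) \perp T \mid X$.

S2. The support $\mathcal{X}$ of $X$ is a compact subset of a finite-dimensional Euclidean space.

S3. Common support: there exists $\delta > 0$ for which $\delta \le p(x) \le 1 - \delta$ for all $x \in \mathcal{X}$, so that $p(x)$ and $1 - p(x)$ are both bounded away from zero.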
Assumption S1 is also known as the Conditional Independence Assumption (CIA).
For the sake of simplicity, we will use in the sequel the notation
(1) 
From the above assumptions, the basic relationships
(2)  
are obtained.
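To see numerically why weighting by the propensity score matters, the following minimal Python sketch simulates confounded data and compares the unweighted empirical d.f. of treated outcomes with an inverse-propensity-weighted average. The data-generating process, the function names and the use of the true (rather than estimated) propensity score are illustrative assumptions, not taken from the paper. The weighted average is the sample analogue of the standard identity $E[T \, I(Y \le y) / p(X)] = F_1(y)$, which holds under unconfoundedness and common support.

```python
import math
import random


def simulate(n, seed=0):
    """Draw (x, t, y, p) rows where the treatment probability depends on x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-1.5 * x))   # assumed true propensity score
        t = 1 if rng.random() < p else 0
        # Assumed potential outcomes: Y1 = 1 + x + noise, Y0 = x + noise.
        y = (1.0 + x if t == 1 else x) + rng.gauss(0.0, 1.0)
        rows.append((x, t, y, p))
    return rows


def naive_df_treated(rows, y):
    """Unweighted empirical d.f. among treated units (ignores confounding)."""
    treated = [r for r in rows if r[1] == 1]
    return sum(1 for r in treated if r[2] <= y) / len(treated)


def ipw_df_treated(rows, y):
    """Sample analogue of E[T * 1(Y <= y) / p(X)], using the true p(x)."""
    return sum(r[1] * (r[2] <= y) / r[3] for r in rows) / len(rows)


rows = simulate(20000)
print("naive   :", naive_df_treated(rows, 1.0))
print("weighted:", ipw_df_treated(rows, 1.0))
```

In this simulated design the d.f. of the treated potential outcome at $y = 1$ equals $0.5$ by symmetry, so the weighted value should be close to $0.5$, while the naive value is pulled away from it because treated units tend to have larger covariate values.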
The Average Treatment Effect (ATE) is defined as $E[Y_{(1)}] - E[Y_{(0)}]$. The estimation of ATE is a problem of primary importance in the literature, and several different approaches have been proposed ([3] and references therein). Another parameter of interest is the Quantile Treatment Effect (QTE), which is the difference between quantiles of $F_1$ and $F_0$: $F_1^{-1}(p) - F_0^{-1}(p)$, with $0 < p < 1$; cfr. [9]. In particular, when $p = 1/2$ it reduces to the Median Treatment Effect.
3 Estimation of $F_1$, $F_0$
3.1 Basics
The basic approach to the estimation of $F_1$, $F_0$ follows, in principle, the ideas developed in [11] to estimate ATE. First of all, the propensity score is estimated by a sieve estimator $\hat{p}_n(x)$, say; cfr. [11], [9]. Let $R^K(x)$ be a $K$-dimensional vector of polynomials in $x$, such that

;

;

includes all polynomials up to order whenever , with as .
The propensity score is approximated by a linear combination of the elements of this polynomial vector on a logit scale, with coefficients estimated by maximizing a pseudo-likelihood. More formally, if , then , where the dimensional vector is estimated by the maximum likelihood method:
In the sequel, the following result will be widely used.
Theorem 1.
Assume that S1-S3 are fulfilled, and that the propensity score is continuously differentiable of order , with . If , with , then
(3) 
Proof. See [11].
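A logit sieve estimator of the kind described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions that are not in the paper: the covariate is scalar, the basis is the plain power series $(1, x, \ldots, x^{k-1})$, the number of terms `k` is fixed by hand instead of growing with the sample size, and the likelihood is maximized by Newton-Raphson. All function names are hypothetical.

```python
import math
import random


def poly_basis(x, k):
    """Power-series basis (1, x, ..., x^(k-1)) for a scalar covariate."""
    return [x ** j for j in range(k)]


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][j] * z[j] for j in range(r + 1, n))) / m[r][r]
    return z


def fit_series_logit(xs, ts, k, iters=15):
    """Maximize the logit log-likelihood over k basis coefficients (Newton-Raphson)."""
    rows = [poly_basis(x, k) for x in xs]
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, t in zip(rows, ts):
            p = logistic(sum(b * v for b, v in zip(beta, r)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (t - p) * r[i]          # score of the log-likelihood
                for j in range(k):
                    hess[i][j] += w * r[i] * r[j]  # negative Hessian
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def propensity(x, beta):
    """Fitted propensity score at x."""
    return logistic(sum(b * v for b, v in zip(beta, poly_basis(x, len(beta)))))


# Illustrative use on simulated data with an assumed true score logistic(0.5 + x).
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
ts = [1 if rng.random() < logistic(0.5 + x) else 0 for x in xs]
beta = fit_series_logit(xs, ts, k=3)
print([round(propensity(x, beta), 3) for x in (-1.0, 0.0, 1.0)])
```

With a vector of covariates the basis would contain cross-products as well, and the choice of `k` as a function of the sample size is exactly what the rate conditions of Theorem 1 govern; neither aspect is handled by this sketch.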
Again, for notational simplicity, and similarly to (1), define:
(4) 
In order to estimate $F_1$ and $F_0$, the following “Hájek-type” estimators are considered:
(5) 
where
(6) 
It is immediate to see that the estimators in (5) are proper d.f.s, i.e. they are bona fide estimators.
As alternative estimators of $F_1$, $F_0$, the following “Horvitz-Thompson-type” estimators could be considered:
(7) 
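We will mainly concentrate on the Hájek-type estimators (5) for two reasons. First of all, the Horvitz-Thompson-type estimators (7) are not proper d.f.s, because their limit as $y \to +\infty$ differs from 1 with positive probability. In the second place, as will be seen in the sequel, the estimators (7) are asymptotically equivalent to the estimators (5).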
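The difference between the two families of estimators can be illustrated with a short Python sketch. It assumes the usual inverse-propensity weights, $T_i / \hat{p}(x_i)$ for the treated arm and $(1 - T_i) / (1 - \hat{p}(x_i))$ for the control arm, normalized by their sum (Hájek-type) or divided by $n$ (Horvitz-Thompson-type). These forms and the function names are assumptions made for illustration, since the displayed formulas are not reproduced here.

```python
def _weights(ts, ps, arm):
    """Inverse-propensity weights for the treated (arm=1) or control (arm=0) group."""
    if arm == 1:
        return [t / p for t, p in zip(ts, ps)]
    return [(1 - t) / (1 - p) for t, p in zip(ts, ps)]


def hajek_df(ys, ts, ps, y, arm=1):
    """Normalized (Hajek-type) weighted d.f. estimate at the point y."""
    w = _weights(ts, ps, arm)
    return sum(wi for wi, yi in zip(w, ys) if yi <= y) / sum(w)


def ht_df(ys, ts, ps, y, arm=1):
    """Unnormalized (Horvitz-Thompson-type) estimate: weights divided by n."""
    w = _weights(ts, ps, arm)
    return sum(wi for wi, yi in zip(w, ys) if yi <= y) / len(ys)


# Tiny hand-made data set: outcomes, treatment indicators, propensity scores.
ys = [1.0, 2.0, 3.0, 4.0]
ts = [1, 1, 0, 1]
ps = [0.5, 0.25, 0.5, 0.8]
print(hajek_df(ys, ts, ps, float("inf")))  # total mass of the Hajek-type estimate
print(ht_df(ys, ts, ps, float("inf")))     # total mass of the Horvitz-Thompson-type estimate
```

On this toy data set the normalized estimate has total mass one by construction, whereas the unnormalized one does not, which is the first of the two reasons given above.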
3.2 Basic asymptotic results
The goal of the present section is to study the asymptotic, large sample, properties of the estimators (5). Our first result is a Glivenko-Cantelli type result, showing the uniform consistency (in probability) of these estimators for $F_1$, $F_0$.
Proposition 1.
Assume that the conditions of Th. 1 are fulfilled. Then:
(8) 
Proof. See Appendix.
The next step consists in studying the limiting, large sample distribution of the above estimators. Define first the stochastic process
(9) 
The bivariate stochastic process (9) essentially plays the same role as the empirical process in classical nonparametric statistics, with a complication due to the presence of the weighted estimators (5) instead of the usual empirical distribution function.
The weak convergence of can be proved similarly to the classical empirical process, with modifications. In the first place, from
and from Lemma 2 , it is seen that the limiting distribution of , if it exists, coincides with the limiting distribution of
(10) 
In the second place, by repeating verbatim the arguments in Th. 1 in [11], and [10], with instead of and instead of , it is seen that, if , with , then the relationship
(11) 
holds, where
(12) 
The term appearing in depends on , and, as it appears by using the bounds in [10], convergence in probability to zero (or better, to the vector ) holds uniformly over compact sets of s. Hence, in order to prove that the sequence of stochastic processes converges weakly to a limit process, it is enough to prove that converges weakly to a limiting process.
Proposition 2.
Assume that the conditions of Th. 1 are fulfilled, and that , , , are continuous. Then, the sequence of stochastic processes converges weakly, as goes to infinity, to a Gaussian process with null mean function (, ) and covariance kernel:
(13) 
where:
(14)  
(15)  
(16) 
Weak convergence takes place in the set of bounded functions equipped with the sup-norm (if ).
Proof. See Appendix.
Due to the continuity of $F_1$, $F_0$, the weak convergence of Proposition 2 also holds in the space of $\mathbb{R}^2$-valued càdlàg functions equipped with the Skorokhod topology.
Consider now the Horvitz-Thompson estimators (7), and define:
From the proof of Proposition 2, it appears that the resulting sequence of stochastic processes converges weakly to the same Gaussian limiting process that appears in Proposition 2. Hence, the Horvitz-Thompson estimators are asymptotically equivalent to the Hájek estimators (5).
As is well known, in classical nonparametric statistics the empirical process converges weakly to a Brownian bridge, on the scale of the population distribution function. The limiting process in Proposition 2 is not a Brownian bridge, of course, although it is a Gaussian process. However, it shares with the Brownian bridge an important property: it possesses trajectories that are a.s. continuous.
Proposition 3.
If $F_1$ and $F_0$ are continuous, the limiting process possesses trajectories that are continuous with probability 1.
Proof. See Appendix.
3.3 Differentiable functionals
The result of Proposition 2 can be immediately extended to general Hadamard differentiable functionals of $(F_1, F_0)$, again assuming the continuity of $F_1$, $F_0$. Consider a general functional:
where is equipped with the norm metric and is a normed space equipped with a norm . As seen in Proposition 3, the limiting process concentrates on , where is the set of continuous functions on the extended real line . Note that functions in are bounded.
The functional is Hadamard differentiable at tangentially to if there exists a linear map
such that:
Using Theorem 20.8 in [20], we then have:
(17) 
In general, since is a linear functional of a Gaussian process, it is a Gaussian process, as well. In particular, if is a real-valued functional, then
has a Gaussian distribution with zero expectation and variance
(18) 
For the sake of simplicity, let be equal to . The above result can be rewritten as
(19) 
where the asymptotic variance is given by .
4 Subsampling approximation
Consider a real-valued functional of the kind studied in Section 3.3. In order to construct a confidence interval on the basis of its estimate, a consistent estimate of the asymptotic variance is necessary. Unfortunately, apart from a few cases, this is not simple, because the asymptotic variance could depend on $F_1$, $F_0$ in a complicated way, and a direct estimation may not be possible. This is the case, for instance, of quantiles, which will be dealt with in the next section. Here we briefly present a simple approach based on subsampling.
Define , , and consider all the subsamples of size $m$ of . Let further be the statistic computed for the $j$th subsample of size $m$. Next, consider the empirical distribution function of the quantities . In symbols:
(20) 
If:

;

the subsample size $m$ depends on $n$ in such a way that $m \to \infty$, $m/n \to 0$;
then, using Th. 2.1 in [17], we have
(21) 
where is the distribution function of the Gaussian distribution. The convergence in (21) is uniform in .
Relationship (21) tells us that the subsampling d.f. (20) can be (uniformly) approximated by the limiting Gaussian d.f., as $n$ and $m$ get large. From the continuity and strict monotonicity of the latter, it follows that each empirical quantile of (20) converges in probability to the quantile of the same order of the limiting Gaussian distribution.
The number of subsamples of size $m$ can be very large, and then (20) could be difficult to compute. In this case a “stochastic” version of (20) can be considered according to the following steps.

Select independent subsamples of size from .

Compute the corresponding values of the statistic .

Compute the corresponding empirical distribution function:
(22)
It can be easily verified that if , and , then has the same limiting behaviour as . These results can be used to obtain confidence intervals for the value of the functional, and for testing statistical hypotheses about it via inversion of confidence intervals. In more detail, let be the th quantile of . It is easy to show that the interval:
(23) 
is a confidence interval for of asymptotic level .
The confidence interval (23) can also be used for testing the hypothesis:
If the hypothesized value is in the confidence interval, then the hypothesis is accepted, otherwise it is rejected. Clearly, this is a test of asymptotic significance level .
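The stochastic version of the subsampling procedure can be sketched in Python as follows. This is a hedged illustration, not the paper's implementation: the helper name `subsampling_ci`, the $\sqrt{m}$ and $\sqrt{n}$ scalings, and the form of the interval (the estimate minus rescaled upper and lower quantiles of the subsampling roots, as in standard subsampling theory) are assumptions. In the setting of this paper, `data` would be the list of (outcome, treatment, covariates) triples and `statistic` would re-estimate the propensity score within each subsample; the toy example below uses a plain median only to keep the sketch short.

```python
import math
import random
import statistics


def subsampling_ci(data, statistic, m, n_sub=500, alpha=0.05, seed=0):
    """Confidence interval from n_sub random subsamples of size m (without replacement)."""
    rng = random.Random(seed)
    n = len(data)
    theta_n = statistic(data)
    # Roots sqrt(m) * (theta_{m,j} - theta_n), one per random subsample.
    roots = sorted(
        math.sqrt(m) * (statistic(rng.sample(data, m)) - theta_n)
        for _ in range(n_sub)
    )

    def quantile(p):
        # Smallest root whose empirical d.f. value is at least p.
        idx = min(n_sub - 1, max(0, math.ceil(p * n_sub) - 1))
        return roots[idx]

    lower = theta_n - quantile(1.0 - alpha / 2.0) / math.sqrt(n)
    upper = theta_n - quantile(alpha / 2.0) / math.sqrt(n)
    return lower, upper


# Toy illustration: interval for the median of simulated Gaussian data.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
print(subsampling_ci(sample, statistics.median, m=100))
```

A hypothesized value can then be tested by checking whether it falls inside the returned interval, which mirrors the inversion argument described above.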
5 Average and Quantile Treatment Effect
The results obtained so far allow one to recover, as special cases, results previously obtained by [11] and [9]. They are presented below.
5.1 Average Treatment Effect
The Average Treatment Effect (ATE, for short) is defined as:
(24) 
In the sequel, we will assume that and are both finite. As an estimator of , consider
(25)  
where the weights , are given by (6).
As it appears from (25), the estimator is a linear functional of the estimators (5) and hence Hadamard differentiable. An integration by parts shows that its asymptotic distribution coincides with that of
which turns out to be normal with zero mean and variance
It is not difficult to see that the estimator is asymptotically equivalent to that introduced in [11].
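As a concrete illustration, a normalized weighted estimator of ATE can be sketched in Python as the difference between two weighted means. The weights are assumed to be the inverse-propensity weights $T_i / \hat{p}(x_i)$ and $(1 - T_i) / (1 - \hat{p}(x_i))$, each normalized to sum to one; this form and the function name `ate_hajek` are assumptions for illustration, since the displayed formula (25) is not reproduced here.

```python
def ate_hajek(ys, ts, ps):
    """Difference of normalized inverse-propensity-weighted outcome means."""
    w1 = [t / p for t, p in zip(ts, ps)]              # treated-arm weights
    w0 = [(1 - t) / (1 - p) for t, p in zip(ts, ps)]  # control-arm weights
    mean1 = sum(w * y for w, y in zip(w1, ys)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, ys)) / sum(w0)
    return mean1 - mean0


# Tiny hand-made data set: outcomes, treatment indicators, propensity scores.
ys = [2.0, 6.0, 1.0, 3.0]
ts = [1, 1, 0, 0]
ps = [0.5, 0.25, 0.5, 0.75]
print(ate_hajek(ys, ts, ps))
```

Each weighted mean is the mean of the corresponding weighted d.f. estimate, which is why the estimator is a linear functional of the pair of estimated d.f.s.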
5.2 Quantiles and Quantile Treatment Effect
Let , be the quantile of order of , . In the sequel, we will assume that , are in the common support of , . Furthermore, we will denote by the support of , .
Suppose that , are continuous with positive density functions , , respectively:
As a consequence of the above assumption, is strictly monotonic (in its support).
Consider now () such that lie in the common support of , . It is intuitive to estimate the quantile by its “empirical counterpart”
(26) 
Let now be the set of the restrictions of the distribution functions in to , and let be the set of càdlàg functions in . From [20], it is seen that the map (from onto ) is Hadamard differentiable at tangentially to with derivative:
Then, using Th. 20.8 in [20] (cfr. [7] for an equivalent approach), the process
(27) 
converges weakly as (on equipped with the norm) to a Gaussian process defined as:
(28) 
The stochastic process is a Gaussian process with zero mean function and covariance kernel:
Note that due to the symmetry of the Gaussian distribution.
In [9] the difference between corresponding quantiles:
(29) 
is considered. It is known as the Quantile Treatment Effect (QTE, for short). From (26) it is intuitive to estimate it by
(30) 
The estimator is asymptotically equivalent to the estimator of QTE defined in [9]. In fact, from it appears that
(31) 
tends in distribution, as $n$ goes to infinity, to a Gaussian distribution with zero mean and variance:
(32)  
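The plug-in estimation of quantiles and of QTE can be sketched in Python by inverting the weighted d.f. estimates. The sketch assumes that the quantile of order $p$ is the smallest observed outcome at which the normalized cumulative weight reaches $p$ (a generalized inverse), and that the weights are the inverse-propensity weights used for the d.f. estimators; both the function names and these choices are assumptions for illustration.

```python
def weighted_quantile(ys, weights, p):
    """Smallest y whose normalized cumulative weight reaches p, for 0 < p < 1."""
    total = sum(weights)
    cum = 0.0
    for y, w in sorted(zip(ys, weights)):
        cum += w
        if cum >= p * total:
            return y
    return max(ys)  # fallback against floating-point round-off


def qte_hajek(ys, ts, ps, p):
    """Difference between the weighted quantiles of the treated and control arms."""
    w1 = [t / q for t, q in zip(ts, ps)]
    w0 = [(1 - t) / (1 - q) for t, q in zip(ts, ps)]
    return weighted_quantile(ys, w1, p) - weighted_quantile(ys, w0, p)


# Tiny hand-made data set: outcomes, treatment indicators, propensity scores.
ys = [2.0, 6.0, 1.0, 3.0]
ts = [1, 1, 0, 0]
ps = [0.5, 0.25, 0.5, 0.75]
print(qte_hajek(ys, ts, ps, 0.5))
```

Units in the other arm carry zero weight, so they never determine the returned quantile. Since the asymptotic variance of such an estimator involves the outcome densities at the quantiles, a resampling approximation such as the subsampling scheme of Section 4 is a natural companion to this estimator.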