1 Introduction
Approximately design-unbiased model-assisted estimation is not new. It has become standard practice in survey sampling, following many influential works such as Särndal et al. (1992) and Deville and Särndal (1992). However, there is so far no theory that allows one generally to incorporate the many common machine-learning (ML) techniques. For instance, according to Breidt and Opsomer (2017, p. 203), they “are not aware of direct uses of random forests in a model-assisted survey estimator”. Since modern ML techniques can often generate more flexible and powerful prediction models when rich auxiliary feature data are available, the potential is worth exploring in any situation where the practical advantages of linear weighting are not essential compared to the efficiency gains that can be achieved by alternative nonlinear ML techniques.
We propose a subsampling Rao-Blackwell (SRB) method, which enables exactly design-unbiased estimation with the help of linear or nonlinear prediction models. Monte Carlo (MC) versions of the proposed method can be used in cases where the exact RB method is computationally too costly. The MC-SRB method is still exactly design-unbiased, although it is somewhat less efficient due to the additional MC error. In practice, though, one can easily balance the numerical efficiency of the MC-SRB method against the statistical efficiency of the corresponding exact RB method.
The SRB method makes use of three classic ideas from Statistical Science and Machine Learning. On the one hand, the training-test split of the sample of observations in ML generates errors in the test set rather than residuals, conditional on the training dataset, which as we shall explain is the key to achieving exact design-unbiasedness. For model-assisted survey estimation we use this idea to remove the finite-sample bias. On the other hand, Rao-Blackwellisation (Rao, 1945; Blackwell, 1947) and model-assisted estimation (Cassel et al., 1976) are powerful ideas in Statistics and survey sampling, which we apply to ML techniques to obtain design-unbiased survey estimators at the population level.
We shall refer to the amalgamation as statistical learning, since the term model-assisted estimation is entrenched with the property of approximate design-unbiasedness (e.g. Särndal, 2010; Breidt and Opsomer, 2017), whereas the focus on population-level estimation and associated variance estimation is unusual in the ML literature.
In applications one needs to ensure design-consistency of the proposed SRB method, in addition to exact design-unbiasedness. The property can readily be established for parametric or many semiparametric assisting models, but the conditions required for nonparametric algorithmic ML prediction models have so far eluded treatment in the literature. Indeed, this has been a main reason preventing the incorporation of such ML techniques in model-assisted estimation in survey sampling. We shall develop general stability conditions for design-consistency under both simple random sampling and arbitrary unequal probability sampling designs.
For the first time, design-unbiased model-assisted estimation can thereby be achieved generally in survey sampling. Wherever rich feature data are available, the approach of statistical learning developed in this paper enables one to adopt suitable ML techniques, which can make much more efficient use of the available auxiliary information.
The rest of the paper is organised as follows. In Section 2, we describe the SRB method that uses an assisting linear model. The underlying ideas of design-unbiased statistical learning are explained, as well as the differences to standard model-assisted generalised regression estimation. Some basic methods of variance estimation are outlined, where a novel jackknife variance estimator is developed for the SRB method. We move on to nonlinear ML techniques in Section 3. The similarity to and difference from the bootstrap aggregating (Breiman, 1996b) approach are explored. Moreover, we investigate and prove the stability conditions for design-consistency of the SRB method that uses nonparametric algorithmic prediction models. Two simulation studies are presented in Section 4, which illustrate the potential gains of the proposed unbiased statistical learning approach, compared to standard linear model-assisted or model-based approaches. A brief summary and topics for future research are given in Section 5.
2 Unbiased linear estimation
In this section we consider unbiased linear estimation in survey sampling, which builds on generalised regression (GREG) estimation (Särndal et al., 1992). The GREG estimator is the most common estimation method in practical survey sampling. It is consistent under mild regularity conditions, and is often more efficient than the exactly unbiased Horvitz-Thompson (HT) estimator (Horvitz and Thompson, 1952). The proposed subsampling Rao-Blackwellisation (SRB) method removes the finite-sample bias of the GREG estimator generally, while its relative efficiency remains comparable to that of the standard GREG estimator.
2.1 Bias correction by subsampling
Let $s$ be a sample (of size $n$) selected from the population $U$ of size $N$, with probability $p(s)$, where $\sum_s p(s) = 1$ over all possible samples under a given sampling design. Let $\pi_i = \Pr(i \in s)$ be the sample inclusion probability, for each $i \in U$. Let $y_i$ be a survey variable, for $i \in U$, with unknown population total $Y = \sum_{i \in U} y_i$.
Let the assisting linear model expectation of $y_i$ be given by $\mu_i = x_i^\top \beta$, where $x_i$ is the vector of covariates for each $i \in U$. Let $\hat{y}_i = x_i^\top \hat{\beta}$ be the estimator of $\mu_i$, where $\hat{\beta}$ is a weighted least squares (WLS) estimator of $\beta$. It is possible to attach additional heteroscedasticity weights in the WLS; but the development below is invariant to such variations, so that it is more convenient simply to ignore them in the notation. The GREG estimator of $Y$ is given as
$$\hat{Y}_{GREG} = \sum_{i \in U} \hat{y}_i + \sum_{i \in s} \frac{y_i - \hat{y}_i}{\pi_i} .$$
While $\hat{Y}_{GREG}$ is design-consistent under mild regularity conditions (e.g. Särndal et al., 1992), as $n, N \to \infty$, it is usually biased given a finite sample size $n$, except in special cases.
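For concreteness, the GREG form combines the sum of fitted values over the population with an inverse-probability-weighted sum of sample residuals. The following minimal Python sketch illustrates this under SRS; the function name, the single-covariate model without intercept, and the toy data are our own illustration, not from the paper.

```python
def greg_total(y_s, x_s, x_total, pi):
    """GREG estimate of the population total under a common inclusion
    probability pi (SRS), with a single covariate and no intercept."""
    # WLS slope of the assisting model mu_i = beta * x_i
    # (the survey weights 1/pi cancel out when pi is constant)
    beta = sum(x * y for x, y in zip(x_s, y_s)) / sum(x * x for x in x_s)
    fitted_total = beta * x_total                    # sum of fitted values over U
    resid_ht = sum((y - beta * x) / pi for x, y in zip(x_s, y_s))
    return fitted_total + resid_ht
```

When the assisting model fits perfectly (each $y_i$ exactly proportional to $x_i$), the residual term vanishes and the estimator reproduces the total exactly; otherwise it is only approximately design-unbiased, which is the finite-sample bias the SRB method removes.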
To remove the potential finite-sample bias of $\hat{Y}_{GREG}$, consider subsampling of $s_1$ from $s$, with known probability $q(s_1 \mid s)$, such as SRS with fixed size $m = |s_1|$, where $q(s_1 \mid s) = 1/\binom{n}{m}$. The induced probability of selecting $s_1$ from $U$ is given by
$$p_1(s_1) = \sum_{s \supseteq s_1} p(s)\, q(s_1 \mid s),$$
where $\pi_{1i} = \Pr(i \in s_1)$ is the corresponding inclusion probability for $i \in U$. Let $s_2 = s \setminus s_1$ be the complement of $s_1$ in $s$. Let the conditional sampling probability of $s_2$ given $s_1$ be
$$p(s_2 \mid s_1) = p(s)\, q(s_1 \mid s) / p_1(s_1),$$
and let $\pi_{i \mid s_1}$ be the corresponding conditional inclusion probability in $s_2$, for $i \in U \setminus s_1$. Let $\hat{y}_i^{(1)} = x_i^\top \hat{\beta}_1$ be the estimate of $\mu_i$ based on the subsample $s_1$. Let
$$\hat{Y} = \sum_{i \in s_1} y_i + \sum_{i \in U \setminus s_1} \hat{y}_i^{(1)} + \sum_{i \in s_2} \frac{y_i - \hat{y}_i^{(1)}}{\pi_{i \mid s_1}} . \qquad (1)$$
In other words, it is the sum of $y_i$ in $s_1$ and a difference estimator of the remaining population total based on $s_2$, via $\hat{y}_i^{(1)}$ that does not depend on the observations in $s_2$.
Proposition
The estimator $\hat{Y}$ by (1) is conditionally unbiased for $Y$ over $p(s_2 \mid s_1)$ given $s_1$, denoted by $E(\hat{Y} \mid s_1) = Y$, as well as unconditionally over $p_1(s_1)$, denoted by $E(\hat{Y}) = Y$.
Proof: As $\sum_{i \in s_1} y_i$ is fixed for any given $s_1$, the last two terms on the right-hand side of (1) are unbiased for $Y - \sum_{i \in s_1} y_i$ given $s_1$. It follows that $\hat{Y}$ is conditionally unbiased for $Y$ given $s_1$; hence, design-unbiased over $p_1(s_1)$ unconditionally as well.
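The one-split estimator (1) can be sketched in a few lines of Python; all names are our own, and the conditional inclusion probability shown assumes SRS at both stages, so that a unit outside the training set enters the test set with probability $(n-m)/(N-m)$.

```python
def split_estimate(y, yhat1, s1, s2, N):
    """One-split estimator of the population total: exact y over the
    training set s1, model predictions yhat1 (fitted on s1 only) over the
    rest of U, plus a difference correction from the test set s2.
    Assumes SRS at both stages."""
    n, m = len(s1) + len(s2), len(s1)
    pi_cond = (n - m) / (N - m)          # conditional inclusion prob. in s2
    total = sum(y[i] for i in s1)
    total += sum(yhat1[i] for i in range(N) if i not in s1)
    total += sum((y[i] - yhat1[i]) / pi_cond for i in s2)
    return total
```

A sanity check on the structure: if the predictions happen to be perfect, the correction term vanishes and the estimator returns the exact population total, whatever the split.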
Example: Simple random sampling (SRS)
Suppose SRS without replacement of $s$ from $U$, and of $s_1$ from $s$ with fixed size $m$, such that $\pi_i = n/N$ and $\pi_{i \mid s_1} = (n-m)/(N-m)$. In the special case of $\hat{y}_i^{(1)} = \bar{y}_1$, where $\bar{y}_1$ is the sample mean in $s_1$, we have
$$\hat{Y} = \sum_{i \in s_1} y_i + (N - m)\, \bar{y}_1 + \frac{N - m}{n - m} \sum_{i \in s_2} (y_i - \bar{y}_1),$$
which amounts to using the sample mean in $s_1$ to estimate the population mean outside of the given $s_1$, i.e., instead of using the sample mean in $s$ for the whole population mean. Thus, $\hat{Y}$ achieves unbiasedness generally, but at the cost of increased variance.
2.2 RaoBlackwellisation
One can reduce the variance of $\hat{Y}$ by the Rao-Blackwell method (Rao, 1945; Blackwell, 1947). The minimal sufficient statistic in the finite-population sampling setting is simply the set of distinct observed units and their values. Applying the RB method to $\hat{Y}$ by (1) yields $\hat{Y}_{RB}$, which is given by the conditional expectation of $\hat{Y}$ given $s$, i.e.
$$\hat{Y}_{RB} = E(\hat{Y} \mid s) = \sum_{s_1 \subset s} q(s_1 \mid s)\, \hat{Y}(s_1), \qquad (2)$$
where the expectation is evaluated with respect to $q(s_1 \mid s)$, and the second expression is leaner as long as one keeps in mind that $\{ y_i : i \in s \}$ are treated as fixed constants associated with the distinct units.
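Exact Rao-Blackwellisation can be sketched as a plain average over all size-$m$ training subsamples, each equally weighted under SRS. The sketch below is our own illustration; `train(s1)` stands for any fitting procedure that uses $s_1$ only and returns a prediction function.

```python
from itertools import combinations

def rb_estimate(y, s, N, m, train):
    """Exact Rao-Blackwellised estimator: average the one-split estimator
    over all size-m training subsamples s1 of s (SRS at both stages)."""
    n = len(s)
    pi_cond = (n - m) / (N - m)          # conditional inclusion prob. in s2
    estimates = []
    for s1 in combinations(s, m):
        s1set = set(s1)
        predict = train(s1set)           # model fitted on s1 only
        est = sum(y[i] for i in s1set)
        est += sum(predict(i) for i in range(N) if i not in s1set)
        est += sum((y[i] - predict(i)) / pi_cond for i in s if i not in s1set)
        estimates.append(est)
    return sum(estimates) / len(estimates)
```

With the delete-one split ($m = n - 1$) and the subsample mean as the stand-in predictor, this averaging reproduces the usual full-sample expansion estimator $(N/n) \sum_{i \in s} y_i$, as noted in the SRS example of this section.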
Proposition
The estimator $\hat{Y}_{RB}$ is design-unbiased for $Y$, denoted by $E(\hat{Y}_{RB}) = Y$.
Proof: By construction, the combined randomisation distribution induced by $p(s)$ and $q(s_1 \mid s)$ is the same as that induced by $p_1(s_1)$ and $p(s_2 \mid s_1)$, for any $s$ and $s_1 \subset s$. Thus,
$$E(\hat{Y}_{RB}) = E\{ E(\hat{Y} \mid s) \} = E(\hat{Y}) = Y .$$
Next, for the variance of $\hat{Y}_{RB}$ over $p(s)$, i.e. $V(\hat{Y}_{RB})$, we notice
$$V(\hat{Y}) = V\{ E(\hat{Y} \mid s) \} + E\{ V(\hat{Y} \mid s) \} = V(\hat{Y}_{RB}) + E\{ V(\hat{Y} \mid s) \},$$
since $E(\hat{Y} \mid s) = \hat{Y}_{RB}$. Juxtaposing the two expressions of $V(\hat{Y})$ above, we obtain
$$V(\hat{Y}_{RB}) = V(\hat{Y}) - E\{ V(\hat{Y} \mid s) \}, \qquad (3)$$
where $E\{ V(\hat{Y} \mid s) \}$ is the variance reduction compared to $\hat{Y}$.
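Since the decomposition above is a rearrangement of the law of total variance, it can be verified exactly by enumerating every sample and subsample of a toy population. The setup below (SRS of $n = 3$ from $N = 5$, delete-one split, subsample-mean predictor) is our own illustration, not from the paper.

```python
from itertools import combinations
from statistics import mean, pvariance

# Toy setup: SRS of n=3 from N=5, delete-one split (m=2), subsample-mean predictor.
y = [2.0, 5.0, 3.0, 8.0, 6.0]
N, n, m = 5, 3, 2
pi_cond = (n - m) / (N - m)

def split_est(s1, s2):
    mu = mean(y[i] for i in s1)                      # "model" trained on s1 only
    return (sum(y[i] for i in s1)
            + sum(mu for i in range(N) if i not in s1)
            + sum((y[i] - mu) / pi_cond for i in s2))

per_sample = []   # list of the subsample estimates for each sample s
for s in combinations(range(N), n):
    ests = [split_est(set(s1), [j for j in s if j not in s1])
            for s1 in combinations(s, m)]
    per_sample.append(ests)

all_est = [e for ests in per_sample for e in ests]
var_total = pvariance(all_est)                            # V(Y-hat) over p and q
var_rb = pvariance([mean(ests) for ests in per_sample])   # V(Y-hat_RB)
reduction = mean(pvariance(ests) for ests in per_sample)  # E{V(Y-hat | s)}
```

Up to floating-point error, `var_total` equals `var_rb + reduction`, and the average of all the estimates equals the population total, illustrating exact design-unbiasedness alongside the variance decomposition.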
Proposition
Provided an unbiased variance estimator $v(\hat{Y})$ with respect to $(p, q)$, i.e. $E\{ v(\hat{Y}) \} = V(\hat{Y})$, a design-unbiased variance estimator for $\hat{Y}_{RB}$ is given by
$$v(\hat{Y}_{RB}) = v(\hat{Y}) - \big( \hat{Y} - \hat{Y}_{RB} \big)^2 .$$
Proof: By stipulation, we have $E\{ v(\hat{Y}) \} = V(\hat{Y})$, which is the first term on the right-hand side of (3). Moreover, $E\{ (\hat{Y} - \hat{Y}_{RB})^2 \mid s \} = V(\hat{Y} \mid s)$, so that $(\hat{Y} - \hat{Y}_{RB})^2$ is unbiased for the second term. The result follows immediately.
Example: SRS, cont’d
In the special case of $m = n - 1$ and $\hat{y}_i^{(1)} = \bar{y}_1$, we have
$$\hat{Y}(s_1) = \sum_{i \in s_1} y_i + (N - n + 1)\, y_j$$
if $s_2 = \{ j \}$ and $\bar{y}_1$ denotes the mean in $s_1$. The RB estimator follows as
$$\hat{Y}_{RB} = \frac{1}{n} \sum_{j \in s} \Big( \sum_{i \in s \setminus \{j\}} y_i + (N - n + 1)\, y_j \Big) = \frac{N}{n} \sum_{i \in s} y_i,$$
which is the usual unbiased full-sample expansion estimator in this case. The RB method thus recovers the efficiency lost by any single $\hat{Y}(s_1)$ on its own.
Let , and . To express as a linear combination of , we rewrite as
where
It follows that the RB estimator (2) can be given as a linear estimator
(4) 
This has the important practical advantage that the same weights can be applied to produce numerically consistent cross-tabulation of multiple survey variables of interest.
In the case of SRS of $s_1$ from $s$ with $m = n - 1$, the RB weight in (4) is the average of the subsample-specific weights over the $n$ possible subsamples $s_1$, for a given unit $i$: one weight arises from the single subsample that does not include the unit $i$, while the others are the corresponding GREG weights for the remaining subsamples that include the unit $i$, each of which is different.
2.3 Relative efficiency to GREG
Let and for . Expanding the GREG estimator around yields
For , the first two terms on the right-hand side of (1) become if there exists a vector such that , in which case is a function of , i.e.
where is conditionally unbiased for given , and similarly for . Let and . We have , since and aim at the same population parameter, especially if is close to . In any case, expanding around yields
and
where if and 0 otherwise. Thus, we obtain
(5) 
Notice that is a constant. Thus, compared to , the variance of involves that of in addition. As , the first term on the right-hand side of (5) is provided , whereas the second term is provided the usual regularity conditions for GREG. As long as the sampling fraction is small, the first term will dominate, in which case the variance of the RB estimator is of the same order as that of the GREG estimator.
Example: SRS, cont’d
Let , where . We have
Let be the population variance of . The variance of the first term in (5) is
which is actually smaller than the approximate variance of the GREG estimator under SRS, although the difference will not be noteworthy in practical terms if the sampling fraction is small, since . Meanwhile, due to the additional variance of , the estimator obtained by the unbiased RB method can possibly have a larger variance than the biased GREG estimator (with general ). It seems that one should use a large subsample if possible, to keep the additional variance small.
2.4 Deleteone RB method
The largest possible size of $s_1$ is $m = n - 1$. We refer to Rao-Blackwellisation based on SRS of $s_1$ from $s$ with $m = n - 1$ as the delete-one (or leave-one-out, LOO) RB method. The conditional sampling design is not measurable in this case, in that one cannot construct an unbiased variance estimator based on a single observation in $s_2$. For an approximate variance estimator, we reconsider the basic case where the observations form a sample of independent and identically distributed (IID) observations, in order to develop an analogy to the classic jackknife variance estimation (Tukey, 1958).
Denote by $\mu$ the population mean, which is also the expectation of each $y_i$, for $i \in s$. As before, let $\bar{y}_1$ denote the mean in the subsample $s_1$. Following (1), let
$$\hat{\mu}_{(j)} = \frac{1}{N} \Big( \sum_{i \in s \setminus \{j\}} y_i + (N - n + 1)\, y_j \Big)$$
be the delete-one estimator of $\mu$, where $y_j$ acts as an unbiased estimator of the population mean outside $s_1$. The RB method yields the whole-sample mean, denoted by $\bar{y}$.
Observe that we have , where
(6) 
Thus, the RB estimator is the mean of an IID sample of observations , for , as in the development of classic jackknife variance estimation, so that we obtain
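For reference, a minimal sketch of the classic delete-one jackknife applied to a sample mean; it reproduces the textbook identity that the jackknife variance of the mean equals the sample variance divided by $n$. The function name is our own.

```python
from statistics import mean

def jackknife_var_of_mean(x):
    """Classic delete-one jackknife variance estimate of the sample mean:
    (n-1)/n times the sum over j of (mean without x_j - overall mean)^2."""
    n = len(x)
    xbar = mean(x)
    # leave-one-out means, computed without refitting from scratch
    loo_means = [(n * xbar - xj) / (n - 1) for xj in x]
    return (n - 1) / n * sum((m - xbar) ** 2 for m in loo_means)
```

Algebraically, each leave-one-out mean deviates from the overall mean by $(\bar{x} - x_j)/(n-1)$, so the estimate collapses to $\sum_j (x_j - \bar{x})^2 / \{ n(n-1) \}$, i.e. the usual $s^2/n$.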
Notice that, in this case, the IID observations used for the classic development of jackknife method are given by instead of (6), where .
For the delete-one RB method based on (1) and (2) given auxiliary $x_i$, we have $s_2 = \{ j \}$, such that the estimator can be denoted by $\hat{\beta}_{(j)}$, based on $s \setminus \{ j \}$, where it is simply the delete-one jackknife estimator of the regression coefficients. Rewrite the corresponding population total estimator by (1) as
such that the RB method yields $\hat{Y}_{RB}$ by (2), as the mean of $\hat{Y}_{(j)}$ over $j \in s$. We propose a jackknife variance estimator for $\hat{Y}_{RB}$, given by
$$v_J(\hat{Y}_{RB}) = \frac{n - 1}{n} \sum_{j \in s} \big( \hat{Y}_{(j)} - \hat{Y}_{RB} \big)^2, \qquad (7)$$
where $\hat{Y}_{(j)}$ denotes the delete-one replicate based on (1) with $s_2 = \{ j \}$.
Notice that, under general unequal probability sampling, it may be the case that the conditional inclusion probability given $s_1$ is not exactly known. However, in many situations where the sampling fraction is low, it is reasonable to assume that
An approximate delete-one RB estimator following (2) can then be given as
(8) 
with the corresponding substitution in (7) for jackknife variance estimation. Meanwhile, the delete-one jackknife replicates of GREG can be written as
The estimator is quite close to the approximate RB estimator (8); indeed, the two are identical apart from in the special case of . This is not surprising, since the jackknife-based estimator is an alternative means of reducing the bias of the GREG estimator. The difference is that, provided the conditional inclusion probabilities are known, the proposed RB method is exactly design-unbiased, whereas the jackknife-based estimator is not. Finally, the resemblance between the two is another indication that the relative efficiency of the delete-one RB method is usually not a concern compared to the standard GREG estimator.
2.5 Monte Carlo RB
Exact Rao-Blackwellisation can be computationally expensive when the cardinality of the subsample space (of $s_1$) is large. Instead of calculating the RB estimator exactly, consider the Monte Carlo (MC) RB estimator given as follows:
$$\hat{Y}_{MC} = \frac{1}{T} \sum_{t=1}^{T} \hat{Y}\big( s_1^{(t)} \big), \qquad (9)$$
where $\hat{Y}(s_1^{(t)})$ is the estimator based on the $t$-th subsample, for $t = 1, \dots, T$, which are IID realisations of $s_1$ from $q(s_1 \mid s)$, such that $\hat{Y}_{MC}$ is a Monte Carlo approximation of $\hat{Y}_{RB}$.
Proposition
The estimator $\hat{Y}_{MC}$ is design-unbiased for $Y$, denoted by $E(\hat{Y}_{MC}) = Y$.
Proof: The result follows from $E(\hat{Y}_{MC} \mid s) = \hat{Y}_{RB}$ and $E(\hat{Y}_{RB}) = Y$.
Adopting a computationally manageable $T$ entails an increase of variance, i.e. $V(\hat{Y}_{MC} \mid s) = V(\hat{Y} \mid s)/T$, compared to $\hat{Y}_{RB}$, so that the variance of $\hat{Y}_{MC}$ is given by
$$V(\hat{Y}_{MC}) = V(\hat{Y}_{RB}) + \frac{1}{T}\, E\{ V(\hat{Y} \mid s) \} . \qquad (10)$$
Due to the IID construction of $s_1^{(1)}, \dots, s_1^{(T)}$, an unbiased estimator of $V(\hat{Y}_{MC} \mid s)$ is given by
$$v_{MC} = \frac{1}{T(T-1)} \sum_{t=1}^{T} \big( \hat{Y}(s_1^{(t)}) - \hat{Y}_{MC} \big)^2 .$$
This allows one to control the statistical efficiency of the MC-RB method, i.e. the choice of $T$ is acceptable when the estimated Monte Carlo variance is deemed small enough in practical terms.
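The MC-RB scheme can be sketched as follows, with the subsample mean standing in for an arbitrary trained model; the function name and toy data are ours, and the returned second value is the replicate variance divided by $T$, an estimate of the added Monte Carlo variance.

```python
import random
from statistics import mean, variance

def mc_rb_estimate(y, s, N, m, T, rng):
    """Monte Carlo Rao-Blackwellisation: average the one-split estimator
    over T training subsamples s1 drawn by SRS from s.  Returns the
    estimate and an estimate of the added Monte Carlo variance."""
    n = len(s)
    pi_cond = (n - m) / (N - m)
    reps = []
    for _ in range(T):
        s1 = set(rng.sample(s, m))                   # SRS draw of the training set
        mu = mean(y[i] for i in s1)                  # stand-in trained model
        est = sum(y[i] for i in s1)
        est += sum(mu for i in range(N) if i not in s1)
        est += sum((y[i] - mu) / pi_cond for i in s if i not in s1)
        reps.append(est)
    return mean(reps), variance(reps) / T
```

As a sanity check, with a constant survey variable every replicate equals the exact total, so the Monte Carlo variance term is zero regardless of $T$.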
Proposition
Provided an unbiased variance estimator $v(\hat{Y})$ with respect to $(p, q)$, i.e. $E\{ v(\hat{Y}) \} = V(\hat{Y})$, a design-unbiased variance estimator for $\hat{Y}_{MC}$ is given by
$$v(\hat{Y}_{MC}) = \frac{1}{T} \sum_{t=1}^{T} v\big( \hat{Y}(s_1^{(t)}) \big) - \Big( 1 - \frac{1}{T} \Big) S^2, \qquad S^2 = \frac{1}{T - 1} \sum_{t=1}^{T} \big( \hat{Y}(s_1^{(t)}) - \hat{Y}_{MC} \big)^2 .$$
Proof: Due to the IID construction of $s_1^{(1)}, \dots, s_1^{(T)}$, $\frac{1}{T} \sum_t v( \hat{Y}(s_1^{(t)}) ) - S^2$ is an unbiased estimator of the first term on the right-hand side of (10), since $E(S^2 \mid s) = V(\hat{Y} \mid s)$, while $S^2/T$ is an unbiased estimator of the second term. The result follows.
Finally, for the delete-one RB method, where an unbiased variance estimator is not available now that $|s_2| = 1$, a practical option is to first apply the jackknife variance estimator (7) to the $T$ subsamples, as if they yielded the exact RB estimator, and then add to it an extra term for the additional Monte Carlo error. This would allow one to use the Monte Carlo delete-one RB method in general.
3 Unbiased nonlinear learning
In this section we consider design-unbiased estimation in survey sampling, which builds on an arbitrary ML technique that can be nonlinear as well as nonparametric.
3.1 Designunbiased ML for survey sampling
Denote by $A$ the model or algorithm that aims to predict $y_i$ given $x_i$. Let $s_1$ be the training set, and $s_2$ the test set. Let $A(s_1)$ be the trained model based on $s_1$, yielding $\hat{y}_i$ as the corresponding predictor of $y_i$ given $x_i$. Applying the trained model to $s_2$ yields the prediction errors $y_i - \hat{y}_i$, for $i \in s_2$, conditional on $s_1$. In contrast, the same discrepancy is referred to as a residual when it is calculated for $i \in s_1$, including when the training set is equal to the whole sample $s$. In standard ML, the errors in the test set are used to select among different trained algorithms, or to assess how well a trained algorithm can be expected to perform when applied to units with unknown $y_i$'s.
From an inference point of view, a basic problem with the standard ML approach above arises because one needs to be able to ‘extrapolate’ the information in $s$ to the units outside $s$, in order for supervised learning to have any value at all. This is simply because $\{ y_i : i \in s \}$ are all observed, and prediction in any form is unnecessary for $i \in s$. No matter how the training-test split is carried out, one cannot ensure valid extrapolation beyond $s$, unless $s$ is selected from the entire reference set of units, i.e. the population $U$, in some noninformative (or representative) manner. This is the well-known problem of observational studies in statistical science, which is sometimes recast as the problem of concept drift in the ML literature (e.g. Tsymbal, 2004). A design-unbiased approach to M-assisted estimation of the population total can be achieved with respect to
(i) a probability sample $s$ from $U$, with probability $p(s)$, and
(ii) a probabilistic scheme for the training-test split of $s$ into $s_1$ and $s_2$, given $s$.
Explicitly, let $\hat{Y}$ be the estimator of $Y$ obtained from the realised sample $s$ and subsample $s_1$ given the model $A$. It is said to be design-unbiased for $Y$, provided $E(\hat{Y}) = Y$ with respect to the combined randomisation distribution of the sampling and the split.
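The defining property can be checked exactly by enumerating every sample and split of a toy population, with a deliberately nonlinear predictor (a 1-nearest-neighbour rule) standing in for an arbitrary trained algorithm. The toy data, names, and design (SRS at both stages) are our own illustration.

```python
from itertools import combinations
from statistics import mean

# Toy population: feature x, response y, deliberately nonlinear relation.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 3.9, 9.1, 15.8, 25.3, 35.7]
N, n, m = 6, 3, 2
pi_cond = (n - m) / (N - m)              # conditional inclusion prob. under SRS

def nn_predict(s1, i):
    """1-nearest-neighbour prediction of y[i], trained on s1 only."""
    j = min(s1, key=lambda k: (abs(x[k] - x[i]), k))  # deterministic tie-break
    return y[j]

estimates = []
for s in combinations(range(N), n):      # all SRS samples, p(s) uniform
    for s1 in combinations(s, m):        # all splits, q(s1|s) uniform
        s1set = set(s1)
        est = sum(y[i] for i in s1set)
        est += sum(nn_predict(s1set, i) for i in range(N) if i not in s1set)
        est += sum((y[i] - nn_predict(s1set, i)) / pi_cond
                   for i in s if i not in s1set)
        estimates.append(est)
```

Averaging over all equally likely sample-split pairs returns the population total exactly (up to floating point), even though the nearest-neighbour predictor itself is arbitrarily biased: the unbiasedness comes from the design, not the model.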