Collaborative targeted minimum loss inference from continuously indexed nuisance parameter estimators
Suppose that we wish to infer the value of a statistical parameter at a law from which we sample independent observations. Suppose that this parameter is smooth and that we can define two variation-independent, infinite-dimensional features of the law, its so called Q- and G-components (comp.), such that if we estimate them consistently at a fast enough product of rates, then we can build a confidence interval (CI) with a given asymptotic level based on a plain targeted minimum loss estimator (TMLE). The estimators of the Q- and G-comp. would typically be by products of machine learning algorithms. We focus on the case that the machine learning algorithm for the G-comp. is fine-tuned by a real-valued parameter h. Then, a plain TMLE with an h chosen by cross-validation would typically not lend itself to the construction of a CI, because the selection of h would trade-off its empirical bias with something akin to the empirical variance of the estimator of the G-comp. as opposed to that of the TMLE. A collaborative TMLE (C-TMLE) might, however, succeed in achieving the relevant trade-off. We construct a C-TMLE and show that, under high-level empirical processes conditions, and if there exists an oracle h that makes a bulky remainder term asymptotically Gaussian, then the C-TMLE is asymptotically Gaussian hence amenable to building a CI provided that its asymptotic variance can be estimated too. We illustrate the construction and main result with the inference of the average treatment effect, where the Q-comp. consists in a marginal law and a conditional expectation, and the G-comp. is a propensity score (a conditional probability). We also conduct a multi-faceted simulation study to investigate the empirical properties of the collaborative TMLE when the G-comp. is estimated by the LASSO. Here, h is the bound on the l1-norm of the candidate coefficients.
READ FULL TEXT