    # On doubly robust estimation for logistic partially linear models

Consider a logistic partially linear model, in which the logit of the mean of a binary response is related to a linear function of some covariates and a nonparametric function of other covariates. We derive simple, doubly robust estimators of the coefficients for the covariates in the linear component of the partially linear model. Such estimators remain consistent if either a nuisance model is correctly specified for the nonparametric component, or another nuisance model is correctly specified for the means of the covariates of interest given other covariates and the response at a fixed value. In previous works, conditional density models are needed for the latter purpose unless a scalar, binary covariate is handled. We also propose two specific doubly robust estimators: one is locally efficient within our class of doubly robust estimators, and the other is numerically and statistically simpler and can achieve reasonable efficiency, especially when the true coefficients are close to 0.


## 1 Introduction

Generalized partially linear models are a semiparametric extension of generalized linear models (McCullagh & Nelder 1989), such that the conditional mean of a response variable $Y$ is related to a linear function of some covariates $Z$ and a smooth function of other covariates $X$. Let $\{(Y_i, Z_i, X_i): i = 1, \ldots, n\}$ be independent and identically distributed observations from the joint distribution of $(Y, Z, X)$. Consider the following model

$$E(Y \mid Z, X) = \Psi\{\beta^T Z + g(X)\}, \qquad (1)$$

where $\Psi(\cdot)$ is an inverse link function, $\beta$ is a vector of unknown parameters, and $g(\cdot)$ is an unknown, smooth function. Estimation in such models has been studied in at least two approaches. In one approach, theory and methods have been developed in the case where $X$ is low-dimensional (for example, a scalar) and kernel or spline smoothing is used to estimate $g(\cdot)$ at suitable rates of convergence (e.g., Speckman 1988; Severini & Staniswalis 1994). In another approach, with $X$ relatively high-dimensional, doubly robust methods have been proposed to obtain estimators of $\beta$ which remain consistent and asymptotically normal at rate $n^{-1/2}$ if either a parametric model for $g(X)$ or another parametric model about, for example, $E(Z \mid X)$ is correctly specified (Robins & Rotnitzky 2001; Tchetgen Tchetgen et al. 2010).

In this note, we are concerned with model (1) with a binary response $Y$ (taking value 0 or 1) and a logistic link, hence a logistic partially linear model:

$$P(Y = 1 \mid Z, X) = \mathrm{expit}\{\beta^T Z + g(X)\}, \qquad (2)$$

where $\mathrm{expit}(c) = \{1 + e^{-c}\}^{-1}$. We provide a new class of doubly robust estimators of $\beta$ which remain consistent and asymptotically normal at rate $n^{-1/2}$ if either a parametric model for $g(X)$ or a parametric model for $E(Z \mid Y=0, X)$ is correctly specified, under mild regularity conditions but without additional parametric or smoothness restrictions.

Previously, doubly robust estimators of $\beta$ were derived in model (1) with respect to parametric models for $g(X)$ and $E(Z \mid X)$, in the case of an identity link, $\Psi(c) = c$, or a log link, $\Psi(c) = e^c$ (Robins & Rotnitzky 2001). For the logistic link, however, no doubly robust estimator of $\beta$ can be constructed in this manner with respect to parametric models about $g(X)$ and $E(Z \mid X)$ (Tchetgen Tchetgen et al. 2010). In fact, doubly robust estimators of $\beta$ in model (2) were obtained with respect to parametric models about $g(X)$ and $p(z \mid Y=0, X)$, the conditional density of $Z$ given $Y=0$ and $X$ (Chen 2007; Tchetgen Tchetgen et al. 2010). Therefore, our result in general allows doubly robust estimation for $\beta$ in model (2) with respect to a more flexible nuisance model about the conditional mean $E(Z \mid Y=0, X)$ than about the conditional density $p(z \mid Y=0, X)$. In the special case of binary $Z$, our class of doubly robust estimators of $\beta$ is equivalent to that in Tchetgen Tchetgen et al. (2010), but involves use of the parametric model for $E(Z \mid Y=0, X)$ in a more direct manner.

We also propose two specific doubly robust estimators of $\beta$ in model (2) based on efficiency considerations. The first estimator requires numerical evaluation of expectations under a model for the conditional density $p(z \mid Y=0, X)$ beyond the conditional mean $E(Z \mid Y=0, X)$ unless $Z$ is binary, but can be shown to achieve the minimum asymptotic variance among our class of doubly robust estimators when both nuisance models are correctly specified. Compared with the locally efficient, doubly robust estimators in Tchetgen Tchetgen et al. (2010), this estimator remains consistent if the model for $p(z \mid Y=0, X)$ is misspecified but the less restrictive model for $E(Z \mid Y=0, X)$ is correctly specified. Our second estimator is numerically and statistically simpler than our first one: it does not involve numerical integration or a parametric specification of the conditional density $p(z \mid Y=0, X)$, and can achieve a similar asymptotic variance as our first estimator, especially when the true value of $\beta$ is close to 0.

## 2 Doubly robust estimation

For a semiparametric model, doubly robust estimation can often be derived by studying the orthogonal complement of the nuisance tangent space (Robins & Rotnitzky 2001). Denote by $L_2$ the Hilbert space of functions $b(Y, Z, X)$ with $E(b^2) < \infty$, with the inner product defined as $E(b_1 b_2)$. Denote $\pi = P(Y=1 \mid Z, X)$ and $\varepsilon = Y - \pi$, and by $(\beta^*, g^*, \pi^*, \varepsilon^*)$ the truth of $(\beta, g, \pi, \varepsilon)$. For model (2), the orthogonal complement $\Lambda^{\perp}$ of the nuisance tangent space is known to be (Bickel et al. 1993; Robins & Rotnitzky 2001)

$$\Lambda^{\perp} = \left\{ \varepsilon^* \left( h - \frac{E[h \pi^* (1-\pi^*) \mid X]}{E[\pi^* (1-\pi^*) \mid X]} \right) : h \equiv h(Z, X) \text{ unrestricted} \right\} \cap L_2. \qquad (3)$$

Our first result is a reformulation of $\Lambda^{\perp}$ as follows. See the Appendix for all proofs.

###### Proposition 1.

Assume that $\pi^* < 1$ almost surely. The space $\Lambda^{\perp}$ can be equivalently expressed as

$$\Lambda^{\perp} = \left\{ \varepsilon^* \left( h - \frac{E[h \pi^* \mid Y=0, X]}{E[\pi^* \mid Y=0, X]} \right) : h \equiv h(Z, X) \text{ unrestricted} \right\} \cap L_2 \qquad (4)$$

$$= \left\{ \zeta^*_0 \left( u - E[u \mid Y=0, X] \right) : u \equiv u(Z, X) \text{ unrestricted} \right\} \cap L_2, \qquad (5)$$

where $u \equiv u(Z, X)$ is a function and

$$\zeta^*_0 = \frac{\varepsilon^*}{\pi^*} = Y \frac{1-\pi^*}{\pi^*} - (1-Y) = Y e^{-\beta^{*T} Z - g^*(X)} - (1-Y).$$

Our reformulation (5) suggests the following set of doubly robust estimating functions. Let $g(X; \alpha)$ be a parametric model for $g(X)$ and, independently, $f(X; \gamma)$ be a parametric model for $E(Z \mid Y=0, X)$. The two functions $g(X)$ and $E(Z \mid Y=0, X)$ are variation independent, because $g(X)$ and $p(z \mid Y=0, X)$ are variation independent (Chen 2007). For a function $\phi(X)$, define

$$r(Y, Z, X; \beta, \alpha, \gamma, \phi) = \{Y e^{-\beta^T Z - g(X; \alpha)} - (1-Y)\}\, \phi(X)\, \{Z - f(X; \gamma)\}, \qquad (6)$$

by letting $u(Z, X) = \phi(X) Z$ in (5). Then $r(Y, Z, X; \beta^*, \alpha, \gamma, \phi)$ is an unbiased estimating function for $\beta^*$ if either model $g(X; \alpha)$ or model $f(X; \gamma)$ is correctly specified.
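This unbiasedness can be checked by Monte Carlo in the direction of a correctly specified $g$. The sketch below uses made-up data-generating values (a scalar $Z$, a quadratic true $g$), which are our own illustrative choices rather than anything prescribed here: the sample mean of $r$ at the true $\beta$ stays near zero even under a grossly misspecified $f(X; \gamma)$.

```python
# Monte Carlo check of the double robustness of (6) in the g-correct direction.
# All data-generating values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta_true = 0.5

X = rng.uniform(-1.0, 1.0, n)
Z = rng.normal(0.3 * X, 1.0)                 # scalar covariate of interest
g_true = lambda x: 0.4 * x - 0.2 * x**2      # assumed true g(.)
p = 1.0 / (1.0 + np.exp(-(beta_true * Z + g_true(X))))
Y = rng.binomial(1, p)

def r_fun(beta, g_vals, f_vals, phi_vals=1.0):
    """Estimating function (6): {Y e^{-beta Z - g(X)} - (1-Y)} phi(X) {Z - f(X)}."""
    return (Y * np.exp(-beta * Z - g_vals) - (1 - Y)) * phi_vals * (Z - f_vals)

# With g correctly specified, the mean of r at beta_true is near zero
# even for a grossly misspecified f(X; gamma), here a constant 7.
bad_f = np.full(n, 7.0)
print(np.mean(r_fun(beta_true, g_true(X), bad_f)))   # close to 0
```

The other direction (correct $f$, wrong $g$) requires knowing $E(Z \mid Y=0, X)$ in closed form and is illustrated with a binary $Z$ later in this section.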

###### Proposition 2.

If either $g(X; \alpha) = g^*(X)$ for some $\alpha$ or $f(X; \gamma) = E(Z \mid Y=0, X)$ for some $\gamma$, then

$$E\{r(Y, Z, X; \beta^*, \alpha, \gamma, \phi)\} = 0,$$

provided that the above expectation exists.

Various doubly robust estimators can be constructed through (6). In general, let $\hat{\alpha}$ be an estimator of $\alpha$, for example, the maximum likelihood estimator, which satisfies $\hat{\alpha} = \bar{\alpha} + n^{-1} \sum_{i=1}^n s_1(Y_i, Z_i, X_i; \bar{\alpha}, \bar{\beta}) + o_p(n^{-1/2})$ for some constant $\bar{\alpha}$ and influence function $s_1$ such that $g(X; \bar{\alpha}) = g^*(X)$ if model $g(X; \alpha)$ is correctly specified. Let $\hat{\gamma}$ be an estimator of $\gamma$, for example, the least-squares or related estimator, which satisfies $\hat{\gamma} = \bar{\gamma} + n^{-1} \sum_{i=1}^n s_2(Y_i, Z_i, X_i; \bar{\gamma}) + o_p(n^{-1/2})$ for some constant $\bar{\gamma}$ and influence function $s_2$ such that $f(X; \bar{\gamma}) = E(Z \mid Y=0, X)$ if model $f(X; \gamma)$ is correctly specified. Define an estimator $\hat{\beta}(\phi)$ as a solution to

$$\frac{1}{n} \sum_{i=1}^n r(Y_i, Z_i, X_i; \beta, \hat{\alpha}, \hat{\gamma}, \phi) = 0.$$

Under suitable regularity conditions (e.g., Manski 1988), it can be shown that if either model $g(X; \alpha)$ or $f(X; \gamma)$ is correctly specified, then

$$\hat{\beta}(\phi) - \beta^* = \frac{H^{-1}}{n} \sum_{i=1}^n \left\{ r(Y_i, Z_i, X_i; \beta^*, \bar{\alpha}, \bar{\gamma}, \phi) - B_1 s_1(Y_i, Z_i, X_i; \bar{\alpha}, \bar{\beta}) - B_2 s_2(Y_i, Z_i, X_i; \bar{\gamma}) \right\} + o_p(n^{-1/2}), \qquad (7)$$

where $H = -E\{\partial r / \partial \beta^T\}$, $B_1 = -E\{\partial r / \partial \alpha^T\}$, and $B_2 = -E\{\partial r / \partial \gamma^T\}$, with the derivatives evaluated at $(\beta^*, \bar{\alpha}, \bar{\gamma})$. The asymptotic variance of $\hat{\beta}(\phi)$ can be estimated by using the sample variance of an estimated version of the influence function in (7).
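To make the construction concrete, the sketch below simulates data under assumed linear working models $g(X; \alpha) = \alpha_0 + \alpha_1 X$ and $f(X; \gamma) = \gamma_0 + \gamma_1 X$ (our illustrative choices, not prescribed above), fits $\hat{\alpha}$ by logistic maximum likelihood and $\hat{\gamma}$ by least squares on the $Y = 0$ subsample, and then solves the estimating equation in a scalar $\beta$ with $\phi \equiv 1$ by bisection.

```python
# A minimal sketch of beta_hat(phi): fit the two working models, then solve
# (1/n) sum_i r_i(beta) = 0 in beta. Working-model forms are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
beta_true = 0.5
X = rng.uniform(-1, 1, n)
Z = rng.normal(0.3 * X, 1.0)
p = 1 / (1 + np.exp(-(beta_true * Z + 0.4 * X)))   # true g(X) = 0.4 X, linear
Y = rng.binomial(1, p)

# Step 1: alpha_hat from joint logistic MLE of Y on (Z, 1, X) via Newton's method.
D = np.column_stack([Z, np.ones(n), X])
coef = np.zeros(3)
for _ in range(25):
    eta = D @ coef
    pi = 1 / (1 + np.exp(-eta))
    W = pi * (1 - pi)
    coef += np.linalg.solve(D.T @ (W[:, None] * D), D.T @ (Y - pi))
g_hat = D[:, 1:] @ coef[1:]                        # fitted g(X; alpha_hat)

# Step 2: gamma_hat by least squares of Z on (1, X) within the Y = 0 subsample.
B0 = np.column_stack([np.ones(n), X])[Y == 0]
gamma = np.linalg.lstsq(B0, Z[Y == 0], rcond=None)[0]
f_hat = np.column_stack([np.ones(n), X]) @ gamma

# Step 3: solve the estimating equation in beta (phi = 1) by bisection.
def ee(beta):
    return np.mean((Y * np.exp(-beta * Z - g_hat) - (1 - Y)) * (Z - f_hat))

lo, hi = -3.0, 3.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if ee(mid) * ee(lo) > 0 else (lo, mid)
print(round(0.5 * (lo + hi), 2))   # should be near the true beta of 0.5
```

Bisection suffices here because $\beta$ is scalar and the sample estimating function crosses zero on the bracketing interval; for vector $\beta$, a general root-finder would be used instead.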

We now provide several remarks. First, estimating function (6) can be expressed as

$$r(Y, Z, X; \beta, \alpha, \gamma, \phi) = \left\{ \frac{Y}{\pi(Z, X; \beta, \alpha)} - 1 \right\} \phi(X)\, \{Z - f(X; \gamma)\}, \qquad (8)$$

where $\pi(Z, X; \beta, \alpha) = \mathrm{expit}\{\beta^T Z + g(X; \alpha)\}$, representing the conditional probability $P(Y=1 \mid Z, X)$ under the conjunction of model (2) and model $g(X; \alpha)$. Therefore, our doubly robust estimating function involves the product of two “residuals”, $Y/\pi(Z, X; \beta, \alpha) - 1$ and $Z - f(X; \gamma)$. Similar products can also be found in previous doubly robust estimating functions for $\beta$ in model (1) with the identity or log link (Robins & Rotnitzky 2001). However, a notable feature in (8) is that the residual used from the model $\pi(Z, X; \beta, \alpha)$ is $Y/\pi(Z, X; \beta, \alpha) - 1$, associated with the estimating equation for calibrated estimation (Tan 2017), which in the case $g(X; \alpha) = \alpha^T X$ gives

$$\frac{1}{n} \sum_{i=1}^n \left\{ \frac{Y_i}{\pi(Z_i, X_i; \beta, \alpha)} - 1 \right\} (Z_i^T, X_i^T)^T = 0.$$

The standard residual from logistic regression is $Y - \pi(Z, X; \beta, \alpha)$, associated with the score equation for maximum likelihood estimation, which in the case $g(X; \alpha) = \alpha^T X$ gives

$$\frac{1}{n} \sum_{i=1}^n \{Y_i - \pi(Z_i, X_i; \beta, \alpha)\}\, (Z_i^T, X_i^T)^T = 0.$$

In general, the estimating function $\{Y - \pi(Z, X; \beta, \alpha)\}\, \phi(X)\, \{Z - f(X; \gamma)\}$ is not unbiased for $\beta^*$ if model $f(X; \gamma)$ is correctly specified but model $g(X; \alpha)$ is misspecified.
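This contrast between the two residuals can be checked numerically. The sketch below uses a binary $Z$ so that $E(Z \mid Y=0, X) = P(Z=1 \mid Y=0, X)$ is available in closed form via Bayes' rule; all data-generating values are our own illustrative assumptions. With a correct $f$ but a misspecified working $g$, the mean of the calibrated-residual function stays near zero at $\beta^*$, while the standard-residual analogue does not.

```python
# Calibrated residual Y/pi - 1 vs standard residual Y - pi under a
# misspecified g but correctly specified f (illustrative simulation).
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
beta_true = 0.8
expit = lambda t: 1.0 / (1.0 + np.exp(-t))

X = rng.uniform(-1, 1, n)
q = expit(0.5 * X)                       # P(Z = 1 | X)
Z = rng.binomial(1, q)
g_star = 0.6 * X**2                      # true g, nonlinear
Y = rng.binomial(1, expit(beta_true * Z + g_star))

# Correct f(X) = P(Z=1 | Y=0, X) by Bayes' rule; misspecified working g(X) = -1.
p0, p1 = expit(g_star), expit(beta_true + g_star)
f = (1 - p1) * q / ((1 - p1) * q + (1 - p0) * (1 - q))
pi_wrong = expit(beta_true * Z - 1.0)    # pi(Z, X; beta*, alpha) with g(X; alpha) = -1

calibrated = (Y / pi_wrong - 1) * (Z - f)
standard = (Y - pi_wrong) * (Z - f)
print(round(np.mean(calibrated), 3), round(np.mean(standard), 3))
# the calibrated mean is near 0; the standard mean is visibly nonzero
```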

Second, our results can also be used to shed light on the class of doubly robust estimators in Tchetgen Tchetgen et al. (2010), which are briefly reviewed as follows. For model (2), the conditional distribution of $(Y, Z)$ jointly given $X$ can be determined as (Chen 2007)

$$p(y, z \mid X) = c^{-1}(X)\, e^{\beta^T (z - z_0) y}\, p(z \mid Y=0, X)\, p(y \mid Z=z_0, X), \qquad (9)$$

where $z_0$ is some fixed value (assumed to be 0 hereafter), $c(X)$ is a normalizing constant, and the conditional densities $p(z \mid Y=0, X)$ and $p(y \mid Z=z_0, X)$ are variation-independent nuisance parameters. Let $p^{\dagger}(z \mid Y=0, X)$ and $p^{\dagger}(y \mid Z=z_0, X)$ be some pre-specified conditional densities, with $p^{\dagger}(Y, Z \mid X)$ the corresponding law (9). By using (9), the ortho-complement of the nuisance tangent space in model (2) can be characterized as (Tchetgen Tchetgen et al. 2010)

$$\Lambda^{\perp} = \left\{ [d(Y, Z, X) - d^{\dagger}(Y, Z, X)] \frac{p^{\dagger}(Y, Z \mid X)}{p(Y, Z \mid X)} : d(Y, Z, X) \text{ unrestricted} \right\} \cap L_2, \qquad (10)$$

where $d^{\dagger}(Y, Z, X)$ is obtained from $d(Y, Z, X)$ by a projection under the law $p^{\dagger}$, and $E^{\dagger}$ denotes the expectation under $p^{\dagger}$. It can be verified by direct calculation that the two sets on the right-hand sides of (3) and (10) are equivalent to each other: each element in the right-hand side of (10) can be expressed in the form of elements in the right-hand side of (3), and vice versa. Let $p(z \mid Y=0, X; \theta)$ be a parametric model for the conditional density $p(z \mid Y=0, X)$, and let $g(X; \alpha)$ be a parametric model for $g(X)$. For a function $h(Z, X)$, the estimating function based on (10) in Tchetgen Tchetgen et al. (2010) can be equivalently defined, based on (3), as

$$\tau(Y, Z, X; \beta, \alpha, \theta, h) = \{Y - \pi(Z, X; \beta, \alpha)\} \left\{ h(Z, X) - \frac{E[h \pi (1-\pi) \mid X; \beta, \alpha, \theta]}{E[\pi (1-\pi) \mid X; \beta, \alpha, \theta]} \right\}, \qquad (11)$$

where $\pi = \pi(Z, X; \beta, \alpha)$ and $E[\cdot \mid X; \beta, \alpha, \theta]$ denotes the expectation under the law defined as (9), but evaluated at $\pi(Z, X; \beta, \alpha)$ and $p(z \mid Y=0, X; \theta)$. The estimating function (11) is doubly robust, i.e. unbiased for $\beta^*$ if either model $g(X; \alpha)$ or model $p(z \mid Y=0, X; \theta)$ is correctly specified. Although (11) appears to be asymmetric in the two nuisance models, the double robustness of (11) follows from that of its equivalent version based on (10), as shown by exploiting the symmetry in $p(z \mid Y=0, X)$ and $p(y \mid Z=z_0, X)$ in Tchetgen Tchetgen et al. (2010). See also Tchetgen Tchetgen and Rotnitzky (2011) for an explicit demonstration of symmetry of (11) in the two nuisance models in the case of a binary $Z$.

As an interesting implication of our reformulation (4) in Proposition 1, the estimating function (11) can be equivalently expressed as

$$\tau(Y, Z, X; \beta, \alpha, \theta, h) = \{Y - \pi(Z, X; \beta, \alpha)\} \left\{ h(Z, X) - \frac{E[h \pi \mid Y=0, X; \theta]}{E[\pi \mid Y=0, X; \theta]} \right\}, \qquad (12)$$

which involves the expectation under the conditional density $p(z \mid Y=0, X; \theta)$ only, instead of under the law (9) evaluated at $\pi(Z, X; \beta, \alpha)$ and $p(z \mid Y=0, X; \theta)$. Therefore, (12) is computationally much simpler than (11) and its equivalent version based on (10). Moreover, the double robustness of (12) with respect to $g(X; \alpha)$ and $p(z \mid Y=0, X; \theta)$ can be directly shown as in the Appendix, without invoking its equivalent version based on (10).

Third, we compare our doubly robust estimating functions with those in Tchetgen Tchetgen et al. (2010). For a function $u(Z, X)$, consider the estimating function

$$\tau'(Y, Z, X; \beta, \alpha, \theta, u) = \left\{ \frac{Y}{\pi(Z, X; \beta, \alpha)} - 1 \right\} \{u(Z, X) - E[u \mid Y=0, X; \theta]\}. \qquad (13)$$

By our reformulation (5), the class of estimating functions $\tau'$ over all possible choices of $u$ is equivalent to that of $\tau$ over all possible choices of $h$ as used in Tchetgen Tchetgen et al. (2010). A subtle point is that the mapping between $u$ and $h$ depends on $(\beta, \alpha, \theta)$, but this does not affect our subsequent discussion. Similarly as for (12), the estimating function (13) can be shown to be doubly robust for $\beta^*$ with respect to models $g(X; \alpha)$ and $p(z \mid Y=0, X; \theta)$.

By comparing (6) and (13), we see that our estimating function (6) corresponds to a particular choice of estimating function (13) with $u(Z, X) = \phi(X) Z$, such that (6) depends only on a parametric model for the conditional expectation $E(Z \mid Y=0, X)$, but not the conditional density $p(z \mid Y=0, X)$. Therefore, our class of (6) is in general a strict subset of the class of (13), designed to achieve double robustness with respect to conditional mean models for $E(Z \mid Y=0, X)$, except when $Z$ is binary, in which case the classes of (6) and (13) are equivalent.

Fourth, there is a similar characterization of $\Lambda^{\perp}$ as in Proposition 1, involving expectations conditional on $Y=1$ instead of $Y=0$. By symmetry, it can be shown that

$$\Lambda^{\perp} = \left\{ \varepsilon^* \left( h - \frac{E[h (1-\pi^*) \mid Y=1, X]}{E[1-\pi^* \mid Y=1, X]} \right) : h \equiv h(Z, X) \text{ unrestricted} \right\} \cap L_2$$

$$= \left\{ \zeta^*_1 \left( u - E[u \mid Y=1, X] \right) : u \equiv u(Z, X) \text{ unrestricted} \right\} \cap L_2,$$

where $u \equiv u(Z, X)$ is a function and $\zeta^*_1 = \varepsilon^*/(1-\pi^*) = Y - (1-Y)\, e^{\beta^{*T} Z + g^*(X)}$. Consequently, a similar estimating function as (6) can be derived such that it is doubly robust for $\beta^*$ with respect to parametric models for $g(X)$ and $E(Z \mid Y=1, X)$.

## 3 Efficiency considerations

For our class of doubly robust estimating functions (6), we study how to choose the function $\phi(X)$ based on efficiency considerations. First, the following result gives the optimal choice of $\phi$ with correctly specified models $g(X; \alpha)$ and $f(X; \gamma)$.

###### Proposition 3.

If both models $g(X; \alpha)$ and $f(X; \gamma)$ are correctly specified for $g(X)$ and $E(Z \mid Y=0, X)$ respectively, then the optimal choice of $\phi$ in minimizing the asymptotic variance of $\hat{\beta}(\phi)$ which admits asymptotic expansion (7) is

$$\phi_{\mathrm{opt}}(X) = E\left[ \{Z - E(Z \mid Y=0, X)\}^{\otimes 2} \mid Y=0, X \right] \times E^{-1}\left[ \pi^{*-1}(Z, X)\, \{Z - E(Z \mid Y=0, X)\}^{\otimes 2} \mid Y=0, X \right],$$

where $c^{\otimes 2} = c c^T$ for a column vector $c$.

From this result, it is straightforward to derive a locally-efficient-like, doubly robust estimator for $\beta^*$. Let $(\hat{\beta}, \hat{\alpha})$ be the maximum likelihood estimator in the model $\pi(Z, X; \beta, \alpha)$, and $\hat{\theta}$ be the maximum likelihood estimator in a conditional density model $p(z \mid Y=0, X; \theta)$ as in (11) but compatible with model $f(X; \gamma)$ for $E(Z \mid Y=0, X)$, where $\theta$ consists of $\gamma$ and a variance parameter. Consider the estimator $\hat{\beta}(\hat{\phi}_{\mathrm{opt}})$ with

$$\hat{\phi}_{\mathrm{opt}}(X) = E\left[ \{Z - f(X; \hat{\gamma})\}^{\otimes 2} \mid Y=0, X; \hat{\theta} \right] \times E^{-1}\left[ \pi^{-1}(Z, X; \hat{\beta}, \hat{\alpha})\, \{Z - f(X; \hat{\gamma})\}^{\otimes 2} \mid Y=0, X; \hat{\theta} \right].$$

Then it can be shown under suitable regularity conditions that $\hat{\beta}(\hat{\phi}_{\mathrm{opt}})$ is doubly robust, i.e. remains consistent for $\beta^*$ if either model $g(X; \alpha)$ or $f(X; \gamma)$ is correctly specified, and achieves the minimum asymptotic variance among all estimators $\hat{\beta}(\phi)$ when both models $g(X; \alpha)$ and $p(z \mid Y=0, X; \theta)$, including $f(X; \gamma)$, are correctly specified.

It is interesting to compare $\hat{\beta}(\hat{\phi}_{\mathrm{opt}})$ with the locally efficient, doubly robust estimator for $\beta^*$ in Tchetgen Tchetgen et al. (2010). For a function $h(Z, X)$, define an estimator $\hat{\beta}_{\tau}(h)$ as a solution to $n^{-1} \sum_{i=1}^n \tau(Y_i, Z_i, X_i; \beta, \hat{\alpha}, \hat{\theta}, h) = 0$, where $(\hat{\alpha}, \hat{\theta})$ are maximum likelihood estimators as above or, without affecting our discussion here, profile maximum likelihood estimators as in Tchetgen Tchetgen et al. (2010). Denote by $h_{\mathrm{opt}}$ the optimal choice of $h$ in minimizing the asymptotic variance of $\hat{\beta}_{\tau}(h)$. In fact, the estimator $\hat{\beta}_{\tau}(h_{\mathrm{opt}})$ is locally efficient, i.e. achieves the semiparametric variance bound in model (2) when both models $g(X; \alpha)$ and $p(z \mid Y=0, X; \theta)$ are correctly specified. Unless $Z$ is binary, this semiparametric variance bound is in general strictly smaller than the asymptotic variance achieved by $\hat{\beta}(\hat{\phi}_{\mathrm{opt}})$ when both models are correctly specified, because the class of estimating functions (6) is strictly a subset of the class (11), (12), or (13), as discussed in Section 2. In the case of a binary $Z$, the two estimators $\hat{\beta}(\hat{\phi}_{\mathrm{opt}})$ and $\hat{\beta}_{\tau}(h_{\mathrm{opt}})$ are equivalent. On the other hand, $\hat{\beta}_{\tau}(h_{\mathrm{opt}})$ is doubly robust only with respect to models $g(X; \alpha)$ and $p(z \mid Y=0, X; \theta)$, whereas $\hat{\beta}(\hat{\phi}_{\mathrm{opt}})$ is doubly robust with respect to $g(X; \alpha)$ and $f(X; \gamma)$, and hence remains consistent for $\beta^*$ if model $p(z \mid Y=0, X; \theta)$ is misspecified but the less restrictive model $f(X; \gamma)$ for $E(Z \mid Y=0, X)$ is correctly specified.

Evaluation of the function $\hat{\phi}_{\mathrm{opt}}$, and hence the estimator $\hat{\beta}(\hat{\phi}_{\mathrm{opt}})$, in general requires cumbersome numerical integration with respect to the density $p(z \mid Y=0, X; \hat{\theta})$. For computational simplicity, consider the estimator $\hat{\beta}(\phi_{\mathrm{simp}})$ with the scalar choice $\phi_{\mathrm{simp}}(X) = \mathrm{expit}\{g(X; \hat{\alpha})\}$. The corresponding estimating function can be shown to become

$$r(Y, Z, X; \beta, \hat{\alpha}, \hat{\gamma}, \phi_{\mathrm{simp}}) = \frac{Y e^{-\beta^T Z} - (1-Y)\, e^{g(X; \hat{\alpha})}}{1 + e^{g(X; \hat{\alpha})}}\, \{Z - f(X; \hat{\gamma})\}. \qquad (14)$$

The particular choice $\phi_{\mathrm{simp}}$ can be motivated by the fact that if the true $\beta^* = 0$ then $\phi_{\mathrm{opt}}(X) = \mathrm{expit}\{g^*(X)\}$. Then $\hat{\beta}(\phi_{\mathrm{simp}})$ is nearly as efficient as $\hat{\beta}(\hat{\phi}_{\mathrm{opt}})$ and, by similar reasoning, also $\hat{\beta}_{\tau}(h_{\mathrm{opt}})$ whenever $\beta^*$ is close to 0. This is analogous to how the easy-to-compute estimator is related to the locally efficient estimator in Tchetgen Tchetgen et al. (2010, Section 4). Moreover, the estimating function (14) can be equivalently expressed as

$$r(Y, Z, X; \beta, \hat{\alpha}, \hat{\gamma}, \phi_{\mathrm{simp}}) = e^{-\beta^T Z Y} \left[ Y - \mathrm{expit}\{g(X; \hat{\alpha})\} \right] \{Z - f(X; \hat{\gamma})\},

$$which, in the case of a binary $Z$, coincides with the estimating function underlying the closed-form estimator for $\beta^*$ in Tchetgen Tchetgen (2013).
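For a binary scalar $Z$, the last display solves in closed form, since $e^{-\beta^T Z Y}$ equals $e^{-\beta}$ exactly when $Z = Y = 1$ and equals 1 otherwise, so the estimating equation is linear in $e^{-\beta}$. The sketch below implements this closed-form solution on simulated data; the oracle $g^*$ and the crude constant $f$ are our own illustrative choices, meant only to show consistency when the $g$ model is correct even though $f$ is misspecified.

```python
# Closed-form version of the simple doubly robust estimator for binary Z
# (cf. Tchetgen Tchetgen 2013): solve a * e^{-beta} + b = 0 for beta, where
# a, b split the terms of sum_i e^{-beta Z_i Y_i}[Y_i - m_i](Z_i - f_i) = 0.
import numpy as np

def simple_dr_beta(Y, Z, m, f):
    """Closed-form beta for binary Z: m = expit{g(X; alpha_hat)}, f = f(X; gamma_hat)."""
    w = (Y - m) * (Z - f)
    a = w[(Z == 1) & (Y == 1)].sum()     # coefficient of e^{-beta}
    b = w[~((Z == 1) & (Y == 1))].sum()  # constant term
    return -np.log(-b / a)               # solve a * e^{-beta} + b = 0

rng = np.random.default_rng(2)
n = 100_000
X = rng.uniform(-1, 1, n)
Z = rng.binomial(1, 1 / (1 + np.exp(-X)))
beta_true = 0.7
Y = rng.binomial(1, 1 / (1 + np.exp(-(beta_true * Z + 0.5 * X))))

# Oracle g* for illustration; crude (misspecified) constant model for f.
m = 1 / (1 + np.exp(-0.5 * X))                 # expit{g*(X)}
f = np.mean(Z[Y == 0]) * np.ones(n)            # constant f(X; gamma)
print(round(simple_dr_beta(Y, Z, m, f), 2))    # near beta_true = 0.7
```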

## 4 Conclusion

We derive simple, doubly robust estimators of the coefficients for the covariates in the linear component of a logistic partially linear model. Such estimators remain consistent if either a nuisance model is correctly specified for the nonparametric component of the partially linear model, or a conditional mean model is correctly specified for the covariates of interest given other covariates and the response at a fixed value. These estimators can be useful in conventional settings with a limited number of covariates. Moreover, there have been various works exploiting doubly robust estimating functions to obtain valid inferences in high-dimensional problems (e.g., Farrell 2015; Chernozhukov et al. 2018; Tan 2018). Our estimating functions can potentially be employed to achieve similar properties in high-dimensional settings.

## 5 Appendix

Proof of Proposition 1. First, we show that for any $h \equiv h(Z, X)$,

$$E[h \pi^* (1-\pi^*) \mid X] = P(Y=0 \mid X)\, E[h \pi^* \mid Y=0, X].$$

This follows because $E[h \pi^* (1-\pi^*) \mid X] = E[h \pi^* (1-Y) \mid X] = P(Y=0 \mid X)\, E[h \pi^* \mid Y=0, X]$ by the law of iterated expectations and then the law of total probability. Taking $h \equiv 1$ gives $E[\pi^* (1-\pi^*) \mid X] = P(Y=0 \mid X)\, E[\pi^* \mid Y=0, X]$, and hence the set (3) is equivalent to (4). Next, the set (4) is equivalent to $\{\zeta^*_0 v : v \equiv v(Z, X) \text{ with } E[v \mid Y=0, X] = 0\} \cap L_2$, by letting $v = \pi^* (h - E[h \pi^* \mid Y=0, X] / E[\pi^* \mid Y=0, X])$ or $h = v/\pi^*$. The set (5) is equivalent to the same set, by letting $v = u - E[u \mid Y=0, X]$.

Proof of Proposition 2. By the law of iterated expectations, we have

$$E\{r(Y, Z, X; \beta^*, \alpha, \gamma, \phi)\} = E\left[ (1-Y)\, \{e^{g^*(X) - g(X; \alpha)} - 1\}\, \phi(X)\, \{Z - f(X; \gamma)\} \right].$$

This immediately shows that if either $g(X; \alpha) = g^*(X)$ or $f(X; \gamma) = E(Z \mid Y=0, X)$, then $E\{r(Y, Z, X; \beta^*, \alpha, \gamma, \phi)\} = 0$.

Proof of double robustness of (12). By the law of iterated expectations, we have

$$E\{\tau(Y, Z, X; \beta^*, \alpha, \theta, h)\} = E\left[ (1-Y)\, \{e^{g^*(X) - g(X; \alpha)} - 1\}\, \pi(Z, X; \beta^*, \alpha) \left\{ h - \frac{E[h \pi \mid Y=0, X; \theta]}{E[\pi \mid Y=0, X; \theta]} \right\} \right].$$

This immediately shows that if either $g(X; \alpha) = g^*(X)$ or $p(z \mid Y=0, X; \theta) = p(z \mid Y=0, X)$, then $E\{\tau(Y, Z, X; \beta^*, \alpha, \theta, h)\} = 0$.

Proof of Proposition 3. Suppose that both models $g(X; \alpha)$ and $f(X; \gamma)$ are correctly specified, such that $g(X; \bar{\alpha}) = g^*(X)$ and $f(X; \bar{\gamma}) = E(Z \mid Y=0, X)$. Then by direct calculation, $B_1 = B_2 = 0$ and hence (7) reduces to

$$\hat{\beta}(\phi) - \beta^* = \frac{H^{-1}}{n} \sum_{i=1}^n r(Y_i, Z_i, X_i; \beta^*, \bar{\alpha}, \bar{\gamma}, \phi) + o_p(n^{-1/2}).$$

By the proof of Proposition 2, we actually have $E\{\varrho(Y, Z, X; \beta^*) \mid X\} = 0$, where

$$\varrho(Y, Z, X; \beta) = \{Y e^{-\beta^T Z - g(X; \bar{\alpha})} - (1-Y)\}\, \{Z - f(X; \bar{\gamma})\}.$$

Therefore, $\hat{\beta}(\phi)$ is asymptotically equivalent to a solution to $n^{-1} \sum_{i=1}^n \phi(X_i)\, \varrho(Y_i, Z_i, X_i; \beta) = 0$, which can be seen as an estimator for $\beta^*$ under the conditional moment condition $E\{\varrho(Y, Z, X; \beta^*) \mid X\} = 0$. By Chamberlain (1987), the optimal choice of $\phi(X)$ in minimizing the asymptotic variance of such an estimator is $E^T\{\partial \varrho(\beta^*)/\partial \beta \mid X\}\, E^{-1}\{\varrho(\beta^*)\, \varrho^T(\beta^*) \mid X\}$, which can be simplified, up to a sign that does not affect the estimator, as $\phi_{\mathrm{opt}}(X)$ by direct calculation.

## References

Bickel, P.J., Klaassen, C.A.J., Ritov, Y., and Wellner, J.A. (1993) Efficient and Adaptive Estimation for Semiparametric Models, The Johns Hopkins University Press, Baltimore.

Chamberlain, G. (1987) “Asymptotic efficiency in estimation with conditional moment restrictions,” Journal of Econometrics, 34, 305-334.

Chen, H.Y. (2007) “A semiparametric odds ratio model for measuring association,” Biometrics, 63, 413-421.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W.K., and Robins, J.M. (2018) “Double/debiased machine learning for treatment and structural parameters,” Econometrics Journal, 21, C1-C68.

Farrell, M.H. (2015) “Robust inference on average treatment effects with possibly more covariates than observations.” Journal of Econometrics, 189, 1–23.

Manski, C.F. (1988) Analog Estimation Methods in Econometrics, Chapman & Hall, New York.

McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models (2nd edition), Chapman & Hall, London.

Robins, J.M., and Rotnitzky, A. (2001) Comment on the Bickel and Kwon Article, “Inference for semiparametric models: Some questions and an answer,” Statistica Sinica, 11, 920-936.

Severini, T.A. and Staniswalis, J.G. (1994) “Quasi-likelihood estimation in semiparametric models,” Journal of the American Statistical Association, 89, 501-511.

Speckman, P. (1988) “Kernel smoothing in partial linear models,” Journal of the Royal Statistical Society, Ser. B, 50, 413-436.

Tan, Z. (2017) “Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data,” arXiv:1710.08074.

Tan, Z. (2018) “Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data,” arXiv:1801.09817.

Tchetgen Tchetgen, E.J. (2013) “On a closed-form doubly robust estimator of the adjusted odds ratio for a binary exposure,” American Journal of Epidemiology, 177, 1314-1316.

Tchetgen Tchetgen, E.J. and Rotnitzky, A. (2011) “Double-robust estimation of an exposure-outcome odds ratio adjusting for confounding in cohort and case-control studies,” Statistics in Medicine, 30, 335-347.

Tchetgen Tchetgen, E.J., Robins, J.M., and Rotnitzky, A. (2010) “On doubly robust estimation in a semiparametric odds ratio model,” Biometrika, 97, 171-180.