# Semiparametric response model with nonignorable nonresponse

Handling nonignorable nonresponse is a challenging problem in statistical analysis with missing data. A parametric model is often assumed for the response mechanism, yet this assumption cannot be validated from the incomplete data. We consider a semiparametric response model that relaxes the parametric assumption on the response mechanism. Two types of efficient estimators, a profile maximum likelihood estimator and a profile calibration estimator, are proposed and their asymptotic properties are investigated. Two extensive simulation studies compare the proposed estimators with some existing methods. We present an application of our method using Korean Labor and Income Panel Survey data.


## 1 Introduction

Statistical analysis with missing data has been an area of extensive research in recent years. To analyze partially missing data, the first step is to understand the response mechanism that causes missingness in the data. If the missingness for the study variable is conditionally independent of that variable, given the other auxiliary variables, the response mechanism is called missing at random or ignorable, in the sense of Rubin (1976). Otherwise, the response mechanism is called missing not at random or nonignorable. Nonignorable nonresponse is more challenging to handle because the assumed response model cannot be verified from the observed study variables alone and the response model may not be identifiable. Identification assumptions are therefore often imposed on the response model in order to make valid inference from the incomplete data. Furthermore, the parametric model approach is known to be sensitive to the failure of the assumed parametric model (Kenward, 1998). To obtain a robust result, it is desirable to make the weakest possible model assumptions on the response mechanism. Kim and Shao (2013) contain a comprehensive review of the methods for parameter estimation under nonignorable nonresponse.

Instead of making parametric model assumptions for the response mechanism, we consider a semiparametric response model, which allows more flexibility for the response mechanism. The semiparametric response model was first considered in Kim and Yu (2011), but their method requires a validation sample for estimating the model parameters. Shao and Wang (2016) considered the same semiparametric model and proposed a parameter estimation method based on a calibration approach (Chang and Kott, 2008). The method of Shao and Wang (2016) is not necessarily efficient. We consider more efficient parameter estimation methods under the same semiparametric response model.

This paper has the following goals. First, by taking the profile maximum likelihood and profile calibration approaches, we propose estimators under the semiparametric response model that are more efficient than the estimator previously proposed in Shao and Wang (2016). Next, we propose an estimator for the mean functional that is more efficient than the inverse probability weighted estimator used in Shao and Wang (2016). However, the proposed method can be computationally heavy because it involves another nonparametric estimator for the outcome model in addition to the nonparametric part of the response model. To solve this problem, we propose an approach that uses a parametric working model for the outcome model, in the same spirit as the generalized estimating equation (Liang and Zeger, 1986). Under this approach, when the working model is well-specified, the form of the asymptotic variance is unchanged from that of the estimator based on a nonparametric outcome model. Additionally, even when the working model is mis-specified, the estimators for the response probability and the mean functional remain consistent.

The paper is organized as follows. In Section 2, the basic setup and the model are introduced. In Section 3, the proposed method for estimating the parameters in the semiparametric response model is presented. In Section 4, estimation of mean functional using the proposed method is discussed. In Section 5, results from two extensive simulation studies are presented. In Section 6, we present an application of the proposed method using Korean Labor and Income Panel Survey data. Some concluding remarks are made in Section 7.

## 2 Basic setup

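Let $(x, y)$ be a vector of random variables from a distribution that is left completely unspecified, where $y$ is subject to missingness. We are interested in estimating a parameter of this distribution, such as the mean of $y$, from $n$ independent observations indexed by $i = 1, \ldots, n$, where $\delta_i$ is the response indicator function of $y_i$, that is, $\delta_i = 1$ if $y_i$ is observed and $\delta_i = 0$ otherwise. If the response mechanism is ignorable in the sense that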

$$
P(\delta = 1 \mid x, y) = P(\delta = 1 \mid x), \tag{1}
$$

then parameter estimation can be carried out without an explicit model assumption for the response mechanism. If assumption (1) is believed to be unrealistic, then we often make a strong model assumption on the response mechanism, such as

$$
P(\delta = 1 \mid x, y) = \pi(x, y; \phi) \tag{2}
$$

for some known function $\pi(\cdot)$ with unknown parameter $\phi$. For example, the logistic regression model

$$
\pi(x, y; \phi) = \frac{\exp(\phi_0 + \phi_1 x + \phi_2 y)}{1 + \exp(\phi_0 + \phi_1 x + \phi_2 y)} \tag{3}
$$

can be used in the response model (2). Estimating the model parameter in (2) is challenging because the model can be non-identifiable. A sufficient condition for model identification is to assume that $x = (x_1, x_2)$ and that $x_2$ is conditionally independent of $\delta$ given $(x_1, y)$. The variable $x_2$ is called a nonresponse instrumental variable (Wang et al., 2014).

Under the correct specification of the parametric response model and the instrumental variable assumption, several methods are available for estimating the parameter $\phi$. There are four main approaches: (1) the maximum likelihood estimation approach (Riddles et al., 2016; Morikawa et al., 2017), (2) the calibration approach based on the Generalized Method of Moments (Chang and Kott, 2008; Shao and Wang, 2016; Morikawa and Kim, 2017), (3) the empirical likelihood approach (Qin et al., 2002; Tang et al., 2014), and (4) the pseudo likelihood approach (Tang et al., 2003; Zhao and Shao, 2015). Note that (2) and (3) are essentially the same (Morikawa and Kim, 2018; Qin, 2017). Elaborating on the first and second approaches, we propose estimation methods under the following semiparametric response model, which is a more general class than the parametric response model. The semiparametric response model is represented as

$$
\pi(x, y) = \frac{\exp\{g(x) - \gamma y\}}{1 + \exp\{g(x) - \gamma y\}}, \tag{4}
$$

where $g(\cdot)$ is completely unspecified. This model was first introduced in Kim and Yu (2011).

There are two things to note about this model. First, it is more flexible than commonly used parametric logistic models such as (3). Second, it can be viewed as a natural extension of the nonparametric response model in the ignorable case (Hirano et al., 2003), which is obtained by fixing $\gamma = 0$. Although the semiparametric response model has these good properties, Kim and Yu (2011) required that $\gamma$ be either known or estimable from a validation sample. Shao and Wang (2016) introduced an estimation method that does not use a validation sample by assuming the existence of $x_2$ in $x = (x_1, x_2)$, which is the nonresponse instrumental variable. Under this assumption, $g$ becomes a function of $x_1$. We also adopt this assumption to handle the identification problem.
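As a small illustration of model (4), the following Python sketch evaluates the response probability for a supplied value of $g(x)$ and checks that the parametric logistic model (3) is the special case $g(x) = \phi_0 + \phi_1 x$ with $\gamma = -\phi_2$. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def semiparametric_response_prob(g_x, y, gamma):
    """Model (4): exp{g(x) - gamma*y} / [1 + exp{g(x) - gamma*y}]."""
    t = np.asarray(g_x, dtype=float) - gamma * np.asarray(y, dtype=float)
    return 1.0 / (1.0 + np.exp(-t))

def logistic_response_prob(x, y, phi0, phi1, phi2):
    """Parametric logistic model (3)."""
    t = phi0 + phi1 * np.asarray(x, dtype=float) + phi2 * np.asarray(y, dtype=float)
    return np.exp(t) / (1.0 + np.exp(t))

# Illustrative values only (not taken from the paper).
x = np.array([-1.0, 0.0, 2.0])
y = np.array([0.5, 1.0, -0.3])
phi0, phi1, phi2 = 0.2, -0.4, 0.7

p_param = logistic_response_prob(x, y, phi0, phi1, phi2)
# Same probabilities from model (4) with a linear g and gamma = -phi2.
p_semi = semiparametric_response_prob(phi0 + phi1 * x, y, gamma=-phi2)
```

Setting `gamma=0` removes the dependence on `y`, which corresponds to the ignorable case.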

## 3 Estimation under semiparametric response model

Under the semiparametric response model (4) and the nonresponse instrumental variable assumption, we propose new methods for estimating $g$ and $\gamma$. For clarity, we denote the true $g$ by $g^*$ and the true $\gamma$ by $\gamma^*$. We also write a function of $g$ and $\gamma$ evaluated at $g^*$ and $\gamma^*$ with the subscripts $|_{g^*}$ and $|_{\gamma^*}$.

Under the semiparametric response model in (4), we can obtain

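$$
E\left\{ \frac{\delta}{\pi(X_1, Y)} - 1 \,\Big|\, X_1 = x_1 \right\}\Bigg|_{g^*, \gamma^*} = 0,
$$

$$
\exp\{g_\gamma(x_1)\} = \frac{E\{\delta \exp(\gamma Y) \mid x_1\}}{E\{1 - \delta \mid x_1\}}, \qquad \exp\{g^*(x_1)\} = \exp\{g_\gamma(x_1)\}\big|_{\gamma^*}.
$$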
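The closed form for $\exp\{g_\gamma(x_1)\}$ follows from the moment condition in a few lines. The steps below are a sketch that uses only model (4) with $g$ depending on $x_1$:

$$
\begin{aligned}
\frac{1}{\pi(x_1, y)} &= 1 + \exp\{-g(x_1) + \gamma y\}, \\
0 = E\left\{ \frac{\delta}{\pi(X_1, Y)} - 1 \,\Big|\, x_1 \right\} &= E(\delta \mid x_1) + \exp\{-g(x_1)\}\, E\{\delta \exp(\gamma Y) \mid x_1\} - 1, \\
\exp\{g(x_1)\} &= \frac{E\{\delta \exp(\gamma Y) \mid x_1\}}{E(1 - \delta \mid x_1)}.
\end{aligned}
$$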

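For fixed $\gamma$, we can use nonparametric methods to estimate $g_\gamma(x_1)$. More specifically, when the sample space of $x_1$ is continuous, we can use a kernel regression estimator

$$
\exp\{\hat{g}_\gamma(x_1)\} = \frac{\sum_{i=1}^n \delta_i \exp(\gamma y_i) K_h(x_1 - x_{1i})}{\sum_{i=1}^n (1 - \delta_i) K_h(x_1 - x_{1i})},
$$

where $K_h$ is a kernel with bandwidth $h$.

When the sample space is discrete, we can use

$$
\exp\{\hat{g}_\gamma(x_1)\} = \frac{\sum_{i=1}^n \delta_i \exp(\gamma y_i) I(x_1 = x_{1i})}{\sum_{i=1}^n (1 - \delta_i) I(x_1 = x_{1i})},
$$

where $I(\cdot)$ denotes an indicator function. For notational convenience, whether the sample space is continuous or not, we write both estimators as

$$
\exp\{\hat{g}_\gamma(x_1)\} = \frac{\tilde{E}\{\delta \exp(\gamma Y) \mid x_1\}}{\tilde{E}\{1 - \delta \mid x_1\}}.
$$

Then, as discussed in Shao and Wang (2016), the profile response probability is obtained:

$$
\pi_p(x_{1i}, y_i; \gamma) = \frac{\exp\{\hat{g}_\gamma(x_{1i}) - \gamma y_i\}}{1 + \exp\{\hat{g}_\gamma(x_{1i}) - \gamma y_i\}}.
$$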
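The two estimators of $\exp\{g_\gamma(x_1)\}$ and the resulting profile response probability can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the Gaussian kernel, and the toy data are assumptions made here, and missing outcomes are coded as `NaN`.

```python
import numpy as np

def exp_g_discrete(x1, y, delta, gamma):
    """exp{g_hat_gamma(x1_i)} for each i when x1 is discrete (cell ratios)."""
    y0 = np.where(delta == 1, y, 0.0)          # y is unobserved when delta = 0
    w = delta * np.exp(gamma * y0)             # delta_i * exp(gamma * y_i)
    out = np.empty(len(x1), dtype=float)
    for c in np.unique(x1):
        idx = x1 == c
        out[idx] = w[idx].sum() / (1.0 - delta[idx]).sum()
    return out

def exp_g_kernel(x1, y, delta, gamma, h):
    """Kernel version, assuming a Gaussian kernel K_h(u) = K(u / h) / h."""
    y0 = np.where(delta == 1, y, 0.0)
    w = delta * np.exp(gamma * y0)
    u = (x1[:, None] - x1[None, :]) / h        # pairwise scaled differences
    k = np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))
    return (k @ w) / (k @ (1.0 - delta))

def profile_response_prob(exp_g, y, delta, gamma):
    """pi_p(x1_i, y_i; gamma); computable only for respondents, NaN otherwise."""
    y0 = np.where(delta == 1, y, 0.0)
    pi_p = 1.0 / (1.0 + np.exp(gamma * y0) / exp_g)
    return np.where(delta == 1, pi_p, np.nan)

# Toy data (hypothetical): two x1 cells, NaN marks a missing outcome.
x1 = np.array([0, 0, 0, 0, 1, 1, 1, 1])
delta = np.array([1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
y = np.array([0.2, 1.5, -0.7, np.nan, 0.9, -1.1, np.nan, np.nan])

pi_p = profile_response_prob(exp_g_discrete(x1, y, delta, 0.5), y, delta, 0.5)
```

In the discrete case the inverse profile probabilities of the respondents sum to the sample size within each $x_1$ cell for any $\gamma$, which is a convenient sanity check on an implementation.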

Using the above profile response probability, we take two approaches for the estimation of $\gamma$: (1) the maximum likelihood estimation approach and (2) the calibration approach. In both cases, the idea is to use $\pi_p$ in an objective function that would be suitable if $g$ were known, in the same spirit as the profile likelihood approach in Murphy and Van Der Vaart (2000). Note that once $\gamma$ is estimated by $\hat{\gamma}$, we can use $\pi_p(x_1, y; \hat{\gamma})$ as the estimated profile response probability. We first explain the maximum likelihood estimation approach and then the calibration approach.

### 3.1 Maximum likelihood estimation approach

To compute the maximum likelihood estimator in the missing data setting, we usually take one of two directions: (1) maximizing the observed likelihood or (2) solving the mean score equation. We start with the second direction. In the appendix, we show that the estimator derived from the second perspective can also be interpreted from the first perspective.

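When $g^*$ is known, the observed score equation is

$$
0 = \sum_{i=1}^n \left[ \delta_i \{1 - \pi(x_{1i}, y_i; g^*, \gamma)\} y_i - (1 - \delta_i) E_0\{\pi(X, Y; g^*, \gamma) Y \mid x_i\} \right],
$$

where $E_0$ denotes the expectation with respect to the distribution of $y$ given $x$ among nonrespondents ($\delta = 0$). By replacing $\pi$ with $\pi_p$ and $E_0$ with a nonparametric estimator, the estimator of $\gamma$ is defined as the solution to

$$
0 = \sum_{i=1}^n \left[ \delta_i \{1 - \pi_p(x_{1i}, y_i; \gamma)\} y_i - (1 - \delta_i) \frac{\tilde{E}\{\delta \exp(\gamma Y) \pi_p(X_1, Y; \gamma) Y \mid x_i\}}{\tilde{E}\{\delta \exp(\gamma Y) \mid x_i\}} \right]. \tag{5}
$$

This is based on the following relationship (Kim and Yu, 2011):

$$
E_0\{e(X, Y) \mid x\} = \frac{E_1\{\exp(\gamma Y) e(X, Y) \mid x\}}{E_1\{\exp(\gamma Y) \mid x\}},
$$

where $e(X, Y)$ is any function of $(X, Y)$ and $E_1$ denotes the corresponding expectation among respondents ($\delta = 1$).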
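For discrete covariates, the right-hand side of (5) can be evaluated directly from cell sums. The sketch below is an illustration under assumptions made here (integer cell labels, `NaN` for missing outcomes, hypothetical function names and toy data), not the authors' code.

```python
import numpy as np

def cell_ratio(cells, num, den):
    """For each i, sum(num) / sum(den) over observations sharing i's cell."""
    out = np.empty(len(cells), dtype=float)
    for c in np.unique(cells):
        idx = cells == c
        out[idx] = num[idx].sum() / den[idx].sum()
    return out

def profile_score(gamma, x1_cell, x_cell, y, delta):
    """Right-hand side of (5) for discrete covariates.

    x1_cell labels the cells of x1 (used for g); x_cell labels the cells of
    the full covariate x = (x1, x2) (used for the conditional expectation).
    """
    y0 = np.where(delta == 1, y, 0.0)            # y is unobserved when delta = 0
    w = delta * np.exp(gamma * y0)               # delta_i * exp(gamma * y_i)
    exp_g = cell_ratio(x1_cell, w, 1.0 - delta)  # discrete profile estimator of exp(g)
    pi_p = 1.0 / (1.0 + np.exp(gamma * y0) / exp_g)
    # Ratio of cell sums estimating E_0{pi_p * Y | x_i} via the tilting identity.
    cond = cell_ratio(x_cell, w * pi_p * y0, w)
    return np.sum(delta * (1.0 - pi_p) * y0 - (1.0 - delta) * cond)

# Toy data (hypothetical): x1 and the instrument x2 are both binary.
x1 = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
x2 = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1])
delta = np.array([1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0], dtype=float)
y = np.array([0.3, 1.2, np.nan, 2.0, 0.8, np.nan,
              -0.5, np.nan, 0.4, 1.7, 1.1, np.nan])

score_with_instrument = profile_score(0.5, x1, 2 * x1 + x2, y, delta)
score_without_instrument = profile_score(0.5, x1, x1, y, delta)
```

When the conditional expectation is taken over the same cells as $g$, that is, when no instrument is used, the score is zero for every value of `gamma`, which illustrates why the instrumental variable is needed for identification. With an instrument, the function could be passed to a one-dimensional root finder such as `scipy.optimize.brentq`, provided a bracket with a sign change is available.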

We now derive asymptotic results for the estimator obtained from (5). Under technical conditions, consistency can be ensured by following Theorem 5.11 in van der Vaart (2002). For details, see the appendix. We obtain the following lemma as a first step. This lemma is also important in the next subsection for developing a new estimator from the calibration approach.

###### Lemma 3.1

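We define the right-hand side of (5) as $t(w)$. Under certain conditions, we have

$$
\frac{1}{n} t(w) = \frac{1}{n} \sum_{i=1}^n \left\{ \frac{\delta_i}{\pi_p(x_{1i}, y_i; \gamma)} - 1 \right\} \frac{E\{\delta \exp(\gamma Y) \pi_p(X, Y; \gamma) \mid x\}}{E\{\delta \exp(\gamma Y) \mid x\}} + o_p(n^{-1/2}).
$$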

Based on this lemma, we obtain the following result.

###### Theorem 1

The asymptotic variance of is where

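$$
\begin{aligned}
A_1 &= E\left[ O(X, Y) \pi(X, Y) \{Y - E_0(Y \mid X_1)\} \{E_0(\pi Y \mid X) - E_0(\pi Y \mid X_1)\} \right]\big|_{g^*, \gamma^*}, \\
B_1 &= E\left[ O(X, Y) \{E_0(\pi Y \mid X) - E_0(\pi Y \mid X_1)\}^2 \right]\big|_{g^*, \gamma^*},
\end{aligned}
$$

and $O(X, Y) = \{1 - \pi(X, Y)\} / \pi(X, Y)$.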

In practice, the estimator is unstable because of the double use of kernel estimators. Instead of using a kernel estimator to calculate the conditional expectation, one can also use a parametric model for the density of $y$ given $x$ and $\delta = 1$ and apply fractional imputation (Kim, 2011). We can replace the conditional expectation term in (5) with

$$
\frac{\sum_{j=1}^s \exp(\gamma y_{ij}) \pi_p(x_{1i}, y_{ij}; \gamma) y_{ij}}{\sum_{j=1}^s \exp(\gamma y_{ij})},
$$

where $y_{i1}, \ldots, y_{is}$ is a sample of size $s$ obtained from the fitted conditional density, whose parameter is the maximum likelihood estimator based on the observed samples. Throughout our work, we call the conditional distribution of $y$ given $x$ and $\delta = 1$ the outcome distribution and the parametric model assumed for it the working parametric model. To avoid confusion, the estimator based on this working parametric model is distinguished in the notation.
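The fractional imputation term can be approximated by Monte Carlo once draws from the fitted outcome model are available. The sketch below assumes a normal linear working model purely for illustration. The parameter values stand in for maximum likelihood estimates and are hypothetical, as are the function and variable names.

```python
import numpy as np

def fractional_imputation_term(y_draws, exp_g_x1, gamma):
    """sum_j e^{gamma*y_ij} pi_p(x1i, y_ij; gamma) y_ij / sum_j e^{gamma*y_ij}."""
    y_draws = np.asarray(y_draws, dtype=float)
    w = np.exp(gamma * y_draws)                # tilting weights exp(gamma * y_ij)
    pi_p = 1.0 / (1.0 + w / exp_g_x1)          # profile response probability at each draw
    return np.sum(w * pi_p * y_draws) / np.sum(w)

# Hypothetical working model: y | x, delta = 1 ~ Normal(beta0 + beta1 * x, sigma^2),
# where (beta0, beta1, sigma) stand in for maximum likelihood estimates.
rng = np.random.default_rng(1)
beta0, beta1, sigma = 0.5, 1.0, 1.0
x_i, exp_g_i, gamma = 0.3, 2.0, 0.4            # illustrative values for one unit i
draws = rng.normal(beta0 + beta1 * x_i, sigma, size=500)
approx = fractional_imputation_term(draws, exp_g_i, gamma)
```

Because the same draws can be reused for every value of `gamma`, the term is a smooth function of `gamma` for fixed draws, which is convenient when solving the estimating equation numerically.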

### 3.2 Calibration approach

Here, we consider the calibration approach (Chang and Kott, 2008). Based on this approach, when the instrument $x_2$ is a discrete random variable taking finitely many values, Shao and Wang (2016) introduced an estimator from the following moment conditions

 (6)

where , using Generalized Method of Moments (Hansen, 1982) .

Although this estimator does not require a validation sample for estimating $\gamma$, its performance is unstable because of the poor choice of the control variable in the moment condition. We consider a broader class of estimators based on the following moment conditions:

 (7)

where $m(x; \gamma)$ is a vector of functions of $x$ that may include the parameter $\gamma$. We call the resulting estimator the calibration estimator.

The problem is how to choose the control variable $m$ in (7). The efficiency increases as the dimension of $m$ increases. However, for the current problem, the response probability includes a kernel estimator; thus, the calibration estimator from (7) with a high-dimensional $m$ is computationally heavy. We therefore suggest two one-dimensional moment conditions by deriving an asymptotic result for the estimator based on (7).

The asymptotic result of the calibration estimator from (7) is given in the following theorem.

###### Theorem 2

The asymptotic variance of the estimator is :

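$$
\begin{aligned}
A_m &= E\left( O(X, Y) \pi(X, Y) \{Y - E_0(Y \mid X_1)\} \left[ m(X; \gamma) - E_0\{m(X; \gamma) \mid X_1\} \right] \right)\big|_{g^*, \gamma^*}, \\
B_m &= E\left( O(X, Y) \left[ m(X; \gamma) - E_0\{m(X; \gamma) \mid X_1\} \right]^2 \right)\big|_{g^*, \gamma^*}.
\end{aligned}
$$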

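There are two things to note. First, when $g^*$ is known, the terms $E_0(Y \mid X_1)$ and $E_0\{m(X; \gamma) \mid X_1\}$ vanish. Second, a consistent estimator for the asymptotic variance, which can be used to construct a confidence interval, is obtained by replacing $A_m$ and $B_m$ with

$$
\begin{aligned}
\hat{A}_m &= \frac{1}{n} \sum_{i=1}^n \frac{1 - \hat{\pi}_i}{\hat{\pi}_i} \delta_i \{y_i - \tilde{E}_0(Y \mid x_{1i}; \hat{\gamma})\} \left[ m(x_i) - \tilde{E}_0\{m(X) \mid x_{1i}; \hat{\gamma}\} \right], \\
\hat{B}_m &= \frac{1}{n} \sum_{i=1}^n \frac{1 - \hat{\pi}_i}{\hat{\pi}_i} \left[ m(x_i) - \tilde{E}_0\{m(X) \mid x_{1i}; \hat{\gamma}\} \right]^2,
\end{aligned}
$$

where $\hat{\pi}_i$ denotes an estimated response probability.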

Based on the result of Theorem 2, two choices of a one-dimensional $m$ can be suggested. The first choice is the function appearing in Lemma 3.1. This function is derived from maximum likelihood estimation; thus, we can expect good performance. Specifically, the estimator is defined as the solution to (7) using

$$
m(x; \gamma) = \frac{\tilde{E}\{\delta \exp(\gamma Y) \pi_p(X_1, Y; \gamma) Y \mid x\}}{\tilde{E}\{\delta \exp(\gamma Y) \mid x\}},
$$

which is an approximation of the term in Lemma 3.1.

The second choice of is

We call the estimator from this as . This is an approximation of . This choice is based on an optimal if there are no and , that is, is known. This result is already known in other literature from different perspectives (Rotnitzky and Robins, 1997; Morikawa and Kim, 2017). We derive this result from a more direct approach.

###### Lemma 3.2

When is known, the asymptotic variance is minimized when .

Although $g^*$ is unknown in practice and the optimality does not hold in general, we can still consider this function a good candidate for $m$.

Finally, we derive asymptotic results of two estimators.

###### Theorem 3

The asymptotic variance of the estimator is the same as , which is given in Theorem 1, . The asymptotic variance of is .

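$$
\begin{aligned}
A_2 &= E\left\{ O(X, Y)\, \pi\, \{Y - E_0(Y \mid X_1)\} \left( \frac{E\{O(X, Y) \pi Y \mid X\}}{E\{O(X, Y) \mid X\}} - E_0\left[ \frac{E\{O(X, Y) \pi Y \mid X\}}{E\{O(X, Y) \mid X\}} \,\Big|\, X_1 \right] \right) \right\}\Bigg|_{g^*, \gamma^*}, \\
B_2 &= E\left\{ O(X, Y) \left( \frac{E\{O(X, Y) \pi Y \mid X\}}{E\{O(X, Y) \mid X\}} - E_0\left[ \frac{E\{O(X, Y) \pi Y \mid X\}}{E\{O(X, Y) \mid X\}} \,\Big|\, X_1 \right] \right)^2 \right\}\Bigg|_{g^*, \gamma^*}.
\end{aligned}
$$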

We can see that the asymptotic variances of the maximum likelihood estimator and the first calibration estimator are the same. This result is natural because the function derived from the asymptotic analysis of the maximum likelihood estimator is directly used to construct the first calibration estimator. As for the comparison between the two calibration estimators, it is difficult to say which one is superior theoretically. In Section 5, we experimentally confirm that the two estimators perform similarly. In addition, the asymptotic variances can be estimated in the same way as $\hat{A}_m$ and $\hat{B}_m$. However, for the second calibration estimator, this might be difficult in practice because of the triply nested expectations.

To avoid using kernels twice, we can use a parametric model for the density of given and to calculate the conditional expectation. As in the previous section, write the estimators using as and . For these estimators, the following asymptotic property holds. This result is similar to the property of generalized estimating equation (Liang and Zeger, 1986).

###### Lemma 3.3

When the working parametric model is well-specified, the forms of the asymptotic variance of and are the same as in Theorem 3. When the working parametric model is mis-specified, and are -consistent.

Note that has a different asymptotic property compared with and . When the parametric model is well-specified, the form of the asymptotic variance of will be changed from that of Theorem 1. In addition, when the parametric model is mis-specified, is not consistent anymore. Therefore, is considered to be more robust than although asymptotic variances of and are the same.

Between and , it is difficult to say which one is superior theoretically in terms of statistical efficiency. We can state that is computationally superior to because if the parametric working model belongs to an exponential family, the function can be calculated analytically without relying on Monte Carlo integration (Morikawa et al., 2017).

## 4 Estimation of mean functional

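We have discussed estimation of $\gamma$ so far. Here, we discuss estimation of the mean functional $\mu = E(Y)$. We can consider the following three estimators:

$$
\begin{aligned}
\hat{\mu}_{ipw} &= \frac{1}{n} \sum_{i=1}^n \frac{\delta_i y_i}{\pi_p(x_{1i}, y_i; \hat{\gamma})}, \\
\hat{\mu}_{mp} &= \frac{1}{n} \sum_{i=1}^n \left[ \delta_i y_i + (1 - \delta_i) \frac{\tilde{E}\{\delta \exp(\hat{\gamma} Y) Y \mid x_i\}}{\tilde{E}\{\delta \exp(\hat{\gamma} Y) \mid x_i\}} \right], \text{ and} \\
\hat{\mu}_{db} &= \frac{1}{n} \sum_{i=1}^n \left[ \frac{\delta_i y_i}{\pi_p(x_{1i}, y_i; \hat{\gamma})} + \left\{ 1 - \frac{\delta_i}{\pi_p(x_{1i}, y_i; \hat{\gamma})} \right\} \frac{\tilde{E}\{\delta \exp(\hat{\gamma} Y) Y \mid x_i\}}{\tilde{E}\{\delta \exp(\hat{\gamma} Y) \mid x_i\}} \right].
\end{aligned}
$$

The estimator $\hat{\mu}_{ipw}$ was used in Wang et al. (2014). However, given the original motivation of the semiparametric response model first introduced in Kim and Yu (2011), it is natural to use the estimator $\hat{\mu}_{mp}$. The estimator $\hat{\mu}_{db}$ is introduced by analogy with the doubly robust estimator in the ignorable case (Robins et al., 1994). Note that this estimator is different from other doubly robust form estimators in the nonignorable case (Miao and Tchetgen Tchetgen, 2016; Morikawa and Kim, 2017).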
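For discrete covariates, the three mean estimators reduce to a few cell sums. The following Python sketch is an illustration only: the function names, the integer cell labels, the `NaN` coding of missing outcomes, and the toy data are assumptions made here, and `gamma` is treated as already estimated.

```python
import numpy as np

def cell_ratio(cells, num, den):
    """For each i, sum(num) / sum(den) over observations sharing i's cell."""
    out = np.empty(len(cells), dtype=float)
    for c in np.unique(cells):
        idx = cells == c
        out[idx] = num[idx].sum() / den[idx].sum()
    return out

def mean_estimators(x1_cell, x_cell, y, delta, gamma):
    """Return (mu_ipw, mu_mp, mu_db) for discrete covariates."""
    y0 = np.where(delta == 1, y, 0.0)            # y is unobserved when delta = 0
    w = delta * np.exp(gamma * y0)               # delta_i * exp(gamma * y_i)
    exp_g = cell_ratio(x1_cell, w, 1.0 - delta)  # discrete profile estimator of exp(g)
    inv_pi = 1.0 + np.exp(gamma * y0) / exp_g    # 1 / pi_p, only used where delta = 1
    pred = cell_ratio(x_cell, w * y0, w)         # tilted respondent mean of Y given x
    mu_ipw = np.mean(delta * y0 * inv_pi)
    mu_mp = np.mean(delta * y0 + (1.0 - delta) * pred)
    mu_db = np.mean(delta * y0 * inv_pi + (1.0 - delta * inv_pi) * pred)
    return mu_ipw, mu_mp, mu_db

# Toy data (hypothetical): one covariate cell, two respondents, two nonrespondents.
cells = np.zeros(4, dtype=int)
delta = np.array([1.0, 1.0, 0.0, 0.0])
y = np.array([1.0, 3.0, np.nan, np.nan])
mu_ipw, mu_mp, mu_db = mean_estimators(cells, cells, y, delta, gamma=np.log(2.0))
```

In this single-cell toy example all three estimators coincide. They generally differ once the conditional expectation is computed over the finer cells of the full covariate $x$.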

When $\hat{\gamma}$ is a $\sqrt{n}$-consistent estimator, the following asymptotic result can be established.

###### Theorem 4

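Under certain regularity conditions, for the estimators $\hat{\mu}_{ipw}$, $\hat{\mu}_{mp}$ and $\hat{\mu}_{db}$, we have

$$
\begin{aligned}
\hat{\mu}_{ipw} &= C_1 + C_2 + C_3 + o_p(n^{-1/2}), \\
\hat{\mu}_{db} &= \hat{\mu}_{mp} + o_p(n^{-1/2}) = D_1 + D_2 + D_3 + o_p(n^{-1/2}),
\end{aligned}
$$

where

$$
H_1 = E\left[ (1 - \pi) \{Y - E_0(Y \mid X_1)\}^2 \right]\big|_{g^*, \gamma^*}, \qquad H_2 = E\left[ (1 - \pi) \{Y - E_0(Y \mid X)\}^2 \right]\big|_{g^*, \gamma^*}.
$$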

There are three things to note. First, the asymptotic variances of $\hat{\mu}_{mp}$ and $\hat{\mu}_{db}$ are the same. This result makes sense in light of the ignorable case, which is well established in the literature (Rotnitzky and Vansteelandt, 2014). Second, the asymptotic variances of $\hat{\mu}_{mp}$ and $\hat{\mu}_{db}$ are generally smaller than that of $\hat{\mu}_{ipw}$ because these estimators use more information through the conditional expectation. Third, the asymptotic variance can be estimated by taking the variance in the expression of Theorem 4.

We can also use the working parametric model for calculating the conditional expectation for . We write the estimators as and . We have the following properties.

###### Lemma 4.1

When the working parametric model is well-specified, the forms of the asymptotic variances of the two estimators are unchanged from those in Theorem 4. Also, even when the working parametric model is mis-specified, the doubly robust form estimator is consistent.

This suggests that when the parametric outcome model is used, the doubly robust form estimator is superior to its $\hat{\mu}_{mp}$ counterpart because it is consistent even if this model is mis-specified, whereas the counterpart is not consistent in general. It is related to the doubly robust form estimators in nonignorable nonresponse cases (Miao et al., 2015; Miao and Tchetgen Tchetgen, 2016), although our estimator has a different form. They proposed estimators that are doubly robust in the sense that they are consistent even if either the underlying baseline response model or the outcome model is mis-specified, under the correct assumption of an odds ratio model. In our setup, the response model is nonparametric, and therefore cannot be mis-specified. Our proposed estimator is robust in the sense that it is consistent under the correct assumption of the odds ratio model even if the outcome model is mis-specified.

## 5 Simulation Study

We conducted two simulation studies to compare the performance of the estimators for and . For the estimators of , we compared , , and or , , and . Note that the estimator with corresponds to a baseline estimator proposed in Wang et al. (2014). In the estimator , we adopt as . For the estimation of mean functional , we compared three estimators: , and or , and . We consider simulations under three conditions: (1) and are discrete, (2) and are continuous and is discrete, and (3) and are continuous. For the case of (3), see next section. In all cases, the parameter values under the missing data models were chosen so that the overall missing rate was about 30%.

### 5.1 Case where X and Y are discrete

Let be a categorical random variable taking values

and Y be a binary variable taking values

and Z be a binary variable taking value . The random variable follows a multinomial distribution with probability and the random variable , independent of

, follows a Bernoulli distribution with probability

. The random variable follows a Bernoulli distribution with probability . The random variable follows a Bernoulli distribution as follows; M1: , where , M2: , where , and M3: , where . The simulation is replicated times with two sample sizes; and . Here, as all the variables are discrete, we use a nonparametric model for given and .

The results for the estimation of $\gamma$ and $\mu$ are presented in Table 1 and Table 2, respectively. First, our proposed estimators perform better than the baseline estimator in terms of mean square error. Next, $\hat{\mu}_{mp}$ and $\hat{\mu}_{db}$ are superior to $\hat{\mu}_{ipw}$ in terms of efficiency, which is consistent with Theorem 4. Finally, when the sample size is large, the mean square errors of the maximum likelihood estimator and the first calibration estimator are almost the same, which matches the theoretical results in Section 3. The mean square errors of $\hat{\mu}_{mp}$ and $\hat{\mu}_{db}$ are also almost the same.

### 5.2 Cases where X1 is discrete and Y,x2 are continuous

Let be a binary random variable, whose distribution is described above, and

be an uniform distribution

. As for the random variable , we make an assumption for the conditional distribution of given and , and also . Note that the distribution of is uniquely determined by these two distributions. For the generation of samples in this setting, see Morikawa et al. (2017). First, let the distribution of given and . Second, let the response mechanism be the following logistic models; M1: , where , M2: , where , and M3: , where .

We used a well-specified parametric outcome model to assist with the calculation of the conditional expectation and compared , and . Note that the estimator for is still constructed from an empirical distribution without using a kernel estimator. For the calculation of the conditional expectation term based on Monte Carlo integration, which appears in the objective function of , and , we obtain samples from the auxiliary distribution . However, in the case of , we can actually perform the calculation analytically because

(Morikawa and Kim, 2017). We write the estimator relying on Monte Carlo integration as and the other that does not rely on Monte Carlo integration as .

The simulation is replicated times with sample size and . The results are reported in Tables 3 and 4. There are five things to note. First, it is seen that all of the proposed estimators are superior to the baseline estimator . Secondly, whether or is better depends on the true distribution. Thirdly, the mean square error of is smaller than that of and . However, as noted earlier, is not a consistent estimator when the working model is mis-specified. Fourth, by comparing the two estimators and , we can see that the variance increases negligibly due to Monte Carlo integration. Finally, it is seen that and are superior to in terms of mean square errors.

### 5.3 Coverage probability

We further examined the confidence intervals of $\gamma$ and $\mu$ under the setting of Section 5.1. As an estimator for , we adopted . We nonparametrically estimated the asymptotic variances based on the forms in Sections 3 and 4. Note that our method for constructing confidence intervals does not rely on bootstrapping, unlike Shao and Wang (2016), because there is no guarantee that bootstrapping would work. The result is reported in Table 5 with the appropriate coverage rate.
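As a generic illustration of how a coverage rate can be computed from simulation output, the sketch below forms Wald-type intervals from point estimates and their estimated variances across replications. It assumes normal-approximation intervals at the 95% level; the function name and the numbers are hypothetical and are not results from the paper.

```python
import numpy as np

def wald_coverage(estimates, variances, truth, z=1.959964):
    """Share of replications whose interval est +/- z * sqrt(var) contains truth.

    `variances` are estimated variances of the estimator itself
    (that is, already scaled by the sample size).
    """
    estimates = np.asarray(estimates, dtype=float)
    half_width = z * np.sqrt(np.asarray(variances, dtype=float))
    covered = (estimates - half_width <= truth) & (truth <= estimates + half_width)
    return covered.mean()

# Hypothetical output of three replications with true parameter value 0.
coverage = wald_coverage([0.0, 1.0, 5.0], [1.0, 1.0, 1.0], truth=0.0)
```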