Analysis of Two-Phase Studies using Generalized Method of Moments
A two-phase design can reduce the cost of epidemiological studies by limiting the ascertainment of expensive covariates and/or exposures to an efficiently selected subset (phase-II) of a larger (phase-I) study. Efficient analysis of the resulting dataset, which combines disparate information from phase-I and phase-II, can however be complex. Most existing methods, including the semiparametric maximum-likelihood estimator, require the information in phase-I to be summarized into a fixed number of strata. In this paper, we describe a novel method for the analysis of two-phase studies in which information from phase-I is summarized by the parameters of a reduced logistic regression model of the disease outcome on the available covariates. We then set up estimating equations for the parameters of the desired extended logistic regression model, based on information on the reduced-model parameters from phase-I and the complete data available at phase-II, after accounting for the non-random sampling design at phase-II. We use the generalized method of moments to solve the over-identified estimating equations and develop the resulting asymptotic theory for the proposed estimator. Simulation studies show that the use of reduced parametric models, as opposed to summarizing data into strata, can lead to more efficient utilization of phase-I data. An application of the proposed method is illustrated using data from the US National Wilms Tumor Study.
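
As a reminder of what solving over-identified estimating equations by the generalized method of moments involves, the generic estimator can be sketched as below. The notation is assumed for illustration and is not taken from the paper: $\beta$ (dimension $p$) stands for the extended-model parameters, $g_n(\beta)$ for a stacked vector of $q > p$ sample estimating functions, $C$ for a positive-definite weight matrix, $G$ for the limiting derivative matrix of $g_n$ at the true value $\beta_0$, and $\Omega$ for the asymptotic covariance of $\sqrt{n}\, g_n(\beta_0)$.

```latex
% Generic GMM sketch (assumed notation, not the paper's own equations).
% More equations than parameters (q > p), so g_n(\beta) = 0 generally has
% no exact solution; GMM minimizes a quadratic form in g_n instead.
\hat{\beta}_{\mathrm{GMM}}
  = \arg\min_{\beta} \; g_n(\beta)^{\top} C \, g_n(\beta),
  \qquad g_n(\beta) \in \mathbb{R}^{q}, \quad \beta \in \mathbb{R}^{p}, \quad q > p .

% Under standard regularity conditions, the efficient choice C = \Omega^{-1} gives
\sqrt{n}\,\bigl(\hat{\beta}_{\mathrm{GMM}} - \beta_0\bigr)
  \xrightarrow{\;d\;}
  N\!\left(0,\; \bigl(G^{\top}\Omega^{-1}G\bigr)^{-1}\right).
```

This is the textbook form only. The specific estimating functions, the weighting, and the asymptotic variance that account for the phase-II sampling design are those developed in the full text.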