Choosing good subsamples for regression modelling

03/21/2022
by Thomas Lumley, et al.

A common problem in health research is that we have a large database with many variables measured on a large number of individuals. We are interested in measuring additional variables on a subsample; these measurements may be newly available, or expensive, or simply not considered when the data were first collected. The intended use for the new measurements is to fit a regression model generalisable to the whole cohort (and to its source population). This is a two-phase sampling problem; it differs from some other two-phase sampling problems in the richness of the phase I data and in the goal of regression modelling. An important special case is measurement-error models, where a variable strongly correlated with the phase II measurements is already available at phase I. We will explain how influence functions have been useful as a unifying concept for extending classical results to this setting, and describe the steps from designing for a simple weighted estimator at known parameter values, through adaptive multiwave designs, to the use of prior information. We will conclude with some comments on the information gap between design-based and model-based estimators in this setting.
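As a rough illustration of the first step mentioned above (designing for a simple weighted estimator at known parameter values), the Python sketch below uses a standard Neyman-allocation rule: the phase II sample size in phase I stratum h is taken proportional to N_h S_h, where N_h is the stratum size and S_h is the within-stratum standard deviation of the estimator's influence function, assumed known here. The regression is then fitted by inverse-probability-weighted least squares, giving each sampled individual weight N_h/n_h. The function names, the simulated cohort, the stratification on |z| and the SD guesses are all hypothetical choices for illustration, not the authors' method or code.

```python
import math
import random


def neyman_allocation(stratum_sizes, influence_sds, n_total):
    """Split a phase-II sample of size n_total across phase-I strata.

    Neyman allocation: n_h is proportional to N_h * S_h, where N_h is the
    phase-I stratum size and S_h is the within-stratum standard deviation of
    the influence function, here treated as known (e.g. computed at guessed
    parameter values). A stratum whose share would reach N_h is sampled in
    full and the remaining budget is re-split among the other strata.
    """
    if any(s <= 0 for s in influence_sds):
        raise ValueError("influence-function SDs must be positive")
    if n_total > sum(stratum_sizes):
        raise ValueError("phase-II sample larger than the phase-I cohort")
    alloc = [0] * len(stratum_sizes)
    active = list(range(len(stratum_sizes)))
    budget = n_total
    while True:
        total = sum(stratum_sizes[h] * influence_sds[h] for h in active)
        # Share of stratum h is budget * N_h * S_h / total; it reaches N_h
        # exactly when budget * S_h >= total.
        capped = [h for h in active if budget * influence_sds[h] >= total]
        if not capped:
            break
        for h in capped:
            alloc[h] = stratum_sizes[h]
            budget -= stratum_sizes[h]
            active.remove(h)
    raw = {h: budget * stratum_sizes[h] * influence_sds[h] / total for h in active}
    for h in active:
        alloc[h] = math.floor(raw[h])
    # Hand out units lost to rounding down, largest fractional part first.
    leftover = budget - sum(alloc[h] for h in active)
    for h in sorted(active, key=lambda h: raw[h] - alloc[h], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc


def weighted_line(xs, ys, ws):
    """Weighted least-squares intercept and slope for y ~ x."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical phase-I cohort: z is a cheap, error-prone version of x.
    cohort = []
    for _ in range(3000):
        x = random.gauss(0, 1)
        z = x + random.gauss(0, 0.5)
        y = 1 + 2 * x + random.gauss(0, 1)
        cohort.append((z, x, y))
    # Stratify on the phase-I surrogate z; x and y play the phase-II role.
    strata = [[c for c in cohort if abs(c[0]) < 1],
              [c for c in cohort if abs(c[0]) >= 1]]
    sizes = [len(s) for s in strata]
    # Hypothetical SD guesses: a point's influence on the slope grows with
    # its distance from the mean of x, so the high-|z| stratum gets more.
    sds = [1.0, 2.0]
    n_h = neyman_allocation(sizes, sds, 300)
    xs, ys, ws = [], [], []
    for stratum, n in zip(strata, n_h):
        for _z, x, y in random.sample(stratum, n):
            xs.append(x)
            ys.append(y)
            ws.append(len(stratum) / n)  # inverse sampling probability
    print("allocation:", n_h)
    print("weighted intercept, slope:", weighted_line(xs, ys, ws))
```

In an adaptive multiwave design of the kind the abstract describes, the fixed SD guesses would roughly be replaced by estimates obtained from earlier waves; that step is not shown here.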

research
06/16/2021

Optimal sampling for design-based estimators of regression models

Two-phase designs measure variables of interest on a subcohort where the...
research
05/28/2020

Optimal multi-wave sampling for regression modelling in two-phase designs

Two-phase designs involve measuring extra variables on a subset of the c...
research
10/26/2019

Analysis of Two-Phase Studies using Generalized Method of Moments

Two-phase design can reduce the cost of epidemiological studies by limit...
research
05/12/2020

Two-phase analysis and study design for survival models with error-prone exposures

Increasingly, medical research is dependent on data collected for non-re...
research
12/13/2018

Optimal designs for series estimation in nonparametric regression with correlated data

In this paper we investigate the problem of designing experiments for se...
research
11/11/2022

Optimal Designs of Two-Phase Case-Control Studies for General Predictor Effects

Under two-phase designs, the outcome and several covariates and confound...
research
11/08/2015

Statistical physics of inference: Thresholds and algorithms

Many questions of fundamental interest in today's science can be formulat...
