On Data Enriched Logistic Regression

11/14/2019
by Cheng Zheng, et al.

Biomedical researchers usually study the effects of certain exposures on disease risks in a well-defined population. The gold standard for this goal is to design a trial with an appropriate sample from that population. Because such trials are costly, the sample size collected is usually limited and is not enough to estimate the effects of some exposures accurately. In this paper, we discuss how to leverage information from external 'big data' (data with a much larger sample size) to improve estimation accuracy at the risk of introducing a small bias. We propose a family of weighted estimators that balance the bias increase against the variance reduction when the big data are included. We connect our proposed estimators to established penalized regression estimators. We derive the optimal weights using both second-order and higher-order asymptotic expansions. Using extensive simulation studies, we show that the improvement in mean squared error (MSE) for the regression coefficient can be substantial even with finite sample sizes, and that our weighted method outperforms existing methods such as penalized regression and the James-Stein approach. We also provide a theoretical guarantee that, in general, the proposed estimators never lead to an asymptotic MSE larger than that of the maximum likelihood estimator based on the small data alone. We apply our proposed methods to the Asia Cohort Consortium China cohort data to estimate the relationships between age, BMI, smoking, alcohol use, and mortality.
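The bias-variance trade-off behind a weighted estimator can be illustrated with a deliberately simplified toy case. The sketch below is not the estimator or the asymptotic expansion derived in the paper. It assumes a single scalar coefficient, an unbiased small-data estimate with variance `var_small`, an independent big-data estimate with variance `var_big` and bias `bias_big`, and that all three quantities are known. The function names and the numbers in the usage lines are hypothetical.

```python
def combined_mse(w, var_small, var_big, bias_big):
    """MSE of (1 - w) * theta_small + w * theta_big in the toy setting.

    Assumes theta_small is unbiased, theta_big has bias `bias_big`,
    and the two estimates are independent.
    """
    variance = (1.0 - w) ** 2 * var_small + w ** 2 * var_big
    squared_bias = (w * bias_big) ** 2
    return variance + squared_bias


def optimal_weight(var_small, var_big, bias_big):
    """Weight on the big-data estimate that minimizes combined_mse.

    Obtained by setting the derivative of combined_mse with respect
    to w equal to zero.
    """
    return var_small / (var_small + var_big + bias_big ** 2)


# Hypothetical inputs: a noisy small-data estimate and a precise but
# slightly biased big-data estimate.
var_small, var_big, bias_big = 0.04, 0.0004, 0.1
w_opt = optimal_weight(var_small, var_big, bias_big)
mse_opt = combined_mse(w_opt, var_small, var_big, bias_big)
mse_small_only = combined_mse(0.0, var_small, var_big, bias_big)
```

In this toy setting the minimized MSE equals `var_small * (var_big + bias_big**2) / (var_small + var_big + bias_big**2)`, which cannot exceed `var_small`. That mirrors, in a much simpler form, the kind of guarantee the abstract describes for the small-data maximum likelihood estimator.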

research
11/05/2021

Liu Estimator in the Multinomial Logistic Regression Model

This paper considers the Liu estimator in the multinomial logistic regre...
research
02/19/2020

Asymptotically Optimal Bias Reduction for Parametric Models

An important challenge in statistical analysis concerns the control of t...
research
09/28/2021

Penalized Likelihood Methods for Modeling of Reading Count Data

The paper considers parameter estimation in count data models using pena...
research
01/28/2022

Asymptotic behaviour of penalized robust estimators in logistic regression when dimension increases

Penalized M-estimators for logistic regression models have been previous...
research
02/28/2021

On the Subbagging Estimation for Massive Data

This article introduces subbagging (subsample aggregating) estimation ap...
research
03/19/2019

Optimal Bias Correction of the Log-periodogram Estimator of the Fractional Parameter: A Jackknife Approach

We use the jackknife to bias correct the log-periodogram regression (LPR...
research
03/03/2021

Minimax MSE Bounds and Nonlinear VAR Prewhitening for Long-Run Variance Estimation Under Nonstationarity

We establish new mean-squared error (MSE) bounds for long-run variance (...
