Data Integration by combining big data and survey sample data for finite population inference

03/26/2020
by Jae-kwang Kim, et al.

The statistical challenges of using big data to make valid inferences about a finite population are well documented in the literature. These challenges arise primarily from bias due to under-coverage of the population of interest by the big data source and from measurement errors in the variables available in the data set. By stratifying the population into a big data stratum and a missing data stratum, we can estimate the missing data stratum using a fully responding probability sample, and hence estimate the population as a whole using a data integration estimator. By expressing the data integration estimator as a regression estimator, we can handle measurement errors in the variables in the big data and also in the probability sample. Finally, we develop a two-step regression data integration estimator to deal with non-response in the probability sample. An advantage of the approach advocated in this paper is that it does not require unrealistic missing-at-random assumptions. The proposed method is applied to a real data example using 2015-16 Australian Agricultural Census data.
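
The stratification idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the study variable is observed without error in the big data, the probability sample is fully responding, and each sampled unit carries a flag saying whether it also appears in the big data. The function name `data_integration_total` and the toy numbers are hypothetical, and the regression and two-step versions mentioned in the abstract are not covered.

```python
def data_integration_total(big_data_y, sample):
    """Estimate a population total from two sources (illustrative sketch).

    big_data_y: y-values for every unit in the big data stratum.
    sample: (y, design_weight, in_big_data) tuples from a probability sample.
    """
    # The big data stratum is fully observed, so its total is summed directly.
    big_total = sum(big_data_y)
    # The missing data stratum is estimated from sampled units that are NOT
    # in the big data, each weighted by its survey design weight.
    missing_total = sum(w * y for y, w, in_big in sample if not in_big)
    return big_total + missing_total


# Toy population with y = 1..10; the big data covers only the units with y >= 6.
big_data_y = [6, 7, 8, 9, 10]
# Hypothetical sample of 4 units out of 10, so each design weight is 10 / 4 = 2.5.
sample = [(2, 2.5, False), (4, 2.5, False), (7, 2.5, True), (9, 2.5, True)]

print(data_integration_total(big_data_y, sample))  # 55.0
```

In this toy case the estimate happens to equal the true total of 55. In general only the missing data stratum contributes sampling error, because the big data stratum is enumerated rather than estimated.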

research
07/08/2018

Integration of survey data and big observational data for finite population inference using mass imputation

Multiple data sources are becoming increasingly available for statistica...
research
01/29/2018

Sampling techniques for big data analysis in finite population inference

In analyzing big data for finite population inference, it is critical to...
research
06/06/2023

A Calibrated Data-Driven Approach for Small Area Estimation using Big Data

Where the response variable in a big data set is consistent with the var...
research
09/09/2015

Statistical Inference, Learning and Models in Big Data

The need for new methods to deal with big data is a common theme in most...
research
09/25/2021

Statistical Inference for Data Integration

In the age of big data, data integration is a critical step especially i...
research
10/01/2018

On valid descriptive inference from non-probability sample

We examine the conditions under which descriptive inference can be based...
research
07/21/2023

Multiple bias-calibration for adjusting selection bias of non-probability samples using data integration

Valid statistical inference is challenging when the sample is subject to...
