Probability and Non-Probability Samples: Improving Regression Modeling by Using Data from Different Sources

04/03/2022
by Gerhard Tutz, et al.

Non-probability sampling, for example in the form of online panels, has become a fast and cheap way to collect data. While reliable inference tools are available for classical probability samples, non-probability samples can yield strongly biased estimates because the selection mechanism is typically unknown. We propose a general method for improving statistical inference when, in addition to a probability sample, data from other sources are available that have to be treated as non-probability samples. The method uses specifically tailored regression residuals to enlarge the original data set with observations from the other sources that can be considered as stemming from the target population. Measures of the accuracy of the estimates are obtained by adapted bootstrap techniques. The method is shown to improve estimates in a wide range of scenarios. For illustration, it is applied to two data sets.
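The abstract does not spell out how the residuals are constructed or how observations are selected, so the following is only a rough sketch of the general idea, not the authors' procedure. It assumes a linear model fitted by ordinary least squares, standardized residuals as the selection criterion, and a hypothetical `cutoff` of 2. The function names `fit_ols`, `enlarge_and_refit`, and `bootstrap_se` are illustrative, and the simple nonparametric bootstrap stands in for the "adapted bootstrap techniques" mentioned above.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with intercept; returns coefficients and residual SD."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma = np.sqrt(resid @ resid / (len(y) - Xd.shape[1]))
    return beta, sigma


def enlarge_and_refit(X_p, y_p, X_np, y_np, cutoff=2.0):
    """Fit on the probability sample, keep non-probability observations whose
    standardized residual is small, and refit on the enlarged data set.
    The residual definition and cutoff are assumptions of this sketch."""
    beta, sigma = fit_ols(X_p, y_p)
    Xd_np = np.column_stack([np.ones(len(X_np)), X_np])
    z = (y_np - Xd_np @ beta) / sigma
    keep = np.abs(z) <= cutoff
    X_all = np.vstack([X_p, X_np[keep]])
    y_all = np.concatenate([y_p, y_np[keep]])
    beta_new, _ = fit_ols(X_all, y_all)
    return beta_new, keep


def bootstrap_se(X_p, y_p, X_np, y_np, n_boot=200, cutoff=2.0, seed=0):
    """Resample the probability sample and repeat the whole selection-and-refit
    step, so the standard errors reflect the selection as well."""
    rng = np.random.default_rng(seed)
    n = len(y_p)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        b, _ = enlarge_and_refit(X_p[idx], y_p[idx], X_np, y_np, cutoff)
        draws.append(b)
    return np.std(draws, axis=0, ddof=1)
```

Here `X_p`, `y_p` denote the probability sample and `X_np`, `y_np` the data from other sources, with covariates stored as two-dimensional arrays. Repeating the selection inside each bootstrap replicate is a design choice of this sketch: it keeps the uncertainty introduced by the selection step from being ignored.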

research
01/09/2020

Statistical Data Integration in Survey Sampling: A Review

Finite population inference is a central goal in survey sampling. Probab...
research
08/28/2021

A robust fusion-extraction procedure with summary statistics in the presence of biased sources

Information from various data sources is increasingly available nowadays...
research
04/20/2021

Data Envelopment Analysis models with imperfect knowledge of input and output values: An application to Portuguese public hospitals

Assessing the technical efficiency of a set of observations requires tha...
research
04/13/2020

Measures of Selection Bias in Regression Coefficients Estimated from Non-Probability Samples

We derive novel measures of selection bias for estimates of the coeffici...
research
08/12/2019

Blending of Probability and Non-Probability Samples: Applications to a Survey of Military Caregivers

Probability samples are the preferred method for providing inferences th...
research
12/17/2019

Mosaic: A Sample-Based Database System for Open World Query Processing

Data scientists have relied on samples to analyze populations of interes...
research
05/18/2021

An Efficient Approach for Statistical Matching of Survey Data Trough Calibration, Optimal Transport and Balanced Sampling

Statistical matching aims to integrate two statistical sources. These so...
