Robust Bayesian Inference for Big Data: Combining Sensor-based Records with Traditional Survey Data

01/19/2021
by Ali Rafei, et al.

Big Data often presents as massive non-probability samples. Not only is the selection mechanism often unknown, but larger data volume also amplifies the relative contribution of selection bias to total error. Existing bias adjustment approaches assume that the conditional mean structures have been correctly specified for the selection indicator or for key substantive measures. In the presence of a reference probability sample, these methods rely on a pseudo-likelihood method, which is parametric in nature, to account for the sampling weights of the reference sample. Under a Bayesian framework, handling the sampling weights is an even bigger hurdle. To further protect against model misspecification, we expand the idea of double robustness so that more flexible non-parametric methods, as well as Bayesian models, can be used for prediction. In particular, we employ Bayesian additive regression trees, which not only capture non-linear associations automatically but also permit direct quantification of the uncertainty of point estimates through their posterior predictive draws. We apply our method to sensor-based naturalistic driving data from the second Strategic Highway Research Program, using the 2017 National Household Travel Survey as a benchmark.
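To make the idea of double robustness concrete, the following is a minimal sketch of a generic augmented inverse-propensity-weighted (AIPW) estimator of a population mean that combines a non-probability sample with a weighted reference sample. It is an illustration under stated assumptions, not the authors' estimator: an ordinary least-squares fit stands in for Bayesian additive regression trees, the true selection probabilities stand in for estimated ones, and the names `doubly_robust_mean` and `simulate` as well as the simulated data are hypothetical.

```python
import numpy as np


def doubly_robust_mean(y_b, m_b, pi_b, m_r, w_r):
    """Generic AIPW-style estimate of a population mean (illustrative only).

    y_b  : outcomes observed in the non-probability (big data) sample
    m_b  : outcome-model predictions for the non-probability sample
    pi_b : (estimated) selection probabilities for the non-probability sample
    m_r  : outcome-model predictions for the reference probability sample
    w_r  : sampling weights of the reference probability sample
    """
    y_b, m_b, pi_b, m_r, w_r = (
        np.asarray(a, dtype=float) for a in (y_b, m_b, pi_b, m_r, w_r)
    )
    pseudo_w = 1.0 / pi_b
    # Pseudo-weighted mean of the residuals in the non-probability sample.
    residual_term = np.sum(pseudo_w * (y_b - m_b)) / np.sum(pseudo_w)
    # Design-weighted mean of the predictions in the reference sample.
    prediction_term = np.sum(w_r * m_r) / np.sum(w_r)
    return residual_term + prediction_term


def simulate(seed=0, pop_size=100_000, n_ref=2_000):
    """Toy simulation (hypothetical data) comparing naive and adjusted means."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=pop_size)
    y = 2.0 + 3.0 * x + rng.normal(size=pop_size)

    # Selection into the big sample depends on x, so its naive mean is biased.
    pi = 1.0 / (1.0 + np.exp(-(-3.0 + x)))
    in_b = rng.random(pop_size) < pi

    # Reference sample: simple random sample with equal weights N / n.
    ref = rng.choice(pop_size, size=n_ref, replace=False)
    w_r = np.full(n_ref, pop_size / n_ref)

    # Assumption: a linear outcome model fit by OLS on the big sample.
    X_b = np.column_stack([np.ones(in_b.sum()), x[in_b]])
    beta, *_ = np.linalg.lstsq(X_b, y[in_b], rcond=None)
    m_b = X_b @ beta
    m_r = beta[0] + beta[1] * x[ref]

    # Assumption: the true selection probabilities are used as pi_b.
    dr = doubly_robust_mean(y[in_b], m_b, pi[in_b], m_r, w_r)
    return {"truth": y.mean(), "naive": y[in_b].mean(), "dr": dr}


if __name__ == "__main__":
    print(simulate())
```

The estimate stays consistent if either the selection model or the outcome model is correctly specified, which is the property the abstract extends to more flexible prediction models. In practice the selection probabilities would have to be estimated from the pooled samples, a step this sketch omits.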

