
What is the best predictor that you can compute in five minutes using a given Bayesian hierarchical model?

12/10/2019
by Jonathan R. Bradley, et al.
Florida State University

The goal of this paper is to provide a way for statisticians to answer the question posed in the title using any Bayesian hierarchical model of their choosing, without imposing additional restrictive model assumptions. We are motivated by the fact that the rise of "big data" has made it difficult for statisticians to apply their methods directly to big datasets. We introduce a "data subset model" into the popular "data model, process model, and parameter model" framework used to summarize Bayesian hierarchical models. The hyperparameters of the data subset model are specified constructively: they are chosen so that the implied size of the subset satisfies pre-defined computational constraints. These hyperparameters therefore calibrate the statistical model to the computer itself, so that predictions and estimates are obtained in a pre-specified amount of time. Several properties of the data subset model are provided, including propriety, partial sufficiency, and semi-parametric properties. Furthermore, we show that subsets of normally distributed data are asymptotically partially sufficient under reasonable constraints. Results from a simulated dataset are presented across different computers to show the effect of the computer on the statistical analysis. Additionally, we provide a joint spatial analysis of two different environmental datasets.
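The abstract does not state how the data subset model's hyperparameters are mapped to a time budget. The following is only a minimal sketch of the general idea of calibrating a subset size to a particular computer. It assumes that the cost of one model fit grows as a power law, c * n**p, in the subset size n (for example, p near 3 for an unstructured Gaussian-process likelihood). It times a few small pilot fits, estimates c and p on the log-log scale, and solves for the largest n that fits a budget such as five minutes. The function names, pilot sizes, and the toy O(n^2) "fit" are hypothetical illustrations, not the authors' method.

```python
import math
import random
import time
from typing import Callable, Sequence


def fit_cost_model(sizes: Sequence[int], seconds: Sequence[float]) -> tuple[float, float]:
    """Assumed cost model: seconds ~= c * n**p, fit by least squares on log-log scale."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(s) for s in seconds]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    p = sxy / sxx
    c = math.exp(my - p * mx)
    return c, p


def max_subset_size(c: float, p: float, budget_seconds: float, n_total: int) -> int:
    """Largest n with c * n**p <= budget, capped at the full data size."""
    if p <= 0:  # cost does not grow with n, so the whole dataset fits
        return n_total
    # Work in log space to avoid overflow when the budget is generous.
    log_n = (math.log(budget_seconds) - math.log(c)) / p
    if log_n >= math.log(n_total):
        return n_total
    return max(1, int(math.exp(log_n)))


def time_fit(fit: Callable[[list], object], data: list,
             sizes: Sequence[int], seed: int = 0) -> list[float]:
    """Wall-clock time of `fit` on a random subset of each pilot size."""
    rng = random.Random(seed)
    out = []
    for n in sizes:
        subset = rng.sample(data, n)
        start = time.perf_counter()
        fit(subset)
        out.append(time.perf_counter() - start)
    return out


def toy_fit(subset: list) -> float:
    """Hypothetical stand-in for an O(n^2) model fit (all pairwise products)."""
    return sum(a * b for a in subset for b in subset)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    pilot = [200, 400, 800]
    c, p = fit_cost_model(pilot, time_fit(toy_fit, data, pilot))
    m = max_subset_size(c, p, budget_seconds=300.0, n_total=len(data))
    print(f"estimated exponent p = {p:.2f}; subset size for a 5-minute budget = {m}")
```

Because the pilot timings are taken on the machine that will run the analysis, the chosen subset size differs from computer to computer. This mirrors, in a simplified way, the abstract's point that the hyperparameters calibrate the model to the computer itself.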


05/22/2023

Incorporating Subsampling into Bayesian Models for High-Dimensional Spatial Data

Additive spatial statistical models with weakly stationary process assum...
01/15/2018

Divide and Recombine for Large and Complex Data: Model Likelihood Functions using MCMC

In Divide & Recombine (D&R), big data are divided into subsets, each ana...
10/23/2018

Goodness-of-Fit Tests for Large Datasets

Nowadays, data analysis in the world of Big Data is connected typically ...
01/30/2023

MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning

Training deep networks and tuning hyperparameters on large datasets is c...
06/02/2020

Hyperparameter Selection for Subsampling Bootstraps

Massive data analysis becomes increasingly prevalent, subsampling method...