What is the best predictor that you can compute in five minutes using a given Bayesian hierarchical model?

12/10/2019
by Jonathan R. Bradley, et al.

The goal of this paper is to provide a way for statisticians to answer the question posed in the title of this article using any Bayesian hierarchical model of their choosing and without imposing additional restrictive model assumptions. We are motivated by the fact that the rise of “big data” has made it difficult for statisticians to apply their methods directly to big datasets. We introduce a “data subset model” to the popular “data model, process model, and parameter model” framework used to summarize Bayesian hierarchical models. The hyperparameters of the data subset model are specified constructively: they are chosen so that the implied size of the subset satisfies pre-defined computational constraints. Thus, these hyperparameters effectively calibrate the statistical model to the computer itself, so that predictions and estimates are obtained in a pre-specified amount of time. Several properties of the data subset model are provided, including propriety, partial sufficiency, and semi-parametric properties. Furthermore, we show that subsets of normally distributed data are asymptotically partially sufficient under reasonable constraints. Results from a simulated dataset are presented across different computers to show the effect of the computer on the statistical analysis. Additionally, we provide a joint spatial analysis of two different environmental datasets.
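The idea of choosing hyperparameters so that the implied subset size meets a computational constraint can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: runtime is modeled by a hypothetical cost function c·n^p (with p = 3 mimicking the cubic cost of a dense Cholesky factorization), the subset is a simple random sample without replacement, and the Bayesian hierarchical model is replaced by a conjugate normal model for an unknown mean with known noise variance. The function names `max_subset_size` and `subset_posterior_normal_mean` are hypothetical. This is an illustration of the general principle, not the authors' data subset model.

```python
import random


def max_subset_size(time_budget_s, cost_per_unit_s, exponent, n_total):
    """Largest n with cost_per_unit_s * n**exponent <= time_budget_s, capped at n_total.

    Assumes a hypothetical cost model: runtime(n) = cost_per_unit_s * n**exponent.
    """
    if time_budget_s <= 0 or cost_per_unit_s <= 0:
        return 0
    # Closed-form guess from inverting the cost model.
    n = int((time_budget_s / cost_per_unit_s) ** (1.0 / exponent))
    # Correct for floating-point error in either direction.
    while cost_per_unit_s * (n + 1) ** exponent <= time_budget_s:
        n += 1
    while n > 0 and cost_per_unit_s * n ** exponent > time_budget_s:
        n -= 1
    return min(n, n_total)


def subset_posterior_normal_mean(data, n_sub, prior_mean, prior_var, noise_var, seed=0):
    """Conjugate normal posterior for an unknown mean, computed from a random subset.

    Model (an assumption for illustration): y_i ~ N(mu, noise_var), mu ~ N(prior_mean, prior_var).
    """
    rng = random.Random(seed)
    # Simple random sample without replacement stands in for the data subset model.
    subset = rng.sample(data, n_sub)
    post_prec = 1.0 / prior_var + n_sub / noise_var
    post_var = 1.0 / post_prec
    post_mean = post_var * (prior_mean / prior_var + sum(subset) / noise_var)
    return post_mean, post_var
```

For example, with a five-minute budget (300 s), a hypothetical cost of 1e-9 s per unit, and p = 3, `max_subset_size(300, 1e-9, 3, 10**6)` returns 6694; a faster computer (smaller cost) would yield a larger subset under the same budget, which is the sense in which the computer enters the analysis.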
