BayesDB: A probabilistic programming system for querying the probable implications of data

12/15/2015
by   Vikash Mansinghka, et al.

Is it possible to make statistical inference broadly accessible to non-statisticians without sacrificing mathematical rigor or inference quality? This paper describes BayesDB, a probabilistic programming platform that aims to enable users to query the probable implications of their data as directly as SQL databases enable them to query the data itself. This paper focuses on four aspects of BayesDB: (i) BQL, an SQL-like query language for Bayesian data analysis that answers queries by averaging over an implicit space of probabilistic models; (ii) techniques for implementing BQL using a broad class of multivariate probabilistic models; (iii) a semi-parametric Bayesian model-builder that automatically builds ensembles of factorial mixture models to serve as baselines; and (iv) MML, a "meta-modeling" language for imposing qualitative constraints on the model-builder and combining baseline models with custom algorithmic and statistical models that can be implemented in external software. BayesDB is illustrated using three applications: cleaning and exploring a public database of Earth satellites; assessing the evidence for temporal dependence between macroeconomic indicators; and analyzing a salary survey.
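
The idea of answering a query by averaging over a space of models can be illustrated with a small sketch. The Python below is not BayesDB's API or implementation. It assumes a toy ensemble of one-dimensional Gaussian mixture models with equal weight per model, and all names (`normal_pdf`, `mixture_pdf`, `ensemble_pdf`, `ENSEMBLE`) are hypothetical. It only shows how a density query might be answered by averaging the answers of several models.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a mixture given as a list of (weight, mean, std) tuples."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def ensemble_pdf(x, ensemble):
    """Answer a density query by averaging over all models in the ensemble.

    Assumption: every model gets equal weight; a real system may weight
    models differently.
    """
    return sum(mixture_pdf(x, model) for model in ensemble) / len(ensemble)


# Hypothetical ensemble: two candidate models for the same column.
ENSEMBLE = [
    [(1.0, 0.0, 1.0)],                    # a single Gaussian
    [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)],  # a two-component mixture
]

if __name__ == "__main__":
    print(ensemble_pdf(0.0, ENSEMBLE))
```

Each model gives its own answer to the query, and the reported answer is their average, so no single model structure has to be chosen in advance.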


research
11/05/2016

Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes

Datasets with hundreds of variables and many missing values are commonpl...
research
04/04/2017

Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

Databases are widespread, yet extracting relevant data can be difficult....
research
08/18/2016

Probabilistic Data Analysis with Probabilistic Programming

Probabilistic techniques are central to data analysis, but different app...
research
11/09/2018

Meet Cyrus - The Query by Voice Mobile Assistant for the Tutoring and Formative Assessment of SQL Learners

Being declarative, SQL stands a better chance at being the programming l...
research
07/07/2017

InferSpark: Statistical Inference at Scale

The Apache Spark stack has enabled fast large-scale data processing. Des...
research
12/02/2020

Complex Coordinate-Based Meta-Analysis with Probabilistic Programming

With the growing number of published functional magnetic resonance imagi...
