Probabilistic Data Analysis with Probabilistic Programming

08/18/2016
by Feras Saad, et al.

Probabilistic techniques are central to data analysis, but different approaches can be difficult to apply, combine, and compare. This paper introduces composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques. Examples include hierarchical Bayesian models, multivariate kernel methods, discriminative machine learning, clustering algorithms, dimensionality reduction, and arbitrary probabilistic programs. We also demonstrate the integration of CGPMs into BayesDB, a probabilistic programming platform that can express data analysis tasks using a modeling language and a structured query language. The practical value is illustrated in two ways. First, CGPMs are used in an analysis that identifies satellite data records which probably violate Kepler's Third Law, by composing causal probabilistic programs with non-parametric Bayes in under 50 lines of probabilistic code. Second, for several representative data analysis tasks, we report on lines of code and accuracy measurements of various CGPMs, plus comparisons with standard baseline solutions from Python and MATLAB libraries.
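The Kepler analysis mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical CGPM-style component exposing `simulate` and `logpdf` methods, a Gaussian noise model around the period predicted by Kepler's Third Law, and hypothetical record fields (apogee and perigee altitudes in km, period in minutes). It is not the paper's implementation or the BayesDB API. The paper composes this kind of causal model with non-parametric Bayes, whereas this sketch simply flags records whose log-likelihood falls below a fixed, assumed threshold.

```python
import math
import random
from dataclasses import dataclass

# Standard gravitational parameter of Earth, GM, in km^3 / s^2.
GM_EARTH = 398600.4418
# Equatorial radius of Earth in km (assumed for converting altitudes to orbital radii).
R_EARTH = 6378.137


def kepler_period_minutes(apogee_km: float, perigee_km: float) -> float:
    """Orbital period predicted by Kepler's Third Law, T = 2*pi*sqrt(a^3 / GM)."""
    # Semi-major axis: mean of the apoapsis and periapsis radii (altitude + Earth radius).
    a = (apogee_km + perigee_km) / 2.0 + R_EARTH
    # Period in seconds, converted to minutes.
    return 2.0 * math.pi * math.sqrt(a ** 3 / GM_EARTH) / 60.0


@dataclass
class KeplerCGPM:
    """Hypothetical CGPM-style component: period | (apogee, perigee) ~ Normal(kepler, sigma)."""
    sigma: float = 1.0  # assumed noise scale, in minutes

    def simulate(self, apogee_km: float, perigee_km: float, rng: random.Random) -> float:
        # Draw a noisy period around the Kepler prediction.
        return rng.gauss(kepler_period_minutes(apogee_km, perigee_km), self.sigma)

    def logpdf(self, period_min: float, apogee_km: float, perigee_km: float) -> float:
        # Gaussian log density of the observed period given the orbit.
        mu = kepler_period_minutes(apogee_km, perigee_km)
        z = (period_min - mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma * math.sqrt(2.0 * math.pi))


def flag_violations(records, model: KeplerCGPM, threshold: float = -10.0):
    """Return names of (name, apogee_km, perigee_km, period_min) records that are improbable."""
    return [name for name, apo, peri, period in records
            if model.logpdf(period, apo, peri) < threshold]
```

For example, a circular orbit at 400 km altitude gives a predicted period of roughly 92.6 minutes, so a hypothetical record `("suspect", 400, 400, 140.0)` would be flagged while `("consistent", 400, 400, 92.6)` would not. A fixed threshold is the simplest possible decision rule; replacing it with a learned model of the deviations is closer in spirit to the composition the abstract describes.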
