Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

04/04/2017
by   Feras Saad, et al.
0

Databases are widespread, yet extracting relevant data can be difficult. Without substantial domain knowledge, multivariate search queries often return sparse or uninformative results. This paper introduces an approach for searching structured data based on probabilistic programming and nonparametric Bayes. Users specify queries in a probabilistic language that combines standard SQL database search operators with an information theoretic ranking function called predictive relevance. Predictive relevance can be calculated by a fast sparse matrix algorithm based on posterior samples from CrossCat, a nonparametric Bayesian model for high-dimensional, heterogeneously-typed data tables. The result is a flexible search technique that applies to a broad class of information retrieval problems, which we integrate into BayesDB, a probabilistic programming platform for probabilistic data analysis. This paper demonstrates applications to databases of US colleges, global macroeconomic indicators of public health, and classic cars. We found that human evaluators often prefer the results from probabilistic search to results from a standard baseline.

READ FULL TEXT

page 8

page 9

page 13

page 14

research
11/05/2016

Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes

Datasets with hundreds of variables and many missing values are commonpl...
research
12/15/2015

BayesDB: A probabilistic programming system for querying the probable implications of data

Is it possible to make statistical inference broadly accessible to non-s...
research
08/18/2016

Probabilistic Data Analysis with Probabilistic Programming

Probabilistic techniques are central to data analysis, but different app...
research
12/28/2021

Monads for Measurable Queries in Probabilistic Databases

We consider a bag (multiset) monad on the category of standard Borel spa...
research
06/08/2017

Securing Databases from Probabilistic Inference

Databases can leak confidential information when users combine query res...
research
12/02/2020

Complex Coordinate-Based Meta-Analysis with Probabilistic Programming

With the growing number of published functional magnetic resonance imagi...

Please sign up or login with your details

Forgot password? Click here to reset