Selectivity Estimation with Deep Likelihood Models

by Zongheng Yang, et al.
berkeley college

Selectivity estimation has long been grounded in statistical tools for density estimation. To capture the rich multivariate distributions of relational tables, we propose the use of a new type of high-capacity statistical model: deep likelihood models. However, applying these models directly yields a limited estimator that is prohibitively expensive to evaluate for range and wildcard predicates. To make a truly usable estimator, we develop a Monte Carlo integration scheme on top of likelihood models that can efficiently handle range queries with dozens of filters or more. Like classical synopses, our estimator summarizes the data without supervision. Unlike previous solutions, our estimator approximates the joint data distribution without any independence assumptions. When evaluated on real-world datasets and compared against real systems and dominant families of techniques, our likelihood-model-based estimator achieves single-digit multiplicative error at the tail, a 40-200× accuracy improvement over the second-best method, and is space- and runtime-efficient.
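
The abstract does not spell out the integration scheme, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an autoregressive factorization P(x1) P(x2 | x1) ... and uses empirical conditionals counted from a toy table as a stand-in for a trained deep likelihood model. All names here (`conditional`, `estimate_selectivity`, `true_selectivity`, the toy `rows`) are hypothetical. Each sample walks the columns in order, multiplies in the probability mass that falls inside the queried range, and then draws the next value from within that range. A wildcard is expressed as an unbounded range.

```python
import random
from collections import Counter

NEG_INF, POS_INF = float("-inf"), float("inf")

def conditional(rows, prefix):
    """Empirical P(x_i | x_<i = prefix), counted from the table.

    Assumption: this stands in for a learned deep likelihood model.
    """
    i = len(prefix)
    counts = Counter(r[i] for r in rows if r[:i] == prefix)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def estimate_selectivity(rows, ranges, num_samples=300, rng=None):
    """Monte Carlo estimate of P(lo_i <= x_i <= hi_i for every column i)."""
    rng = rng or random.Random(0)
    total_weight = 0.0
    for _ in range(num_samples):
        prefix, weight = (), 1.0
        for lo, hi in ranges:
            dist = conditional(rows, prefix)
            # Keep only the values allowed by this column's filter.
            in_range = {v: p for v, p in dist.items() if lo <= v <= hi}
            mass = sum(in_range.values())
            if mass == 0:
                weight = 0.0  # no tuple can satisfy the query on this path
                break
            weight *= mass
            # Sample the next value from the in-range conditional
            # (rng.choices normalizes the weights internally).
            values = list(in_range)
            probs = [in_range[v] for v in values]
            prefix += (rng.choices(values, weights=probs)[0],)
        total_weight += weight
    return total_weight / num_samples

def true_selectivity(rows, ranges):
    """Exact selectivity by scanning the table, for comparison only."""
    hits = sum(
        all(lo <= v <= hi for v, (lo, hi) in zip(r, ranges)) for r in rows
    )
    return hits / len(rows)

# Toy table with three columns; the second is correlated with the first.
data_rng = random.Random(42)
rows = []
for _ in range(1000):
    a = data_rng.randrange(10)
    b = (a + data_rng.randrange(3)) % 10
    c = data_rng.randrange(5)
    rows.append((a, b, c))

# Two range filters plus a wildcard on the third column.
ranges = [(2, 6), (3, 8), (NEG_INF, POS_INF)]
est = estimate_selectivity(rows, ranges)
truth = true_selectivity(rows, ranges)
print(f"estimate={est:.4f} exact={truth:.4f}")
```

Because each sample only visits values inside the queried ranges, the cost grows with the number of samples and columns rather than with the size of the range, which is the property that makes wide range and wildcard predicates tractable in this style of estimator.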



