Selectivity Estimation with Deep Likelihood Models

05/10/2019
by   Zongheng Yang, et al.

Selectivity estimation has long been grounded in statistical tools for density estimation. To capture the rich multivariate distributions of relational tables, we propose the use of a new type of high-capacity statistical model: deep likelihood models. However, applying these models directly yields a limited estimator that is prohibitively expensive to evaluate for range and wildcard predicates. To make a truly usable estimator, we develop a Monte Carlo integration scheme on top of likelihood models that can efficiently handle range queries with dozens of filters or more. Like classical synopses, our estimator summarizes the data without supervision. Unlike previous solutions, our estimator approximates the joint data distribution without any independence assumptions. When evaluated on real-world datasets and compared against real systems and dominant families of techniques, our likelihood-model-based estimator achieves single-digit multiplicative error at the tail, a 40-200× accuracy improvement over the second-best method, and is space- and runtime-efficient.
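The abstract does not spell out the integration scheme, so the following is only a minimal sketch of one way a Monte Carlo estimate of a range query's selectivity could be built on top of an autoregressive likelihood model. It assumes the model exposes per-column conditionals p(x_i | x_<i); here those conditionals are read off empirical counts of a toy table as a stand-in for a learned deep model. The names `conditional`, `estimate_selectivity`, and `true_selectivity`, the toy table, and the sampling strategy (restrict each conditional to the queried range, multiply the in-range masses, sample the next value from the restricted distribution) are all illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter


def conditional(rows, prefix):
    """Empirical p(x_i | x_<i = prefix). Stand-in for a learned model (assumption)."""
    i = len(prefix)
    counts = Counter(r[i] for r in rows if r[:i] == prefix)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def estimate_selectivity(rows, predicates, num_samples=1000, seed=0):
    """Monte Carlo estimate of the fraction of rows matching `predicates`.

    predicates[i] is a set of allowed values for column i, or None (wildcard).
    """
    rng = random.Random(seed)
    total_weight = 0.0
    for _ in range(num_samples):
        prefix, weight = (), 1.0
        for allowed in predicates:
            dist = conditional(rows, prefix)
            # Keep only the part of the conditional that satisfies the filter.
            in_range = {v: p for v, p in dist.items()
                        if allowed is None or v in allowed}
            mass = sum(in_range.values())
            if mass == 0.0:
                weight = 0.0  # no value in range given this prefix
                break
            weight *= mass
            # Draw the next column value from the restricted conditional.
            values = list(in_range)
            value = rng.choices(values, weights=[in_range[v] for v in values])[0]
            prefix += (value,)
        total_weight += weight
    return total_weight / num_samples


def true_selectivity(rows, predicates):
    """Exact selectivity by scanning the table, for comparison."""
    hits = sum(all(a is None or v in a for v, a in zip(r, predicates))
               for r in rows)
    return hits / len(rows)


# Toy table with correlated columns: b depends on a, c is independent.
data_rng = random.Random(1)
rows = []
for _ in range(300):
    a = data_rng.randint(0, 4)
    b = (a + data_rng.randint(0, 1)) % 5
    c = data_rng.randint(0, 9)
    rows.append((a, b, c))

query = [{1, 2, 3}, None, set(range(5))]  # a in [1, 3], b wildcard, c in [0, 4]
print("estimate:", estimate_selectivity(rows, query))
print("exact:   ", true_selectivity(rows, query))
```

Because each sample only ever visits in-range values, the cost per sample grows with the number of columns rather than with the size of the queried region, which is the property that makes range and wildcard predicates tractable in a sketch like this.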


research
07/26/2021

A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation

Cardinality estimation is a fundamental problem in database systems. To ...
research
11/25/2019

Analysis of odds, probability, and hazard ratios: From 2 by 2 tables to two-sample survival data

Analysis of 2 by 2 tables and two-sample survival data has been widely u...
research
02/06/2019

Fast Mean Estimation with Sub-Gaussian Rates

We propose an estimator for the mean of a random vector in R^d that can ...
research
03/15/2012

A Family of Computationally Efficient and Simple Estimators for Unnormalized Statistical Models

We introduce a new family of estimators for unnormalized statistical mod...
research
07/24/2018

Automatic Bayesian Density Analysis

Making sense of a dataset in an automatic and unsupervised fashion is a ...
research
11/26/2020

Generative Learning of Heterogeneous Tail Dependence

We propose a multivariate generative model to capture the complex depend...
research
02/06/2022

Learning to be a Statistician: Learned Estimator for Number of Distinct Values

Estimating the number of distinct values (NDV) in a column is useful for...
