Consistent and Flexible Selectivity Estimation for High-dimensional Data

05/20/2020
by Yaoshu Wang, et al.

Selectivity estimation is the task of estimating the number of database objects that satisfy a selection criterion. Solving this problem accurately and efficiently is essential to many applications, such as density estimation, outlier detection, query optimization, and data integration. The problem is especially challenging for large-scale high-dimensional data because of the curse of dimensionality, the large variance of selectivity across queries, and the need for a consistent estimator (i.e., one whose estimated selectivity is non-decreasing in the threshold). We propose a new deep learning-based model that learns a query-dependent piecewise linear function as the selectivity estimator. The function is flexible enough to fit the selectivity curve of any query object and threshold, while guaranteeing that the output is non-decreasing in the threshold. To improve accuracy on large datasets, we partition the dataset into multiple disjoint subsets and build a local model on each of them. Experiments on real datasets show that the proposed model significantly outperforms state-of-the-art models in accuracy and is competitive in efficiency.
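To make the consistency property concrete, here is a minimal sketch of one way to build a piecewise linear function that is non-decreasing by construction. It is an illustration under stated assumptions, not the model from the paper: the function name `monotone_piecewise_linear`, the fixed `knots`, and the hand-picked `raw_increments` are all hypothetical stand-ins for what a query-dependent network would output.

```python
import numpy as np

def monotone_piecewise_linear(knots, raw_increments):
    """Build f(t), a non-decreasing piecewise linear function of the threshold t.

    knots: strictly increasing threshold values (length n + 1), hypothetical.
    raw_increments: unconstrained real numbers (length n), standing in for
        the outputs a learned, query-dependent model might produce.
    """
    knots = np.asarray(knots, dtype=float)
    raw = np.asarray(raw_increments, dtype=float)
    # Softplus, log(1 + exp(x)), turns any real number into a positive increment.
    increments = np.logaddexp(0.0, raw)
    # Cumulative sums of positive increments give non-decreasing knot values.
    values = np.concatenate([[0.0], np.cumsum(increments)])

    def f(t):
        # Linear interpolation between non-decreasing knot values is itself
        # non-decreasing; np.interp clamps outside the knot range.
        return np.interp(t, knots, values)

    return f

# Usage: evaluate the estimator on a grid of thresholds.
f = monotone_piecewise_linear([0.0, 0.5, 1.0, 2.0], [-1.0, 2.0, 0.3])
thresholds = np.linspace(0.0, 2.0, 50)
estimates = f(thresholds)
```

Because monotonicity comes from the parameterization (positive increments accumulated over increasing knots) rather than from a penalty term, it holds for any values of the unconstrained inputs.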
