
Learning-based Support Estimation in Sublinear Time

06/15/2021
by Talya Eden, et al.

We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many applications, including biology, genomics, computer systems and linguistics. A line of research spanning the last decade resulted in algorithms that estimate the support up to ±εn from a sample of size O(log^2(1/ε) · n/log n), where n is the data set size. Unfortunately, this bound is known to be tight, limiting further improvements to the complexity of this problem. In this paper we consider estimation algorithms augmented with a machine-learning-based predictor that, given any element, returns an estimate of its frequency. We show that if the predictor is correct up to a constant approximation factor, then the sample complexity can be reduced significantly, to log(1/ε) · n^(1-Θ(1/log(1/ε))). We evaluate the proposed algorithms on a collection of data sets, using the neural-network-based estimators from Hsu et al. (ICLR'19) as predictors. Our experiments demonstrate substantial (up to 3x) improvements in the estimation accuracy compared to the state-of-the-art algorithm.
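To give some intuition for why a frequency predictor can help, here is a minimal sketch of a simple inverse-frequency estimator. It is not the algorithm proposed in the paper; it only illustrates the setting. It assumes a hypothetical `predict_count(x)` function that returns the predicted number of occurrences of `x` in the data set. Because a uniformly sampled element `x` is drawn with probability count(x)/n, the expected value of 1/count(x) equals (number of distinct elements)/n, so scaling the sample mean by n gives an estimate of the support size. The names `estimate_support`, `predict_count`, and `demo` are illustrative assumptions.

```python
import random
from collections import Counter


def estimate_support(data, predict_count, sample_size, rng):
    """Estimate the number of distinct elements in `data` from a uniform sample.

    `predict_count` is a hypothetical predictor: given an element, it returns
    an estimate of how many times that element occurs in `data`.
    """
    n = len(data)
    # Draw a uniform sample (with replacement) from the data set.
    sample = [data[rng.randrange(n)] for _ in range(sample_size)]
    # For a uniform draw x, E[1 / count(x)] = (distinct elements) / n.
    mean_inverse = sum(1.0 / predict_count(x) for x in sample) / sample_size
    return n * mean_inverse


def demo():
    # Toy data set: 100 distinct elements, each occurring 5 times (n = 500).
    data = [i for i in range(100) for _ in range(5)]
    counts = Counter(data)
    # Assumption: a perfect predictor, used here only for illustration.
    return estimate_support(data, lambda x: counts[x], 50, random.Random(0))
```

In this sketch, a predictor that is off by a constant factor shifts the estimate by the same factor, which is one reason the paper's guarantee for constant-factor predictors requires a more careful algorithm than this one.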


02/25/2020

An Improved Algorithm for Dynamic Set Cover

We consider the minimum set cover problem in a dynamic setting. Here, we...
02/25/2020

Dynamic Set Cover: Improved Amortized and Worst-Case Update Time

In the dynamic minimum set cover problem, a challenge is to minimize the...
05/18/2018

Fast Multivariate Log-Concave Density Estimation

We present a computational approach to log-concave density estimation. T...
06/09/2020

Sublinear Algorithms and Lower Bounds for Metric TSP Cost Estimation

We consider the problem of designing sublinear time algorithms for estim...
11/11/2022

Õptimal Differentially Private Learning of Thresholds and Quasi-Concave Optimization

The problem of learning threshold functions is a fundamental one in mach...
11/21/2017

Revisiting the Set Cover Conjecture

In the Set Cover problem, the input is a ground set of n elements and a ...
06/14/2020

Support Estimation with Sampling Artifacts and Errors

The problem of estimating the support of a distribution is of great impo...