Learning to Sample: Counting with Complex Queries

06/21/2019
by   Brett Walenz, et al.

In this paper we present a suite of methods to efficiently estimate counts for a generalized class of filters and queries (such as user-defined functions, join predicates, or correlated subqueries). For such queries, traditional sampling techniques may not be applicable: the complexity of the filter can prevent sampling over joins, and sampling after the join may be infeasible because of the cost of computing the full join. Our methods approximate a query's complex filters with a (faster) probabilistic classifier. From one trained classifier, we estimate counts using either weighted or stratified sampling, or directly quantify counts using classifier outputs on test data. We analyze our methods both theoretically and empirically. Theoretical results indicate that a classifier with certain performance guarantees yields an estimator whose counts have much tighter confidence intervals than those of classical simple random sampling or stratified sampling. We evaluate our methods on diverse scenarios using different data sets, counts, and filters; the results empirically validate the accuracy and efficiency of our approach.
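
To make the stratified-sampling idea concrete, here is a minimal Python sketch of how classifier scores might guide a count estimate. It is an illustration under stated assumptions, not the paper's algorithm: the function name `stratified_count_estimate`, the equal-size strata, the even split of the labeling budget, and the parameters `score`, `predicate`, `n_strata`, and `budget` are all hypothetical choices made for this example. Here `score` stands in for the cheap probabilistic classifier and `predicate` for the expensive filter.

```python
import random


def stratified_count_estimate(items, score, predicate,
                              n_strata=4, budget=200, seed=0):
    """Estimate how many items satisfy `predicate`, evaluating it on
    at most roughly `budget` items.

    Assumptions (illustrative only): strata are equal-size groups of
    items ranked by classifier score, and the budget is split evenly
    across strata.
    """
    if not items:
        return 0.0
    rng = random.Random(seed)

    # Rank items by the cheap classifier score, then cut into strata.
    ranked = sorted(items, key=score)
    size = -(-len(ranked) // n_strata)  # ceiling division
    strata = [ranked[i:i + size] for i in range(0, len(ranked), size)]

    per_stratum = max(1, budget // len(strata))
    estimate = 0.0
    for stratum in strata:
        k = min(per_stratum, len(stratum))
        sample = rng.sample(stratum, k)
        # The expensive predicate runs only on the sampled items.
        hits = sum(1 for x in sample if predicate(x))
        # Scale the sampled hit rate up to the stratum size.
        estimate += len(stratum) * hits / k
    return estimate


if __name__ == "__main__":
    data = list(range(1000))
    expensive = lambda x: x < 250               # stand-in for a costly filter
    perfect = lambda x: 1.0 if x < 250 else 0.0  # idealized classifier score
    print(stratified_count_estimate(data, perfect, expensive, budget=40))
```

In this idealized run the score separates the data perfectly, so every stratum is homogeneous and 40 predicate evaluations recover the exact count. With a noisier score the strata become mixed and the estimate varies from sample to sample, which is consistent with the abstract's point that the quality of the classifier drives how tight the confidence intervals can be.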
