On Efficient Approximate Queries over Machine Learning Models

06/06/2022
by Dujian Ding, et al.

Answering queries over ML predictions has been gaining attention in the database community. The problem is challenging because finding high-quality answers naively requires invoking an oracle, such as a human expert or an expensive deep neural network model, on every single item in the DB and then applying the query. We develop a novel unified framework for approximate query answering that leverages a proxy to minimize the oracle usage needed to find high-quality answers for both Precision-Target (PT) and Recall-Target (RT) queries. Our framework judiciously combines invoking the expensive oracle on data samples with applying the cheap proxy to the objects in the DB. It relies on one of two assumptions. Under the Proxy Quality assumption, proxy quality can be quantified in a probabilistic manner w.r.t. the oracle. This allows us to develop two algorithms: PQA, which efficiently finds high-quality answers with high probability and no oracle calls, and PQE, a heuristic extension that achieves empirically good performance with a small number of oracle calls. Alternatively, under the Core Set Closure assumption, we develop two algorithms: CSC, which efficiently returns high-quality answers with high probability and minimal oracle usage, and CSE, which extends it to more general settings. Our extensive experiments on five real-world datasets and both query types, PT and RT, demonstrate that our algorithms outperform the state of the art and achieve high result quality with provable statistical guarantees.
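To make the setting concrete, here is a minimal sketch of a Precision-Target query answered with a cheap proxy and a small oracle sample. It is not PQA, PQE, CSC, or CSE, and it carries no statistical guarantee. It only illustrates the general idea of calling the oracle on a sample and applying the proxy to the whole DB. All names (`precision_recall`, `naive_pt_select`, `proxy_score`, `oracle`) are hypothetical, and the thresholding rule is an assumption made for illustration.

```python
import random

def precision_recall(answer, truth):
    """Precision and recall of `answer` w.r.t. the oracle-positive set `truth`."""
    answer, truth = set(answer), set(truth)
    hits = len(answer & truth)
    precision = hits / len(answer) if answer else 1.0
    recall = hits / len(truth) if truth else 1.0
    return precision, recall

def naive_pt_select(items, proxy_score, oracle, target_precision,
                    sample_size, seed=0):
    """Illustrative only: choose the lowest proxy threshold whose precision
    on an oracle-labelled sample meets the target, then apply that
    threshold to every item using the proxy alone."""
    rng = random.Random(seed)
    sample = rng.sample(items, min(sample_size, len(items)))
    # The oracle is invoked only on the sample, never on the full DB.
    labelled = sorted(((proxy_score(x), oracle(x)) for x in sample),
                      key=lambda pair: pair[0], reverse=True)
    best, positives = None, 0
    for k, (score, label) in enumerate(labelled, start=1):
        positives += bool(label)
        # Only consider a cut at the end of a run of tied scores.
        end_of_tie = k == len(labelled) or labelled[k][0] != score
        if end_of_tie and positives / k >= target_precision:
            best = score
    if best is None:
        return []
    return [x for x in items if proxy_score(x) >= best]

if __name__ == "__main__":
    items = list(range(100))
    answer = naive_pt_select(items, lambda x: x / 100, lambda x: x >= 50,
                             target_precision=0.9, sample_size=20)
    truth = [x for x in items if x >= 50]
    print(precision_recall(answer, truth))
```

A Recall-Target variant would instead look for the highest threshold whose sample recall meets the target. In both cases, a sample-based estimate alone does not guarantee that the target holds on the full DB, which is the gap that methods with provable guarantees are designed to close.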

research
04/02/2020

Approximate Selection with Guarantees using Proxies

Due to the falling costs of data acquisition and storage, researchers an...
research
07/10/2021

NeuroDB: A Neural Network Framework for Answering Range Aggregate Queries and Beyond

Range aggregate queries (RAQs) are an integral part of many real-world a...
research
01/02/2022

Optimizing Machine Learning Inference Queries with Correlative Proxy Models

We consider accelerating machine learning (ML) inference queries on unst...
research
11/20/2017

Relaxed Oracles for Semi-Supervised Clustering

Pairwise "same-cluster" queries are one of the most widely used forms of...
research
08/17/2023

Accelerating Aggregation Queries on Unstructured Streams of Data

Analysts and scientists are interested in querying streams of video, aud...
research
06/02/2023

Fast Interactive Search with a Scale-Free Comparison Oracle

A comparison-based search algorithm lets a user find a target item t in ...
research
09/30/2017

Enabling Quality Control for Entity Resolution: A Human and Machine Cooperative Framework

Even though many machine algorithms have been proposed for entity resolu...
