Accelerating Aggregation Queries on Unstructured Streams of Data

08/17/2023
by Matthew Russo, et al.

Analysts and scientists are interested in querying streams of video, audio, and text to extract quantitative insights. For example, an urban planner may wish to measure congestion by querying the live feed from a traffic camera. Prior work has used deep neural networks (DNNs) to answer such queries in the batch setting. However, much of this work is not suited to the streaming setting because it either requires access to the entire dataset before a query can be submitted or is specific to video. Thus, to the best of our knowledge, no prior work addresses the problem of efficiently answering queries over multiple modalities of streams. In this work we propose InQuest, a system for accelerating aggregation queries on unstructured streams of data with statistical guarantees on query accuracy. InQuest leverages inexpensive approximation models ("proxies") and sampling techniques to limit the execution of an expensive high-precision model (an "oracle") to a subset of the stream. It then uses the oracle predictions to compute an approximate query answer in real time. We theoretically analyze InQuest and show that the expected error of its query estimates converges on stationary streams at a rate inversely proportional to the oracle budget. We evaluate our algorithm on six real-world video and text datasets and show that InQuest achieves the same root mean squared error (RMSE) as two streaming baselines with up to 5.0x fewer oracle invocations. We further show that InQuest can achieve up to 1.9x lower RMSE at a fixed number of oracle invocations than a state-of-the-art batch-setting algorithm.
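
The general idea of using a cheap proxy to decide where a limited oracle budget is spent can be illustrated with a small sketch. This is not InQuest's algorithm, which the abstract does not specify; it is a generic proxy-stratified sampling estimator for the mean over one segment of a stream. The function name `stratified_estimate`, its parameters, the equal-size strata, and the even split of the budget across strata are all assumptions made for illustration.

```python
import random
from typing import Callable, Sequence


def stratified_estimate(
    records: Sequence[object],
    proxy: Callable[[object], float],
    oracle: Callable[[object], float],
    budget: int,
    num_strata: int = 3,
    seed: int = 0,
) -> float:
    """Estimate the mean of oracle(record) over one stream segment.

    Hypothetical helper: the oracle is called on at most `budget` records.
    """
    if not records:
        raise ValueError("segment is empty")
    if budget < num_strata:
        raise ValueError("budget must allow one oracle call per stratum")

    rng = random.Random(seed)

    # Rank records by the cheap proxy score, then cut the ranking into
    # (roughly) equal-size strata so similar records are grouped together.
    order = sorted(range(len(records)), key=lambda i: proxy(records[i]))
    size = -(-len(order) // num_strata)  # ceiling division
    strata = [order[k:k + size] for k in range(0, len(order), size)]

    # Assumption: split the oracle budget evenly across the strata.
    per_stratum = budget // len(strata)

    estimate = 0.0
    for stratum in strata:
        # Run the expensive oracle only on a random sample of the stratum.
        sample = rng.sample(stratum, min(per_stratum, len(stratum)))
        stratum_mean = sum(oracle(records[i]) for i in sample) / len(sample)
        # Weight each stratum mean by the stratum's share of the segment.
        estimate += stratum_mean * len(stratum) / len(records)
    return estimate


# Usage: toy segment where the "oracle" flags even numbers and the proxy
# is a noisy stand-in for it (both are made up for this example).
segment = list(range(100))
approx = stratified_estimate(
    segment,
    proxy=lambda x: (x % 2 == 0) + random.Random(x).random() * 0.5,
    oracle=lambda x: float(x % 2 == 0),
    budget=30,
)
```

In a streaming setting, a sketch like this would be applied segment by segment as records arrive; how strata and per-stratum budgets should adapt over time is exactly the kind of design decision the paper itself addresses and that this example leaves out.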


research
09/09/2020

Task-agnostic Indexes for Deep Learning-based Queries over Unstructured Data

Unstructured data is now commonly queried by using target deep neural ne...
research
07/21/2020

Static and Streaming Data Structures for Fréchet Distance Queries

Given a curve P with points in ℝ^d in a streaming fashion, and parameter...
research
07/15/2020

VidCEP: Complex Event Processing Framework to Detect Spatiotemporal Patterns in Video Streams

Video data is highly expressive and has traditionally been very difficul...
research
12/04/2021

Efficient Deterministic Quantitative Group Testing for Precise Information Retrieval

The Quantitative Group Testing (QGT) is about learning a (hidden) subset...
research
06/06/2022

On Efficient Approximate Queries over Machine Learning Models

The question of answering queries over ML predictions has been gaining a...
research
04/02/2020

Approximate Selection with Guarantees using Proxies

Due to the falling costs of data acquisition and storage, researchers an...
research
07/30/2020

Bounded-Memory Criteria for Streams with Application Time

Bounded-memory computability continues to be in the focus of those areas...
