Reinforced Approximate Exploratory Data Analysis

12/12/2022
by Shaddy Garg, et al.

Exploratory data analytics (EDA) is a sequential decision-making process in which analysts choose each subsequent query, based on the previous queries and their results, in the hope that it leads to interesting insights. Data processing systems often execute queries on samples to produce results with low latency. Different downsampling strategies preserve different statistics of the data and yield different magnitudes of latency reduction. The optimal choice of sampling strategy often depends on the particular context of the analysis flow and the hidden intent of the analyst. In this paper, we are the first to consider the impact of sampling in interactive data exploration settings, where samples introduce approximation errors. We propose a Deep Reinforcement Learning (DRL) based framework that optimizes sample selection in order to keep the analysis and insight generation flow intact. Evaluations on three real datasets show that our technique preserves the original insight generation flow while improving interaction latency compared to baseline methods.
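As a rough illustration of the claim that different downsampling strategies preserve different statistics, the sketch below compares a uniform sample with a stratified sample on a toy table containing one rare group. It is not the paper's method: the data, the column names (`region`, `sales`), and the helper functions are hypothetical and chosen only for this example.

```python
import random
from collections import Counter, defaultdict


def uniform_sample(rows, fraction, rng):
    """Draw a fixed-size sample in which every row is equally likely."""
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def stratified_sample(rows, key, fraction, rng):
    """Sample the same fraction inside each group, keeping at least one row per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def mean(rows, field):
    """Arithmetic mean of one numeric field."""
    return sum(r[field] for r in rows) / len(rows)


# Hypothetical table: one large group and one rare group with very different values.
rng = random.Random(0)
rows = [{"region": "north", "sales": rng.gauss(100, 10)} for _ in range(9990)]
rows += [{"region": "islands", "sales": rng.gauss(500, 10)} for _ in range(10)]

uni = uniform_sample(rows, 0.01, rng)
strat = stratified_sample(rows, "region", 0.01, rng)

# A uniform sample tends to track the overall mean, but may drop the rare group.
print("full mean:", mean(rows, "sales"))
print("uniform mean:", mean(uni, "sales"), Counter(r["region"] for r in uni))
# A stratified sample always keeps every group, so per-group queries still return rows.
print("stratified groups:", Counter(r["region"] for r in strat))
```

Both samples are roughly 1% of the table, so their latency benefit is similar, yet a follow-up query such as "average sales per region" is only guaranteed an answer for every region under the stratified sample. Which of the two is preferable depends on what the analyst asks next, which is the kind of context-dependent choice the abstract describes.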

research
07/30/2018

To Ship or Not to (Function) Ship (Extended version)

Sampling is often used to reduce query latency for interactive big data ...
research
01/28/2022

Electra: Conditional Generative Model based Predicate-Aware Query Approximation

The goal of Approximate Query Processing (AQP) is to provide very fast b...
research
07/12/2017

Foresight: Recommending Visual Insights

Current tools for exploratory data analysis (EDA) require users to manua...
research
05/02/2018

BlazeIt: Fast Exploratory Video Queries using Neural Networks

As video volumes grow, analysts have increasingly turned to deep learnin...
research
07/29/2021

Interactive Region-of-Interest Discovery using Exploratory Feedback

In this paper, we propose a geospatial data management framework called ...
research
03/03/2021

Enhancing the Interactivity of Dataframe Queries by Leveraging Think Time

We propose opportunistic evaluation, a framework for accelerating intera...
research
11/16/2015

How much does your data exploration overfit? Controlling bias via information usage

Modern data is messy and high-dimensional, and it is often not clear a p...
