QUIP: Query-driven Missing Value Imputation

03/31/2022
by   Yiming Lin, et al.
Missing values are common in real-world data sets, and failing to clean them can degrade the quality of query answers. Traditionally, missing value imputation has been studied as an offline process that is part of preparing data for analysis. This paper studies query-time missing value imputation and proposes QUIP, which imputes only the minimal set of missing values needed to answer the query. Specifically, taking a reasonably good query plan as input, QUIP tries to minimize both the missing value imputation cost and the query processing overhead. QUIP proposes a new implementation of outer join that preserves missing values during query processing, and a Bloom filter based index structure that reduces the space and runtime overhead. QUIP also designs a cost-based decision function that automatically guides each operator to either impute missing values immediately or delay the imputation. Efficient optimizations are proposed to speed up aggregate operations in QUIP, such as the MAX/MIN operators. Extensive experiments on both real and synthetic data sets demonstrate the effectiveness and efficiency of QUIP, which outperforms the state-of-the-art ImputeDB by 2 to 10 times on different query sets and data sets, and achieves an order-of-magnitude improvement over the offline approach.
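
The idea of imputing only what the query needs can be illustrated with a small Python sketch. This is not QUIP's algorithm: it has no cost-based decision function, outer join, or Bloom filter index. It only contrasts query-time imputation with offline imputation on a single filter-then-project query. The toy table, the mean imputer, and all function names (`mean_imputer`, `offline_imputation_count`, `lazy_select`) are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]
Imputer = Callable[[Row], float]


def mean_imputer(rows: List[Row], column: str) -> Imputer:
    """Hypothetical stand-in imputer: returns the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    mean = sum(observed) / len(observed)
    return lambda row: mean


def offline_imputation_count(rows: List[Row], columns: List[str]) -> int:
    """Offline cleaning imputes every missing cell before any query runs."""
    return sum(1 for r in rows for c in columns if r[c] is None)


def lazy_select(
    rows: List[Row],
    filter_col: str,
    predicate: Callable[[float], bool],
    project_col: str,
    imputers: Dict[str, Imputer],
) -> Tuple[List[float], int]:
    """Evaluate SELECT project_col WHERE predicate(filter_col),
    imputing a cell only when the query actually needs its value."""
    imputations = 0
    result = []
    for row in rows:
        f = row[filter_col]
        if f is None:
            # The predicate cannot be decided without this value.
            f = imputers[filter_col](row)
            imputations += 1
        if not predicate(f):
            # Row is filtered out: its other missing cells are never imputed.
            continue
        v = row[project_col]
        if v is None:
            v = imputers[project_col](row)
            imputations += 1
        result.append(v)
    return result, imputations


# Toy table (made up for this sketch); None marks a missing value.
rows = [
    {"temp": 20.0, "humidity": None},
    {"temp": None, "humidity": 40.0},
    {"temp": 30.0, "humidity": 50.0},
    {"temp": 10.0, "humidity": None},
    {"temp": None, "humidity": None},
]
imputers = {c: mean_imputer(rows, c) for c in ("temp", "humidity")}

# SELECT humidity WHERE temp > 15
values, n_lazy = lazy_select(rows, "temp", lambda t: t > 15, "humidity", imputers)
n_offline = offline_imputation_count(rows, ["temp", "humidity"])
print(values, n_lazy, n_offline)
```

On this toy table the query-time version skips the missing humidity of the row whose temperature fails the predicate, so it performs fewer imputations than cleaning every cell up front. With an expensive imputer and a selective predicate the gap would grow, which is the kind of saving the abstract describes.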

research
08/23/2019

Efficient Join Processing Over Incomplete Data Streams (Technical Report)

For decades, the join operator over fast data streams has always drawn m...
research
04/08/2020

Fast and Reliable Missing Data Contingency Analysis with Predicate-Constraints

Today, data analysts largely rely on intuition to determine whether miss...
research
10/28/2019

Missing Value Imputation for Mixed Data Through Gaussian Copula

Missing data imputation forms the first critical step of many data analy...
research
06/29/2023

Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach

Numerical data imputation algorithms replace missing values by estimates...
research
11/04/2014

Iterated geometric harmonics for data imputation and reconstruction of missing data

The method of geometric harmonics is adapted to the situation of incompl...
research
09/05/2018

Anomaly Detection in the Presence of Missing Values

Standard methods for anomaly detection assume that all features are obse...
