Reptile: Aggregation-level Explanations for Hierarchical Data

03/12/2021
by Zezhou Huang, et al.
Recent query explanation systems help users understand anomalies in aggregation results by proposing predicates that describe input records that, if deleted, would resolve the anomalies. However, it can be difficult for users to understand how a predicate was chosen, and these approaches are limited to errors that can be resolved through deletion. In practice, data errors may instead be group-wise, such as missing records or systematic value errors. This paper presents Reptile, an explanation system for hierarchical data. Given an anomalous aggregate query result, Reptile recommends the next drill-down attribute, and ranks the drill-down groups by the extent to which repairing each group's statistics to their expected values resolves the anomaly. Reptile efficiently trains a multi-level model that leverages the data's hierarchy to estimate the expected values, and uses a factorised representation of the feature matrix to remove redundancies due to the data's hierarchical structure. We further extend model training to support factorised data, and develop a suite of optimizations that leverage the data's hierarchical structure. Reptile reduces end-to-end runtimes by more than 6× compared to a Matlab-based implementation, correctly identifies 21 of 30 data errors in Johns Hopkins' COVID-19 data, and correctly resolves 20 of 22 complaints in a user study using data and researchers from Columbia University's Financial Instruments Sector Team.
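The ranking idea in the abstract can be illustrated with a small sketch. It is not Reptile's implementation: the `Group` type, the `rank_groups_by_repair` function, the use of SUM as the aggregate, and the representation of the complaint as a numeric `target` are all assumptions made for illustration. The expected values are taken as given inputs here, whereas the abstract says Reptile estimates them with a multi-level model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Group:
    """One drill-down group (hypothetical structure, not from the paper)."""
    name: str
    observed: float  # observed aggregate (here: a SUM) for the group
    expected: float  # expected aggregate; assumed to be supplied by some model


def rank_groups_by_repair(groups: List[Group], target: float) -> List[str]:
    """Rank groups by how well repairing each one alone resolves the complaint.

    The complaint is modelled (an assumption) as "the overall SUM should be
    close to `target`". Repairing a group means replacing its observed value
    with its expected value; groups that leave the smallest remaining gap
    to the target are ranked first.
    """
    total = sum(g.observed for g in groups)
    scored = []
    for g in groups:
        repaired_total = total - g.observed + g.expected
        remaining_gap = abs(repaired_total - target)
        scored.append((remaining_gap, g.name))
    scored.sort()  # smallest remaining gap first; ties broken by name
    return [name for _, name in scored]


# Illustrative data only: group B is far below its expected value.
example_groups = [
    Group("A", observed=100.0, expected=100.0),
    Group("B", observed=40.0, expected=95.0),
    Group("C", observed=90.0, expected=100.0),
]
print(rank_groups_by_repair(example_groups, target=290.0))
```

In this toy data, repairing B moves the total from 230 to 285, leaving a gap of 5 to the target, while repairing C or A leaves gaps of 50 and 60, so B is ranked first.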
