Efficient Discovery of Meaningful Outlier Relationships

10/19/2019
by Aline Bessa, et al.

We propose PODS (Predictable Outliers in Data-trendS), a method that, given a collection of temporal data sets, derives data-driven explanations for outliers by identifying meaningful relationships between them. First, we formalize the notion of meaningfulness, which so far has been informally framed in terms of explainability. Next, since outliers are rare and it is difficult to determine whether their relationships are meaningful, we develop a new criterion that does so by checking if these relationships could have been predicted from non-outliers, i.e., if we could see the outlier relationships coming. Finally, searching for meaningful outlier relationships between every pair of data sets in a large data collection is computationally infeasible. To address that, we propose an indexing strategy that prunes irrelevant comparisons across data sets, making the approach scalable. We present the results of an experimental evaluation using real data sets and different baselines, which demonstrates the effectiveness, robustness, and scalability of our approach.
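The idea of checking whether an outlier relationship "could have been predicted from non-outliers" can be illustrated with a small sketch. This is not the PODS criterion itself, which the abstract does not specify. It assumes two already-aligned numeric series, a given boolean outlier mask, a simple least-squares line as the predictive model, and a hypothetical tolerance `tol` expressed in multiples of the non-outlier residual error. The function names are illustrative only.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are constant; cannot fit a line")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predictable_outlier_fraction(xs, ys, is_outlier, tol=2.0):
    """Fraction of outlier points whose y value is predicted from x
    by a line fitted on the non-outlier points only.

    `tol` is an assumed threshold: an outlier counts as predicted when
    its residual is at most `tol` times the RMS residual of non-outliers.
    """
    inliers = [(x, y) for x, y, o in zip(xs, ys, is_outlier) if not o]
    outliers = [(x, y) for x, y, o in zip(xs, ys, is_outlier) if o]
    if len(inliers) < 2 or not outliers:
        return 0.0

    a, b = fit_line([x for x, _ in inliers], [y for _, y in inliers])

    # Typical prediction error on the data the model was fitted on.
    rms = sqrt(mean((y - (a * x + b)) ** 2 for x, y in inliers))
    limit = tol * max(rms, 1e-9)  # avoid a zero limit for perfect fits

    hits = sum(1 for x, y in outliers if abs(y - (a * x + b)) <= limit)
    return hits / len(outliers)


# Toy data: the last point is an outlier in both series.
xs = [1, 2, 3, 4, 5, 20]
mask = [False, False, False, False, False, True]
follows_trend = [2.1, 3.9, 6.0, 8.1, 9.9, 39.7]
breaks_trend = [2.1, 3.9, 6.0, 8.1, 9.9, 5.0]

print(predictable_outlier_fraction(xs, follows_trend, mask))
print(predictable_outlier_fraction(xs, breaks_trend, mask))
```

In the first toy series the outlier continues the trend learned from the non-outliers, so under these assumptions it would count as predictable. In the second it does not.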


research
01/11/2023

ODIM: an efficient method to detect outliers via inlier-memorization effect of deep generative models

Identifying whether a given sample is an outlier or not is an important ...
research
04/15/2020

Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data

Benchmarking unsupervised outlier detection is difficult. Outliers are r...
research
12/12/2017

Outlier Detection by Consistent Data Selection Method

Often the challenge associated with tasks like fraud and spam detection[...
research
12/16/2019

Detecting and Classifying Outliers in Big Functional Data

This paper proposes two new outlier detection methods, which are useful ...
research
05/02/2016

Linear-time Outlier Detection via Sensitivity

Outliers are ubiquitous in modern data sets. Distance-based techniques a...
research
04/05/2019

Outlier Detection for Improved Data Quality and Diversity in Dialog Systems

In a corpus of data, outliers are either errors: mistakes in the data th...
research
10/22/2021

DeepAg: Deep Learning Approach for Measuring the Effects of Outlier Events on Agricultural Production and Policy

Quantitative metrics that measure the global economy's equilibrium have ...
