Detecting Outliers in Data with Correlated Measures

08/26/2018
by Yu-Hsuan Kuo, et al.

Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operation faults. To use such data in real-world applications, it is critical to detect outliers so that models built from these datasets are not skewed by them. In this paper, we propose a new outlier detection method that exploits the correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our method achieves better performance, demonstrating its robustness and generality. Finally, we report interesting case studies on some outliers that result from atypical events.
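
The abstract does not spell out the model, so the following is only a rough illustration of what "explicitly modeling the outliers while fitting" can look like, not the authors' method. It assumes a mean-shift formulation, y = Xβ + γ + noise, where γ is a sparse per-point offset that is nonzero only for outliers, fitted by alternating least squares with soft-thresholding. The function names, the threshold `lam`, and the single-feature setup (e.g., trip distance predicting trip time) are hypothetical choices made for this sketch.

```python
import numpy as np

def soft_threshold(r, lam):
    # Shrink residuals toward zero; values with |r| <= lam become exactly 0.
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def fit_mean_shift(x, y, lam, n_iter=100):
    """Alternate between fitting a line and estimating sparse outlier offsets.

    Assumed model (illustrative only): y = b0 + b1 * x + gamma + noise,
    where gamma[i] != 0 marks point i as an outlier.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept + one feature
    gamma = np.zeros_like(y)
    for _ in range(n_iter):
        # Step 1: ordinary least squares on the outlier-corrected targets.
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        # Step 2: only residuals larger than lam are kept as outlier offsets.
        gamma = soft_threshold(y - X @ beta, lam)
    return beta, gamma
```

Calling `beta, gamma = fit_mean_shift(x, y, lam)` returns the fitted coefficients together with the offsets, and `np.flatnonzero(gamma)` lists the indices flagged as outliers. The threshold `lam` should be set relative to the expected noise scale: too small and ordinary points are flagged, too large and real outliers pull the fit.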


