Data Imputation through the Identification of Local Anomalies

09/30/2014
by Huseyin Ozkan, et al.

We introduce a comprehensive statistical framework, in a model-free setting, for the complete treatment of localized data corruptions caused by severe noise sources, e.g., an occluder in a visual recording. Within this framework, we propose i) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions in a given suspicious data instance and ii) a Maximum A Posteriori (MAP) estimator to impute the corrupted data. As a generalization of the Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and is empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous or normal, the binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independence structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be set directly without any parameter tuning. The proposed framework is tested on several well-known machine learning data sets with synthetically generated corruptions and is experimentally shown to produce remarkable improvements in classification performance, with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions.
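The pipeline described in the abstract (partition the attributes, test each part against clean reference data, then impute) can be illustrated with a small sketch. This is not the authors' algorithm. It assumes plain Euclidean distance instead of the proposed rank-based measure, uses a leave-one-out nearest-neighbour score with an empirical quantile threshold as the anomaly test, and replaces the MAP estimator with a copy from the nearest clean reference instance. All function names and the parameters `alpha` and `min_size` are hypothetical, and `alpha` here is only a per-part test level, not the false alarm guarantee derived in the paper.

```python
import math
import random


def part_distance(a, b, lo, hi):
    """Euclidean distance restricted to attributes lo..hi-1."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(lo, hi)))


def nn_distance(x, reference, lo, hi, skip=None):
    """Distance from x to its nearest reference instance on the part [lo, hi)."""
    return min(part_distance(x, r, lo, hi)
               for j, r in enumerate(reference) if j != skip)


def threshold(reference, lo, hi, alpha):
    """Empirical (1 - alpha) quantile of leave-one-out nearest-neighbour
    distances among the clean reference instances (assumed anomaly test)."""
    scores = sorted(nn_distance(r, reference, lo, hi, skip=j)
                    for j, r in enumerate(reference))
    k = int(math.ceil((1 - alpha) * len(scores))) - 1
    return scores[max(0, min(k, len(scores) - 1))]


def find_corrupted(x, reference, lo, hi, alpha=0.05, min_size=2):
    """Recursively halve the attribute range [lo, hi) and return the set of
    attribute indices that belong to parts flagged as anomalous."""
    if nn_distance(x, reference, lo, hi) <= threshold(reference, lo, hi, alpha):
        return set()                    # this part looks nominal
    if hi - lo <= min_size:
        return set(range(lo, hi))       # anomalous leaf: blame all of it
    mid = (lo + hi) // 2
    found = (find_corrupted(x, reference, lo, mid, alpha, min_size)
             | find_corrupted(x, reference, mid, hi, alpha, min_size))
    # If neither half is anomalous on its own, blame the whole part.
    return found or set(range(lo, hi))


def impute(x, reference, corrupted):
    """Fill corrupted attributes from the reference instance that is closest
    on the remaining attributes (a simple stand-in for a MAP estimate)."""
    clean = [i for i in range(len(x)) if i not in corrupted]
    if not clean or not corrupted:
        return list(x)
    best = min(reference,
               key=lambda r: sum((x[i] - r[i]) ** 2 for i in clean))
    return [best[i] if i in corrupted else x[i] for i in range(len(x))]


# Toy data: 100 clean instances with 8 attributes, each in [-1, 1].
random.seed(0)
reference = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(100)]

# For a deterministic demo, the suspicious instance is a copy of a clean
# instance with a large localized corruption on attributes 4 and 5.
x = list(reference[0])
x[4] += 50.0
x[5] += 50.0

corrupted = find_corrupted(x, reference, 0, len(x))
repaired = impute(x, reference, corrupted)
print(sorted(corrupted))
```

In this toy setting the corruption is far outside the range of the reference data, so the search narrows down to the two affected attributes. Real data would call for a more careful choice of the anomaly score and of the stopping size `min_size`.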

research
10/21/2016

Robust training on approximated minimal-entropy set

In this paper, we propose a general framework to learn a robust large-ma...
research
12/20/2015

ATD: Anomalous Topic Discovery in High Dimensional Discrete Data

We propose an algorithm for detecting patterns exhibited by anomalous cl...
research
02/16/2022

Latent Outlier Exposure for Anomaly Detection with Contaminated Data

Anomaly detection aims at identifying data points that show systematic d...
research
06/20/2020

G2D: Generate to Detect Anomalies

In this paper, we propose a novel method for irregularity detection. Pre...
research
12/28/2020

Detecting Anomalous line-items by Modeling the Legal Case Lifecycle

Anomaly detection continues to be the subject of research and developmen...
research
10/14/2022

G2A2: An Automated Graph Generator with Attributes and Anomalies

Many data-mining applications use dynamic attributed graphs to represent...
research
08/30/2022

Deep Open-Set Recognition for Silicon Wafer Production Monitoring

The chips contained in any electronic device are manufactured over circu...
