A flexible outlier detector based on a topology given by graph communities

02/18/2020
by O. Ramos Terrades, et al.

Outlier, or anomaly, detection is essential for the optimal performance of machine learning methods and statistical predictive models. It is not just a technical step in a data-cleaning process but a key topic in many fields, such as fraudulent document detection, medical applications and assisted diagnosis systems, and the detection of security threats. In contrast to population-based methods, neighborhood-based local approaches are simple, flexible methods that have the potential to perform well in small-sample, unbalanced problems. However, a main concern of local approaches is the impact that computing each sample's neighborhood has on the method's performance. Most approaches use a distance in the feature space to define a single neighborhood, which requires careful selection of several parameters. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space, considered as a topological manifold. The topology is computed using the communities of a weighted graph that codifies mutual nearest neighbors in the feature space. In this way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine-tuning. Extensive experiments on real-world data sets show that, overall, our approach outperforms both local and global strategies in multi-view and single-view settings.
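
The abstract names the main ingredients (a weighted mutual-nearest-neighbor graph, its communities, and a local measure of label heterogeneity) without giving formulas. The Python sketch below shows one way these pieces could fit together; it is not the authors' implementation. The edge weight 1/(1 + d), the use of weighted label propagation as a stand-in community detector, the score (the fraction of the other members of a sample's community whose label differs from its own), and the function names mutual_knn_graph, label_propagation and heterogeneity_scores are all assumptions made for illustration. The sketch also yields a single partition, whereas the paper describes a set of multiple neighborhoods.

```python
import math
from collections import defaultdict


def mutual_knn_graph(points, k):
    """Weighted mutual k-NN graph as an adjacency dict {node: {neighbor: weight}}.

    An edge i-j exists only if j is among i's k nearest neighbors AND i is among
    j's. The weight 1 / (1 + distance) is an assumed similarity, not the paper's.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    knn = []
    for i in range(n):
        # Sort the other points by distance, breaking ties by index for determinism.
        order = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        knn.append(set(order[:k]))
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:  # keep only mutual neighbors
                w = 1.0 / (1.0 + dist[i][j])
                graph[i][j] = w
                graph[j][i] = w
    return graph


def label_propagation(graph, max_iter=100):
    """Stand-in community detector: weighted label propagation with
    deterministic (smallest-id) tie-breaking. Returns {node: community_id}."""
    community = {v: v for v in graph}
    for _ in range(max_iter):
        changed = False
        for v in sorted(graph):
            if not graph[v]:
                continue  # an isolated node stays in its own community
            weight_by_comm = defaultdict(float)
            for u, w in graph[v].items():
                weight_by_comm[community[u]] += w
            best = max(weight_by_comm.values())
            choice = min(c for c, s in weight_by_comm.items() if s == best)
            if choice != community[v]:
                community[v] = choice
                changed = True
        if not changed:
            break
    return community


def heterogeneity_scores(points, labels, k=3):
    """Per-sample outlier score in [0, 1]: the fraction of the other members of the
    sample's community whose label differs from its own (an assumed measure)."""
    community = label_propagation(mutual_knn_graph(points, k))
    members = defaultdict(list)
    for v, c in community.items():
        members[c].append(v)
    scores = []
    for i in range(len(points)):
        others = [j for j in members[community[i]] if j != i]
        if not others:
            scores.append(1.0)  # singleton: no supporting neighborhood, assumed maximal
        else:
            disagree = sum(labels[j] != labels[i] for j in others)
            scores.append(disagree / len(others))
    return scores


if __name__ == "__main__":
    # Two well-separated unit squares; point 3 carries the "wrong" label.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    labs = [0, 0, 0, 1, 1, 1, 1, 1]
    print([round(s, 2) for s in heterogeneity_scores(pts, labs, k=3)])
    # → [0.33, 0.33, 0.33, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Only the standard library is used, and the pairwise distances cost O(n²), which is acceptable for the small-sample problems the abstract targets. The number of neighbors k remains a free parameter in this simplified version; avoiding that kind of fine-tuning is precisely what the method described above aims for, and the sketch does not attempt to reproduce it.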

12/10/2021

LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks

Many well-established anomaly detection methods use the distance of a sa...
10/02/2018

GLAD: GLocalized Anomaly Detection via Active Feature Space Suppression

We propose an algorithm called GLAD (GLocalized Anomaly Detection) that ...
08/12/2021

Clustering with UMAP: Why and How Connectivity Matters

Topology based dimensionality reduction methods such as t-SNE and UMAP h...
01/12/2015

Navigating the Semantic Horizon using Relative Neighborhood Graphs

This paper is concerned with nearest neighbor search in distributional s...
03/12/2019

Learning Resolution Parameters for Graph Clustering

Finding clusters of well-connected nodes in a graph is an extensively st...
12/20/2021

HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images

Multiple medical institutions collaboratively training a model using fed...
12/02/2021

Joint Characterization of the Cryospheric Spectral Feature Space

Hyperspectral feature spaces are useful for many remote sensing applicat...