A flexible outlier detector based on a topology given by graph communities

by O. Ramos Terrades, et al.

Outlier, or anomaly, detection is essential for optimal performance of machine learning methods and statistical predictive models. It is not merely a technical step in data cleaning but a key topic in many fields, such as fraudulent document detection, medical applications and assisted diagnosis systems, and security threat detection. In contrast to population-based methods, neighborhood-based local approaches are simple, flexible methods that have the potential to perform well in unbalanced problems with small sample sizes. However, a main concern with local approaches is the impact that the computation of each sample's neighborhood has on the method's performance. Most approaches use a distance in the feature space to define a single neighborhood, which requires careful selection of several parameters. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space, considered as a topological manifold. The topology is computed using the communities of a weighted graph that encodes mutual nearest neighbors in the feature space. In this way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine-tuning. Extensive experiments on real-world data sets show that our approach overall outperforms both local and global strategies in multi-view and single-view settings.
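The pipeline outlined in the abstract (mutual nearest-neighbor graph, graph communities used as neighborhoods, local label heterogeneity) can be illustrated with a minimal sketch. It is not the authors' implementation: the graph here is unweighted, connected components stand in for community detection, and the function names, the parameter `k`, and the scoring rule (fraction of community members with a different label) are all assumptions made for illustration.

```python
from math import dist


def mutual_knn_graph(points, k):
    """Link i and j only if each is among the other's k nearest neighbors."""
    n = len(points)
    knn = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        knn.append(set(others[:k]))
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]


def connected_components(adj):
    """Assumed stand-in for community detection: one id per connected component."""
    comp = [-1] * len(adj)
    current = 0
    for start in range(len(adj)):
        if comp[start] != -1:
            continue
        comp[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if comp[nb] == -1:
                    comp[nb] = current
                    stack.append(nb)
        current += 1
    return comp


def heterogeneity_scores(labels, comp):
    """Fraction of the other members of a sample's community with a different label."""
    members = {}
    for i, c in enumerate(comp):
        members.setdefault(c, []).append(i)
    scores = []
    for i, c in enumerate(comp):
        group = members[c]
        if len(group) == 1:
            scores.append(1.0)  # assumption: an isolated sample is maximally suspicious
            continue
        disagree = sum(1 for j in group if j != i and labels[j] != labels[i])
        scores.append(disagree / (len(group) - 1))
    return scores


# Toy data: two well-separated clusters; sample 3 sits in the first cluster
# but carries the label of the second one.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

adjacency = mutual_knn_graph(points, k=3)
communities = connected_components(adjacency)
scores = heterogeneity_scores(labels, communities)
```

In this toy example the mislabeled sample receives the highest score, because every other member of its community disagrees with its label. A closer rendering of the abstract would weight the edges and replace the connected-component step with a community detection algorithm, which can yield several overlapping or nested neighborhoods per sample.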



