Fast and Exact Outlier Detection in Metric Spaces: A Proximity Graph-based Approach

10/18/2021
by Daichi Amagata, et al.

Distance-based outlier detection is widely adopted in many fields, such as data mining and machine learning, because it is unsupervised, can be employed in any generic metric space, and makes no assumptions about the data distribution. Applications in these fields must handle large datasets, which calls for efficient distance-based outlier detection algorithms. Now that computational environments with large memory are common, it is possible to build a main-memory index and detect outliers with it, which is a promising route to fast distance-based outlier detection. Motivated by this observation, we propose a novel approach that exploits a proximity graph. Our approach can employ an arbitrary proximity graph and obtains a significant speed-up over the state of the art. However, designing an effective proximity graph is challenging, because existing proximity graphs are not designed for efficient traversal in distance-based outlier detection. To overcome this challenge, we propose a novel proximity graph, MRPG. Our empirical study on real datasets demonstrates that MRPG detects outliers significantly faster than state-of-the-art algorithms.
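The abstract does not spell out the outlier definition or the traversal procedure, so the following is only a minimal sketch under stated assumptions. It assumes the common (r, k) definition, in which an object is an outlier if fewer than k other objects lie within distance r of it, and it uses a brute-force k-nearest-neighbor graph as a stand-in proximity graph (this is not MRPG). All function and parameter names are hypothetical. The idea illustrated is filter-and-verify: a graph traversal gives a lower bound on the neighbor count, and only objects that the traversal cannot confirm as inliers are checked with an exact linear scan, so the result stays exact.

```python
from collections import deque
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Dist = Callable[[T, T], float]


def build_knn_graph(points: Sequence[T], dist: Dist, degree: int) -> List[List[int]]:
    """Brute-force kNN graph; a stand-in for any proximity graph (assumption)."""
    graph = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        graph.append(others[:degree])
    return graph


def graph_lower_bound(i: int, points: Sequence[T], dist: Dist,
                      graph: List[List[int]], r: float, k: int) -> int:
    """Count neighbors of points[i] within r reachable through the graph.

    Only vertices within r are expanded, so the count is a lower bound on the
    true neighbor count. Stops as soon as k neighbors are found.
    """
    seen = {i}
    queue = deque([i])
    count = 0
    while queue and count < k:
        u = queue.popleft()
        for v in graph[u]:
            if v in seen:
                continue
            seen.add(v)
            if dist(points[i], points[v]) <= r:
                count += 1
                if count >= k:
                    break
                queue.append(v)
    return count


def exact_count(i: int, points: Sequence[T], dist: Dist, r: float, k: int) -> int:
    """Linear scan with early termination once k neighbors are found."""
    count = 0
    for j, q in enumerate(points):
        if j != i and dist(points[i], q) <= r:
            count += 1
            if count >= k:
                break
    return count


def detect_outliers(points: Sequence[T], dist: Dist, r: float, k: int,
                    degree: int = 4) -> List[int]:
    """Return indices of (r, k) outliers using filter-and-verify."""
    graph = build_knn_graph(points, dist, degree)
    outliers = []
    for i in range(len(points)):
        if graph_lower_bound(i, points, dist, graph, r, k) >= k:
            continue  # confirmed inlier by the graph traversal alone
        if exact_count(i, points, dist, r, k) < k:
            outliers.append(i)  # verified by an exact scan
    return outliers
```

For example, calling detect_outliers on the one-dimensional points [0.0, 0.1, 0.2, 0.3, 10.0] with the absolute difference as the metric, r = 0.5, and k = 2 flags only the last point. Building the stand-in graph by brute force is itself quadratic; it is used here only to keep the sketch self-contained.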


