MSD-Kmeans: A Novel Algorithm for Efficient Detection of Global and Local Outliers

10/15/2019
by Yuanyuan Wei, et al.

Outlier detection is a data mining technique that aims to detect unusual or unexpected records in a dataset. Existing outlier detection algorithms each have their own strengths and weaknesses and differ in their sensitivity to noisy data such as extreme values. In this paper, we propose a novel cluster-based outlier detection algorithm named MSD-Kmeans that combines the statistical method of Mean and Standard Deviation (MSD) with the K-means clustering algorithm to detect outliers more accurately and with better control of extreme values. MSD-Kmeans has two phases: (1) applying the MSD algorithm to eliminate as much noisy data as possible, minimizing its interference with the clusters, and (2) applying the K-means algorithm to obtain locally optimal clusters. We evaluate our algorithm and demonstrate its effectiveness in the context of detecting possible overcharging of taxi fares, as dishonest drivers may attempt to charge higher fares by taking detours. We compare the performance of MSD-Kmeans with that of other outlier detection algorithms, such as MSD, K-means, Z-score, MIQR and LOF, and show that the proposed MSD-Kmeans algorithm achieves the highest precision, accuracy, and F-measure. We conclude that MSD-Kmeans can be used for effective and efficient outlier detection on data of varying quality on IoT devices.
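
The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give thresholds, the number of clusters, the initialization, or the rule used to flag outliers after clustering. The function names, the `n_sigma` cutoff, the evenly spread initial centroids, the per-cluster deviation rule, and the fare values below are all assumptions made for illustration, and the data is restricted to one dimension for brevity.

```python
import statistics


def msd_split(values, n_sigma):
    """Phase 1 (MSD): separate values lying more than n_sigma
    standard deviations from the overall mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    kept = [v for v in values if abs(v - mean) <= n_sigma * std]
    flagged = [v for v in values if abs(v - mean) > n_sigma * std]
    return kept, flagged


def assign(values, centroids):
    """Group each value with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k, max_iter=100):
    """Phase 2 (K-means): Lloyd's iterations on 1-D data."""
    ordered = sorted(values)
    # Assumption: deterministic start, centroids spread evenly over sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = assign(values, centroids)
        updated = [statistics.fmean(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:  # converged to a local optimum
            break
        centroids = updated
    return centroids, assign(values, centroids)


def msd_kmeans_outliers(values, k=2, n_sigma=2.0):
    """Run both phases; return (global_outliers, local_outliers)."""
    kept, global_outliers = msd_split(values, n_sigma)
    centroids, clusters = kmeans_1d(kept, k)
    local_outliers = []
    for centroid, members in zip(centroids, clusters):
        if not members:
            continue
        spread = statistics.pstdev(members)
        # Assumption: a point is a local outlier if it lies more than
        # n_sigma cluster standard deviations from its centroid.
        local_outliers += [v for v in members
                           if abs(v - centroid) > n_sigma * spread]
    return global_outliers, local_outliers


# Hypothetical taxi fares: two typical fare groups, one moderately
# inflated fare (14.0) and one extreme value (500.0).
fares = ([10 + 0.1 * i for i in range(10)] + [14.0]
         + [30 + 0.1 * i for i in range(10)] + [500.0])
global_out, local_out = msd_kmeans_outliers(fares, k=2, n_sigma=2.0)
print("global:", global_out, "local:", local_out)
```

Removing the extreme value first keeps it from inflating the mean and standard deviation of whichever cluster it would otherwise join, which is the interference the first phase is meant to limit.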
