Outlier Detection by Consistent Data Selection Method

12/12/2017
by Utkarsh Porwal, et al.
Often the challenge associated with tasks like fraud and spam detection[1] is the lack of all likely patterns needed to train suitable supervised learning models. To overcome this limitation, such tasks are attempted as outlier or anomaly detection tasks. We also hypothesize that outliers have behavioral patterns that change over time. Limited data and continuously changing patterns make learning significantly difficult. In this work we propose an approach that detects outliers in large data sets by relying on data points that are consistent. The primary contribution of this work is that it will quickly help retrieve samples for both consistent and non-outlier data sets and is also mindful of new outlier patterns. No prior knowledge of each set is required to extract the samples. The method consists of two phases. In the first phase, consistent data points (non-outliers) are retrieved by an ensemble method of unsupervised clustering techniques. In the second phase, a one-class classifier trained on the consistent data point set is applied to the remaining sample set to identify the outliers. The approach is tested on three publicly available data sets and the performance scores are competitive.
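The two phases described above can be illustrated with a minimal sketch. The abstract does not say which clustering techniques or which one-class classifier the authors use, so everything here is an assumption made for illustration: k-means and single-linkage components stand in for the clustering ensemble, a centroid-plus-radius rule stands in for the one-class classifier, and all function names, parameters (`eps`, `slack`), and the toy data are hypothetical.

```python
from math import dist

def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0], then keep
    adding the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans_labels(points, k, iters=20):
    """Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def linkage_labels(points, eps):
    """Single-linkage components: points chained by gaps below eps share a label."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j, q in enumerate(points):
                if labels[j] == -1 and dist(points[i], q) < eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def largest_cluster(labels):
    """Indices of the points in the most populated cluster."""
    biggest = max(set(labels), key=labels.count)
    return {i for i, lab in enumerate(labels) if lab == biggest}

def select_consistent(points, labelings):
    """Phase 1: keep only points that every ensemble member puts in its largest cluster."""
    keep = set(range(len(points)))
    for labels in labelings:
        keep &= largest_cluster(labels)
    return sorted(keep)

def fit_one_class(train, slack=2.5):
    """Phase 2 stand-in: flag a point whose distance to the training centroid
    exceeds slack times the largest training distance."""
    center = tuple(sum(axis) / len(train) for axis in zip(*train))
    radius = slack * max(dist(p, center) for p in train)
    return lambda p: dist(p, center) > radius

# Toy data: seven clustered points, one borderline point, two far-away points.
points = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1), (1, 0), (0, 1),
          (3.2, 3.2), (30, 30), (-25, 28)]
labelings = [kmeans_labels(points, 2), kmeans_labels(points, 3),
             linkage_labels(points, 1.5)]
consistent = select_consistent(points, labelings)
is_outlier = fit_one_class([points[i] for i in consistent])
remaining = [i for i in range(len(points)) if i not in consistent]
outliers = [i for i in remaining if is_outlier(points[i])]
print(consistent)  # [0, 1, 2, 3, 4, 5, 6]
print(outliers)    # [8, 9]
```

On this toy data, k-means with two clusters alone would keep the point at index 9 in its largest cluster; intersecting the ensemble members removes it. The borderline point at index 7 is left out of the consistent set in phase 1 but is not flagged in phase 2, which is the kind of conservative-then-classify behavior the abstract describes.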


research
11/06/2018

Credit Card Fraud Detection in e-Commerce: An Outlier Detection Approach

Often the challenge associated with tasks like fraud and spam detection ...
research
09/18/2020

Identification of Abnormal States in Videos of Ants Undergoing Social Phase Change

Biology is both an important application area and a source of motivation...
research
08/14/2023

Quantifying Outlierness of Funds from their Categories using Supervised Similarity

Mutual fund categorization has become a standard tool for the investment...
research
11/05/2019

Detecting Point Outliers Using Prune-based Outlier Factor (PLOF)

Outlier detection (also known as anomaly detection or deviation detectio...
research
10/19/2019

Efficient Discovery of Meaningful Outlier Relationships

We propose PODS (Predictable Outliers in Data-trendS), a method that, gi...
research
04/05/2023

PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data

With calls for increasing transparency, governments are releasing greate...
research
10/26/2021

Revisiting randomized choices in isolation forests

Isolation forest or "iForest" is an intuitive and widely used algorithm ...
