Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics

09/20/2021
by Myrto Limnios, et al.

The ability to collect and store ever more massive databases has been accompanied by the need to process them efficiently. In many cases, most observations behave similarly, while a presumably small proportion of them are abnormal. Detecting the latter, referred to as outliers, is one of the major challenges for machine learning applications (e.g. fraud detection or predictive maintenance). In this paper, we propose a methodology addressing the problem of outlier detection by learning a data-driven scoring function, defined on the feature space, which reflects the degree of abnormality of the observations. This scoring function is learnt through a well-designed binary classification problem whose empirical criterion takes the form of a two-sample linear rank statistic, for which theoretical results are available. We illustrate our methodology with encouraging preliminary numerical experiments.
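To make the notion of a two-sample linear rank statistic concrete, here is a minimal sketch in its standard textbook form: pool the scores of two samples, rank them, and sum a score-generating function of the normalized ranks of the first sample. The function name, the `phi` parameter, the normalization by N + 1 and the no-ties assumption are illustrative choices, not taken from the paper. The abstract does not say how the two samples of the binary classification problem are built, so the inputs below are placeholder score arrays.

```python
import numpy as np


def two_sample_linear_rank_statistic(scores_first, scores_second, phi=lambda u: u):
    """Sum of phi(rank / (N + 1)) over the first sample.

    Ranks are computed in the pooled sample of size N. With the identity
    phi (the default), this is the Wilcoxon rank-sum statistic up to the
    1 / (N + 1) scaling. Ties are assumed absent (hypothetical simplification).
    """
    scores_first = np.asarray(scores_first, dtype=float)
    scores_second = np.asarray(scores_second, dtype=float)
    pooled = np.concatenate([scores_first, scores_second])
    n_total = pooled.size

    # Assign ranks 1..N: the smallest pooled score gets rank 1.
    ranks = np.empty(n_total, dtype=float)
    ranks[np.argsort(pooled)] = np.arange(1, n_total + 1)

    # Keep only the ranks of the first sample and apply phi to the normalized ranks.
    normalized = ranks[: scores_first.size] / (n_total + 1)
    return float(np.sum(phi(normalized)))


# Toy scores: the first sample is scored higher in one case, lower in the other.
well_separated = two_sample_linear_rank_statistic([3.0, 4.0], [1.0, 2.0])
badly_separated = two_sample_linear_rank_statistic([1.0, 2.0], [3.0, 4.0])
print(well_separated, badly_separated)
```

A scoring function that ranks the first sample above the second yields a larger value of the statistic, which is why maximizing such a criterion over a class of scoring functions can serve as a learning objective.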


research
01/17/2018

Ranking Data with Continuous Labels through Oriented Recursive Partitions

We formulate a supervised learning problem, referred to as continuous ra...
research
04/07/2021

Concentration Inequalities for Two-Sample Rank Processes with Application to Bipartite Ranking

The ROC curve is the gold standard for measuring the performance of a te...
research
02/05/2015

On Anomaly Ranking and Excess-Mass Curves

Learning how to rank multivariate unlabeled observations depending on th...
research
11/05/2020

Anomalous Sound Detection as a Simple Binary Classification Problem with Careful Selection of Proxy Outlier Examples

Unsupervised anomalous sound detection is concerned with identifying sou...
research
05/03/2017

Mass Volume Curves and Anomaly Ranking

This paper aims at formulating the issue of ranking multivariate unlabel...
research
07/30/2019

Prudence When Assuming Normality: an advice for machine learning practitioners

In a binary classification problem the feature vector (predictor) is the...
research
11/20/2022

Finding active galactic nuclei through Fink

We present the Active Galactic Nuclei (AGN) classifier as currently impl...
