Mass Volume Curves and Anomaly Ranking

05/03/2017
by Stephan Clémençon, et al.

This paper formulates the problem of ranking multivariate unlabeled observations by their degree of abnormality as an unsupervised statistical learning task. In the 1-d situation, this problem is usually tackled by means of tail estimation techniques: univariate observations are viewed as all the more 'abnormal' as they are located far in the tail(s) of the underlying probability distribution. It would likewise be desirable to have a scalar-valued 'scoring' function for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as an M-estimation problem by means of a novel functional performance criterion, referred to as the Mass Volume curve (MV curve for short), whose optimal elements are strictly increasing transforms of the density. We first study the statistical estimation of the MV curve of a given scoring function and provide a strategy to build confidence regions using a smoothed bootstrap approach. We next tackle the optimization of this functional criterion over the set of piecewise constant scoring functions. This boils down to estimating a sequence of empirical minimum volume sets whose levels are chosen adaptively from the data, so as to adjust to the variations of the optimal MV curve while controlling the bias of its approximation by a stepwise curve. Generalization bounds are then established for the difference in sup norm between the MV curve of the empirical scoring function thus obtained and the optimal MV curve.
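As a rough illustration of the criterion, the sketch below estimates the MV curve of a given scoring function on a sample. It is not the paper's estimator: it assumes the MV curve at mass level alpha is the Lebesgue volume of the score level set {x : s(x) >= t_alpha}, with t_alpha the empirical (1 - alpha)-quantile of the scores, and it approximates that volume by Monte Carlo over the bounding box of the sample. The function name `empirical_mv_curve`, the bounding-box restriction, and the two example scoring functions are assumptions made for this sketch.

```python
import numpy as np


def empirical_mv_curve(score, sample, alphas, n_mc=20000, seed=0):
    """Monte Carlo sketch of the MV curve of `score` (assumed definition).

    score  : vectorized function mapping an (n, d) array to n scores,
             larger score meaning "more normal"
    sample : (n, d) array of observations
    alphas : increasing mass levels in (0, 1)
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Assumption: volumes are measured inside the sample's bounding box only.
    lo, hi = sample.min(axis=0), sample.max(axis=0)
    box_volume = float(np.prod(hi - lo))
    uniform = rng.uniform(lo, hi, size=(n_mc, sample.shape[1]))
    s_sample = score(sample)
    s_uniform = score(uniform)
    # t_alpha: about a fraction alpha of the sample scores at least t_alpha.
    thresholds = np.quantile(s_sample, 1.0 - np.asarray(alphas))
    # Volume of {s >= t_alpha}, estimated as box volume times hit frequency.
    return np.array([box_volume * np.mean(s_uniform >= t) for t in thresholds])


# Usage: 2-d standard Gaussian data with two candidate scoring functions.
rng = np.random.default_rng(42)
X = rng.standard_normal((2000, 2))
alphas = np.linspace(0.1, 0.9, 9)

# Decreasing in ||x||, hence an increasing transform of the Gaussian density.
def density_like(x):
    return -np.sum(x ** 2, axis=1)

# A poor scoring function: ranks points by their first coordinate only.
def first_coordinate(x):
    return x[:, 0]

mv_good = empirical_mv_curve(density_like, X, alphas)
mv_poor = empirical_mv_curve(first_coordinate, X, alphas)
```

Under these assumptions, the curve of the density-like score should lie below that of the poor score, in line with the statement that increasing transforms of the density are optimal for the MV criterion.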


research
02/05/2015

On Anomaly Ranking and Excess-Mass Curves

Learning how to rank multivariate unlabeled observations depending on th...
research
01/17/2018

Ranking Data with Continuous Labels through Oriented Recursive Partitions

We formulate a supervised learning problem, referred to as continuous ra...
research
12/18/2013

Functional Bipartite Ranking: a Wavelet-Based Filtering Approach

It is the main goal of this article to address the bipartite ranking iss...
research
04/07/2021

Concentration Inequalities for Two-Sample Rank Processes with Application to Bipartite Ranking

The ROC curve is the gold standard for measuring the performance of a te...
research
12/17/2020

Binomial Tails for Community Analysis

An important task of community discovery in networks is assessing signif...
research
09/20/2021

Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics

The ability to collect and store ever more massive databases has been ac...
research
01/25/2022

Poisson's CDF applied to Flexible Skylines

The evolution of skyline and ranking queries has created new archetypes ...
