Dimensionality Reduction and (Bucket) Ranking: a Mass Transportation Approach

10/15/2018
by Mastane Achab, et al.

Whereas most dimensionality reduction techniques for multivariate data (e.g. PCA, ICA, NMF) rely on linear algebra to some extent, summarizing ranking data, viewed as realizations of a random permutation Σ on a set of items indexed by i ∈ {1, ..., n}, is a major statistical challenge because the set of permutations S_n has no vector space structure. The goal of this article is to develop an original framework for reducing the number of parameters required to describe the distribution of a statistical population of rankings/permutations, on the premise that the collection of items under study can be partitioned into subsets (buckets) such that, with high probability, the items in a given bucket are either all ranked higher or all ranked lower than the items in another bucket. In this context, the distribution of Σ can hopefully be represented sparsely by a bucket distribution, i.e. a bucket ordering plus the ranking distributions within each bucket. More precisely, we introduce a dedicated distortion measure, based on a mass transportation metric, to quantify the accuracy of such representations. The performance of buckets minimizing an empirical version of the distortion is investigated through a rate bound analysis. Complexity penalization techniques are also considered to select the shape of a bucket order with minimum expected distortion. Beyond theoretical concepts and results, numerical experiments on real ranking data provide empirical evidence of the relevance of the approach.
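
To make the bucket premise concrete, here is a minimal Python sketch of how well a candidate bucket order fits a sample of rankings. It is an illustration under stated assumptions, not the distortion measure defined in the article: the abstract does not give the formula, so the score used here (the average number of cross-bucket item pairs that a ranking orders against the bucket order) is a stand-in, and the function name `pairwise_disagreement` and the input layout are hypothetical.

```python
from itertools import combinations


def pairwise_disagreement(rankings, buckets):
    """Average number of cross-bucket pairs ranked against the bucket order.

    rankings: list of dicts mapping item -> rank (1 = ranked first).
    buckets:  list of lists of items; earlier buckets are expected to be
              ranked ahead of later ones.
    NOTE: this score is an illustrative assumption, not the article's
    mass-transportation distortion.
    """
    # Map each item to the index of the bucket that contains it.
    bucket_of = {item: k for k, bucket in enumerate(buckets) for item in bucket}
    items = sorted(bucket_of)
    total = 0
    for ranking in rankings:
        for i, j in combinations(items, 2):
            if bucket_of[i] == bucket_of[j]:
                continue  # pairs inside one bucket are left unconstrained
            # Orient the pair so that `first` lies in the earlier bucket.
            first, second = (i, j) if bucket_of[i] < bucket_of[j] else (j, i)
            if ranking[second] < ranking[first]:
                total += 1  # the ranking contradicts the bucket order
    return total / len(rankings)


sample = [
    {"a": 1, "b": 2, "c": 3, "d": 4},
    {"a": 2, "b": 1, "c": 4, "d": 3},
    {"a": 1, "b": 3, "c": 2, "d": 4},
]
coarse = pairwise_disagreement(sample, [["a", "b"], ["c", "d"]])
fine = pairwise_disagreement(sample, [["a"], ["b"], ["c"], ["d"]])
```

On this toy sample the two-bucket order is contradicted less often than the fully resolved order, which mirrors the trade-off described in the abstract: coarser bucket orders fit the data more easily but describe less of the ranking, which is why a complexity penalty is needed to choose among shapes.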

research
01/20/2022

Statistical Depth Functions for Ranking Distributions: Definitions, Statistical Learning and Applications

The concept of median/consensus has been widely investigated in order to...
research
04/12/2021

On the Linear Ordering Problem and the Rankability of Data

In 2019, Anderson et al. proposed the concept of rankability, which refe...
research
03/22/2023

Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues

As the issue of robustness in AI systems becomes vital, statistical lear...
research
06/21/2022

Developing a Ranking Problem Library (RPLIB) from a data-oriented perspective

We present an improved library for the ranking problem called RPLIB. RPL...
research
05/31/2023

Decomposition and Interleaving for Variance Reduction of Post-click Metrics

In this study, we propose an efficient method for comparing the post-cli...
research
03/03/2021

Minimum-Distortion Embedding

We consider the vector embedding problem. We are given a finite set of i...
research
08/29/2022

SemanticAxis: Exploring Multi-attribute Data by Semantics Construction and Ranking Analysis

Mining the distribution of features and sorting items by combined attrib...
