On Medians of (Randomized) Pairwise Means

11/01/2022
by Pierre Laforgue, et al.

Tournament procedures, recently introduced in Lugosi and Mendelson (2016), offer an appealing alternative, from a theoretical perspective at least, to the principle of Empirical Risk Minimization in machine learning. Statistical learning by Median-of-Means (MoM) consists in segmenting the training data into blocks of equal size and comparing the statistical performance of every pair of candidate decision rules on each data block: the rule achieving the highest performance on a majority of the blocks is declared the winner. In the context of nonparametric regression, functions having won all their duels have been shown to outperform empirical risk minimizers with respect to the mean squared error under minimal assumptions, while exhibiting robustness properties. The purpose of this paper is to extend this approach to other learning problems, in particular those for which the performance criterion takes the form of an expectation over pairs of observations rather than over a single observation, as may be the case in pairwise ranking, clustering or metric learning. Precisely, it is first proved that the bounds achieved by MoM are essentially preserved when the blocks are built by means of independent sampling without replacement schemes instead of a simple segmentation. These results are then extended to situations where the risk is related to a pairwise loss function and its empirical counterpart takes the form of a U-statistic. Beyond the theoretical guarantees established for the proposed learning and estimation methods, numerical experiments provide empirical evidence of their relevance in practice.
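
To make the procedures described above concrete, the following Python sketch (written for this summary, not taken from the paper or its experiments) illustrates the two estimators in a toy setting: the classical Median-of-Means over a partition into blocks, and a randomized pairwise variant in which each block is drawn by sampling without replacement and the block statistic is a U-statistic. Function names, block counts, the contamination level and the variance kernel h(a, b) = (a - b)^2 / 2 are illustrative choices.

```python
import numpy as np


def median_of_means(x, n_blocks):
    """Classical MoM: split x into n_blocks contiguous blocks of (almost)
    equal size and return the median of the block means."""
    blocks = np.array_split(np.asarray(x), n_blocks)
    return np.median([block.mean() for block in blocks])


def median_of_randomized_pairwise_means(x, n_blocks, block_size, kernel, seed=None):
    """Randomized pairwise variant: each block is an independent sample drawn
    without replacement, and the block statistic is the U-statistic of the
    pairwise kernel computed on that block."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    block_stats = []
    for _ in range(n_blocks):
        block = x[rng.choice(len(x), size=block_size, replace=False)]
        # U-statistic: average of the kernel over all pairs i < j in the block.
        values = [kernel(block[i], block[j])
                  for i in range(block_size) for j in range(i + 1, block_size)]
        block_stats.append(np.mean(values))
    return np.median(block_stats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Standard normal sample (mean 0, variance 1) contaminated by a few gross outliers.
    x = np.concatenate([rng.standard_normal(1000), np.full(5, 50.0)])

    print("empirical mean:    ", x.mean())
    print("empirical variance:", x.var())
    print("MoM estimate of the mean:", median_of_means(x, n_blocks=15))
    # Pairwise kernel h(a, b) = (a - b)^2 / 2, whose expectation is Var(X).
    print("randomized pairwise MoM (variance):",
          median_of_randomized_pairwise_means(
              x, n_blocks=15, block_size=30,
              kernel=lambda a, b: 0.5 * (a - b) ** 2, seed=1))
```

With a clear majority of uncontaminated blocks, the two median-based estimates should remain close to the true values (0 and 1), whereas the plain empirical mean and variance are dragged away by the outliers, in line with the robustness properties discussed in the abstract.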


