Noise tolerance of learning to rank under class-conditional label noise

08/03/2022
by   Dany Haddad, et al.

Often, the data used to train ranking models is subject to label noise. For example, in web-search, labels created from clickstream data are noisy due to issues such as insufficient information in item descriptions on the SERP, query reformulation by the user, and erratic or unexpected user behavior. In practice, it is difficult to handle label noise without making strong assumptions about the label generation process. As a result, practitioners typically train their learning-to-rank (LtR) models directly on this noisy data without additional consideration of the label noise. Surprisingly, we often see strong performance from LtR models trained in this way. In this work, we describe a class of noise-tolerant LtR losses for which empirical risk minimization is a consistent procedure, even in the context of class-conditional label noise. We also develop noise-tolerant analogs of commonly used loss functions. The practical implications of our theoretical findings are further supported by experimental results.
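The abstract mentions noise-tolerant analogs of commonly used loss functions under class-conditional noise, but gives no formula here. As an illustrative sketch (not the paper's exact construction), the classic unbiased-estimator correction for binary losses under class-conditional flip rates can be written as follows; `rho_pos` and `rho_neg` denote the assumed flip probabilities for each class:

```python
import numpy as np

def logistic(t, y):
    # Standard logistic loss: log(1 + exp(-y * t)) for labels y in {-1, +1}.
    return np.log1p(np.exp(-y * t))

def noise_corrected_loss(t, y, rho_pos, rho_neg):
    """Unbiased estimator of the clean logistic loss under class-conditional
    label noise, in the style of Natarajan et al.'s correction.
    rho_pos = P(label flipped | true y = +1), rho_neg = P(flipped | true y = -1).
    Illustrative only; not necessarily the construction used in this paper."""
    rho_y = np.where(y > 0, rho_pos, rho_neg)        # flip rate of the observed class
    rho_opp = np.where(y > 0, rho_neg, rho_pos)      # flip rate of the opposite class
    num = (1.0 - rho_opp) * logistic(t, y) - rho_y * logistic(t, -y)
    return num / (1.0 - rho_pos - rho_neg)           # requires rho_pos + rho_neg < 1
```

With zero noise rates this reduces to the plain logistic loss, and in expectation over the noisy label it recovers the clean loss, which is the property that makes empirical risk minimization consistent under this kind of noise.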


Related research

- 06/06/2023, "Binary Classification with Instance and Label Dependent Label Noise": Learning with label dependent label noise has been extensively explored ...
- 05/16/2021, "CCMN: A General Framework for Learning with Class-Conditional Multi-Label Noise": Class-conditional noise commonly exists in machine learning tasks, where...
- 10/07/2021, "Robustness and reliability when training with noisy labels": Labelling of data for supervised learning can be costly and time-consumi...
- 10/31/2019, "Confident Learning: Estimating Uncertainty in Dataset Labels": Learning exists in the context of data, yet notions of confidence typica...
- 10/28/2022, "The Fisher-Rao Loss for Learning under Label Noise": Choosing a suitable loss function is essential when learning by empirica...
- 12/06/2018, "Theoretical Guarantees of Deep Embedding Losses Under Label Noise": Collecting labeled data to train deep neural networks is costly and even...
- 10/13/2020, "Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge": Noisy data, crawled from the web or supplied by volunteers such as Mecha...
