Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

07/31/2023
by Charlie Hou, et al.

While deep learning (DL) models are state-of-the-art in text and image domains, they have not yet consistently outperformed Gradient Boosted Decision Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent performance gains attained by DL models in text and image tasks have come from unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data. To the best of our knowledge, unsupervised pretraining has not been applied to the LTR problem, even though LTR applications often produce vast amounts of unlabeled data. In this work, we study whether unsupervised pretraining can improve LTR performance over GBDTs and other non-pretrained models. Using simple design choices, including SimCLR-Rank, our ranking-specific modification of SimCLR (an unsupervised pretraining method for images), we produce pretrained deep learning models that soundly outperform GBDTs (and other non-pretrained models) when labeled data is vastly outnumbered by unlabeled data. We also show that pretrained models often achieve significantly better robustness than non-pretrained models (GBDTs or DL models) when ranking outlier data.
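The abstract does not spell out the details of SimCLR-Rank, but as a rough illustration of what SimCLR-style unsupervised pretraining looks like on tabular LTR feature vectors, here is a minimal sketch. It assumes a standard NT-Xent contrastive loss, a small MLP encoder with a projection head, and random feature dropout as the augmentation; all of these choices, along with the feature width and hyperparameters, are assumptions for illustration, not the paper's method.

    # Illustrative SimCLR-style contrastive pretraining on tabular LTR features.
    # NOTE: generic NT-Xent sketch, not the paper's SimCLR-Rank; the augmentation
    # (random feature dropout) and the encoder architecture are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Encoder(nn.Module):
        """Small MLP encoder with a projection head, as in SimCLR."""
        def __init__(self, n_features: int, hidden: int = 256, proj_dim: int = 64):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Linear(n_features, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.head = nn.Linear(hidden, proj_dim)

        def forward(self, x):
            return self.head(self.backbone(x))

    def augment(x: torch.Tensor, drop_prob: float = 0.2) -> torch.Tensor:
        """Random feature dropout: zero out a random subset of entries per row."""
        mask = (torch.rand_like(x) > drop_prob).float()
        return x * mask

    def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
        """Normalized-temperature cross-entropy (SimCLR) loss over a batch of pairs."""
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d)
        sim = z @ z.t() / tau                                 # cosine similarities
        n = z1.size(0)
        sim.fill_diagonal_(float("-inf"))                     # exclude self-similarity
        # The positive for row i is row i+n (and vice versa).
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
        return F.cross_entropy(sim, targets)

    # Minimal pretraining loop over unlabeled query-document feature vectors.
    if __name__ == "__main__":
        n_features, batch_size = 136, 512                     # feature width is a stand-in
        model = Encoder(n_features)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        unlabeled = torch.randn(10_000, n_features)           # stand-in for unlabeled LTR data
        for step in range(100):
            idx = torch.randint(0, unlabeled.size(0), (batch_size,))
            x = unlabeled[idx]
            z1, z2 = model(augment(x)), model(augment(x))     # two random views per row
            loss = nt_xent(z1, z2)
            opt.zero_grad(); loss.backward(); opt.step()

After pretraining, the projection head would typically be discarded and the backbone fine-tuned on the (scarce) labeled ranking data with a standard LTR loss.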


