TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning

07/26/2023
by Yury Gorishniy, et al.

Deep learning (DL) models for tabular data problems are receiving increasing attention, while algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution. Following recent trends in other domains, such as natural language processing and computer vision, several retrieval-augmented tabular DL models have recently been proposed. For a given target object, a retrieval-based model retrieves other relevant objects, such as the nearest neighbors, from the available (training) data and uses their features or even labels to make a better prediction. However, we show that the existing retrieval-based tabular DL solutions provide only minor, if any, benefits over properly tuned simple retrieval-free baselines. Thus, it remains unclear whether the retrieval-based approach is a worthy direction for tabular DL. In this work, we give a strong positive answer to this question. We start by incrementally augmenting a simple feed-forward architecture with an attention-like retrieval component similar to those of many (tabular) retrieval-based models. Then, we highlight several details of the attention mechanism that turn out to have a massive impact on performance on tabular data problems but were not explored in prior work. As a result, we design TabR, a simple retrieval-based tabular DL model which, on a set of public benchmarks, demonstrates the best average performance among tabular DL models, becomes the new state of the art on several datasets, and even outperforms GBDT models on the recently proposed “GBDT-friendly” benchmark (see the first figure).
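
To make the general retrieval idea concrete, here is a minimal toy sketch in plain Python. It is not TabR and not any of the models the abstract refers to: it simply retrieves the nearest training objects for a target object and combines their labels with attention-like softmax weights. The function name `retrieve_and_predict`, the negative squared Euclidean distance used as similarity, the `temperature` parameter, and the toy data are all assumptions made for illustration.

```python
import math


def retrieve_and_predict(query, train_x, train_y, k=3, temperature=1.0):
    """Toy retrieval-based regression (illustrative only, not TabR).

    Returns the softmax-weighted average of the labels of the k training
    objects most similar to `query`, plus the neighbor indices and weights.
    """
    # Assumption of this sketch: similarity is the negative squared
    # Euclidean distance between raw feature vectors.
    sims = [-sum((q - x) ** 2 for q, x in zip(query, row)) for row in train_x]

    # Retrieval step: indices of the k most similar training objects.
    top = sorted(range(len(train_x)), key=lambda i: sims[i], reverse=True)[:k]

    # Attention-like step: softmax over the retrieved similarities
    # (shifted by the maximum for numerical stability).
    max_sim = max(sims[i] for i in top)
    weights = [math.exp((sims[i] - max_sim) / temperature) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Prediction: weighted average of the retrieved labels.
    prediction = sum(w * train_y[i] for w, i in zip(weights, top))
    return prediction, top, weights


# Hypothetical toy data: four training objects with two features each.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [0.0, 1.0, 1.0, 10.0]

pred, neighbors, weights = retrieve_and_predict([0.1, 0.1], train_x, train_y, k=3)
print(neighbors, pred)
```

In this sketch the distant fourth object is never retrieved, so its label does not affect the prediction. A learned model would typically compute similarities in a trained representation space rather than on raw features, which is one of the design choices the abstract says matters.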


