Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning

05/08/2021
by Benjamin Minixhofer, et al.

Transfer learning has become the dominant paradigm for many natural language processing tasks. In addition to being pretrained on large datasets, models can be further trained on intermediate (supervised) tasks that are similar to the target task. For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a large (labelled) NLI dataset before fine-tuning on each NLI subtask. In this work, we explore Gradient Boosted Decision Trees (GBDTs) as an alternative to the commonly used Multi-Layer Perceptron (MLP) classification head. GBDTs have desirable properties such as strong performance on dense, numerical features, and they remain effective when the number of samples is small relative to the number of features. We then introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning, which increases performance without any additional computation by the neural network. We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model (RoBERTa-large with MNLI pretraining). FreeGBDT shows a consistent improvement over the MLP classification head.
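The core idea lends itself to a short illustration. The sketch below is not the authors' implementation; it fine-tunes an NLI model while storing the pooled <s> (CLS) features the network computes anyway for each training batch, then fits a LightGBM classifier on the accumulated features. The model choice, the toy data, the label mapping, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the FreeGBDT idea (not the paper's code): reuse the
# features already computed during fine-tuning to fit a GBDT head, so the
# transformer does no extra forward passes. Model name, toy batch, label
# scheme and hyperparameters below are illustrative assumptions.
import numpy as np
import torch
import lightgbm as lgb
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
name = "roberta-large-mnli"  # RoBERTa-large with MNLI pretraining, as in the abstract
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy NLI batch standing in for a real target-task DataLoader.
premises = ["A man is playing a guitar.", "Two dogs run in a field."]
hypotheses = ["A person plays music.", "The dogs are sleeping."]
labels = torch.tensor([2, 0])  # assumed mapping: 2 = entailment, 0 = contradiction

feature_buf, label_buf = [], []
model.train()
for _ in range(3):  # each fine-tuning step contributes its batch features
    enc = tokenizer(premises, hypotheses, padding=True, return_tensors="pt").to(device)
    out = model(**enc, labels=labels.to(device), output_hidden_states=True)
    # The <s> hidden state that feeds the MLP head; detaching and storing it
    # adds no additional computation by the network itself.
    feature_buf.append(out.hidden_states[-1][:, 0, :].detach().cpu().numpy())
    label_buf.append(labels.numpy())
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Fit the "free" GBDT head on all features gathered during fine-tuning.
X, y = np.concatenate(feature_buf), np.concatenate(label_buf)
gbdt = lgb.LGBMClassifier(n_estimators=100, min_child_samples=1)
gbdt.fit(X, y)
print(gbdt.predict(X))  # at inference, the GBDT replaces the MLP head on the same features
```

The design point this sketch tries to capture is that the GBDT sees exactly the feature vectors the MLP head saw during training, so fitting it requires no extra passes through the transformer.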
