MML: Maximal Multiverse Learning for Robust Fine-Tuning of Language Models

11/05/2019
by   Itzik Malkiel, et al.
0

Recent state-of-the-art language models utilize a two-phase training procedure comprised of (i) unsupervised pre-training on unlabeled text, and (ii) fine-tuning for a specific supervised task. More recently, many studies have been focused on trying to improve these models by enhancing the pre-training phase, either via better choice of hyperparameters or by leveraging an improved formulation. However, the pre-training phase is computationally expensive and often done on private datasets. In this work, we present a method that leverages BERT's fine-tuning phase to its fullest, by applying an extensive number of parallel classifier heads, which are enforced to be orthogonal, while adaptively eliminating the weaker heads during training. Our method allows the model to converge to an optimal number of parallel classifiers, depending on the given dataset at hand. We conduct an extensive inter- and intra-dataset evaluations, showing that our method improves the robustness of BERT, sometimes leading to a +9% gain in accuracy. These results highlight the importance of a proper fine-tuning procedure, especially for relatively smaller-sized datasets. Our code is attached as supplementary and our models will be made completely public.

READ FULL TEXT

page 3

page 7

research
05/25/2022

Memorization in NLP Fine-tuning Methods

Large language models are shown to present privacy risks through memoriz...
research
08/15/2019

Visualizing and Understanding the Effectiveness of BERT

Language model pre-training, such as BERT, has achieved remarkable resul...
research
12/31/2020

EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

Deep, heavily overparameterized language models such as BERT, XLNet and ...
research
06/05/2016

What is the Best Feature Learning Procedure in Hierarchical Recognition Architectures?

(This paper was written in November 2011 and never published. It is post...
research
01/19/2015

Statistical-mechanical analysis of pre-training and fine tuning in deep learning

In this paper, we present a statistical-mechanical analysis of deep lear...
research
08/03/2023

Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

Mathematical reasoning is a challenging task for large language models (...
research
01/30/2023

CSDR-BERT: a pre-trained scientific dataset match model for Chinese Scientific Dataset Retrieval

As the number of open and shared scientific datasets on the Internet inc...

Please sign up or login with your details

Forgot password? Click here to reset