Selecting Informative Contexts Improves Language Model Finetuning

05/01/2020
by Richard Antonello, et al.

We present a general finetuning meta-method that we call information gain filtration for improving the overall training efficiency and final performance of language model finetuning. This method uses a secondary learner that estimates the benefit of finetuning the language model on each given example. During finetuning, this learner decides whether each example should be trained on or skipped. We show that it suffices for this learner to be simple, and that the finetuning process itself is dominated by the relatively trivial relearning of a new unigram frequency distribution over the modelled language domain, a process which the learner aids. Our method trains to convergence using 40% less data than standard finetuning, and achieves a median perplexity of 54.0 on a books dataset, compared to a median perplexity of 57.3 for standard finetuning with the same neural architecture.
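The filtration loop described in the abstract is straightforward to sketch. Below is a minimal illustration in Python, assuming a HuggingFace-style causal language model (a forward pass that accepts labels and returns an object with a .loss attribute) and a hypothetical secondary_learner that maps a tokenized example to a scalar estimate of its information gain. The names filtered_finetune and ig_threshold are illustrative, not the paper's actual API.

    import torch

    def filtered_finetune(model, optimizer, dataloader, secondary_learner,
                          ig_threshold=0.0, device="cpu"):
        """Finetune `model`, skipping examples that the secondary learner
        predicts carry little information gain."""
        model.train()
        for batch in dataloader:
            input_ids = batch["input_ids"].to(device)
            # Hypothetical scorer: estimates how much training on this
            # example would improve a held-out metric ("information gain").
            with torch.no_grad():
                predicted_ig = secondary_learner(input_ids)
            if predicted_ig.item() < ig_threshold:
                continue  # skip examples predicted to be uninformative
            # Standard causal-LM finetuning step on the selected example.
            outputs = model(input_ids, labels=input_ids)
            outputs.loss.backward()
            optimizer.step()
            optimizer.zero_grad()

In a sketch like this, the threshold trades data efficiency against coverage: raising ig_threshold skips more examples and converges in fewer updates, at the risk of discarding occasionally useful contexts.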

Related research

03/28/2018 · Meta-Learning a Dynamical Language Model
We consider the task of word-level language modeling and study the possi...

08/30/2023 · Prompting Vision Language Model with Knowledge from Large Language Model for Knowledge-Based VQA
Knowledge-based visual question answering is a very challenging and wide...

10/24/2018 · Universal Language Model Fine-Tuning with Subword Tokenization for Polish
Universal Language Model for Fine-tuning [arXiv:1801.06146] (ULMFiT) is ...

04/12/2022 · Mining Logical Event Schemas From Pre-Trained Language Models
We present NESL (the Neuro-Episodic Schema Learner), an event schema lea...

08/16/2023 · FootGPT: A Large Language Model Development Experiment on a Minimal Setting
With recent empirical observations, it has been argued that the most sig...

02/13/2020 · CBAG: Conditional Biomedical Abstract Generation
Biomedical research papers use significantly different language and jarg...

05/15/2018 · Continuous Learning in a Hierarchical Multiscale Neural Network
We reformulate the problem of encoding a multi-scale representation of a...
