Fine-Tuning Language Models via Epistemic Neural Networks

11/03/2022
by Ian Osband, et al.

Large language models are now part of a powerful new paradigm in machine learning. These models learn a wide range of capabilities from training on large unsupervised text corpora. In many applications, these capabilities are then fine-tuned through additional training on specialized data to improve performance in that setting. In this paper, we augment these models with an epinet: a small additional network architecture that helps to estimate model uncertainty and form an epistemic neural network (ENN). ENNs are neural networks that can know what they don't know. We show that, by using an epinet to prioritize uncertain data, we can fine-tune BERT on GLUE tasks to the same performance using half the data. We also investigate performance in synthetic neural network generative models designed to build understanding. In each setting, using an epinet outperforms heuristic active learning schemes.
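The core mechanism the abstract describes, scoring unlabeled examples by how much the ENN's predictions disagree across sampled epistemic indices, then labeling the most uncertain examples first, can be sketched roughly as follows. This is a minimal illustration, not the paper's architecture: the network shapes, the tiny ReLU-MLP epinet form, and all weights here are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, dz, h = 8, 3, 4, 16   # feature dim, classes, index dim, epinet hidden size

# Frozen base-model head and a small epinet (random weights, purely illustrative).
w_base = rng.normal(size=(d, k))
w1 = rng.normal(size=(d + dz, h)) * 0.5
w2 = rng.normal(size=(h, k)) * 0.5

def enn_logits(x, z):
    """ENN output: base-model logits plus a small epinet correction
    conditioned on the epistemic index z."""
    base = x @ w_base
    hid = np.maximum(0.0, np.concatenate([x, z]) @ w1)  # tiny MLP epinet
    return base + hid @ w2

def uncertainty(x, n_index=64):
    """Disagreement (variance) of ENN logits across sampled indices z:
    high variance is where the model 'knows it doesn't know'."""
    zs = rng.normal(size=(n_index, dz))
    logits = np.stack([enn_logits(x, z) for z in zs])
    return float(logits.var(axis=0).sum())

# Active-learning step: score an unlabeled pool and label the most
# uncertain example first, rather than drawing data uniformly.
pool = rng.normal(size=(100, d))
scores = np.array([uncertainty(x) for x in pool])
pick = int(scores.argmax())   # index of the example to send for labeling
```

In a fine-tuning loop this selection step would repeat: label the top-scoring examples, update the model, and re-score the remaining pool, which is how prioritizing uncertain data can reach a target accuracy with less labeled data overall.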

