Sharpness-Aware Minimization Improves Language Model Generalization

10/16/2021
by   Dara Bahri, et al.
1

The allure of superhuman-level capabilities has led to considerable interest in language models like GPT-3 and T5, wherein the research has, by and large, revolved around new model architectures, training tasks, and loss objectives, along with substantial engineering efforts to scale up model capacity and dataset size. Comparatively little work has been done to improve the generalization of these models through better optimization. In this work, we show that Sharpness-Aware Minimization (SAM), a recently proposed optimization procedure that encourages convergence to flatter minima, can substantially improve the generalization of language models without much computational overhead. We show that SAM is able to boost performance on SuperGLUE, GLUE, Web Questions, Natural Questions, Trivia QA, and TyDiQA, with particularly large gains when training data for these tasks is limited.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/11/2022

Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models

Fine-tuning large pretrained language models on a limited training corpu...
research
08/14/2022

Model Generalization: A Sharpness Aware Optimization Perspective

Sharpness-Aware Minimization (SAM) and adaptive sharpness-aware minimiza...
research
05/25/2022

ER-TEST: Evaluating Explanation Regularization Methods for NLP Models

Neural language models' (NLMs') reasoning processes are notoriously hard...
research
10/13/2022

GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization

Recently, Sharpness-Aware Minimization (SAM) algorithm has shown state-o...
research
07/12/2023

A Comprehensive Overview of Large Language Models

Large Language Models (LLMs) have shown excellent generalization capabil...
research
09/14/2023

Gradient constrained sharpness-aware prompt learning for vision-language models

This paper targets a novel trade-off problem in generalizable prompt lea...
research
03/05/2022

Towards Efficient and Scalable Sharpness-Aware Minimization

Recently, Sharpness-Aware Minimization (SAM), which connects the geometr...

Please sign up or login with your details

Forgot password? Click here to reset