Sparse*BERT: Sparse Models are Robust

05/25/2022
by Daniel Campos et al.

Large language models have become the core architecture upon which most modern natural language processing (NLP) systems are built. These models consistently deliver impressive accuracy and robustness across tasks and domains, but their high computational overhead can make inference difficult and expensive. To make these models less costly to use, recent work has explored structured and unstructured pruning, quantization, and distillation as ways to improve inference speed and decrease model size. This paper studies how models pruned with Gradual Unstructured Magnitude Pruning transfer between domains and tasks. Our experiments show that models pruned during general-domain masked language model pretraining can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches. We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text. Moreover, we show that SparseBioBERT can match the quality of BioBERT with only 10% of the parameters.
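For readers unfamiliar with Gradual Unstructured Magnitude Pruning, the sketch below illustrates the general idea: sparsity is ramped up over training (here with the commonly used cubic schedule of Zhu & Gupta) by repeatedly zeroing the smallest-magnitude weights. This is a minimal illustration in PyTorch; the schedule values, layer selection, and training loop are assumptions, not the exact recipe used for Sparse*BERT.

```python
# Minimal sketch of gradual unstructured magnitude pruning.
# Assumptions: cubic sparsity ramp, per-layer pruning of nn.Linear weights,
# and illustrative hyperparameters (final sparsity 0.9, 10k steps).
import torch
import torch.nn as nn


def target_sparsity(step: int, start_step: int, end_step: int,
                    initial: float = 0.0, final: float = 0.9) -> float:
    """Cubic ramp from `initial` to `final` sparsity between start_step and end_step."""
    if step <= start_step:
        return initial
    if step >= end_step:
        return final
    progress = (step - start_step) / (end_step - start_step)
    return final + (initial - final) * (1.0 - progress) ** 3


def apply_magnitude_mask(layer: nn.Linear, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights so roughly `sparsity` of them are zero."""
    with torch.no_grad():
        weight = layer.weight
        k = int(sparsity * weight.numel())
        if k == 0:
            return
        threshold = weight.abs().flatten().kthvalue(k).values
        mask = (weight.abs() > threshold).to(weight.dtype)
        weight.mul_(mask)  # pruned weights are set to zero in place


# Usage sketch: prune every Linear layer of a toy encoder block during training.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
for step in range(0, 10_000, 1_000):
    sparsity = target_sparsity(step, start_step=0, end_step=10_000)
    for layer in model.modules():
        if isinstance(layer, nn.Linear):
            apply_magnitude_mask(layer, sparsity)
    # ... a normal masked-language-model training step would go here ...
```

In practice the mask is usually reapplied after every optimizer step (or kept fixed once the final sparsity is reached) so that pruned weights stay at zero while the remaining weights continue to train.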


