MicroNet for Efficient Language Modeling

05/16/2020
by Zhongxia Yan, et al.

It is important to design compact language models for efficient deployment. We improve upon recent advances in both the language-modeling and model-compression domains to construct parameter- and computation-efficient language models. We use an efficient transformer-based architecture with adaptive embedding and softmax, differentiable non-parametric cache, Hebbian softmax, knowledge distillation, network pruning, and low-bit quantization. In this paper, we provide the winning solution to the NeurIPS 2019 MicroNet Challenge in the language modeling track. Compared to the baseline language model provided by the MicroNet Challenge, our model is 90 times more parameter-efficient and 36 times more computation-efficient while achieving the required test perplexity of 35 on the Wikitext-103 dataset. We hope that this work will aid future research into efficient language models, and we have released our full source code at https://github.com/mit-han-lab/neurips-micronet.
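As a rough illustration of two of the compression steps named in the abstract, the sketch below applies magnitude pruning and uniform low-bit weight quantization to a single PyTorch linear layer. This is a minimal sketch assuming PyTorch; the helpers `magnitude_prune` and `quantize_symmetric` are hypothetical illustrations and are not taken from the authors' released source code.

```python
# Hypothetical sketch: magnitude pruning + low-bit fake-quantization of a weight matrix.
# Not the authors' implementation; see the released repo for the actual pipeline.
import torch
import torch.nn as nn


def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask


def quantize_symmetric(weight: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Uniform symmetric fake-quantization of the weights to `num_bits` bits."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = weight.abs().max() / qmax
    return torch.clamp(torch.round(weight / scale), -qmax, qmax) * scale


# Example: prune 50% of a layer's weights, then quantize the survivors to 8 bits.
layer = nn.Linear(512, 512)
with torch.no_grad():
    pruned = magnitude_prune(layer.weight, sparsity=0.5)
    layer.weight.copy_(quantize_symmetric(pruned, num_bits=8))
```

In practice, pruning and quantization of this kind are interleaved with fine-tuning so the model can recover the perplexity lost at each compression step.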
