Efficient Contextual Representation Learning Without Softmax Layer

02/28/2019
by Liunian Harold Li, et al.

Contextual representation models have achieved great success in improving various downstream tasks. However, these language-model-based encoders are difficult to train due to their large parameter size and high computational complexity. By carefully examining the training procedure, we find that the softmax layer (the output layer) causes significant inefficiency due to the large vocabulary size. Therefore, we redesign the learning objective and propose an efficient framework for training contextual representation models. Specifically, the proposed approach bypasses the softmax layer by performing language modeling with dimension reduction, and allows the models to leverage pre-trained word embeddings. Our framework reduces the time spent on the output layer to a negligible level, eliminates almost all of the trainable parameters of the softmax layer, and performs language modeling without truncating the vocabulary. When applied to ELMo, our method achieves a 4 times speedup and eliminates 80% of the trainable parameters, while achieving competitive performance on downstream tasks.
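
To make the idea concrete, the sketch below contrasts a full-vocabulary softmax output layer with a continuous output layer that projects the encoder's hidden state down to the embedding dimension and matches it against frozen pre-trained word embeddings. This is an illustrative PyTorch sketch, not the authors' released code; the module names, the sizes, and the cosine-distance loss are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes only; real vocabularies for ELMo-style models are far larger.
vocab_size, emb_dim, hidden_dim = 50_000, 300, 512

# Conventional output layer: project the hidden state to vocabulary size
# and train with softmax cross-entropy (parameters grow with vocab_size).
softmax_head = nn.Linear(hidden_dim, vocab_size)      # ~25.6M trainable parameters

# Softmax-free output layer: project only to the embedding dimension and
# match the frozen pre-trained embedding of the target word.
projection = nn.Linear(hidden_dim, emb_dim)           # ~154K trainable parameters
pretrained_emb = nn.Embedding(vocab_size, emb_dim)    # load pre-trained vectors here
pretrained_emb.weight.requires_grad = False           # frozen, never updated

def softmax_loss(hidden, targets):
    """Standard language-modeling loss over the full vocabulary (expensive)."""
    logits = softmax_head(hidden)                     # (batch, vocab_size)
    return F.cross_entropy(logits, targets)

def continuous_loss(hidden, targets):
    """Regress onto the target word's pre-trained embedding (cheap)."""
    pred = projection(hidden)                         # (batch, emb_dim)
    gold = pretrained_emb(targets)                    # (batch, emb_dim)
    return (1.0 - F.cosine_similarity(pred, gold, dim=-1)).mean()

if __name__ == "__main__":
    hidden = torch.randn(32, hidden_dim)              # stand-in for encoder outputs
    targets = torch.randint(0, vocab_size, (32,))     # next-word indices
    print("softmax loss:   ", softmax_loss(hidden, targets).item())
    print("continuous loss:", continuous_loss(hidden, targets).item())
```

In this formulation the output layer's cost and parameter count scale with the embedding dimension rather than the vocabulary size, which is the source of the speedup and parameter reduction reported in the abstract.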


Related research

12/08/2021 - Improving Knowledge Graph Representation Learning by Structure Contextual Pre-training
Representation learning models for Knowledge Graphs (KG) have proven to ...

07/26/2017 - Self-organized Hierarchical Softmax
We propose a new self-organizing hierarchical softmax formulation for ne...

11/10/2017 - Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
We formulate language modeling as a matrix factorization problem, and sh...

12/11/2022 - A Study of Slang Representation Methods
Warning: this paper contains content that may be offensive or upsetting....

10/19/2018 - Real-time Neural-based Input Method
The input method is an essential service on every mobile and desktop dev...

06/26/2016 - Exact gradient updates in time independent of output size for the spherical loss family
An important class of problems involves training deep neural networks wi...

06/11/2021 - Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation
A well-known limitation in pretrain-finetune paradigm lies in its inflex...
