Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models

08/04/2022
by Vilém Zouhar, et al.

Although masked language models are highly performant and widely adopted by NLP practitioners, they cannot easily be used for autoregressive language modelling (next-word prediction and sequence probability estimation). We present an LSTM-based autoregressive language model that uses prefix embeddings (from a pretrained masked language model) via fusion (e.g. concatenation) to obtain a richer context representation for language modelling. We find that fusion reliably lowers perplexity (16.74 → 15.80), and that the improvement is preserved even after transfer to a dataset from a different domain than the training data. We also evaluate the best-performing fusion model by correlating its next-word surprisal estimates with human reading times. Contrary to our expectation, and despite the overall improvement in perplexity, the correlation remains the same as for the baseline model. Lastly, while we focus on language models pretrained on text as the sources for fusion, our approach could be extended to fuse any information represented as a fixed-size vector into an autoregressive language model, e.g. sentence-external information retrieved from a knowledge base or representations from multi-modal encoders.
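To make the fusion idea concrete, below is a minimal PyTorch sketch of concatenation fusion: an LSTM language model whose per-step hidden state is concatenated with a single fixed-size context vector (standing in for a prefix embedding from a pretrained masked language model) before the softmax projection. The class name FusionLSTMLM, the dimensions, and the random inputs are illustrative assumptions, not the authors' implementation.

```python
# Sketch of concatenation fusion for an LSTM-based autoregressive LM.
# The context vector (e.g. a masked-LM prefix embedding) is broadcast over
# time steps and concatenated with the LSTM hidden state before the output layer.
import torch
import torch.nn as nn


class FusionLSTMLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, ctx_dim=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # The output projection sees [LSTM hidden state ; fused context vector].
        self.out = nn.Linear(hid_dim + ctx_dim, vocab_size)

    def forward(self, tokens, ctx):
        # tokens: (batch, seq_len) token ids
        # ctx:    (batch, ctx_dim) fixed-size context vector
        h, _ = self.lstm(self.embed(tokens))              # (batch, seq_len, hid_dim)
        ctx = ctx.unsqueeze(1).expand(-1, h.size(1), -1)  # repeat over time steps
        fused = torch.cat([h, ctx], dim=-1)               # concatenation fusion
        return self.out(fused)                            # next-word logits


# Usage sketch: next-word logits for random token ids plus one context vector each.
model = FusionLSTMLM(vocab_size=10000)
tokens = torch.randint(0, 10000, (2, 12))
ctx = torch.randn(2, 768)    # stand-in for a pretrained sentence embedding
logits = model(tokens, ctx)  # shape (2, 12, 10000)
```

Swapping in a different fusion operation (e.g. adding or gating the context vector instead of concatenating it) would only change the line that builds `fused` and the input size of the output layer.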

Related research

Larger-Context Language Modelling (11/11/2015)
In this work, we propose a novel method to incorporate corpus-level disc...

Multi-Sense Language Modelling (12/10/2020)
The effectiveness of a language model is influenced by its token represe...

Cold Fusion: Training Seq2Seq Models Together with Language Models (08/21/2017)
Sequence-to-sequence (Seq2Seq) models with attention have excelled at ta...

Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture (12/21/2019)
Sequence-to-sequence models have recently become very popular for tackli...

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training (02/28/2020)
We propose to pre-train a unified language model for both autoencoding a...

Transformer-Based LM Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens (04/22/2023)
Recent psycholinguistic studies have drawn conflicting conclusions about...

Fixed-Size Ordinally Forgetting Encoding Based Word Sense Disambiguation (02/23/2019)
In this paper, we present our method of using fixed-size ordinally forge...