Multi-timescale Representation Learning in LSTM Language Models

09/27/2020
by Shivangi Mahto, et al.

Although neural language models are effective at capturing statistics of natural language, their representations are challenging to interpret. In particular, it is unclear how these models retain information over multiple timescales. In this work, we construct explicitly multi-timescale language models by manipulating the input and forget gate biases in a long short-term memory (LSTM) network. The distribution of timescales is selected to approximate power-law statistics of natural language through a combination of exponentially decaying memory cells. We then empirically analyze the timescale of information routed through each part of the model using word ablation experiments and forget gate visualizations. These experiments show that the multi-timescale model successfully learns representations at the desired timescales, and that the distribution includes longer timescales than a standard LSTM. Further, information about high-, mid-, and low-frequency words is routed preferentially through units with the appropriate timescales. Thus we show how to construct language models with interpretable representations of different information timescales.
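For concreteness, here is a minimal PyTorch sketch of the kind of bias manipulation the abstract describes. It is an illustration built on stated assumptions, not the authors' implementation: the inverse-gamma timescale distribution and its shape parameter `alpha` are placeholders, the forget-gate bias is derived from the standard relation f = exp(-1/T) between a roughly constant forget-gate value f and a unit's decay timescale T, and the input-gate bias is paired as -b_f in the style of chrono initialization (Tallec and Ollivier, 2018).

```python
import torch
import torch.nn as nn

def multi_timescale_lstm(input_size=400, hidden_size=1150, alpha=0.56):
    """Build an LSTM whose units are biased toward fixed, heterogeneous timescales.

    `alpha` is an illustrative inverse-gamma shape parameter, not a value
    taken from the paper.
    """
    lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    # One characteristic timescale T per hidden unit. A mixture of
    # exponentially decaying cells with heavy-tailed (here, inverse-gamma)
    # timescales can approximate power-law memory in aggregate.
    T = 1.0 / torch.distributions.Gamma(alpha, 1.0).sample((hidden_size,))
    T = T.clamp(min=1.0, max=5000.0)  # keep logit() below numerically stable

    # If the forget gate sits near a constant value f, cell contents decay
    # as f^t, i.e. with timescale T = -1/ln(f). Inverting: f = exp(-1/T),
    # so at zero input the forget-gate bias should be logit(exp(-1/T)).
    b_f = torch.logit(torch.exp(-1.0 / T))

    hs = hidden_size
    with torch.no_grad():
        # PyTorch packs LSTM gate biases as [input, forget, cell, output],
        # and the effective bias is bias_ih + bias_hh.
        lstm.bias_hh_l0[: 2 * hs].zero_()
        lstm.bias_ih_l0[hs : 2 * hs] = b_f  # forget gate: decay at rate exp(-1/T)
        lstm.bias_ih_l0[:hs] = -b_f         # input gate: write rate paired to decay
    return lstm

lm_rnn = multi_timescale_lstm(input_size=400, hidden_size=1150)
```

Whether such biases are then frozen or allowed to fine-tune during training is an implementation choice the abstract leaves open.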

