
Revisiting the Hierarchical Multiscale LSTM

by Akos Kadar, et al.

The Hierarchical Multiscale LSTM (Chung et al., 2016a) is a state-of-the-art language model that learns interpretable structure from character-level input. Such models can provide fertile ground for (cognitive) computational linguistics studies. However, the high complexity of the architecture, its training procedure, and its implementations might hinder its applicability. We provide a detailed reproduction and ablation study of the architecture, shedding light on some of the potential caveats of re-purposing complex deep-learning architectures. We further show that simplifying certain aspects of the architecture can in fact improve its performance. We also investigate the linguistic units (segments) learned by various levels of the model, and argue that their quality does not correlate with the model's overall performance on language modeling.



