
Syntax-Aware Language Modeling with Recurrent Neural Networks

by Duncan Blythe, et al.

Neural language models (LMs) are typically trained on lexical features alone, such as the surface forms of words. In this paper, we argue that this deprives the LM of crucial syntactic signals that existing parsers can detect with high confidence. We present a simple but highly effective approach for training neural LMs on both lexical and syntactic information, together with a novel method for applying such LMs to unparsed text via sequential Monte Carlo sampling. In experiments across a range of corpora and corpus sizes, our approach consistently outperforms standard lexical LMs at character-level language modeling, while at the word level it performs on par with standard LMs. These results indicate the potential of extending character-level LMs beyond lexical surface features to higher-level NLP features.
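To make the sequential Monte Carlo idea concrete, here is a minimal sketch of how a syntax-aware LM can be applied to unparsed text: syntactic tags are latent, so the likelihood of a word sequence is estimated by a bootstrap particle filter that samples tag sequences and weights them by the lexical emissions. The paper's model is an RNN; this sketch substitutes a toy tag-transition/word-emission model with illustrative hand-set probabilities, and all names (`TAGS`, `TRANS`, `EMIT`, `smc_log_likelihood`) are assumptions for illustration, not the authors' code.

```python
import math
import random

# Toy stand-in for a syntax-aware LM: p(tag' | tag) carries the syntactic
# signal, p(word | tag) the lexical signal. Numbers are illustrative only.
TAGS = ["DET", "NOUN", "VERB"]
TRANS = {
    None:   {"DET": 0.60, "NOUN": 0.30, "VERB": 0.10},  # sentence start
    "DET":  {"DET": 0.05, "NOUN": 0.90, "VERB": 0.05},
    "NOUN": {"DET": 0.10, "NOUN": 0.20, "VERB": 0.70},
    "VERB": {"DET": 0.50, "NOUN": 0.40, "VERB": 0.10},
}
EMIT = {
    "DET":  {"the": 0.90, "dog": 0.05, "barks": 0.05},
    "NOUN": {"the": 0.05, "dog": 0.85, "barks": 0.10},
    "VERB": {"the": 0.05, "dog": 0.05, "barks": 0.90},
}

def smc_log_likelihood(words, n_particles=200, seed=0):
    """Estimate log p(words) on unparsed text by marginalising over the
    unobserved tag sequence with a bootstrap particle filter."""
    rng = random.Random(seed)
    particles = [None] * n_particles  # each particle tracks its last tag
    loglik = 0.0
    for w in words:
        new_particles, weights = [], []
        for tag in particles:
            # Propose the next tag from the syntactic transition prior.
            probs, r, c, nxt = TRANS[tag], rng.random(), 0.0, TAGS[-1]
            for t in TAGS:
                c += probs[t]
                if r <= c:
                    nxt = t
                    break
            new_particles.append(nxt)
            # Weight the particle by the lexical emission probability.
            weights.append(EMIT[nxt].get(w, 1e-9))
        # The average weight is an unbiased estimate of p(word | history).
        loglik += math.log(sum(weights) / n_particles)
        # Resample particles in proportion to their weights.
        particles = rng.choices(new_particles, weights=weights, k=n_particles)
    return loglik
```

Under this toy model the filter assigns a higher estimated log-likelihood to the syntactically well-formed `["the", "dog", "barks"]` than to a scrambled ordering of the same words, even though no tags are observed.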



