Syntax-Aware Language Modeling with Recurrent Neural Networks

03/02/2018
by Duncan Blythe, et al.

Neural language models (LMs) are typically trained using only lexical features, such as the surface forms of words. In this paper, we argue that this deprives the LM of crucial syntactic signals that existing parsers can detect with high confidence. We present a simple but highly effective approach for training neural LMs on both lexical and syntactic information, and a novel approach for applying such LMs to unparsed text using sequential Monte Carlo sampling. In experiments on a range of corpora and corpus sizes, we show that our approach consistently outperforms standard lexical LMs in character-level language modeling; for word-level modeling, on the other hand, it performs on a par with standard LMs. These results indicate the potential of extending character-level LMs beyond lexical surface features to higher-level NLP features.
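
To make the sequential Monte Carlo idea concrete, here is a minimal sketch of how a model defined jointly over words and syntactic tags might score unparsed text by sampling the missing tags. It is an illustration under stated assumptions, not the authors' implementation: a tiny hand-written tag/word table (`INIT`, `TRANS`, `EMIT`, all hypothetical) stands in for the neural LM, and the proposal is a plain bootstrap particle filter. The function names `smc_log_likelihood` and `exact_log_likelihood` are likewise made up for this example.

```python
import math
import random

# Hypothetical toy joint model over (tag, word) pairs. It stands in for a
# neural LM that predicts both a syntactic tag and a word at each step.
TAGS = ["DET", "NOUN"]
INIT = {"DET": 0.6, "NOUN": 0.4}
TRANS = {"DET": {"DET": 0.1, "NOUN": 0.9},
         "NOUN": {"DET": 0.5, "NOUN": 0.5}}
EMIT = {"DET": {"the": 0.9, "dog": 0.1},
        "NOUN": {"the": 0.1, "dog": 0.9}}


def smc_log_likelihood(words, n_particles=5000, seed=0):
    """Estimate log p(words) with the tags treated as latent variables."""
    rng = random.Random(seed)
    particles = [None] * n_particles  # None means "no previous tag yet"
    log_lik = 0.0
    for word in words:
        proposed, weights = [], []
        for prev in particles:
            # Propose a tag from the model's own tag distribution.
            dist = INIT if prev is None else TRANS[prev]
            tag = rng.choices(TAGS, weights=[dist[t] for t in TAGS])[0]
            proposed.append(tag)
            # Weight the particle by how well its tag explains the word.
            weights.append(EMIT[tag][word])
        # The mean weight estimates p(word | previous words).
        log_lik += math.log(sum(weights) / n_particles)
        # Resample so that particles concentrate on plausible tag histories.
        particles = rng.choices(proposed, weights=weights, k=n_particles)
    return log_lik


def exact_log_likelihood(words):
    """Exact marginal via the forward algorithm, feasible only for toy models."""
    alpha = {t: INIT[t] * EMIT[t][words[0]] for t in TAGS}
    for word in words[1:]:
        alpha = {t: EMIT[t][word] * sum(alpha[s] * TRANS[s][t] for s in TAGS)
                 for t in TAGS}
    return math.log(sum(alpha.values()))


if __name__ == "__main__":
    sentence = ["the", "dog", "the", "dog"]
    print("SMC estimate:", smc_log_likelihood(sentence))
    print("Exact value: ", exact_log_likelihood(sentence))
```

With a neural LM the exact sum over tag sequences is intractable, which is why a sampling-based estimate of this kind is attractive; in the toy setting the forward algorithm serves only as a check on the estimate.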

research
03/13/2018

Neural Lattice Language Models

In this work, we propose a new language modeling paradigm that has the a...
research
09/30/2019

Lexical Features Are More Vulnerable, Syntactic Features Have More Predictive Power

Understanding the vulnerability of linguistic features extracted from no...
research
01/12/2017

A Data-Oriented Model of Literary Language

We consider the task of predicting how literary a text is, with a gold s...
research
11/13/2019

Word-level Lexical Normalisation using Context-Dependent Embeddings

Lexical normalisation (LN) is the process of correcting each word in a d...
research
06/13/2023

Tokenization with Factorized Subword Encoding

In recent years, language models have become increasingly larger and mor...
research
06/05/2022

Stylistic Fingerprints, POS-tags and Inflected Languages: A Case Study in Polish

In stylometric investigations, frequencies of the most frequent words (M...
research
02/28/2023

Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax and Context

A fundamental question in neurolinguistics concerns the brain regions in...
