Grammar Induction with Neural Language Models: An Unusual Replication

08/29/2018
by Phu Mon Htut, et al.

A substantial thread of recent work on latent tree learning has attempted to develop neural network models with parse-valued latent variables and train them on non-parsing tasks, in the hope of having them discover interpretable tree structure. In a recent paper, Shen et al. (2018) introduce such a model and report near-state-of-the-art results on the target task of language modeling, and the first strong latent tree learning result on constituency parsing. In an attempt to reproduce these results, we discover issues that make the original results hard to trust, including tuning and even training on what is effectively the test set. Here, we attempt to reproduce these results in a fair experiment and to extend them to two new datasets. We find that the results of this work are robust: All variants of the model under study outperform all latent tree learning baselines, and perform competitively with symbolic grammar induction systems. We find that this model represents the first empirical success for latent tree learning, and that neural network language modeling warrants further study as a setting for grammar induction.
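Since the abstract turns on evaluating induced trees against gold constituency parses, the sketch below shows how unlabeled F1 over constituent spans is typically computed in this line of work. This is a minimal illustration, not the authors' evaluation code: the nested-tuple tree encoding, function names, and example sentence are assumptions made here for clarity.

```python
# A minimal sketch of unlabeled F1 between the constituent spans of an
# induced tree and a gold tree, the standard metric in latent tree
# learning evaluations. Trees are encoded as nested tuples of word
# strings (an illustrative assumption, not the paper's format).

def tree_spans(tree, start=0):
    """Collect (start, end) spans of all multi-word constituents in a
    nested-tuple tree whose leaves are word strings."""
    if isinstance(tree, str):
        return set(), start + 1
    spans, pos = set(), start
    for child in tree:
        child_spans, pos = tree_spans(child, pos)
        spans |= child_spans
    if pos - start > 1:  # single-word spans are conventionally ignored
        spans.add((start, pos))
    return spans, pos

def unlabeled_f1(induced, gold):
    """Unlabeled F1 between the span sets of two trees."""
    pred, _ = tree_spans(induced)
    ref, _ = tree_spans(gold)
    if not pred or not ref:
        return 0.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Example: an induced right-branching tree vs. a gold parse.
induced = ("the", ("cat", ("sat", ("on", ("the", "mat")))))
gold = (("the", "cat"), ("sat", ("on", ("the", "mat"))))
print(f"unlabeled F1 = {unlabeled_f1(induced, gold):.2f}")  # 0.80
```

Conventions differ on whether the root span and trivial spans count toward the score, so reported numbers are only comparable when the same convention is used throughout.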


Related research

04/07/2019 · Unsupervised Recurrent Neural Network Grammars
Recurrent neural network grammars (RNNG) are generative models of langua...

12/01/2020 · StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
There are two major classes of natural language grammars – the dependenc...

09/22/2019 · Inducing Constituency Trees through Neural Machine Translation
Latent tree learning (LTL) methods learn to parse sentences using only in...

09/10/2018 · Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction
There have been several recent attempts to improve the accuracy of gramm...

09/21/2021 · Something Old, Something New: Grammar-based CCG Parsing with Transformer Models
This report describes the parsing problem for Combinatory Categorial Gra...

11/28/2021 · FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding
Inducing latent tree structures from sequential data is an emerging tren...

06/05/2023 · Improving Grammar-based Sequence-to-Sequence Modeling with Decomposition and Constraints
Neural QCFG is a grammar-based sequence-to-sequence (seq2seq) model with ...
