Unsupervised Parsing via Constituency Tests

10/07/2020
by Steven Cao et al.

We propose a method for unsupervised parsing based on the linguistic notion of a constituency test. One type of constituency test involves modifying the sentence via some transformation (e.g., replacing the span with a pronoun) and then judging the result (e.g., checking if it is grammatical). Motivated by this idea, we design an unsupervised parser by specifying a set of transformations and using an unsupervised neural acceptability model to make grammaticality decisions. To produce a tree given a sentence, we score each span by aggregating its constituency test judgments, and we choose the binary tree with the highest total score. While this approach already achieves performance comparable to current methods, we further improve accuracy by fine-tuning the grammaticality model through a refinement procedure, where we alternate between improving the estimated trees and improving the grammaticality model. The refined model achieves 62.8 F1 on the Penn Treebank test set, an absolute improvement of 7.6 points over the previous best published result.
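
To make the method concrete, the sketch below illustrates the two steps described in the abstract: scoring each span by aggregating constituency test judgments, and choosing the binary tree with the highest total score via a CKY-style dynamic program. This is a minimal illustration, not the authors' implementation: `pronoun_test` stands in for the paper's full set of transformations, and `grammaticality` is a placeholder for the unsupervised neural acceptability model.

```python
from typing import Callable, Dict, List, Tuple

def pronoun_test(words: List[str], i: int, j: int) -> List[str]:
    """One example constituency test: replace the span words[i:j] with a pronoun."""
    return words[:i] + ["it"] + words[j:]

def span_score(words, i, j, grammaticality, tests) -> float:
    """Aggregate grammaticality judgments for one span across all tests."""
    return sum(grammaticality(" ".join(t(words, i, j))) for t in tests) / len(tests)

def best_tree(words: List[str],
              grammaticality: Callable[[str], float],
              tests=(pronoun_test,)):
    """CKY-style search for the binary tree whose spans maximize the total score."""
    n = len(words)
    # Score every span once: O(n^2) calls into the grammaticality model.
    score = {(i, j): span_score(words, i, j, grammaticality, tests)
             for i in range(n) for j in range(i + 1, n + 1)}
    best: Dict[Tuple[int, int], float] = {}   # best total score over a span
    split: Dict[Tuple[int, int], int] = {}    # chosen split point for a span
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                best[i, j] = score[i, j]
                continue
            # Pick the split whose two subtrees have the highest combined score.
            k = max(range(i + 1, j), key=lambda m: best[i, m] + best[m, j])
            split[i, j] = k
            best[i, j] = score[i, j] + best[i, k] + best[k, j]

    def build(i: int, j: int):
        """Read the chosen splits back out as a nested-tuple binary tree."""
        if j - i == 1:
            return words[i]
        k = split[i, j]
        return (build(i, k), build(k, j))

    return build(0, n)
```

For example, `best_tree("the cat sat on the mat".split(), grammaticality=model_score)` returns a nested-tuple binary tree, where `model_score` is whatever acceptability scorer is plugged in. The dynamic program is O(n^3), but the O(n^2) model calls dominate in practice. The paper's refinement procedure then alternates between parsing with the current model and fine-tuning the grammaticality model on the estimated trees.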

Related research

06/01/2023
Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers
Recent advancements in pre-trained language models (PLMs) have demonstra...

10/05/2021
Co-training an Unsupervised Constituency Parser with Weak Supervision
We introduce a method for unsupervised parsing that relies on bootstrapp...

04/03/2019
Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders
We introduce deep inside-outside recursive autoencoders (DIORA), a fully...

08/10/2021
Headed Span-Based Projective Dependency Parsing
We propose a headed span-based method for projective dependency parsing....

04/30/2020
A Span-based Linearization for Constituent Trees
We propose a novel linearization of a constituent tree, together with a ...

09/10/2021
Improved Latent Tree Induction with Distant Supervision via Span Constraints
For over thirty years, researchers have developed and analyzed methods f...

03/01/2022
Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation
Recently CKY-based models show great potential in unsupervised grammar i...
