Fine-Grained Prediction of Syntactic Typology: Discovering Latent Structure with Supervised Learning

10/11/2017
by   Dingquan Wang, et al.

We show how to predict the basic word-order facts of a novel language given only a corpus of part-of-speech (POS) sequences. We predict how often direct objects follow their verbs, how often adjectives follow their nouns, and in general the directionalities of all dependency relations. Such typological properties could be helpful in grammar induction. Although this problem is usually regarded as unsupervised learning, our innovation is to treat it as supervised learning, using a large collection of realistic synthetic languages as training data. The supervised learner must identify surface features of a language's POS sequence (hand-engineered or neural features) that correlate with the language's deeper structure (latent trees). Our experiments show that: 1) given a small set of real languages, it helps to add many synthetic languages to the training data; 2) our system is robust even when the POS sequences include noise; and 3) our system outperforms a grammar induction baseline on this task by a large margin.
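To make the prediction target concrete, the sketch below computes the directionality of each dependency relation (the fraction of dependents that follow their heads) from a treebank with known trees. It is only an illustration of the quantity being predicted, not the system described in the abstract, which estimates these values from POS sequences alone. The function name `directionalities`, the `(head, relation)` input format, and the toy sentences are assumptions made for this example.

```python
from collections import defaultdict


def directionalities(treebank):
    """Return, for each relation, the fraction of dependents that follow their head.

    Assumed (hypothetical) input format: `treebank` is a list of sentences, and
    each sentence is a list of (head, relation) pairs, one per token, where
    `head` is the 1-based position of the token's head and 0 marks the root.
    """
    right = defaultdict(int)   # dependents that appear after their head
    total = defaultdict(int)   # all dependents with this relation
    for sentence in treebank:
        for position, (head, relation) in enumerate(sentence, start=1):
            if head == 0:
                continue  # the root has no head, so it has no direction
            total[relation] += 1
            if position > head:
                right[relation] += 1
    return {relation: right[relation] / total[relation] for relation in total}


# Toy treebank (invented for illustration).
toy_treebank = [
    # subject verb object: token 2 is the verb and the root
    [(2, "nsubj"), (0, "root"), (2, "dobj")],
    # object subject verb: token 3 is the verb and the root
    [(3, "dobj"), (3, "nsubj"), (0, "root")],
]

print(directionalities(toy_treebank))
```

In this toy data, subjects always precede the verb and objects follow it in one of the two sentences, so the values are 0.0 for `nsubj` and 0.5 for `dobj`.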

research
11/17/2018

Bilingual Dictionary Induction for Bantu Languages

We present a method for learning bilingual translation dictionaries betw...
research
08/02/2017

Dependency Grammar Induction with Neural Lexicalization and Big Training Data

We study the impact of big models (in terms of the degree of lexicalizat...
research
05/26/2020

Guiding Symbolic Natural Language Grammar Induction via Transformer-Based Sequence Probabilities

A novel approach to automated learning of syntactic rules governing natu...
research
05/24/2017

Matroids Hitting Sets and Unsupervised Dependency Grammar Induction

This paper formulates a novel problem on graphs: find the minimal subset...
research
10/05/2020

The Grammar of Emergent Languages

In this paper, we consider the syntactic properties of languages emerged...
research
08/22/2018

TreeGAN: Syntax-Aware Sequence Generation with Generative Adversarial Networks

Generative Adversarial Networks (GANs) have shown great capacity on imag...
research
01/27/2020

Unsupervised Program Synthesis for Images using Tree-Structured LSTM

Program synthesis has recently emerged as a promising approach to the im...
