Multilingual Syntax-aware Language Modeling through Dependency Tree Conversion

04/19/2022
by Shunsuke Kando, et al.

Incorporating stronger syntactic biases into neural language models (LMs) is a long-standing goal, but research in this area often focuses on modeling English text, where constituent treebanks are readily available. Extending constituent tree-based LMs to the multilingual setting, where dependency treebanks are more common, is possible via dependency-to-constituency conversion methods. However, this raises the question of which tree formats are best for learning the model, and for which languages. We investigate this question by training recurrent neural network grammars (RNNGs) using various conversion methods, and evaluating them empirically in a multilingual setting. We examine the effect on LM performance across nine conversion methods and five languages through seven types of syntactic tests. On average, the performance of our best model represents a 19% increase in accuracy over the worst choice across all languages. Our best model shows an advantage over sequential/overparameterized LMs, suggesting a positive effect of syntax injection in a multilingual setting. Our experiments highlight the importance of choosing the right tree formalism, and provide insights into making an informed decision.
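
To make the idea of dependency-to-constituency conversion concrete, the following is a minimal sketch of one simple scheme: each head word and its dependents are grouped into a single flat phrase, in surface order. It is an illustration only and is not claimed to be any of the nine conversion methods studied in the paper. The function name `dep_to_flat_constituency`, the generic nonterminal label `X`, and the input format (1-based head indices with 0 marking the root) are assumptions made for this example, and the sketch assumes a projective tree with a single root.

```python
from collections import defaultdict


def dep_to_flat_constituency(tokens, heads, label="X"):
    """Convert a dependency tree into a bracketed constituency string.

    tokens: list of words in surface order.
    heads:  heads[i] is the 1-based index of the head of token i+1,
            or 0 if that token is the root (hypothetical input format).
    label:  generic nonterminal label used for every phrase (assumption).
    """
    children = defaultdict(list)
    root = None
    for idx, head in enumerate(heads, start=1):
        if head == 0:
            root = idx
        else:
            children[head].append(idx)
    if root is None:
        raise ValueError("dependency tree has no root")

    def build(node):
        word = tokens[node - 1]
        deps = children[node]
        if not deps:
            # A word without dependents stays a bare terminal.
            return word
        # Place the head among its dependents in surface order,
        # producing one flat phrase per head.
        parts = [word if i == node else build(i) for i in sorted(deps + [node])]
        return "(" + label + " " + " ".join(parts) + ")"

    return build(root)


if __name__ == "__main__":
    sentence = ["The", "dog", "chased", "a", "cat"]
    head_indices = [2, 3, 0, 5, 3]
    print(dep_to_flat_constituency(sentence, head_indices))
```

Other conversion schemes would differ in how deeply they nest dependents and in how they label nonterminals. Those choices are the kind of tree-format variation the abstract refers to.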
