Cross-Linguistic Syntactic Evaluation of Word Prediction Models

05/01/2020
by Aaron Mueller, et al.

A range of studies have concluded that neural word prediction models can distinguish grammatical from ungrammatical sentences with high accuracy. However, these studies are based primarily on monolingual evidence from English. To investigate how these models' ability to learn syntax varies by language, we introduce CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models. CLAMS includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars we develop. We use CLAMS to evaluate LSTM language models as well as monolingual and multilingual BERT. Across languages, monolingual LSTMs achieved high accuracy on dependencies without attractors, and generally poor accuracy on agreement across object relative clauses. On other constructions, agreement accuracy was generally higher in languages with richer morphology. Multilingual models generally underperformed monolingual models. Multilingual BERT showed high syntactic accuracy on English, but noticeable deficiencies in other languages.
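To make the evaluation setup concrete, below is a minimal sketch of the kind of minimal-pair scoring a suite like CLAMS relies on: the model is asked whether it prefers the grammatical verb form over the ungrammatical one in an agreement dependency that contains an attractor. This is not the authors' released code; the template sentence, the prefers_grammatical helper, and the use of the HuggingFace transformers API with bert-base-multilingual-cased are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the CLAMS release): score a
# subject-verb agreement minimal pair with a masked language model.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Multilingual BERT, one of the model families evaluated in the paper.
model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def prefers_grammatical(template, good, bad):
    """Return True if the model scores the grammatical verb form (`good`)
    above the ungrammatical one (`bad`) at the masked verb position."""
    text = template.replace("___", tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the masked position and compare the two candidate verb forms.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    good_id = tokenizer.convert_tokens_to_ids(good)
    bad_id = tokenizer.convert_tokens_to_ids(bad)
    return bool(logits[0, mask_pos, good_id] > logits[0, mask_pos, bad_id])

# Agreement across a prepositional-phrase attractor ("of the paper"):
print(prefers_grammatical("The authors of the paper ___ here.", "are", "is"))
```

Accuracy on a challenge set is then simply the fraction of such minimal pairs for which the model prefers the grammatical member; LSTM language models can be scored analogously by comparing the probabilities they assign to the two full sentence variants.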
