Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations

by Ekaterina Taktasheva, et al.

Recent research built around text perturbations has revealed that shuffling word order has little to no impact on the downstream performance of Transformer-based language models across many NLP tasks. These findings contradict the common understanding of how the models encode hierarchical and structural information, and even call into question whether word order is modeled with position embeddings. To investigate this, the paper proposes nine probing datasets, organized by the type of controllable text perturbation, for three Indo-European languages with varying degrees of word order flexibility: English, Swedish and Russian. Based on a probing analysis of the M-BERT and M-BART models, we report that syntactic sensitivity depends on the language and on the model's pre-training objectives. We also find that the sensitivity grows across layers as the perturbation granularity increases. Finally, we show that the models barely use positional information to induce syntactic trees from their intermediate self-attention and contextualized representations.
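To make the idea of a controllable word-order perturbation concrete, here is a minimal sketch in Python. It is an illustrative assumption, not the perturbation procedure used to build the paper's probing datasets: the hypothetical function `shuffle_ngrams` splits a sentence into consecutive chunks of `n` whitespace-separated words and shuffles the chunks. The chunk size `n` is used here only as a simple stand-in for a granularity control.

```python
import random


def shuffle_ngrams(sentence: str, n: int, seed: int = 0) -> str:
    """Shuffle consecutive n-word chunks of a sentence (hypothetical helper).

    Words inside each chunk keep their original order; only the order of
    the chunks changes. Smaller n gives a finer-grained perturbation.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    words = sentence.split()  # naive whitespace tokenization (an assumption)
    chunks = [words[i:i + n] for i in range(0, len(words), n)]
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    rng.shuffle(chunks)
    return " ".join(word for chunk in chunks for word in chunk)


if __name__ == "__main__":
    example = "the cat sat on the mat today"
    for size in (1, 2, 3):
        print(size, "->", shuffle_ngrams(example, size, seed=1))
```

With `n=1` every word can move independently, while larger values keep longer local spans intact, which is one simple way to vary how strongly the original word order is disturbed.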



ParsBERT: Transformer-based Model for Persian Language Understanding

The surge of pre-trained language models has begun a new era in the fiel...

Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

A possible explanation for the impressive performance of masked language...

Assessing the Syntactic Capabilities of Transformer-based Multilingual Language Models

Multilingual Transformer-based language models, usually pretrained on mo...

Demystifying Neural Language Models' Insensitivity to Word-Order

Recent research analyzing the sensitivity of natural language understand...

Inducing Syntactic Trees from BERT Representations

We use the English model of BERT and explore how a deletion of one word ...

The Role of Complex NLP in Transformers for Text Ranking?

Even though term-based methods such as BM25 provide strong baselines in ...