
RuSentEval: Linguistic Source, Encoder Force!

by Vladislav Mikhailov, et al.

The success of pre-trained transformer language models has generated great interest in how these models work and what they learn about language. However, prior research in the field is mainly devoted to English, and little is known about other languages. To this end, we introduce RuSentEval, an enhanced set of 14 probing tasks for Russian, including ones that have not been explored yet. We apply a combination of complementary probing methods to explore the distribution of various linguistic properties in five multilingual transformers for two typologically contrasting languages – Russian and English. Our results provide intriguing findings that contradict the common understanding of how linguistic knowledge is represented, and demonstrate that some properties are learned in a similar manner despite the language differences.
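The paper's specific tasks and probing methods are not detailed in this abstract, but the general diagnostic-probing recipe it builds on can be sketched: freeze an encoder's sentence representations, then train a lightweight classifier to predict a linguistic property from them; high probe accuracy suggests the property is encoded. The sketch below is illustrative only and uses synthetic vectors in place of real transformer embeddings, with a hypothetical binary property planted in one dimension.

```python
# Minimal sketch of diagnostic probing (illustrative; not the paper's code).
# A probe is a light classifier trained on *frozen* representations to
# predict a linguistic property (e.g. tense, sentence length).
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen encoder outputs: 200 sentences, 32-dim embeddings.
# Assumption: the (hypothetical) property label is linearly encoded in dim 0.
X = rng.normal(size=(200, 32))
y = (X[:, 0] > 0).astype(int)  # synthetic binary linguistic property

# Logistic-regression probe trained with plain gradient descent.
w, b = np.zeros(32), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    w -= 0.5 * (X.T @ (p - y)) / len(y)     # gradient step on weights
    b -= 0.5 * float(np.mean(p - y))        # gradient step on bias

acc = float(np.mean(((X @ w + b) > 0) == y))
print(f"probe accuracy: {acc:.2f}")  # high accuracy: property is linearly decodable
```

In real probing studies the probe is run per layer of each encoder, which is how work like this localizes where a linguistic property is stored in the network.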



