RuSentEval: Linguistic Source, Encoder Force!

02/28/2021
by Vladislav Mikhailov, et al.

The success of pre-trained transformer language models has brought a great deal of interest in how these models work and what they learn about language. However, prior research in the field is mainly devoted to English, and little is known regarding other languages. To this end, we introduce RuSentEval, an enhanced set of 14 probing tasks for Russian, including ones that have not been explored yet. We apply a combination of complementary probing methods to explore the distribution of various linguistic properties in five multilingual transformers for two typologically contrasting languages: Russian and English. Our results provide intriguing findings that contradict the common understanding of how linguistic knowledge is represented, and demonstrate that some properties are learned in a similar manner despite the language differences.


