Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus

09/24/2021
by Daniela Trotta, et al.

The development of automated approaches to linguistic acceptability has been greatly fostered by the availability of the English CoLA corpus, which has also been included in the widely used GLUE benchmark. However, this kind of research for languages other than English, as well as the analysis of cross-lingual approaches, has been hindered by the lack of resources of comparable size in other languages. We have therefore developed the ItaCoLA corpus, which contains almost 10,000 sentences with acceptability judgments and was created following the same approach and the same steps as the English one. In this paper we describe how the corpus was created, detail its content, and present the first experiments on this new resource. We compare in-domain and out-of-domain classification and perform a specific evaluation of nine linguistic phenomena. We also present the first cross-lingual experiments, aimed at assessing whether multilingual transformer-based approaches can benefit from using sentences in two languages during fine-tuning.

