Beyond the English Web: Zero-Shot Cross-Lingual and Lightweight Monolingual Classification of Registers

02/15/2021
by Liina Repo, et al.

We explore cross-lingual transfer of register classification for web documents. Registers, that is, text varieties such as blogs or news, are one of the primary predictors of linguistic variation and thus affect the automatic processing of language. We introduce two new register-annotated corpora, FreCORE and SweCORE, for French and Swedish. We demonstrate that deep pre-trained language models perform strongly in these languages and outperform the previous state of the art in English and Finnish. Specifically, we show 1) that zero-shot cross-lingual transfer from the large English CORE corpus can match or surpass previously published monolingual models, and 2) that lightweight monolingual classification requiring very little training data can reach or surpass our zero-shot performance. We further analyse the classification results, finding that certain registers continue to pose challenges, in particular for cross-lingual transfer.
