DeepAI AI Chat
Log In Sign Up

The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus

by   Samia Touileb, et al.

Recent years have seen a rise in interest for cross-lingual transfer between languages with similar typology, and between languages of various scripts. However, the interplay between language similarity and difference in script on cross-lingual transfer is a less studied problem. We explore this interplay on cross-lingual transfer for two supervised tasks, namely part-of-speech tagging and sentiment analysis. We introduce a newly annotated corpus of Algerian user-generated comments comprising parallel annotations of Algerian written in Latin, Arabic, and code-switched scripts, as well as annotations for sentiment and topic categories. We perform baseline experiments by fine-tuning multi-lingual language models. We further explore the effect of script vs. language similarity in cross-lingual transfer by fine-tuning multi-lingual models on languages which are a) typologically distinct, but use the same script, b) typologically similar, but use a distinct script, or c) are typologically similar and use the same script. We find there is a delicate relationship between script and typology for part-of-speech, while sentiment analysis is less sensitive.


Data-Efficient Cross-Lingual Transfer with Language-Specific Subnetworks

Large multilingual language models typically share their parameters acro...

Exploring the Relationship between Alignment and Cross-lingual Transfer in Multilingual Transformers

Without any explicit cross-lingual training data, multilingual language ...

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages

In this paper, we present MasakhaPOS, the largest part-of-speech (POS) d...

Ranking Transfer Languages with Pragmatically-Motivated Features for Multilingual Sentiment Analysis

Cross-lingual transfer learning studies how datasets, annotations, and m...

On the Effect of Word Order on Cross-lingual Sentiment Analysis

Current state-of-the-art models for sentiment analysis make use of word ...

Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences

The patterns in which the syntax of different languages converges and di...

An Empirical Study on Cross-X Transfer for Legal Judgment Prediction

Cross-lingual transfer learning has proven useful in a variety of Natura...