Evaluating Transformer-Based Multilingual Text Classification

by Sophie Groenwold, et al.

As NLP tools become ubiquitous in today's technological landscape, they are increasingly applied to languages with a variety of typological structures. However, NLP research does not primarily focus on typological differences when analyzing state-of-the-art language models. As a result, NLP tools perform unequally across languages with different syntactic and morphological structures. Through a detailed discussion of word order typology, morphological typology, and comparative linguistics, we identify which variables most affect language modeling efficacy; in addition, we calculate word order and morphological similarity indices to aid our empirical study. We then use this background to support our analysis of a multi-class text classification experiment that we conduct on eight languages with eight models.
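The abstract does not define its similarity indices. Purely as an illustration, one simple way to score word-order similarity is the share of categorical typological features on which two languages agree. Example features are basic clause order, adjective-noun order, and adposition type, which resemble features catalogued in the World Atlas of Language Structures. The sketch below assumes that definition. The feature set, the three-feature profiles, and the names `WORD_ORDER_FEATURES` and `word_order_similarity` are hypothetical, and this is not the index computed in the paper.

```python
# Hypothetical illustration only: NOT the similarity index defined in the paper.
# Each language is described by a few categorical word-order features, and
# similarity is the fraction of shared features on which two languages agree.

WORD_ORDER_FEATURES = {
    "English": {"clause_order": "SVO", "adjective_noun": "AdjN", "adposition": "preposition"},
    "Japanese": {"clause_order": "SOV", "adjective_noun": "AdjN", "adposition": "postposition"},
    "Hindi": {"clause_order": "SOV", "adjective_noun": "AdjN", "adposition": "postposition"},
}


def word_order_similarity(a: dict[str, str], b: dict[str, str]) -> float:
    """Return the share of features present in both profiles whose values match."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("profiles share no features")
    matches = sum(a[f] == b[f] for f in shared)
    return matches / len(shared)


if __name__ == "__main__":
    langs = list(WORD_ORDER_FEATURES)
    # Print the score for every unordered pair of languages.
    for i, x in enumerate(langs):
        for y in langs[i + 1:]:
            score = word_order_similarity(WORD_ORDER_FEATURES[x], WORD_ORDER_FEATURES[y])
            print(f"{x}-{y}: {score:.2f}")
```

With these toy profiles, Japanese and Hindi agree on all three features, while English agrees with each of them only on adjective-noun order. A real index would likely use many more features and might weight them. A morphological index could follow the same pattern with features such as degree of fusion or inflectional synthesis.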


