Overcoming Language Disparity in Online Content Classification with Multimodal Learning

05/19/2022
by Gaurav Verma, et al.

Advances in Natural Language Processing (NLP) have revolutionized the way researchers and practitioners address crucial societal problems. Large language models are now the standard for developing state-of-the-art solutions to text detection and classification tasks. However, the development of advanced computational techniques and resources is disproportionately focused on the English language, sidelining a majority of the languages spoken globally. While existing research has developed better multilingual and monolingual language models to bridge this disparity between English and non-English languages, we explore the promise of incorporating the information contained in images via multimodal machine learning. Our comparative analyses on three detection tasks (crisis information, fake news, and emotion recognition) and five high-resource non-English languages demonstrate that: (a) detection frameworks based on pre-trained large language models like BERT and multilingual-BERT systematically perform better on English than on non-English languages, and (b) including images via multimodal learning bridges this performance gap. We situate our findings with respect to existing work on the pitfalls of large language models, and discuss their theoretical and practical implications. Resources for this paper are available at https://multimodality-language-disparity.github.io/.
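
To make the idea of "including images via multimodal learning" concrete, the following is a minimal sketch of one common way to combine modalities: concatenating a text embedding with an image embedding and passing the result to a linear classifier. The abstract does not say which fusion strategy the authors use, so concatenation is an assumption here. The embedding sizes, the random placeholder vectors, and the function names (`fuse`, `softmax`, `classify`) are all hypothetical and stand in for the outputs of a real text encoder (such as BERT) and a real image encoder.

```python
import math
import random


def fuse(text_vec, image_vec):
    """Concatenate a text embedding and an image embedding (assumed fusion step)."""
    return list(text_vec) + list(image_vec)


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Apply a linear layer (one weight row per class) followed by softmax."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


# Hypothetical sizes; real encoders produce much larger embeddings.
TEXT_DIM, IMAGE_DIM, NUM_CLASSES = 8, 4, 3

random.seed(0)
# Placeholder embeddings standing in for encoder outputs.
text_vec = [random.gauss(0, 1) for _ in range(TEXT_DIM)]
image_vec = [random.gauss(0, 1) for _ in range(IMAGE_DIM)]

fused = fuse(text_vec, image_vec)

# Untrained, randomly initialised classifier parameters (illustration only).
weights = [
    [random.gauss(0, 0.1) for _ in range(TEXT_DIM + IMAGE_DIM)]
    for _ in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

probs = classify(fused, weights, bias)
print(len(fused), probs)
```

Because the image embedding does not depend on the language of the post, a classifier of this shape receives the same visual signal regardless of how well the text encoder handles a given language, which is the intuition behind finding (b).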


