XLM-T: A Multilingual Language Model Toolkit for Twitter

04/25/2021
by Francesco Barbieri, et al.

Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a framework for using and evaluating multilingual language models in Twitter. This framework features two main assets: (1) a strong multilingual baseline consisting of an XLM-R (Conneau et al. 2020) model pre-trained on millions of tweets in over thirty languages, alongside starter code to subsequently fine-tune on a target task; and (2) a set of unified sentiment analysis Twitter datasets in eight different languages. This is a modular framework that can easily be extended to additional tasks, as well as integrated with recent efforts also aimed at the homogenization of Twitter-specific datasets (Barbieri et al. 2020).
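To make the idea of unified datasets concrete, the following is a minimal sketch of how sentiment corpora with different label vocabularies could be mapped onto one shared scheme. It is not the XLM-T implementation. The three-way label set, the source names `source_a` and `source_b`, and the helpers `unify` and `label_distribution` are all assumptions introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Shared label scheme (an assumption for this sketch, not taken from the paper).
UNIFIED_LABELS = ("negative", "neutral", "positive")

# Hypothetical per-source label vocabularies mapped onto the shared scheme.
LABEL_MAPS = {
    "source_a": {"0": "negative", "1": "neutral", "2": "positive"},
    "source_b": {"NEG": "negative", "NEU": "neutral", "POS": "positive"},
}


@dataclass(frozen=True)
class Example:
    text: str
    label: str
    language: str


def unify(rows, source, language):
    """Convert (text, raw_label) pairs from one source into unified Examples."""
    mapping = LABEL_MAPS[source]
    unified = []
    for text, raw_label in rows:
        key = str(raw_label).strip()
        if key not in mapping:
            # Skip labels that have no counterpart in the shared scheme.
            continue
        unified.append(Example(text.strip(), mapping[key], language))
    return unified


def label_distribution(examples):
    """Count examples per (language, label) pair."""
    return Counter((ex.language, ex.label) for ex in examples)


# Toy rows standing in for two differently annotated datasets.
corpus = (
    unify([("great match!", 2), ("so tired", 0)], "source_a", "en")
    + unify(
        [("qué buen día", "POS"), ("sin comentarios", "NEU"), ("??", "MIXED")],
        "source_b",
        "es",
    )
)
```

Once every source is expressed in the same `Example` format, a single fine-tuning and evaluation loop could in principle be reused across languages, which is the kind of modularity the abstract describes.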
