Multi-lingual and Multi-cultural Figurative Language Understanding

05/25/2023
by Anubha Kabra, et al.

Figurative language permeates human communication, yet it remains relatively understudied in NLP. Datasets have been created in English to accelerate progress in measuring and improving figurative language processing in language models (LMs). However, figurative language expresses our cultural and societal experiences, so these phrases are difficult to apply universally. In this work, we create a figurative language inference dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba. Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region. We assess multilingual LMs' abilities to interpret figurative language in zero-shot and few-shot settings. All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data. This emphasizes the need for LMs to be exposed to a broader range of linguistic and cultural variation during training.


