TyDiP: A Dataset for Politeness Classification in Nine Typologically Diverse Languages

11/29/2022
by   Anirudh Srinivasan, et al.

We study politeness phenomena in nine typologically diverse languages. Politeness is an important facet of communication and is sometimes argued to be culture-specific, yet existing computational linguistic studies are limited to English. We create TyDiP, a dataset containing three-way politeness annotations for 500 examples in each language, totaling 4.5K examples. We evaluate how well multilingual models can identify politeness levels: they show fairly robust zero-shot transfer ability, yet fall significantly short of estimated human accuracy. We further study mapping the English politeness strategy lexicon into the nine languages via automatic translation and lexicon induction, analyzing whether each strategy's impact stays consistent across languages. Lastly, we empirically study the complicated relationship between formality and politeness through transfer experiments. We hope our dataset will support various research questions and applications, from evaluating multilingual models to constructing polite multilingual agents.
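
The zero-shot evaluation described in the abstract can be illustrated with a minimal sketch of per-language accuracy scoring. Everything here is an assumption made for illustration: the label names, the language codes, the toy sentences, and the keyword-based `toy_predict` function (a stand-in for a multilingual model trained on English only) are hypothetical and are not taken from the paper or its dataset.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical names for the three-way annotation; the abstract does not name the labels.
LABELS = ("impolite", "neutral", "polite")

# One example is (language code, text, gold label).
Example = Tuple[str, str, str]


def per_language_accuracy(
    examples: Iterable[Example],
    predict: Callable[[str], str],
) -> Dict[str, float]:
    """Score a classifier separately on each language's examples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, text, gold in examples:
        if gold not in LABELS:
            raise ValueError(f"unknown label: {gold}")
        total[lang] += 1
        if predict(text) == gold:
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


def toy_predict(text: str) -> str:
    """Placeholder classifier: a keyword rule standing in for a real model."""
    lowered = text.lower()
    if "please" in lowered or "por favor" in lowered:
        return "polite"
    return "neutral"


# Toy data invented for this sketch; not drawn from TyDiP.
toy_examples = [
    ("en", "Could you please check this?", "polite"),
    ("en", "This is wrong.", "impolite"),
    ("es", "¿Podrías revisar esto, por favor?", "polite"),
    ("es", "Esta es la versión actual.", "neutral"),
]

scores = per_language_accuracy(toy_examples, toy_predict)
for lang, acc in sorted(scores.items()):
    print(f"{lang}: {acc:.2f}")
```

Reporting accuracy per language rather than one pooled number makes it possible to see whether transfer quality differs across languages, which is the kind of comparison the abstract refers to.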

research
06/30/2021

Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer

Despite their success, large pre-trained multilingual models have not co...
research
05/25/2023

Revisiting non-English Text Simplification: A Unified Multilingual Benchmark

Recent advancements in high-quality, large-scale English resources have ...
research
05/25/2023

Multi-lingual and Multi-cultural Figurative Language Understanding

Figurative language permeates human communication, but at the same time ...
research
10/21/2022

On the Calibration of Massively Multilingual Language Models

Massively Multilingual Language Models (MMLMs) have recently gained popu...
research
10/31/2022

TaTa: A Multilingual Table-to-Text Dataset for African Languages

Existing data-to-text generation datasets are mostly limited to English....
research
11/11/2022

MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection

Event Detection (ED) is the task of identifying and classifying trigger ...
research
02/28/2023

Extending English IR methods to multi-lingual IR

This paper describes our participation in the 2023 WSDM CUP - MIRACL cha...
