Prabhupadavani: A Code-mixed Speech Translation Data for 25 Languages

01/27/2022
by   Jivnesh Sandhan, et al.

Code-mixing has become ubiquitous in Natural Language Processing (NLP); however, no efforts have been made to address this phenomenon for the Speech Translation (ST) task. This can be attributed solely to the lack of labelled code-mixed ST data. Thus, we introduce Prabhupadavani, a multilingual code-mixed ST dataset for 25 languages, covering ten language families and containing 94 hours of speech by 130+ speakers, manually aligned with corresponding text in the target language. To the best of our knowledge, Prabhupadavani is the first code-mixed ST dataset available in the ST literature. The data can also be used for a code-mixed machine translation task. The dataset and code can be accessed at: <https://github.com/frozentoad9/CMST>


