A Dataset for Building Code-Mixed Goal Oriented Conversation Systems

06/15/2018
by Suman Banerjee, et al.

There is an increasing demand for goal-oriented conversation systems that can assist users in day-to-day activities such as booking tickets, making restaurant reservations, and shopping. Most existing datasets for building such systems focus on monolingual conversations, and there is hardly any work on multilingual or code-mixed conversations. Such datasets and systems thus do not cater to multilingual regions of the world, such as India, where it is very common for people to speak more than one language and switch seamlessly between them, resulting in code-mixed conversations. For example, a Hindi-speaking user looking to book a restaurant would typically ask, "Kya tum is restaurant mein ek table book karne mein meri help karoge?" ("Can you help me book a table at this restaurant?"). To facilitate the development of code-mixed conversation models, we build a goal-oriented dialog dataset containing code-mixed conversations. Specifically, we take the text from the DSTC2 restaurant reservation dataset and create code-mixed versions of it in Hindi-English, Bengali-English, Gujarati-English, and Tamil-English. We also establish initial baselines on this dataset using existing state-of-the-art models. The dataset and our baseline implementations are publicly available for research purposes.


research
04/10/2020

A New Dataset for Natural Language Inference from Code-mixed Conversations

Natural Language Inference (NLI) is the task of inferring the logical re...
research
02/23/2023

MUTANT: A Multi-sentential Code-mixed Hinglish Dataset

The multi-sentential long sequence textual data unfolds several interest...
research
04/17/2021

GupShup: An Annotated Corpus for Abstractive Summarization of Open-Domain Code-Switched Conversations

Code-switching is the communication phenomenon where speakers switch bet...
research
05/17/2023

Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations

Unlike empathetic dialogues, the system in emotional support conversatio...
research
05/29/2022

Learning as Conversation: Dialogue Systems Reinforced for Information Acquisition

We propose novel AI-empowered chat bots for learning as conversation whe...
research
08/27/2021

Code-switched inspired losses for generic spoken dialog representations

Spoken dialog systems need to be able to handle both multiple languages ...
research
03/15/2023

PRESTO: A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs

Research interest in task-oriented dialogs has increased as systems such...
