Code-switched inspired losses for generic spoken dialog representations

08/27/2021
by   Emile Chapuis, et al.

Spoken dialog systems need to handle both multiple languages and multilinguality within a single conversation (e.g., in the case of code-switching). In this work, we introduce new pretraining losses tailored to learning multilingual spoken dialog representations. The goal of these losses is to expose the model to code-switched language. To scale up training, we automatically build a pretraining corpus of multilingual conversations in five languages (French, Italian, English, German and Spanish) from a huge multilingual corpus of 24.3G tokens. We test the generic representations on a new benchmark composed of five dialog act corpora in the same five languages, as well as on two novel multilingual downstream tasks (i.e., multilingual mask utterance retrieval and multilingual inconsistency identification). Our experiments show that the new code-switched-inspired losses achieve better performance in both monolingual and multilingual settings.
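
The abstract does not spell out how the losses are computed, so the following is only an illustrative sketch of what utterance-level code-switched input might look like, not the authors' method. It assumes each utterance of a dialog is available in several languages; the names `toy_dialog`, `code_switch_dialog` and `switch_prob` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy data: one dialog whose utterances are aligned across languages.
toy_dialog: List[Dict[str, str]] = [
    {"en": "Where are you going?", "fr": "Où vas-tu ?", "es": "¿A dónde vas?"},
    {"en": "To the station.", "fr": "À la gare.", "es": "A la estación."},
    {"en": "I will come with you.", "fr": "Je viens avec toi.", "es": "Voy contigo."},
]


def code_switch_dialog(
    dialog: List[Dict[str, str]],
    base_lang: str,
    switch_prob: float,
    rng: random.Random,
) -> List[Tuple[str, str]]:
    """Return (language, utterance) pairs for one dialog.

    Each utterance stays in base_lang unless a random draw falls below
    switch_prob, in which case it is replaced by its aligned version in
    another available language (an assumed, simplified switching rule).
    """
    mixed = []
    for turn in dialog:
        lang = base_lang
        others = sorted(l for l in turn if l != base_lang)
        if others and rng.random() < switch_prob:
            lang = rng.choice(others)
        mixed.append((lang, turn[lang]))
    return mixed


if __name__ == "__main__":
    for lang, text in code_switch_dialog(toy_dialog, "en", 0.5, random.Random(0)):
        print(f"[{lang}] {text}")
```

A model pretrained on such mixed dialogs would see several languages within one conversation, which is the exposure the abstract describes; how the actual losses use that input is not stated here.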

research
09/23/2020

Hierarchical Pre-training for Sequence Labelling in Spoken Dialog

Sequence labelling tasks like Dialog Act and Emotion/Sentiment identific...
research
04/26/2020

GLUECoS : An Evaluation Benchmark for Code-Switched NLP

Code-switching is the use of more than one language in the same conversa...
research
05/20/2022

Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog

Research on (multi-domain) task-oriented dialog (TOD) has predominantly ...
research
08/26/2019

Multi-Granularity Representations of Dialog

Neural models of dialog rely on generalized latent representations of la...
research
10/10/2018

Structured Argument Extraction of Korean Question and Command

Intention identification and slot filling is a core issue in dialog mana...
research
01/16/2013

Conversation as Action Under Uncertainty

Conversations abound with uncertainties of various kinds. Treating conver...
research
06/15/2018

A Dataset for Building Code-Mixed Goal Oriented Conversation Systems

There is an increasing demand for goal-oriented conversation systems whi...
