PRESTO: A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs

03/15/2023
by Rahul Goel, et al.

Research interest in task-oriented dialogs has increased as systems such as Google Assistant, Alexa, and Siri have become ubiquitous in everyday life. However, the impact of academic research in this area has been limited by the lack of datasets that realistically capture the wide array of user pain points. To enable research on some of the more challenging aspects of parsing realistic conversations, we introduce PRESTO, a public dataset of over 550K contextual multilingual conversations between humans and virtual assistants. PRESTO contains a diverse array of challenges that occur in real-world NLU tasks, such as disfluencies, code-switching, and revisions. It is the only large-scale, human-generated conversational parsing dataset that provides structured context, such as a user's contacts and lists, for each example. Our mT5-based baselines demonstrate that the conversational phenomena present in PRESTO are challenging to model, and that this difficulty is even more pronounced in a low-resource setup.
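
To make "structured context for each example" concrete, the sketch below shows one way to represent a contextual example. Each example holds dialog turns plus the user's contacts and lists. The example is then flattened into a single input string for a text-to-text model such as mT5. Everything here is a hypothetical illustration: the class names (`Turn`, `ContextualExample`), the field names, the `linearize` helper, and the `" | "` separator format are assumptions. They are not PRESTO's actual schema and not the authors' baseline input format. The sample utterance contains a disfluency ("uh") and a revision ("no, add eggs"), two of the phenomena the abstract lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    # Hypothetical speaker labels: "user" or "assistant".
    speaker: str
    text: str


@dataclass
class ContextualExample:
    # Hypothetical record layout for illustration only; NOT the dataset's real schema.
    turns: List[Turn]
    contacts: List[str] = field(default_factory=list)  # user's contact names
    lists: List[str] = field(default_factory=list)     # names of the user's lists


def linearize(example: ContextualExample) -> str:
    """Flatten structured context and dialog history into one input string.

    Assumed format: context sections first, then turns in order, joined by " | ".
    Empty context sections are omitted.
    """
    parts: List[str] = []
    if example.contacts:
        parts.append("contacts: " + ", ".join(example.contacts))
    if example.lists:
        parts.append("lists: " + ", ".join(example.lists))
    for turn in example.turns:
        parts.append(f"{turn.speaker}: {turn.text}")
    return " | ".join(parts)


if __name__ == "__main__":
    example = ContextualExample(
        turns=[Turn("user", "add milk to my, uh, no, add eggs to my grocery list")],
        contacts=["Ana", "Ravi"],
        lists=["grocery list", "todo"],
    )
    print(linearize(example))
```

Placing the context before the dialog turns is one design choice among several. A model could equally receive context in a separate input field. Exposing the user's list names in the input is what lets a parser ground a phrase like "my grocery list" to an existing list.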
