DialogVCS: Robust Natural Language Understanding in Dialogue System Upgrade

05/24/2023
by   Zefan Cai, et al.

In the constant updates of product dialogue systems, the natural language understanding (NLU) model must be retrained as new data from real users is merged into the existing data accumulated in previous updates. Within the newly added data, new intents emerge and may be semantically entangled with existing intents, e.g. new intents that are semantically too specific or too generic are actually subsets or supersets of some existing intents in the semantic space, which impairs the robustness of the NLU model. As the first attempt to solve this problem, we set up a new benchmark consisting of 4 Dialogue Version Control dataSets (DialogVCS). We formulate intent detection with imperfect data in the system update as a multi-label classification task with positive but unlabeled intents, which asks the models to recognize all the proper intents, including those with semantic entanglement, at inference time. We also propose comprehensive baseline models and conduct in-depth analyses for the benchmark, showing that the semantically entangled intents can be effectively recognized with an automatic workflow.
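To make the "positive but unlabeled" formulation concrete, here is a minimal Python sketch. It is not the paper's baseline. The intent names, the `SUPERSET_OF` map, and the `unlabeled_weight` parameter are all hypothetical, chosen only for illustration. The sketch shows how an utterance annotated with one specific intent also belongs to its superset intent, and how a loss can treat zeros in the annotation as unlabeled rather than as confirmed negatives.

```python
import math

# Hypothetical intent inventory after a system update (names are illustrative).
INTENTS = ["book_flight", "book_flight_international", "cancel_booking"]

# Hypothetical entanglement: the new, more specific intent is a subset
# of an intent that already existed before the update.
SUPERSET_OF = {"book_flight_international": "book_flight"}


def observed_vector(label):
    """Multi-hot vector holding only the single annotated intent."""
    return [1 if name == label else 0 for name in INTENTS]


def full_vector(label):
    """Multi-hot vector that also marks every superset of the annotated intent."""
    proper = {label}
    while label in SUPERSET_OF:
        label = SUPERSET_OF[label]
        proper.add(label)
    return [1 if name in proper else 0 for name in INTENTS]


def bce(probs, targets, unlabeled_weight=1.0):
    """Binary cross-entropy summed over intents.

    Zeros in `targets` are scaled by `unlabeled_weight`, an assumed knob that
    softens the penalty on intents that may be positive but unlabeled.
    """
    loss = 0.0
    for p, t in zip(probs, targets):
        if t == 1:
            loss -= math.log(p)
        else:
            loss -= unlabeled_weight * math.log(1.0 - p)
    return loss


# A model that correctly predicts both entangled intents with high probability.
probs = [0.9, 0.9, 0.1]
observed = observed_vector("book_flight_international")
strict_loss = bce(probs, observed)                        # zeros are hard negatives
relaxed_loss = bce(probs, observed, unlabeled_weight=0.5)  # zeros are down-weighted
```

With the observed annotation alone, the strict loss penalizes the model for predicting `book_flight`, even though that intent is a proper one for the utterance. Down-weighting the unlabeled entries is just one simple way to reduce that penalty; the target the model should recover at inference time is the one produced by `full_vector`.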


research
07/11/2019

Incrementalizing RASA's Open-Source Natural Language Understanding Pipeline

As spoken dialogue systems and chatbots are gaining more widespread adop...
research
10/17/2020

RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling

In order to alleviate the shortage of multi-domain data and to capture d...
research
04/27/2022

NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue

We present NLU++, a novel dataset for natural language understanding (NL...
research
07/03/2018

Intent Generation for Goal-Oriented Dialogue Systems based on Schema.org Annotations

Goal-oriented dialogue systems typically communicate with a backend (e.g...
research
06/20/2019

One-vs-All Models for Asynchronous Training: An Empirical Analysis

Any given classification problem can be modeled using multi-class or One...
research
09/09/2019

Out-of-domain Detection for Natural Language Understanding in Dialog Systems

In natural language understanding components, detecting out-of-domain (O...
research
05/24/2022

When More Data Hurts: A Troubling Quirk in Developing Broad-Coverage Natural Language Understanding Systems

In natural language understanding (NLU) production systems, users' evolv...
