Redwood: Using Collision Detection to Grow a Large-Scale Intent Classification Dataset

04/12/2022
by Stefan Larson, et al.

Dialog systems must be able to incorporate new skills through updates over time in order to reflect new use cases or deployment scenarios. Similarly, developers of such ML-driven systems need to be able to add new training data to an existing dataset to support these new skills. In intent classification systems, problems can arise if training data for a new skill's intent overlaps semantically with an existing intent. We call such cases collisions. This paper introduces the task of intent collision detection between multiple datasets for the purpose of growing a system's skillset. We introduce several methods for detecting collisions, and evaluate them on real datasets that exhibit collisions. To highlight the need for intent collision detection, we show that model performance suffers if new data is added in a way that does not arbitrate colliding intents. Finally, we use collision detection to construct and benchmark a new dataset, Redwood, which is composed of 451 intent categories from 13 original intent classification datasets, making it the largest publicly available intent classification benchmark.
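
To make the notion of a collision concrete, here is a minimal sketch of one way to flag candidate collisions between an existing dataset and incoming data. It is not one of the paper's methods, which the abstract does not describe. As an illustrative stand-in it compares bag-of-words vectors with cosine similarity. The function names, the example intents, and the 0.5 threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bag_of_words(utterances):
    """Pool all utterances of one intent into a single token-count vector."""
    counts = Counter()
    for utterance in utterances:
        counts.update(utterance.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_collisions(existing, incoming, threshold=0.5):
    """Return (new_intent, existing_intent, score) for pairs above threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    existing_vectors = {name: bag_of_words(utts) for name, utts in existing.items()}
    collisions = []
    for new_name, utterances in incoming.items():
        new_vector = bag_of_words(utterances)
        for old_name, old_vector in existing_vectors.items():
            score = cosine(new_vector, old_vector)
            if score >= threshold:
                collisions.append((new_name, old_name, score))
    return collisions


# Hypothetical toy data: intents already in the system vs. a dataset to merge.
existing = {
    "check_balance": ["what is my account balance", "show my balance please"],
    "play_music": ["play some music", "put on a song"],
}
incoming = {
    "balance_inquiry": ["tell me my account balance", "what is the balance on my account"],
    "set_alarm": ["set an alarm for seven", "wake me up at six"],
}

for new_name, old_name, score in find_collisions(existing, incoming):
    print(f"{new_name} may collide with {old_name} (similarity {score:.2f})")
```

A purely lexical measure like this misses paraphrases that share few words, so a practical detector would more plausibly rely on a trained classifier or sentence embeddings. The sketch only illustrates the shape of the task: compare each incoming intent against every existing intent and flag the pairs that need to be arbitrated before merging.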


research
12/07/2020

Benchmarking Intent Detection for Task-Oriented Dialog Systems

Intent detection is a key component of modern goal-oriented dialog syste...
research
09/29/2020

HINT3: Raising the bar for Intent Detection in the Wild

Intent Detection systems in the real world are exposed to complexities o...
research
09/04/2019

An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction

Task-oriented dialog systems need to know when a query falls outside the...
research
05/04/2022

A Framework to Generate High-Quality Datapoints for Multiple Novel Intent Detection

Systems like Voice-command based conversational agents are characterized...
research
09/12/2020

Intent Detection with WikiHow

Modern task-oriented dialog systems need to reliably understand users' i...
research
05/24/2022

Benchmark Data and Evaluation Framework for Intent Discovery Around COVID-19 Vaccine Hesitancy

The COVID-19 pandemic has made a huge global impact and cost millions of...
research
06/19/2017

User Intent Classification using Memory Networks: A Comparative Analysis for a Limited Data Scenario

In this report, we provide a comparative analysis of different technique...
