Going beyond research datasets: Novel intent discovery in the industry setting

05/09/2023
by Aleksandra Chrabrowa, et al.

Novel intent discovery automates the process of grouping similar messages (questions) to identify previously unknown intents. However, current research focuses on publicly available datasets, which contain only the question field and differ significantly from real-life datasets. This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform. We show the benefit of pre-training language models on in-domain data, both self-supervised and with weak supervision. We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv. Combined, our methods fully utilize real-life datasets and give up to a 33 percentage point (pp) performance boost over the state-of-the-art Constrained Deep Adaptive Clustering (CDAC) model using question data only. By comparison, the CDAC model using question data only gives up to a 13pp performance boost over the naive baseline.
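To make the idea of grouping similar messages concrete, here is a minimal sketch. It is not the paper's pipeline: it swaps the language-model embeddings and CDAC clustering for bag-of-words vectors and a greedy threshold clustering. The function names, the `threshold` value, and the toy conversations are all hypothetical. It only illustrates how including the answer alongside the question can change which messages end up grouped together.

```python
import math
from collections import Counter

def embed(text):
    """Bag-of-words term counts; a stand-in (assumption) for a language-model embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_messages(conversations, threshold=0.5, use_answer=True):
    """Greedy clustering: put each message in the first cluster whose seed is similar enough.

    conversations: list of (question, answer) pairs.
    use_answer: if True, embed question and answer together; otherwise the question only.
    Returns a list of clusters, each a list of conversation indices.
    """
    clusters = []  # each entry is (seed_vector, member_indices)
    for i, (question, answer) in enumerate(conversations):
        text = question + " " + answer if use_answer else question
        vec = embed(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

# Hypothetical toy data, not taken from the paper.
conversations = [
    ("where is my parcel", "you can track the parcel in your orders"),
    ("parcel has not arrived", "you can track the parcel in your orders"),
    ("how do i get a refund", "refund requests are handled in the returns form"),
    ("i want my money back", "refund requests are handled in the returns form"),
]

with_answers = group_messages(conversations, use_answer=True)
questions_only = group_messages(conversations, use_answer=False)
print(with_answers)
print(questions_only)
```

On this toy input, the question-and-answer variant yields two groups (delivery and refund), while the question-only variant leaves every message in its own group because the questions share too few words. This is only an illustration on hand-picked data and says nothing about the size of the effect on real datasets.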

research
06/08/2023

Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training

We consider the task of few-shot intent detection, which involves traini...
research
03/02/2023

QAID: Question Answering Inspired Few-shot Intent Detection

Intent detection with semantically similar fine-grained intents is a cha...
research
04/17/2019

Towards Open Intent Discovery for Conversational Text

Detecting and identifying user intent from text, both written and spoken...
research
01/18/2022

Dialog Intent Induction via Density-based Deep Clustering Ensemble

Existing task-oriented chatbots heavily rely on spoken language understa...
research
10/25/2022

Learning Better Intent Representations for Financial Open Intent Classification

With the recent surge of NLP technologies in the financial domain, banks...
research
02/01/2022

A Flexible Clustering Pipeline for Mining Text Intentions

Mining the latent intentions from large volumes of natural language inpu...
research
01/11/2021

Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection

Real-life applications, heavily relying on machine learning, such as dia...
