Z-BERT-A: A Zero-shot Pipeline for Unknown Intent Detection

by Daniele Comi, et al.

Intent discovery is a fundamental task in NLP, and it is increasingly relevant for a variety of industrial applications (Quarteroni 2018). The main challenge resides in the need to identify novel, unseen intents from input utterances. Herein, we propose Z-BERT-A, a two-stage method for intent discovery relying on a Transformer architecture (Vaswani et al. 2017; Devlin et al. 2018) fine-tuned with Adapters (Pfeiffer et al. 2020), initially trained for Natural Language Inference (NLI) and later applied to unknown intent classification in a zero-shot setting. In our evaluation, we first analyze the quality of the model after adaptive fine-tuning on known classes. Secondly, we evaluate its performance when casting intent classification as an NLI task. Lastly, we test the zero-shot performance of the model on unseen classes, showing how Z-BERT-A can effectively perform intent discovery by generating intents that are semantically similar, if not equal, to the ground-truth ones. Our experiments show that Z-BERT-A outperforms a wide variety of baselines in two zero-shot settings: known intent classification and unseen intent discovery. The proposed pipeline holds the potential to be widely applied in a variety of customer care applications. It enables automated dynamic triage using a lightweight model that, unlike large language models, can be easily deployed and scaled in a wide variety of business scenarios, especially in settings with limited hardware availability where on-premise or low-resource cloud deployments are imperative. Z-BERT-A, predicting novel intents from a single utterance, represents an innovative approach to intent discovery, enabling online generation of novel intents. The pipeline is available as an installable Python package at the following link: https://github.com/GT4SD/zberta.




Introduction

Nowadays, Language Models (LMs) and, more generally, NLP methodologies are becoming a pivotal element of modern communication systems. In dialogue systems, understanding the actual intention behind a conversation initiated by a user is fundamental, and so is understanding the user's intent underlying each dialogue utterance Ni et al. (2021). Recently, there has been growing interest in such applications and, in this context, multiple datasets as well as challenges have been proposed amazon-research (2022); Casanueva et al. (2020); Zhang et al. (2021). Unfortunately, most of the systems used in production for common business use-cases are based on models trained on a finite set of possible intents, with limited or no ability to generalize to novel unseen intents Qi et al. (2020). This is a major blocker, since most applications do not fit this finite-set assumption and instead require constant re-evaluation based on users' inputs and feedback. In order to build better dialogue systems that closely fit users' needs, it becomes imperative to dynamically expand the set of available intents. Besides better matching user expectations, this also enables the definition of new common intents that slowly emerge from a pool of different users, a key aspect in building systems that are more resilient over time and can follow trends appropriately. Current supervised techniques unfortunately fall short in tackling this challenge, since they usually lack the capacity to discover novel intents: they are trained on a finite set of classes and cannot generalize to real-world applications Larson et al. (2019). Here, we propose a system which combines dependency parsing Honnibal and Johnson (2015); Nivre and Nilsson (2005); ExplosionAI (2015), to extract potential intents from a single utterance, with a zero-shot classification approach based on a Transformer Vaswani et al. (2017); Devlin et al. (2018) fine-tuned for Natural Language Inference (NLI) Xia et al. (2018); Yin et al. (2019), to select the intent that best fits the utterance in a zero-shot setting. In the NLI fine-tuning we leverage Adapters Houlsby et al. (2019); Pfeiffer et al. (2020) to significantly reduce memory and time requirements while keeping the base model parameters frozen. Our approach is designed with a production setting in mind, where automatic intent discovery is fundamental to ensure a smooth interaction with users, e.g., in customer care chatbots. In Figure 1 we show how the Z-BERT-A pipeline fits into an intent detection system: when the classification model used for intent detection cannot detect an intent for the user's input utterance correctly or with sufficient confidence, the utterance is handed over to Z-BERT-A.

Figure 1: Deployment scenario, where the current intent detection (classification) model is initially used to examine user input and based on the classification outcome problematic cases are forwarded to Z-BERT-A for intent discovery.

Related Literature

There have been various efforts aiming at finding novel and unseen intents. A popular approach to determining whether an utterance fits the existing set of intents consists in casting the task as a binary classification problem and then applying zero-shot techniques to actually determine the new intent Xia et al. (2018); Siddique et al. (2021); Yan et al. (2020). Liu et al. (2022) proposed an approach leveraging Transformer-based architectures, while the majority relied on RNN architectures like LSTMs Xia et al. (2018). For the problem we focus on here, i.e., finding novel intents, there have been two interesting attempts to propose a pipeline for generation and extraction Vedula et al. (2019); Liu et al. (2021). Liu et al. (2021) addressed new intent discovery as a clustering problem, proposing an adaptation of K-means clustering in which dependency parsing is used to extract from each cluster a mean ACTION-OBJECT pair representing the common emerging intent for that particular cluster. Recently, the increasing attention on Large Language Models (LLMs) that exhibit zero-shot generalization Chowdhery et al. (2022); Sanh et al. (2021); Wang and Komatsuzaki (2021) makes them interesting candidates for unknown intent detection. Indeed, a GPT-J Wang and Komatsuzaki (2021)-based approach has been proposed: leveraging a fine-tuned version of GPT-3 Brown et al. (2020), the authors were able to generate intents directly from the input utterance. While the solution is extremely elegant, it is not an ideal fit for many real-world use-cases where a model of this size (starting from 6 billion parameters and up) is not always deployable in practice, e.g., in on-premise settings with various hardware constraints. Moreover, the presented results consider a few-shot setting Brown et al. (2020) without explicitly investigating the much more challenging zero-shot setting.


Background

In intent discovery, we aim at extracting from a single utterance a set of potentially novel intents and automatically determining the best fitting one for the considered utterance. Intent discovery can be tackled as a Natural Language Inference (NLI) problem, where we rely on a language model to predict the entailment between the utterance u and a set of hypotheses μ(I) based on the candidate intents, where μ is a function used to extract hypotheses from the set of potential intents I. As previously shown by Xian et al. (2018), using NLI models for zero-shot classification is an effective approach in problems where the set of candidate intents is known. In practice, the classification problem is cast as an inference task where the combination of a premise and a hypothesis is associated with one of three possible classes: entailment, neutral and contradiction. Yin et al. (2019) showed how this approach allows considering an input hypothesis based on an unseen class and generating from the premise-hypothesis pair a probability distribution describing the entailment, hence a score that links the input text to the novel class. While this technique is extremely flexible, and in principle can handle any association between utterance and candidate intent, determining good candidates from the analyzed utterance remains a major challenge.


Figure 2: Z-BERT-A architecture.

Approach

Herein, we focus on building a pipeline able to handle unseen classes at inference time. In this context, we need both to generate a set of candidate intents from the considered utterance and to classify the input against this new set of candidates. We tackle the problem with a two-stage pipeline. In the first stage, we leverage a dependency parser Honnibal and Johnson (2015); Nivre and Nilsson (2005) to extract a set of potential intents by exploiting specific arc dependencies between the words in the utterance. In the second stage, we use this set of potential intents as candidate classes for the utterance intent classification problem, following a zero-shot approach Xian et al. (2018) based on NLI and relying on a BERT-based model Devlin et al. (2018). The model is tuned with Adapters Houlsby et al. (2019) for the NLI task (BERT-A) and is then prompted with premise-hypothesis pairs for zero-shot classification on the candidate intents, completing the Z-BERT-A pipeline. The full zero-shot pipeline is implemented using the Hugging Face pipeline API from the transformers library Wolf et al. (2019).

Intent generation

Being able to define a set of potential intents from the input utterance is key in intent discovery. To provide this set of potentially unseen candidates, we exploit the dependency parser from spaCy Honnibal and Johnson (2015); Honnibal et al. (2020), relying on the en_core_web_trf model from spaCy-transformers ExplosionAI (2019). To generate the set of potential novel intents, we parse the input sentence and extract pairs of words by searching for specific Arc-Relations (AR) in the dependency tree of the parsed sentence.

AR/POS tag | Description
VERB | verb, covering all verbs except for auxiliary verbs
NOUN | noun, corresponding to all cases of singular or plural nouns
ADJ | adjective, covering also relative and superlative adjectives
PRONM | pronoun, all kinds of pronouns
DOBJ | direct object, a noun phrase which is the (accusative) object of the verb
AMOD | adjectival modifier, any adjectival phrase that serves to modify the meaning of the noun phrase
compound | noun compounds
Table 1: Arc-Relations (AR) and Part-of-Speech (POS) tags used in the intent generation phase.

Since an intent is usually composed of an action-object pair Vedula et al. (2019); Liu et al. (2021), we exploit this pattern and search for DOBJ, compound and AMOD arc relations. We perform a four-level detection, i.e., we find the four main relations that can generate a base intent. Once these relations, or a subset of them, are found, we add the pairs composed of the (VERB, NOUN) and (ADJ, PRONM) tokens with the most outgoing/ingoing arcs. We refer to Table 1 for a complete definition of the AR and Part-of-Speech (POS) tags considered. The extracted potential intents are then lemmatized using NLTK Loper and Bird (2002), applying lemmatization to verb and noun independently. The lemmatized intents are then used as classes for the zero-shot classifier based on our model.

Input: utterance u
Output: set of potential intents I

1:  Let I = ∅
2:  Let deps = dependency_parser(u)
3:  for arc in deps do
4:     if arc['label'] in {'DOBJ', 'AMOD', 'compound'} then
5:        Let start = beginning word of the arc relation
6:        Let end = ending word of the arc relation
7:        I.append(start['word'] + end['word'])
8:     end if
9:  end for
10:  Let best_w = set of words with the most in/out going arcs
11:  for (word_1, word_2) in best_w as (VERB, NOUN) or (ADJ, PRONM) do
12:     I.append(word_1 + word_2)
13:  end for
14:  I = {lemmatize(i) for i in I}
15:  return I
Algorithm 1 Intent generation algorithm

Algorithm 1 details the intent generation pipeline in pseudocode.
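For illustration, the same logic can be sketched in plain Python over pre-computed dependency arcs. The arc encoding, helper names and identity lemmatizer below are assumptions of this sketch, not the original implementation, which relies on spaCy's en_core_web_trf:

```python
def generate_intents(arcs, lemmatize=lambda w: w):
    """Sketch of Algorithm 1: extract candidate intents from dependency arcs.

    `arcs` is assumed to be a list of dicts with keys 'label', 'start', 'end',
    where 'start'/'end' are (word, POS) pairs produced by a dependency parser.
    """
    intents = []
    counts = {}  # in/out-going arc counts per (word, POS) node
    for arc in arcs:
        for node in (arc["start"], arc["end"]):
            counts[node] = counts.get(node, 0) + 1
        # keep word pairs connected by the intent-bearing arc relations
        if arc["label"] in {"DOBJ", "AMOD", "compound"}:
            intents.append(arc["start"][0] + " " + arc["end"][0])

    # pair the most connected words as (VERB, NOUN) or (ADJ, PRONM)
    def best(pos):
        cands = [n for n in counts if n[1] == pos]
        return max(cands, key=counts.get)[0] if cands else None

    for a, b in ((best("VERB"), best("NOUN")), (best("ADJ"), best("PRONM"))):
        if a and b:
            intents.append(a + " " + b)

    # lemmatize each word of each candidate intent independently
    return {" ".join(lemmatize(w) for w in i.split()) for i in intents}
```

A usage example: for an utterance like "I want to cancel my card", a DOBJ arc between "cancel" and "card" yields the candidate intent "cancel card".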

Zero-shot classification

The generated potential intents are fed to the zero-shot BERT-based classifier implemented via NLI, which scores the entailment between the utterance, used as premise, and the hypothesis based on each candidate intent. The intent related to the pair with the highest score is selected as best fitting the input utterance. The scores are computed using sentence embedding vectors Reimers and Gurevych (2019).
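The selection step can be sketched as follows; the hypothesis template and the entail_score callable (which would wrap the fine-tuned BERT-A model, e.g. through the Hugging Face zero-shot pipeline) are illustrative assumptions:

```python
def pick_intent(utterance, candidate_intents, entail_score):
    """Score each (premise, hypothesis) pair with an NLI model and keep
    the candidate intent whose hypothesis is most entailed.

    `entail_score(premise, hypothesis) -> float` is assumed to wrap the
    fine-tuned NLI model; here it is injected so the sketch stays
    model-agnostic.
    """
    scored = {
        intent: entail_score(utterance, f"This example is {intent}.")
        for intent in candidate_intents
    }
    best = max(scored, key=scored.get)  # highest entailment score wins
    return best, scored[best]
```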

Utterance | Extracted key-phrase | WordNet Synset definition (SD) | Generated hypothesis
where do you support? | support | the activity of providing for or maintaining by supplying with necessities | this text is about SD
card delivery? | delivery | the act of delivering or distributing something | this text is about SD
last payment? | payment | a sum of money paid or a claim discharged | this text is about SD
Table 2: Transformation of the Banking77 dataset for intent classification into an NLI dataset.


Datasets

We consider two datasets in our analysis: SNLI Bowman et al. (2015) and Banking77-OOS Casanueva et al. (2020); Zhang et al. (2021):

  • The SNLI corpus Bowman et al. (2015) is a collection of 570k human-written English sentence pairs manually labeled as entailment, contradiction, or neutral. It is used for natural language inference (NLI), also known as recognizing textual entailment (RTE). The dataset comes with a split of 550,152 samples for training, 10,000 for validation and 10,000 for testing. Each sample is composed of a premise, a hypothesis and a corresponding label indicating whether the premise and the hypothesis represent an entailment, a contradiction, or are neutral.

  • Banking77-OOS Casanueva et al. (2020); Zhang et al. (2021) is an intent classification dataset composed of online banking queries annotated with their corresponding intents. It provides a very fine-grained set of intents in the banking domain, comprising 13,083 customer service queries labeled with 77 intents, and focuses on fine-grained single-domain intent detection. Of these 77 intents, Banking77-OOS includes 50 in-scope intents, while the ID-OOS queries are built from the 27 held-out in-scope intents.

We also explore the effect of pretraining on an NLI adaptation of Banking77 Yin et al. (2019). To investigate the impact of pretraining on similar data, we extended the Banking77 dataset by casting the intent classification task as NLI. To achieve this, we consider the input utterance as premise and extract, using KeyBERT Sharma and Li (2019) via self-attention, the most relevant word associated with it. The word is then used to generate an entailed hypothesis based on the corresponding synset definition from WordNet via NLTK Loper and Bird (2002); Miller (1995). Exemplar samples are reported in Table 2. For hypotheses that are not considered entailed, we simply repeat the procedure with randomly sampled unrelated words. This process enabled us to use the training split of Banking77-OOS for adaptive fine-tuning of the NLI model component. We call this generated dataset Banking77-OOS-NLI.
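A minimal sketch of this dataset construction, with extract_keyword and define passed in as callables standing in for KeyBERT and WordNet (both names, and the "contradiction" label for non-entailed pairs, are assumptions of this sketch):

```python
import random

def make_nli_samples(utterances, extract_keyword, define, rng=random.Random(0)):
    """Sketch of the Banking77-OOS-NLI construction: each utterance becomes a
    premise; its most relevant word (KeyBERT in the paper) is mapped to a
    WordNet-style definition that fills the hypothesis template.

    `extract_keyword(text) -> str` and `define(word) -> str` are injected
    stand-ins for KeyBERT and WordNet lookups.
    """
    words = [extract_keyword(u) for u in utterances]
    samples = []
    for utt, word in zip(utterances, words):
        # entailed sample from the utterance's own keyword definition
        samples.append((utt, f"this text is about {define(word)}", "entailment"))
        # non-entailed sample: repeat with a randomly sampled unrelated word
        other = rng.choice([w for w in words if w != word] or [word])
        if other != word:
            samples.append((utt, f"this text is about {define(other)}", "contradiction"))
    return samples
```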


Training

We fine-tuned two versions of BERT-A (a BERT-based transformer with Adapters). The first version is trained for NLI on the SNLI dataset. The second version also considers the previously introduced Banking77-OOS-NLI. The training procedures only optimize the parameters of the added Adapter layers to minimize training time and memory footprint. By freezing all the original layers and training the model only on the adaptive layers, we end up with 896,066 trainable parameters. All training runs relied on the AdamW Loshchilov and Hutter (2017) optimizer with a warm-up learning rate scheduler. The models have been fine-tuned for a total of 6 epochs using early stopping.

Model | Dataset | Accuracy | Precision | Recall | F1-score
BERT-A | SNLI | 0.866 | 0.871 | 0.869 | 0.870
BERT-A | Banking77-OOS-NLI | 0.882 | 0.894 | 0.894 | 0.890
Table 3: Performance of the BERT-A module on the NLI task when fine-tuned on the SNLI and Banking77-OOS-NLI datasets.
Model Accuracy
BART0 0.147
ZS-DNN-USE 0.156
BERT-A (SNLI) 0.204
BERT-A (Banking77-OOS-NLI) 0.407
Table 4: Accuracy of the BERT-A module for the zero-shot intent classification task on Banking77 compared with the considered baselines.
Model | Cosine similarity | Accuracy | Prompt
GPT-J | 0.04 | 0.5 | prompt 1
GPT-J | 0.04 | 0.5 | prompt 2
GPT-J | 0.117 | 0.5 | prompt 3
GPT-J | 0.098 | 0.5 | prompt 4
T0 | 0.148 | 0.5 | prompt 1
T0 | 0.189 | 0.5 | prompt 2
T0 | 0.465 | 0.5 | prompt 3
T0 | 0.446 | 0.5 | prompt 4
bart-large-mnli | 0.436 | 0.5 | -
Z-BERT-A (SNLI) | 0.478 ± 0.003 | 0.5 | -
Z-BERT-A (Banking77-OOS-NLI) | 0.492 ± 0.004 | 0.546 ± 0.011 | -
Table 5: Cosine similarity between ground-truth and generated intents for the full Z-BERT-A pipeline and the other baseline models on the ID-OOS set of Banking77-OOS.
Prompt name | Prompt text
prompt 1 | Considering this utterance: [utterance]. What is the intent that best describes it?
prompt 2 | Considering this utterance: [utterance]. What is the intent that best describes it expressed as a phrase of one or two words?
prompt 3 | Given the utterance: [utterance]. What is the best fitting intent, if any, among the following: [potential intents]?
prompt 4 | [utterance] Choose the most suitable intent based on the above utterance. Options: [potential intents]
Table 6: Prompts used for evaluating GPT-J and T0.


Evaluation

In order to evaluate the performance of the Z-BERT-A pipeline, we first analyze the results of the BERT-A component on an NLI task using accuracy, precision and recall. Afterwards, we compare its results on the zero-shot classification task with other available models on the same Banking77 split using accuracy. In this initial evaluation, the intents are known. The baselines considered in this setting are: BART0 Lin et al. (2022), a 406-million-parameter multitask model based on BART-large Lewis et al. (2019) and trained with prompts; and two flavours of Zero-Shot DNN (ZS-DNN) Kumar et al. (2017), with both the Universal Sentence Encoder (USE) Cer et al. (2018) and SBERT Reimers and Gurevych (2019) as encoders.

In the unknown intent case, we compare the Z-BERT-A pipeline against a set of zero-shot baselines based on various pretrained transformers. As baselines we include bart-large-mnli Yin et al. (2019), as it has shown interesting performance in zero-shot sequence classification. We use this model as an alternative to our classification method while maintaining our dependency parsing strategy for intent generation. As hypothesis for the NLI-based classification, we used the phrase: "This example is [intent]". Furthermore, given that very large LMs have demonstrated remarkable zero-shot capabilities in a plethora of tasks, we added two of the most recent ones, namely T0 Sanh et al. (2021) and GPT-J Wang and Komatsuzaki (2021), to the baseline list. In such models, the provided template prompt defines the task of interest. We examined whether they can serve end-to-end intent extraction (intent generation and classification) either in a completely unsupervised, zero-shot setting or given intents already generated by our dependency parsing method. In the former case, the input is just the utterance of interest, while in the latter the input includes both the utterance and the possible intents. In both cases, the generated output is considered the extracted intent.

Since in this setting the generated intents cannot be matched exactly with the held-out ones, we chose to measure performance using a semantic similarity metric based on the cosine similarity between the sentence embeddings of the ground-truth intents and the respective generated ones Vedula et al. (2019). To set a decision boundary, we rely on a threshold based on distributional properties of the computed similarities:

t = μ_sim + α · σ_sim

where μ_sim and σ_sim denote the mean and standard deviation of the computed similarities, and α is an arbitrary parameter controlling the variance impact, which we set to 0.5 in our study.

This full pipeline evaluation has been repeated five times to evaluate the stability of the results.
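Under the reading that the decision boundary is the mean of the computed similarities shifted by α standard deviations (with α = 0.5 as above), the thresholding can be sketched as:

```python
import numpy as np

def similarity_threshold(similarities, alpha=0.5):
    """Decision boundary from distributional properties of the computed
    cosine similarities: mean shifted by alpha standard deviations.
    The exact functional form is a reconstruction for this sketch.
    """
    s = np.asarray(similarities, dtype=float)
    return s.mean() + alpha * s.std()

def matches(similarities, alpha=0.5):
    """Boolean mask of generated intents accepted as matching the
    ground truth under the threshold."""
    s = np.asarray(similarities, dtype=float)
    return s >= similarity_threshold(s, alpha)
```

The similarities themselves would come from cosine similarity between sentence embeddings (e.g., SBERT) of ground-truth and generated intents.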


Results

Firstly, we evaluate the performance of BERT-A on the NLI task, see Table 3. The accuracy, precision and recall achieved confirm the quality of the pretrained model, highlighting the impact of fine-tuning on Banking77-OOS-NLI.

Table 4 shows that the BERT-A component improves over the majority of the considered baselines in terms of accuracy in the known intent scenario. Remarkably, the BERT-A version fine-tuned on Banking77-OOS-NLI outperforms all the considered baselines.

Finally, we evaluate Z-BERT-A on the unknown intent discovery task. Table 5 reports the performance of Z-BERT-A, with BERT-A fine-tuned on SNLI and on Banking77-OOS-NLI, in comparison with a selection of zero-shot baselines for intent discovery. Both flavours of Z-BERT-A outperform the considered baselines by a consistent margin.

Table 6 reports the prompts used for GPT-J and T0 inference. For prompts 1 and 2, we let the models generate the intent without providing a set of possible options. Prompts 3 and 4 instead contain the candidate intents extracted using the first stage of the Z-BERT-A pipeline. Figure 3 reports the average cosine similarity between generated and ground-truth intents for each of the ground-truth intents.

Figure 3: Bar plot showing the average cosine-similarity for each unseen-intent class.

It is interesting to observe how the generated intents are semantically similar to their ground-truth counterparts.

Ground-truth intent | Intent from Z-BERT-A
exchange-rate | exchange-rate
lost-or-stolen-card | lost-card
getting-virtual-card | virtual-card
pin-blocked | pin-block
Table 7: Samples of new intent discovery through the Z-BERT-A pipeline; the full list is available at https://github.com/GT4SD/zberta/blob/main/results/preds.csv

To appreciate the quality of the generated intents, in Table 7 we report some examples of unseen ground-truth intents and the corresponding Z-BERT-A predictions.

Conclusions and Future Work

We proposed Z-BERT-A, a pipeline for zero-shot prediction of unseen intents from utterances. We performed a two-fold evaluation. First, we showed how our BERT-based model fine-tuned with Adapters on NLI is able to outperform a selection of baselines on the prediction of known intents in a zero-shot setting. Secondly, we evaluated the capabilities of the full pipeline, comparing its performance with the results obtained by prompting large language models in an unknown intent setting. Our results show that Z-BERT-A represents an effective option for extending intent classification systems to handle unseen intents, a key aspect of modern dialogue systems for triage. Moreover, by using a relatively lightweight base model and relying on adaptive fine-tuning, the proposed solution can be deployed in limited-resource scenarios, e.g., on-premise solutions or small cloud instances. The main limitation of Z-BERT-A currently lies in the intent generation stage, which relies extensively on the quality of the dependency parsing. An interesting avenue to explore in the future consists in relying on zero-shot learning approaches in the intent generation phase as well Liu et al. (2021), without compromising on model size and inference requirements. Z-BERT-A is available at the following link: https://github.com/GT4SD/zberta.


References

  • amazon-research (2022) Dstc11-track2-intent-induction. GitHub. Note: https://github.com/amazon-research/dstc11-track2-intent-induction Cited by: Introduction.
  • S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning (2015) A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Cited by: 1st item, Datasets.
  • T. B. Brown, B. Mann, N. Ryder, M. Subbiah, et al. (2020) Language models are few-shot learners. CoRR abs/2005.14165. External Links: Link, 2005.14165 Cited by: Related Literature.
  • I. Casanueva, T. Temcinas, D. Gerz, M. Henderson, and I. Vulic (2020) Efficient intent detection with dual sentence encoders. CoRR abs/2003.04807. External Links: Link, 2003.04807 Cited by: Introduction, 2nd item, Datasets.
  • D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco, R. St. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, Y. Sung, B. Strope, and R. Kurzweil (2018) Universal sentence encoder. arXiv. External Links: Document, Link Cited by: Evaluation.
  • A. Chowdhery, S. Narang, J. Devlin, et al. (2022) PaLM: scaling language modeling with pathways. arXiv. External Links: Document, Link Cited by: Related Literature.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: Z-BERT-A: a zero-shot pipeline for unknown intent detection, Introduction, Approach.
  • ExplosionAI (2015) SpaCy. GitHub. Note: https://github.com/explosion/spaCy Cited by: Introduction.
  • ExplosionAI (2019) SpaCy-transformers. GitHub. Note: https://github.com/explosion/spacy-transformers Cited by: Intent generation.
  • M. Honnibal and M. Johnson (2015) An improved non-monotonic transition system for dependency parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 1373–1378. External Links: Link, Document Cited by: Introduction, Intent generation, Approach.
  • M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd (2020) spaCy: Industrial-strength Natural Language Processing in Python. External Links: Document Cited by: Intent generation.
  • N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly (2019) Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pp. 2790–2799. Cited by: Introduction, Approach.
  • A. Kumar, P. R. Muddireddy, M. Dreyer, and B. Hoffmeister (2017) Zero-shot learning across heterogeneous overlapping domains.. In INTERSPEECH, pp. 2914–2918. Cited by: Evaluation.
  • S. Larson, A. Mahendran, J. J. Peper, C. Clarke, A. Lee, P. Hill, J. K. Kummerfeld, K. Leach, M. A. Laurenzano, L. Tang, and J. Mars (2019) An evaluation dataset for intent classification and out-of-scope prediction. arXiv. External Links: Document, Link Cited by: Introduction.
  • M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer (2019) BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv. External Links: Document, Link Cited by: Evaluation.
  • B. Y. Lin, K. Tan, C. Miller, B. Tian, and X. Ren (2022) Unsupervised cross-task generalization via retrieval augmentation. arXiv. External Links: Document, Link Cited by: Evaluation.
  • H. Liu, S. Zhao, X. Zhang, F. Zhang, J. Sun, H. Yu, and X. Zhang (2022) A simple meta-learning paradigm for zero-shot intent classification with mixture attention mechanism. arXiv preprint arXiv:2206.02179. Cited by: Related Literature.
  • P. Liu, Y. Ning, K. K. Wu, K. Li, and H. Meng (2021) Open intent discovery through unsupervised semantic clustering and dependency parsing. arXiv preprint arXiv:2104.12114. Cited by: Related Literature, Intent generation, Conclusions and Future Work.
  • E. Loper and S. Bird (2002) NLTK: the natural language toolkit. arXiv. External Links: Document, Link Cited by: Intent generation, Datasets.
  • I. Loshchilov and F. Hutter (2017) Decoupled weight decay regularization. arXiv. External Links: Document, Link Cited by: Training.
  • G. A. Miller (1995) WordNet: a lexical database for english. Communications of the ACM 38 (11), pp. 39–41. Cited by: Datasets.
  • J. Ni, T. Young, V. Pandelea, F. Xue, V. Adiga, and E. Cambria (2021) Recent advances in deep learning based dialogue systems: a systematic survey. arXiv preprint arXiv:2105.04387. Cited by: Introduction.
  • J. Nivre and J. Nilsson (2005) Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), Ann Arbor, Michigan, pp. 99–106. External Links: Link, Document Cited by: Introduction, Approach.
  • J. Pfeiffer, A. Rücklé, C. Poth, A. Kamath, I. Vulić, S. Ruder, K. Cho, and I. Gurevych (2020) AdapterHub: a framework for adapting transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 46–54. Cited by: Z-BERT-A: a zero-shot pipeline for unknown intent detection, Introduction.
  • H. Qi, L. Pan, A. Sood, A. Shah, L. Kunc, M. Yu, and S. Potdar (2020) Benchmarking commercial intent detection services with practice-driven evaluations. arXiv. External Links: Document, Link Cited by: Introduction.
  • S. Quarteroni (2018) Natural language processing for industry. Informatik-Spektrum 41 (2), pp. 105–112. Cited by: Z-BERT-A: a zero-shot pipeline for unknown intent detection.
  • N. Reimers and I. Gurevych (2019) Sentence-bert: sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084. Cited by: Zero-shot classification, Evaluation.
  • V. Sanh, A. Webson, C. Raffel, S. H. Bach, L. Sutawika, Z. Alyafeai, A. Chaffin, A. Stiegler, T. L. Scao, A. Raja, et al. (2021) Multitask prompted training enables zero-shot task generalization. arXiv preprint arXiv:2110.08207. Cited by: Related Literature, Evaluation.
  • P. Sharma and Y. Li (2019) Self-supervised contextual keyword and keyphrase retrieval with self-labelling. Preprints.org. External Links: Document, Link Cited by: Datasets.
  • A. Siddique, F. Jamour, L. Xu, and V. Hristidis (2021) Generalized zero-shot intent detection via commonsense knowledge. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1925–1929. Cited by: Related Literature.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. Advances in neural information processing systems 30. Cited by: Z-BERT-A: a zero-shot pipeline for unknown intent detection, Introduction.
  • N. Vedula, N. Lipka, P. Maneriker, and S. Parthasarathy (2019) Towards open intent discovery for conversational text. arXiv. External Links: Document, Link Cited by: Related Literature, Intent generation, Evaluation.
  • B. Wang and A. Komatsuzaki (2021) GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. Note: https://github.com/kingoflolz/mesh-transformer-jax Cited by: Related Literature, Evaluation.
  • T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al. (2019) Huggingface’s transformers: state-of-the-art natural language processing. arXiv preprint arXiv:1910.03771. Cited by: Approach.
  • C. Xia, C. Zhang, X. Yan, Y. Chang, and P. S. Yu (2018) Zero-shot user intent detection via capsule neural networks. arXiv preprint arXiv:1809.00385. Cited by: Introduction, Related Literature.
  • Y. Xian, C. H. Lampert, B. Schiele, and Z. Akata (2018) Zero-shot learning—a comprehensive evaluation of the good, the bad and the ugly. IEEE transactions on pattern analysis and machine intelligence 41 (9), pp. 2251–2265. Cited by: Background, Approach.
  • G. Yan, L. Fan, Q. Li, H. Liu, X. Zhang, X. Wu, and A. Y. Lam (2020) Unknown intent detection using Gaussian mixture model with an application to zero-shot intent classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1050–1060. Cited by: Related Literature.
  • W. Yin, J. Hay, and D. Roth (2019) Benchmarking zero-shot text classification: datasets, evaluation and entailment approach. arXiv preprint arXiv:1909.00161. Cited by: Introduction, Background, Datasets, Evaluation.
  • J. Zhang, K. Hashimoto, Y. Wan, Y. Liu, C. Xiong, and P. S. Yu (2021) Are pretrained transformers robust in intent classification? A missing ingredient in evaluation of out-of-scope intent detection. CoRR abs/2106.04564. External Links: Link, 2106.04564 Cited by: Introduction, 2nd item, Datasets.