Subword Semantic Hashing for Intent Classification on Small Datasets

by   Kumar Shridhar, et al.

In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and outperform previous state-of-the-art methods on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word embedding based methods are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise by the use of internet communication. First, such datasets miss a lot of terms in the vocabulary to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, the models for intent classification are not trained with spelling errors and it is difficult to think about ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: AskUbuntu, Chatbot, and Web Application. Our benchmarks are available online:


Text classification with word embedding regularization and soft similarity measure

Since the seminal work of Mikolov et al., word embeddings have become th...

Hash2Vec, Feature Hashing for Word Embeddings

In this paper we propose the application of feature hashing to create wo...

Question Embeddings Based on Shannon Entropy: Solving intent classification task in goal-oriented dialogue system

Question-answering systems and voice assistants are becoming major part ...

Spoken Language Intent Detection using Confusion2Vec

Decoding speaker's intent is a crucial part of spoken language understan...

Enhancing Chinese Intent Classification by Dynamically Integrating Character Features into Word Embeddings with Ensemble Techniques

Intent classification has been widely researched on English data with de...

Text Classification for Task-based Source Code Related Questions

There is a key demand to automatically generate code for small tasks for...

MORSE: Semantic-ally Drive-n MORpheme SEgment-er

We present in this paper a novel framework for morpheme segmentation whi...

Please sign up or login with your details

Forgot password? Click here to reset