Embedding Convolutions for Short Text Extreme Classification with Millions of Labels

09/13/2021
by Siddhant Kharbanda, et al.

Automatic annotation of short-text data with a large number of target labels, referred to as Short Text Extreme Classification, has recently found numerous applications in the prediction of related searches and in product recommendation. The conventional use of Convolutional Neural Networks (CNNs) to capture n-grams in text classification relies heavily on uniformity in word ordering and on the presence of long input sequences to convolve over. However, both are missing in the short, unstructured text sequences encountered in search and recommendation. To tackle this, we propose an orthogonal approach that recasts the convolution operation to capture coupled semantics along the embedding dimensions, and we develop a word-order-agnostic embedding enhancement module to deal with the lack of structure in such queries. Benefiting from the computational efficiency of the convolution operation, Embedding Convolutions, when applied to the enriched word embeddings, result in a lightweight yet powerful encoder (InceptionXML) that is robust to the inherent lack of structure in short-text extreme classification. To scale our model to problems with millions of labels, we also propose InceptionXML+, which addresses the shortcomings of the dynamic hard-negative mining framework in the recently proposed LightXML by improving the alignment between the label shortlister and the extreme classifier. On popular benchmark datasets, we empirically demonstrate that the proposed method outperforms state-of-the-art deep extreme classifiers such as Astec by an average of 5 respectively.
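
The contrast between convolving over word positions and convolving along the embedding dimensions can be illustrated with a toy example. The sketch below is not the InceptionXML encoder; it is a minimal, dependency-free illustration under our own assumptions: a single filter with made-up weights, a fixed query length (so that word positions can act as input channels), and hypothetical function names `conv_over_words` and `conv_over_embedding`.

```python
def conv_over_words(x, kernel):
    """Conventional text-CNN filter: slides over word positions.

    x: L x D list of word embeddings; kernel: k x D weights (an n-gram filter).
    Returns L - k + 1 activations, i.e. nothing if the text has fewer than k words.
    """
    L, k = len(x), len(kernel)
    out = []
    for start in range(L - k + 1):
        s = 0.0
        for i in range(k):
            for d in range(len(kernel[i])):
                s += x[start + i][d] * kernel[i][d]
        out.append(s)
    return out


def conv_over_embedding(x, kernel):
    """Filter that slides along the embedding axis, with word positions as channels.

    x: L x D list of word embeddings; kernel: L x k weights (one row per word).
    Returns D - k + 1 activations, independent of how short the text is.
    """
    L, D, k = len(x), len(x[0]), len(kernel[0])
    out = []
    for start in range(D - k + 1):
        s = 0.0
        for w in range(L):
            for j in range(k):
                s += x[w][start + j] * kernel[w][j]
        out.append(s)
    return out


# Toy query: 2 words, embedding size 6 (all values are illustrative).
query = [
    [1.0, 0.0, 2.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0, 2.0, 1.0],
]
trigram_kernel = [[1.0] * 6 for _ in range(3)]  # needs 3 consecutive words
embed_kernel = [[1.0] * 3 for _ in range(2)]    # spans 3 embedding dims of both words

word_out = conv_over_words(query, trigram_kernel)
embed_out = conv_over_embedding(query, embed_kernel)
print(len(word_out), len(embed_out))
```

With a two-word query, the trigram filter has no window to slide over and produces no activations, whereas the filter along the embedding axis still yields one activation per window of embedding dimensions. In practice, queries would be padded or truncated to a fixed length for the word-as-channel layout to apply.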

research
11/01/2018

AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks

Extreme multi-label text classification (XMTC) is a task for tagging eac...
research
05/10/2018

Joint Embedding of Words and Labels for Text Classification

Word embeddings are effective intermediate representations for capturing...
research
12/05/2018

Improving Medical Short Text Classification with Semantic Expansion Using Word-Cluster Embedding

Automatic text classification (TC) research can be used for real-world p...
research
01/15/2020

Extreme Regression for Dynamic Search Advertising

This paper introduces a new learning paradigm called eXtreme Regression ...
research
08/01/2021

DECAF: Deep Extreme Classification with Label Features

Extreme multi-label classification (XML) involves tagging a data point w...
research
12/01/2014

Effective Use of Word Order for Text Categorization with Convolutional Neural Networks

Convolutional neural network (CNN) is a neural network that can make use...
research
02/12/2023

Review of Extreme Multilabel Classification

Extreme multilabel classification or XML, in short, has emerged as a new...
