Discriminative Pre-training for Low Resource Title Compression in Conversational Grocery

12/13/2020
by Snehasish Mukherjee, et al.

The ubiquity of smart voice assistants has made conversational shopping commonplace, especially for low-consideration segments like grocery. A central problem in conversational grocery is the automatic generation of short product titles that can be read out quickly during a conversation. Several supervised models in the literature leverage manually labeled datasets and additional product features to generate short titles automatically. However, obtaining large amounts of labeled data is expensive, and most grocery item pages are not as feature-rich as those in other categories. To address this problem, we propose a pre-training based solution that uses unlabeled data to learn contextual product representations, which can then be fine-tuned to obtain better title compression even in a low-resource setting. We use a self-attentive BiLSTM encoder network with a time-distributed softmax layer for the title compression task. We overcome the vocabulary mismatch problem with a hybrid embedding layer that combines pre-trained word embeddings with trainable character-level convolutions. We pre-train this network as a discriminator on a replaced-token detection task over a large number of unlabeled grocery product titles. Finally, we fine-tune this network, without any modifications, on a small labeled dataset for the title compression task. Experiments on Walmart's online grocery catalog show that our model achieves performance comparable to state-of-the-art models like BERT and XLNet. When fine-tuned on all of the available training data, our model attains an F1 score of 0.8558, lagging the best performing model, BERT-Base, by 2.78% while using 55 times fewer parameters than both. Further, when allowed to fine-tune on only 5% of the training data, our model outperforms BERT-Base by 24.3% in F1 score.
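To make the described architecture concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released code): a hybrid embedding layer that concatenates frozen pre-trained word vectors with trainable character-level convolution features, a BiLSTM encoder followed by a self-attention layer (multi-head attention is used here as a stand-in for the paper's self-attentive layer), and a time-distributed two-way classification head. The same head can serve as the replaced-token-detection discriminator during pre-training and as the keep/drop tagger during title-compression fine-tuning. All dimensions, class names, and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of the abstract's architecture; all sizes are illustrative.
import torch
import torch.nn as nn


class HybridEmbedding(nn.Module):
    """Concatenate frozen pre-trained word vectors with trainable char-CNN features."""

    def __init__(self, word_vectors, n_chars, char_dim=16, n_filters=32, kernel=3):
        super().__init__()
        self.word_emb = nn.Embedding.from_pretrained(word_vectors, freeze=True)
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, n_filters, kernel, padding=kernel // 2)
        self.out_dim = word_vectors.size(1) + n_filters

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
        b, t, c = char_ids.shape
        chars = self.char_emb(char_ids.view(b * t, c)).transpose(1, 2)  # (b*t, char_dim, c)
        chars = torch.relu(self.char_cnn(chars)).max(dim=2).values      # max-pool over characters
        chars = chars.view(b, t, -1)
        return torch.cat([self.word_emb(word_ids), chars], dim=-1)


class SelfAttentiveBiLSTMTagger(nn.Module):
    """BiLSTM encoder + self-attention + per-token two-way classification head."""

    def __init__(self, embedder, hidden=128, n_classes=2):
        super().__init__()
        self.embedder = embedder
        self.bilstm = nn.LSTM(embedder.out_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)  # time-distributed head

    def forward(self, word_ids, char_ids, pad_mask=None):
        x = self.embedder(word_ids, char_ids)
        h, _ = self.bilstm(x)
        a, _ = self.attn(h, h, h, key_padding_mask=pad_mask)
        return self.classifier(a)  # (batch, seq_len, n_classes) token logits


# Toy usage: pre-train with labels = "was this token replaced?", then fine-tune the
# same network with labels = "keep this token in the short title?". The softmax of
# the time-distributed head is applied implicitly inside CrossEntropyLoss.
if __name__ == "__main__":
    vocab = torch.randn(1000, 100)            # stand-in for pre-trained word vectors
    model = SelfAttentiveBiLSTMTagger(HybridEmbedding(vocab, n_chars=80))
    words = torch.randint(0, 1000, (4, 12))   # batch of 4 titles, 12 tokens each
    chars = torch.randint(0, 80, (4, 12, 10))
    labels = torch.randint(0, 2, (4, 12))     # replaced-token or keep/drop tags
    loss = nn.CrossEntropyLoss()(model(words, chars).reshape(-1, 2), labels.reshape(-1))
    loss.backward()
```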

Related research

03/10/2023  UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation
In this paper, we introduce UnFuSeD, a novel approach to leverage self-s...

11/10/2019  Effectiveness of self-supervised pre-training for speech recognition
We present pre-training approaches for self-supervised representation le...

09/08/2019  Multi-Task Bidirectional Transformer Representations for Irony Detection
Supervised deep learning requires large amounts of training data. In the...

02/12/2022  Wav2Vec2.0 on the Edge: Performance Evaluation
Wav2Vec2.0 is a state-of-the-art model which learns speech representatio...

03/29/2022  Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus
Training a text-to-speech (TTS) model requires a large scale text labele...

04/25/2022  On-demand compute reduction with stochastic wav2vec 2.0
Squeeze and Efficient Wav2vec (SEW) is a recently proposed architecture ...

06/13/2016  MITRE at SemEval-2016 Task 6: Transfer Learning for Stance Detection
We describe MITRE's submission to the SemEval-2016 Task 6, Detecting Sta...
