BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search

10/20/2020
by Yunjiang Jiang, et al.

Relevance has a significant impact on user experience and business profit for e-commerce search platforms. In this work, we propose a data-driven framework for search relevance prediction, by distilling knowledge from BERT and related multi-layer Transformer teacher models into simple feed-forward networks using a large amount of unlabeled data. The distillation process produces a student model that recovers more than 97% of the teacher models' test accuracy on new queries, at a serving cost that is several orders of magnitude lower (latency 150x lower than BERT-Base and 15x lower than the most efficient BERT variant, TinyBERT). Applying temperature rescaling and teacher model stacking further boosts model accuracy, without increasing the student model's complexity. We present experimental results on both in-house e-commerce search relevance data and a public sentiment analysis data set from the GLUE benchmark. The latter takes advantage of another related public data set of much larger scale, while disregarding its potentially noisy labels. Embedding analysis and a case study on the in-house data further highlight the strength of the resulting model. By making the data processing and model training source code public, we hope the techniques presented here can help reduce the energy consumption of state-of-the-art Transformer models and also level the playing field for small organizations lacking access to cutting-edge machine learning hardware.
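The core recipe described above is easy to sketch. The snippet below is a minimal, hypothetical PyTorch illustration (not the authors' released code): a frozen teacher scores unlabeled examples, its logits are temperature-rescaled into soft targets, and a small feed-forward student is trained to match them. All module names, feature dimensions, and the chosen temperature are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForwardStudent(nn.Module):
    """Simple feed-forward student operating on precomputed query/item features."""
    def __init__(self, input_dim: int, hidden_dim: int = 256, num_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft cross-entropy between temperature-rescaled teacher and student logits."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return -(soft_targets * log_probs).sum(dim=-1).mean() * temperature ** 2

if __name__ == "__main__":
    torch.manual_seed(0)
    features = torch.randn(32, 128)        # stand-in for unlabeled query-item features
    teacher_logits = torch.randn(32, 2)    # stand-in for frozen BERT teacher scores;
                                           # teacher stacking could combine several
                                           # teachers' logits here (assumption)
    student = FeedForwardStudent(input_dim=128)
    loss = distillation_loss(student(features), teacher_logits, temperature=2.0)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")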

Related research

10/04/2022 - Knowledge Distillation based Contextual Relevance Matching for E-commerce Product Search
Online relevance matching is an essential task of e-commerce product sea...

01/03/2022 - Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models
We perform knowledge distillation (KD) benchmark from task-specific BERT...

12/28/2022 - OVO: One-shot Vision Transformer Search with Online distillation
Pure transformers have shown great potential for vision tasks recently. ...

05/24/2023 - How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives
Recently, various intermediate layer distillation (ILD) objectives have ...

06/29/2022 - Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices
Modern search systems use several large ranker models with transformer a...

07/11/2019 - Privileged Features Distillation for E-Commerce Recommendations
Features play an important role in most prediction tasks of e-commerce r...

01/17/2022 - Distillation from heterogeneous unlabeled collections
Compressing deep networks is essential to expand their range of applicat...
