Practical Knowledge Distillation: Using DNNs to Beat DNNs

02/23/2023
by Chung-Wei Lee, et al.

For tabular datasets, we explore data and model distillation, as well as data denoising. These techniques improve both gradient-boosting models and a specialized DNN architecture. While gradient boosting is known to outperform DNNs on tabular data, we close the gap for datasets with 100K+ rows and give DNNs an advantage on small datasets. We extend these results with input-data distillation and optimized ensembling to help DNN performance match or exceed that of gradient boosting. As a theoretical justification of our practical method, we prove its equivalence to classical cross-entropy knowledge distillation. We also qualitatively explain the superiority of DNN ensembles over XGBoost on small datasets. For an industry end-to-end real-time ML platform with 4M production inferences per second, we develop a model-training workflow based on data sampling that distills ensembles of models into a single gradient-boosting model favored for high-performance real-time inference, without performance loss. Empirical evaluation shows that the proposed combination of methods consistently improves model accuracy over prior best models across several production applications deployed worldwide.
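The core idea described above, distilling an ensemble of DNN teachers into a single gradient-boosting student that is cheap to serve in real time, can be sketched roughly as follows. This is a minimal illustration, not the paper's exact workflow: the model choices (MLPClassifier teachers, an XGBoost regressor fit to teacher logits), dataset, and hyperparameters are assumptions made for the example.

```python
# Hedged sketch: distill an ensemble of DNN teachers into one gradient-boosting
# student via soft labels, assuming a binary-classification tabular task.
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic tabular data standing in for a production dataset.
X, y = make_classification(n_samples=20_000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Teacher: an ensemble of small DNNs trained with different seeds; the soft
# labels are the averaged predicted probabilities.
teachers = [
    MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=200, random_state=s).fit(X_tr, y_tr)
    for s in range(5)
]
p_teacher = np.mean([t.predict_proba(X_tr)[:, 1] for t in teachers], axis=0)

# Student: a single gradient-boosting model regressing on the teacher logits,
# suitable for high-throughput real-time inference.
eps = 1e-6
teacher_logits = np.log(p_teacher + eps) - np.log(1.0 - p_teacher + eps)
student = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
student.fit(X_tr, teacher_logits)

# Compare the distilled student against a hard-label gradient-boosting baseline.
p_student = 1.0 / (1.0 + np.exp(-student.predict(X_te)))
baseline = xgb.XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1).fit(X_tr, y_tr)
print("distilled student AUC:", roc_auc_score(y_te, p_student))
print("hard-label baseline AUC:", roc_auc_score(y_te, baseline.predict_proba(X_te)[:, 1]))
```

In this sketch the student never sees the hard labels directly; it learns the ensemble's averaged decision surface, which is the sense in which a single fast model can retain the accuracy of the slower ensemble.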


