Adversarial-Based Knowledge Distillation for Multi-Model Ensemble and Noisy Data Refinement

08/22/2019
by   Zhiqiang Shen, et al.

Generic image recognition is a fundamental and important visual problem in computer vision. One major challenge of this task is that a single image usually contains multiple objects while its label is still one-hot; another is that human-annotated labels are noisy and sometimes missing. In this paper, we tackle these challenges in two different image recognition problems, multi-model ensemble and noisy-data recognition, with a unified framework. As is well known, the best performing deep neural models are usually ensembles of multiple base-level networks, since ensembling mitigates the variation and noise contained in the dataset. Unfortunately, the space required to store so many networks and the time required to execute them at runtime prohibit their use in applications where test sets are large (e.g., ImageNet). In this paper, we present a method for compressing large, complex trained ensembles into a single network, where the knowledge from a variety of trained deep neural networks (DNNs) is distilled and transferred to a single DNN. To distill diverse knowledge from the different trained (teacher) models, we propose an adversarial learning strategy in which a block-wise training loss guides and optimizes the predefined student network to recover the knowledge in the teacher models, while a discriminator network is simultaneously trained to distinguish teacher features from student features. Extensive experiments on CIFAR-10/100, SVHN, ImageNet and the iMaterialist Challenge dataset demonstrate the effectiveness of our MEAL method. On ImageNet, our ResNet-50 based MEAL achieves a 21.79% top-1 error, a remarkable improvement of 2.06% over its baseline and 1.15% over the baseline ResNet-101 model.
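To make the training recipe described above concrete, the following is a minimal sketch of adversarial-based knowledge distillation, assuming PyTorch. The module and function names (FeatureDiscriminator, distillation_step) are illustrative placeholders rather than the authors' released code, and for brevity the logits stand in for the block-wise intermediate features the paper actually aligns.

```python
# Minimal sketch of adversarial-based knowledge distillation (not the authors' code).
# Assumptions: PyTorch; teacher and student share the same output dimensionality;
# logits are used as a stand-in for intermediate (block-wise) features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDiscriminator(nn.Module):
    """Tries to tell teacher features (label 1) from student features (label 0)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 1),
        )

    def forward(self, feat):
        return self.net(feat)

def distillation_step(student, teachers, discriminator,
                      opt_student, opt_disc, images, temperature=2.0):
    """One training step: the student mimics the (averaged) teacher soft output,
    while an adversarial loss pushes its features toward the teacher's."""
    # Soft targets: average the softened predictions of all teacher models.
    with torch.no_grad():
        t_logits = torch.stack([t(images) for t in teachers]).mean(dim=0)
        t_soft = F.softmax(t_logits / temperature, dim=1)
        t_feat = t_logits  # stand-in for a teacher intermediate feature

    s_logits = student(images)
    s_soft = F.log_softmax(s_logits / temperature, dim=1)
    s_feat = s_logits      # stand-in for a student intermediate feature

    # Update the discriminator: teacher features -> 1, student features -> 0.
    opt_disc.zero_grad()
    d_real = discriminator(t_feat)
    d_fake = discriminator(s_feat.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_disc.step()

    # Update the student: match the soft targets and fool the discriminator.
    opt_student.zero_grad()
    loss_kd = F.kl_div(s_soft, t_soft, reduction="batchmean") * temperature ** 2
    d_fake = discriminator(s_feat)
    loss_adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    (loss_kd + 0.1 * loss_adv).backward()  # 0.1 is an illustrative weighting
    opt_student.step()
    return loss_kd.item(), loss_adv.item()
```

In the paper this student/discriminator game is applied block-wise at several intermediate stages of the networks rather than only at the output, which is what lets the single student absorb the diverse knowledge of the whole ensemble.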
