Deep geometric knowledge distillation with graphs

11/08/2019
by Carlos Lassance, et al.

In most cases, deep learning architectures are trained with no regard for the number of operations or the energy they consume. However, some applications, such as embedded systems, are resource-constrained at inference time. A popular approach to reducing the size of a deep learning architecture consists of distilling knowledge from a larger network (the teacher) into a smaller one (the student). Directly training the student to mimic the teacher's representation can be effective, but it requires that both share the same latent space dimensions. In this work, we focus instead on relative knowledge distillation (RKD), which considers the geometry of the respective latent spaces and therefore allows dimension-agnostic transfer of knowledge. Specifically, we introduce a graph-based RKD method in which graphs are used to capture the geometry of the latent spaces. Using classical computer vision benchmarks, we demonstrate the ability of the proposed method to efficiently distill knowledge from the teacher to the student, leading to better accuracy for the same budget compared to existing RKD alternatives.
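
To make the idea concrete, the following is a minimal sketch of a graph-based RKD loss in Python/PyTorch. It assumes a simple cosine-similarity graph built over each mini-batch and a mean-squared-error mismatch term; the paper's actual graph construction, normalization, and loss may differ, and all function and variable names here are illustrative only.

# Illustrative sketch (not the authors' exact formulation): compare row-normalized
# similarity graphs built from teacher and student embeddings of the same batch.
# Because only the (batch_size x batch_size) graphs are compared, the teacher and
# student latent dimensions are free to differ.
import torch
import torch.nn.functional as F

def similarity_graph(features):
    # features: (batch_size, dim) latent representations for one mini-batch.
    z = F.normalize(features, dim=1)        # unit-norm embeddings
    adjacency = z @ z.t()                   # cosine similarities, shape (B, B)
    return F.softmax(adjacency, dim=1)      # row-stochastic graph over the batch

def graph_rkd_loss(student_feats, teacher_feats):
    # Penalize discrepancies between the student graph and the (frozen) teacher graph.
    g_student = similarity_graph(student_feats)
    g_teacher = similarity_graph(teacher_feats).detach()
    return F.mse_loss(g_student, g_teacher)

# Hypothetical usage during training:
# total_loss = task_loss + lambda_rkd * graph_rkd_loss(student(x), teacher(x))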

Related research

10/09/2020 · Local Region Knowledge Distillation
Knowledge distillation (KD) is an effective technique to transfer knowle...

10/10/2020 · Structural Knowledge Distillation
Knowledge distillation is a critical technique to transfer knowledge bet...

12/06/2019 · LaTeS: Latent Space Distillation for Teacher-Student Driving Policy Learning
We describe a policy learning approach to map visual inputs to driving c...

11/14/2020 · Representing Deep Neural Networks Latent Space Geometries with Graphs
Deep Learning (DL) has attracted a lot of attention for its ability to r...

11/05/2021 · AUTOKD: Automatic Knowledge Distillation Into A Student Architecture Family
State-of-the-art results in deep learning have been improving steadily, ...

06/29/2022 · Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices
Modern search systems use several large ranker models with transformer a...

09/28/2020 · Kernel Based Progressive Distillation for Adder Neural Networks
Adder Neural Networks (ANNs) which only contain additions bring us a new...