Multi-teacher knowledge distillation as an effective method for compressing ensembles of neural networks

02/14/2023
by Konrad Zuchniak, et al.

Deep learning has contributed greatly to many recent successes in artificial intelligence. Today it is possible to train models with thousands of layers and hundreds of billions of parameters. Such large-scale deep models achieve excellent results, but their enormous computational complexity and storage requirements make them extremely difficult to deploy in real-time applications. On the other hand, dataset size remains a real problem in many domains: data are often missing, too expensive, or otherwise impossible to obtain. Ensemble learning partially addresses the problems of small datasets and overfitting, but in its basic form it entails a linear increase in computational complexity. We analyzed the impact of the ensemble decision-fusion mechanism and evaluated various methods of combining decisions, including voting algorithms. We used a modified knowledge distillation framework as a decision-fusion mechanism, which additionally compresses the entire ensemble into the weight space of a single model. We showed that knowledge distillation can aggregate knowledge from multiple teachers into a single student model and, at the same computational cost, yield a better-performing model than one trained in the standard manner. We developed our own method for mimicking the responses of all teachers simultaneously. We tested these solutions on several benchmark datasets. Finally, we presented practical applications of the efficient multi-teacher knowledge distillation framework: in the first example, we used knowledge distillation to develop models that automate corrosion detection on aircraft fuselage; the second example concerns smoke detection on observation cameras to counteract forest wildfires.
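As a rough illustration of how such a multi-teacher distillation objective can be set up, the sketch below combines a hard-label cross-entropy term with softened KL-divergence terms against each teacher in the ensemble. The temperature, the uniform averaging over teachers, and the alpha weighting are illustrative assumptions, not the authors' exact formulation.

# Minimal sketch of a multi-teacher distillation loss (illustrative only).
# Assumptions not taken from the paper: soft targets use a softmax
# temperature T, teacher terms are averaged uniformly, and the soft loss
# is mixed with ordinary cross-entropy via a weight alpha.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          temperature=4.0, alpha=0.5):
    """Hard-label loss plus an averaged soft-label loss against several teachers."""
    # Standard cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # KL divergence between softened student and teacher distributions,
    # averaged over all teachers in the ensemble.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_loss = 0.0
    for teacher_logits in teacher_logits_list:
        p_teacher = F.softmax(teacher_logits / temperature, dim=1)
        soft_loss = soft_loss + F.kl_div(log_p_student, p_teacher,
                                         reduction="batchmean")
    soft_loss = soft_loss / len(teacher_logits_list)

    # The T^2 factor keeps gradient magnitudes comparable across temperatures,
    # following the standard distillation formulation (Hinton et al., 2015).
    return alpha * hard_loss + (1.0 - alpha) * (temperature ** 2) * soft_loss

In use, the student would be trained on this loss while the frozen ensemble members supply teacher_logits_list for each batch; how the paper actually weights or fuses the teachers may differ from this uniform average.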


