Meta-Ensemble Parameter Learning

10/05/2022
by Zhengcong Fei, et al.

Ensembles of machine learning models yield improved performance as well as robustness. However, their memory requirements and inference costs can be prohibitively high. Knowledge distillation is an approach that allows a single model to efficiently capture the approximate performance of an ensemble, but it scales poorly because re-training is required whenever new teacher models are introduced. In this paper, we study whether a meta-learning strategy can be used to directly predict the parameters of a single model with performance comparable to that of an ensemble. To this end, we introduce WeightFormer, a Transformer-based model that predicts student network weights layer by layer in a single forward pass, conditioned on the teacher model parameters. The properties of WeightFormer are investigated on the CIFAR-10, CIFAR-100, and ImageNet datasets with the VGGNet-11, ResNet-50, and ViT-B/32 architectures, where our method achieves classification performance close to that of an ensemble and outperforms both a single network and standard knowledge distillation. More encouragingly, we show that WeightFormer can even exceed the average ensemble with minor fine-tuning. Importantly, our task, together with the model and results, can potentially lead to a new, more efficient, and scalable paradigm for learning the parameters of ensemble networks.
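
The abstract describes WeightFormer only at a high level: a Transformer that reads teacher parameters and emits a student network's weights layer by layer. The PyTorch sketch below is a rough illustration of that idea, not the paper's implementation; the class name, dimensions, mean-pooling readout, and treatment of each teacher's flattened layer weights as one token are all assumptions made here for clarity.

```python
import torch
import torch.nn as nn


class LayerWeightPredictor(nn.Module):
    """Hypothetical layer-wise weight predictor (illustrative only)."""

    def __init__(self, layer_dim: int, d_model: int = 256,
                 nhead: int = 4, num_layers: int = 2):
        super().__init__()
        # Embed each teacher's flattened layer weights as one token.
        self.embed = nn.Linear(layer_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Project the fused representation back to a flattened student layer.
        self.head = nn.Linear(d_model, layer_dim)

    def forward(self, teacher_weights: torch.Tensor) -> torch.Tensor:
        # teacher_weights: (batch, num_teachers, layer_dim)
        tokens = self.embed(teacher_weights)   # (batch, num_teachers, d_model)
        fused = self.encoder(tokens)           # self-attention across teachers
        pooled = fused.mean(dim=1)             # (batch, d_model)
        return self.head(pooled)               # predicted student layer weights


# Usage: predict one student layer from three teachers' flattened weights.
model = LayerWeightPredictor(layer_dim=1024)
teachers = torch.randn(1, 3, 1024)             # (batch, num_teachers, layer_dim)
student_layer = model(teachers)                # shape: (1, 1024)
```

Under this reading, one such forward pass per layer would produce the full student network, which matches the abstract's claim of predicting weights "layer by layer in a forward pass"; how the actual model handles layers of different shapes is not specified in the abstract.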

