CED: Consistent ensemble distillation for audio tagging

08/23/2023
by Heinrich Dinkel, et al.

Augmentation and knowledge distillation (KD) are well-established techniques in audio classification, employed to enhance performance and reduce model sizes on the widely recognized Audioset (AS) benchmark. Although both techniques are effective individually, their combined use, called consistent teaching, has not been explored before. This paper proposes CED, a simple training framework that distils student models from large teacher ensembles with consistent teaching. To achieve this, CED efficiently stores the teacher logits as well as the augmentation methods on disk, making it scalable to large-scale datasets. Central to CED's efficacy is its label-free nature: only the stored logits are used to optimize a student model, requiring just 0.3% additional disk space for AS. The study trains various transformer-based models, including a 10M parameter model achieving 49.0 mean average precision (mAP) on AS. Pretrained models and code are available at https://github.com/RicherMans/CED.
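The mechanism described in the abstract is straightforward to sketch: run the teacher ensemble once offline, saving for each sample the augmentation parameters (e.g., an RNG seed) alongside the resulting ensemble logits; at student-training time, replay the identical augmented view and distill against the stored logits, with no ground-truth labels involved. Below is a minimal PyTorch sketch of this idea. All names (augment, dump_teacher_logits, train_student) and the seed-per-sample scheme are illustrative assumptions, not the actual CED implementation; see the linked repository for that.

```python
import torch
import torch.nn.functional as F


def augment(waveform: torch.Tensor, seed: int) -> torch.Tensor:
    """Deterministic augmentation: re-seeding the RNG reproduces the exact
    view the teacher saw once the student is trained later. (Assumption:
    additive noise stands in for CED's actual augmentation pipeline.)"""
    g = torch.Generator().manual_seed(seed)
    return waveform + 0.01 * torch.randn(waveform.shape, generator=g)


def dump_teacher_logits(teachers, dataset, path="teacher_logits.pt"):
    """Offline pass: run the teacher ensemble once and persist, per sample,
    the augmentation seed and the averaged ensemble logits."""
    records = []
    for idx, waveform in enumerate(dataset):
        seed = idx  # simplification: one fixed augmentation per sample
        view = augment(waveform, seed)
        with torch.no_grad():
            # Ensemble = mean of the individual teacher logits.
            logits = torch.stack([t(view) for t in teachers]).mean(0)
        records.append({"seed": seed, "logits": logits})
    torch.save(records, path)


def train_student(student, dataset, path="teacher_logits.pt", epochs=1):
    """Label-free student training driven only by the stored logits."""
    records = torch.load(path)
    opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
    for _ in range(epochs):
        for waveform, rec in zip(dataset, records):
            # Replay the identical view the teacher ensemble scored.
            view = augment(waveform, rec["seed"])
            loss = F.binary_cross_entropy_with_logits(
                student(view), torch.sigmoid(rec["logits"])
            )
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Since AudioSet is multi-label, the sketch distills per-class sigmoid targets with binary cross-entropy rather than a softmax KL divergence; storing only a seed and one logits vector per sample is what keeps the on-disk overhead small relative to the raw audio.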
