Ensemble knowledge distillation of self-supervised speech models

02/24/2023
by Kuan-Po Huang, et al.

Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there has been little work on jointly distilling multiple self-supervised speech models. In this work, we performed Ensemble Knowledge Distillation (EKD) on several self-supervised speech models, including HuBERT, RobustHuBERT, and WavLM. We applied two aggregation techniques, layerwise-average and layerwise-concatenation, to the representations of the different teacher models and found the former to be more effective. On top of that, we proposed a multiple-prediction-head method that lets the student model predict the layer outputs of multiple teachers simultaneously. Experimental results show that our method improves the performance of the distilled models on four downstream speech processing tasks (Phoneme Recognition, Speaker Identification, Emotion Recognition, and Automatic Speech Recognition) in the hidden-set track of the SUPERB benchmark.
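
The sketch below illustrates how the two teacher-aggregation strategies and a multi-prediction-head student could be wired up in PyTorch. It is a minimal, hypothetical example: the module and function names (aggregate_teacher_layers, MultiHeadDistilStudent, ekd_loss), the choice of an L1-plus-cosine layerwise loss, and the tensor shapes are assumptions for illustration, not the paper's exact implementation.

```python
# Minimal sketch of ensemble knowledge distillation (EKD) with two
# teacher-aggregation strategies and a multi-head student.
# All names and the loss choice are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def aggregate_teacher_layers(teacher_hiddens, mode="average"):
    """Combine per-layer hidden states from several teachers.

    teacher_hiddens: list with one tensor per teacher, each shaped
                     (num_layers, batch, time, dim).
    """
    if mode == "average":
        # Layerwise-average: mean over teachers, target dim stays `dim`.
        return torch.stack(teacher_hiddens, dim=0).mean(dim=0)
    elif mode == "concat":
        # Layerwise-concatenation: target dim becomes num_teachers * dim.
        return torch.cat(teacher_hiddens, dim=-1)
    raise ValueError(f"unknown aggregation mode: {mode}")


class MultiHeadDistilStudent(nn.Module):
    """Small student encoder with one prediction head per distilled teacher layer."""

    def __init__(self, encoder, student_dim, target_dim, num_target_layers):
        super().__init__()
        self.encoder = encoder  # e.g. a shallow transformer student
        self.heads = nn.ModuleList(
            nn.Linear(student_dim, target_dim) for _ in range(num_target_layers)
        )

    def forward(self, wav):
        h = self.encoder(wav)                    # (batch, time, student_dim)
        return [head(h) for head in self.heads]  # one prediction per target layer


def ekd_loss(predictions, targets):
    """L1 + cosine layerwise distillation loss, summed over predicted layers
    (a common choice for layerwise distillation; the exact loss may differ)."""
    loss = 0.0
    for pred, tgt in zip(predictions, targets):
        loss = loss + F.l1_loss(pred, tgt)
        loss = loss - F.cosine_similarity(pred, tgt, dim=-1).mean()
    return loss
```

One practical difference between the two strategies is visible in the shapes: layerwise-average keeps the distillation target at the teachers' hidden dimension regardless of how many teachers are ensembled, whereas layerwise-concatenation multiplies the target dimension by the number of teachers and therefore enlarges every prediction head.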

