Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation

09/18/2023
by   Danilo de Oliveira, et al.

Much research effort is currently devoted to compressing the knowledge of self-supervised speech models, which are powerful but large and memory-consuming. In this work, we show that the original formulation of knowledge distillation (and its more recently proposed extension, decoupled knowledge distillation) can be applied to distilling HuBERT. In contrast to methods that focus on distilling internal features, this allows for more freedom in the network architecture of the compressed model. We therefore propose to distill HuBERT's Transformer layers into an LSTM-based student model that has even fewer parameters than DistilHuBERT while showing improved performance in automatic speech recognition.
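To make the loss concrete, the PyTorch sketch below shows one way decoupled knowledge distillation (DKD) can be formulated over HuBERT's pseudo-label clusters: the classical KD term is split into a target-class component (TCKD) and a non-target-class component (NCKD) that are weighted independently. The function name dkd_loss, the tensor shapes, and the hyperparameters alpha, beta, and T are illustrative assumptions, not the authors' implementation or settings.

import torch
import torch.nn.functional as F

def dkd_loss(student_logits, teacher_logits, target, alpha=1.0, beta=1.0, T=2.0):
    # Decoupled KD sketch (after Zhao et al., 2022), assumed shapes:
    #   student_logits, teacher_logits: (batch, num_clusters) frame-level logits
    #   target: (batch,) pseudo-label cluster indices (int64)
    num_classes = student_logits.size(-1)
    # One-hot mask selecting the target (pseudo-label) class per frame.
    t_mask = F.one_hot(target, num_classes).float()

    p_s = F.softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)

    # TCKD: KL divergence between binary (target vs. non-target) distributions.
    b_s = torch.stack([(p_s * t_mask).sum(-1), (p_s * (1 - t_mask)).sum(-1)], dim=-1)
    b_t = torch.stack([(p_t * t_mask).sum(-1), (p_t * (1 - t_mask)).sum(-1)], dim=-1)
    tckd = F.kl_div(b_s.clamp_min(1e-8).log(), b_t, reduction="batchmean") * T ** 2

    # NCKD: KL divergence over the non-target classes only
    # (the target logit is masked out before the softmax).
    log_q_s = F.log_softmax(student_logits / T - 1e9 * t_mask, dim=-1)
    q_t = F.softmax(teacher_logits / T - 1e9 * t_mask, dim=-1)
    nckd = F.kl_div(log_q_s, q_t, reduction="batchmean") * T ** 2

    return alpha * tckd + beta * nckd

Weighting the two components independently via alpha and beta is what distinguishes DKD from classical KD, where the non-target term is implicitly scaled down whenever the teacher is confident in the target class.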


