Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition

10/27/2022
by Yujin Wang, et al.

Recent years have witnessed great strides in self-supervised learning (SSL) for speech processing. An SSL model is normally pre-trained on a large amount of varied unlabelled data, and a large model size is preferred to increase modeling capacity. However, this can limit practical applications due to the computation and memory costs introduced by the oversized model. Miniaturizing SSL models has therefore become an important research direction of practical value. To this end, we explore the effective distillation of HuBERT-based SSL models for automatic speech recognition (ASR). First, in order to establish a strong baseline, a comprehensive study of different student model structures is conducted. On top of this, as a supplement to the regression loss widely adopted in previous works, a discriminative loss is introduced for HuBERT to enhance the distillation performance, especially in low-resource scenarios. In addition, we design a simple and effective algorithm to distill the front-end input from waveform to Fbank features, resulting in a 17% parameter reduction and a doubling of inference speed, with marginal performance degradation.
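
As an illustration of how a regression loss and a discriminative loss could be combined in such a distillation setup, the PyTorch sketch below pairs an L1-plus-cosine match on hidden states with a cross-entropy term over the teacher's pseudo-label clusters. The layer-matching scheme, the weights alpha and beta, and the use of HuBERT-style cluster targets for the discriminative term are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_hidden, teacher_hidden, student_logits, pseudo_labels,
                      alpha=1.0, beta=1.0):
    """Illustrative combined regression + discriminative distillation objective.

    student_hidden, teacher_hidden: (batch, time, dim) hidden states to match.
    student_logits: (batch, time, num_clusters) student scores over the teacher's
        discrete pseudo-label clusters (assumed discriminative target).
    pseudo_labels: (batch, time) cluster indices produced by the HuBERT teacher.
    alpha, beta: hypothetical weights balancing the two terms.
    """
    # Regression term: L1 distance plus (1 - cosine similarity) between
    # student and teacher hidden states, as in prior SSL distillation work.
    l1 = F.l1_loss(student_hidden, teacher_hidden)
    cos = 1.0 - F.cosine_similarity(student_hidden, teacher_hidden, dim=-1).mean()
    regression = l1 + cos

    # Discriminative term: cross-entropy against the teacher's cluster assignments.
    discriminative = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        pseudo_labels.reshape(-1),
    )
    return alpha * regression + beta * discriminative


# Toy usage with random tensors standing in for real model outputs.
B, T, D, K = 2, 50, 768, 500
loss = distillation_loss(
    student_hidden=torch.randn(B, T, D),
    teacher_hidden=torch.randn(B, T, D),
    student_logits=torch.randn(B, T, K),
    pseudo_labels=torch.randint(0, K, (B, T)),
)
print(f"combined distillation loss: {loss.item():.4f}")
```

Under this reading, the discriminative term keeps the student tied to the teacher's discrete targets rather than only its continuous representations, which is consistent with the abstract's claim that it helps most in low-resource scenarios.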


