RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness

02/18/2023
by   Heitor R. Guimarães, et al.

Self-supervised speech pre-training enables deep neural network models to capture meaningful and disentangled factors from raw waveform signals. The learned universal speech representations can then be used across numerous downstream tasks. These representations, however, are sensitive to distribution shifts caused by environmental factors, such as noise and/or room reverberation, and their large size makes them infeasible for edge applications. In this work, we propose a knowledge distillation methodology, termed RobustDistiller, which compresses universal representations while making them more robust against environmental artifacts via a multi-task learning objective. The proposed layer-wise distillation recipe is evaluated on top of three well-established universal representations and across three downstream tasks. Experimental results show that the proposed methodology, applied on top of the WavLM Base+ teacher model, outperforms all other benchmarks across noise types and levels, as well as reverberation times. Oftentimes, the student model (24M parameters) achieves results in line with those of the teacher model (95M parameters).
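As a rough illustration of the recipe described in the abstract, the sketch below combines a layer-wise distillation loss with an enhancement-style reconstruction term in PyTorch. The module names, the mask-based enhancement head, the GRU student encoder, and the loss weighting are all illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a RobustDistiller-style objective: the student sees
# distorted (noisy/reverberant) speech and is trained to (i) predict the
# teacher's layer-wise representations of the clean signal and (ii) reconstruct
# the clean features through a small enhancement head (the multi-task objective).
# All module and parameter names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StudentWithHeads(nn.Module):
    """Toy student: a shared encoder, one projection head per distilled
    teacher layer, and an enhancement head for the multi-task objective."""

    def __init__(self, feat_dim=80, hidden_dim=256, teacher_dim=768, n_teacher_layers=3):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        # One linear prediction head per distilled teacher layer.
        self.distill_heads = nn.ModuleList(
            nn.Linear(hidden_dim, teacher_dim) for _ in range(n_teacher_layers)
        )
        # Enhancement head: predicts a mask over the input features to
        # estimate the clean signal from the distorted one.
        self.enhance_head = nn.Sequential(nn.Linear(hidden_dim, feat_dim), nn.Sigmoid())

    def forward(self, noisy_feats):
        hidden, _ = self.encoder(noisy_feats)                      # (B, T, hidden_dim)
        layer_preds = [head(hidden) for head in self.distill_heads]
        clean_estimate = self.enhance_head(hidden) * noisy_feats   # masking
        return layer_preds, clean_estimate


def robust_distiller_loss(layer_preds, teacher_layers, clean_estimate, clean_feats,
                          distill_weight=1.0, enhance_weight=1.0):
    """Layer-wise distillation loss (L1 plus cosine, as in DistilHuBERT-style
    recipes) plus an enhancement reconstruction loss; the weighting is an assumption."""
    distill = 0.0
    for pred, target in zip(layer_preds, teacher_layers):
        distill = distill + F.l1_loss(pred, target)
        distill = distill - F.cosine_similarity(pred, target, dim=-1).mean()
    enhance = F.l1_loss(clean_estimate, clean_feats)
    return distill_weight * distill + enhance_weight * enhance


if __name__ == "__main__":
    B, T, feat_dim, teacher_dim = 2, 100, 80, 768
    student = StudentWithHeads(feat_dim=feat_dim, teacher_dim=teacher_dim)
    noisy = torch.randn(B, T, feat_dim)
    clean = torch.randn(B, T, feat_dim)
    # Stand-in for teacher hidden states computed on the clean input.
    teacher_layers = [torch.randn(B, T, teacher_dim) for _ in range(3)]
    preds, clean_est = student(noisy)
    loss = robust_distiller_loss(preds, teacher_layers, clean_est, clean)
    loss.backward()
    print(f"total loss: {loss.item():.4f}")
```

In this reading, robustness comes from the mismatch between the student's distorted input and the clean targets on both branches, while the per-layer heads give the usual layer-wise distillation compression.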

