Self-Supervised Learning for speech recognition with Intermediate layer supervision

12/16/2021
by   Chengyi Wang, et al.
0

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information. Since the network capacity is limited, we believe the speech recognition performance could be further improved if the model is dedicated to audio content information learning. To this end, we propose Intermediate Layer Supervision for Self-Supervised Learning (ILS-SSL), which forces the model to concentrate on content information as much as possible by adding an additional SSL loss on the intermediate layers. Experiments on LibriSpeech test-other set show that our method outperforms HuBERT significantly, which achieves a 23.5 setting for base/large models. Detailed analysis shows the bottom layers of our model have a better correlation with phonetic units, which is consistent with our intuition and explains the success of our method for ASR.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/07/2022

Progressive Multi-Scale Self-Supervised Learning for Speech Recognition

Self-supervised learning (SSL) models have achieved considerable improve...
research
05/18/2023

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

Self-supervised speech representation models have succeeded in various t...
research
10/26/2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Self-supervised learning (SSL) achieves great success in speech recognit...
research
01/25/2020

Multi-task self-supervised learning for Robust Speech Recognition

Despite the growing interest in unsupervised learning, extracting meanin...
research
11/01/2022

Avoid Overthinking in Self-Supervised Models for Speech Recognition

Self-supervised learning (SSL) models reshaped our approach to speech, l...
research
06/22/2022

A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement

Speech enhancement has seen great improvement in recent years using end-...
research
10/05/2016

Understanding intermediate layers using linear classifier probes

Neural network models have a reputation for being black boxes. We propos...

Please sign up or login with your details

Forgot password? Click here to reset