Speech separation with large-scale self-supervised learning

11/09/2022
by Zhuo Chen, et al.

Self-supervised learning (SSL) methods such as WavLM have shown promising speech separation (SS) results in small-scale, simulation-based experiments. In this work, we extend the exploration of SSL-based SS by massively scaling up both the pre-training data (more than 300K hours) and the fine-tuning data (10K hours). We also investigate various techniques to efficiently integrate the pre-trained model with the SS network under a limited computation budget. These include a low-frame-rate SSL model training setup and a fine-tuning scheme that uses only part of the pre-trained model. We compare against a supervised baseline and against a WavLM-based SS model that uses feature embeddings from the previously released WavLM trained on 94K hours. On a simulated far-field speech mixture test set, our proposed model obtains 15.9 … For conversation transcription on real meeting recordings using continuous speech separation, the proposed model achieves 6.8 … supervised baseline on the AMI and ICSI evaluation sets, respectively, while reducing the computational cost by 38 …
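The abstract attributes the efficiency gains to two choices: a lower frame rate for the SSL model, and running only part of the pre-trained model. A rough Transformer cost model shows why each choice helps. Projection and feed-forward costs grow linearly with the number of frames T. Self-attention grows with T². The total cost scales with the number of layers that are kept.

The Python sketch below is illustrative only, and several of its inputs are assumptions rather than the paper's setup:

* the cost formula itself;
* a 24-layer, 1024-dim, 50 Hz "full" encoder, chosen to be roughly the size of WavLM Large;
* a hypothetical 12-layer, 25 Hz "light" variant.

The sketch also ignores the CNN front end and the SS network itself.

```python
from dataclasses import dataclass


@dataclass
class EncoderConfig:
    """Hypothetical encoder description used only for this cost estimate."""
    num_layers: int       # Transformer layers actually executed
    dim: int              # model width d
    frame_rate_hz: float  # frames per second fed to the Transformer


def encoder_flops(cfg: EncoderConfig, seconds: float, ffn_mult: int = 4) -> int:
    """Approximate multiply-accumulates for a standard Transformer encoder.

    Assumed per-layer cost (biases, norms, softmax ignored):
      4 * T * d^2           Q, K, V and output projections
      2 * T^2 * d           attention scores and weighted sum of values
      2 * ffn_mult * T * d^2 two feed-forward matrices
    """
    t = int(seconds * cfg.frame_rate_hz)  # sequence length in frames
    d = cfg.dim
    per_layer = 4 * t * d * d + 2 * t * t * d + 2 * ffn_mult * t * d * d
    return cfg.num_layers * per_layer


def relative_reduction(baseline: float, new: float) -> float:
    """(baseline - new) / baseline; works for compute cost or for WER."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Both configurations are illustrative assumptions, not the paper's models.
    full = EncoderConfig(num_layers=24, dim=1024, frame_rate_hz=50.0)
    light = EncoderConfig(num_layers=12, dim=1024, frame_rate_hz=25.0)
    r = relative_reduction(encoder_flops(full, 10.0), encoder_flops(light, 10.0))
    print(f"estimated encoder cost reduction on a 10 s segment: {r:.1%}")
```

The same `relative_reduction` helper applies to the relative WER reductions the abstract reports, which are computed as (WER_baseline − WER_new) / WER_baseline. Because attention is quadratic in T, halving the frame rate cuts the attention term to a quarter. Dropping layers, by contrast, reduces every term linearly.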
