Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning

07/19/2021
by   Rohith Aralikatti, et al.

We present a novel approach that improves the performance of reverberant speech separation. Our approach is based on an accurate geometric acoustic simulator (GAS) that generates realistic room impulse responses (RIRs) by modeling both specular and diffuse reflections. We also propose three training methods (pre-training, multi-stage training, and curriculum learning) that significantly improve separation quality in the presence of reverberation. We further demonstrate that mixing the synthetic RIRs with a small number of real RIRs during training enhances separation performance. We evaluate our approach on reverberant mixtures generated from real recorded data (in several different room configurations) from the VOiCES dataset. Our approach (curriculum learning + pre-training + multi-stage training) yields a significant relative improvement over prior techniques based on the image source method (ISM).
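
As a rough illustration of the data side of the abstract, the sketch below shows one common way a reverberant two-speaker mixture can be built (convolve each dry source with an RIR, then sum) and how a training pool of synthetic RIRs might be mixed with a small share of real RIRs. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the `real_fraction` parameter, and the per-example sampling rule are all hypothetical, and the signals are plain Python lists rather than audio loaded from VOiCES.

```python
import random


def convolve(signal, rir):
    """Full linear convolution of a dry signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def sample_rir(synthetic_rirs, real_rirs, real_fraction, rng):
    """Pick a real RIR with probability real_fraction, otherwise a synthetic one.

    real_fraction is a hypothetical knob; the abstract only says that a
    small number of real RIRs is mixed in during training.
    """
    use_real = bool(real_rirs) and rng.random() < real_fraction
    return rng.choice(real_rirs if use_real else synthetic_rirs)


def reverberant_mixture(speech_a, speech_b, rir_a, rir_b):
    """Convolve each dry source with its RIR and sum them into one mixture.

    Returns the mixture plus the two reverberant sources, zero-padded to a
    common length.
    """
    rev_a = convolve(speech_a, rir_a)
    rev_b = convolve(speech_b, rir_b)
    n = max(len(rev_a), len(rev_b))
    rev_a = rev_a + [0.0] * (n - len(rev_a))
    rev_b = rev_b + [0.0] * (n - len(rev_b))
    mixture = [a + b for a, b in zip(rev_a, rev_b)]
    return mixture, rev_a, rev_b


if __name__ == "__main__":
    rng = random.Random(0)
    synthetic = [[1.0, 0.5], [1.0, 0.3, 0.1]]  # toy stand-ins for simulated RIRs
    real = [[1.0, 0.4, 0.2]]                   # toy stand-in for a measured RIR
    rir_a = sample_rir(synthetic, real, real_fraction=0.1, rng=rng)
    rir_b = sample_rir(synthetic, real, real_fraction=0.1, rng=rng)
    mix, _, _ = reverberant_mixture([1.0, 2.0], [0.5, -1.0], rir_a, rir_b)
    print(len(mix))
```

In practice the convolution would be done with an FFT-based routine on sampled audio; the nested loop here only keeps the example dependency-free.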

