Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network

09/16/2019
by Ke Tan, et al.

Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments. In this study, we address joint speech separation and dereverberation, which aims to recover target speech from a mixture corrupted by background noise, interfering speech and room reverberation. To tackle this fundamentally difficult problem, we propose a novel multimodal network that exploits both audio and visual signals. The proposed network architecture adopts a two-stage strategy: a separation module attenuates background noise and interfering speech in the first stage, and a dereverberation module suppresses room reverberation in the second stage. The two modules are first trained separately and then integrated for joint training, which is based on a new multi-objective loss function. Our experimental results show that the proposed multimodal network yields consistently better objective intelligibility and perceptual quality than several one-stage and two-stage baselines. We find that our network achieves a 21.10 mixtures. Moreover, our network architecture does not require knowledge of the number of speakers.
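The two-stage strategy and joint training objective described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not specify the form of the multi-objective loss, so the sketch assumes a weighted sum of one mean-squared-error term per stage, and the names `two_stage_forward`, `multi_objective_loss`, and `alpha`, as well as the toy stand-in modules, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def mse(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def two_stage_forward(
    mixture: Signal,
    visual: Signal,
    separate: Callable[[Signal, Signal], Signal],
    dereverberate: Callable[[Signal], Signal],
) -> Tuple[Signal, Signal]:
    """Run stage 1 (separation) and feed its output to stage 2 (dereverberation).

    Assumption: only the separation stage consumes the visual stream here;
    the abstract does not say which stage(s) use it.
    """
    reverberant_estimate = separate(mixture, visual)
    anechoic_estimate = dereverberate(reverberant_estimate)
    return reverberant_estimate, anechoic_estimate


def multi_objective_loss(
    stage1_out: Signal,
    stage2_out: Signal,
    reverberant_target: Signal,
    anechoic_target: Signal,
    alpha: float = 0.5,
) -> float:
    """Hypothetical joint loss: weighted sum of the two per-stage errors."""
    stage1_loss = mse(stage1_out, reverberant_target)
    stage2_loss = mse(stage2_out, anechoic_target)
    return alpha * stage1_loss + (1.0 - alpha) * stage2_loss


# Toy stand-ins for the two trained modules (placeholders, not real models).
def toy_separate(mixture: Signal, visual: Signal) -> Signal:
    # Treat the visual stream as a per-sample gain, purely for illustration.
    return [m * v for m, v in zip(mixture, visual)]


def toy_dereverberate(signal: Signal) -> Signal:
    # Placeholder second stage: a fixed attenuation.
    return [0.5 * s for s in signal]


stage1, stage2 = two_stage_forward(
    [1.0, 2.0, 3.0], [1.0, 0.5, 0.0], toy_separate, toy_dereverberate
)
loss = multi_objective_loss(stage1, stage2, [1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
```

Supervising the intermediate output as well as the final one keeps the first stage focused on separation during joint training, which is one plausible reason to combine the objectives; the actual weighting and loss terms used in the paper may differ.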


research
04/14/2020

Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment

In daily listening environments, speech is always distorted by backgroun...
research
03/02/2021

Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss

We present an audio-visual speech separation learning method that consid...
research
03/07/2023

A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments

In noisy and reverberant environments, the performance of deep learning-...
research
05/20/2020

SADDEL: Joint Speech Separation and Denoising Model based on Multitask Learning

Speech data collected in real-world scenarios often encounters two issue...
research
07/19/2021

Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning

We present a novel approach that improves the performance of reverberant...
research
02/11/2019

Learning to Authenticate with Deep Multibiometric Hashing and Neural Network Decoding

In this paper, we propose a novel three-stage multimodal deep hashing ne...
research
03/14/2023

Towards Real-Time Single-Channel Speech Separation in Noisy and Reverberant Environments

Real-time single-channel speech separation aims to unmix an audio stream...
