Joint Separation and Denoising of Noisy Multi-talker Speech using Recurrent Neural Networks and Permutation Invariant Training

08/31/2017
by Morten Kolbæk et al.

In this paper we propose to use utterance-level Permutation Invariant Training (uPIT) for simultaneous speaker-independent multi-talker speech separation and denoising. Specifically, we train deep bi-directional Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) using uPIT for single-channel, speaker-independent multi-talker speech separation in multiple noisy conditions, including both synthetic and real-life noise signals. Our experiments focus on the generalizability and noise robustness of models that rely on various types of a priori knowledge, e.g., the noise type and the number of simultaneous speakers. We show that deep bi-directional LSTM RNNs trained using uPIT in noisy environments can improve the Signal-to-Distortion Ratio (SDR) as well as the Extended Short-Time Objective Intelligibility (ESTOI) measure on the speaker-independent multi-talker speech separation and denoising task, for various noise types and Signal-to-Noise Ratios (SNRs). Specifically, we first show that LSTM RNNs achieve large SDR and ESTOI improvements when evaluated on known noise types, and that a single model can handle multiple noise types with only a slight decrease in performance. Furthermore, we show that a single LSTM RNN can handle both two-speaker and three-speaker noisy mixtures without a priori knowledge of the exact number of speakers. Finally, we show that LSTM RNNs trained using uPIT generalize well to noise types not seen during training.
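The core idea of uPIT is to evaluate the separation loss for every possible assignment of network output streams to target speakers, keep that assignment fixed over the entire utterance, and train on the minimum. A minimal sketch of this loss (the function name, array shapes, and the use of a plain mean-squared error on magnitude spectrograms are illustrative assumptions, not the authors' implementation):

```python
# Sketch of the utterance-level PIT (uPIT) loss. Assumes estimates and
# targets are arrays of shape (S, T, F): S speakers, T time frames,
# F frequency bins. Names are illustrative, not from the paper's code.
import itertools
import numpy as np

def upit_loss(estimates, targets):
    """Return the minimum utterance-level MSE over all speaker
    permutations, together with the best permutation."""
    num_speakers = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(num_speakers)):
        # One assignment of output streams to target speakers,
        # held fixed for the whole utterance (not per frame).
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Because the permutation is resolved once per utterance rather than per frame, the network is encouraged to keep each speaker on the same output stream throughout the signal, which is what makes the approach viable for speaker-independent separation.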


Related research

- Guided Training: A Simple Method for Single-channel Speaker Separation (03/26/2021)
  Deep learning has shown a great potential for speech separation, especia...

- Time-Frequency Mask Aware Bi-directional LSTM: A Deep Learning Approach for Underwater Acoustic Signal Separation (02/09/2022)
  The underwater acoustic signals separation is a key technique for the un...

- Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation (12/25/2019)
  Utterance-level permutation invariant training (uPIT) has achieved promi...

- Supervised Initialization of LSTM Networks for Fundamental Frequency Detection in Noisy Speech Signals (11/11/2019)
  Fundamental frequency is one of the most important parameters of human s...

- A Deep Recurrent Framework for Cleaning Motion Capture Data (12/09/2017)
  We present a deep, bidirectional, recurrent framework for cleaning noisy...

- Toward the pre-cocktail party problem with TasTas+ (09/07/2020)
  Deep neural network with dual-path bi-directional long short-term memory...

- Single-channel speech separation using Soft-minimum Permutation Invariant Training (11/16/2021)
  The goal of speech separation is to extract multiple speech sources from...
