Guided Training: A Simple Method for Single-channel Speaker Separation

03/26/2021
by   Hao Li, et al.
0

Deep learning has shown a great potential for speech separation, especially for speech and non-speech separation. However, it encounters permutation problem for multi-speaker separation where both target and interference are speech. Permutation Invariant training (PIT) was proposed to solve this problem by permuting the order of the multiple speakers. Another way is to use an anchor speech, a short speech of the target speaker, to model the speaker identity. In this paper, we propose a simple strategy to train a long short-term memory (LSTM) model to solve the permutation problem in speaker separation. Specifically, we insert a short speech of target speaker at the beginning of a mixture as guide information. So, the first appearing speaker is defined as the target. Due to the powerful capability on sequence modeling, LSTM can use its memory cells to track and separate target speech from interfering speech. Experimental results show that the proposed training strategy is effective for speaker separation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/25/2019

Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation

Utterance-level permutation invariant training (uPIT) has achieved promi...
research
10/20/2020

Speaker Separation Using Speaker Inventories and Estimated Speech

We propose speaker separation using speaker inventories and estimated sp...
research
08/31/2017

Joint Separation and Denoising of Noisy Multi-talker Speech using Recurrent Neural Networks and Permutation Invariant Training

In this paper we propose to use utterance-level Permutation Invariant Tr...
research
07/01/2016

Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

We propose a novel deep learning model, which supports permutation invar...
research
07/23/2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

Deep clustering (DC) and utterance-level permutation invariant training ...
research
10/20/2021

Time-Domain Mapping Based Single-Channel Speech Separation With Hierarchical Constraint Training

Single-channel speech separation is required for multi-speaker speech re...
research
10/27/2021

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Multi-talker conversational speech processing has drawn many interests f...

Please sign up or login with your details

Forgot password? Click here to reset