Speaker-independent Speech Separation with Deep Attractor Network

07/12/2017
by Yi Luo, et al.

Despite the recent success of deep learning for many speech processing tasks, single-microphone, speaker-independent speech separation remains challenging for two main reasons. The first is the arbitrary order of the target and masker speakers in the mixture (the permutation problem), and the second is the unknown number of speakers in the mixture (the output dimension problem). We propose a novel deep learning framework for speech separation that addresses both of these issues. We use a neural network to project the time-frequency representation of the mixture signal into a high-dimensional embedding space. A reference point (attractor), defined as the centroid of a speaker's time-frequency embeddings, is created in the embedding space to represent each speaker. The time-frequency embeddings of each speaker are then forced to cluster around the corresponding attractor point, which is used to determine that speaker's time-frequency assignment. We propose three methods for finding the attractors for each source in the embedding space and compare their advantages and limitations. The objective function for the network is the standard signal reconstruction error, which enables end-to-end operation during both training and test phases. We evaluated our system on two- and three-speaker mixtures from the Wall Street Journal dataset (WSJ0) and report comparable or better performance than other state-of-the-art deep learning methods for speech separation.
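The attractor mechanism the abstract describes can be illustrated with a small sketch. This is not the authors' implementation; it assumes the network has already produced an embedding `V` for each time-frequency bin and that, during training, a one-hot speaker-dominance label `Y` is available from the ideal binary mask. Attractors are formed as label-weighted centroids of the embeddings, and soft masks follow from embedding-attractor similarity:

```python
import numpy as np

def attractor_masks(V, Y):
    """Sketch of the attractor step (hypothetical helper, not the paper's code).

    V: (TF, K) time-frequency embeddings from the network.
    Y: (TF, C) one-hot labels marking which of C speakers dominates each bin.
    Returns (TF, C) soft masks that sum to 1 across speakers per bin.
    """
    # Attractor for each speaker: centroid of the embeddings it dominates.
    A = (Y.T @ V) / (Y.sum(axis=0, keepdims=True).T + 1e-8)   # (C, K)
    # Similarity between every embedding and every attractor.
    logits = V @ A.T                                           # (TF, C)
    # Softmax over speakers yields the mask for each T-F bin.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy example: 6 T-F bins, 3-dim embeddings, 2 speakers.
rng = np.random.default_rng(0)
V = rng.standard_normal((6, 3))
Y = np.eye(2)[[0, 0, 1, 1, 0, 1]]   # speaker dominance per bin
M = attractor_masks(V, Y)
print(M.shape)                       # (6, 2)
```

At test time the true labels `Y` are unavailable, which is why the paper compares several strategies for estimating the attractors; the centroid computation above corresponds to the training-time case.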


