Cracking the cocktail party problem by multi-beam deep attractor network

03/29/2018
by Zhuo Chen, et al.

Although recent neural network approaches to single-channel speech separation, or more generally the cocktail party problem, have achieved significant improvements, their performance on complex mixtures is still unsatisfactory. In this work, we propose a novel multi-channel framework for multi-talker separation. In the proposed model, an input multi-channel mixture signal is first converted to a set of beamformed signals using fixed beam patterns. For this beamforming, we propose to use differential beamformers, as they are more suitable for speech separation. Each beamformed signal is then fed into a single-channel anchored deep attractor network to generate separated signals. The final separation is obtained by post-selecting the separated outputs of each beam. To evaluate the proposed system, we create a challenging dataset comprising mixtures of 2, 3 or 4 speakers. Our results show that the proposed system substantially improves on the state of the art in speech separation, achieving 11.5 dB, 11.76 dB and 11.02 dB average signal-to-distortion ratio improvement for mixtures of 4, 3 and 2 overlapped speakers, respectively, which is comparable to the performance of a minimum variance distortionless response beamformer that uses oracle location, source, and noise information. We also run speech recognition with a clean-trained acoustic model on the separated speech, achieving relative word error rate (WER) reductions of 45.76%, 59.40% and 62.80% on fully overlapped speech of 4, 3 and 2 speakers, respectively. With a far-talk acoustic model, the WER is further reduced.
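
The two figures of merit quoted above, signal-to-distortion ratio (SDR) improvement and relative WER reduction, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the SDR here is a simplified energy-ratio definition (an assumption; published results typically rely on a full BSS-Eval style decomposition), the function names are hypothetical, and the signals are synthetic values chosen only for illustration.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over residual energy.

    Assumption: plain energy ratio, not a BSS-Eval style decomposition.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    residual = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(residual ** 2))

def sdr_improvement_db(reference, mixture, estimate):
    """SDR gain of the separated estimate over the unprocessed mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)

def relative_wer_reduction(wer_baseline, wer_system):
    """Relative WER reduction in percent, with the baseline as denominator."""
    return 100.0 * (wer_baseline - wer_system) / wer_baseline

# Synthetic illustration only; none of these values come from the paper.
reference = np.array([1.0, 0.0, -1.0, 0.0])   # target speaker
mixture = reference + 0.5                     # target plus strong interference
estimate = reference + 0.05                   # separated output, small residual

print(sdr_improvement_db(reference, mixture, estimate))
print(relative_wer_reduction(50.0, 25.0))
```

In this toy case the residual energy falls by a factor of 100, so the improvement is about 20 dB, and halving the WER corresponds to a 50% relative reduction. The percentages in the abstract are relative reductions of this kind, not absolute WER values.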

