Real-time binaural speech separation with preserved spatial cues

02/16/2020
by Cong Han, et al.

Deep learning speech separation algorithms have achieved great success in improving the quality and intelligibility of speech separated from mixed audio. Most previous methods focused on generating a single-channel output for each target speaker, hence discarding the spatial cues needed to localize sound sources in space. However, preserving spatial information is important in many applications that aim to accurately render the acoustic scene, such as hearing aids and augmented reality (AR). Here, we propose a speech separation algorithm that preserves the interaural cues of separated sound sources and can be implemented with low latency and high fidelity, enabling real-time modification of the acoustic scene. Building on the time-domain audio separation network (TasNet), a single-channel time-domain speech separation system that can run in real time, we propose a multi-input-multi-output (MIMO) end-to-end extension of TasNet that takes binaural mixed audio as input and simultaneously separates the target speakers in both channels. Experimental results show that the proposed end-to-end MIMO system significantly improves separation performance while keeping the perceived location of the modified sources intact in various acoustic scenes.
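The abstract describes an encoder/mask/decoder pipeline extended to two channels, where each separated speaker retains a binaural (two-channel) output. The following is a minimal numpy sketch of that structure only. All weights are untrained random placeholders, the hyperparameters are illustrative rather than taken from the paper, and the sigmoid "masks" merely stand in for the trained MIMO separator network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters (not from the paper)
L = 16        # encoder window length (samples)
N = 64        # number of encoder basis filters
C = 2         # number of speakers to separate
T = 1600      # samples per channel (0.1 s at 16 kHz)

# Binaural mixture: 2 channels of audio
mixture = rng.standard_normal((2, T))

def frame(x, L):
    """Split a signal into 50%-overlapping frames of length L."""
    hop = L // 2
    n = (len(x) - L) // hop + 1
    return np.stack([x[i * hop : i * hop + L] for i in range(n)])  # (F, L)

def overlap_add(frames, L, T):
    """Reconstruct a signal from 50%-overlapping frames."""
    hop = L // 2
    out = np.zeros(T)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + L] += f
    return out / 2.0  # interior samples are summed twice at 50% overlap

# Untrained placeholder bases; a real system learns these end-to-end.
encoder = rng.standard_normal((L, N)) / np.sqrt(L)   # analysis basis
decoder = rng.standard_normal((N, L)) / np.sqrt(N)   # synthesis basis

# 1) Encode both channels with the shared learned basis.
feats = [frame(ch, L) @ encoder for ch in mixture]          # 2 x (F, N)

# 2) A MIMO separator conditions on both channels jointly; here a
#    random linear map plus sigmoid stands in for the trained network.
joint = np.concatenate(feats, axis=-1)                      # (F, 2N)
mask_net = rng.standard_normal((2 * N, C * N)) * 0.1        # placeholder
masks = 1.0 / (1.0 + np.exp(-(joint @ mask_net)))           # (F, C*N)
masks = masks.reshape(-1, C, N)

# 3) Apply each speaker's mask to each channel and decode, so every
#    speaker gets its own two-channel (binaural) output, preserving
#    the per-channel cues the paper aims to keep intact.
outputs = np.stack([
    np.stack([overlap_add((feats[ch] * masks[:, spk]) @ decoder, L, T)
              for ch in range(2)])
    for spk in range(C)
])                                                          # (C, 2, T)
print(outputs.shape)
```

The key design point this sketch illustrates is the output shape: C speakers by 2 channels by T samples, rather than the single channel per speaker produced by the original TasNet.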


Related research

- 09/20/2018 · TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation
  Robust speech processing in multitalker acoustic environments requires a...

- 09/02/2020 · SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation
  Most existing deep learning based binaural speaker separation systems fo...

- 10/28/2022 · UX-NET: Filter-and-Process-based Improved U-Net for Real-time Time-domain Audio Separation
  This study presents UX-Net, a time-domain audio separation network (TasN...

- 08/15/2022 · LCSM: A Lightweight Complex Spectral Mapping Framework for Stereophonic Acoustic Echo Cancellation
  The traditional adaptive algorithms will face the non-uniqueness problem...

- 03/21/2023 · End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
  Recent works show that speech separation guided diarization (SSGD) is an...

- 12/09/2019 · MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing
  Deep learning methods have brought substantial advancements in speech se...

- 10/08/2018 · Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks
  The goal of this work is to develop a meeting transcription system that ...
