WHAM!: Extending Speech Separation to Noisy Environments

07/02/2019
by   Gordon Wichern, et al.
0

Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/14/2023

Towards Real-Time Single-Channel Speech Separation in Noisy and Reverberant Environments

Real-time single-channel speech separation aims to unmix an audio stream...
research
04/10/2018

Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation

We present a joint audio-visual model for isolating a single speech sign...
research
10/22/2019

WHAMR!: Noisy and Reverberant Single-Channel Speech Separation

While significant advances have been made in recent years in the separat...
research
10/23/2020

Training Noisy Single-Channel Speech Separation With Noisy Oracle Sources: A Large Gap and A Small Step

As the performance of single-channel speech separation systems has impro...
research
04/14/2020

Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment

In daily listening environments, speech is always distorted by backgroun...
research
06/22/2021

Multi-accent Speech Separation with One Shot Learning

Speech separation is a problem in the field of speech processing that ha...
research
05/11/2020

Foreground-Background Ambient Sound Scene Separation

Ambient sound scenes typically comprise multiple short events occurring ...

Please sign up or login with your details

Forgot password? Click here to reset