Deep Speech Denoising with Vector Space Projections

04/27/2018
by Jeff Hetherly et al.

We propose an algorithm to denoise speakers from a single microphone in the presence of non-stationary and dynamic noise. Our approach is inspired by the recent success of neural network models that separate speakers from other speakers and singers from instrumental accompaniment. Unlike prior art, we leverage embedding spaces produced with source-contrastive estimation, a technique derived from negative sampling techniques in natural language processing, while simultaneously obtaining a continuous inference mask. Our embedding space directly optimizes for the discrimination of speaker and noise by jointly modeling their characteristics. This space is generalizable in that it is not speaker- or noise-specific and is capable of denoising speech even if the model has not seen the speaker in the training set. Parameters are trained with dual objectives: one that promotes a selective bandpass filter that eliminates noise at time-frequency positions where noise power exceeds signal power, and another that proportionally splits time-frequency content between signal and noise. We compare against state-of-the-art algorithms as well as traditional sparse non-negative matrix factorization solutions. The resulting algorithm avoids severe computational burden by providing a more intuitive and easily optimized approach, while achieving competitive accuracy.
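The two training objectives described above correspond to two classic time-frequency mask formulations: a binary mask that keeps a bin only when signal power dominates, and a continuous ratio mask that splits each bin proportionally. The sketch below is not the paper's network; it only illustrates, on toy magnitude spectrograms (with signal and noise known purely for demonstration), what each mask type computes.

```python
import numpy as np

# Toy magnitude spectrograms: frequency bins x time frames.
# In the paper's setting these would come from STFTs of speech and noise;
# here they are random, purely to illustrate the mask definitions.
rng = np.random.default_rng(0)
S = rng.random((257, 100))  # |STFT| of "speech" (illustrative)
N = rng.random((257, 100))  # |STFT| of "noise" (illustrative)

# Binary mask: keep a time-frequency bin when signal power exceeds
# noise power -- the "selective bandpass filter" objective.
binary_mask = (S**2 > N**2).astype(float)

# Ratio mask: split each bin's energy proportionally between signal
# and noise -- the continuous, proportional objective.
ratio_mask = S**2 / (S**2 + N**2 + 1e-8)

# Applying either mask to the mixture spectrogram yields a denoised
# magnitude estimate (phase handling omitted in this sketch).
mixture = S + N
denoised_binary = binary_mask * mixture
denoised_ratio = ratio_mask * mixture
```

In practice a network predicts these masks from the mixture alone; at inference the clean signal is recovered by applying the predicted mask to the mixture STFT and inverting the transform.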
