Deep Network Perceptual Losses for Speech Denoising

11/21/2020
by Mark R. Saddler, et al.

Contemporary speech enhancement predominantly relies on audio transforms that are trained to reconstruct a clean speech waveform. Here we investigate whether deep feature representations learned for audio classification tasks can be used to improve denoising. We first trained deep neural networks to classify either spoken words or environmental sounds from audio. We then trained an audio transform to map noisy speech to an audio waveform that minimized 'perceptual' losses derived from the recognition network. When the transform was trained to minimize the difference in the deep feature representations between the output audio and the corresponding clean audio, it removed noise substantially better than baseline methods trained to reconstruct clean waveforms. The learned deep features were essential for this improvement, as features from untrained networks with random weights did not provide the same benefit. The results suggest the use of deep features as perceptual metrics to guide speech enhancement.
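The deep feature loss described above can be illustrated with a minimal sketch. The abstract does not give the network architecture, the distance measure, or the layer weighting, so everything here is an assumption made for illustration: the toy "recognition network" is two fixed 1-D convolution layers with ReLU (`KERNELS` is a hypothetical stand-in for the weights of a trained classifier), the distance is the mean absolute difference, and all layers are weighted equally. In the setting the abstract describes, the features would come from a network trained on word or environmental-sound recognition, and the loss would be minimized with respect to the parameters of the audio transform.

```python
import math
import random

# Hypothetical fixed kernels standing in for a trained recognition network.
# Per the abstract, learned (not random) weights are what make the loss useful.
KERNELS = [[0.5, 1.0, 0.5], [1.0, -1.0]]


def conv1d_relu(signal, kernel):
    """Valid 1-D cross-correlation followed by a ReLU nonlinearity."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
        for i in range(len(signal) - k + 1)
    ]


def deep_features(waveform, kernels):
    """Return the activations of every layer of the toy feature network."""
    features, x = [], waveform
    for kernel in kernels:
        x = conv1d_relu(x, kernel)
        features.append(x)
    return features


def deep_feature_loss(output, clean, kernels):
    """Sum over layers of the mean absolute difference between activations
    (assumed distance; the abstract does not specify one)."""
    loss = 0.0
    for f_out, f_clean in zip(deep_features(output, kernels),
                              deep_features(clean, kernels)):
        loss += sum(abs(a - b) for a, b in zip(f_out, f_clean)) / len(f_out)
    return loss


# Toy data: a sine wave as "clean speech" and a noisy copy of it.
random.seed(0)
clean = [math.sin(2 * math.pi * n / 16) for n in range(64)]
noisy = [s + random.gauss(0.0, 0.3) for s in clean]

loss_identical = deep_feature_loss(clean, clean, KERNELS)
loss_noisy = deep_feature_loss(noisy, clean, KERNELS)
```

Here `loss_identical` is zero because both inputs produce the same activations, while `loss_noisy` is positive. A denoising transform trained under this kind of objective would be pushed to make its output's activations match those of the clean signal, rather than matching the waveform sample by sample.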


research
09/14/2023

AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

Speech enhancement systems are typically trained using pairs of clean an...
research
06/27/2018

Speech Denoising with Deep Feature Losses

We present an end-to-end deep learning approach to denoising speech sign...
research
11/20/2022

LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

Audio-visual speech enhancement aims to extract clean speech from a nois...
research
09/06/2018

Cycle-Consistent Speech Enhancement

Feature mapping using deep neural networks is an effective approach for ...
research
02/19/2021

Speech enhancement with weakly labelled data from AudioSet

Speech enhancement is a task to improve the intelligibility and perceptu...
research
11/08/2020

Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation

This paper presents a denoising and dereverberation hierarchical neural ...
research
04/16/2019

Audio Denoising with Deep Network Priors

We present a method for audio denoising that combines processing done in...
