Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement

04/17/2021
by   Haoyu Li, et al.

The intelligibility of speech degrades severely in the presence of environmental noise and reverberation. In this paper, we propose a novel deep-learning-based system that modifies the speech signal to increase its intelligibility under the equal-power constraint, i.e., the signal power before and after modification must be the same. To achieve this, we use generative adversarial networks (GANs) to obtain time-frequency dependent amplification factors, which are then applied to the input raw speech to reallocate the speech energy. Instead of optimizing only a single, simple metric, we train a deep neural network (DNN) model to simultaneously optimize multiple advanced speech metrics, including both intelligibility- and quality-related ones, which results in notable improvements in performance and robustness. Our system works in non-real-time mode for offline audio playback and also supports practical real-time speech applications. Experimental results using both objective measurements and subjective listening tests indicate that the proposed system significantly outperforms state-of-the-art baseline systems under various noisy and reverberant listening conditions.
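
The equal-power constraint described in the abstract can be illustrated with a minimal sketch. It assumes the amplification factors act on a magnitude spectrogram and that the constraint is enforced by one global rescale over the whole utterance; the paper may enforce it differently. The function name `apply_gains_equal_power` and the random `gains` array are hypothetical stand-ins, not the authors' code or the output of their generator.

```python
import numpy as np


def apply_gains_equal_power(spec, gains):
    """Scale a magnitude spectrogram by per-bin gains, then restore its power.

    spec  : (frames, bins) non-negative magnitudes of the input speech
    gains : (frames, bins) non-negative amplification factors
            (hypothetical stand-in for what a generator would predict)
    """
    spec = np.asarray(spec, dtype=np.float64)
    gains = np.asarray(gains, dtype=np.float64)

    # Reallocate energy across time-frequency bins.
    modified = spec * gains

    # Assumption: equal power is enforced by a single global scale factor.
    power_in = np.sum(spec ** 2)
    power_out = np.sum(modified ** 2)
    if power_out == 0.0:
        return modified  # nothing to rescale
    return modified * np.sqrt(power_in / power_out)


# Illustrative data only: random magnitudes and random gains.
rng = np.random.default_rng(0)
spec = rng.random((100, 257))
gains = rng.uniform(0.5, 2.0, size=spec.shape)
enhanced = apply_gains_equal_power(spec, gains)
```

After the rescale, the total power of `enhanced` matches that of `spec`, so any gain in intelligibility would come from where the energy is placed rather than from making the signal louder.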

research
04/02/2020

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

The intelligibility of natural speech is seriously degraded when exposed...
research
06/10/2020

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Real-world audio recordings are often degraded by factors such as noise,...
research
11/22/2022

SkipConvGAN: Monaural Speech Dereverberation using Generative Adversarial Networks via Complex Time-Frequency Masking

With the advancements in deep learning approaches, the performance of sp...
research
04/02/2020

Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection

This paper presents a novel framework for Speech Activity Detection (SAD...
research
03/27/2020

Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems

Mobile and embedded devices are increasingly using microphones and audio...
research
04/16/2019

Expediting TTS Synthesis with Adversarial Vocoding

Recent approaches in text-to-speech (TTS) synthesis employ neural networ...
research
10/03/2022

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

Denoising diffusion probabilistic models (DDPMs) and generative adversar...
