Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement

by   Haoyu Li, et al.

The intelligibility of speech severely degrades in the presence of environmental noise and reverberation. In this paper, we propose a novel deep learning based system for modifying the speech signal to increase its intelligibility under the equal-power constraint, i.e., signal power before and after modification must be the same. To achieve this, we use generative adversarial networks (GANs) to obtain time-frequency dependent amplification factors, which are then applied to the input raw speech to reallocate the speech energy. Instead of optimizing only a single, simple metric, we train a deep neural network (DNN) model to simultaneously optimize multiple advanced speech metrics, including both intelligibility- and quality-related ones, which results in notable improvements in performance and robustness. Our system can not only work in non-realtime mode for offline audio playback but also support practical real-time speech applications. Experimental results using both objective measurements and subjective listening tests indicate that the proposed system significantly outperforms state-ofthe-art baseline systems under various noisy and reverberant listening conditions.



There are no comments yet.


page 1

page 7

page 8


iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

The intelligibility of natural speech is seriously degraded when exposed...

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Real-world audio recordings are often degraded by factors such as noise,...

Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification

Improving speech system performance in noisy environments remains a chal...

Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection

This paper presents a novel framework for Speech Activity Detection (SAD...

Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems

Mobile and embedded devices are increasingly using microphones and audio...

Expediting TTS Synthesis with Adversarial Vocoding

Recent approaches in text-to-speech (TTS) synthesis employ neural networ...

A comparison of Vietnamese Statistical Parametric Speech Synthesis Systems

In recent years, statistical parametric speech synthesis (SPSS) systems ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.