On the Use of Audio Fingerprinting Features for Speech Enhancement with Generative Adversarial Network

07/27/2020
by Farnood Faraji, et al.

The advent of learning-based methods in speech enhancement has revived the need for robust and reliable training features that can compactly represent speech signals while preserving their vital information. Time-frequency domain features, such as the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC), are preferred in many approaches. While the MFCC provide a compact representation, they ignore the dynamics and distribution of energy in each mel-scale subband. In this work, a speech enhancement system based on a Generative Adversarial Network (GAN) is implemented and tested with a combination of Audio FingerPrinting (AFP) features obtained from the MFCC and the Normalized Spectral Subband Centroids (NSSC). The NSSC capture the locations of speech formants, supplying the within-subband energy distribution that the MFCC discard. In experiments with diverse speakers and noise types, GAN-based speech enhancement with the proposed AFP feature combination achieves the best objective performance while reducing memory requirements and training time.
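
To make the idea of a subband centroid concrete, the following is a minimal sketch of how a normalized centroid per subband might be computed for a single frame. It is not the authors' definition: the function name `normalized_subband_centroids`, the equal-width bands (the abstract refers to mel-scale subbands), the Hann window, and the choice to normalize each centroid to [0, 1] between its band edges are all assumptions made for illustration.

```python
import numpy as np


def normalized_subband_centroids(frame, sample_rate, n_bands=8):
    """Return one power-weighted centroid per subband, scaled to [0, 1].

    Hypothetical helper: bands are equal-width here for simplicity,
    whereas the abstract describes mel-scale subbands.
    """
    # Power spectrum of the Hann-windowed frame.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Assumed band layout: n_bands equal slices of [0, Nyquist].
    edges = np.linspace(0.0, sample_rate / 2.0, n_bands + 1)

    centroids = np.empty(n_bands)
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        if b < n_bands - 1:
            mask = (freqs >= lo) & (freqs < hi)
        else:
            # Include the Nyquist bin in the last band.
            mask = (freqs >= lo) & (freqs <= hi)

        band_power = power[mask]
        total = band_power.sum()
        if total > 0:
            # Power-weighted mean frequency inside the band.
            centroid_hz = (freqs[mask] * band_power).sum() / total
        else:
            # Silent band: fall back to the band midpoint.
            centroid_hz = 0.5 * (lo + hi)

        # Assumed normalization: position of the centroid within its band.
        centroids[b] = (centroid_hz - lo) / (hi - lo)
    return centroids
```

For a 512-sample frame of a 1250 Hz sine at 16 kHz, the second band (1000 to 2000 Hz under the assumed layout) should yield a value close to a quarter of the way through the band. A spectral peak such as a formant shifts the centroid toward itself, which is the kind of within-band information a single energy value per band does not carry.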

