A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

11/16/2022
by   Yang Xiang, et al.
0

This paper focuses on leveraging deep representation learning (DRL) for speech enhancement (SE). In general, the performance of the deep neural network (DNN) is heavily dependent on the learning of data representation. However, the DRL's importance is often ignored in many DNN-based SE algorithms. To obtain a higher quality enhanced speech, we propose a two-stage DRL-based SE method through adversarial training. In the first stage, we disentangle different latent variables because disentangled representations can help DNN generate a better enhanced speech. Specifically, we use the β-variational autoencoder (VAE) algorithm to obtain the speech and noise posterior estimations and related representations from the observed signal. However, since the posteriors and representations are intractable and we can only apply a conditional assumption to estimate them, it is difficult to ensure that these estimations are always pretty accurate, which may potentially degrade the final accuracy of the signal estimation. To further improve the quality of enhanced speech, in the second stage, we introduce adversarial training to reduce the effect of the inaccurate posterior towards signal reconstruction and improve the signal estimation accuracy, making our algorithm more robust for the potentially inaccurate posterior estimations. As a result, better SE performance can be achieved. The experimental results indicate that the proposed strategy can help similar DNN-based SE algorithms achieve higher short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and scale-invariant signal-to-distortion ratio (SI-SDR) scores. Moreover, the proposed algorithm can also outperform recent competitive SE algorithms.

READ FULL TEXT
research
05/11/2022

A deep representation learning speech enhancement method using β-VAE

In previous work, we proposed a variational autoencoder-based (VAE) Baye...
research
01/24/2022

A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder

Recently, variational autoencoder (VAE), a deep representation learning ...
research
11/19/2019

Distributed Microphone Speech Enhancement based on Deep Learning

Speech-related applications deliver inferior performance in complex nois...
research
05/19/2021

Disentanglement Learning for Variational Autoencoders Applied to Audio-Visual Speech Enhancement

Recently, the standard variational autoencoder has been successfully use...
research
01/21/2016

On Structured Sparsity of Phonological Posteriors for Linguistic Parsing

The speech signal conveys information on different time scales from shor...
research
10/28/2020

Improving Perceptual Quality by Phone-Fortified Perceptual Loss for Speech Enhancement

Speech enhancement (SE) aims to improve speech quality and intelligibili...
research
05/18/2023

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

Diffusion-based speech enhancement (SE) has been investigated recently, ...

Please sign up or login with your details

Forgot password? Click here to reset