A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder

01/24/2022
by Yang Xiang, et al.

Recently, the variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods apply the VAE only to model the speech signal, while noise is modeled using traditional non-negative matrix factorization (NMF). One of the main reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal. Based on Bayesian theory, this paper derives a novel variational lower bound for the VAE, which allows the VAE to be trained in a supervised manner and to disentangle the speech and noise latent variables from the observed signal. As a result, the proposed method can use the VAE to model both speech and noise signals, which differs fundamentally from previous VAE-based SE work. More specifically, the proposed DRL method learns to impose speech and noise signal priors on separate sets of latent variables for SE. Experimental results show that the proposed method not only disentangles the speech and noise latent variables from the observed signal but also achieves a higher scale-invariant signal-to-distortion ratio (SI-SDR) and speech quality score than a comparable deep neural network (DNN)-based SE method.
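The abstract does not state the derived bound itself. As a rough illustration only, assume (this is our assumption, not necessarily the paper's derivation) a factorized posterior q(z_s, z_n | x) = q(z_s | x) q(z_n | x) and independent diagonal-Gaussian priors p(z_s), p(z_n) over the speech and noise latents. Under those assumptions the standard evidence lower bound splits into a reconstruction term minus one KL term per latent set, which is what lets each set be pushed toward its own prior. The function names `gaussian_kl` and `two_latent_lower_bound` are hypothetical, and the reconstruction log-likelihood is taken as a precomputed number (for example, a Monte Carlo estimate from a decoder that is not shown).

```python
import math
from typing import Sequence, Tuple

# A Gaussian is given as (means, log-variances), one entry per latent dimension.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def gaussian_kl(mu_q: Sequence[float], logvar_q: Sequence[float],
                mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # 0.5 * [log(var_p / var_q) + (var_q + (mu_q - mu_p)^2) / var_p - 1]
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def two_latent_lower_bound(log_likelihood: float,
                           speech_q: Gaussian, speech_prior: Gaussian,
                           noise_q: Gaussian, noise_prior: Gaussian) -> float:
    """Hypothetical ELBO with separate speech and noise latent sets:
    E_q[log p(x | z_s, z_n)] - KL(q(z_s|x) || p(z_s)) - KL(q(z_n|x) || p(z_n)).
    `log_likelihood` is assumed to be an estimate of the first term."""
    return (log_likelihood
            - gaussian_kl(*speech_q, *speech_prior)
            - gaussian_kl(*noise_q, *noise_prior))


if __name__ == "__main__":
    standard = ([0.0], [0.0])  # N(0, 1) prior
    shifted = ([1.0], [0.0])   # N(1, 1) posterior
    print(gaussian_kl(*shifted, *standard))  # 0.5
    print(two_latent_lower_bound(-3.0, shifted, standard, standard, standard))  # -3.5
```

A posterior that matches its prior contributes zero KL penalty, so in this sketch the bound is tightest when each latent set's posterior stays close to the prior assigned to it.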

Related research
05/11/2022

A deep representation learning speech enhancement method using β-VAE

In previous work, we proposed a variational autoencoder-based (VAE) Baye...
01/18/2023

A variational autoencoder-based nonnegative matrix factorisation model for deep dictionary learning

Construction of dictionaries using nonnegative matrix factorisation (NMF...
11/16/2022

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

This paper focuses on leveraging deep representation learning (DRL) for ...
10/13/2021

DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding

Conventional vocoders are commonly used as analysis tools to provide int...
03/27/2023

Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise

Human-robot interaction relies on a noise-robust audio processing module...
05/05/2022

Unsupervised Mismatch Localization in Cross-Modal Sequential Data

Content mismatch usually occurs when data from one modality is translate...
