DeepAI AI Chat
Log In Sign Up

Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

by   Xiaoyu Bie, et al.

Dynamical variational auto-encoders (DVAEs) are a class of deep generative models with latent variables, dedicated to time series data modeling. DVAEs can be considered as extensions of the variational autoencoder (VAE) that include the modeling of temporal dependencies between successive observed and/or latent vectors in data sequences. Previous work has shown the interest of DVAEs and their better performance over the VAE for speech signals (spectrogram) modeling. Independently, the VAE has been successfully applied to speech enhancement in noise, in an unsupervised noise-agnostic set-up that does not require the use of a parallel dataset of clean and noisy speech samples for training, but only requires clean speech signals. In this paper, we extend those works to DVAE-based single-channel unsupervised speech enhancement, hence exploiting both speech signals unsupervised representation learning and dynamics modeling. We propose an unsupervised speech enhancement algorithm based on the most general form of DVAEs, that we then adapt to three specific DVAE models to illustrate the versatility of the framework. More precisely, we combine DVAE-based speech priors with a noise model based on nonnegative matrix factorization, and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement. Experimental results show that the proposed approach based on DVAEs outperforms its VAE counterpart and a supervised speech enhancement baseline.


Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoder

Variational auto-encoders (VAEs) are deep generative latent variable mod...

A Recurrent Variational Autoencoder for Speech Enhancement

This paper presents a generative approach to speech enhancement based on...

Speech Modeling with a Hierarchical Transformer Dynamical VAE

The dynamical variational autoencoders (DVAEs) are a family of latent-va...

A deep representation learning speech enhancement method using β-VAE

In previous work, we proposed a variational autoencoder-based (VAE) Baye...

Can We Trust Deep Speech Prior?

Recently, speech enhancement (SE) based on deep speech prior has attract...

Speech enhancement with variational autoencoders and alpha-stable distributions

This paper focuses on single-channel semi-supervised speech enhancement....