Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

12/23/2019
by Mostafa Sadeghi, et al.

In this paper, we are interested in unsupervised speech enhancement using latent variable generative models. We propose to learn a generative model for the clean speech spectrogram based on a variational autoencoder (VAE), in which a mixture of audio and visual networks is used to infer the posterior of the latent variables. This is motivated by the fact that visual data, i.e., images of the speaker's lips, provide helpful and complementary information about speech and can therefore help train a richer inference network. Moreover, during speech enhancement, visual data are used to initialize the latent variables, providing a more robust initialization than the noisy speech spectrogram. A variational inference approach is derived to train the proposed VAE. Thanks to the novel inference procedure and the robust initialization, the proposed audio-visual mixture VAE outperforms its standard audio-only counterpart on speech enhancement.
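To make the mixture-of-inference-networks idea more concrete, here is a minimal NumPy sketch. It assumes each encoder is a single linear map producing a diagonal Gaussian over the latent vector, and that the mixture weight `pi_audio` is a fixed constant. The abstract specifies neither the encoder architectures nor how the mixture weights are handled, so every function name, shape, and parameter below is hypothetical. The sketch draws latent samples from a two-component (audio/visual) posterior and, separately, initializes the latent vector from the visual encoder alone, as the abstract describes for the enhancement stage.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_encoder(x, W_mu, W_logvar):
    """Hypothetical linear encoder: maps an input vector to the mean and
    log-variance of a diagonal Gaussian over the latent vector z."""
    return W_mu @ x, W_logvar @ x

def sample_mixture_posterior(s, v, params, pi_audio=0.5, n_samples=1):
    """Draw z from q(z | s, v) = pi_audio * q_a(z | s) + (1 - pi_audio) * q_v(z | v).

    s: one frame of the speech power spectrogram; v: a visual (lip) feature vector.
    The fixed mixture weight pi_audio is an illustrative assumption."""
    mu_a, logvar_a = gaussian_encoder(s, params["Wa_mu"], params["Wa_lv"])
    mu_v, logvar_v = gaussian_encoder(v, params["Wv_mu"], params["Wv_lv"])
    samples = []
    for _ in range(n_samples):
        # Choose a mixture component, then reparameterize: z = mu + sigma * eps.
        use_audio = rng.random() < pi_audio
        mu, logvar = (mu_a, logvar_a) if use_audio else (mu_v, logvar_v)
        eps = rng.standard_normal(mu.shape)
        samples.append(mu + np.exp(0.5 * logvar) * eps)
    return np.stack(samples)

def init_latent_from_visual(v, params):
    """Initialize z for enhancement from the visual encoder mean only,
    so the starting point does not depend on the noisy spectrogram."""
    mu_v, _ = gaussian_encoder(v, params["Wv_mu"], params["Wv_lv"])
    return mu_v

# Toy dimensions (hypothetical): F frequency bins, V visual features, L latent dims.
F, V, L = 8, 6, 4
params = {
    "Wa_mu": 0.1 * rng.standard_normal((L, F)),
    "Wa_lv": 0.1 * rng.standard_normal((L, F)),
    "Wv_mu": 0.1 * rng.standard_normal((L, V)),
    "Wv_lv": 0.1 * rng.standard_normal((L, V)),
}
s = np.abs(rng.standard_normal(F)) ** 2  # stand-in for one power-spectrogram frame
v = rng.standard_normal(V)               # stand-in for a lip-image embedding

z = sample_mixture_posterior(s, v, params, pi_audio=0.5, n_samples=5)
z0 = init_latent_from_visual(v, params)
print(z.shape, z0.shape)
```

Sampling a component first and then reparameterizing within it is one standard way to draw from a Gaussian mixture; a trained model would replace the random linear maps with learned audio and visual networks.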

