Robust Unsupervised Audio-visual Speech Enhancement Using a Mixture of Variational Autoencoders

11/10/2019
by Mostafa Sadeghi, et al.

Recently, an audio-visual speech generative model based on a variational autoencoder (VAE) has been proposed. It is combined with a nonnegative matrix factorization (NMF) model of the noise variance to perform unsupervised speech enhancement. When the visual data is clean, speech enhancement with the audio-visual VAE performs better than with an audio-only VAE trained on audio data alone. However, the audio-visual VAE is not robust to noisy visual data, e.g., video frames in which the speaker's face is not frontal or the lip region is occluded. In this paper, we propose a robust unsupervised audio-visual speech enhancement method based on a per-frame VAE mixture model. This mixture model consists of a trained audio-only VAE and a trained audio-visual VAE. The motivation is to skip noisy visual frames by switching to the audio-only VAE for those frames. We present a variational expectation-maximization (EM) method to estimate the parameters of the model. Experiments show promising performance of the proposed method.
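The per-frame switching idea can be illustrated with a simplified sketch. It assumes (this is not stated in the abstract) the zero-mean complex Gaussian observation model commonly used in VAE-NMF speech enhancement: each STFT bin has variance equal to a speech variance produced by a VAE decoder plus an NMF noise variance (WH)_fn. Given plug-in speech variances from the two trained VAEs, the sketch computes, for each frame, the posterior probability that the audio-visual VAE explains that frame better than the audio-only VAE. The function names, the fixed prior `prior_av`, and the plug-in variances are hypothetical; the paper's variational EM also infers the latent variables and updates the NMF factors, which this sketch omits.

```python
import numpy as np


def frame_log_likelihood(X, speech_var, noise_var):
    """Per-frame log-likelihood of STFT frames under a zero-mean complex Gaussian.

    Assumed model (hypothetical, for illustration): x_fn ~ N_c(0, speech_var + noise_var).
    X:          complex STFT, shape (F, N)
    speech_var: speech variance from a VAE decoder, shape (F, N)
    noise_var:  NMF noise variance, e.g. W @ H, shape (F, N)
    Returns shape (N,): log-likelihood summed over frequency bins.
    """
    v = speech_var + noise_var
    return np.sum(-np.log(np.pi * v) - np.abs(X) ** 2 / v, axis=0)


def mixture_responsibilities(X, var_av, var_a, noise_var, prior_av=0.5):
    """Posterior probability, per frame, of the audio-visual VAE component.

    var_av / var_a are speech variances from the audio-visual and audio-only
    VAEs; prior_av is a hypothetical fixed mixture weight.
    """
    ll_av = frame_log_likelihood(X, var_av, noise_var) + np.log(prior_av)
    ll_a = frame_log_likelihood(X, var_a, noise_var) + np.log(1.0 - prior_av)
    # Two-way softmax computed in the log domain for numerical stability.
    m = np.maximum(ll_av, ll_a)
    p_av = np.exp(ll_av - m)
    p_a = np.exp(ll_a - m)
    return p_av / (p_av + p_a)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, N, K = 4, 3, 2
    # Hypothetical NMF factors for the noise variance.
    W = rng.random((F, K)) + 0.1
    H = rng.random((K, N)) + 0.1
    X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))
    var_av = np.full((F, N), 1.0)  # stand-in for audio-visual VAE output
    var_a = np.full((F, N), 2.0)   # stand-in for audio-only VAE output
    print(mixture_responsibilities(X, var_av, var_a, W @ H))
```

When the two VAEs produce identical speech variances, every frame's responsibility reduces to `prior_av`; a frame whose observed power is better matched by the audio-visual variance gets a value above the prior. A frame with a low responsibility would, in this sketch, be treated as relying on the audio-only VAE, mirroring the motivation of skipping noisy visual frames.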


