Audio-Visual Speech Inpainting with Deep Learning

10/09/2020
by   Giovanni Morrone, et al.
0

In this paper, we present a deep-learning-based framework for audio-visual speech inpainting, i.e., the task of restoring the missing parts of an acoustic speech signal from reliable audio context and uncorrupted visual information. Recent work focuses solely on audio-only methods and generally aims at inpainting music signals, which show highly different structure than speech. Instead, we inpaint speech signals with gaps ranging from 100 ms to 1600 ms to investigate the contribution that vision can provide for gaps of different duration. We also experiment with a multi-task learning approach where a phone recognition task is learned together with speech inpainting. Results show that the performance of audio-only speech inpainting approaches degrades rapidly when gaps get large, while the proposed audio-visual approach is able to plausibly restore missing information. In addition, we show that multi-task learning is effective, although the largest contribution to performance comes from vision.

READ FULL TEXT
research
06/01/2023

Speech inpainting: Context-based speech synthesis guided by video

Audio and visual modalities are inherently connected in speech signals: ...
research
05/11/2020

GACELA – A generative adversarial context encoder for long audio inpainting

We introduce GACELA, a generative adversarial network (GAN) designed to ...
research
05/24/2023

Diffusion-Based Audio Inpainting

Audio inpainting aims to reconstruct missing segments in corrupted recor...
research
10/24/2019

Vision-Infused Deep Audio Inpainting

Multi-modality perception is essential to develop interactive intelligen...
research
11/15/2019

Deep Long Audio Inpainting

Long (> 200 ms) audio inpainting, to recover a long missing part in an a...
research
10/29/2018

A context encoder for audio inpainting

We studied the ability of deep neural networks (DNNs) to restore missing...
research
04/23/2017

Learning weakly supervised multimodal phoneme embeddings

Recent works have explored deep architectures for learning multimodal sp...

Please sign up or login with your details

Forgot password? Click here to reset