Wav2Pix: Speech-conditioned Face Generation using Generative Adversarial Networks

03/25/2019
by   Amanda Duarte, et al.

Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) on raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the raw speech waveform without any additional identity information (e.g., a reference image or a one-hot encoding). Our model is trained in a self-supervised manner by exploiting the audio and visual signals that are naturally aligned in videos. To enable training from video data, we present a novel dataset collected for this work, consisting of high-quality videos of YouTubers with notable expressiveness in both the speech and visual signals.
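As a rough illustration of how naturally aligned audio and video can supply training pairs without manual labels, the sketch below maps each video frame to the window of raw audio samples centred on it. It is not the authors' pipeline: the function names, the 25 fps frame rate, the 16 kHz sample rate and the 1-second window are all assumptions chosen for the example.

```python
def paired_audio_window(frame_index, fps, sample_rate, window_seconds):
    """Return (start, end) sample indices of the audio window centred on a frame.

    Hypothetical helper: the window is clamped at the start of the clip so
    that it never begins before sample 0.
    """
    centre = round(frame_index / fps * sample_rate)
    half = round(window_seconds * sample_rate / 2)
    start = max(0, centre - half)
    return start, start + 2 * half


def make_training_pairs(num_frames, num_samples,
                        fps=25, sample_rate=16000, window_seconds=1.0):
    """List (frame_index, audio_start, audio_end) triples for one video.

    Frames whose audio window would run past the end of the audio track
    are skipped. All default values are illustrative assumptions.
    """
    pairs = []
    for i in range(num_frames):
        start, end = paired_audio_window(i, fps, sample_rate, window_seconds)
        if end <= num_samples:
            pairs.append((i, start, end))
    return pairs


if __name__ == "__main__":
    # A 2-second clip: 50 frames at 25 fps, 32000 samples at 16 kHz.
    for frame, start, end in make_training_pairs(50, 32000)[:3]:
        print(frame, start, end)
```

Each resulting triple identifies a face frame and the speech segment recorded at the same moment, which is the kind of (speech, image) pair a speech-conditioned GAN could be trained on.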

research
04/13/2020

From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech

This work seeks the possibility of generating the human face from voice ...
research
07/26/2021

Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations

In this paper, we propose an effective method to synthesize speaker-spec...
research
06/14/2019

Video-Driven Speech Reconstruction using Generative Adversarial Networks

Speech is a means of communication which relies on both audio and visual...
research
12/06/2018

Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder

Neural networks based vocoders, typically the WaveNet, have achieved spe...
research
07/18/2022

Audio Input Generates Continuous Frames to Synthesize Facial Video Using Generative Adiversarial Networks

This paper presents a simple method for speech videos generation based o...
research
04/07/2020

Direct Speech-to-image Translation

Direct speech-to-image translation without text is an interesting and us...
research
03/22/2016

Input Aggregated Network for Face Video Representation

Recently, deep neural network has shown promising performance in face im...
