Attention-based Residual Speech Portrait Model for Speech to Face Generation

07/09/2020
by Jianrong Wang, et al.

Given a speaker's speech, is it possible to generate that speaker's face? One main challenge in this task is alleviating the natural mismatch between face and speech. To this end, we propose a novel Attention-based Residual Speech Portrait Model (AR-SPM), which introduces the idea of the residual into a hybrid encoder-decoder architecture: face prior features are merged with the output of the speech encoder to form the final face feature. In particular, we establish a tri-item loss function, a weighted linear combination of the L2-norm, L1-norm, and negative cosine loss, and train the model by comparing the final face feature with the true face feature. Evaluation on the AVSpeech dataset shows that the proposed model accelerates training convergence, outperforms the state of the art in the quality of the generated faces, and achieves superior gender and age recognition accuracy compared with the ground truth.
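
To make the residual merge and the tri-item loss concrete, the following is a minimal NumPy sketch, not the authors' implementation. Several things here are assumptions that the abstract does not state: the merge is taken to be element-wise addition, the loss weights `w_l2`, `w_l1`, and `w_cos` are hypothetical and default to 1.0, the norms are applied to the difference between the predicted and true face features, and the loss is averaged over the batch.

```python
import numpy as np


def merge_residual(face_prior, speech_residual):
    """Form the final face feature from a prior and a speech-derived residual.

    Assumption: "merged" is read as element-wise addition, as in residual
    learning. The abstract does not specify the merge operator.
    """
    return np.asarray(face_prior, dtype=np.float64) + np.asarray(
        speech_residual, dtype=np.float64
    )


def tri_item_loss(pred, target, w_l2=1.0, w_l1=1.0, w_cos=1.0, eps=1e-8):
    """Weighted sum of L2-norm, L1-norm, and negative cosine terms.

    pred, target: arrays of shape (batch, dim) or (dim,).
    The weights are hypothetical placeholders, not values from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    diff = pred - target

    # Norms of the feature difference, one value per sample.
    l2 = np.linalg.norm(diff, ord=2, axis=-1)
    l1 = np.linalg.norm(diff, ord=1, axis=-1)

    # Cosine similarity between predicted and true features.
    dot = np.sum(pred * target, axis=-1)
    denom = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1) + eps
    cos = dot / denom

    # The cosine term enters with a negative sign: higher similarity, lower loss.
    per_sample = w_l2 * l2 + w_l1 * l1 - w_cos * cos
    return float(np.mean(per_sample))


if __name__ == "__main__":
    prior = np.array([[0.2, 0.1, 0.4]])
    residual = np.array([[0.1, -0.1, 0.0]])
    true_face = np.array([[0.3, 0.0, 0.5]])
    final_face = merge_residual(prior, residual)
    print(tri_item_loss(final_face, true_face))
```

Under these assumptions the loss reaches its minimum of about `-w_cos` when the predicted feature equals the true one, since both norm terms vanish and the cosine similarity is 1.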

