Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics

10/29/2018
by Fuming Fang, et al.

An audiovisual speaker conversion method is presented that simultaneously transforms the facial expressions and voice of a source speaker into those of a target speaker. Transforming the facial and acoustic features together allows the converted voice and facial expressions to be highly correlated and the generated target speaker to look and sound natural. The method uses three neural networks: a conversion network that fuses and transforms the facial and acoustic features, a waveform generation network that produces the waveform from both the converted facial and acoustic features, and an image reconstruction network that outputs an RGB facial image, likewise from both converted features. In experiments on an emotional audiovisual database, the proposed method achieved significantly higher naturalness than a method that transformed the acoustic and facial features separately.
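The data flow among the three networks can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the feature dimensions, the per-frame processing, and the use of single random linear layers in place of trained neural networks are all assumptions made for illustration. The sketch only shows that the facial and acoustic features are fused before conversion and that both outputs are generated from both converted feature streams.

```python
import random

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
FACE_DIM = 6            # facial-expression features per frame
ACOUSTIC_DIM = 4        # acoustic features per frame
SAMPLES_PER_FRAME = 8   # waveform samples generated per frame
IMAGE_H, IMAGE_W = 2, 2 # tiny RGB image


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix (out_dim x in_dim) standing in for a trained network."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, vec):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class AudiovisualConverter:
    """Illustrative stand-in for the three-network pipeline (assumed structure)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        fused_dim = FACE_DIM + ACOUSTIC_DIM
        # 1) Conversion network: fused source features -> fused target features.
        self.conversion = make_linear(fused_dim, fused_dim, rng)
        # 2) Waveform generation network: both converted streams -> samples.
        self.waveform = make_linear(fused_dim, SAMPLES_PER_FRAME, rng)
        # 3) Image reconstruction network: both converted streams -> RGB pixels.
        self.image = make_linear(fused_dim, IMAGE_H * IMAGE_W * 3, rng)

    def convert_frame(self, face, acoustic):
        # Fuse the two modalities so they are transformed jointly.
        fused = list(face) + list(acoustic)
        converted = apply_linear(self.conversion, fused)
        conv_face = converted[:FACE_DIM]
        conv_acoustic = converted[FACE_DIM:]
        # Both output networks see BOTH converted feature streams.
        both = conv_face + conv_acoustic
        samples = apply_linear(self.waveform, both)
        pixels = apply_linear(self.image, both)
        return conv_face, conv_acoustic, samples, pixels


model = AudiovisualConverter(seed=0)
conv_face, conv_acoustic, samples, pixels = model.convert_frame(
    [0.1] * FACE_DIM, [0.2] * ACOUSTIC_DIM
)
print(len(conv_face), len(conv_acoustic), len(samples), len(pixels))
```

A separately converting baseline, by contrast, would apply one transform to the facial features and an independent one to the acoustic features, so neither output could depend on the other modality.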


