Cross-modal Face- and Voice-style Transfer

02/27/2023
by Naoya Takahashi, et al.

Image-to-image translation and voice conversion enable the generation of a new facial image and a new voice while preserving some of the semantics, such as the pose in an image and the linguistic content in audio, respectively. They can aid the content-creation process in many applications. However, because each is limited to conversion within a single modality, matching the impression of the generated face and voice remains an open question. We propose a cross-modal style transfer framework called XFaVoT that jointly learns four tasks within a single framework: image translation and voice conversion, each with either audio or image guidance. This covers the intra-modality translation tasks and also enables the generation of a “face that matches a given voice” and a “voice that matches a given face”. Experimental results on multiple datasets show that XFaVoT achieves cross-modal style translation of image and voice, outperforming baselines in terms of quality, diversity, and face-voice correspondence.

research
10/08/2019

MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms

Traditional voice conversion methods rely on parallel recordings of mult...
research
11/21/2019

Voice-Face Cross-modal Matching and Retrieval: A Benchmark

Cross-modal associations between voice and face from a person can be lea...
research
04/01/2018

Seeing Voices and Hearing Faces: Cross-modal biometric matching

We introduce a seemingly impossible task: given only an audio clip of so...
research
10/05/2021

Voice Aging with Audio-Visual Style Transfer

Face aging techniques have used generative adversarial networks (GANs) a...
research
08/19/2020

Slide-free MUSE Microscopy to H&E Histology Modality Conversion via Unpaired Image-to-Image Translation GAN Models

MUSE is a novel slide-free imaging technique for histological examinatio...
research
04/21/2021

Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices

This work focuses on the analysis that whether 3D face models can be lea...
research
05/08/2023

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

The speech-to-singing (STS) voice conversion task aims to generate singi...
