Voice Impersonation using Generative Adversarial Networks

02/19/2018
by Yang Gao, et al.

Voice impersonation is not the same as voice transformation, although the latter is an essential element of it. In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker. In this paper, we propose a novel neural-network-based speech quality- and style-mimicry framework for the synthesis of impersonated voices. The framework is built upon a fast and accurate generative adversarial network model. Given spectrographic representations of source and target speakers' voices, the model learns to mimic the target speaker's voice quality and style, regardless of the linguistic content of either speaker's voice, generating a synthetic spectrogram from which the time-domain signal is reconstructed using the Griffin-Lim method. In effect, this model reframes the well-known problem of style transfer for images as the problem of style transfer for speech signals, while intrinsically addressing the problem of durational variability of speech sounds. Experiments demonstrate that the model can generate extremely convincing samples of impersonated speech. It is even able to impersonate voices across different genders effectively. Results are qualitatively evaluated using standard procedures for evaluating synthesized voices.
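The abstract does not spell out the mimicry loss, but image style-transfer methods of the kind it references typically compare Gram matrices of feature maps. A minimal numpy sketch (all function names hypothetical, not from the paper) of such a loss on spectrogram feature maps illustrates why it sidesteps durational variability: the Gram matrix averages over the time axis, so source and target clips of different lengths remain comparable.

```python
import numpy as np

def gram(feats):
    """Gram matrix of a (channels, time) spectrogram feature map.
    Averaging over the time axis removes the duration dimension."""
    c, t = feats.shape
    return (feats @ feats.T) / (c * t)

def style_loss(gen_feats, target_feats):
    """Squared Frobenius distance between Gram matrices. Each Gram
    matrix is (channels, channels), so the two utterances may have
    different numbers of time frames."""
    return float(np.sum((gram(gen_feats) - gram(target_feats)) ** 2))

# Feature maps of different durations still yield comparable (8, 8) Grams.
a = np.random.default_rng(1).normal(size=(8, 120))  # 120 frames
b = np.random.default_rng(2).normal(size=(8, 75))   # 75 frames
loss_ab = style_loss(a, b)
loss_aa = style_loss(a, a)
```

In an actual GAN-based system such a term would be combined with an adversarial loss on the generated spectrogram; this fragment only shows the duration-invariance property.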

Related research:

- Voice Aging with Audio-Visual Style Transfer (10/05/2021)
- On Using Backpropagation for Speech Texture Generation and Voice Conversion (12/22/2017)
- Adversarial Approximate Inference for Speech to Electroglottograph Conversion (03/28/2019)
- Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer (08/26/2022)
- Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning (09/23/2021)
- Speech-dependent Modeling of Own Voice Transfer Characteristics for In-ear Microphones in Hearables (09/15/2023)
- WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN (03/26/2019)
