Voice Conversion using Convolutional Neural Networks

10/27/2016
by Shariq Mobin, et al.

The human auditory system can distinguish the voices of thousands of speakers, yet little is known about which features it uses to do so. Fourier transforms capture the pitch and harmonic structure of a speaker's voice, but this alone is insufficient to identify speakers uniquely. The remaining structure, often referred to as timbre, is critical to identifying speakers, but it is poorly understood. In this paper we use recent advances in neural networks to convert the voice of one speaker into that of another by transforming not only the pitch but also the timbre. We review generative models built with neural networks, as well as neural network architectures that learn analogies. Our preliminary results on converting voices from one speaker to another are encouraging.
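As a rough illustration of the claim that a Fourier transform captures pitch and harmonic structure, the sketch below builds a synthetic harmonic tone and reads its fundamental and harmonic peaks off the magnitude spectrum. Everything here is an assumption made for illustration and is not taken from the paper: the 220 Hz fundamental, the harmonic amplitudes, the 16 kHz sample rate, and the helper names `dominant_frequency` and `harmonic_magnitudes`.

```python
import numpy as np


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude bin of the spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]


def harmonic_magnitudes(signal, sample_rate, f0, n_harmonics):
    """Return spectral magnitudes at the bins nearest f0, 2*f0, ..., n_harmonics*f0."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    bins = [int(np.argmin(np.abs(freqs - f0 * (k + 1)))) for k in range(n_harmonics)]
    return spectrum[bins]


# Hypothetical test signal: one second of a 220 Hz tone with two overtones.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
f0 = 220.0
amplitudes = [1.0, 0.5, 0.25]  # assumed relative strengths of the harmonics
tone = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t) for k, a in enumerate(amplitudes))

pitch = dominant_frequency(tone, sample_rate)
harmonics = harmonic_magnitudes(tone, sample_rate, f0, len(amplitudes))
print(pitch)
print(harmonics / harmonics[0])  # harmonic strengths relative to the fundamental
```

A spectrum like this exposes the pitch and the relative strength of each harmonic, which is the part of the signal the abstract says is not enough on its own to identify a speaker.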

research
04/30/2019

Many-to-Many Voice Conversion with Out-of-Dataset Speaker Support

We present a Cycle-GAN based many-to-many voice conversion method that c...
research
07/11/2021

Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder

Voice conversion is a challenging task which transforms the voice charac...
research
04/10/2019

One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization

Recently, voice conversion (VC) without parallel data has been successfu...
research
03/14/2018

Speaker Verification using Convolutional Neural Networks

In this paper, a novel Convolutional Neural Network architecture has bee...
research
08/15/2023

Anaphoric Structure Emerges Between Neural Networks

Pragmatics is core to natural language, enabling speakers to communicate...
research
12/05/2017

Multi-speaker Recognition in Cocktail Party Problem

This paper proposes an original statistical decision theory to accomplis...
research
10/29/2018

Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics

An audiovisual speaker conversion method is presented for simultaneously...
