Rhythm Modeling for Voice Conversion

07/12/2023
by Benjamin van Niekerk, et al.

Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce Urhythmic, an unsupervised method for rhythm conversion that does not require parallel data or text transcriptions. Using self-supervised representations, we first divide source audio into segments approximating sonorants, obstruents, and silences. Then we model rhythm by estimating either the speaking rate or the duration distribution of each segment type. Finally, we match the target speaking rate or rhythm by time-stretching the speech segments. Experiments show that Urhythmic outperforms existing unsupervised methods in terms of quality and prosody. Code and checkpoints: https://github.com/bshall/urhythmic. Audio demo page: https://ubisoft-laforge.github.io/speech/urhythmic.
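
The speaking-rate variant of the three steps above (segment, estimate rate, time-stretch) can be illustrated with a minimal sketch. This is not the released Urhythmic code: the `Segment` type, the function names, and the choice to measure rate as sonorant segments per second of non-silent speech are all assumptions made for illustration, and the sketch only rescales segment boundaries rather than stretching audio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical segment record; not part of the Urhythmic codebase."""
    kind: str     # "sonorant", "obstruent", or "silence"
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def speaking_rate(segments: List[Segment]) -> float:
    """Assumed rate measure: sonorant segments per second of non-silent speech."""
    speech_time = sum(s.duration for s in segments if s.kind != "silence")
    if speech_time <= 0:
        raise ValueError("no non-silent speech to estimate a rate from")
    sonorants = sum(1 for s in segments if s.kind == "sonorant")
    return sonorants / speech_time


def stretch_to_rate(segments: List[Segment], target_rate: float) -> List[Segment]:
    """Uniformly rescale all segment durations so the rate matches target_rate."""
    # ratio > 1 lengthens the utterance (slower speech), ratio < 1 shortens it.
    ratio = speaking_rate(segments) / target_rate
    stretched, t = [], 0.0
    for s in segments:
        d = s.duration * ratio
        stretched.append(Segment(s.kind, t, t + d))
        t += d
    return stretched
```

In a full system the rescaled boundaries would drive a time-stretching step on the audio or on its features. Matching the per-type duration distribution instead of a single global rate would replace the one shared `ratio` with a separate mapping for each segment type.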


