Speaking Style Conversion With Discrete Self-Supervised Units

12/19/2022
by Gallil Maimon, et al.
Voice Conversion (VC) is the task of making a spoken utterance by one speaker sound as if uttered by a different speaker, while keeping other aspects, such as content, unchanged. Current VC methods focus primarily on spectral features like timbre, while ignoring the unique speaking style of each person, which often affects prosody. In this study, we introduce a method for converting not only the timbre, but also the prosodic information (i.e., rhythm and pitch changes) to those of the target speaker. The proposed approach is based on a pretrained, self-supervised model for encoding speech to discrete units, which makes it simple, effective, and easy to optimise. We consider the many-to-many setting with no paired data. We introduce a suite of quantitative and qualitative evaluation metrics for this setup, and empirically demonstrate that the proposed approach is significantly superior to the evaluated baselines. Code and samples can be found at https://pages.cs.huji.ac.il/adiyoss-lab/dissc/.
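
To make the idea of rhythm in a discrete-unit representation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that speech has already been encoded into one integer unit per frame, and that rhythm can be described by how many consecutive frames each unit is held. The function names, the example unit IDs, and the uniform scaling factor are all hypothetical and chosen only for illustration.

```python
from itertools import groupby


def dedup_units(units):
    """Collapse consecutive repeated units into (unit, duration) pairs."""
    return [(unit, sum(1 for _ in group)) for unit, group in groupby(units)]


def expand_units(pairs):
    """Inverse of dedup_units: repeat each unit for its duration in frames."""
    return [unit for unit, duration in pairs for _ in range(duration)]


def rescale_durations(pairs, factor):
    """Uniformly stretch or compress durations (a stand-in for a learned
    rhythm model; an assumption, not the method described in the abstract)."""
    return [(unit, max(1, round(duration * factor))) for unit, duration in pairs]


# Hypothetical frame-level unit sequence.
frames = [12, 12, 12, 7, 7, 31, 31, 31, 31]
pairs = dedup_units(frames)              # units with their durations
slower = rescale_durations(pairs, 2.0)   # every unit held twice as long
```

In this sketch the unit identities carry the content, while the durations carry the rhythm, so the two can be modified independently before resynthesis.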

research
11/03/2021

A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

The goal of voice conversion is to transform source speech into a target...
research
11/12/2022

A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units

We present a unified system to realize one-shot voice conversion (VC) on...
research
09/06/2023

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

While many recent any-to-any voice conversion models succeed in transfer...
research
08/10/2023

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Recent work has shown that it is possible to resynthesize high-quality s...
research
03/03/2023

WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions

Recognizing whispered speech and converting it to normal speech creates ...
research
11/14/2021

Textless Speech Emotion Conversion using Decomposed and Discrete Representations

Speech emotion conversion is the task of modifying the perceived emotion...
