Global Rhythm Style Transfer Without Text Transcriptions

06/16/2021
by Kaizhi Qian, et al.

Prosody plays an important role in characterizing the style of a speaker or an emotion, but most non-parallel voice or emotion style transfer algorithms do not convert any prosody information. Two major components of prosody are pitch and rhythm. Disentangling the prosody information, particularly the rhythm component, from speech is challenging because it involves breaking the synchrony between the input speech and the disentangled speech representation. As a result, most existing prosody style transfer algorithms rely on some form of text transcription to identify the content information, which confines their application to high-resource languages. Recently, SpeechSplit has made sizeable progress towards unsupervised prosody style transfer, but it is unable to extract high-level global prosody style in an unsupervised manner. In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions. AutoPST is an Autoencoder-based Prosody Style Transfer framework with a thorough rhythm removal module guided by self-expressive representation learning. Experiments on different style transfer tasks show that AutoPST can effectively convert prosody so that it correctly reflects the styles of the target domains.
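One way to picture what "breaking the synchrony" between input speech and its representation could mean is the toy sketch below. It is not AutoPST's rhythm removal module, which the abstract does not specify. It is only an illustration under stated assumptions: frames are plain feature vectors, consecutive frames that are similar enough are merged into one, and the names `cosine`, `collapse_similar_frames` and the `threshold` value are hypothetical. After merging, the sequence length no longer tracks how long each sound was held, so a slow and a fast rendition of the same content can map to the same representation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def collapse_similar_frames(frames, threshold=0.95):
    """Merge each run of consecutive similar frames into the mean of that run.

    A frame joins the current run when its cosine similarity to the run's
    first frame is at least `threshold` (a hypothetical setting for this toy).
    """
    runs = []
    for frame in frames:
        if runs and cosine(runs[-1][0], frame) >= threshold:
            runs[-1].append(frame)
        else:
            runs.append([frame])
    # Average each run column by column to get one frame per run.
    return [[sum(col) / len(run) for col in zip(*run)] for run in runs]


# Three made-up "sounds" as one-hot feature vectors.
a, b, c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]

slow = [a, a, a, b, b, c]   # same content, held longer
fast = [a, b, c, c]         # same content, different timing

slow_collapsed = collapse_similar_frames(slow)
fast_collapsed = collapse_similar_frames(fast)
```

In this toy, `slow_collapsed` and `fast_collapsed` both contain three frames, one per sound, even though the inputs have different lengths. Real speech features are noisy and continuous, so a learned similarity measure would be needed in place of raw cosine similarity on the input frames.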


