PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS

02/24/2023
by Junhyeok Lee, et al.

Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech. To address this issue, we propose PITS, an end-to-end pitch-controllable TTS model that utilizes variational inference to model pitch. Based on VITS, PITS incorporates the Yingram encoder, the Yingram decoder, and adversarial training of pitch-shifted synthesis to achieve pitch-controllability. Experiments demonstrate that PITS generates high-quality speech that is indistinguishable from ground truth speech and has high pitch-controllability without quality degradation. Code and audio samples will be available at https://github.com/anonymous-pits/pits.
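The abstract does not define the Yingram that the encoder and decoder operate on. As a rough illustration only, the sketch below assumes a Yingram-like feature built from the YIN algorithm's cumulative mean normalized difference function, sampled at the lags of semitone-spaced MIDI notes. The function names (`cmndf`, `midi_to_lag`, `yingram_frame`), the frame length, and the MIDI range are hypothetical choices for this example and are not taken from the PITS code.

```python
import math

def cmndf(frame, max_lag):
    """Cumulative mean normalized difference function (from the YIN algorithm)."""
    n = len(frame) - max_lag  # samples compared at every lag
    if n <= 0:
        raise ValueError("frame must be longer than max_lag")
    diff = [0.0] * (max_lag + 1)
    for tau in range(1, max_lag + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(n))
    out = [1.0] * (max_lag + 1)  # the value at lag 0 is defined as 1
    running = 0.0
    for tau in range(1, max_lag + 1):
        running += diff[tau]
        out[tau] = diff[tau] * tau / running if running > 0 else 1.0
    return out

def midi_to_lag(midi, sample_rate):
    """Period, in samples, of a MIDI note (A4 = note 69 = 440 Hz)."""
    return sample_rate / (440.0 * 2.0 ** ((midi - 69) / 12.0))

def yingram_frame(frame, sample_rate, midi_lo, midi_hi):
    """One Yingram-like column: the CMNDF sampled at semitone-spaced lags.

    Assumption: fractional lags are handled by linear interpolation.
    """
    max_lag = int(math.ceil(midi_to_lag(midi_lo, sample_rate)))
    curve = cmndf(frame, max_lag)
    column = []
    for midi in range(midi_lo, midi_hi + 1):
        lag = midi_to_lag(midi, sample_rate)
        lo = int(math.floor(lag))
        hi = min(lo + 1, max_lag)
        w = lag - lo
        column.append((1.0 - w) * curve[lo] + w * curve[hi])
    return column
```

Under these assumptions, a periodic frame produces a dip at the bin whose lag matches its period, so raising the pitch by k semitones moves the dip by k bins. That bin-shift property is what makes a representation of this kind convenient for pitch control without an explicit fundamental frequency track.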


research
07/13/2022

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech

Some recent studies have demonstrated the feasibility of single-stage ne...
research
06/08/2023

VIFS: An End-to-End Variational Inference for Foley Sound Synthesis

The goal of DCASE 2023 Challenge Task 7 is to generate various sound cli...
research
07/08/2022

End-to-End Binaural Speech Synthesis

In this work, we present an end-to-end binaural speech synthesis system ...
research
10/28/2022

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis

Several fully end-to-end text-to-speech (TTS) models have been proposed ...
research
10/18/2022

Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses

We propose a training method for spontaneous speech synthesis models tha...
research
08/07/2020

Controllable Neural Prosody Synthesis

Speech synthesis has recently seen significant improvements in fidelity,...
research
07/30/2018

Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis

Generating versatile and appropriate synthetic speech requires control o...
