Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch

04/12/2022
by Hanbin Bae, et al.

FastPitch, a recently developed pitch-controllable text-to-speech (TTS) model, is conditioned on pitch contours. However, the quality of its synthesized speech degrades considerably for pitch values that deviate significantly from the average pitch; that is, its pitch controllability is limited. To address this issue, we propose two algorithms that improve the robustness of FastPitch. First, we propose a novel timbre-preserving pitch-shifting algorithm for natural pitch augmentation. Speech samples pitch-shifted with the proposed algorithm sound more natural because the speaker's vocal timbre is maintained. Second, we propose a training algorithm that trains FastPitch on pitch-augmented speech datasets with different pitch ranges for the same sentence. The experimental results demonstrate that the proposed algorithms improve the pitch controllability of FastPitch.
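To make the idea of "different pitch ranges for the same sentence" concrete, here is a minimal sketch that produces several shifted copies of one F0 contour. It is not the paper's algorithm: it only scales the pitch targets by a semitone ratio and does not touch the waveform or implement the timbre-preserving step, which the abstract does not detail. The function names, the shift values, and the convention that unvoiced frames are stored as 0.0 are all assumptions made for illustration.

```python
def shift_f0(f0_hz, semitones):
    """Shift the voiced frames of an F0 contour by a number of semitones.

    Assumption: unvoiced frames are encoded as 0.0 and are left unvoiced.
    """
    ratio = 2.0 ** (semitones / 12.0)  # one semitone is a factor of 2^(1/12)
    return [f * ratio if f > 0.0 else 0.0 for f in f0_hz]


def augment_pitch_ranges(f0_hz, shifts=(-4, -2, 0, 2, 4)):
    """Return one shifted copy of the contour per semitone offset.

    The default offsets are hypothetical, not values from the paper.
    """
    return {s: shift_f0(f0_hz, s) for s in shifts}


# Toy contour in Hz: unvoiced, two voiced frames, unvoiced.
contour = [0.0, 200.0, 220.0, 0.0]
variants = augment_pitch_ranges(contour)
```

Each entry of `variants` keeps the same voiced/unvoiced pattern as the input, so the same sentence is paired with several pitch ranges. Applying the matching shift to the audio while keeping the vocal timbre is the part the proposed pitch-shifting algorithm addresses.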


