Controllable Neural Prosody Synthesis

08/07/2020
by Max Morrison, et al.

Speech synthesis has recently seen significant improvements in fidelity, driven by the advent of neural vocoders and neural prosody generators. However, these systems lack intuitive user controls over prosody, so they cannot rectify prosody errors (e.g., misplaced emphases and contextually inappropriate emotions) or generate prosodies with diverse speaker excitement levels and emotions. We address these limitations with a user-controllable, context-aware neural prosody generator. Given a real or synthesized speech recording, our model allows a user to input prosody constraints for certain time frames and generates the remaining time frames from the input text and contextual prosody. We also propose a pitch-shifting neural vocoder that modifies the input speech to match the synthesized prosody. Through objective and subjective evaluations, we show that we can successfully incorporate user control into our prosody generation model without sacrificing the overall naturalness of the synthesized speech.
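The constraint interface described in the abstract can be pictured with a small sketch. This is not the authors' model. It assumes prosody is represented as one pitch value (Hz) per time frame, with None marking frames the user left unconstrained, and it swaps in plain linear interpolation where the paper uses a neural generator conditioned on text and contextual prosody. The function name and the data layout are hypothetical.

```python
from typing import List, Optional


def fill_unconstrained_frames(constraints: List[Optional[float]]) -> List[float]:
    """Return a per-frame pitch contour (Hz) that keeps every user-set frame.

    Stand-in only: gaps are filled by linear interpolation between the
    nearest constrained frames, NOT by a neural prosody generator.
    """
    # Indices of frames where the user supplied a pitch value.
    known = [i for i, v in enumerate(constraints) if v is not None]
    if not known:
        raise ValueError("at least one frame must be constrained")

    contour: List[float] = []
    for i, value in enumerate(constraints):
        if value is not None:
            # User constraints are copied through unchanged.
            contour.append(value)
            continue
        # Nearest constrained frames on either side of the gap, if any.
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            contour.append(constraints[right])
        elif right is None:
            contour.append(constraints[left])
        else:
            w = (i - left) / (right - left)
            contour.append((1 - w) * constraints[left] + w * constraints[right])
    return contour


# Hypothetical input: the user pins frames 1 and 4, the rest are generated.
user_constraints = [None, 200.0, None, None, 260.0, None]
contour = fill_unconstrained_frames(user_constraints)
```

The property the sketch is meant to show is that constrained frames pass through untouched while only the remaining frames are generated. A learned generator would replace the interpolation branch.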

research
06/05/2023

Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

Regressive Text-to-Speech (TTS) system utilizes attention mechanism to g...
research
04/13/2023

Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis

Recent advances in text-to-speech have significantly improved the expres...
research
11/21/2019

Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features

This paper presents a simple yet effective method to achieve prosody tra...
research
02/24/2023

PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS

Previous pitch-controllable text-to-speech (TTS) models rely on directly...
research
05/06/2021

Learning Controllable Content Generators

It has recently been shown that reinforcement learning can be used to tr...
research
07/11/2006

Interactive Hatching and Stippling by Example

We describe a system that lets a designer interactively draw patterns of...
research
08/13/2020

Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

Recent neural speech synthesis systems have gradually focused on the con...
