Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis

03/02/2022
by Pengyu Cheng, et al.

In this paper, we propose a method of speaker adaptation with intuitive prosodic features for statistical parametric speech synthesis. The intuitive prosodic features employed in this method are pitch, pitch range, speech rate and energy, chosen because they are directly related to the overall prosodic characteristics of different speakers. The features are extracted at the utterance level or the speaker level and are integrated into the existing speaker-encoding-based and speaker-embedding-based adaptation frameworks, respectively. The acoustic models are sequence-to-sequence models based on Tacotron2. The intuitive prosodic features are concatenated with the text encoder outputs and speaker vectors for decoding acoustic features. Experimental results demonstrate that the proposed methods achieve better objective and subjective performance than the baseline methods without intuitive prosodic features. In addition, the proposed speaker adaptation method with utterance-level prosodic features achieves the best similarity of synthetic speech among all compared methods.
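The concatenation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the feature definitions (pitch statistics over voiced frames only, pitch range as maximum minus minimum, speech rate as phonemes per second) and the plain-list data layout are all assumptions made for illustration, and any feature normalization is omitted.

```python
from statistics import mean


def prosodic_features(f0_hz, frame_energy, num_phonemes, duration_s):
    """Return [mean pitch, pitch range, speech rate, mean energy] for one utterance.

    Assumed definitions (hypothetical, not taken from the paper):
    pitch statistics use voiced frames only (f0 > 0), pitch range is
    max minus min, and speech rate is phonemes per second.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    return [
        mean(voiced),
        max(voiced) - min(voiced),
        num_phonemes / duration_s,
        mean(frame_energy),
    ]


def build_decoder_inputs(encoder_outputs, speaker_vector, prosody_vector):
    """Repeat the utterance-level vectors at every encoder step and concatenate."""
    return [
        list(step) + list(speaker_vector) + list(prosody_vector)
        for step in encoder_outputs
    ]


# Toy utterance: 5 frames, 6 phonemes, 2 seconds long.
f0 = [0.0, 100.0, 120.0, 140.0, 0.0]        # 0.0 marks unvoiced frames
energy = [0.25, 0.5, 0.75, 0.5, 0.5]
prosody = prosodic_features(f0, energy, num_phonemes=6, duration_s=2.0)

encoder_outputs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 steps, 2 dims each
speaker_vector = [1.0, 0.0, 0.0]                        # hypothetical 3-dim vector
decoder_inputs = build_decoder_inputs(encoder_outputs, speaker_vector, prosody)
```

Each row of `decoder_inputs` holds one encoder step followed by the speaker vector and the four prosodic values, so the decoder would see the same speaker and prosody information at every step. A speaker-level variant could pass one prosody vector shared by all utterances of a speaker instead of a per-utterance one.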

research
06/18/2019

A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation

By representing speaker characteristic as a single fixed-length vector e...
research
06/21/2022

Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS

This paper proposes a human-in-the-loop speaker-adaptation method for mu...
research
03/20/2022

ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis

In recent years, neural network based methods for multi-speaker text-to-...
research
11/13/2022

OverFlow: Putting flows on top of neural transducers for better TTS

Neural HMMs are a type of neural transducer recently proposed for sequen...
research
11/19/2021

Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis

This paper presents a method for controlling the prosody at the phoneme ...
research
03/29/2022

DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning

Most text-to-speech (TTS) methods use high-quality speech corpora record...
research
11/29/2022

Controllable speech synthesis by learning discrete phoneme-level prosodic representations

In this paper, we present a novel method for phoneme-level prosody contr...
