Multi-Scale Spectrogram Modelling for Neural Text-to-Speech

06/29/2021
by   Ammar Abbas, et al.

We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to synthesise speech with improved coarse and fine-grained prosody. We present a generic multi-scale spectrogram prediction mechanism in which the system first predicts coarser-scale mel-spectrograms that capture the suprasegmental information in speech, and then uses these coarser-scale mel-spectrograms to predict finer-scale mel-spectrograms that capture fine-grained prosody. We present details for two specific versions of MSS, called Word-level MSS and Sentence-level MSS, in which the scales are motivated by linguistic units. Word-level MSS models word-, phoneme-, and frame-level spectrograms, while Sentence-level MSS additionally models a sentence-level spectrogram. Subjective evaluations show that Word-level MSS performs statistically significantly better than the baseline on two voices.
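
As a rough illustration of what spectrograms at different linguistic scales could look like, the sketch below mean-pools frame-level mel-spectrogram frames over phoneme, word, and sentence spans. Mean pooling, the `pool_segments` helper, and the toy boundaries are all assumptions made for illustration; the abstract does not state how the coarser scales are computed, and the sketch covers only the multi-scale representations, not the prediction model.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # one mel-spectrogram frame (one value per mel bin)


def pool_segments(
    frames: Sequence[Frame], boundaries: Sequence[Tuple[int, int]]
) -> List[Frame]:
    """Average the frames inside each [start, end) segment.

    Assumption: mean pooling is used purely for illustration; the
    abstract does not say how coarser-scale spectrograms are derived.
    """
    pooled = []
    for start, end in boundaries:
        if not 0 <= start < end <= len(frames):
            raise ValueError(f"invalid segment [{start}, {end})")
        segment = frames[start:end]
        n_bins = len(segment[0])
        # Per-bin mean over all frames in the segment.
        pooled.append(
            [sum(f[b] for f in segment) / len(segment) for b in range(n_bins)]
        )
    return pooled


# Toy frame-level "spectrogram": 6 frames with 2 mel bins each.
frames = [
    [0.0, 1.0], [2.0, 3.0], [4.0, 5.0],
    [6.0, 7.0], [8.0, 9.0], [10.0, 11.0],
]

# Hypothetical alignments, given as [start, end) frame indices.
phoneme_bounds = [(0, 2), (2, 3), (3, 6)]
word_bounds = [(0, 3), (3, 6)]
sentence_bounds = [(0, 6)]

phoneme_level = pool_segments(frames, phoneme_bounds)    # one vector per phoneme
word_level = pool_segments(frames, word_bounds)          # one vector per word
sentence_level = pool_segments(frames, sentence_bounds)  # one vector per sentence
```

In this toy setup the sentence-level result is a single vector, the word-level result has one vector per word, and so on down to the original frames, which mirrors the coarse-to-fine ordering of scales described above.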


