PoeticTTS – Controllable Poetry Reading for Literary Studies

07/11/2022
by Julia Koch, et al.

Speech synthesis for poetry is challenging due to specific intonation patterns inherent to poetic speech. In this work, we propose an approach to synthesise poems with almost human-like naturalness in order to enable literary scholars to systematically examine hypotheses on the interplay between text, spoken realisation, and the listener's perception of poems. To meet these special requirements for literary studies, we resynthesise poems by cloning prosodic values from a human reference recitation, and afterwards use fine-grained prosody control to manipulate the synthetic speech in a human-in-the-loop setting, altering the recitation with respect to specific phenomena. We find that finetuning our TTS model on poetry captures poetic intonation patterns to a large extent, which is beneficial for prosody cloning and manipulation, and we verify the success of our approach both in an objective evaluation and in human studies.


