Exploring emotional prototypes in a high dimensional TTS latent space

05/05/2021
by Pol van Rijn, et al.

Recent TTS systems can generate prosodically varied and realistic speech. However, it is unclear how this prosodic variation contributes to the perception of speakers' emotional states. Here we use the recent psychological paradigm 'Gibbs Sampling with People' to search the prosodic latent space of a trained GST Tacotron model for prototypes of emotional prosody. Participants are recruited online and collectively manipulate the latent space of the generative speech model in a sequentially adaptive way, so that the stimulus presented to one group of participants is determined by the responses of the previous groups. We demonstrate that (1) particular regions of the model's latent space are reliably associated with particular emotions, (2) the resulting emotional prototypes are well recognized by a separate group of human raters, and (3) these emotional prototypes can be effectively transferred to new sentences. Collectively, these experiments demonstrate a novel approach to understanding emotional speech by providing a tool for exploring the relation between the latent space of generative models and human semantics.
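
The sequentially adaptive procedure described above can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `simulated_rater` and `gsp_chain` are hypothetical, human participants are replaced by a noisy simulated rater with a hidden "prototype", the latent space is a plain list of numbers in [-1, 1] rather than GST Tacotron style-token weights, and aggregating each group's slider choices with the median is an assumption.

```python
import random
import statistics


def simulated_rater(candidates, dim, target):
    """Hypothetical stand-in for a human participant.

    Picks the slider position whose value on dimension `dim` is closest to a
    hidden prototype `target`, with Gaussian noise added to each judgment.
    """
    noisy = [
        (abs(c[dim] - target[dim]) + random.gauss(0, 0.05), c[dim])
        for c in candidates
    ]
    return min(noisy)[1]


def gsp_chain(n_dims, n_iterations, raters_per_step, target,
              slider_steps=21, seed=0):
    """Toy chain in the spirit of Gibbs Sampling with People.

    At each iteration one latent dimension is exposed as a slider while the
    others stay fixed. A group of raters each chooses a slider position, the
    group's choices are aggregated (median, an assumption), and the updated
    point becomes the starting stimulus for the next group.
    """
    random.seed(seed)
    state = [random.uniform(-1, 1) for _ in range(n_dims)]
    grid = [-1 + 2 * k / (slider_steps - 1) for k in range(slider_steps)]
    for it in range(n_iterations):
        dim = it % n_dims  # cycle through the dimensions in order
        # Candidate stimuli differ from the current state only along `dim`.
        candidates = [state[:dim] + [v] + state[dim + 1:] for v in grid]
        choices = [
            simulated_rater(candidates, dim, target)
            for _ in range(raters_per_step)
        ]
        state[dim] = statistics.median(choices)
    return state


if __name__ == "__main__":
    hidden_prototype = [0.5, -0.3, 0.8]  # illustrative values only
    print(gsp_chain(n_dims=3, n_iterations=30, raters_per_step=5,
                    target=hidden_prototype))
```

In a real experiment each candidate point would be synthesized into speech by the TTS model and the slider choice would come from a listener judging how well the stimulus matches a given emotion; the simulated rater here only mimics that feedback loop.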


research
11/11/2022

Continuous Emotional Intensity Controllable Speech Synthesis using Semi-supervised Learning

With the rapid development of the speech synthesis system, recent text-t...
research
03/29/2022

VoiceMe: Personalized voice generation in TTS

Novel text-to-speech systems can generate entirely new voices that were ...
research
12/03/2018

Exploring galaxy evolution with generative models

Context. Generative models open up the possibility to interrogate scient...
research
11/04/2021

Generating Diverse Realistic Laughter for Interactive Art

We propose an interactive art project to make those rendered invisible b...
research
09/30/2019

Imagine That! Leveraging Emergent Affordances for Tool Synthesis in Reaching Tasks

In this paper we investigate an artificial agent's ability to perform ta...
research
11/23/2021

A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence

Some generative models for sequences such as music and text allow us to ...
research
08/17/2023

A Novel Loss Function Utilizing Wasserstein Distance to Reduce Subject-Dependent Noise for Generalizable Models in Affective Computing

Emotions are an essential part of human behavior that can impact thinkin...
