Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis

05/29/2023
by   Erik Ekstedt, et al.

Turn-taking is a fundamental aspect of human communication, in which speakers use prosodic cues to convey their intention to either hold or yield their turn. Using the recently proposed Voice Activity Projection model, we propose an automatic evaluation approach to measure these aspects for conversational speech synthesis. We investigate the ability of three commercial and two open-source Text-To-Speech (TTS) systems to generate turn-taking cues over simulated turns. By varying the stimuli, or controlling the prosody, we analyze the models' performance. We show that while commercial TTS systems largely provide appropriate cues, they often produce ambiguous signals, and that further improvements are possible. TTS systems, trained on read or spontaneous speech, produce strong turn-hold but weak turn-yield cues. We argue that this approach, which focuses on functional aspects of interaction, provides a useful addition to other important speech metrics, such as intelligibility and naturalness.

research
05/03/2023

What makes a good pause? Investigating the turn-holding effects of fillers

Filled pauses (or fillers), such as "uh" and "um", are frequent in spont...
research
08/29/2022

Turn-Taking Prediction for Natural Conversational Speech

While a streaming voice assistant system has been used in many applicati...
research
06/29/2018

Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs

For spoken dialog systems to conduct fluid conversational interactions w...
research
08/31/2018

Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs

In human conversational interactions, turn-taking exchanges can be coord...
research
09/13/2021

Joint prediction of truecasing and punctuation for conversational speech in low-resource scenarios

Capitalization and punctuation are important cues for comprehending writ...
research
05/19/2022

Voice Activity Projection: Self-supervised Learning of Turn-taking Events

The modeling of turn-taking in dialog can be viewed as the modeling of t...
research
09/15/2020

Pardon the Interruption: An Analysis of Gender and Turn-Taking in U.S. Supreme Court Oral Arguments

This study presents a corpus of turn changes between speakers in U.S. Su...
