Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs

06/29/2018
by Matthew Roddy, et al.

For spoken dialog systems to conduct fluid conversational interactions with users, the systems must be sensitive to turn-taking cues produced by a user. Models should be designed so that effective decisions can be made as to when it is appropriate, or not, for the system to speak. Traditional end-of-turn models, where decisions are made at utterance end-points, are limited in their ability to model fast turn-switches and overlap. A more flexible approach is to model turn-taking in a continuous manner using RNNs, where the system predicts speech probability scores for discrete frames within a future window. The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection. In this paper, we investigate optimal speech-related feature sets for making predictions at pauses and overlaps in conversation. We find that while traditional acoustic features perform well, part-of-speech features generally perform worse than word features. We show that our current models outperform previously reported baselines.
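
The continuous formulation described above can be illustrated with a minimal sketch. It assumes each speaker's audio has already been reduced to one binary speech/no-speech label per frame. The function names, the window length, and the averaging rule are hypothetical choices made for illustration; they are not taken from the paper and do not reproduce the authors' implementation.

```python
from typing import List


def future_window_targets(voice_activity: List[int], window: int) -> List[List[int]]:
    """Build training targets for continuous turn-taking prediction.

    For each frame t, the target is the list of speech/no-speech labels for
    frames t+1 .. t+window. Frames whose future window would run past the end
    of the recording are dropped, so the result has
    len(voice_activity) - window rows.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n_rows = max(len(voice_activity) - window, 0)
    return [voice_activity[t + 1 : t + 1 + window] for t in range(n_rows)]


def mean_score(predicted_window: List[float], start: int, length: int) -> float:
    """Average the predicted speech probabilities over part of the future window.

    This is one hypothetical way to turn a vector of per-frame probabilities
    into a single score that can be compared between speakers at a pause.
    """
    region = predicted_window[start : start + length]
    if not region:
        raise ValueError("empty region")
    return sum(region) / len(region)


# Toy example: 8 frames of voice activity, future window of 3 frames.
activity = [0, 0, 1, 1, 1, 0, 0, 1]
targets = future_window_targets(activity, window=3)

# Hypothetical model outputs for two speakers at the same pause frame.
score_a = mean_score([0.25, 0.5, 0.75], start=0, length=3)
score_b = mean_score([0.0, 0.25, 0.5], start=0, length=3)
likely_next_speaker = "A" if score_a > score_b else "B"
```

Because the targets cover a whole window rather than a single end-of-turn label, the same predictions can in principle support several kinds of decision, such as who is likely to speak after a pause or whether an overlap will continue.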


