Improving End-of-turn Detection in Spoken Dialogues by Detecting Speaker Intentions as a Secondary Task

05/09/2018
by   Zakaria Aldeneh, et al.

This work focuses on the use of acoustic cues for modeling turn-taking in dyadic spoken dialogues. Previous work has shown that speaker intentions (e.g., asking a question or uttering a backchannel) can influence turn-taking behavior and are good predictors of turn-transitions in spoken dialogues. However, speaker intentions are not readily available to automated systems at run-time, which makes it difficult to use this information to anticipate a turn-transition. To this end, we propose a multi-task neural approach for predicting turn-transitions and speaker intentions simultaneously. Our results show that adding the auxiliary task of speaker intention prediction improves the performance of turn-transition prediction in spoken dialogues, without relying on additional input features at run-time.
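The multi-task idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not specify the architecture, so the small feed-forward shared encoder, the three intention classes, the parameter layout, and the `aux_weight` factor below are all assumptions chosen for illustration. What the sketch does show is the general pattern: both tasks share one representation of the acoustic features, the intention labels contribute only to the training loss, and prediction at run-time uses the turn-transition head alone.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(softmax(logits)[target])


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def shared_encoder(features, params):
    """Hypothetical shared layer used by both tasks (assumption)."""
    return [math.tanh(v)
            for v in linear(params["enc_w"], params["enc_b"], features)]


def multitask_loss(features, turn_label, intent_label, params, aux_weight=0.5):
    """Primary turn-transition loss plus a weighted auxiliary intention loss.

    aux_weight is an illustrative hyperparameter, not a value from the paper.
    """
    hidden = shared_encoder(features, params)
    turn_logits = linear(params["turn_w"], params["turn_b"], hidden)
    intent_logits = linear(params["intent_w"], params["intent_b"], hidden)
    turn_loss = cross_entropy(turn_logits, turn_label)
    intent_loss = cross_entropy(intent_logits, intent_label)
    return turn_loss + aux_weight * intent_loss, turn_loss, intent_loss


def predict_turn(features, params):
    """Run-time prediction: only acoustic features and the turn head are used."""
    hidden = shared_encoder(features, params)
    logits = linear(params["turn_w"], params["turn_b"], hidden)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy parameters: 3 acoustic features, 2 hidden units,
# 2 turn classes (hold / switch), 3 intention classes (assumed).
params = {
    "enc_w": [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]],
    "enc_b": [0.0, 0.0],
    "turn_w": [[0.5, -0.5], [-0.5, 0.5]],
    "turn_b": [0.0, 0.0],
    "intent_w": [[0.2, 0.1], [-0.1, 0.3], [0.0, -0.2]],
    "intent_b": [0.0, 0.0, 0.0],
}

total, turn_loss, intent_loss = multitask_loss(
    [1.0, 0.5, -0.5], turn_label=1, intent_label=0, params=params
)
```

In a real system the gradients of the combined loss would update the shared encoder, which is how the auxiliary intention labels could shape the representation used for turn-transition prediction without being required as inputs at run-time.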
