Cross-Attention End-to-End ASR for Two-Party Conversations

07/24/2019
by Suyoun Kim, et al.

We present an end-to-end speech recognition model that learns the interaction between two speakers from turn-changing information. Unlike conventional speech recognition models, our model exploits both speakers' conversational-context history, spanning multiple turns, within an end-to-end framework. Specifically, we propose a speaker-specific cross-attention mechanism that attends to the output of the other speaker as well as that of the current speaker, in order to better recognize long conversations. We evaluate the models on the Switchboard conversational speech corpus and show that our model outperforms standard end-to-end speech recognition models.
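
As a rough illustration of the cross-attention idea described in the abstract, the sketch below attends separately over each speaker's past-utterance embeddings and joins the two context vectors. It is not the authors' implementation: the scaled dot-product scoring, the concatenation step, and all names (`attend`, `cross_speaker_context`, the toy vectors) are assumptions chosen for clarity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, history):
    """Scaled dot-product attention of one query over a list of history vectors.

    Returns (context_vector, attention_weights).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, h) / scale for h in history])
    context = [
        sum(w * h[i] for w, h in zip(weights, history))
        for i in range(len(query))
    ]
    return context, weights


def cross_speaker_context(query, own_history, other_history):
    """Attend to the current speaker's history and to the other speaker's history.

    Concatenating the two contexts is an illustrative assumption,
    not a detail taken from the paper.
    """
    own_ctx, own_weights = attend(query, own_history)
    other_ctx, other_weights = attend(query, other_history)
    return own_ctx + other_ctx, own_weights, other_weights


# Toy data: 2-dimensional embeddings of earlier utterances (hypothetical values).
query = [1.0, 0.0]
speaker_a_history = [[1.0, 0.0], [0.0, 1.0]]
speaker_b_history = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

context, w_a, w_b = cross_speaker_context(
    query, speaker_a_history, speaker_b_history
)
```

In this toy setup the query matches speaker A's first utterance more closely, so that utterance receives the larger weight, while speaker B's identical utterances share the weight evenly. A real system would learn projection matrices for queries, keys, and values rather than using raw embeddings.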

research
06/27/2019

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion

We present a novel conversational-context aware end-to-end speech recogn...
research
04/24/2022

Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization

This paper investigates a method for simulating natural conversation in ...
research
10/30/2020

Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews

Conversations between a clinician and a patient, in natural conditions, ...
research
08/20/2020

Dyadic Speech-based Affect Recognition using DAMI-P2C Parent-child Multimodal Interaction Dataset

Automatic speech-based affect recognition of individuals in dyadic conve...
research
02/06/2019

End-to-end Anchored Speech Recognition

Voice-controlled house-hold devices, like Amazon Echo or Google Home, fa...
research
05/29/2023

An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings

We performed an experimental review of current diarization systems for t...
research
08/04/2018

Triplet Network with Attention for Speaker Diarization

In automatic speech processing systems, speaker diarization is a crucial...
