Conversational Speech Recognition By Learning Conversation-level Characteristics

02/16/2022
by Kun Wei, et al.

Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech involving multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantage of characteristics specific to conversation, such as role preference and topical coherence. This paper proposes a conversational ASR model that explicitly learns conversation-level characteristics within the prevalent end-to-end neural framework. The model has two highlights. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is adopted to bias the outputs of the decoder toward words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves a maximum 12
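The abstract does not say how the topic model biases the decoder. As a minimal sketch, assuming the bias is a linear interpolation between the decoder's next-token distribution and a topic-conditioned word distribution, the idea could look like the code below. The interpolation scheme, the function names `topic_word_distribution` and `bias_decoder_output`, the `weight` parameter, and the toy vocabulary are all hypothetical illustrations, not the paper's method.

```python
from typing import Dict, List

# Hypothetical sketch: the interpolation scheme below is an illustrative
# assumption, not the method described in the paper.


def topic_word_distribution(
    topic_probs: List[float], topic_word_probs: List[Dict[str, float]]
) -> Dict[str, float]:
    """Marginalize over topics: p_topic(w) = sum_k p(k) * p(w | k)."""
    mixed: Dict[str, float] = {}
    for p_k, word_probs in zip(topic_probs, topic_word_probs):
        for word, p_w in word_probs.items():
            mixed[word] = mixed.get(word, 0.0) + p_k * p_w
    return mixed


def bias_decoder_output(
    decoder_probs: Dict[str, float],
    topic_probs: List[float],
    topic_word_probs: List[Dict[str, float]],
    weight: float = 0.3,
) -> Dict[str, float]:
    """Interpolate decoder probabilities with topic word probabilities.

    Only words in the decoder vocabulary are kept, so the result is
    renormalized to sum to one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    p_topic = topic_word_distribution(topic_probs, topic_word_probs)
    biased = {
        word: (1.0 - weight) * p + weight * p_topic.get(word, 0.0)
        for word, p in decoder_probs.items()
    }
    total = sum(biased.values())
    return {word: p / total for word, p in biased.items()}


# Toy example: the decoder slightly prefers "stop", but the predicted
# topics (mostly finance) shift probability mass toward "stock".
decoder_probs = {"stock": 0.4, "stop": 0.5, "shop": 0.1}
topic_probs = [0.8, 0.2]  # p(finance), p(shopping)
topic_word_probs = [
    {"stock": 0.7, "market": 0.3},  # finance topic
    {"shop": 0.6, "stop": 0.4},     # shopping topic
]
biased = bias_decoder_output(decoder_probs, topic_probs, topic_word_probs)
print(max(decoder_probs, key=decoder_probs.get))  # → stop
print(max(biased, key=biased.get))  # → stock
```

In this toy case the topic prior is enough to flip the top hypothesis from "stop" to "stock". Linear interpolation keeps the output a valid distribution; the paper may instead apply the bias in log space or inside the decoder itself.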

research
10/14/2022

Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition

TV subtitles are a rich source of transcriptions of many types of speech...
research
06/13/2021

Cross-sentence Neural Language Models for Conversational Speech Recognition

An important research direction in automatic speech recognition (ASR) ha...
research
03/03/2021

The Spatial Selective Auditory Attention of Cochlear Implant Users in Different Conversational Sound Levels

In multi speakers environments, cochlear implant (CI) users may attend t...
research
03/28/2017

Learning Similarity Functions for Pronunciation Variations

A significant source of errors in Automatic Speech Recognition (ASR) sys...
research
06/26/2018

Contextual ASR Adaptation for Conversational Agents

Statistical language models (LM) play a key role in Automatic Speech Rec...
research
01/31/2019

Exploring the context of recurrent neural network based conversational agents

Conversational agents have begun to rise both in the academic (in terms ...
research
08/19/2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Automatic speech recognition (ASR) based on transducers is widely used. ...