Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR

07/08/2019
by Felix Weninger, et al.

Sequence-to-sequence (seq2seq) based ASR systems have shown state-of-the-art performance while having clear advantages in terms of simplicity. However, comparisons are mostly done on speaker independent (SI) ASR systems, though speaker adapted conventional systems are commonly used in practice for improving robustness to speaker and environment variations. In this paper, we apply speaker adaptation to seq2seq models with the goal of matching the performance of conventional ASR adaptation. Specifically, we investigate Kullback-Leibler divergence (KLD) as well as Linear Hidden Network (LHN) based adaptation for seq2seq ASR, using different amounts (up to 20 hours) of adaptation data per speaker. Our SI models are trained on large amounts of dictation data and achieve state-of-the-art results. We obtained 25% relative word error rate (WER) improvement with KLD adaptation of the seq2seq model vs. 18.7% gain from acoustic model adaptation in the conventional system. We also show that the WER of the seq2seq model decreases log-linearly with the amount of adaptation data. Finally, we analyze adaptation based on the minimum WER criterion and adapting the language model (LM) for score fusion with the speaker adapted seq2seq model, which result in further improvements of the seq2seq system performance.
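As a rough illustration of the KLD adaptation idea named in the abstract, the sketch below follows the commonly used formulation of KLD-regularized adaptation: the ground-truth target is interpolated with the SI model's output distribution, and the adapted model is trained with cross-entropy against that interpolated target. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy three-symbol vocabulary, and the interpolation weight `rho = 0.5` are all hypothetical, and the abstract does not say how the loss is applied inside the seq2seq decoder.

```python
import math


def kld_regularized_targets(one_hot, si_posterior, rho):
    """Interpolate the ground-truth target with the SI model's posterior.

    rho = 0 recovers plain cross-entropy training on the adaptation data;
    rho = 1 ignores the labels and keeps the model at the SI distribution.
    """
    return [(1.0 - rho) * y + rho * p for y, p in zip(one_hot, si_posterior)]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(max(q, eps)) for t, q in zip(target, predicted))


def kld_adaptation_loss(one_hot, si_posterior, adapted_posterior, rho=0.5):
    """Loss for one output step of the speaker-adapted model (hypothetical helper)."""
    target = kld_regularized_targets(one_hot, si_posterior, rho)
    return cross_entropy(target, adapted_posterior)


# Toy example: three output symbols, the reference label is symbol 1.
one_hot = [0.0, 1.0, 0.0]
si_posterior = [0.2, 0.7, 0.1]       # assumed output of the frozen SI model
adapted_posterior = [0.1, 0.8, 0.1]  # assumed output of the model being adapted

loss = kld_adaptation_loss(one_hot, si_posterior, adapted_posterior, rho=0.5)
print(f"KLD-regularized loss: {loss:.4f}")
```

In this formulation a larger `rho` keeps the adapted model closer to the SI model, which is typically preferred when little adaptation data is available per speaker.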

research
06/24/2022

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

A key challenge for automatic speech recognition (ASR) systems is to mod...
research
02/15/2023

Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

Speaker adaptation techniques provide a powerful solution to customise a...
research
06/23/2022

Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

Fundamental modelling differences between hybrid and end-to-end (E2E) au...
research
05/28/2018

Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks

While there has been substantial amount of work in speaker diarization r...
research
06/26/2023

Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

Rich sources of variability in natural speech present significant challe...
research
02/06/2017

DNN adaptation by automatic quality estimation of ASR hypotheses

In this paper we propose to exploit the automatic Quality Estimation (QE...
research
11/11/2022

Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting

In this paper, we present a novel approach to adapt a sequence-to-sequen...