Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

06/26/2023
by Jiajun Deng, et al.

Rich sources of variability in natural speech present significant challenges to current data-intensive speech recognition technologies. To model both speaker-level and environment-level diversity, this paper proposes a novel Bayesian factorised speaker-environment adaptive training and test-time adaptation approach for Conformer ASR models. Speaker-level and environment-level characteristics are modeled separately using compact hidden output transforms, which are then linearly or hierarchically combined to represent any speaker-environment combination. Bayesian learning is further used to model the uncertainty of the adaptation parameters. Experiments on the 300-hr Switchboard data corrupted with WHAM noise suggest that factorised adaptation consistently outperforms the baseline and the speaker-label-only adapted Conformers by up to 3.1 [...] shows the proposed method offers potential for rapid adaptation to unseen speaker-environment conditions.
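
To make the two combination schemes named in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes the hidden output transforms are element-wise scaling vectors with a 2*sigmoid activation (an LHUC-style choice that the abstract does not confirm), that the linear scheme is a weighted sum of the two scaling vectors, and that the hierarchical scheme applies them one after the other. The names `lhuc_scale`, `combine_linear`, `combine_hierarchical`, `w_spk` and `w_env` are hypothetical, and the Bayesian treatment of parameter uncertainty is left out.

```python
import math

def lhuc_scale(r):
    """Map unconstrained parameters to scaling factors in (0, 2).
    The 2*sigmoid activation is an assumption, not taken from the abstract."""
    return [2.0 / (1.0 + math.exp(-x)) for x in r]

def combine_linear(h, r_spk, r_env, w_spk=0.5, w_env=0.5):
    """Assumed linear scheme: scale hidden outputs by a weighted sum of the
    speaker and environment scaling vectors (weights are hypothetical)."""
    s_spk, s_env = lhuc_scale(r_spk), lhuc_scale(r_env)
    return [x * (w_spk * a + w_env * b) for x, a, b in zip(h, s_spk, s_env)]

def combine_hierarchical(h, r_spk, r_env):
    """Assumed hierarchical scheme: apply the speaker transform first,
    then the environment transform on top of it."""
    s_spk, s_env = lhuc_scale(r_spk), lhuc_scale(r_env)
    return [x * a * b for x, a, b in zip(h, s_spk, s_env)]

# Toy hidden-layer output for one frame; zero parameters give unit scaling,
# so both schemes leave the hidden outputs unchanged before adaptation.
hidden = [0.5, -1.0, 2.0]
zeros = [0.0, 0.0, 0.0]
out_linear = combine_linear(hidden, zeros, zeros)
out_hier = combine_hierarchical(hidden, zeros, zeros)
```

Under these assumptions, an unseen speaker-environment pair needs no new parameters: a speaker vector and an environment vector estimated separately can simply be paired at test time.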


research
06/24/2022

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

A key challenge for automatic speech recognition (ASR) systems is to mod...
research
02/15/2023

Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

Speaker adaptation techniques provide a powerful solution to customise a...
research
05/18/2023

Use of Speech Impairment Severity for Dysarthric Speech Recognition

A key challenge in dysarthric speech recognition is the speaker-level di...
research
06/12/2023

Parameter-efficient Dysarthric Speech Recognition Using Adapter Fusion and Householder Transformation

In dysarthric speech recognition, data scarcity and the vast diversity b...
research
03/28/2022

On-the-fly Feature Based Speaker Adaptation for Dysarthric and Elderly Speech Recognition

Automatic recognition of dysarthric and elderly speech highly challengin...
research
04/02/2022

Speaker adaptation for Wav2vec2 based dysarthric ASR

Dysarthric speech recognition has posed major challenges due to lack of ...
research
07/08/2019

Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR

Sequence-to-sequence (seq2seq) based ASR systems have shown state-of-the...
