Investigation of Speaker-adaptation methods in Transformer based ASR

08/07/2020
by   Vishwas M. Shetty, et al.

End-to-end models are fast replacing conventional hybrid models in automatic speech recognition. The transformer is a sequence-to-sequence framework based solely on attention that was initially applied to the machine translation task. This end-to-end framework has been shown to give promising results when used for automatic speech recognition as well. In this paper, we explore different ways of incorporating speaker information while training a transformer-based model to improve its performance. We present speaker information in the form of speaker embeddings for each of the speakers. Two broad categories of speaker embeddings are used: (i) fixed embeddings, and (ii) learned embeddings. We experiment with speaker embeddings learned along with the model training, as well as with one-hot vectors and x-vectors. Using these different speaker embeddings, we obtain an average relative improvement of 1 error rate. We report results on the NPTEL lecture database. NPTEL is an open-source e-learning portal providing content from top Indian universities.
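To make the two embedding categories concrete, here is a minimal sketch in Python with NumPy. The abstract does not say how the speaker embedding is combined with the acoustic input, so appending it to every feature frame is an assumption made for illustration, not the authors' stated method. All names (`one_hot_speaker`, `LearnedSpeakerTable`, `append_speaker_embedding`) and the dimensions are hypothetical. In a real system the learned table would be updated by backpropagation together with the transformer; here it is only initialized.

```python
import numpy as np


def one_hot_speaker(speaker_id: int, num_speakers: int) -> np.ndarray:
    """Fixed embedding: a one-hot vector identifying the speaker."""
    vec = np.zeros(num_speakers, dtype=np.float32)
    vec[speaker_id] = 1.0
    return vec


class LearnedSpeakerTable:
    """Learned embedding: one trainable row per speaker.

    Only the lookup is shown; a training framework would update
    `self.table` jointly with the ASR model parameters.
    """

    def __init__(self, num_speakers: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(0.0, 0.1, size=(num_speakers, dim)).astype(np.float32)

    def lookup(self, speaker_id: int) -> np.ndarray:
        return self.table[speaker_id]


def append_speaker_embedding(features: np.ndarray, speaker_vec: np.ndarray) -> np.ndarray:
    """Append the same speaker vector to every frame (assumed fusion scheme).

    features:    (T, F) acoustic features for one utterance
    speaker_vec: (D,) speaker embedding
    returns:     (T, F + D) speaker-aware features
    """
    tiled = np.tile(speaker_vec, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)


# Example: 5 frames of 80-dimensional features, 4 known speakers.
frames = np.zeros((5, 80), dtype=np.float32)
with_one_hot = append_speaker_embedding(frames, one_hot_speaker(2, 4))
with_learned = append_speaker_embedding(frames, LearnedSpeakerTable(4, 16).lookup(2))
```

An x-vector would slot into `append_speaker_embedding` the same way as the one-hot vector: it is a fixed vector produced by a separately trained speaker model, so only its dimensionality differs.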

research
08/13/2019

End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning

This paper presents our latest investigation on end-to-end automatic spe...
research
01/02/2020

Speaker-aware speech-transformer

Recently, end-to-end (E2E) models become a competitive alternative to th...
research
11/16/2022

Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

We analyze the impact of speaker adaptation in end-to-end architectures ...
research
04/05/2021

End-to-End Speaker-Attributed ASR with Transformer

This paper presents our recent effort on end-to-end speaker-attributed a...
research
10/08/2020

Gender domain adaptation for automatic speech recognition task

This paper is focused on the finetuning of acoustic models for speaker a...
research
06/06/2023

Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering

The challenge of fairness arises when Automatic Speech Recognition (ASR)...
research
06/14/2019

Cumulative Adaptation for BLSTM Acoustic Models

This paper addresses the robust speech recognition problem as an adaptat...
