Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR

02/14/2020
by Leda Sarı, et al.

We propose an unsupervised speaker adaptation method for end-to-end (E2E) automatic speech recognition (ASR), inspired by the neural Turing machine. The proposed model contains a memory block that holds speaker i-vectors extracted from the training data and reads the relevant i-vectors from this memory through an attention mechanism. The resulting memory vector (M-vector) is concatenated to either the acoustic features or the hidden layer activations of an E2E neural network model. The E2E ASR system is based on the joint connectionist temporal classification (CTC) and attention-based encoder-decoder architecture. We compare M-vectors and i-vectors inserted at different layers of the encoder network on the WSJ and TED-LIUM2 ASR benchmarks. M-vectors, which do not require an auxiliary speaker embedding extraction system at test time, achieve word error rates (WERs) similar to those of i-vectors on single-speaker utterances and significantly lower WERs on utterances that contain speaker changes.
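The memory read described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the scaled dot-product scoring, the single hand-written query vector, and the names `read_memory` and `append_m_vector` are assumptions made for illustration, since the abstract does not say how the attention scores or the query are computed. The memory rows stand in for training-speaker i-vectors, and the M-vector is the attention-weighted sum of those rows.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, memory):
    """Attend over memory slots and return (m_vector, attention_weights).

    query:  list of D floats (assumed to come from a learned projection)
    memory: list of N slots, each a list of D floats (stand-ins for i-vectors)
    """
    dim = len(query)
    # Assumption: scaled dot-product scores between the query and each slot.
    scores = [
        sum(q * m for q, m in zip(query, slot)) / math.sqrt(dim)
        for slot in memory
    ]
    weights = softmax(scores)
    # The M-vector is the attention-weighted sum of the memory slots.
    m_vector = [
        sum(w * slot[d] for w, slot in zip(weights, memory))
        for d in range(dim)
    ]
    return m_vector, weights


def append_m_vector(frames, m_vector):
    """Concatenate the same M-vector to every feature frame."""
    return [list(frame) + list(m_vector) for frame in frames]


# Toy data: 3 hypothetical speaker vectors of dimension 2,
# and 2 frames of 3-dimensional acoustic features.
memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
frames = [[0.2, 0.1, 0.4], [0.3, 0.0, 0.5]]
query = [2.0, 0.0]  # hypothetical stand-in for a learned query

m_vector, weights = read_memory(query, memory)
augmented = append_m_vector(frames, m_vector)
```

Because the M-vector is a convex combination of stored training-speaker vectors, nothing in this read step needs a separate speaker embedding extractor at test time, which matches the property the abstract highlights. The same concatenation could be applied to hidden layer activations instead of input features.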

