Improving Robustness In Speaker Identification Using A Two-Stage Attention Model

09/24/2019
by Yanpei Shi, et al.

This paper proposes a novel framework for speaker recognition based on a two-stage attention model. In recent years, deep neural networks such as the time delay neural network (TDNN), together with attention models, have boosted speaker recognition performance. However, speaker recognition in severe acoustic environments remains challenging. To build a speaker recognition system that is robust against noise, we employ a two-stage attention model and combine it with a TDNN model. In this framework, the attention mechanism is applied in two spaces: the embedding space and the temporal space. The embedding attention model, built in the embedding space, highlights the importance of each embedding element by weighting the elements with self-attention. The frame attention model, built in the temporal space, aims to find which frames are significant for speaker recognition. To evaluate the effectiveness and robustness of our approach, we use the TIMIT dataset and test under five kinds of noise at different signal-to-noise ratios (SNRs). The experimental results show that our approach outperforms three strong baselines (CNN, TDNN and TDNN+attention) in different conditions. The correct recognition rate obtained with our approach still reaches 49.1% when the noise is white noise and the SNR is 0 dB.
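The two attention stages described above can be illustrated with a minimal sketch. The abstract does not give the scoring functions, layer sizes, or how the stages connect to the TDNN, so everything here is an assumption made for illustration: the frame embeddings stand in for TDNN outputs, `w_embed` and `v_frame` are hypothetical learnable parameters, and a plain softmax over linear scores stands in for whatever attention form the authors actually use.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embedding_attention(frame, w_embed):
    """Stage 1 (embedding space): weight each element of one frame embedding.

    Assumption: scores come from a hypothetical D x D linear map `w_embed`,
    and the softmax runs across the D embedding elements.
    """
    scores = [sum(w * x for w, x in zip(row, frame)) for row in w_embed]
    weights = softmax(scores)
    return [a * x for a, x in zip(weights, frame)]


def frame_attention(frames, v_frame):
    """Stage 2 (temporal space): score every frame, then pool over time.

    Assumption: each frame gets a scalar score from a hypothetical vector
    `v_frame`; the softmax runs across the T frames.
    """
    scores = [sum(v * x for v, x in zip(v_frame, f)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def two_stage_attention(frames, w_embed, v_frame):
    """Apply embedding attention per frame, then frame attention over time."""
    weighted = [embedding_attention(f, w_embed) for f in frames]
    return frame_attention(weighted, v_frame)


# Toy input: 5 frames of 4-dimensional embeddings with random parameters.
rng = random.Random(0)
num_frames, dim = 5, 4
frames = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_frames)]
w_embed = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
v_frame = [rng.gauss(0, 1) for _ in range(dim)]

utterance_embedding, frame_weights = two_stage_attention(frames, w_embed, v_frame)
print("frame weights:", [round(w, 3) for w in frame_weights])
print("utterance embedding size:", len(utterance_embedding))
```

In this sketch the frame weights sum to one, so frames with low scores contribute little to the pooled utterance-level vector. That is one way to read the abstract's claim that frame attention finds the frames that matter for speaker recognition.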

research
01/14/2020

Robust Speaker Recognition Using Speech Enhancement And Attention Model

In this paper, a novel architecture for speaker recognition is proposed ...
research
10/21/2020

Improving Audio Anomalies Recognition Using Temporal Convolutional Attention Network

Anomalous audio in speech recordings is often caused by speaker voice di...
research
09/25/2018

Attention Mechanism in Speaker Recognition: What Does It Learn in Deep Speaker Embedding?

This paper presents an experimental study on deep speaker embedding with...
research
01/28/2020

Masked cross self-attention encoding for deep speaker embedding

In general, speaker verification tasks require the extraction of speaker...
research
04/26/2018

On deep speaker embeddings for text-independent speaker recognition

We investigate deep neural network performance in the text-independent sp...
research
05/18/2023

Validation of an ECAPA-TDNN system for Forensic Automatic Speaker Recognition under case work conditions

Different variants of a Forensic Automatic Speaker Recognition (FASR) sy...
research
10/01/2019

Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks

We present an approach to tackle the speaker recognition problem using T...
