Speaker verification using attentive multi-scale convolutional recurrent network

06/01/2023
by   Yanxiong Li, et al.
0

In this paper, we propose a speaker verification method by an Attentive Multi-scale Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local spatial information and global sequential information from the input speech recordings. In the proposed method, logarithm Mel spectrum is extracted from each speech recording and then fed to the proposed AMCRN for learning speaker embedding. Afterwards, the learned speaker embedding is fed to the back-end classifier (such as cosine similarity metric) for scoring in the testing stage. The proposed method is compared with state-of-the-art methods for speaker verification. Experimental data are three public datasets that are selected from two large-scale speech corpora (VoxCeleb1 and VoxCeleb2). Experimental results show that our method exceeds baseline methods in terms of equal error rate and minimal detection cost function, and has advantages over most of baseline methods in terms of computational complexity and memory requirement. In addition, our method generalizes well across truncated speech segments with different durations, and the speaker embedding learned by the proposed AMCRN has stronger generalization ability across two back-end classifiers.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/30/2021

Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech

Speaker verification has been studied mostly under the single-talker con...
research
10/24/2020

Raw-x-vector: Multi-scale Time Domain Speaker Embedding Network

State-of-the-art text-independent speaker verification systems typically...
research
11/01/2018

Deep Segment Attentive Embedding for Duration Robust Speaker Verification

LSTM-based speaker verification usually uses a fixed-length local segmen...
research
07/23/2019

LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization

More and more neural network approaches have achieved considerable impro...
research
04/24/2022

Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Although few-shot learning has attracted much attention from the fields ...
research
11/30/2020

Look who's not talking

The objective of this work is speaker diarisation of speech recordings '...
research
03/10/2022

Parameter-Free Attentive Scoring for Speaker Verification

This paper presents a novel study of parameter-free attentive scoring fo...

Please sign up or login with your details

Forgot password? Click here to reset