S-vectors: Speaker Embeddings based on Transformer's Encoder for Text-Independent Speaker Verification

08/11/2020
by   Metilda Sagaya Mary N J, et al.

X-vectors have become the standard for speaker embeddings in automatic speaker verification. X-vectors are obtained using a Time-delay Neural Network (TDNN) with context over several frames. We have explored the use of an architecture built on self-attention, which attends to all the features over the entire utterance and hence better captures speaker-level characteristics. We have used the encoder structure of the Transformer, which is built on self-attention, as the base architecture and trained it on a speaker classification task. In this paper, we propose deriving speaker embeddings from the output of the trained Transformer encoder after statistics pooling, which yields utterance-level features. We call the speaker embeddings from this structure s-vectors. s-vectors outperform x-vectors with a relative improvement of 10 when trained on the Voxceleb-1 only and Voxceleb-1+2 datasets. We have also investigated the effect of deriving s-vectors from different layers of the model.
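
The statistics pooling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which the per-dimension mean and standard deviation of the frame-level encoder outputs are concatenated into one utterance-level vector; the abstract does not spell out the exact statistics used, and the function name `statistics_pooling` and the choice of population standard deviation are illustrative assumptions, not the authors' implementation.

```python
from statistics import fmean, pstdev


def statistics_pooling(frames):
    """Pool frame-level vectors into one utterance-level vector.

    frames: list of equal-length lists, one per frame (num_frames x feat_dim).
    Returns the per-dimension means followed by the per-dimension standard
    deviations, so the output has length 2 * feat_dim.
    """
    if not frames:
        raise ValueError("need at least one frame")
    # Transpose so that each tuple holds one feature dimension across all frames.
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    # Population standard deviation is an assumption made for this sketch.
    stds = [pstdev(d) for d in dims]
    return means + stds


# Two frames with two features each: a toy stand-in for encoder outputs.
pooled = statistics_pooling([[1.0, 2.0], [3.0, 2.0]])
print(pooled)
```

Because the pooled vector has a fixed length regardless of the number of frames, utterances of different durations map to embeddings of the same size, which is what makes a single utterance-level classifier possible.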

research
07/14/2021

Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding

This paper proposes a serialized multi-layer multi-head attention for ne...
research
05/26/2022

DT-SV: A Transformer-based Time-domain Approach for Speaker Verification

Speaker verification (SV) aims to determine whether the speaker's identi...
research
09/13/2019

Probing the Information Encoded in x-vectors

Deep neural network based speaker embeddings, such as x-vectors, have be...
research
05/24/2023

P-vectors: A Parallel-Coupled TDNN/Transformer Network for Speaker Verification

Typically, the Time-Delay Neural Network (TDNN) and Transformer can serv...
research
02/17/2023

Improving Transformer-based Networks With Locality For Automatic Speaker Verification

Recently, Transformer-based architectures have been explored for speaker...
research
07/07/2021

MACCIF-TDNN: Multi aspect aggregation of channel and context interdependence features in TDNN-based speaker verification

Most of the recent state-of-the-art results for speaker verification are...
research
10/29/2020

T-vectors: Weakly Supervised Speaker Identification Using Hierarchical Transformer Model

Identifying multiple speakers without knowing where a speaker's voice is...
