P-vectors: A Parallel-Coupled TDNN/Transformer Network for Speaker Verification

05/24/2023
by Xiyuan Wang, et al.

Typically, the Time-Delay Neural Network (TDNN) and the Transformer can serve as backbones for Speaker Verification (SV). Each has advantages and disadvantages for global and local feature modeling, and how to effectively integrate these two styles of features is still an open issue. In this paper, we explore a Parallel-coupled TDNN/Transformer Network (p-vectors) to replace serial hybrid networks. P-vectors allows the TDNN and Transformer to learn complementary information from each other through Soft Feature Alignment Interaction (SFAI) while preserving their local and global features. P-vectors also uses Spatial Frequency-channel Attention (SFA) to enhance the modeling of spatial interdependence in the input features. Finally, the outputs of the two branches of p-vectors are combined by an Embedding Aggregation Layer (EAL). Experiments show that p-vectors outperforms MACCIF-TDNN and MFA-Conformer with relative improvements of 11.5 VoxCeleb1-O.
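The abstract names the components but does not describe their internals. As a rough illustration of the parallel (rather than serial) coupling idea only, the following NumPy sketch runs a local TDNN-style branch and a global self-attention branch side by side on the same frame sequence, lets each branch softly attend to the other, and pools both into one embedding. Everything here is an assumption for illustration: `soft_interaction` is a hypothetical stand-in for SFAI, `aggregate` (mean and standard-deviation pooling plus a linear projection) is a hypothetical stand-in for EAL, SFA is omitted entirely, the sizes are arbitrary, and the weights are random rather than trained. It is not the authors' architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tdnn_layer(x, w, dilation=1):
    """Local branch: a 1-D dilated convolution over time (one TDNN layer).
    x: (T, D), w: (K, D, D) with K odd; zero padding keeps all T frames."""
    k = w.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        for j in range(k):
            out[t] += xp[t + j * dilation] @ w[j]
    return np.maximum(out, 0.0)  # ReLU

def self_attention(x, wq, wk, wv):
    """Global branch: single-head scaled dot-product attention over all frames."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return x + softmax(scores) @ v  # residual connection

def soft_interaction(a, b, alpha=0.5):
    """HYPOTHETICAL stand-in for SFAI: each branch attends over the other
    branch's frames and adds a scaled copy, so its own features are kept."""
    d = a.shape[-1]
    a_from_b = softmax(a @ b.T / np.sqrt(d)) @ b
    b_from_a = softmax(b @ a.T / np.sqrt(d)) @ a
    return a + alpha * a_from_b, b + alpha * b_from_a

def aggregate(a, b, w_out):
    """HYPOTHETICAL stand-in for EAL: mean/std pooling of each branch,
    concatenation, then a linear projection to the embedding size."""
    stats = np.concatenate([a.mean(0), a.std(0), b.mean(0), b.std(0)])
    return stats @ w_out

def p_vector_sketch(feats, params):
    # Both branches read the same input in parallel (no serial stacking).
    local = tdnn_layer(feats, params["tdnn"], dilation=2)
    glob = self_attention(feats, *params["attn"])
    local, glob = soft_interaction(local, glob)
    emb = aggregate(local, glob, params["out"])
    return emb / np.linalg.norm(emb)  # unit-length embedding

rng = np.random.default_rng(0)
T, D, E = 50, 16, 8  # frames, feature dim, embedding dim (illustrative sizes)
params = {
    "tdnn": rng.standard_normal((3, D, D)) * 0.1,
    "attn": tuple(rng.standard_normal((D, D)) * 0.1 for _ in range(3)),
    "out": rng.standard_normal((4 * D, E)) * 0.1,
}
feats = rng.standard_normal((T, D))  # stand-in for T frames of acoustic features
emb = p_vector_sketch(feats, params)
print(emb.shape)  # (8,)
```

Because both branches read the same input and exchange information only through the interaction step, each keeps its own local or global view instead of one branch consuming the other's output as in a serial hybrid. In a real SV system, embeddings from two utterances would typically be compared with cosine similarity.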

Related research

S-vectors: Speaker Embeddings based on Transformer's Encoder for Text-Independent Speaker Verification (08/11/2020)
X-vectors have become the standard for speaker-embeddings in automatic s...

Poformer: A simple pooling transformer for speaker verification (10/10/2021)
Most recent speaker verification systems are based on extracting speaker...

MACCIF-TDNN: Multi aspect aggregation of channel and context interdependence features in TDNN-based speaker verification (07/07/2021)
Most of the recent state-of-the-art results for speaker verification are...

Improving Transformer-based Networks With Locality For Automatic Speaker Verification (02/17/2023)
Recently, Transformer-based architectures have been explored for speaker...

Hybrid Local-Global Transformer for Image Dehazing (09/15/2021)
Recently, the Vision Transformer (ViT) has shown impressive performance ...

Dual-stream Time-Delay Neural Network with Dynamic Global Filter for Speaker Verification (03/20/2023)
The time-delay neural network (TDNN) is one of the state-of-the-art mode...

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention (05/20/2023)
In this paper, we propose ACA-Net, a lightweight, global context-aware s...
