ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

05/20/2023
by   Jia Qi Yip, et al.
0

In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling. ACA is able to distill large, variable-length sequences into small, fixed-sized latents by attending a small query to large key and value matrices. In ACA-Net, we build a Multi-Layer Aggregation (MLA) block using ACA to generate fixed-sized identity vectors from variable-length inputs. Through global attention, ACA-Net acts as an efficient global feature extractor that adapts to temporal variability unlike existing SV models that apply a fixed function for pooling over the temporal dimension which may obscure information about the signal's non-stationary temporal variability. Our experiments on the WSJ0-1talker show ACA-Net outperforms a strong baseline by 5% relative improvement in EER using only 1/5 of the parameters.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/26/2020

Double Multi-Head Attention for Speaker Verification

Most state-of-the-art Deep Learning systems for speaker verification are...
research
01/28/2020

Masked cross self-attention encoding for deep speaker embedding

In general, speaker verification tasks require the extraction of speaker...
research
05/14/2020

ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification

Current speaker verification techniques rely on a neural network to extr...
research
10/09/2021

Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification

With the development of deep learning, automatic speaker verification ha...
research
07/27/2020

Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Normalization for End-to-End Speaker Verification System

One of the most important parts of an end-to-end speaker verification sy...
research
11/03/2022

Dynamic Kernels and Channel Attention with Multi-Layer Embedding Aggregation for Speaker Verification

State-of-the-art speaker verification frameworks have typically focused ...
research
05/24/2023

P-vectors: A Parallel-Coupled TDNN/Transformer Network for Speaker Verification

Typically, the Time-Delay Neural Network (TDNN) and Transformer can serv...

Please sign up or login with your details

Forgot password? Click here to reset