Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification

10/31/2022
by   Jingyu Li, et al.

Deep convolutional neural networks (CNNs) have been applied to extracting speaker embeddings with significant success in speaker verification. Incorporating an attention mechanism has been shown to be effective in improving model performance. This paper presents an efficient two-dimensional convolution-based attention module, namely C2D-Att. Lightweight convolution layers bring the interaction between convolution channels and frequency into the attention calculation, which requires only a small number of parameters. Fine-grained attention weights are produced to represent channel- and frequency-specific information. The weights are applied to the input features to improve their representation ability for speaker modeling. C2D-Att is integrated into a modified version of ResNet for speaker embedding extraction. Experiments are conducted on the VoxCeleb datasets. The results show that C2D-Att is effective in generating discriminative attention maps and outperforms other attention methods. The proposed model shows robust performance across different model sizes and achieves state-of-the-art results.
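
The general idea of convolution-based channel-frequency attention can be illustrated with a minimal NumPy sketch. It is not the authors' C2D-Att implementation: the abstract does not give the layer configuration, so the time-average pooling, the single 3x3 kernel, the sigmoid gating, and all function names below are assumptions made only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, kernel):
    """Single-channel 2D cross-correlation with zero padding (output keeps x's shape).

    Assumes an odd-sized kernel.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def channel_frequency_attention(features, kernel):
    """Hypothetical attention over a (channels, frequency, time) feature map."""
    # Assumption: summarize each (channel, frequency) position by its mean over time.
    pooled = features.mean(axis=2)                   # shape (C, F)
    # A small 2D convolution over the channel-frequency plane lets each weight
    # depend on neighbouring channels and neighbouring frequency bins.
    weights = sigmoid(conv2d_same(pooled, kernel))   # shape (C, F), values in (0, 1)
    # Apply the weights to the input, broadcasting over the time axis.
    return features * weights[:, :, None], weights


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 10, 20))  # 8 channels, 10 frequency bins, 20 frames
kernel = 0.1 * rng.standard_normal((3, 3))   # only 9 parameters in this toy version
out, w = channel_frequency_attention(features, kernel)
```

Here `w` holds one weight per channel-frequency pair rather than one per channel, which is the sense in which such weights are fine-grained. In a trained model the kernel would be learned rather than drawn at random.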

research
07/10/2022

Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning

Recently, attention mechanisms have been applied successfully in neural ...
research
09/13/2021

Studying squeeze-and-excitation used in CNN for speaker verification

In speaker verification, the extraction of voice representations is main...
research
07/20/2022

Fine-grained Early Frequency Attention for Deep Speaker Recognition

Attention mechanisms have emerged as important tools that boost the perf...
research
03/01/2023

PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification

ECAPA-TDNN is currently the most popular TDNN-series model for speaker v...
research
04/03/2022

Selective Kernel Attention for Robust Speaker Verification

Recent state-of-the-art speaker verification architectures adopt multi-s...
research
10/13/2021

Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning

The use of channel-wise attention in CNN based speaker representation ne...
research
12/22/2020

FcaNet: Frequency Channel Attention Networks

Attention mechanism, especially channel attention, has gained great succ...
