Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning

10/13/2021
by Li Zhang, et al.

Channel-wise attention in CNN-based speaker representation networks has achieved remarkable performance in speaker verification (SV). However, these approaches simply average the feature maps over time and frequency before learning channel-wise attention, and so ignore the essential mutual interaction among the temporal, channel, and frequency scales. To address this problem, we propose Duality Temporal-Channel-Frequency (DTCF) attention, which re-calibrates the channel-wise features by aggregating global context along the temporal and frequency dimensions. Specifically, the duality attention, consisting of time-channel (T-C) attention and frequency-channel (F-C) attention, aims to focus on salient regions of the T-C and F-C feature maps that may have a more considerable impact on the global context, leading to more discriminative speaker representations. We evaluate the effectiveness of the proposed DTCF attention on the CN-Celeb and VoxCeleb datasets. On the CN-Celeb evaluation set, the EER/minDCF of ResNet34-DTCF are reduced by 0.63 compared with those of ResNet34-SE. On the VoxCeleb1-O, VoxCeleb1-E and VoxCeleb1-H evaluation sets, the EER/minDCF of ResNet34-DTCF achieve 0.36 0.39
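
The contrast drawn in the abstract, between averaging over both time and frequency and pooling along one axis at a time, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the learned layers that a real attention module would apply to the pooled maps are omitted (a sigmoid is applied directly to the pooled values), and combining the two gates by multiplication is an assumption. The feature map is a nested list indexed as `x[channel][frequency][time]`.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def se_gate(x):
    """SE-style squeeze: average over frequency AND time, one gate per channel."""
    return [sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0])))
            for ch in x]

def tc_gate(x):
    """T-C map: average over frequency only, one gate per (channel, time)."""
    return [[sigmoid(sum(ch[f][t] for f in range(len(ch))) / len(ch))
             for t in range(len(ch[0]))]
            for ch in x]

def fc_gate(x):
    """F-C map: average over time only, one gate per (channel, frequency)."""
    return [[sigmoid(sum(row) / len(row)) for row in ch] for ch in x]

def dual_recalibrate(x):
    """Rescale each element by its T-C and F-C gates (assumed combination rule)."""
    tc, fc = tc_gate(x), fc_gate(x)
    return [[[x[c][f][t] * tc[c][t] * fc[c][f]
              for t in range(len(x[c][f]))]
             for f in range(len(x[c]))]
            for c in range(len(x))]

# Toy feature map: 2 channels x 2 frequency bins x 3 frames.
x = [[[1.0, 0.0, -1.0], [1.0, 2.0, 3.0]],
     [[0.5, 0.5, 0.5], [-0.5, -0.5, -0.5]]]
y = dual_recalibrate(x)
```

In the toy input, the second channel averages to zero over both axes, so the SE-style gate sees no structure in it, while the F-C gate still distinguishes its two frequency bins. That is the kind of information the abstract says is lost by simple averaging.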

research
07/10/2022

Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning

Recently, attention mechanisms have been applied successfully in neural ...
research
09/02/2020

Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations

In this study, we propose the global context guided channel and time-fre...
research
08/04/2022

Data-driven Attention and Data-independent DCT based Global Context Modeling for Text-independent Speaker Recognition

Learning an effective speaker representation is crucial for achieving re...
research
10/31/2022

Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification

Deep convolutional neural networks (CNNs) have been applied to extractin...
research
09/13/2021

Studying squeeze-and-excitation used in CNN for speaker verification

In speaker verification, the extraction of voice representations is main...
research
03/01/2023

PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification

ECAPA-TDNN is currently the most popular TDNN-series model for speaker v...
research
07/05/2022

Backend Ensemble for Speaker Verification and Spoofing Countermeasure

This paper describes the NPU system submitted to Spoofing Aware Speaker ...