PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification

03/01/2023
by Zhenduo Zhao, et al.

ECAPA-TDNN is currently the most popular TDNN-series model for speaker verification and has set a new state-of-the-art (SOTA) performance for TDNN models. However, its one-dimensional convolutions have a global receptive field over the feature channels, which destroys the time-frequency relevance of the spectrogram. In addition, ECAPA-TDNN has only five layers, and this structure, much shallower than ResNet, limits its capability to generate deep representations. To further improve ECAPA-TDNN, we propose a progressive channel fusion strategy that splits the spectrogram across the feature channels and gradually expands the receptive field through the network. Second, we enlarge the model by extending its depth and adding branches. Our proposed model achieves an EER of 0.718 and a minDCF(0.01) of 0.0858 on vox1o, relative improvements of 16.1% and 19.5% over ECAPA-TDNN-large.
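To make the receptive-field argument concrete, the sketch below tracks which input frequency bins each channel can "see" after a stack of 1-D convolutions over time. It assumes that input channel c of the first layer is frequency bin c (the usual Conv1d layout for a filterbank spectrogram) and models the channel split as grouped convolution. The helper `receptive_fields` and the 4 → 2 → 1 group schedule are hypothetical illustrations; the abstract does not state how many groups PCF uses per layer, so this is only a sketch of the general idea, not the authors' implementation.

```python
from typing import List, Set


def receptive_fields(n_bins: int, channels: int,
                     group_schedule: List[int]) -> List[List[Set[int]]]:
    """Track which input frequency bins each channel depends on after each layer.

    Assumptions (hypothetical, for illustration only):
      * layers are 1-D convolutions over time, as in TDNN/ECAPA-TDNN;
      * input channel c of the first layer is frequency bin c;
      * layer k is a grouped convolution with group_schedule[k] groups.
    Only cross-channel connectivity matters here, so the time kernel is ignored.
    """
    fields: List[Set[int]] = [{c} for c in range(n_bins)]  # bin c sees only itself
    history: List[List[Set[int]]] = []
    c_in = n_bins
    for groups in group_schedule:
        if c_in % groups or channels % groups:
            raise ValueError("channel counts must be divisible by the group count")
        in_per_group = c_in // groups
        out_per_group = channels // groups
        new_fields: List[Set[int]] = []
        for out_ch in range(channels):
            g = out_ch // out_per_group  # group this output channel belongs to
            seen: Set[int] = set()
            # A grouped conv connects an output channel only to inputs in its group.
            for in_ch in range(g * in_per_group, (g + 1) * in_per_group):
                seen |= fields[in_ch]
            new_fields.append(seen)
        fields = new_fields
        history.append(fields)
        c_in = channels
    return history


if __name__ == "__main__":
    n_bins, channels = 8, 8
    # Plain Conv1d (groups=1): every channel mixes all frequency bins at once.
    plain = receptive_fields(n_bins, channels, [1])
    print("plain conv, bins seen per channel:", [len(f) for f in plain[0]])
    # Hypothetical progressive schedule: 4 -> 2 -> 1 groups.
    progressive = receptive_fields(n_bins, channels, [4, 2, 1])
    for depth, layer in enumerate(progressive, start=1):
        print(f"layer {depth}:", [sorted(f) for f in layer[::2]])
```

With a plain convolution every channel already covers all 8 bins after the first layer. Under the progressive schedule, channels first see neighbouring pairs of bins, then bands of four, and only the last layer fuses the whole spectrum, which is the gradual widening of the receptive field across the feature channels that the abstract describes.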

09/02/2020

Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations

In this study, we propose the global context guided channel and time-fre...
10/31/2022

Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification

Deep convolutional neural networks (CNNs) have been applied to extractin...
06/09/2023

An Efficient Speech Separation Network Based on Recurrent Fusion Dilated Convolution and Channel Attention

We present an efficient speech separation neural network, ARFDCN, which ...
08/04/2022

Data-driven Attention and Data-independent DCT based Global Context Modeling for Text-independent Speaker Recognition

Learning an effective speaker representation is crucial for achieving re...
10/13/2021

Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning

The use of channel-wise attention in CNN based speaker representation ne...
09/13/2021

Studying squeeze-and-excitation used in CNN for speaker verification

In speaker verification, the extraction of voice representations is main...
06/05/2019

Progressive NAPSAC: sampling from gradually growing neighborhoods

We propose Progressive NAPSAC, P-NAPSAC in short, which merges the advan...
