Dynamic Kernels and Channel Attention with Multi-Layer Embedding Aggregation for Speaker Verification

11/03/2022
by Anna Ollerenshaw, et al.

State-of-the-art speaker verification frameworks have typically improved verification performance through speech enhancement techniques and increasingly deeper (more layers) and wider (more channels) models. Instead, this paper proposes increasing the model's resolution capability with attention-based dynamic kernels in a convolutional neural network, so that the model parameters are conditioned on the input features. The attention weights on the kernels are further distilled by channel attention and multi-layer feature aggregation to learn global features from speech. Because the structure of the model parameters adapts to each input, this approach improves representation capacity efficiently with limited data resources. The proposed dynamic convolutional model achieved 1.62% EER and 0.18 minDCF on the VoxCeleb1 test set, a 17% relative improvement over ECAPA-TDNN.
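The abstract does not give the layer equations, so the following is only a minimal pure-Python sketch of the general dynamic-convolution idea it describes: K candidate kernels are mixed by input-conditioned attention weights (softmax over a linear map of the time-averaged input, with an assumed temperature), and the mixed kernel is then applied as an ordinary 1D convolution. Channel attention is assumed to be squeeze-and-excitation style, and multi-layer aggregation is assumed to follow the ECAPA-TDNN convention of concatenating intermediate outputs along the channel axis. Every function name, the single bias-free attention layer, and the [channels][frames] layout are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def global_avg_pool(x):
    """Average each channel of x ([channels][frames]) over time."""
    return [sum(ch) / len(ch) for ch in x]


def softmax(z, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    m = max(z)
    exps = [math.exp((v - m) / temperature) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d(x, w):
    """'Same'-padded 1D convolution; w has shape [out_ch][in_ch][k] with odd k."""
    in_ch, n_frames = len(x), len(x[0])
    k = len(w[0][0])
    pad = k // 2
    out = []
    for w_o in w:
        row = []
        for t in range(n_frames):
            acc = 0.0
            for c in range(in_ch):
                for j in range(k):
                    tt = t + j - pad
                    if 0 <= tt < n_frames:  # zero padding outside the sequence
                        acc += w_o[c][j] * x[c][tt]
            row.append(acc)
        out.append(row)
    return out


def dynamic_conv1d(x, kernels, attn_weights, temperature=1.0):
    """Mix K candidate kernels with input-conditioned attention, then convolve.

    kernels: K kernels, each [out_ch][in_ch][k].
    attn_weights: hypothetical [K][in_ch] linear map from pooled features to K logits.
    Returns the output and the attention weights pi (one per kernel, summing to 1).
    """
    pooled = global_avg_pool(x)
    logits = [sum(a * p for a, p in zip(row, pooled)) for row in attn_weights]
    pi = softmax(logits, temperature)
    n_k = len(kernels)
    out_ch, in_ch, k = len(kernels[0]), len(kernels[0][0]), len(kernels[0][0][0])
    # Aggregated kernel: W(x) = sum_i pi_i(x) * W_i, computed once per input.
    mixed = [[[sum(pi[i] * kernels[i][o][c][j] for i in range(n_k))
               for j in range(k)] for c in range(in_ch)] for o in range(out_ch)]
    return conv1d(x, mixed), pi


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation: scale channel c by sigmoid(W2 relu(W1 GAP(x)))[c]."""
    s = global_avg_pool(x)
    hidden = [max(0.0, sum(a * v for a, v in zip(row, s))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(a * h for a, h in zip(row, hidden)))) for row in w2]
    return [[gates[c] * v for v in x[c]] for c in range(len(x))]


def multi_layer_aggregate(layer_outputs):
    """Concatenate several [channels][frames] outputs along the channel axis."""
    return [ch for out in layer_outputs for ch in out]


if __name__ == "__main__":
    random.seed(0)
    x = [[random.gauss(0, 1) for _ in range(6)] for _ in range(3)]  # 3 channels, 6 frames
    kernels = [[[[random.gauss(0, 0.1) for _ in range(3)] for _ in range(3)]
                for _ in range(3)] for _ in range(4)]  # K=4 kernels, 3 -> 3 channels, width 3
    attn = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
    y, pi = dynamic_conv1d(x, kernels, attn)
    print("kernel attention:", [round(p, 3) for p in pi])
```

Because convolution is linear in the kernel, convolving with the mixed kernel gives the same result as the attention-weighted sum of the K individual convolutions, but costs only one convolution per input. That is the efficiency argument behind dynamic kernels: capacity grows with K while compute stays close to a single static layer.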
