Multi-Scale Aggregation Using Feature Pyramid Module for Text-Independent Speaker Verification

04/07/2020
by Youngmoon Jung, et al.

Deep speaker embedding learning is currently the most widely used approach to speaker verification. In this approach, a convolutional neural network typically serves as the frame-level feature extractor, and speaker embeddings are extracted from its last layer. Multi-scale aggregation (MSA), which uses multi-scale features from different layers of the feature extractor, has recently been introduced into this approach and has improved performance on both short and long utterances. This paper improves MSA with a feature pyramid module, which enhances the speaker-discriminative information of features at multiple layers via a top-down pathway and lateral connections. Speaker embeddings are then extracted from the enhanced features, which contain rich speaker information at different resolutions. Experiments on the VoxCeleb dataset show that the proposed module improves on previous MSA methods while using fewer parameters, and that it outperforms state-of-the-art approaches.
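The top-down pathway and lateral connections described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes 1-D feature maps of shape (channels, frames) rather than the 2-D maps a CNN extractor would produce, random untrained weights, nearest-neighbour upsampling, and plain mean pooling over time. The function names (`lateral`, `upsample_time`, `feature_pyramid`, `multi_scale_embedding`) and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def lateral(x, w):
    """Lateral connection as a 1x1 convolution: x is (C_in, T), w is (D, C_in)."""
    return w @ x

def upsample_time(x, factor):
    """Nearest-neighbour upsampling along the time axis."""
    return np.repeat(x, factor, axis=1)

def feature_pyramid(features, weights):
    """Enhance multi-layer features with a top-down pathway.

    features: list of (C_i, T_i) arrays ordered from fine (early layer)
              to coarse (last layer).
    weights:  matching list of (D, C_i) lateral projection matrices.
    Returns a list of (D, T_i) arrays in the same fine-to-coarse order.
    """
    top = lateral(features[-1], weights[-1])      # start at the coarsest level
    outputs = [top]
    for x, w in zip(reversed(features[:-1]), reversed(weights[:-1])):
        factor = x.shape[1] // top.shape[1]       # assumes T_i divides evenly
        top = lateral(x, w) + upsample_time(top, factor)
        outputs.append(top)
    return outputs[::-1]

def multi_scale_embedding(pyramid):
    """Aggregate all levels: mean-pool each over time, then concatenate."""
    return np.concatenate([p.mean(axis=1) for p in pyramid])

# Toy input: three layers with growing channel count and shrinking time axis.
rng = np.random.default_rng(0)
channels, frames, dim = [16, 32, 64], [40, 20, 10], 8
features = [rng.standard_normal((c, t)) for c, t in zip(channels, frames)]
weights = [rng.standard_normal((dim, c)) * 0.1 for c in channels]

pyramid = feature_pyramid(features, weights)
embedding = multi_scale_embedding(pyramid)
print([p.shape for p in pyramid], embedding.shape)
```

Each level of the pyramid combines its own layer's features with information propagated down from the coarser layers, so the concatenated embedding draws on every resolution rather than on the last layer alone.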


