Phoneme-aware and Channel-wise Attentive Learning for Text DependentSpeaker Verification

06/25/2021
by   Yan Liu, et al.
0

This paper proposes a multi-task learning network with phoneme-aware and channel-wise attentive learning strategies for text-dependent Speaker Verification (SV). In the proposed structure, the frame-level multi-task learning along with the segment-level adversarial learning is adopted for speaker embedding extraction. The phoneme-aware attentive pooling is exploited on frame-level features in the main network for speaker classifier, with the corresponding posterior probability for the phoneme distribution in the auxiliary subnet. Further, the introduction of Squeeze and Excitation (SE-block) performs dynamic channel-wise feature recalibration, which improves the representational ability. The proposed method exploits speaker idiosyncrasies associated with pass-phrases, and is further improved by the phoneme-aware attentive pooling and SE-block from temporal and channel-wise aspects, respectively. The experiments conducted on RSR2015 Part 1 database confirm that the proposed system achieves outstanding results for textdependent SV.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/29/2018

Attentive Statistics Pooling for Deep Speaker Embedding

This paper proposes attentive statistics pooling for deep speaker embedd...
research
08/13/2020

Cross attentive pooling for speaker verification

The goal of this paper is text-independent speaker verification where ut...
research
07/07/2021

MACCIF-TDNN: Multi aspect aggregation of channel and context interdependence features in TDNN-based speaker verification

Most of the recent state-of-the-art results for speaker verification are...
research
12/23/2021

Graph attentive feature aggregation for text-independent speaker verification

The objective of this paper is to combine multiple frame-level features ...
research
06/26/2022

Transport-Oriented Feature Aggregation for Speaker Embedding Learning

Pooling is needed to aggregate frame-level features into utterance-level...
research
01/14/2020

An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales

This paper presents an improved deep embedding learning method based on ...
research
04/06/2021

Speaker embeddings by modeling channel-wise correlations

Speaker embeddings extracted with deep 2D convolutional neural networks ...

Please sign up or login with your details

Forgot password? Click here to reset