Transport-Oriented Feature Aggregation for Speaker Embedding Learning

06/26/2022
by Yusheng Tian, et al.

Pooling is needed to aggregate frame-level features into utterance-level representations for speaker modeling. Given the success of statistics-based pooling methods, we hypothesize that speaker characteristics are well represented in the statistical distribution over the pre-aggregation layer's output, and propose to use transport-oriented feature aggregation for deriving speaker embeddings. The aggregated representation encodes the geometric structure of the underlying feature distribution, which is expected to contain valuable speaker-specific information that may not be captured by commonly used statistical measures such as the mean and variance. The original transport-oriented feature aggregation is also extended to a weighted-frame version to incorporate the attention mechanism. Experiments on speaker verification with the VoxCeleb dataset show improvements over statistics pooling and its attentive variant.
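For context, the statistics pooling baseline that the abstract compares against can be sketched in a few lines. This is a minimal NumPy illustration of the commonly used mean-and-standard-deviation pooling, with an optional per-frame weight vector standing in for attention scores; it is not the transport-oriented method proposed in the paper. The function name `statistics_pooling` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np

def statistics_pooling(frames, weights=None):
    """Pool frame-level features (T, D) into one utterance-level vector (2*D,).

    Illustrative sketch, not the paper's implementation.
    weights: optional non-negative per-frame scores of shape (T,).
             They are normalized to sum to 1; uniform weights are used if omitted.
    """
    frames = np.asarray(frames, dtype=float)
    num_frames = frames.shape[0]
    if weights is None:
        weights = np.full(num_frames, 1.0 / num_frames)
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
    # Weighted mean over the time axis.
    mean = weights @ frames
    # Weighted variance around that mean; clip guards against tiny negatives.
    var = weights @ (frames - mean) ** 2
    std = np.sqrt(np.clip(var, 0.0, None))
    # Utterance-level representation: concatenated mean and standard deviation.
    return np.concatenate([mean, std])

# Two frames with two feature dimensions each.
frames = [[0.0, 0.0], [2.0, 4.0]]
uniform = statistics_pooling(frames)
weighted = statistics_pooling(frames, weights=[1.0, 3.0])
```

With uniform weights this reduces to plain statistics pooling; with non-uniform weights it mirrors the attentive variant, where frames judged more informative contribute more. Either way the output keeps only first- and second-order statistics per dimension, which is the limitation the abstract points to when it argues for encoding the geometric structure of the feature distribution.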


research
03/29/2018

Attentive Statistics Pooling for Deep Speaker Embedding

This paper proposes attentive statistics pooling for deep speaker embedd...
research
12/23/2021

Graph attentive feature aggregation for text-independent speaker verification

The objective of this paper is to combine multiple frame-level features ...
research
07/14/2021

Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding

This paper proposes a serialized multi-layer multi-head attention for ne...
research
08/13/2020

Cross attentive pooling for speaker verification

The goal of this paper is text-independent speaker verification where ut...
research
10/10/2021

Poformer: A simple pooling transformer for speaker verification

Most recent speaker verification systems are based on extracting speaker...
research
06/25/2021

Phoneme-aware and Channel-wise Attentive Learning for Text Dependent Speaker Verification

This paper proposes a multi-task learning network with phoneme-aware and...
research
07/27/2020

Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Normalization for End-to-End Speaker Verification System

One of the most important parts of an end-to-end speaker verification sy...
