Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation

08/09/2023
by Zirui Ge, et al.

The emergence of self-supervised representations (e.g., wav2vec 2.0) allows speaker-recognition approaches to process speech signals with foundation models built on speech data. Nevertheless, how to fuse these representations effectively needs further investigation, because existing approaches rely on fixed or sub-optimal temporal pooling strategies. Although improved strategies incorporating graph learning and graph attention have been proposed, they still use non-injective aggregation, which may degrade speaker-recognition performance. To address this, we propose a speaker-recognition approach that applies an Isomorphic Graph ATtention network (IsoGAT) to self-supervised representations. The approach consists of three modules: representation learning, graph attention, and aggregation. It jointly learns the self-supervised representation and the IsoGAT. We then perform speaker-recognition experiments on the VoxCeleb1&2 datasets, and the results demonstrate the recognition performance of the proposed approach in comparison with existing pooling approaches on self-supervised representations.
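The abstract does not give IsoGAT's equations, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes that each frame of a self-supervised representation is a node in a fully connected graph, scores node pairs with GAT-style attention, updates nodes with a GIN-style (1 + eps) self term plus an attention-weighted neighbour sum, and pools with a sum readout. All names (`isogat_pool`, `mlp_w1`, `mlp_w2`, `eps`, `mean_vs_sum_collide`) are hypothetical, and random arrays stand in for wav2vec 2.0 features and trained weights. The last lines illustrate what "non-injective" means for pooling: mean pooling maps two different frame sets to the same vector, while sum pooling does not.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    # LeakyReLU as used in GAT-style attention scoring
    return np.where(x > 0, x, slope * x)


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def isogat_pool(frames, W, a, mlp_w1, mlp_w2, eps=0.0):
    """Hypothetical graph-attention pooling: (T, D) frames -> (E,) embedding.

    Each frame is treated as a node of a fully connected graph (an assumption).
    """
    h = frames @ W                                   # (T, F) projected node features
    f = h.shape[1]
    # GAT-style score e_ij = LeakyReLU(a^T [h_i || h_j]), split into two dot products
    scores = leaky_relu((h @ a[:f])[:, None] + (h @ a[f:])[None, :])  # (T, T)
    alpha = softmax(scores, axis=1)                  # attention of node i over all nodes j
    # GIN-style update: (1 + eps) * own features + attention-weighted neighbour sum
    agg = (1.0 + eps) * h + alpha @ h                # (T, F)
    # Per-node two-layer MLP with a ReLU in between
    z = np.maximum(agg @ mlp_w1, 0.0) @ mlp_w2       # (T, E)
    # Sum readout; unlike a mean, it separates frame sets that differ only in multiplicity
    return z.sum(axis=0)                             # (E,)


def mean_vs_sum_collide(x, y):
    """Return (mean pooling collides, sum pooling collides) for two frame sets."""
    return (bool(np.allclose(x.mean(axis=0), y.mean(axis=0))),
            bool(np.allclose(x.sum(axis=0), y.sum(axis=0))))


# Random stand-ins for wav2vec 2.0 frame features and learned weights
rng = np.random.default_rng(0)
T, D, F, H, E = 50, 768, 128, 128, 192
frames = rng.standard_normal((T, D))
W = rng.standard_normal((D, F)) / np.sqrt(D)
a = rng.standard_normal(2 * F) / np.sqrt(F)
mlp_w1 = rng.standard_normal((F, H)) / np.sqrt(F)
mlp_w2 = rng.standard_normal((H, E)) / np.sqrt(H)

embedding = isogat_pool(frames, W, a, mlp_w1, mlp_w2)
print(embedding.shape)  # (192,)

# {x1, x2} and {x1, x2, x1, x2} have the same mean but different sums
x = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.vstack([x, x])
print(mean_vs_sum_collide(x, y))  # (True, False)
```

Because the graph in this sketch is fully connected and the readout is a sum, permuting the frames leaves the embedding unchanged. In a real system the projection, attention, and MLP weights would be trained jointly with the self-supervised front end, which the sketch omits.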

research
10/28/2022

A comprehensive study on self-supervised distillation for speaker representation learning

In real application scenarios, it is often challenging to obtain a large...
research
04/19/2021

Self-supervised Representation Learning With Path Integral Clustering For Speaker Diarization

Automatic speaker diarization techniques typically involve a two-stage p...
research
10/15/2022

Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

Self-supervised learning of speech representations from large amounts of...
research
04/08/2023

Unsupervised Speech Representation Pooling Using Vector Quantization

With the advent of general-purpose speech representations from large-sca...
research
02/11/2023

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation

Auditory attention decoding (AAD) is a technique used to identify and am...
research
08/21/2023

Implicit Self-supervised Language Representation for Spoken Language Diarization

In a code-switched (CS) scenario, the use of spoken language diarization...
research
04/11/2022

How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision

Attention mechanism in graph neural networks is designed to assign large...