Unsupervised Speech Representation Pooling Using Vector Quantization

04/08/2023
by Jeongkyun Park, et al.

With the advent of general-purpose speech representations from large-scale self-supervised models, applying a single model to multiple downstream tasks is becoming the de facto approach. However, the pooling problem remains: speech representations are inherently variable in length. Naive average pooling is often used, even though it ignores characteristics of speech such as phonemes of differing durations. Hence, we design a novel pooling method that squashes acoustically similar representations via vector quantization and, unlike attention-based pooling, requires no additional training. Further, we evaluate various unsupervised pooling methods on various self-supervised models. To do so, we gather diverse methods scattered across the speech and text literature and evaluate them on keyword spotting, speaker identification, intent classification, and emotion recognition. Finally, we quantitatively and qualitatively analyze our method, comparing it with supervised pooling methods.
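
The contrast between naive average pooling and quantization-based pooling can be illustrated with a small sketch. This is one plausible reading of "squashing acoustically similar representations", not the authors' exact algorithm. The sketch makes the following assumptions:
- Frames that map to the same codeword are merged into one group.
- Each group is averaged, and the group means are then averaged, so a long phoneme counts once rather than once per frame.
- The codebook is given in advance (for example, centroids fit offline on training features). The pooling step itself learns nothing.

All function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Naive average pooling: every frame gets equal weight."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def nearest_code(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to `frame` under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(frame, codebook[k])),
    )


def vq_pool(frames: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical VQ pooling: squash frames sharing a codeword, then average the groups.

    The codebook is assumed to be precomputed (e.g. offline clustering);
    nothing here is trained.
    """
    groups = defaultdict(list)
    for frame in frames:
        groups[nearest_code(frame, codebook)].append(frame)
    # Each group (roughly one acoustic unit) contributes a single vector,
    # regardless of how many frames it spans.
    group_means = [mean_pool(g) for g in groups.values()]
    return mean_pool(group_means)


if __name__ == "__main__":
    # A "long" unit lasting 4 frames near [0, 0] and a "short" unit of 1 frame near [1, 1].
    frames = [[0.0, 0.0]] * 4 + [[1.0, 1.0]]
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    print(mean_pool(frames))          # [0.2, 0.2]  (dominated by the long unit)
    print(vq_pool(frames, codebook))  # [0.5, 0.5]  (both units weighted equally)
```

In this toy input, average pooling lets the four-frame unit dominate the utterance vector. The quantized variant gives both units equal weight, which is the duration bias the abstract attributes to naive averaging.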
