Exploration on HuBERT with Multiple Resolutions

06/01/2023, by Jiatong Shi et al.

Hidden-unit BERT (HuBERT) is a widely used self-supervised learning (SSL) model in speech processing. However, we argue that its fixed 20ms resolution for hidden representations may not be optimal across speech-processing tasks, since the attributes these tasks rely on (e.g., speaker characteristics and semantics) operate on different time scales. To address this limitation, we propose utilizing HuBERT representations at multiple resolutions for downstream tasks. We explore two approaches, a parallel one and a hierarchical one, for integrating HuBERT features at different resolutions. Through experiments, we demonstrate that HuBERT with multiple resolutions outperforms the original model. This highlights the potential of utilizing multiple resolutions in SSL models like HuBERT to capture diverse information from speech signals.
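To make the parallel strategy concrete, here is a minimal PyTorch sketch of multi-resolution fusion, assuming hypothetical encoder modules that each map a waveform to frame-level HuBERT features at their own frame rate (e.g., 20ms and 40ms). The encoder names, resolutions, and the concatenate-then-project fusion below are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn


class ParallelMultiResFusion(nn.Module):
    """Sketch of parallel fusion of HuBERT features at multiple resolutions.

    Each encoder is assumed to map a raw waveform to frame-level features
    of shape (batch, frames, hidden_dim) at its own frame rate. Coarser
    streams are upsampled to the finest rate, concatenated along the
    feature axis, and projected back to hidden_dim.
    """

    def __init__(self, encoders, resolutions_ms, hidden_dim):
        super().__init__()
        assert len(encoders) == len(resolutions_ms)
        self.encoders = nn.ModuleList(encoders)
        self.resolutions_ms = resolutions_ms
        self.target_ms = min(resolutions_ms)  # fuse at the finest resolution
        self.proj = nn.Linear(hidden_dim * len(encoders), hidden_dim)

    def forward(self, waveform):
        streams = []
        for enc, res in zip(self.encoders, self.resolutions_ms):
            feats = enc(waveform)  # (batch, frames, hidden_dim)
            scale = res // self.target_ms
            if scale > 1:
                # nearest-neighbor upsampling along the time axis
                feats = feats.repeat_interleave(scale, dim=1)
            streams.append(feats)
        # trim to the shortest stream so the time axes line up
        min_len = min(s.size(1) for s in streams)
        fused = torch.cat([s[:, :min_len] for s in streams], dim=-1)
        return self.proj(fused)


# Hypothetical usage with two pretrained encoders (placeholders):
# fusion = ParallelMultiResFusion([hubert_20ms, hubert_40ms],
#                                 resolutions_ms=[20, 40], hidden_dim=768)
# features = fusion(waveform)  # (batch, frames_at_20ms_rate, 768)
```

The hierarchical variant, in which resolutions are combined stage by stage rather than side by side, would require different wiring and is not shown here.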

Related research

05/18/2020 - Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation
    For self-supervised speech processing, it is crucial to use pretrained m...

12/20/2022 - Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models
    Self-supervised learning (SSL) has achieved great success in various are...

02/19/2016 - Uniresolution representations of white-matter data from CoCoMac
    Tracing data as collated by CoCoMac, a seminal neuroinformatics database...

09/07/2023 - Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction
    The choice of the objective function is crucial in emerging high-quality...

02/24/2023 - Phone and speaker spatial organization in self-supervised speech representations
    Self-supervised representations of speech are currently being widely use...

02/03/2021 - General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework
    This paper presents a self-supervised learning framework, named MGF, for...

04/06/2019 - Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
    Learning good representations without supervision is still an open issue...
