Self-Supervised Speech Representation Learning: A Review

05/21/2022
by Abdelrahman Mohamed, et al.

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated building specialized models for individual tasks and application scenarios. Supervised methods are likewise difficult to apply to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision, achieving new levels of performance while reducing the number of labels required for many downstream scenarios. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech. Although self-supervised speech representation learning is still a nascent research area, it is closely related to acoustic word embeddings and learning with zero lexical resources, both of which have seen active research for many years. This review presents approaches for self-supervised speech representation learning and their connection to these other research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we also review recent efforts on benchmarking learned representations to extend their application beyond speech recognition.
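To make the contrastive category concrete, the sketch below implements an InfoNCE-style objective of the kind used by wav2vec 2.0-like speech models: a contextualized representation at a masked time step is scored against its true latent target and a set of sampled distractors. This is a minimal illustration only; the function name, tensor shapes, and temperature value are assumptions for the example, not details taken from the paper.

# Minimal sketch of a contrastive (InfoNCE-style) speech pre-training loss.
# Shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def info_nce_loss(context, targets, distractors, temperature=0.1):
    # context:     (B, D) encoder outputs at masked time steps
    # targets:     (B, D) true latents for the same time steps (positives)
    # distractors: (B, K, D) latents sampled from other time steps (negatives)
    pos = F.cosine_similarity(context, targets, dim=-1)                   # (B,)
    neg = F.cosine_similarity(context.unsqueeze(1), distractors, dim=-1)  # (B, K)
    # The positive goes in column 0, so the correct "class" is 0 for every example.
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1) / temperature      # (B, 1+K)
    labels = torch.zeros(logits.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

# Toy check with random tensors (batch of 4, 10 distractors, 256-dim latents):
B, K, D = 4, 10, 256
loss = info_nce_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, K, D))
print(loss.item())

By contrast, generative methods reconstruct the masked or future input itself, while predictive methods such as HuBERT replace the contrastive targets with discrete pseudo-labels predicted via a cross-entropy classification loss.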

Related research

10/14/2021 · Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Representation learning from unlabeled data has been of major interest i...

03/01/2022 · A Brief Overview of Unsupervised Neural Speech Representation Learning
Unsupervised representation learning for speech processing has matured g...

07/01/2021 · Pretext Tasks selection for multitask self-supervised speech representation learning
Through solving pretext tasks, self-supervised learning leverages unlabe...

04/21/2022 · PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map
Deep learning has recently achieved significant progress in trajectory f...

12/27/2021 · Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech
Learning to understand grounded language, which connects natural languag...

05/03/2021 · SUPERB: Speech processing Universal PERformance Benchmark
Self-supervised learning (SSL) has proven vital for advancing research i...

07/20/2023 · MASR: Metadata Aware Speech Representation
In the recent years, speech representation learning is constructed prima...
