Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

04/06/2019
by   Santiago Pascual, et al.
0

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some recent works, however, have shown that it is possible to derive useful speech representations by employing a self-supervised encoder-discriminator approach. This paper proposes an improved self-supervised method, where a single neural encoder is followed by multiple workers that jointly solve different self-supervised tasks. The needed consensus across different tasks naturally imposes meaningful constraints to the encoder, contributing to discover general representations and to minimize the risk of learning superficial ones. Experiments show that the proposed approach can learn transferable, robust, and problem-agnostic features that carry on relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues. In addition, a number of design choices make the encoder easily exportable, facilitating its direct usage or adaptation to different problems.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/25/2020

Multi-task self-supervised learning for Robust Speech Recognition

Despite the growing interest in unsupervised learning, extracting meanin...
research
08/02/2023

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

While FastSpeech2 aims to integrate aspects of speech such as pitch, ene...
research
10/22/2020

Similarity Analysis of Self-Supervised Speech Representations

Self-supervised speech representation learning has recently been a prosp...
research
07/02/2023

Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters

Speech representations learned in a self-supervised fashion from massive...
research
07/25/2023

Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations

Self-supervised speech representations (SSSRs) have been successfully ap...
research
06/01/2023

Exploration on HuBERT with Multiple Resolutions

Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL...
research
09/12/2023

Self-supervised Extraction of Human Motion Structures via Frame-wise Discrete Features

The present paper proposes an encoder-decoder model for extracting the s...

Please sign up or login with your details

Forgot password? Click here to reset