UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

01/19/2021
by   Chengyi Wang, et al.

In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner. The resultant representations can capture information more correlated with phonetic structures and improve generalization across languages and domains. We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on the public CommonVoice corpus. The results show that UniSpeech outperforms self-supervised pretraining and supervised transfer learning for speech recognition by a maximum of 13.4% and 17.8% relative phone error rate reductions respectively (averaged over all testing languages). The transferability of UniSpeech is also demonstrated on a domain-shift speech recognition task, i.e., a relative word error rate reduction of 6% against the previous approach.
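As a rough illustration of the multi-task idea described above (not the authors' implementation), the PyTorch sketch below combines a supervised phonetic CTC loss with a wav2vec 2.0-style contrastive loss over target representations. The class name, dimensions, mixing weight, and temperature are illustrative assumptions only.

```python
# Minimal sketch of a UniSpeech-style multi-task objective:
# supervised phonetic CTC loss + contrastive self-supervised loss.
# All names and hyperparameters here are illustrative, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSpeechLoss(nn.Module):
    def __init__(self, hidden_dim=768, num_phones=64, alpha=0.5, temperature=0.1):
        super().__init__()
        self.phone_head = nn.Linear(hidden_dim, num_phones)  # projection for CTC
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)
        self.alpha = alpha              # weight between CTC and contrastive terms
        self.temperature = temperature

    def contrastive_loss(self, context, targets, negatives):
        # context, targets: (B, T, D); negatives: (B, T, K, D)
        candidates = torch.cat([targets.unsqueeze(2), negatives], dim=2)  # (B, T, K+1, D)
        logits = F.cosine_similarity(context.unsqueeze(2), candidates, dim=-1)
        logits = logits / self.temperature
        # the true target sits at index 0 of the candidate set
        labels = torch.zeros(logits.shape[:2], dtype=torch.long)
        return F.cross_entropy(logits.flatten(0, 1), labels.flatten())

    def forward(self, context, targets, negatives,
                phone_labels, input_lengths, label_lengths):
        # Supervised branch: phonetic CTC on contextual representations.
        log_probs = self.phone_head(context).log_softmax(-1).transpose(0, 1)  # (T, B, V)
        ctc_loss = self.ctc(log_probs, phone_labels, input_lengths, label_lengths)
        # Self-supervised branch: contrastive prediction of target latents.
        con_loss = self.contrastive_loss(context, targets, negatives)
        return self.alpha * ctc_loss + (1 - self.alpha) * con_loss

# Example with random tensors standing in for encoder outputs.
B, T, D, K = 2, 50, 768, 10
loss_fn = MultiTaskSpeechLoss()
context = torch.randn(B, T, D)
targets = torch.randn(B, T, D)        # e.g. quantized latents
negatives = torch.randn(B, T, K, D)   # distractor samples
phone_labels = torch.randint(1, 64, (B, 20))
loss = loss_fn(context, targets, negatives, phone_labels,
               torch.full((B,), T), torch.full((B,), 20))
loss.backward()
```

The single weighted sum is only one way to mix the two objectives; schedules that anneal the weight or alternate batches between labeled and unlabeled data are equally plausible under this setup.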


Related research

06/24/2020 · Unsupervised Cross-lingual Representation Learning for Speech Recognition
This paper presents XLSR which learns cross-lingual speech representatio...

06/27/2022 · Wav2Vec-Aug: Improved self-supervised training with limited data
Self-supervised learning (SSL) of speech representations has received mu...

07/15/2021 · CLSRIL-23: Cross Lingual Speech Representations for Indic Languages
We present CLSRIL-23, a self supervised learning based audio pre-train...

11/02/2022 · data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup
In this paper, we propose a new Self-Supervised Learning (SSL) algorithm...

03/09/2021 · Wav2vec-C: A Self-supervised Model for Speech Representation Learning
Wav2vec-C introduces a novel representation learning technique combining...

10/12/2021 · UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
Self-supervised learning (SSL) is a long-standing goal for speech proces...

11/14/2022 · MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
In this paper, we provide a new perspective on self-supervised speech mo...
