Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

05/25/2023
by Wangyou Zhang, et al.

Self-supervised learning (SSL) based speech pre-training has attracted much attention for its ability to extract rich representations from massive unlabeled data. By contrast, the use of weakly-supervised data for speech pre-training remains less explored. To fill this gap, we propose a weakly-supervised speech pre-training method based on speaker-aware speech data. It adopts a training procedure similar to the widely-used masked speech prediction SSL framework, while incorporating target-speaker enrollment information as an auxiliary input. In this way, the learned representation is steered towards the target speaker even in the presence of highly overlapping interference, enabling applications such as target speech recognition. Our experiments on the Libri2Mix and WSJ0-2mix datasets show that the proposed model achieves significantly better ASR performance than WavLM, the state-of-the-art SSL model with denoising capability.
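The core idea, masked speech prediction conditioned on a target-speaker enrollment embedding, can be illustrated with a minimal sketch. This is not the authors' implementation: all function names, shapes, and the linear "encoder" standing in for a Transformer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mask(num_frames, span=3, ratio=0.3, rng=rng):
    """Randomly mask short spans of frames, as in masked speech prediction."""
    mask = np.zeros(num_frames, dtype=bool)
    num_starts = int(num_frames * ratio / span)
    starts = rng.choice(num_frames - span, size=num_starts, replace=False)
    for s in starts:
        mask[s:s + span] = True
    return mask

def encode(mixture_feats, spk_embedding, weight):
    """Toy encoder: broadcast the enrollment embedding to every frame,
    concatenate it with the mixture features, and apply a linear layer
    (a stand-in for the Transformer encoder)."""
    T = mixture_feats.shape[0]
    spk = np.broadcast_to(spk_embedding, (T, spk_embedding.shape[-1]))
    x = np.concatenate([mixture_feats, spk], axis=-1)
    return x @ weight

# Illustrative shapes only.
T, feat_dim, spk_dim, out_dim = 50, 8, 4, 8
mixture = rng.normal(size=(T, feat_dim))   # features of the overlapped mixture
enroll = rng.normal(size=(spk_dim,))       # target-speaker enrollment embedding
W = rng.normal(size=(feat_dim + spk_dim, out_dim))

mask = make_mask(T)
masked_input = np.where(mask[:, None], 0.0, mixture)  # zero out masked frames
pred = encode(masked_input, enroll, W)
# The loss is computed only on masked positions; conditioning on the
# enrollment embedding steers predictions towards the target speaker.
loss = np.mean((pred[mask] - mixture[mask]) ** 2)
```

The key design point is that the speaker cue enters as an auxiliary input at every frame, so the same masked-prediction objective used in SSL pre-training implicitly supervises target-speaker extraction.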


