End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

11/19/2019
by Gabriel Synnaeve, et al.

We study ResNet-, Time-Depth Separable ConvNet-, and Transformer-based acoustic models, trained with CTC or Seq2Seq criteria. We perform experiments on the LibriSpeech dataset, with and without LM decoding, optionally with beam rescoring. We reach 5.18% WER on test-other with rescoring. Additionally, we leverage the unlabeled data from LibriVox by doing semi-supervised training and show that it is possible to reach 5.29% WER on test-other without decoding, and 4.11% with rescoring, using only the standard 960 hours from LibriSpeech as labeled data.
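For illustration, below is a minimal PyTorch sketch of how unlabeled audio can be folded into training of a CTC acoustic model through pseudo-labeling, the usual way such semi-supervised setups are built. This is not the authors' code: the function names, tensor shapes, and the greedy decoder are assumptions, and an actual pipeline along the lines of the paper would generate pseudo-labels with beam-search decoding and an external LM.

import torch

ctc_loss = torch.nn.CTCLoss(blank=0, zero_infinity=True)

def supervised_step(model, features, out_lens, targets, target_lens):
    # features: (N, T, F) filterbank batch; model output: (N, T', C) scores.
    # CTCLoss expects log-probs as (T', N, C) and targets as a flat 1-D tensor.
    log_probs = model(features).log_softmax(dim=-1).transpose(0, 1)
    return ctc_loss(log_probs, targets, out_lens, target_lens)

@torch.no_grad()
def pseudo_label(teacher, features):
    # Greedy CTC decoding of unlabeled audio into pseudo-transcripts.
    # (A stronger setup decodes with a beam search and an external LM;
    # greedy decoding here is a simplification.)
    best = teacher(features).log_softmax(dim=-1).argmax(dim=-1)  # (N, T')
    labels, lengths = [], []
    for seq in best:
        seq = torch.unique_consecutive(seq)   # collapse repeated frames
        seq = seq[seq != 0]                   # drop the blank token (index 0)
        labels.append(seq)
        lengths.append(seq.numel())
    return torch.cat(labels), torch.tensor(lengths)

def semi_supervised_step(model, teacher, unlab_features, unlab_out_lens):
    # Treat the teacher's transcripts as targets for the unlabeled batch.
    targets, target_lens = pseudo_label(teacher, unlab_features)
    return supervised_step(model, unlab_features, unlab_out_lens,
                           targets, target_lens)

In a setup like the one described above, batches of pseudo-labeled LibriVox audio would be mixed with the labeled 960-hour LibriSpeech batches, with a strong supervised model serving as the teacher.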


