Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition

08/10/2020
by Egor Lakomkin, et al.

Subwords are the most widely used output units in end-to-end speech recognition. They combine the best of both worlds: they model the majority of frequent words directly while still allowing open-vocabulary speech recognition by backing off to shorter units or characters to construct words unseen during training. However, mapping text to subwords is ambiguous, and multiple segmentation variants are often possible. Yet many systems are trained using only the most likely segmentation. Recent research suggests that sampling subword segmentations during training acts as a regularizer for neural machine translation and speech recognition models, leading to performance improvements. In this work, we conduct a principled investigation of the regularizing effect of subword segmentation sampling for a streaming end-to-end speech recognition task. In particular, we evaluate the contribution of subword regularization as a function of the size of the training dataset. Our results suggest that subword regularization provides a consistent improvement of 2-8% relative word-error-rate reduction, even in a large-scale setting with datasets of up to 20k hours. Further, we analyze the effect of subword regularization on the recognition of unseen words and its implications for beam diversity.
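To make the segmentation ambiguity concrete, the following is a minimal sketch of sampling a subword segmentation instead of always taking the most likely one. It is not the authors' implementation: the toy vocabulary `TOY_VOCAB`, its probabilities, and the smoothing parameter `alpha` are all hypothetical and chosen purely for illustration, under the assumption of a unigram-style model in which a segmentation's score is the sum of its pieces' log-probabilities.

```python
import math
import random

# Hypothetical toy vocabulary: subword -> log-probability (illustrative values only).
TOY_VOCAB = {
    "hello": math.log(0.05),
    "hell": math.log(0.02),
    "he": math.log(0.05),
    "llo": math.log(0.01),
    "lo": math.log(0.02),
    "h": math.log(0.1),
    "e": math.log(0.1),
    "l": math.log(0.1),
    "o": math.log(0.1),
}


def segmentations(word, vocab):
    """Return every way to split `word` into pieces that all appear in `vocab`."""
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in vocab:
            for rest in segmentations(word[end:], vocab):
                results.append([piece] + rest)
    return results


def score(segmentation, vocab):
    """Log-probability of a segmentation under the assumed unigram model."""
    return sum(vocab[piece] for piece in segmentation)


def best_segmentation(word, vocab):
    """Deterministic baseline: the single most likely segmentation."""
    return max(segmentations(word, vocab), key=lambda s: score(s, vocab))


def sample_segmentation(word, vocab, alpha=0.5, rng=random):
    """Draw one segmentation with probability proportional to P(segmentation) ** alpha.

    Smaller `alpha` flattens the distribution (more varied segmentations);
    larger `alpha` concentrates it on the most likely segmentation.
    """
    candidates = segmentations(word, vocab)
    weights = [math.exp(alpha * score(s, vocab)) for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print("most likely:", best_segmentation("hello", TOY_VOCAB))
    for _ in range(5):
        print("sampled:    ", sample_segmentation("hello", TOY_VOCAB, rng=rng))
```

In a training loop, a sampler of this kind would be called each time a transcript is converted to target units, so the model sees different segmentations of the same word across epochs. Exhaustive enumeration is only practical for short words; real tokenizers use lattice-based sampling instead.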

