Integrating Categorical Features in End-to-End ASR

10/06/2021
by Rongqing Huang, et al.

All-neural, end-to-end (E2E) ASR systems have rapidly gained interest from the speech recognition community. Such systems convert speech input to text units using a single trainable neural network model. E2E models require large amounts of paired speech-text data, which are expensive to obtain. The amount of data available varies across languages and dialects, so it is critical to make use of all of these data to improve both low-resource and high-resource languages. Similarly, when an ASR system is deployed for a new application domain, the amount of domain-specific training data is very limited, and leveraging data from existing domains is important for ASR accuracy in the new domain. In this paper, we treat all of these aspects as categorical information in an ASR system, and propose a simple yet effective way to integrate categorical features into an E2E model. We perform a detailed analysis of various training strategies, and find that building a joint model that includes categorical features can be more accurate than multiple independently trained models.
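
The abstract does not say how the categorical features enter the model. As a purely illustrative sketch, and not necessarily the method used in the paper, one common option is to encode the category (for example a language, dialect, or domain label) as a one-hot vector and append it to every acoustic feature frame before the encoder. The category inventory `CATEGORIES` and the helpers `one_hot` and `append_category` below are hypothetical names introduced only for this example.

```python
from typing import List, Sequence

# Hypothetical category inventory (assumption): the labels are illustrative
# and are not taken from the paper.
CATEGORIES = ["en-US", "en-GB", "fr-FR"]


def one_hot(category: str, categories: Sequence[str] = CATEGORIES) -> List[float]:
    """Return a one-hot vector marking the position of `category`."""
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")
    return [1.0 if c == category else 0.0 for c in categories]


def append_category(
    frames: Sequence[Sequence[float]],
    category: str,
    categories: Sequence[str] = CATEGORIES,
) -> List[List[float]]:
    """Append the same one-hot category vector to every feature frame.

    frames has shape (T, D); the result has shape (T, D + len(categories)).
    """
    tag = one_hot(category, categories)
    return [list(frame) + tag for frame in frames]


# Two toy frames with two acoustic features each (made-up values).
frames = [[0.1, 0.2], [0.3, 0.4]]
augmented = append_category(frames, "en-GB")
```

With this kind of input, a single joint model sees which category each utterance belongs to, so data from all categories can be pooled in training while the model can still condition its predictions on the label.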

research
03/31/2022

Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition

In the recent years end to end (E2E) automatic speech recognition (ASR) ...
research
10/12/2020

Improving Low Resource Code-switched ASR using Augmented Code-switched TTS

Building Automatic Speech Recognition (ASR) systems for code-switched sp...
research
10/19/2020

Reduce and Reconstruct: Improving Low-resource End-to-end ASR Via Reconstruction Using Reduced Vocabularies

End-to-end automatic speech recognition (ASR) systems are increasingly b...
research
07/10/2020

Class LM and word mapping for contextual biasing in End-to-End ASR

In recent years, all-neural, end-to-end (E2E) ASR systems gained rapid i...
research
08/09/2020

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition

Speech synthesis (text to speech, TTS) and recognition (automatic speech...
research
02/27/2023

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

We propose an end-to-end ASR system that can be trained on transcribed s...
research
08/26/2022

Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages

End-to-end (E2E) models have become the default choice for state-of-the-...
