Toward Cross-Domain Speech Recognition with End-to-End Models

03/09/2020
by Thai-Son Nguyen, et al.

In multi-domain speech recognition, past research has focused on hybrid acoustic models to build cross-domain and domain-invariant speech recognition systems. In this paper, we empirically examine the difference in behavior between hybrid acoustic models and neural end-to-end systems when mixing acoustic training data from several domains. For these experiments we composed a multi-domain dataset from public sources, with the different domains in the corpus covering a wide variety of topics and acoustic conditions such as telephone conversations, lectures, read speech and broadcast news. We show that for the hybrid models, supplying additional training data from other domains with mismatched acoustic conditions does not improve performance on specific domains. However, our end-to-end models optimized with a sequence-based criterion generalize better than the hybrid models on diverse domains. In terms of word error rate, our experimental acoustic-to-word and attention-based models trained on the multi-domain dataset reach the performance of domain-specific long short-term memory (LSTM) hybrid models, thus resulting in multi-domain speech recognition systems that do not lose performance relative to domain-specific ones. Moreover, the use of neural end-to-end models eliminates the need for domain-adapted language models during recognition, which is a great advantage when the input domain is unknown.
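
The abstract compares systems by word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch using the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the paper's own scoring setup and text normalization are not described in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, which is
    an assumption and not necessarily what the authors' scoring used.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" counts one deletion over six reference words. Because insertions are counted as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.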


research
10/17/2016

End-to-end attention-based distant speech recognition with Highway LSTM

End-to-end attention-based models have been shown to be competitive alte...
research
08/16/2018

Toward domain-invariant speech recognition via large scale training

Current state-of-the-art automatic speech recognition systems are traine...
research
07/09/2019

Teach an all-rounder with experts in different domains

In many automatic speech recognition (ASR) tasks, an ideal model has to ...
research
03/17/2020

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

While the community keeps promoting end-to-end models over conventional ...
research
01/05/2021

Domain-aware Neural Language Models for Speech Recognition

As voice assistants become more ubiquitous, they are increasingly expect...
research
08/02/2020

Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages

State-of-the-art spoken language identification (LID) systems, which are...
research
09/08/2015

Unsupervised Domain Discovery using Latent Dirichlet Allocation for Acoustic Modelling in Speech Recognition

Speech recognition systems are often highly domain dependent, a fact wid...
