Improving RNN-T ASR Performance with Date-Time and Location Awareness

06/11/2021
by   Swayambhu Nath Ray, et al.
3

In this paper, we explore the benefits of incorporating context into a Recurrent Neural Network (RNN-T) based Automatic Speech Recognition (ASR) model to improve the speech recognition for virtual assistants. Specifically, we use meta information extracted from the time at which the utterance is spoken and the approximate location information to make ASR context aware. We show that these contextual information, when used individually, improves overall performance by as much as 3.48 are combined, the model learns complementary features and the recognition improves by 4.62 improvements as high as 11.5 We ran experiments with models trained on data of sizes 30K hours and 10K hours. We show that the scale of improvement with the 10K hours dataset is much higher than the one obtained with 30K hours dataset. Our results indicate that with limited data to train the ASR model, contextual signals can improve the performance significantly.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/27/2022

Contextual-Utterance Training for Automatic Speech Recognition

Recent studies of streaming automatic speech recognition (ASR) recurrent...
research
11/05/2021

Context-Aware Transformer Transducer for Speech Recognition

End-to-end (E2E) automatic speech recognition (ASR) systems often have d...
research
09/11/2023

SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus

Multi-Modal automatic speech recognition (ASR) techniques aim to leverag...
research
09/05/2023

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

Automatic Speech Recognition (ASR) models need to be optimized for speci...
research
02/12/2021

Multimodal Punctuation Prediction with Contextual Dropout

Automatic speech recognition (ASR) is widely used in consumer electronic...
research
11/26/2018

Improving Gated Recurrent Unit Based Acoustic Modeling with Batch Normalization and Enlarged Context

The use of future contextual information is typically shown to be helpfu...
research
04/09/2021

Accented Speech Recognition Inspired by Human Perception

While improvements have been made in automatic speech recognition perfor...

Please sign up or login with your details

Forgot password? Click here to reset